    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        

Optimal Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning

Offline reinforcement learning has become one of the most practical RL settings. A recent success story is RLHF, offline preference-based RL (PBRL) with preference feedback from humans. However, most existing works on offline RL focus on the standard setting with scalar reward feedback. It remains unknown how to universally transfer the existing rich understanding of offline RL from the reward-based setting to the preference-based setting. In this work, we propose a general framework to bridge this gap. Our key insight is to transform preference feedback into scalar rewards via optimal reward labeling (ORL), after which any reward-based offline RL algorithm can be applied to the dataset with the reward labels. We theoretically show the connection between several recent PBRL techniques and our framework combined with specific offline RL algorithms in terms of how they utilize the preference signals. By combining reward labeling with different algorithms, our framework can lead to new and potentially more efficient offline PBRL algorithms. We empirically test our framework on preference datasets based on the standard D4RL benchmark. When combined with a variety of efficient reward-based offline RL algorithms, the learning result achieved under our framework is comparable to training the same algorithm on the dataset with actual rewards in many cases and better than the recent PBRL baselines in most cases.
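
The abstract does not spell out the ORL objective; as a rough illustration of the preference-to-reward pipeline, here is a minimal sketch that learns rewards from pairwise segment preferences under a Bradley-Terry assumption and then relabels transitions. All dimensions, names, and toy data below are hypothetical:

import torch
import torch.nn as nn

obs_dim, act_dim, T = 4, 2, 10
reward_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

# Toy preference data: pairs of trajectory segments plus a binary label
seg = lambda: torch.randn(T, obs_dim + act_dim)
preference_pairs = [(seg(), seg(), float(i % 2)) for i in range(200)]

for seg_a, seg_b, pref in preference_pairs:
    # Bradley-Terry: P(a preferred over b) = sigmoid(R(a) - R(b)), R = summed reward
    logit = reward_net(seg_a).sum() - reward_net(seg_b).sum()
    loss = nn.functional.binary_cross_entropy_with_logits(logit, torch.tensor(pref))
    opt.zero_grad(); loss.backward(); opt.step()

# Relabel each (state, action) with a scalar reward; any reward-based offline RL
# algorithm (IQL, CQL, TD3+BC, ...) can then run on the labeled dataset.
with torch.no_grad():
    r_hat = reward_net(torch.randn(obs_dim + act_dim)).item()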

Updated: 2024-06-14 23:40:42

Categories: cs.LG

Download: http://arxiv.org/abs/2406.10445v1

Sim2Real in Reconstructive Spectroscopy: Deep Learning with Augmented Device-Informed Data Simulation

This work proposes a deep learning (DL)-based framework, namely Sim2Real, for spectral signal reconstruction in reconstructive spectroscopy, focusing on efficient data sampling and fast inference time. It addresses the challenge of reconstructing real-world spectral signals under the extreme setting where only device-informed simulated data are available for training. Such device-informed simulated data are much easier to collect than real-world data but exhibit large distribution shifts from their real-world counterparts. To leverage such simulated data effectively, a hierarchical data augmentation strategy is introduced to mitigate the adverse effects of this domain shift, and a corresponding neural network for spectral signal reconstruction with our augmented data is designed. Experiments using a real dataset measured from our spectrometer device demonstrate that Sim2Real achieves significant speed-up during inference while attaining on-par performance with state-of-the-art optimization-based methods.

Updated: 2024-06-14 23:35:36

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2403.12354v2

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Large Language Models (LLMs) employ auto-regressive decoding that requires sequential computation, with each step reliant on the previous one's output. This creates a bottleneck as each step necessitates moving the full model parameters from High-Bandwidth Memory (HBM) to the accelerator's cache. While methods such as speculative decoding have been suggested to address this issue, their implementation is impeded by the challenges associated with acquiring and maintaining a separate draft model. In this paper, we present Medusa, an efficient method that augments LLM inference by adding extra decoding heads to predict multiple subsequent tokens in parallel. Using a tree-based attention mechanism, Medusa constructs multiple candidate continuations and verifies them simultaneously in each decoding step. By leveraging parallel processing, Medusa substantially reduces the number of decoding steps required. We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases: Medusa-1: Medusa is directly fine-tuned on top of a frozen backbone LLM, enabling lossless inference acceleration. Medusa-2: Medusa is fine-tuned together with the backbone LLM, enabling better prediction accuracy of Medusa heads and higher speedup but needing a special training recipe that preserves the backbone model's capabilities. Moreover, we propose several extensions that improve or expand the utility of Medusa, including a self-distillation to handle situations where no training data is available and a typical acceptance scheme to boost the acceptance rate while maintaining generation quality. We evaluate Medusa on models of various sizes and training procedures. Our experiments demonstrate that Medusa-1 can achieve over 2.2x speedup without compromising generation quality, while Medusa-2 further improves the speedup to 2.3-3.6x.
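
A toy sketch of the core mechanism: K extra heads, each predicting the token k positions ahead from the backbone's last hidden state. The tree-based verification step is only indicated in comments, and all sizes are hypothetical:

import torch
import torch.nn as nn

hidden, vocab, K = 512, 1000, 4
# One extra decoding head per lookahead position; in Medusa-1 only these heads
# are trained, on top of a frozen backbone LLM.
heads = nn.ModuleList(
    nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(), nn.Linear(hidden, vocab))
    for _ in range(K)
)

h_last = torch.randn(1, hidden)                      # last hidden state of the backbone
lookahead_logits = [head(h_last) for head in heads]  # predictions for t+1 ... t+K

# Take the top few tokens per position to form candidate continuations; a single
# backbone pass with tree-structured attention then verifies all candidates at
# once, and the longest accepted prefix is emitted (verification omitted here).
drafts = [logits.topk(2, dim=-1).indices for logits in lookahead_logits]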

Updated: 2024-06-14 23:32:32

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2401.10774v3

Towards General Neural Surrogate Solvers with Specialized Neural Accelerators

Surrogate neural network-based partial differential equation (PDE) solvers have the potential to solve PDEs in an accelerated manner, but they are largely limited to systems featuring fixed domain sizes, geometric layouts, and boundary conditions. We propose Specialized Neural Accelerator-Powered Domain Decomposition Methods (SNAP-DDM), a DDM-based approach to PDE solving in which subdomain problems containing arbitrary boundary conditions and geometric parameters are accurately solved using an ensemble of specialized neural operators. We tailor SNAP-DDM to 2D electromagnetics and fluidic flow problems and show how innovations in network architecture and loss function engineering can produce specialized surrogate subdomain solvers with near unity accuracy. We utilize these solvers with standard DDM algorithms to accurately solve freeform electromagnetics and fluids problems featuring a wide range of domain sizes.
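
To make the DDM side concrete, here is a minimal alternating Schwarz iteration on a 1-D Poisson problem, where an exact tridiagonal solve stands in for the paper's specialized neural subdomain solvers; grid, subdomains, and constants are illustrative:

import numpy as np

N, f = 101, np.ones(101)
x = np.linspace(0, 1, N); h = x[1] - x[0]
u = np.zeros(N)

def subdomain_solve(lo, hi, ul, ur):
    """Solve u'' = f on interior grid points lo+1..hi-1 with Dirichlet data (ul, ur)."""
    m = hi - lo - 1
    A = (np.diag(-2 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
         + np.diag(np.ones(m - 1), -1)) / h**2
    rhs = f[lo + 1:hi].copy()
    rhs[0] -= ul / h**2; rhs[-1] -= ur / h**2
    return np.linalg.solve(A, rhs)

for _ in range(200):                        # alternating Schwarz sweeps
    for lo, hi in [(0, 60), (40, 100)]:     # two overlapping subdomains
        u[lo + 1:hi] = subdomain_solve(lo, hi, u[lo], u[hi])
# u now approximates the global solution x(x - 1)/2; SNAP-DDM replaces the
# exact subdomain solve with an ensemble of specialized neural operators.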

Updated: 2024-06-14 23:20:23

Categories: cs.LG,cs.AI,cs.DC,physics.optics

Download: http://arxiv.org/abs/2405.02351v2

Extending class group action attacks via sesquilinear pairings

We introduce a new tool for the study of isogeny-based cryptography, namely pairings which are sesquilinear (conjugate linear) with respect to the $\mathcal{O}$-module structure of an elliptic curve with CM by an imaginary quadratic order $\mathcal{O}$. We use these pairings to study the security of problems based on the class group action on collections of oriented ordinary or supersingular elliptic curves. This extends work of both (Castryck, Houben, Merz, Mula, Buuren, Vercauteren, 2023) and (De Feo, Fouotsa, Panny, 2024).
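
For readers meeting the term here: a form on an $\mathcal{O}$-module is sesquilinear when it is conjugate-linear in one argument and linear in the other. In the standard algebraic sense (the paper's pairings take values in suitable torsion modules but obey the same symmetry):

$$f(\alpha x, y) = \bar{\alpha}\, f(x, y), \qquad f(x, \alpha y) = \alpha\, f(x, y) \qquad \text{for all } \alpha \in \mathcal{O},$$

where $\bar{\alpha}$ denotes the image of $\alpha$ under complex conjugation, the nontrivial automorphism of the imaginary quadratic field containing $\mathcal{O}$.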

Updated: 2024-06-14 23:17:48

Categories: math.NT,cs.CR,11R65, 14H52, 94A60

Download: http://arxiv.org/abs/2406.10440v1

Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms

While supervised federated learning approaches have enjoyed significant success, the domain of unsupervised federated learning remains relatively underexplored. Several federated EM algorithms have gained popularity in practice; however, their theoretical foundations are often lacking. In this paper, we first introduce a federated gradient EM algorithm (FedGrEM) designed for the unsupervised learning of mixture models, which supplements the existing federated EM algorithms by considering task heterogeneity and potential adversarial attacks. We present a comprehensive finite-sample theory that holds for general mixture models, then apply this general theory to specific statistical models to characterize the explicit estimation error of model parameters and mixture proportions. Our theory elucidates when and how FedGrEM outperforms local single-task learning, with insights extending to existing federated EM algorithms. This bridges the gap between their practical success and theoretical understanding. Our numerical results validate our theory, and demonstrate FedGrEM's superiority over existing unsupervised federated learning benchmarks.
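
As a rough sketch of what a federated gradient EM update looks like for a two-component Gaussian mixture (estimating the means only, with plain server-side averaging; FedGrEM's actual update additionally handles task heterogeneity and adversarial clients, and all constants here are illustrative):

import numpy as np

def local_grad_step(x, mu, sigma=1.0, pi=0.5):
    # E-step on the client's own data: responsibilities for component 1
    d0 = (1 - pi) * np.exp(-(x - mu[0]) ** 2 / (2 * sigma ** 2))
    d1 = pi * np.exp(-(x - mu[1]) ** 2 / (2 * sigma ** 2))
    r1 = d1 / (d0 + d1)
    # Gradient of the expected complete-data log-likelihood w.r.t. the means
    return np.array([np.mean((1 - r1) * (x - mu[0])),
                     np.mean(r1 * (x - mu[1]))]) / sigma ** 2

rng = np.random.default_rng(0)
clients = [np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
           for _ in range(8)]
mu, lr = np.array([-0.5, 0.5]), 0.5
for _ in range(100):
    grads = np.stack([local_grad_step(x, mu) for x in clients])
    mu += lr * grads.mean(axis=0)   # plain averaging; a robust aggregator
                                    # would replace this under attacks
print(mu)                           # moves toward (-2, 2)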

Updated: 2024-06-14 23:03:32

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2310.15330v3

HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving

Despite significant progress in autonomous vehicles (AVs), the development of driving policies that ensure both the safety of AVs and traffic flow efficiency has not yet been fully explored. In this paper, we propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework, which facilitates safe and efficient autonomous driving in mixed traffic platoons. Drawing inspiration from the human learning process, we first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM). In this paradigm, the human expert serves as a mentor to the AI agent. While allowing the agent to sufficiently explore uncertain environments, the human expert can take control in dangerous situations and demonstrate correct actions to avoid potential accidents. On the other hand, the agent could be guided to minimize traffic flow disturbance, thereby optimizing traffic flow efficiency. In detail, HAIM-DRL leverages data collected from free exploration and partial human demonstrations as its two training sources. Remarkably, we circumvent the intricate process of manually designing reward functions; instead, we directly derive proxy state-action values from partial human demonstrations to guide the agent's policy learning. Additionally, we employ a minimal intervention technique to reduce the human mentor's cognitive load. Comparative results show that HAIM-DRL outperforms traditional methods in driving safety, sampling efficiency, mitigation of traffic flow disturbance, and generalizability to unseen traffic scenarios. The code and demo videos for this paper can be accessed at: https://zilin-huang.github.io/HAIM-DRL-website/

Updated: 2024-06-14 23:00:31

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2401.03160v5

Differentiable Predictive Control for Large-Scale Urban Road Networks

Transportation is a major contributor to CO2 emissions, making it essential to optimize traffic networks to reduce energy-related emissions. This paper presents a novel approach to traffic network control using Differentiable Predictive Control (DPC), a physics-informed machine learning methodology. We base our model on the Macroscopic Fundamental Diagram (MFD) and the Networked Macroscopic Fundamental Diagram (NMFD), offering a simplified representation of citywide traffic networks. Our approach ensures compliance with system constraints by construction. In empirical comparisons with existing state-of-the-art Model Predictive Control (MPC) methods, our approach demonstrates a four-order-of-magnitude reduction in computation time and up to a 37% improvement in traffic performance. Furthermore, we assess the robustness of our controller to scenario shifts and find that it adapts well to changes in traffic patterns. This work proposes more efficient traffic control methods, particularly in large-scale urban networks, and aims to mitigate emissions and alleviate congestion in the future.
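
A minimal sketch of the DPC recipe: train a policy network offline by backpropagating through a differentiable traffic model, with the control bounded by construction. The crude outflow curve below only gestures at an MFD, and every constant is made up:

import torch
import torch.nn as nn

def step(n, u, demand):
    # Crude MFD-like outflow: concave in accumulation n, scaled by control u
    outflow = u * torch.relu(n * (1.0 - n / 1000.0)) * 0.01
    return n + demand - outflow

policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1), nn.Sigmoid())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(200):
    n = torch.rand(64, 1) * 800                    # batch of initial accumulations
    loss = torch.tensor(0.0)
    for t in range(20):                            # differentiable rollout
        demand = torch.rand(64, 1) * 5
        u = policy(torch.cat([n, demand], dim=1))  # Sigmoid keeps u in [0, 1]:
                                                   # the constraint holds by construction
        n = step(n, u, demand)
        loss = loss + n.mean() + 100.0 * torch.relu(n - 900).mean()  # congestion + soft cap
    opt.zero_grad(); loss.backward(); opt.step()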

Updated: 2024-06-14 22:42:02

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2406.10433v1

Synthetic Programming Elicitation and Repair for Text-to-Code in Very Low-Resource Programming Languages

Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource Programming Languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools and tool-chains for legacy languages. Inspired by an HCI technique called natural program elicitation, we propose designing an intermediate language that LLMs ``naturally'' know how to use and which can be automatically compiled to a target VLPL. When LLMs generate code that lies outside of this intermediate language, we use compiler techniques to repair the code into programs in the intermediate language. Overall, we introduce \emph{synthetic programming elicitation and compilation} (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate the performance of SPEAC in a case study and find that, compared to existing retrieval and fine-tuning baselines, SPEAC produces syntactically correct programs significantly more frequently without sacrificing semantic correctness.
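
A toy skeleton of the generate-repair-compile loop. Here the "intermediate language" is a tiny Python expression subset and the "VLPL" is a Lisp-like target, both stand-ins chosen so the sketch runs; the paper's actual languages, repair pass, and interfaces are of course different:

import ast

def parse_intermediate(src):
    """Accept only a tiny expression language (names, ints, + and *)."""
    try:
        tree = ast.parse(src, mode="eval")
    except SyntaxError as e:
        return None, [str(e)]
    ok = all(isinstance(n, (ast.Expression, ast.BinOp, ast.Add, ast.Mult,
                            ast.Name, ast.Constant, ast.Load)) for n in ast.walk(tree))
    return (tree, []) if ok else (None, ["unsupported construct"])

def compile_to_vlpl(tree):
    """Stand-in 'compiler': emit a Lisp-like target program."""
    def emit(n):
        if isinstance(n, ast.BinOp):
            op = "+" if isinstance(n.op, ast.Add) else "*"
            return f"({op} {emit(n.left)} {emit(n.right)})"
        return str(getattr(n, "id", getattr(n, "value", "?")))
    return emit(tree.body)

def synthesize(llm_generate, llm_repair, spec, max_rounds=3):
    draft = llm_generate(spec)              # LLM drafts in the intermediate language
    for _ in range(max_rounds):
        tree, errors = parse_intermediate(draft)
        if tree is not None:
            return compile_to_vlpl(tree)    # deterministic translation to the VLPL
        draft = llm_repair(draft, errors)   # repair pass nudges the draft back in-language
    return None

print(synthesize(lambda s: "a + b * 2", lambda d, e: d, "sum"))  # (+ a (* b 2))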

Updated: 2024-06-14 22:35:33

Categories: cs.PL,cs.LG

Download: http://arxiv.org/abs/2406.03636v2

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Trustworthy capability evaluations are crucial for ensuring the safety of AI systems, and are becoming a key component of AI regulation. However, the developers of an AI system, or the AI system itself, may have incentives for evaluations to understate the AI's actual capability. These conflicting interests lead to the problem of sandbagging, which we define as "strategic underperformance on an evaluation". In this paper we assess sandbagging capabilities in contemporary language models (LMs). We prompt frontier LMs, like GPT-4 and Claude 3 Opus, to selectively underperform on dangerous capability evaluations, while maintaining performance on general (harmless) capability evaluations. Moreover, we find that models can be fine-tuned, on a synthetic dataset, to hide specific capabilities unless given a password. This behaviour generalizes to high-quality, held-out benchmarks such as WMDP. In addition, we show that both frontier and smaller models can be prompted, or password-locked, to target specific scores on a capability evaluation. We also find that a capable password-locked model (Llama 3 70b) is reasonably able to emulate a less capable model (Llama 2 7b). Overall, our results suggest that capability evaluations are vulnerable to sandbagging. This vulnerability decreases the trustworthiness of evaluations, and thereby undermines important safety decisions regarding the development and deployment of advanced AI systems.

Updated: 2024-06-14 22:24:40

Categories: cs.AI,cs.CL,cs.CY,cs.LG

Download: http://arxiv.org/abs/2406.07358v3

Challenging the Machine: Contestability in Government AI Systems

In an October 2023 executive order (EO), President Biden issued a detailed but largely aspirational road map for the safe and responsible development and use of artificial intelligence (AI). The challenge for the January 24-25, 2024 workshop was to transform those aspirations regarding one specific but crucial issue -- the ability of individuals to challenge government decisions made about themselves -- into actionable guidance enabling agencies to develop, procure, and use genuinely contestable advanced automated decision-making systems. While the Administration has taken important steps since the October 2023 EO, the insights garnered from our workshop remain highly relevant, as the requirements for contestability of advanced decision-making systems are not yet fully defined or implemented. The workshop brought together technologists, members of government agencies and civil society organizations, litigators, and researchers in an intensive two-day meeting that examined the challenges that users, developers, and agencies faced in enabling contestability in light of advanced automated decision-making systems. To ensure a free and open flow of discussion, the meeting was held under a modified version of the Chatham House rule. Participants were free to use any information or details that they learned, but they may not attribute any remarks made at the meeting by the identity or the affiliation of the speaker. Thus, the workshop summary that follows anonymizes speakers and their affiliation. Where an identification of an agency, company, or organization is made, it is done from a public, identified resource and does not necessarily reflect statements made by participants at the workshop. This document is a report of that workshop, along with recommendations and explanatory material.

Updated: 2024-06-14 22:22:17

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2406.10430v1

Consistency-diversity-realism Pareto fronts of conditional image generative models

Building world models that accurately and comprehensively represent the real world is the utmost aspiration for conditional image generative models as it would enable their use as world simulators. For these models to be successful world models, they should not only excel at image quality and prompt-image consistency but also ensure high representation diversity. However, current research in generative models mostly focuses on creative applications that are predominantly concerned with human preferences of image quality and aesthetics. We note that generative models have inference time mechanisms - or knobs - that allow the control of generation consistency, quality, and diversity. In this paper, we use state-of-the-art text-to-image and image-and-text-to-image models and their knobs to draw consistency-diversity-realism Pareto fronts that provide a holistic view on consistency-diversity-realism multi-objective. Our experiments suggest that realism and consistency can both be improved simultaneously; however there exists a clear tradeoff between realism/consistency and diversity. By looking at Pareto optimal points, we note that earlier models are better at representation diversity and worse in consistency/realism, and more recent models excel in consistency/realism while decreasing significantly the representation diversity. By computing Pareto fronts on a geodiverse dataset, we find that the first version of latent diffusion models tends to perform better than more recent models in all axes of evaluation, and there exist pronounced consistency-diversity-realism disparities between geographical regions. Overall, our analysis clearly shows that there is no best model and the choice of model should be determined by the downstream application. With this analysis, we invite the research community to consider Pareto fronts as an analytical tool to measure progress towards world models.
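
For concreteness, extracting a Pareto front over the three axes is straightforward once each (model, knob setting) has been scored; the numbers below are made up:

import numpy as np

def pareto_front(points):
    """Return the non-dominated rows of an (n, d) score array where higher is
    better on every axis (here d = 3: consistency, diversity, realism)."""
    keep = []
    for i, p in enumerate(points):
        dominated = np.any(np.all(points >= p, axis=1) & np.any(points > p, axis=1))
        if not dominated:
            keep.append(i)
    return points[keep]

# Each row: one (model, knob setting) evaluated on the three axes.
scores = np.array([[0.8, 0.3, 0.9],   # consistent/realistic, low diversity
                   [0.5, 0.8, 0.6],   # diverse, weaker consistency
                   [0.4, 0.4, 0.5]])  # dominated by both rows above
print(pareto_front(scores))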

Updated: 2024-06-14 22:14:11

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.10429v1

Adaptive Randomized Smoothing: Certifying Multi-Step Defences against Adversarial Examples

We propose Adaptive Randomized Smoothing (ARS) to certify the predictions of our test-time adaptive models against adversarial examples. ARS extends the analysis of randomized smoothing using f-Differential Privacy to certify the adaptive composition of multiple steps. For the first time, our theory covers the sound adaptive composition of general and high-dimensional functions of noisy input. We instantiate ARS on deep image classification to certify predictions against adversarial examples of bounded $L_{\infty}$ norm. In the $L_{\infty}$ threat model, our flexibility enables adaptation through high-dimensional input-dependent masking. We design adaptivity benchmarks, based on CIFAR-10 and CelebA, and show that ARS improves accuracy by $2$ to $5$ percentage points. On ImageNet, ARS improves accuracy by $1$ to $3$ percentage points over standard RS without adaptivity.
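
For background, the standard non-adaptive guarantee that randomized smoothing provides (Cohen et al.'s $L_2$ certificate; ARS generalizes the analysis via f-DP to adaptive, multi-step pipelines and the $L_{\infty}$ threat model): the smoothed classifier

$$g(x) = \arg\max_c \; \mathbb{P}_{\varepsilon \sim \mathcal{N}(0,\sigma^2 I)}\left[ f(x+\varepsilon) = c \right]$$

is provably constant within the $L_2$ radius

$$R = \frac{\sigma}{2}\left( \Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B}) \right),$$

where $\underline{p_A}$ and $\overline{p_B}$ are high-confidence bounds on the top two class probabilities and $\Phi^{-1}$ is the standard Gaussian quantile function.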

Updated: 2024-06-14 22:11:02

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2406.10427v1

Towards Neural Scaling Laws for Foundation Models on Temporal Graphs

The field of temporal graph learning aims to learn from evolving network data to forecast future interactions. Given a collection of observed temporal graphs, is it possible to predict the evolution of an unseen network from the same domain? To answer this question, we first present the Temporal Graph Scaling (TGS) dataset, a large collection of temporal graphs consisting of eighty-four ERC20 token transaction networks collected from 2017 to 2023. Next, we evaluate the transferability of Temporal Graph Neural Networks (TGNNs) for the temporal graph property prediction task by pre-training on a collection of up to sixty-four token transaction networks and then evaluating the downstream performance on twenty unseen token networks. We find that the neural scaling law observed in NLP and Computer Vision also applies in temporal graph learning, where pre-training on a greater number of networks leads to improved downstream performance. To the best of our knowledge, this is the first empirical demonstration of the transferability of temporal graph learning. On downstream token networks, the largest pre-trained model outperforms single model TGNNs on thirteen unseen test networks. Therefore, we believe that this is a promising first step towards building foundation models for temporal graphs.
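
The scaling-law claim is the kind of relationship one would check by fitting a power law of downstream error against the number of pre-training networks; a sketch with synthetic numbers (not the paper's measurements):

import numpy as np
from scipy.optimize import curve_fit

# Fit downstream error vs. number of pre-training networks to err = a * n^(-b) + c.
n = np.array([1, 2, 4, 8, 16, 32, 64], dtype=float)
err = 0.5 * n ** -0.3 + 0.1 + np.random.default_rng(0).normal(0, 0.005, n.size)

power_law = lambda n, a, b, c: a * n ** -b + c
(a, b, c), _ = curve_fit(power_law, n, err, p0=(0.5, 0.3, 0.1))
print(f"err ~ {a:.2f} * n^(-{b:.2f}) + {c:.2f}")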

Updated: 2024-06-14 22:07:11

Categories: cs.LG

Download: http://arxiv.org/abs/2406.10426v1

Multi-source Unsupervised Domain Adaptation on Graphs with Transferability Modeling

In this paper, we tackle a new problem of \textit{multi-source unsupervised domain adaptation (MSUDA) for graphs}, where models trained on annotated source domains need to be transferred to the unsupervised target graph for node classification. Due to the discrepancy in distribution across domains, the key challenge is how to select good source instances and how to adapt the model. Diverse graph structures further complicate this problem, rendering previous MSUDA approaches less effective. In this work, we present the framework Selective Multi-source Adaptation for Graph, with a graph-modeling-based domain selector, a sub-graph node selector, and a bi-level alignment objective for the adaptation. Concretely, to facilitate the identification of informative source data, the similarity across graphs is disentangled and measured with the transferability of a graph-modeling task set, and we use it as evidence for source domain selection. A node selector is further incorporated to capture the variation in transferability of nodes within the same source domain. To learn invariant features for adaptation, we align the target domain to selected source data both at the embedding space by minimizing the optimal transport distance and at the classification level by distilling the label function. Modules are explicitly learned to select informative source data and conduct the alignment in virtual training splits with a meta-learning strategy. Experimental results on five graph datasets show the effectiveness of the proposed method.

Updated: 2024-06-14 22:05:21

Categories: cs.LG

Download: http://arxiv.org/abs/2406.10425v1

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Recently, Multimodal Large Language Models (MLLMs) have shown great promise in language-guided perceptual tasks such as recognition, segmentation, and object detection. However, their effectiveness in addressing visual cognition problems that require high-level reasoning is not well-established. One such challenge is abstract visual reasoning (AVR) -- the cognitive ability to discern relationships among patterns in a set of images and extrapolate to predict subsequent patterns. This skill is crucial during the early neurodevelopmental stages of children. Inspired by the AVR tasks in Raven's Progressive Matrices (RPM) and Wechsler Intelligence Scale for Children (WISC), we propose a new dataset MaRs-VQA and a new benchmark VCog-Bench containing three datasets to evaluate the zero-shot AVR capability of MLLMs and compare their performance with existing human intelligence assessments. Our comparative experiments with different open-source and closed-source MLLMs on the VCog-Bench revealed a gap between MLLMs and human intelligence, highlighting the visual cognitive limitations of current MLLMs. We believe that the public release of VCog-Bench, consisting of MaRs-VQA, and the inference pipeline will drive progress toward the next generation of MLLMs with human-like visual cognition abilities.

Updated: 2024-06-14 22:02:21

Categories: cs.CV,cs.AI,68T01

Download: http://arxiv.org/abs/2406.10424v1

Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models

Diffusion models (DMs) have achieved remarkable success in text-to-image generation, but they also pose safety risks, such as the potential generation of harmful content and copyright violations. The techniques of machine unlearning, also known as concept erasing, have been developed to address these risks. However, these techniques remain vulnerable to adversarial prompt attacks, which can prompt DMs post-unlearning to regenerate undesired images containing concepts (such as nudity) meant to be erased. This work aims to enhance the robustness of concept erasing by integrating the principle of adversarial training (AT) into machine unlearning, resulting in the robust unlearning framework referred to as AdvUnlearn. However, achieving this effectively and efficiently is highly nontrivial. First, we find that a straightforward implementation of AT compromises DMs' image generation quality post-unlearning. To address this, we develop a utility-retaining regularization on an additional retain set, optimizing the trade-off between concept erasure robustness and model utility in AdvUnlearn. Moreover, we identify the text encoder as a more suitable module for robustification compared to UNet, ensuring unlearning effectiveness. Moreover, the acquired text encoder can serve as a plug-and-play robust unlearner for various DM types. Empirically, we perform extensive experiments to demonstrate the robustness advantage of AdvUnlearn across various DM unlearning scenarios, including the erasure of nudity, objects, and style concepts. In addition to robustness, AdvUnlearn also achieves a balanced tradeoff with model utility. To our knowledge, this is the first work to systematically explore robust DM unlearning through AT, setting it apart from existing methods that overlook robustness in concept erasing. Codes are available at: https://github.com/OPTML-Group/AdvUnlearn
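
A toy rendering of the AT-into-unlearning structure: an inner maximization finds a worst-case perturbation that revives the erased concept, and the outer step suppresses it while anchoring behaviour on a retain set. PGD in the input space of a tiny regressor stands in here for adversarial prompt search against a diffusion model's text encoder; everything below is illustrative, not the paper's objective:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_concept = torch.randn(16, 8)          # inputs that evoke the concept to erase
x_retain = torch.randn(64, 8)           # retain set for the utility regularizer
with torch.no_grad():
    y_retain = model(x_retain).clone()  # pre-unlearning behaviour to preserve

for _ in range(100):
    # Inner maximization: a few PGD steps search for a perturbation that
    # maximizes the "concept score" (stand-in for adversarial prompt search)
    delta = torch.zeros_like(x_concept, requires_grad=True)
    for _ in range(5):
        score = model(x_concept + delta).mean()
        g, = torch.autograd.grad(score, delta)
        delta = (delta + 0.1 * g.sign()).clamp(-0.3, 0.3).detach().requires_grad_(True)
    # Outer minimization: suppress the concept at the worst case found, while
    # anchoring outputs on the retain set (utility-retaining regularization)
    loss = model(x_concept + delta).mean() \
           + 1.0 * ((model(x_retain) - y_retain) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()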

Updated: 2024-06-14 21:50:22

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2405.15234v2

CtRL-Sim: Reactive and Controllable Driving Agents with Offline Reinforcement Learning

Evaluating autonomous vehicle (AV) stacks in simulation typically involves replaying driving logs from real-world recorded traffic. However, agents replayed from offline data are not reactive and hard to intuitively control. Existing approaches address these challenges by proposing methods that rely on heuristics or generative models of real-world data but these approaches either lack realism or necessitate costly iterative sampling procedures to control the generated behaviours. In this work, we take an alternative approach and propose CtRL-Sim, a method that leverages return-conditioned offline reinforcement learning to efficiently generate reactive and controllable traffic agents. Specifically, we process real-world driving data through a physics-enhanced Nocturne simulator to generate a diverse offline reinforcement learning dataset, annotated with various reward terms. With this dataset, we train a return-conditioned multi-agent behaviour model that allows for fine-grained manipulation of agent behaviours by modifying the desired returns for the various reward components. This capability enables the generation of a wide range of driving behaviours beyond the scope of the initial dataset, including adversarial behaviours. We demonstrate that CtRL-Sim can generate diverse and realistic safety-critical scenarios while providing fine-grained control over agent behaviours.
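
The steering mechanism can be pictured as return-conditioned policy inference; a minimal sketch, with all sizes and conditioning details hypothetical:

import torch
import torch.nn as nn

# A behaviour model conditioned on desired returns for several reward
# components; raising or lowering a component's target return at test time
# steers the agent (e.g. a very low "collision avoidance" return would
# request adversarial behaviour).
obs_dim, n_actions, n_reward_terms = 16, 8, 3
model = nn.Sequential(nn.Linear(obs_dim + n_reward_terms, 64), nn.ReLU(),
                      nn.Linear(64, n_actions))

obs = torch.randn(1, obs_dim)
target_returns = torch.tensor([[1.0, 0.5, -0.8]])   # illustrative per-term returns
logits = model(torch.cat([obs, target_returns], dim=1))
action = torch.distributions.Categorical(logits=logits).sample()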

Updated: 2024-06-14 21:47:41

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2403.19918v2

Guided Discrete Diffusion for Electronic Health Record Generation

Electronic health records (EHRs) are a pivotal data source that enables numerous applications in computational medicine, e.g., disease progression prediction, clinical trial design, and health economics and outcomes research. Despite wide usability, their sensitive nature raises privacy and confidentiality concerns, which limit potential use cases. To tackle these challenges, we explore the use of generative models to synthesize artificial, yet realistic EHRs. While diffusion-based methods have recently demonstrated state-of-the-art performance in generating other data modalities and overcome the training instability and mode collapse issues that plague previous GAN-based approaches, their applications in EHR generation remain underexplored. The discrete nature of tabular medical code data in EHRs poses challenges for high-quality data generation, especially for continuous diffusion models. To this end, we introduce a novel tabular EHR generation method, EHR-D3PM, which enables both unconditional and conditional generation using the discrete diffusion model. Our experiments demonstrate that EHR-D3PM significantly outperforms existing generative baselines on comprehensive fidelity and utility metrics while maintaining lower attribute and membership vulnerability risks. Furthermore, we show EHR-D3PM is effective as a data augmentation method and enhances performance on downstream tasks when combined with real data.
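
For intuition, the forward corruption of a uniform-transition discrete diffusion on tabular codes can be sampled in closed form; a minimal sketch (the paper's transition matrices and guidance mechanism may differ):

import torch

def d3pm_uniform_corrupt(x0, keep_prob, vocab):
    """Sample q(x_t | x_0) for a uniform-transition discrete diffusion: each
    code survives with probability keep_prob, otherwise it is resampled
    uniformly from the vocabulary."""
    keep = torch.rand(x0.shape) < keep_prob
    noise = torch.randint(0, vocab, x0.shape)
    return torch.where(keep, x0, noise)

codes = torch.randint(0, 500, (4, 32))     # toy batch of tabular medical codes
noisy = d3pm_uniform_corrupt(codes, keep_prob=0.4, vocab=500)
# A denoising network is trained to recover x_0 from (noisy, t); guidance then
# steers the learned reverse process for conditional generation.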

Updated: 2024-06-14 21:36:03

Categories: cs.LG

Download: http://arxiv.org/abs/2404.12314v2

Learning Flexible Time-windowed Granger Causality Integrating Heterogeneous Interventional Time Series Data

Granger causality, commonly used for inferring causal structures from time series data, has been adopted in widespread applications across various fields due to its intuitive explainability and high compatibility with emerging deep neural network prediction models. To address the challenge of unambiguously deciphering causal structures from time series, the use of interventional data has become a practical approach. However, existing methods have yet to be explored in the context of imperfect interventions with unknown targets, which are more common and often more beneficial in a wide range of real-world applications. Additionally, the identifiability issues of Granger causality with unknown interventional targets in complex network models remain unsolved. Our work presents a theoretically-grounded method that infers Granger causal structure and identifies unknown targets by leveraging heterogeneous interventional time series data. We further illustrate that learning Granger causal structure and recovering interventional targets can mutually promote each other. Comparative experiments demonstrate that our method outperforms several robust baseline methods in learning Granger causal structure from interventional time series data.
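
For reference, the standard linear formulation the Granger framework builds on: in a $K$-lag vector autoregression

$$x_t = \sum_{k=1}^{K} A_k\, x_{t-k} + \varepsilon_t,$$

series $j$ does not Granger-cause series $i$ precisely when $(A_k)_{ij} = 0$ for all lags $k$. Neural variants replace the $A_k$ with the input-layer weights of a nonlinear predictor, and a time-windowed variant lets this sparsity pattern vary across windows.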

Updated: 2024-06-14 21:36:00

Categories: cs.LG

Download: http://arxiv.org/abs/2406.10419v1

Language Models are Crossword Solvers

Crosswords are a form of word puzzle that require a solver to demonstrate a high degree of proficiency in natural language understanding, wordplay, reasoning, and world knowledge, along with adherence to character and length constraints. In this paper we tackle the challenge of solving crosswords with Large Language Models (LLMs). We demonstrate that the current generation of state-of-the art (SoTA) language models show significant competence at deciphering cryptic crossword clues, and outperform previously reported SoTA results by a factor of 2-3 in relevant benchmarks. We also develop a search algorithm that builds off this performance to tackle the problem of solving full crossword grids with LLMs for the very first time, achieving an accuracy of 93\% on New York Times crossword puzzles. Contrary to previous work in this area which concluded that LLMs lag human expert performance significantly, our research suggests this gap is a lot narrower.

Updated: 2024-06-14 21:29:40

Categories: cs.CL,cs.AI,I.2.7

Download: http://arxiv.org/abs/2406.09043v2

Enhanced Intrusion Detection System for Multiclass Classification in UAV Networks

Unmanned Aerial Vehicles (UAVs) have become increasingly popular in various applications, especially with the emergence of 6G systems and networks. However, their widespread adoption has also led to concerns regarding security vulnerabilities, making the development of reliable intrusion detection systems (IDS) essential for ensuring UAV safety and mission success. This paper presents a new IDS for UAV networks. A binary-tuple representation was used for encoding class labels, and a deep learning-based approach was employed for classification. The proposed system enhances intrusion detection by capturing complex class relationships and temporal network patterns. Moreover, a cross-correlation study between common features of different UAVs was conducted to discard correlated features that might mislead the classification of the proposed IDS. The full study was carried out using the UAV-IDS-2020 dataset, and we assessed the performance of the proposed IDS using different evaluation metrics. The experimental results highlighted the effectiveness of the proposed multiclass classifier model with an accuracy of 95%.

Updated: 2024-06-14 21:29:15

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2406.10417v1

Byzantine-Robust Decentralized Federated Learning

Federated learning (FL) enables multiple clients to collaboratively train machine learning models without revealing their private training data. In conventional FL, the system follows the server-assisted architecture (server-assisted FL), where the training process is coordinated by a central server. However, the server-assisted FL framework suffers from poor scalability due to a communication bottleneck at the server, and trust dependency issues. To address these challenges, the decentralized federated learning (DFL) architecture has been proposed to allow clients to train models collaboratively in a serverless and peer-to-peer manner. However, due to its fully decentralized nature, DFL is highly vulnerable to poisoning attacks, where malicious clients could manipulate the system by sending carefully-crafted local models to their neighboring clients. To date, only a limited number of Byzantine-robust DFL methods have been proposed, most of which are either communication-inefficient or remain vulnerable to advanced poisoning attacks. In this paper, we propose a new algorithm called BALANCE (Byzantine-robust averaging through local similarity in decentralization) to defend against poisoning attacks in DFL. In BALANCE, each client leverages its own local model as a similarity reference to determine if the received model is malicious or benign. We establish the theoretical convergence guarantee for BALANCE under poisoning attacks in both strongly convex and non-convex settings. Furthermore, the convergence rate of BALANCE under poisoning attacks matches those of the state-of-the-art counterparts in Byzantine-free settings. Extensive experiments also demonstrate that BALANCE outperforms existing DFL methods and effectively defends against poisoning attacks.
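
The acceptance idea can be sketched in a few lines: each client keeps a neighbor's model only if it is close to its own, then mixes. The fixed relative threshold and mixing weight below are illustrative stand-ins for the paper's actual schedule:

import numpy as np

def balance_aggregate(own, neighbors, gamma=0.3, alpha=0.5):
    """Accept only neighbor models close to the client's own model, then mix.
    gamma (relative distance threshold) and alpha (mixing weight) are
    illustrative constants, not the paper's schedule."""
    accepted = [w for w in neighbors
                if np.linalg.norm(w - own) <= gamma * np.linalg.norm(own)]
    if not accepted:
        return own                      # nothing trustworthy: keep the local model
    return alpha * own + (1 - alpha) * np.mean(accepted, axis=0)

own = np.ones(10)
neighbors = [np.ones(10) * 1.05, np.ones(10) * 0.97, np.ones(10) * 50.0]  # last: poisoned
print(balance_aggregate(own, neighbors))  # the poisoned model is filtered out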

Updated: 2024-06-14 21:28:37

Categories: cs.CR,cs.DC,cs.LG

Download: http://arxiv.org/abs/2406.10416v1

PRISM: A Design Framework for Open-Source Foundation Model Safety

The rapid advancement of open-source foundation models has brought transparency and accessibility to this groundbreaking technology. However, this openness has also enabled the development of highly-capable, unsafe models, as exemplified by recent instances such as WormGPT and FraudGPT, which are specifically designed to facilitate criminal activity. As the capabilities of open foundation models continue to grow, potentially outpacing those of closed-source models, the risk of misuse by bad actors poses an increasingly serious threat to society. This paper addresses the critical question of how open foundation model developers should approach model safety in light of these challenges. Our analysis reveals that open-source foundation model companies often provide less restrictive acceptable use policies (AUPs) compared to their closed-source counterparts, likely due to the inherent difficulties in enforcing such policies once the models are released. To tackle this issue, we introduce PRISM, a design framework for open-source foundation model safety that emphasizes Private, Robust, Independent Safety measures, at Minimal marginal cost of compute. The PRISM framework proposes the use of modular functions that moderate prompts and outputs independently of the core language model, offering a more adaptable and resilient approach to safety compared to the brittle reinforcement learning methods currently used for value alignment. By focusing on identifying AUP violations and engaging the developer community in establishing consensus around safety design decisions, PRISM aims to create a safer open-source ecosystem that maximizes the potential of these powerful technologies while minimizing the risks to individuals and society as a whole.
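
The modular-moderation idea reduces to wrapping an arbitrary generator with independent prompt and output filters; a deliberately simplistic sketch (real moderation functions would be learned classifiers, and the policy terms here are placeholders):

BLOCKED = ("build a weapon", "commit fraud")

def moderate_prompt(prompt):
    return not any(term in prompt.lower() for term in BLOCKED)

def moderate_output(text):
    return not any(term in text.lower() for term in BLOCKED)

def safe_generate(generate, prompt):
    # Prompt and output moderation are independent of the core language model,
    # so they can be updated or swapped without retraining the backbone.
    if not moderate_prompt(prompt):
        return "[refused: prompt violates acceptable-use policy]"
    out = generate(prompt)
    return out if moderate_output(out) else "[withheld: output violates policy]"

print(safe_generate(lambda p: "Here is a recipe for bread.", "How do I bake bread?"))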

Updated: 2024-06-14 21:26:15

Categories: cs.CY,cs.AI,cs.SE

Download: http://arxiv.org/abs/2406.10415v1

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data. We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN), which starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. More specifically, the LLM generates its own training data from its previous iterations, refining its policy by discerning these self-generated responses from those obtained from human-annotated data. Our method progressively elevates the LLM from a nascent model to a formidable one, unlocking the full potential of human-annotated demonstration data for SFT. Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution. Empirically, we evaluate our method on several benchmark datasets including the HuggingFace Open LLM Leaderboard, MT-Bench, and datasets from Big-Bench. Our results show that SPIN can significantly improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents. Codes are available at https://github.com/uclaml/SPIN.
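
The self-play objective has a DPO-like shape: the current model should assign higher likelihood to the human-annotated response than to the previous iteration's self-generated one. A sketch of that loss on toy sequence log-probabilities (lam and the numbers are illustrative):

import torch
import torch.nn.functional as F

def spin_loss(logp_theta_real, logp_ref_real, logp_theta_synth, logp_ref_synth,
              lam=0.1):
    """One SPIN step: prefer the human-annotated response over the previous
    iteration's self-generated response via a logistic loss on
    log-probability ratios (DPO-style)."""
    margin = (logp_theta_real - logp_ref_real) - (logp_theta_synth - logp_ref_synth)
    return -F.logsigmoid(lam * margin).mean()

# Toy sequence log-likelihoods for a batch of prompts
loss = spin_loss(torch.tensor([-12.0]), torch.tensor([-13.0]),
                 torch.tensor([-9.0]), torch.tensor([-8.5]))
# After convergence, the model regenerates the "synthetic" side and the next
# SPIN iteration begins, with the current model becoming the reference.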

Updated: 2024-06-14 21:17:17

Categories: cs.LG,cs.AI,cs.CL,stat.ML

Download: http://arxiv.org/abs/2401.01335v3

RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback

Reward engineering has long been a challenge in Reinforcement Learning (RL) research, as it often requires extensive human effort and iterative processes of trial-and-error to design effective reward functions. In this paper, we propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks, using only a text description of the task goal and the agent's visual observations, by leveraging feedback from vision language foundation models (VLMs). The key to our approach is to query these models to give preferences over pairs of the agent's image observations based on the text description of the task goal, and then learn a reward function from the preference labels, rather than directly prompting these models to output a raw reward score, which can be noisy and inconsistent. We demonstrate that RL-VLM-F successfully produces effective rewards and policies across various domains - including classic control, as well as manipulation of rigid, articulated, and deformable objects - without the need for human supervision, outperforming prior methods that use large pretrained models for reward generation under the same assumptions. Videos can be found on our project website: https://rlvlmf2024.github.io/
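
A hypothetical sketch of the labeling loop: a VLM (its query interface is made up here) compares two image observations against the task description, and the resulting preference labels feed a Bradley-Terry reward model as in standard preference-based RL:

def vlm_prefers_first(vlm_query, goal_text, img_a, img_b):
    answer = vlm_query(
        f"Task: {goal_text}. Which image shows more progress, A or B? Answer A or B.",
        [img_a, img_b],
    )
    return answer.strip().upper().startswith("A")

def collect_preferences(vlm_query, goal_text, obs_pairs):
    # Preference labels, not raw scores: the abstract notes raw VLM reward
    # scores are noisy and inconsistent, so only comparisons are requested.
    return [(a, b, 1.0 if vlm_prefers_first(vlm_query, goal_text, a, b) else 0.0)
            for a, b in obs_pairs]

# Demo with a dummy VLM that always answers "A"
print(collect_preferences(lambda prompt, imgs: "A", "open the drawer", [("imgA", "imgB")]))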

Updated: 2024-06-14 21:10:32

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.03681v4

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

We present RoboGen, a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. RoboGen leverages the latest advancements in foundation and generative models. Instead of directly using or adapting these models to produce policies or low-level actions, we advocate for a generative scheme, which uses these models to automatically generate diversified tasks, scenes, and training supervisions, thereby scaling up robotic skill learning with minimal human supervision. Our approach equips a robotic agent with a self-guided propose-generate-learn cycle: the agent first proposes interesting tasks and skills to develop, and then generates corresponding simulation environments by populating pertinent objects and assets with proper spatial configurations. Afterwards, the agent decomposes the proposed high-level task into sub-tasks, selects the optimal learning approach (reinforcement learning, motion planning, or trajectory optimization), generates required training supervision, and then learns policies to acquire the proposed skill. Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer them to the field of robotics. Our fully generative pipeline can be queried repeatedly, producing an endless stream of skill demonstrations associated with diverse tasks and environments.

Updated: 2024-06-14 21:09:49

Categories: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2311.01455v3

Tree Search for Simultaneous Move Games via Equilibrium Approximation

Neural network supported tree-search has shown strong results in a variety of perfect information multi-agent tasks. However, the performance of these methods on partial information games has generally been below competing approaches. Here we study the class of simultaneous-move games, a subclass of partial information games that is most similar to perfect information games: both agents know the game state with the exception of the opponent's move, which is revealed only after each agent makes its own move. Simultaneous move games include popular benchmarks such as Google Research Football and StarCraft. In this study we answer the question: can we take tree search algorithms trained through self-play from perfect information settings and adapt them to simultaneous move games without significant loss of performance? We answer this question by deriving a practical method that attempts to approximate a coarse correlated equilibrium as a subroutine within a tree search. Our algorithm works on cooperative, competitive, and mixed tasks. Our results are better than the current best MARL algorithms on a wide range of accepted baseline environments.
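
The "coarse correlated equilibrium as a subroutine" suggests running a no-regret procedure at each search node; here is a self-contained regret-matching sketch on a one-shot matrix game (the paper's actual subroutine inside tree search may differ):

import numpy as np

def rm_strategy(regret):
    pos = np.maximum(regret, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(len(regret), 1.0 / len(regret))

def regret_matching(payoff_a, iters=5000, seed=0):
    """Self-play regret matching on a zero-sum matrix game; the time-averaged
    play converges to the coarse correlated equilibrium set."""
    rng = np.random.default_rng(seed)
    nA, nB = payoff_a.shape
    ra, rb = np.zeros(nA), np.zeros(nB)
    avg_a, avg_b = np.zeros(nA), np.zeros(nB)
    for _ in range(iters):
        pa, pb = rm_strategy(ra), rm_strategy(rb)
        a, b = rng.choice(nA, p=pa), rng.choice(nB, p=pb)
        ra += payoff_a[:, b] - payoff_a[a, b]     # counterfactual regret, player A
        rb += -payoff_a[a, :] + payoff_a[a, b]    # player B receives -payoff_a
        avg_a += pa; avg_b += pb
    return avg_a / iters, avg_b / iters

rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])  # rock-paper-scissors
print(regret_matching(rps))                           # both ~ (1/3, 1/3, 1/3)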

Updated: 2024-06-14 21:02:35

Domains: cs.MA,cs.AI

Download: http://arxiv.org/abs/2406.10411v1
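
The equilibrium-approximation subroutine can be illustrated, under heavy simplification, with regret matching on a one-shot two-player zero-sum matrix game: the time-averaged play of regret matching converges to the set of coarse correlated equilibria. This toy sketch (rock-paper-scissors, our own construction) is not the paper's tree-search integration.

import numpy as np

A = np.array([[ 0., -1.,  1.],             # row player's payoffs: rock-paper-scissors
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
rng = np.random.default_rng(0)
reg = [np.zeros(3), np.zeros(3)]           # cumulative regrets per player
avg = [np.zeros(3), np.zeros(3)]           # running sum of mixed strategies

def strategy(r):
    pos = np.maximum(r, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.ones(3) / 3

T = 20000
for _ in range(T):
    s0, s1 = strategy(reg[0]), strategy(reg[1])
    a0, a1 = rng.choice(3, p=s0), rng.choice(3, p=s1)
    u0 = A[a0, a1]                         # zero-sum: column player receives -u0
    reg[0] += A[:, a1] - u0                # regret vs. each fixed row action
    reg[1] += -A[a0, :] + u0               # regret vs. each fixed column action
    avg[0] += s0; avg[1] += s1

print(avg[0] / T, avg[1] / T)              # both approach the uniform equilibrium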

Suboptimality bounds for trace-bounded SDPs enable a faster and scalable low-rank SDP solver SDPLR+

Semidefinite programs (SDPs) and their solvers are powerful tools with many applications in machine learning and data science. Designing scalable SDP solvers is challenging because, in the standard formulation, the positive semidefinite decision variable is an $n \times n$ dense matrix, even though the input is often an $n \times n$ sparse matrix. However, the information in the solution may not correspond to a full-rank dense matrix, as shown by Barvinok and Pataki. Two decades ago, Burer and Monteiro developed the SDP solver $\texttt{SDPLR}$, which optimizes over a low-rank factorization instead of the full matrix. This greatly decreases the storage cost and works well for many problems. The original solver $\texttt{SDPLR}$ tracks only the primal infeasibility of the solution, limiting the technique's flexibility to produce moderate accuracy solutions. We use a suboptimality bound for trace-bounded SDP problems that enables us to track the progress better and perform early termination. We then develop $\texttt{SDPLR+}$, which starts the optimization with an extremely low-rank factorization and dynamically updates the rank based on the primal infeasibility and suboptimality. This further speeds up the computation and saves storage costs. Numerical experiments on Max Cut, Minimum Bisection, Cut Norm, and Lovász Theta problems against many recent memory-efficient scalable SDP solvers demonstrate its scalability up to problems with million-by-million decision variables, and it is often the fastest solver to a moderate accuracy of $10^{-2}$.

Updated: 2024-06-14 20:31:22

Domains: math.OC,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2406.10407v1
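
The low-rank idea behind SDPLR can be sketched on the Max Cut SDP, maximizing trace(L Y Y^T) subject to unit-norm rows of the $n \times r$ factor Y: projected gradient ascent on Y never forms the dense $n \times n$ matrix. This bare-bones illustration omits SDPLR's augmented Lagrangian and everything SDPLR+ adds (the suboptimality bound and dynamic rank updates); the graph and step size are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
n, r = 40, 6                               # n nodes, rank-r factor with r << n
W = np.triu(rng.binomial(1, 0.2, size=(n, n)), 1)
W = W + W.T                                # random undirected graph
L = np.diag(W.sum(axis=1)) - W             # graph Laplacian

Y = rng.normal(size=(n, r))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # feasible: diag(Y Y^T) = 1

for _ in range(300):                       # projected gradient ascent on trace(L Y Y^T)
    Y += 0.01 * (L @ Y)                    # gradient is 2 L Y (constant absorbed in step)
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)

print("SDP relaxation value:", 0.25 * np.trace(L @ Y @ Y.T))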

An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits

Experimentation is crucial for managers to rigorously quantify the value of a change and determine if it leads to a statistically significant improvement over the status quo, thus augmenting their decision-making. Many companies now mandate that all changes undergo experimentation, presenting two challenges: (1) reducing the risk/cost of experimentation by minimizing the proportion of customers assigned to the inferior treatment and (2) increasing the experimentation velocity by enabling managers to stop experiments as soon as results are statistically significant. This paper simultaneously addresses both challenges by proposing the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandit (MAB) algorithms that enables anytime valid inference on the Average Treatment Effect (ATE) for any MAB algorithm. Intuitively, the MAD "mixes" any bandit algorithm with a Bernoulli design such that at each time step, the probability that a customer is assigned via the Bernoulli design is controlled by a user-specified deterministic sequence that can converge to zero. The sequence enables managers to directly and interpretably control the trade-off between regret minimization and inferential precision. Under mild conditions on the rate the sequence converges to zero, we provide a confidence sequence that is asymptotically anytime valid and demonstrate that the MAD is guaranteed to have a finite stopping time in the presence of a true non-zero ATE. Hence, the MAD allows managers to stop experiments early when a significant ATE is detected while ensuring valid inference, enhancing both the efficiency and reliability of adaptive experiments. Empirically, we demonstrate that the MAD achieves finite-sample anytime-validity while accurately and precisely estimating the ATE, all without incurring significant losses in reward compared to standard bandit designs.

Updated: 2024-06-14 20:24:39

Domains: stat.ME,cs.LG

Download: http://arxiv.org/abs/2311.05794v3
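
A minimal rendering of the design described above (the arm means, horizon, and the choice delta_t = t^(-0.24) are all illustrative): at step t, a coin with probability delta_t routes the customer to uniform random assignment, otherwise to the bandit (here, greedy), and the ATE is estimated by inverse-propensity weighting under the known mixture probabilities.

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.3, 0.5])                  # true Bernoulli means; ATE = 0.2
counts, sums, ipw = np.zeros(2), np.zeros(2), []

for t in range(1, 20001):
    delta = t ** -0.24                     # user-chosen sequence converging to zero
    means = np.where(counts > 0, sums / np.maximum(counts, 1), np.inf)
    greedy = int(np.argmax(means))         # the bandit's (deterministic) choice
    p = np.full(2, delta * 0.5)            # mixture assignment probabilities
    p[greedy] += 1 - delta
    arm = rng.choice(2, p=p)
    y = rng.binomial(1, mu[arm])
    counts[arm] += 1; sums[arm] += y
    ipw.append(y * (arm == 1) / p[1] - y * (arm == 0) / p[0])

print("IPW ATE estimate:", np.mean(ipw), " truth:", mu[1] - mu[0])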

Language models scale reliably with over-training and on downstream tasks

Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are ultimately trained and evaluated. For instance, scaling is usually studied in the compute-optimal training regime (i.e., "Chinchilla optimal" regime). In contrast, models are often over-trained to reduce inference costs. Moreover, scaling laws mostly predict loss on next-token prediction, but models are usually compared on downstream task performance. To address both shortcomings, we create a testbed of 104 models with 0.011B to 6.9B parameters trained with various numbers of tokens on three data distributions. First, we fit scaling laws that extrapolate in both the amount of over-training and the number of model parameters. This enables us to predict the validation loss of a 1.4B parameter, 900B token run (i.e., 32$\times$ over-trained) and a 6.9B parameter, 138B token run (i.e., a compute-optimal run), each from experiments that take 300$\times$ less compute. Second, we relate the perplexity of a language model to its downstream task performance by proposing a power law. We use this law to predict top-1 error averaged over downstream tasks for the two aforementioned models, using experiments that take 20$\times$ less compute. Our experiments are available at https://github.com/mlfoundations/scaling.

Updated: 2024-06-14 20:21:05

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2403.08540v2
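
Both fits in the abstract are power laws, i.e. straight lines after taking logs; the sketch below fits one on fabricated (tokens, loss) pairs and extrapolates, which is the mechanical core of the procedure (the constants 8.0 and -0.07 are invented, not the paper's).

import numpy as np

rng = np.random.default_rng(0)
tokens = np.array([1e9, 3e9, 1e10, 3e10, 1e11])                 # small-scale runs
loss = 8.0 * tokens ** -0.07 * np.exp(rng.normal(0, 0.01, 5))   # synthetic L(N)

slope, intercept = np.polyfit(np.log(tokens), np.log(loss), 1)  # line in log-log space
print("fitted exponent:", slope)
print("extrapolated loss at 9e11 tokens:", np.exp(intercept) * 9e11 ** slope)
# The same log-log fit applies to the proposed perplexity -> downstream-error law.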

Evaluating Speaker Identity Coding in Self-supervised Models and Humans

Speaker identity plays a significant role in human communication and is being increasingly used in societal applications, many through advances in machine learning. Speaker identity perception is an essential cognitive phenomenon that can be broadly reduced to two main tasks: recognizing a voice or discriminating between voices. Several studies have attempted to identify acoustic correlates of identity perception to pinpoint salient parameters for such a task. Unlike for other communicative social signals, most of these efforts have yielded inconclusive results. Furthermore, current neurocognitive models of voice identity processing consider the bases of perception as acoustic dimensions such as fundamental frequency, harmonics-to-noise ratio, and formant dispersion. However, these findings do not account for naturalistic speech and within-speaker variability. Representational spaces of current self-supervised models have shown significant performance in various speech-related tasks. In this work, we demonstrate that self-supervised representations from different families (e.g., generative, contrastive, and predictive models) are significantly better for speaker identification over acoustic representations. We also show that such a speaker identification task can be used to better understand the nature of acoustic information representation in different layers of these powerful networks. By evaluating speaker identification accuracy across acoustic, phonemic, prosodic, and linguistic variants, we report similarity between model performance and human identity perception. We further examine these similarities by juxtaposing the encoding spaces of models and humans and challenging the use of distance metrics as a proxy for speaker proximity. Lastly, we show that some models can predict brain responses in Auditory and Language regions during naturalistic stimuli.

Updated: 2024-06-14 20:07:21

Domains: eess.AS,cs.AI,cs.SD

Download: http://arxiv.org/abs/2406.10401v1

Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation

Recent advances in generative vision-language models (VLMs) have exciting potential implications for AI in radiology, yet VLMs are also known to produce hallucinations, nonsensical text, and other unwanted behaviors that can waste clinicians' time and cause patient harm. Drawing on recent work on direct preference optimization (DPO), we propose a simple method for modifying the behavior of pretrained VLMs performing radiology report generation by suppressing unwanted types of generations. We apply our method to the prevention of hallucinations of prior exams, addressing a long-established problem behavior in models performing chest X-ray report generation. Across our experiments, we find that DPO fine-tuning achieves a 3.2-4.8x reduction in lines hallucinating prior exams while maintaining model performance on clinical accuracy metrics. Our work is, to the best of our knowledge, the first work to apply DPO to medical VLMs, providing a data- and compute- efficient way to suppress problem behaviors while maintaining overall clinical accuracy.

Updated: 2024-06-14 19:47:20

Domains: cs.LG,cs.CL,cs.CV

Download: http://arxiv.org/abs/2406.06496v2
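
The DPO objective the authors fine-tune with can be written in a few lines; this sketch assumes per-report log-probabilities have already been computed under the trained and frozen reference models, and all names here are our own.

import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss over batches of (chosen, rejected) generations.

    logp_*     : log pi_theta(y | x) under the model being fine-tuned
    ref_logp_* : log pi_ref(y | x) under the frozen reference model
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return float(np.mean(np.log1p(np.exp(-margin))))   # -log sigmoid(margin)

# Preferring the hallucination-free report lowers the loss:
print(dpo_loss(np.array([-10.0]), np.array([-12.0]), np.array([-11.0]), np.array([-11.0])))
print(dpo_loss(np.array([-12.0]), np.array([-10.0]), np.array([-11.0]), np.array([-11.0])))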

BEACON: Benchmark for Comprehensive RNA Tasks and Language Models

RNA plays a pivotal role in translating genetic instructions into functional outcomes, underscoring its importance in biological processes and disease mechanisms. Despite the emergence of numerous deep learning approaches for RNA, particularly universal RNA language models, there remains a significant lack of standardized benchmarks to assess the effectiveness of these methods. In this study, we introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications, enabling a comprehensive assessment of the performance of methods on various RNA understanding tasks. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components from the tokenizer and positional encoding aspects. Notably, our findings emphasize the superiority of single nucleotide tokenization and the effectiveness of Attention with Linear Biases (ALiBi) over traditional positional encoding methods. Based on these insights, a simple yet strong baseline called BEACON-B is proposed, which can achieve outstanding performance with limited data and computational resources. The datasets and source code of our benchmark are available at https://github.com/terry-r123/RNABenchmark.

Updated: 2024-06-14 19:39:19

Domains: q-bio.QM,cs.LG

Download: http://arxiv.org/abs/2406.10391v1
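
The ALiBi finding above refers to replacing learned positional encodings with a distance-proportional penalty added to the attention logits. A minimal construction of the bias tensor (the slopes follow the geometric schedule from the ALiBi paper; sequence length and head count are arbitrary):

import numpy as np

def alibi_bias(seq_len, n_heads):
    # Head-specific slopes: 2^(-8h/n_heads) for h = 1..n_heads
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)
    rel = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]  # j - i
    rel = np.minimum(rel, 0)               # keep past positions; the future is masked anyway
    return slopes[:, None, None] * rel     # (heads, seq, seq), added to the q.k logits

print(alibi_bias(seq_len=5, n_heads=2)[0]) # zeros on the diagonal, -slope*(i-j) to the left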

Simple and near-optimal algorithms for hidden stratification and multi-group learning

Multi-group agnostic learning is a formal learning criterion that is concerned with the conditional risks of predictors within subgroups of a population. The criterion addresses recent practical concerns such as subgroup fairness and hidden stratification. This paper studies the structure of solutions to the multi-group learning problem, and provides simple and near-optimal algorithms for the learning problem.

Updated: 2024-06-14 19:39:04

Domains: cs.LG,stat.ML

Download: http://arxiv.org/abs/2112.12181v2

Extended Reality for Enhanced Human-Robot Collaboration: a Human-in-the-Loop Approach

The rise of automation has provided an opportunity to achieve higher efficiency in manufacturing processes, yet it often compromises the flexibility required to promptly respond to evolving market needs and meet the demand for customization. Human-robot collaboration attempts to tackle these challenges by combining the strength and precision of machines with human ingenuity and perceptual understanding. In this paper, we conceptualize and propose an implementation framework for an autonomous, machine learning-based manipulator that incorporates human-in-the-loop principles and leverages Extended Reality (XR) to facilitate intuitive communication and programming between humans and robots. Furthermore, the conceptual framework foresees human involvement directly in the robot learning process, resulting in higher adaptability and task generalization. The paper highlights key technologies enabling the proposed framework, emphasizing the importance of developing the digital ecosystem as a whole. Additionally, we review the existent implementation approaches of XR in human-robot collaboration, showcasing diverse perspectives and methodologies. The challenges and future outlooks are discussed, delving into the major obstacles and potential research avenues of XR for more natural human-robot interaction and integration in the industrial landscape.

Updated: 2024-06-14 19:27:14

Domains: cs.RO,cs.HC,cs.LG

Download: http://arxiv.org/abs/2403.14597v2

Efficient Prompting for LLM-based Generative Internet of Things

Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into Internet of Things (IoT) applications has drawn much research attention recently. Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network setting. However, open-source LLMs usually have more limitations regarding their performance, such as their arithmetic calculation and reasoning capacities, and practical systems for applying LLMs to IoT have yet to be well explored. Therefore, in this study we propose a text-based generative IoT (GIoT) system deployed in a local network setting. To alleviate the limitations of LLMs and provide service with competitive performance, we apply prompt engineering methods to enhance the capacities of open-source LLMs, and design a Prompt Management Module and a Post-processing Module to manage tailored prompts for different tasks and process the results generated by the LLMs. To demonstrate the effectiveness of the proposed system, we discuss a challenging Table Question Answering (Table-QA) task as a case study, as tabular data is usually more challenging than plain text because of its complex structure, heterogeneous data types, and sometimes huge size. We conduct comprehensive experiments on two popular Table-QA datasets, and the results show that our proposal achieves performance competitive with state-of-the-art LLMs, demonstrating that the proposed LLM-based GIoT system can provide competitive performance with tailored prompting methods and is easily extensible to new tasks without training.

Updated: 2024-06-14 19:24:00

Domains: cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.10382v1
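
The Prompt Management and Post-processing Modules are described only at the architecture level; one plausible minimal rendering in Python (the class names, template, and cleanup rule are our own illustration, not the paper's code) is:

class PromptManager:
    """Keeps one tailored prompt template per GIoT task."""
    def __init__(self):
        self.templates = {
            "table_qa": ("You are a careful data analyst.\n"
                         "Table:\n{table}\n"
                         "Question: {question}\nAnswer:"),
        }

    def build(self, task, **fields):
        return self.templates[task].format(**fields)

class PostProcessor:
    """Normalizes raw LLM output into a machine-readable answer."""
    def run(self, raw):
        return raw.strip().splitlines()[0].strip()

manager, post = PromptManager(), PostProcessor()
prompt = manager.build("table_qa", table="city | pop\nParis | 2.1M",
                       question="Which city is listed?")
# raw = local_llm(prompt)                  # any open-source LLM served on the local network
print(post.run("Paris\n(additional rationale...)"))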

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.

Updated: 2024-06-14 19:22:26

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2403.07718v3

Learning a Diffusion Model Policy from Rewards via Q-Score Matching

Diffusion models have become a popular choice for representing actor policies in behavior cloning and offline reinforcement learning. This is due to their natural ability to optimize an expressive class of distributions over a continuous space. However, previous works fail to exploit the score-based structure of diffusion models, and instead utilize a simple behavior cloning term to train the actor, limiting their ability in the actor-critic setting. In this paper, we present a theoretical framework linking the structure of diffusion model policies to a learned Q-function, by linking the structure between the score of the policy to the action gradient of the Q-function. We focus on off-policy reinforcement learning and propose a new policy update method from this theory, which we denote Q-score matching. Notably, this algorithm only needs to differentiate through the denoising model rather than the entire diffusion model evaluation, and converged policies through Q-score matching are implicitly multi-modal and explorative in continuous domains. We conduct experiments in simulated environments to demonstrate the viability of our proposed method and compare to popular baselines. Source code is available from the project website: https://scorematchingrl.com.

Updated: 2024-06-14 19:13:39

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2312.11752v2
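
The matching objective at the heart of the method can be sketched with toy networks: regress the policy's score network onto the action gradient of a learned Q-function. This schematic omits the diffusion noise schedule, critic training, and target networks; the architectures and dimensions are our own.

import torch
import torch.nn as nn

s_dim, a_dim, batch = 8, 2, 64
q_net = nn.Sequential(nn.Linear(s_dim + a_dim, 64), nn.ReLU(), nn.Linear(64, 1))
score_net = nn.Sequential(nn.Linear(s_dim + a_dim, 64), nn.ReLU(), nn.Linear(64, a_dim))

s = torch.randn(batch, s_dim)
a = torch.randn(batch, a_dim, requires_grad=True)     # (noised) actions

q_val = q_net(torch.cat([s, a], dim=-1)).sum()
grad_a_q = torch.autograd.grad(q_val, a)[0]           # action gradient of Q

# Q-score matching: align the denoiser's score with grad_a Q (target detached),
# so only the denoising model is differentiated, not a full diffusion rollout.
loss = ((score_net(torch.cat([s, a], dim=-1)) - grad_a_q.detach()) ** 2).mean()
loss.backward()
print(float(loss))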

MEMO-QCD: Quantum Density Estimation through Memetic Optimisation for Quantum Circuit Design

This paper presents a strategy for efficient quantum circuit design for density estimation. The strategy is based on a quantum-inspired algorithm for density estimation and a circuit optimisation routine based on memetic algorithms. The model maps a training dataset to a quantum state represented by a density matrix through a quantum feature map. This training state encodes the probability distribution of the dataset in a quantum state, such that the density of a new sample can be estimated by projecting its corresponding quantum state onto the training state. We propose the application of a memetic algorithm to find the architecture and parameters of a variational quantum circuit that implements the quantum feature map, along with a variational learning strategy to prepare the training state. Demonstrations of the proposed strategy show an accurate approximation of the Gaussian kernel density estimation method through shallow quantum circuits illustrating the feasibility of the algorithm for near-term quantum hardware.

Updated: 2024-06-14 19:07:16

Domains: quant-ph,cs.LG

Download: http://arxiv.org/abs/2406.08591v2

A Benchmark Suite for Systematically Evaluating Reasoning Shortcuts

The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance to safety and structural constraints. However, recent research observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts to the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available at: https://unitn-sml.github.io/rsbench.

Updated: 2024-06-14 18:52:34

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.10368v1

Disentangled Hyperbolic Representation Learning for Heterogeneous Graphs

Heterogeneous graphs have attracted a lot of research interests recently due to the success for representing complex real-world systems. However, existing methods have two pain points in embedding them into low-dimensional spaces: the mixing of structural and semantic information, and the distributional mismatch between data and embedding spaces. These two challenges require representation methods to consider the global and partial data distributions while unmixing the information. Therefore, in this paper, we propose $\text{Dis-H}^2\text{GCN}$, a Disentangled Hyperbolic Heterogeneous Graph Convolutional Network. On the one hand, we leverage the mutual information minimization and discrimination maximization constraints to disentangle the semantic features from comprehensively learned representations by independent message propagation for each edge type, away from the pure structural features. On the other hand, the entire model is constructed upon the hyperbolic geometry to narrow the gap between data distributions and representing spaces. We evaluate our proposed $\text{Dis-H}^2\text{GCN}$ on five real-world heterogeneous graph datasets across two downstream tasks: node classification and link prediction. The results demonstrate its superiority over state-of-the-art methods, showcasing the effectiveness of our method in disentangling and representing heterogeneous graph data in hyperbolic spaces.

Updated: 2024-06-14 18:50:47

Domains: cs.LG

Download: http://arxiv.org/abs/2406.10367v1

Nonlinear dynamical social and political prediction algorithm for city planning and public participation using the Impulse Pattern Formulation

A nonlinear-dynamical algorithm for city planning is proposed as an Impulse Pattern Formulation (IPF) for predicting relevant parameters like health, artistic freedom, or financial developments of different social or political stakeholders over the course of a planning process. The IPF has already shown high predictive precision at low computational cost in musical instrument simulations, brain dynamics, and human-human interactions. The social and political IPF consists of three basic equations of system state developments, self-adaptation of stakeholders, two adaptive interactions, and external impact terms suitable for respective planning situations. Typical scenarios of stakeholder interactions and developments are modeled by adjusting a set of system parameters. These include stakeholder reaction to external input, enhanced system stability through self-adaptation, stakeholder convergence due to adaptive interaction, as well as complex dynamics in terms of fixed stakeholder impacts. A workflow for implementing the algorithm in real city planning scenarios is outlined. This workflow includes machine learning of a suitable set of parameters suggesting best-practice planning to aim at the desired development of the planning process and its output.

Updated: 2024-06-14 18:47:45

Domains: nlin.AO,cs.AI,math.DS

Download: http://arxiv.org/abs/2404.00977v2

Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework

Commonly, AI or machine learning (ML) models are evaluated on benchmark datasets. This practice supports innovative methodological research, but benchmark performance can be poorly correlated with performance in real-world applications -- a construct validity issue. To improve the validity and practical usefulness of evaluations, we propose using an estimands framework adapted from international clinical trials guidelines. This framework provides a systematic structure for inference and reporting in evaluations, emphasizing the importance of a well-defined estimation target. We illustrate our proposal on examples of commonly used evaluation methodologies - involving cross-validation, clustering evaluation, and LLM benchmarking - that can lead to incorrect rankings of competing models (rank reversals) with high probability, even when performance differences are large. We demonstrate how the estimands framework can help uncover underlying issues, their causes, and potential solutions. Ultimately, we believe this framework can improve the validity of evaluations through better-aligned inference, and help decision-makers and model users interpret reported results more effectively.

Updated: 2024-06-14 18:47:37

Domains: cs.LG,stat.AP,stat.ME

Download: http://arxiv.org/abs/2406.10366v1

$\text{H}^2\text{TNE}$: Temporal Heterogeneous Information Network Embedding in Hyperbolic Spaces

Temporal heterogeneous information network (temporal HIN) embedding, aiming to represent various types of nodes of different timestamps into low dimensional spaces while preserving structural and semantic information, is of vital importance in diverse real-life tasks. Researchers have made great efforts on temporal HIN embedding in Euclidean spaces and have achieved considerable results. However, there is a fundamental conflict: many real-world networks exhibit hierarchical properties and power-law distributions, and are not isometric to Euclidean spaces. Recently, representation learning in hyperbolic spaces has been proved to be valid for data with hierarchical and power-law structure. Inspired by this character, we propose a hyperbolic heterogeneous temporal network embedding ($\text{H}^2\text{TNE}$) model for temporal HINs. Specifically, we leverage a temporally and heterogeneously double-constrained random walk strategy to capture the structural and semantic information, and then calculate the embedding by exploiting hyperbolic distance in proximity measurement. Experimental results show that our method has superior performance on temporal link prediction and node classification compared with SOTA models.

Updated: 2024-06-14 18:43:40

Domains: cs.SI,cs.AI

Download: http://arxiv.org/abs/2304.06970v3
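
The hyperbolic proximity measure used above is, in the Poincare-ball model, d(u, v) = arcosh(1 + 2||u-v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). A small sketch (the embedding points are arbitrary) shows why the geometry suits hierarchical, power-law data: near the boundary, small Euclidean gaps become large hyperbolic distances.

import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball; requires ||u||, ||v|| < 1."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / max(denom, eps))

root = np.zeros(2)
child = np.array([0.5, 0.0])
leaf = np.array([0.9, 0.05])               # close to the boundary of the ball
print(poincare_distance(root, child))      # moderate
print(poincare_distance(child, leaf))      # much larger than the Euclidean gap suggests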

The Effect of Sampling Temperature on Problem Solving in Large Language Models

In this research study, we empirically investigate the effect of sampling temperature on the performance of Large Language Models (LLMs) on various problem-solving tasks. We created a multiple-choice question-and-answer (MCQA) exam by randomly sampling problems from standard LLM benchmarks. Then, we used nine popular LLMs with five prompt-engineering techniques to solve the MCQA problems while increasing the sampling temperature from 0.0 to 1.6. Despite anecdotal reports to the contrary, our empirical results indicate that changes in temperature from 0.0 to 1.0 do not have a statistically significant impact on LLM performance for problem-solving tasks. In addition, these results appear to generalize across LLMs, prompt-engineering techniques, and problem domains. All code, data, and supplemental materials are available on GitHub at: https://github.com/matthewrenze/jhu-llm-temperature

Updated: 2024-06-14 18:41:51

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2402.05201v2
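
Temperature enters only as a rescaling of the logits before the softmax, which is the single knob the study sweeps from 0.0 to 1.6; the logits in this sketch are made up.

import numpy as np

def sample_with_temperature(logits, temperature, rng):
    if temperature == 0.0:                 # conventionally treated as greedy decoding
        return int(np.argmax(logits))
    z = logits / temperature
    p = np.exp(z - z.max()); p /= p.sum()  # numerically stable softmax
    return int(rng.choice(len(logits), p=p))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.2])
for t in (0.0, 0.7, 1.6):                  # higher t flattens p, increasing randomness
    draws = [sample_with_temperature(logits, t, rng) for _ in range(1000)]
    print(t, np.bincount(draws, minlength=3) / 1000)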

Predicting Consultation Success in Online Health Platforms Using Dynamic Knowledge Networks and Multimodal Data Fusion

Online healthcare consultation in virtual health is an emerging industry marked by innovation and fierce competition. Accurate and timely prediction of healthcare consultation success can proactively help online platforms address patient concerns and improve retention rates. However, predicting online consultation success is challenging due to the partial role of virtual consultations in patients' overall healthcare journey and the disconnect between online and in-person healthcare IT systems. Patient data in online consultations is often sparse and incomplete, presenting significant technical challenges and a research gap. To address these issues, we propose the Dynamic Knowledge Network and Multimodal Data Fusion (DyKoNeM) framework, which enhances the predictive power of online healthcare consultations. Our work has important implications for new business models where specific and detailed online communication processes are stored in the IT database, and at the same time, latent information with predictive power is embedded in the network formed by stakeholders' digital traces. It can be extended to diverse industries and domains, where the virtual or hybrid model (e.g., integration of online and offline services) is emerging as a prevailing trend.

Updated: 2024-06-14 18:41:30

Domains: cs.LG,K.5,H.4.m

Download: http://arxiv.org/abs/2306.03833v4

Perturbed examples reveal invariances shared by language models

The rapid growth in natural language processing (NLP) research has led to numerous new models, outpacing our understanding of how they compare to established ones. One major reason for this difficulty is saturating benchmarks, which may not well reflect differences in model performance in the wild. In this work, we introduce a novel framework to compare two NLP models by revealing their shared invariance to interpretable input perturbations targeting a specific linguistic capability. Via experiments on models from the same and different architecture families, this framework offers insights about how changes in models (e.g., distillation, size increase) affect linguistic capabilities. Furthermore, our framework enables evaluation of invariances between commercial black-box models (e.g., InstructGPT family) and models that are better understood (e.g., GPT-2). Across experiments, we observe that large language models share many invariances encoded by models of various sizes, whereas the invariances by large models are only shared by other large models. Possessing a wide variety of invariances may be key to the recent successes of large language models, and our framework can shed light on the types of invariances retained or emerging in new models. We make the code publicly available.

Updated: 2024-06-14 18:36:36

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2311.04166v2

Mesh Neural Networks for SE(3)-Equivariant Hemodynamics Estimation on the Artery Wall

Computational fluid dynamics (CFD) is a valuable asset for patient-specific cardiovascular-disease diagnosis and prognosis, but its high computational demands hamper its adoption in practice. Machine-learning methods that estimate blood flow in individual patients could accelerate or replace CFD simulation to overcome these limitations. In this work, we consider the estimation of vector-valued quantities on the wall of three-dimensional geometric artery models. We employ group equivariant graph convolution in an end-to-end SE(3)-equivariant neural network that operates directly on triangular surface meshes and makes efficient use of training data. We run experiments on a large dataset of synthetic coronary arteries and find that our method estimates directional wall shear stress (WSS) with an approximation error of 7.6% and normalised mean absolute error (NMAE) of 0.4% while up to two orders of magnitude faster than CFD. Furthermore, we show that our method is powerful enough to accurately predict transient, vector-valued WSS over the cardiac cycle while conditioned on a range of different inflow boundary conditions. These results demonstrate the potential of our proposed method as a plugin replacement for CFD in the personalised prediction of hemodynamic vector and scalar fields.

Updated: 2024-06-14 18:34:21

Domains: cs.LG,cs.CV,math.GR,physics.flu-dyn

Download: http://arxiv.org/abs/2212.05023v2

Generalization Error of Graph Neural Networks in the Mean-field Regime

This work provides a theoretical framework for assessing the generalization error of graph neural networks in the over-parameterized regime, where the number of parameters surpasses the quantity of data points. We explore two widely utilized types of graph neural networks: graph convolutional neural networks and message passing graph neural networks. Prior to this study, existing bounds on the generalization error in the over-parametrized regime were uninformative, limiting our understanding of over-parameterized network performance. Our novel approach involves deriving upper bounds within the mean-field regime for evaluating the generalization error of these graph neural networks. We establish upper bounds with a convergence rate of $O(1/n)$, where $n$ is the number of graph samples. These upper bounds offer a theoretical assurance of the networks' performance on unseen data in the challenging over-parameterized regime and overall contribute to our understanding of their performance.

Updated: 2024-06-14 18:21:49

Domains: stat.ML,cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2402.07025v2

I Still See You: Why Existing IoT Traffic Reshaping Fails

The Internet traffic data produced by Internet of Things (IoT) devices are collected by Internet Service Providers (ISPs) and device manufacturers, and often shared with third parties to maintain and enhance user services. Unfortunately, on-path adversaries could infer and fingerprint users' sensitive privacy information, such as occupancy and user activities, by analyzing these network traffic traces. While there is a growing body of literature on defending against this side-channel attack, known as malicious IoT traffic analytics (TA), there is currently no systematic method to compare and evaluate the comprehensiveness of these existing studies. To address this problem, we design a new low-cost, open-source system framework, the IoT Traffic Exposure Monitoring Toolkit (ITEMTK), that enables people to comprehensively examine and validate prior attack models and their defending approaches. In particular, we also design a novel image-based attack capable of inferring sensitive user information, even when users employ the most robust preventative measures in their smart homes. Researchers could leverage our new image-based attack to systematize and understand the existing literature on IoT traffic analysis attacks and prevention studies. Our results show that current defending approaches are not sufficient to protect IoT device user privacy. IoT devices are significantly vulnerable to our new image-based user privacy inference attacks, posing a grave threat to IoT device user privacy. We also highlight potential future improvements to enhance the defending approaches. ITEMTK's flexibility allows other researchers to easily extend it by integrating new TA attack models and prevention methods to benchmark their future work.

Updated: 2024-06-14 18:11:44

Domains: cs.CR,cs.SY,eess.SY

Download: http://arxiv.org/abs/2406.10358v1

COPAL: Continual Pruning in Large Language Generative Models

Adapting pre-trained large language models to different domains in natural language processing requires two key considerations: high computational demands and the model's inability to adapt continually. To simultaneously address both issues, this paper presents COPAL (COntinual Pruning in Adaptive Language settings), an algorithm developed for pruning large language generative models under a continual model adaptation setting. While avoiding resource-heavy finetuning or retraining, our pruning process is guided by the proposed sensitivity analysis. The sensitivity effectively measures the model's ability to withstand perturbations introduced by a new dataset and finds the model weights that are relevant for all encountered datasets. As a result, COPAL allows seamless model adaptation to new domains while enhancing resource efficiency. Our empirical evaluation on LLMs of various sizes shows that COPAL outperforms baseline models, demonstrating its efficacy in efficiency and adaptability.

Updated: 2024-06-14 18:06:47

Domains: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.02347v2

Transformers are Provably Optimal In-context Estimators for Wireless Communications

Pre-trained transformers exhibit the capability of adapting to new tasks through in-context learning (ICL), where they efficiently utilize a limited set of prompts without explicit model optimization. The canonical communication problem of estimating transmitted symbols from received observations can be modelled as an in-context learning problem: Received observations are essentially a noisy function of transmitted symbols, and this function can be represented by an unknown parameter whose statistics depend on an (also unknown) latent context. This problem, which we term in-context estimation (ICE), has significantly greater complexity than the extensively studied linear regression problem. The optimal solution to the ICE problem is a non-linear function of the underlying context. In this paper, we prove that, for a subclass of such problems, a single layer softmax attention transformer (SAT) computes the optimal solution of the above estimation problem in the limit of large prompt length. We also prove that the optimal configuration of such transformer is indeed the minimizer of the corresponding training loss. Further, we empirically demonstrate the proficiency of multi-layer transformers in efficiently solving broader in-context estimation problems. Through extensive simulations, we show that solving ICE problems using transformers significantly outperforms standard approaches. Moreover, just with a few context examples, it achieves the same performance as an estimator with perfect knowledge of the latent context.

Updated: 2024-06-14 18:05:14

Domains: eess.SP,cs.LG

Download: http://arxiv.org/abs/2311.00226v3
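
The kind of estimator a single softmax-attention layer can realize is closely related to softmax-kernel regression over the prompt; a toy scalar-channel version of in-context estimation (our own construction, with a made-up pilot layout and bandwidth, not the paper's transformer) is:

import numpy as np

rng = np.random.default_rng(0)
h, sigma, n_ctx = 1.3, 0.3, 64             # unknown channel, noise level, prompt length
x_ctx = rng.choice([-1.0, 1.0], n_ctx)     # known pilot symbols in the prompt
y_ctx = h * x_ctx + sigma * rng.normal(size=n_ctx)

x_new = 1.0                                # symbol to recover
y_new = h * x_new + sigma * rng.normal()   # its noisy observation

# Softmax attention over the context: weight each (y_i, x_i) by similarity of the y's
w = np.exp(-(y_ctx - y_new) ** 2 / (2 * sigma ** 2))
x_hat = float((w * x_ctx).sum() / w.sum()) # attention-weighted symbol estimate
print("estimate:", x_hat, " truth:", x_new)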

SigDiffusions: Score-Based Diffusion Models for Long Time Series via Log-Signature Embeddings

Score-based diffusion models have recently emerged as state-of-the-art generative models for a variety of data modalities. Nonetheless, it remains unclear how to adapt these models to generate long multivariate time series. Viewing a time series as the discretization of an underlying continuous process, we introduce SigDiffusion, a novel diffusion model operating on log-signature embeddings of the data. The forward and backward processes gradually perturb and denoise log-signatures preserving their algebraic structure. To recover a signal from its log-signature, we provide new closed-form inversion formulae expressing the coefficients obtained by expanding the signal in a given basis (e.g. Fourier or orthogonal polynomials) as explicit polynomial functions of the log-signature. Finally, we show that combining SigDiffusion with these inversion formulae results in highly realistic time series generation, competitive with the current state-of-the-art on various datasets of synthetic and real-world examples.

Updated: 2024-06-14 18:04:06

Domains: cs.LG

Download: http://arxiv.org/abs/2406.10354v1
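
Up to level 2, the log-signature of a d-channel path consists of the total increment plus the Levy areas (the antisymmetric iterated integrals); the direct discretization below, with a random walk standing in for a time series, shows the kind of embedding the model diffuses over. Truncation at level 2 and the midpoint rule are our simplifications.

import numpy as np

def logsig_level2(path):
    """Level-1 and level-2 log-signature terms of a piecewise-linear path of shape (T, d)."""
    dx = np.diff(path, axis=0)             # per-step increments
    level1 = dx.sum(axis=0)                # total displacement, equals path[-1] - path[0]
    mid = (path[:-1] + path[1:]) / 2       # midpoint rule for the iterated integrals
    M = mid.T @ dx                         # M_ij = sum_t mid_i(t) * dx_j(t)
    levy_area = 0.5 * (M - M.T)            # antisymmetric d x d matrix
    return level1, levy_area

rng = np.random.default_rng(0)
path = np.cumsum(rng.normal(size=(100, 3)), axis=0)   # a 3-channel series viewed as a path
lvl1, area = logsig_level2(path)
print(lvl1)
print(area)                                # encodes lead-lag/ordering between channels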

The Elephant in the Room: Software and Hardware Security Vulnerabilities of Portable Sequencing Devices

Portable genome sequencing technology is revolutionizing genomic research by providing a faster, more flexible method of sequencing DNA and RNA [1, 2]. The unprecedented shift from bulky stand-alone benchtop equipment confined in a laboratory setting to small portable devices which can be easily carried anywhere outside the laboratory network and connected to untrusted external computers to perform sequencing raises new security and privacy threats not considered before. Current research primarily addresses the privacy of DNA/RNA data in online databases [3] and the security of stand-alone sequencing devices such as Illumina [4]. However, it overlooks the security risks arising from compromises of computer devices directly connected to portable sequencers as illustrated in Fig. 1. While highly sensitive data, such as the human genome, has become easier to sequence, the networks connecting to these smaller devices and the hardware running basecalling can no longer implicitly be trusted, and doing so can deteriorate the confidentiality and integrity of the genomic data being processed. Here, we present new security and privacy threats of portable sequencing technology and recommendations to aid in ensuring sequencing data is kept private and secure. First, to prevent unauthorized access to sequencing devices, IP addresses should not be considered a sufficient authentication mechanism. Second, integrity checks are necessary for all data passed from the sequencer to external computers to avoid data manipulation. Finally, encryption should be considered as data is passed from the sequencer to such external computers to prevent eavesdropping on data as it is sent and stored. As devices and technology rapidly change, it becomes paramount to reevaluate security requirements alongside them or risk leaving some of our most sensitive data exposed.

Updated: 2024-06-14 18:02:01

Domains: cs.CR

Download: http://arxiv.org/abs/2407.12001v1

Quantifying Variance in Evaluation Benchmarks

Evaluation benchmarks are the cornerstone of measuring capabilities of large language models (LLMs), as well as driving progress in said capabilities. Originally designed to make claims about capabilities (or lack thereof) in fully pretrained models, evaluation benchmarks are now also extensively used to decide between various training choices. Despite this widespread usage, we rarely quantify the variance in our evaluation benchmarks, which dictates whether differences in performance are meaningful. Here, we define and measure a range of metrics geared towards measuring variance in evaluation benchmarks, including seed variance across initialisations, and monotonicity during training. By studying a large number of models -- both openly available and pretrained from scratch -- we provide empirical estimates for a variety of variance metrics, with considerations and recommendations for practitioners. We also evaluate the utility and tradeoffs of continuous versus discrete performance measures and explore options for better understanding and reducing this variance. We find that simple changes, such as framing choice tasks (like MMLU) as completion tasks, can often reduce variance for smaller scale ($\sim$7B) models, while more involved methods inspired from human testing literature (such as item analysis and item response theory) struggle to meaningfully reduce variance. Overall, our work provides insights into variance in evaluation benchmarks, suggests LM-specific techniques to reduce variance, and more generally encourages practitioners to carefully factor in variance when comparing models.

Updated: 2024-06-14 17:59:54

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.10229v1
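
Two of the quantities defined above, seed variance and training monotonicity, are directly computable from a grid of benchmark scores; the sketch below fabricates such a grid and uses Spearman correlation against checkpoint order as a simple monotonicity measure (the paper's exact metric definitions may differ).

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# scores[seed, checkpoint]: benchmark accuracy for 5 seeds over 8 checkpoints (fabricated)
scores = np.linspace(0.30, 0.55, 8) + rng.normal(0, 0.02, size=(5, 8))

seed_std = scores[:, -1].std(ddof=1)       # seed variance at the final checkpoint
mono = np.mean([spearmanr(np.arange(8), s)[0] for s in scores])

print(f"final-score std across seeds: {seed_std:.3f}")
print(f"mean training monotonicity (Spearman rho): {mono:.2f}")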

From Pixels to Prose: A Large Dataset of Dense Image Captions

Training large vision-language models requires extensive, high-quality image-text pairs. Existing web-scraped datasets, however, are noisy and lack detailed image descriptions. To bridge this gap, we introduce PixelProse, a comprehensive dataset of over 16 million synthetically generated captions, leveraging cutting-edge vision-language models for detailed and accurate descriptions. To ensure data integrity, we rigorously analyze our dataset for problematic content, including child sexual abuse material (CSAM), personally identifiable information (PII), and toxicity. We also provide valuable metadata such as watermark presence and aesthetic scores, aiding in further dataset filtering. We hope PixelProse will be a valuable resource for future vision-language research. PixelProse is available at https://huggingface.co/datasets/tomg-group-umd/pixelprose

Updated: 2024-06-14 17:59:53

Domains: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.10328v1
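
Since the dataset is hosted on the Hugging Face Hub, it can be streamed with the standard datasets API; the split name and the idea of inspecting fields first are our assumptions, to be verified against the dataset card.

from datasets import load_dataset

# Streaming avoids downloading all 16M+ captions before looking at any of them
ds = load_dataset("tomg-group-umd/pixelprose", split="train", streaming=True)
for example in ds.take(2):                 # peek at the first records
    print(sorted(example.keys()))          # inspect the available fields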

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

The swift progress of Multi-modal Large Models (MLLMs) has showcased their impressive ability to tackle tasks blending vision and language. Yet, most current models and benchmarks cater to scenarios with a narrow scope of visual and textual contexts. These models often fall short when faced with complex comprehension tasks, which involve navigating through a plethora of irrelevant and potentially misleading information in both text and image forms. To bridge this gap, we introduce a new, more demanding task known as Interleaved Image-Text Comprehension (IITC). This task challenges models to discern and disregard superfluous elements in both images and text to accurately answer questions and to follow intricate instructions to pinpoint the relevant image. In support of this task, we further craft a new VEGA dataset, tailored for the IITC task on scientific content, and devised a subtask, Image-Text Association (ITA), to refine image-text correlation skills. Our evaluation of four leading closed-source models, as well as various open-source models using VEGA, underscores the rigorous nature of IITC. Even the most advanced models, such as Gemini-1.5-pro and GPT4V, only achieved modest success. By employing a multi-task, multi-scale post-training strategy, we have set a robust baseline for MLLMs on the IITC task, attaining an $85.8\%$ accuracy rate in image association and a $0.508$ Rouge score. These results validate the effectiveness of our dataset in improving MLLMs capabilities for nuanced image-text comprehension.

Updated: 2024-06-14 17:59:40

Domains: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.10228v1

CinePile: A Long Video Question Answering Dataset and Benchmark

Current datasets for long-form video understanding often fall short of providing genuine long-form comprehension challenges, as many tasks derived from these datasets can be successfully tackled by analyzing just one or a few random frames from a video. To address this issue, we present a novel dataset and benchmark, CinePile, specifically designed for authentic long-form video understanding. This paper details our innovative approach for creating a question-answer dataset, utilizing advanced LLMs with human-in-the-loop and building upon human-generated raw data. Our comprehensive dataset comprises 305,000 multiple-choice questions (MCQs), covering various visual and multimodal aspects, including temporal comprehension, understanding human-object interactions, and reasoning about events or actions within a scene. Additionally, we evaluate recent video-centric LLMs, both open-source and proprietary, on the test split of our dataset. The findings reveal that even state-of-the-art video-centric LLMs significantly lag behind human performance in these tasks, highlighting the complexity and challenge inherent in video understanding. The dataset is available at https://hf.co/datasets/tomg-group-umd/cinepile

Updated: 2024-06-14 17:59:34

Domains: cs.CV,cs.LG,cs.MM

Download: http://arxiv.org/abs/2405.08813v2

Analysing Multi-Task Regression via Random Matrix Theory with Application to Time Series Forecasting

In this paper, we introduce a novel theoretical framework for multi-task regression, applying random matrix theory to provide precise performance estimations, under high-dimensional, non-Gaussian data distributions. We formulate a multi-task optimization problem as a regularization technique to enable single-task models to leverage multi-task learning information. We derive a closed-form solution for multi-task optimization in the context of linear models. Our analysis provides valuable insights by linking the multi-task learning performance to various model statistics such as raw data covariances, signal-generating hyperplanes, noise levels, as well as the size and number of datasets. We finally propose a consistent estimation of training and testing errors, thereby offering a robust foundation for hyperparameter optimization in multi-task regression scenarios. Experimental validations on both synthetic and real-world datasets in regression and multivariate time series forecasting demonstrate improvements on univariate models, incorporating our method into the training loss and thus leveraging multivariate information.
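
To convey the flavour of such a formulation (a hedged sketch, not the paper's exact estimator), the snippet below fits a shared-plus-specific multi-task linear model, w_t = w0 + v_t, with a single closed-form ridge-regularised least-squares solve:

```python
# Illustrative multi-task linear regression: each task's weights decompose as
# a shared w0 plus a task-specific deviation v_t, solved in closed form.
import numpy as np

def fit_multitask(Xs, ys, lam_shared=1.0, lam_task=10.0):
    """Xs: list of (n_t, d) arrays; ys: list of (n_t,) arrays."""
    d, T = Xs[0].shape[1], len(Xs)
    rows, targets = [], []
    for t, (X, y) in enumerate(zip(Xs, ys)):
        row = np.zeros((X.shape[0], d * (T + 1)))
        row[:, :d] = X                          # shared component w0
        row[:, d * (t + 1): d * (t + 2)] = X    # task-specific deviation v_t
        rows.append(row)
        targets.append(y)
    A, b = np.vstack(rows), np.concatenate(targets)
    reg = np.concatenate([np.full(d, lam_shared), np.full(d * T, lam_task)])
    w = np.linalg.solve(A.T @ A + np.diag(reg), A.T @ b)   # closed-form ridge
    return w[:d], w[d:].reshape(T, d)                      # w0 and the v_t's
```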

Updated: 2024-06-14 17:59:25

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2406.10327v1

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Graphical User Interface (GUI) automation holds significant promise for enhancing human productivity by assisting with computer tasks. Existing task formulations primarily focus on simple tasks that can be specified by a single, language-only instruction, such as "Insert a new slide." In this work, we introduce VideoGUI, a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks. Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software (e.g., Adobe Photoshop or Stable Diffusion WebUI) and complex activities (e.g., video editing). VideoGUI evaluates GUI assistants through a hierarchical process, allowing for identification of the specific levels at which they may fail: (i) high-level planning: reconstruct procedural subtasks from visual conditions without language descriptions; (ii) middle-level planning: generate sequences of precise action narrations based on visual state (i.e., screenshot) and goals; (iii) atomic action execution: perform specific actions such as accurately clicking designated elements. For each level, we design evaluation metrics across individual dimensions to provide clear signals, such as individual performance in clicking, dragging, typing, and scrolling for atomic action execution. Our evaluation on VideoGUI reveals that even the SoTA large multimodal model GPT4o performs poorly on visual-centric GUI tasks, especially for high-level planning.

Updated: 2024-06-14 17:59:08

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.10227v1

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

We present a cost-effective method to integrate speech into a large language model (LLM), resulting in a Contextual Speech Model with Instruction-following/in-context-learning Capabilities (COSMIC) multi-modal LLM. Using GPT-3.5, we generate Speech Comprehension Test Question-Answer (SQA) pairs from speech transcriptions for supervised instruction tuning. With under 30 million trainable parameters and only 450 hours of English speech data, COSMIC demonstrates emerging capabilities in instruction-following and in-context learning. Equipped with such capabilities, COSMIC achieves a maximum 33.18 BLEU score in 0-shot EN-to-X speech to text translation (S2TT) and a significant boost in the 1-shot setting. Additionally, there is an average 25.8\% relative Word Error Rate (WER) reduction for 1-shot cross-domain adaptation. COSMIC exhibits a significant automatic speech recognition (ASR) accuracy gain in contextual biasing tasks due to its instruction-following capability.

Updated: 2024-06-14 17:57:13

Domains: cs.CL,cs.AI,eess.AS

Download: http://arxiv.org/abs/2311.02248v2

Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment

Toxicity classification for voice heavily relies on the semantic content of speech. We propose a novel framework that utilizes cross-modal learning to integrate the semantic embedding of text into a multilabel speech toxicity classifier during training. This enables us to incorporate textual information during training while still requiring only audio during inference. We evaluate this classifier on large-scale datasets with real-world characteristics to validate the effectiveness of this framework. Through ablation studies, we demonstrate that general-purpose semantic text embeddings are rich and aligned with speech for toxicity classification purposes. Conducting experiments across multiple languages at scale, we show improvements in voice toxicity classification across five languages and different toxicity categories.
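
A hedged PyTorch sketch of the training idea follows; all module sizes and the loss weight are illustrative assumptions, not the authors' configuration. The speech branch is trained for multilabel toxicity while its embedding is pulled toward a text embedding of the transcript, so inference needs audio only:

```python
# Sketch: multilabel toxicity loss plus an alignment loss that distills text
# semantics into the speech embedding. Shapes and the 0.1 weight are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechToxicity(nn.Module):
    def __init__(self, speech_dim=512, text_dim=768, n_labels=5):
        super().__init__()
        self.proj = nn.Linear(speech_dim, text_dim)  # maps speech into text space
        self.head = nn.Linear(speech_dim, n_labels)  # multilabel toxicity head

    def forward(self, speech_emb, text_emb=None, labels=None):
        logits = self.head(speech_emb)
        if text_emb is None:                      # inference: audio only
            return torch.sigmoid(logits)
        bce = F.binary_cross_entropy_with_logits(logits, labels.float())
        align = 1.0 - F.cosine_similarity(self.proj(speech_emb), text_emb).mean()
        return bce + 0.1 * align                  # 0.1 is an arbitrary weight
```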

Updated: 2024-06-14 17:56:53

Domains: cs.CL,cs.LG,eess.AS

Download: http://arxiv.org/abs/2406.10325v1

Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation

We introduce DiffuseST, a low-latency, direct speech-to-speech translation system capable of preserving the input speaker's voice zero-shot while translating from multiple source languages into English. We experiment with the synthesizer component of the architecture, comparing a Tacotron-based synthesizer to a novel diffusion-based synthesizer. We find the diffusion-based synthesizer to improve MOS and PESQ audio quality metrics by 23\% each and speaker similarity by 5\% while maintaining comparable BLEU scores. Despite having more than double the parameter count, the diffusion synthesizer has lower latency, allowing the entire model to run more than 5$\times$ faster than real-time.

Updated: 2024-06-14 17:55:55

Domains: cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2406.10223v1

Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding

Recent advances in vision-language models have significantly propelled video understanding. Existing datasets and tasks, however, have notable limitations. Most datasets are confined to short videos with limited events and narrow narratives. For example, datasets with instructional and egocentric videos often document the activities of one person in a single scene. Although some movie datasets offer richer content, they are often limited to short-term tasks, lack publicly available videos and frequently encounter data leakage given the use of movie forums and other resources in LLM training. To address the above limitations, we propose the Short Film Dataset (SFD) with 1,078 publicly available amateur movies, a wide variety of genres and minimal data leakage issues. SFD offers long-term story-oriented video tasks in the form of multiple-choice and open-ended question answering. Our extensive experiments emphasize the need for long-term reasoning to solve SFD tasks. Notably, we find strong signals in movie transcripts leading to the on-par performance of people and LLMs. We also show significantly lower performance of current models compared to people when using vision data alone.

Updated: 2024-06-14 17:54:54

Domains: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.10221v1

Semantic Membership Inference Attack against Large Language Models

Membership Inference Attacks (MIAs) determine whether a specific data point was included in the training set of a target model. In this paper, we introduce the Semantic Membership Inference Attack (SMIA), a novel approach that enhances MIA performance by leveraging the semantic content of inputs and their perturbations. SMIA trains a neural network to analyze the target model's behavior on perturbed inputs, effectively capturing variations in output probability distributions between members and non-members. We conduct comprehensive evaluations on the Pythia and GPT-Neo model families using the Wikipedia dataset. Our results show that SMIA significantly outperforms existing MIAs; for instance, SMIA achieves an AUC-ROC of 67.39% on Pythia-12B, compared to 58.90% by the second-best attack.
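
In that spirit, here is a simplified sketch of the feature-construction step (our own illustration, not the authors' code): compare the target model's loss on a candidate text with its losses on semantic perturbations, and hand the statistics to a membership classifier:

```python
# Illustrative-only feature extraction for a perturbation-based MIA. The exact
# features and classifier in the paper differ; this conveys the mechanism.
import numpy as np

def smia_features(loss_fn, text, perturb_fn, n_perturb=8):
    """loss_fn(text): target-model loss; perturb_fn(text): a semantic variant."""
    base = loss_fn(text)
    pert = np.array([loss_fn(perturb_fn(text)) for _ in range(n_perturb)])
    # Members tend to sit in sharper minima: low loss relative to neighbours.
    return np.array([base, pert.mean() - base, pert.std()])
```

A small network trained on texts with known membership then maps such feature vectors to a membership score.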

Updated: 2024-06-14 17:53:50

Domains: cs.LG

Download: http://arxiv.org/abs/2406.10218v1

L4GM: Large 4D Gaussian Reconstruction Model

We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in 12M videos with a total of 300M frames. We keep our L4GM simple for scalability and build directly on top of LGM, a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multiview image input. L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low fps and then upsamples the representation to a higher fps to achieve temporal smoothness. We add temporal self-attention layers to the base LGM to help it learn consistency across time, and utilize a per-timestep multiview rendering loss to train the model. The representation is upsampled to a higher framerate by training an interpolation model which produces intermediate 3D Gaussian representations. We showcase that L4GM that is only trained on synthetic data generalizes extremely well on in-the-wild videos, producing high quality animated 3D assets.

Updated: 2024-06-14 17:51:18

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.10324v1

Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs

Reward models trained on human preference data have been proven to be effective for aligning Large Language Models (LLMs) with human intent within the reinforcement learning from human feedback (RLHF) framework. However, the generalization capabilities of current reward models to unseen prompts and responses are limited. This limitation can lead to an unexpected phenomenon known as reward over-optimization, where excessive optimization of rewards results in a decline in actual performance. While previous research has advocated for constraining policy optimization, our study proposes a novel approach to enhance the reward model's generalization ability against distribution shifts by regularizing the hidden states. Specifically, we retain the base model's language model head and incorporate a suite of text-generation losses to preserve the hidden states' text generation capabilities, while concurrently learning a reward head behind the same hidden states. Our experimental results demonstrate that the introduced regularization technique markedly improves the accuracy of learned reward models across a variety of out-of-distribution (OOD) tasks and effectively alleviates the over-optimization issue in RLHF, offering a more reliable and robust preference learning paradigm.
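
A rough sketch of such a joint objective follows; the Bradley-Terry-style preference loss, the weighting `lam`, and all names are placeholders rather than the paper's exact recipe:

```python
# Sketch: preference loss on a reward head plus an auxiliary language-modeling
# loss through the retained LM head, both reading the same hidden states.
import torch.nn.functional as F

def joint_loss(hidden, reward_head, lm_head, chosen_idx, rejected_idx,
               lm_targets, lam=0.1):
    rewards = reward_head(hidden[:, -1]).squeeze(-1)   # one scalar per sequence
    pref = -F.logsigmoid(rewards[chosen_idx] - rewards[rejected_idx]).mean()
    lm_logits = lm_head(hidden)                        # (batch, seq, vocab)
    lm = F.cross_entropy(lm_logits.transpose(1, 2), lm_targets)
    return pref + lam * lm          # LM loss regularizes the shared hidden states
```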

Updated: 2024-06-14 17:49:59

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.10216v1

DevBench: A multimodal developmental benchmark for language learning

How (dis)similar are the learning trajectories of vision-language models and children? Recent modeling work has attempted to understand the gap between models' and humans' data efficiency by constructing models trained on less data, especially multimodal naturalistic data. However, such models are often evaluated on adult-level benchmarks, with limited breadth in language abilities tested, and without direct comparison to behavioral data. We introduce DevBench, a multimodal benchmark comprising seven language evaluation tasks spanning the domains of lexical, syntactic, and semantic ability, with behavioral data from both children and adults. We evaluate a set of vision-language models on these tasks, comparing models and humans not only on accuracy but on their response patterns. Across tasks, models exhibit variation in their closeness to human response patterns, and models that perform better on a task also more closely resemble human behavioral responses. We also examine the developmental trajectory of OpenCLIP over training, finding that greater training results in closer approximations to adult response patterns. DevBench thus provides a benchmark for comparing models to human language development. These comparisons highlight ways in which model and human language learning processes diverge, providing insight into entry points for improving language models.

Updated: 2024-06-14 17:49:41

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.10215v1

Universal randomised signatures for generative time series modelling

Randomised signature has been proposed as a flexible and easily implementable alternative to the well-established path signature. In this article, we employ randomised signature to introduce a generative model for financial time series data in the spirit of reservoir computing. Specifically, we propose a novel Wasserstein-type distance based on discrete-time randomised signatures. This metric on the space of probability measures captures the distance between (conditional) distributions. Its use is justified by our novel universal approximation results for randomised signatures on the space of continuous functions taking the underlying path as an input. We then use our metric as the loss function in a non-adversarial generator model for synthetic time series data based on a reservoir neural stochastic differential equation. We compare the results of our model to benchmarks from the existing literature.
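
For intuition, here is a minimal numpy sketch of a discrete-time randomised signature: fixed random matrices drive a reservoir state that is updated by the path's increments. The dimensions and the tanh activation are illustrative choices:

```python
# Discrete-time randomised signature of a path, reservoir-computing style:
# S_{t+1} = S_t + sum_i tanh(A_i S_t + b_i) * dX_t^i with fixed random A_i, b_i.
import numpy as np

def randomised_signature(path, k=64, seed=0):
    """path: (T, d) array of observations of a d-dimensional path."""
    rng = np.random.default_rng(seed)
    T, d = path.shape
    A = rng.normal(size=(d, k, k)) / np.sqrt(k)  # fixed random matrix per channel
    b = rng.normal(size=(d, k))
    S = np.zeros(k)
    for t in range(T - 1):
        dx = path[t + 1] - path[t]
        S = S + sum(np.tanh(A[i] @ S + b[i]) * dx[i] for i in range(d))
    return S  # fixed random features summarising the whole path
```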

Updated: 2024-06-14 17:49:29

Domains: cs.LG,q-fin.MF,stat.ML

Download: http://arxiv.org/abs/2406.10214v1

Selecting Interpretability Techniques for Healthcare Machine Learning models

In healthcare there is a pursuit to employ interpretable algorithms that assist healthcare professionals in several decision scenarios. We follow the Predictive, Descriptive and Relevant (PDR) framework, which defines interpretable machine learning as a machine-learning model that explicitly, and in a simple frame, determines relationships, either contained in the data or learned by the model, that are relevant for its functioning, and which categorizes models as post-hoc, acquiring interpretability after training, or model-based, with interpretability intrinsically embedded in the algorithm design. We overview a selection of eight algorithms, both post-hoc and model-based, that can be used for such purposes.

Updated: 2024-06-14 17:49:04

Domains: cs.LG

Download: http://arxiv.org/abs/2406.10213v1

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Despite the unprecedented success of text-to-image diffusion models, controlling the number of depicted objects using text is surprisingly hard. This is important for various applications from technical documents, to children's books to illustrating cooking recipes. Generating object-correct counts is fundamentally challenging because the generative model needs to keep a sense of separate identity for every instance of the object, even if several objects look identical or overlap, and then carry out a global computation implicitly during generation. It is still unknown if such representations exist. To address count-correct generation, we first identify features within the diffusion model that can carry the object identity information. We then use them to separate and count instances of objects during the denoising process and detect over-generation and under-generation. We fix the latter by training a model that predicts both the shape and location of a missing object, based on the layout of existing ones, and show how it can be used to guide denoising with correct object count. Our approach, CountGen, does not depend on external source to determine object layout, but rather uses the prior from the diffusion model itself, creating prompt-dependent and seed-dependent layouts. Evaluated on two benchmark datasets, we find that CountGen strongly outperforms the count-accuracy of existing baselines.

Updated: 2024-06-14 17:46:08

Domains: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2406.10210v1

CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes

Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series. Cartoons use the principles of animation that allow animators to create expressive, unambiguous causal relationships between events to form a coherent storyline. Utilizing these properties, along with thought-provoking questions and multi-level answers (answer and detailed causal explanation), our questions involve causal chains that interconnect multiple dynamic interactions between characters and visual scenes. These factors demand models to solve more challenging, yet well-defined causal relationships. We also introduce hard incorrect answer mining, including a causally confusing version that is even more challenging. While models perform well, there is much room for improvement, especially, on open-ended answers. We identify more advanced/explicit causal relationship modeling & joint modeling of vision and language as the immediate areas for future efforts to focus upon. Along with the other complementary datasets, our new challenging dataset will pave the way for these developments in the field.

Updated: 2024-06-14 17:46:02

Domains: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2404.01299v2

LieRE: Generalizing Rotary Position Encodings

While Rotary Position Embeddings (RoPE) perform well for natural language and have become widely adopted, their adoption for other modalities has been slower. Here, we introduce Lie group Relative position Encodings (LieRE), which go beyond RoPE in supporting higher dimensional inputs. We evaluate the performance of LieRE on 2D and 3D image classification tasks and observe that LieRE leads to marked improvements in performance (up to 6%), training efficiency (3.5x reduction), and data efficiency (30%) compared to the baselines of RoFormer, DeiT III, RoPE-Mixed and Vision-Llama.
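
The Lie group generalization described above can be sketched as follows; this is an illustration of the stated idea, not the authors' implementation. Learn skew-symmetric generators A_j, form the rotation exp(sum_j p_j A_j) from a token's position p, and apply it to queries and keys:

```python
# Sketch of a Lie-group relative position encoding: a learned skew-symmetric
# generator per position coordinate, combined and exponentiated into a rotation.
import torch
import torch.nn as nn

class LiePositionEncoding(nn.Module):
    def __init__(self, head_dim, n_coords):
        super().__init__()
        # One learnable generator per coordinate (2 for images, 3 for volumes).
        self.raw = nn.Parameter(torch.randn(n_coords, head_dim, head_dim) * 0.02)

    def forward(self, x, pos):
        # x: (..., head_dim) query/key vectors; pos: (..., n_coords) positions.
        skew = self.raw - self.raw.transpose(-1, -2)      # skew-symmetric A_j
        gen = torch.einsum("...c,cij->...ij", pos, skew)  # sum_j p_j A_j
        rot = torch.linalg.matrix_exp(gen)                # an orthogonal rotation
        return torch.einsum("...ij,...j->...i", rot, x)
```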

Updated: 2024-06-14 17:41:55

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.10322v1

Maestro: Uncovering Low-Rank Structures via Trainable Decomposition

Deep Neural Networks (DNNs) have been a large driver for AI breakthroughs in recent years. However, these models have been getting increasingly large as they become more accurate and safe. This means that their training becomes increasingly costly and time-consuming and typically yields a single model to fit all targets. Various techniques have been proposed in the literature to mitigate this, including pruning, sparsification, or quantization of model weights and updates. While achieving high compression rates, they often incur significant computational overheads at training or lead to non-negligible accuracy penalty. Alternatively, factorization methods have been leveraged for low-rank compression of DNNs. Similarly, such techniques (e.g., SVD) frequently rely on heavy iterative decompositions of layers and are potentially sub-optimal for non-linear models, such as DNNs. We take a further step in designing efficient low-rank models and propose Maestro, a framework for trainable low-rank layers. Instead of iteratively applying a priori decompositions, the low-rank structure is baked into the training process through LoD, a low-rank ordered decomposition. Not only is this the first time importance ordering via sampling is applied on the decomposed DNN structure, but it also allows selecting ranks at a layer granularity. Our theoretical analysis demonstrates that in special cases LoD recovers the SVD decomposition and PCA. Applied to DNNs, Maestro enables the extraction of lower footprint models that preserve performance. Simultaneously, it enables the graceful trade-off between accuracy-latency for deployment to even more constrained devices without retraining.
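
As a hedged illustration of the ordered-decomposition idea (in the spirit of nested dropout, not the exact LoD algorithm): a factorized layer whose truncation rank is resampled each step, so leading components participate in every sampled rank above their index and therefore receive the most training:

```python
# Illustrative trainable low-rank layer with ordered rank sampling. At inference
# the rank can be chosen per deployment target to trade accuracy for latency.
import torch
import torch.nn as nn

class OrderedLowRankLinear(nn.Module):
    def __init__(self, d_in, d_out, max_rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(d_out, max_rank) * 0.02)
        self.V = nn.Parameter(torch.randn(max_rank, d_in) * 0.02)
        self.max_rank = max_rank

    def forward(self, x):
        # Component i takes part whenever the sampled rank r exceeds i, so
        # leading components are trained most often and carry most importance.
        r = (int(torch.randint(1, self.max_rank + 1, ()).item())
             if self.training else self.max_rank)
        return x @ self.V[:r].T @ self.U[:, :r].T
```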

Updated: 2024-06-14 17:40:29

Domains: cs.LG

Download: http://arxiv.org/abs/2308.14929v2

Training from Zero: Radio Frequency Machine Learning Data Quantity Forecasting

The data used during training in any given application space is directly tied to the performance of the system once deployed. While there are many other factors that go into producing high performance models within machine learning, there is no doubt that the data used to train a system provides the foundation from which to build. One of the underlying rule of thumb heuristics used within the machine learning space is that more data leads to better models, but there is no easy answer for the question, "How much data is needed?" This work examines a modulation classification problem in the Radio Frequency domain space, attempting to answer the question of how much training data is required to achieve a desired level of performance, but the procedure readily applies to classification problems across modalities. The ultimate goal is determining an approach that requires the least amount of data collection to better inform a more thorough collection effort to achieve the desired performance metric. While this approach will require an initial dataset that is germane to the problem space to act as a \textit{target} dataset on which metrics are extracted, the goal is to allow for the initial data to be orders of magnitude smaller than what is required for delivering a system that achieves the desired performance. An additional benefit of the techniques presented here is that the quality of different datasets can be numerically evaluated and tied together with the quantity of data, and ultimately, the performance of the architecture in the problem domain.
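
A generic version of such a forecast (our own sketch with invented numbers, not the paper's procedure) fits a saturating power-law learning curve to scores measured on small subsets and extrapolates to larger collection sizes:

```python
# Fit acc(n) = a - b * n^(-c) to scores from small data subsets, then forecast
# the accuracy a larger collection effort would buy. Numbers are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(n, a, b, c):
    return a - b * np.power(n, -c)    # saturating power law

sizes = np.array([500, 1000, 2000, 4000, 8000])
accs = np.array([0.52, 0.58, 0.64, 0.69, 0.72])   # measured on small subsets

params, _ = curve_fit(learning_curve, sizes, accs, p0=[0.9, 1.0, 0.3], maxfev=10000)
print(learning_curve(64000, *params))  # forecast at a larger collection size
```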

Updated: 2024-06-14 17:33:28

Domains: cs.LG,eess.SP

Download: http://arxiv.org/abs/2205.03703v2

SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation

Polyps are early cancer indicators, so assessing occurrences of polyps and their removal is critical. They are observed through a colonoscopy screening procedure that generates a stream of video frames. Segmenting polyps in their natural video screening procedure has several challenges, such as the co-existence of imaging artefacts, motion blur, and floating debris. Most existing polyp segmentation algorithms are developed on curated still image datasets that do not represent real-world colonoscopy. Their performance often degrades on video data. We propose a video polyp segmentation method that performs self-supervised learning as an auxiliary task and a spatial-temporal self-attention mechanism for improved representation learning. Our end-to-end configuration and joint optimisation of losses enable the network to learn more discriminative contextual features in videos. Our experimental results demonstrate an improvement with respect to several state-of-the-art (SOTA) methods. Our ablation study also confirms that the choice of the proposed joint end-to-end training improves network accuracy by over 3% and nearly 10% on both the Dice similarity coefficient and intersection-over-union compared to the recently proposed method PNS+ and Polyp-PVT, respectively. Results on previously unseen video data indicate that the proposed method generalises.

Updated: 2024-06-14 17:33:11

Domains: cs.CV,cs.AI,cs.MM

Download: http://arxiv.org/abs/2406.10200v1

Investigating Gender Fairness in Machine Learning-driven Personalized Care for Chronic Pain

Chronic pain significantly diminishes the quality of life for millions worldwide. While psychoeducation and therapy can improve pain outcomes, many individuals experiencing pain lack access to evidence-based treatments or fail to complete the necessary number of sessions to achieve benefit. Reinforcement learning (RL) shows potential in tailoring personalized pain management interventions according to patients' individual needs while ensuring the efficient use of scarce clinical resources. However, clinicians, patients, and healthcare decision-makers are concerned that RL solutions could exacerbate disparities associated with patient characteristics like race or gender. In this article, we study gender fairness in personalized pain care recommendations using a real-world application of reinforcement learning (Piette et al., 2022a). Here, adhering to gender fairness translates to minimal or no disparity in the utility received by subpopulations as defined by gender. We investigate whether the selection of relevant patient information (referred to as features) used to assist decision-making affects gender fairness. Our experiments, conducted using real-world data (Piette, 2022), indicate that included features can impact gender fairness. Moreover, we propose an RL solution, NestedRecommendation, that demonstrates the ability: i) to adaptively learn to select the features that optimize for utility and fairness, and ii) to accelerate feature selection and, in turn, improve pain care recommendations from early on, by leveraging clinicians' domain expertise.

Updated: 2024-06-14 17:32:32

Domains: cs.LG,cs.CY

Download: http://arxiv.org/abs/2402.19226v3

Crafting Parts for Expressive Object Composition

Text-to-image generation from large generative models like Stable Diffusion, DALLE-2, etc., have become a common base for various tasks due to their superior quality and extensive knowledge bases. As image composition and generation are creative processes the artists need control over various parts of the images being generated. We find that just adding details about parts in the base text prompt either leads to an entirely different image (e.g., missing/incorrect identity) or the extra part details simply being ignored. To mitigate these issues, we introduce PartCraft, which enables image generation based on fine-grained part-level details specified for objects in the base text prompt. This allows more control for artists and enables novel object compositions by combining distinctive object parts. PartCraft first localizes object parts by denoising the object region from a specific diffusion process. This enables each part token to be localized to the right object region. After obtaining part masks, we run a localized diffusion process in each of the part regions based on fine-grained part descriptions and combine them to produce the final image. All the stages of PartCraft are based on repurposing a pre-trained diffusion model, which enables it to generalize across various domains without training. We demonstrate the effectiveness of part-level control provided by PartCraft qualitatively through visual examples and quantitatively in comparison to the contemporary baselines.

Updated: 2024-06-14 17:31:29

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.10197v1

TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners

Travel planning is a complex task that involves generating a sequence of actions related to visiting places subject to constraints and maximizing some user satisfaction criteria. Traditional approaches rely on problem formulation in a given formal language, extracting relevant travel information from web sources, and use an adequate problem solver to generate a valid solution. As an alternative, recent Large Language Model (LLM) based approaches directly output plans from user requests using language. Although LLMs possess extensive travel domain knowledge and provide high-level information like points of interest and potential routes, current state-of-the-art models often generate plans that lack coherence, fail to satisfy constraints fully, and do not guarantee the generation of high-quality solutions. We propose TRIP-PAL, a hybrid method that combines the strengths of LLMs and automated planners, where (i) LLMs get and translate travel information and user information into data structures that can be fed into planners; and (ii) automated planners generate travel plans that guarantee constraint satisfaction and optimize for users' utility. Our experiments across various travel scenarios show that TRIP-PAL outperforms an LLM when generating travel plans.

Updated: 2024-06-14 17:31:16

Domains: cs.AI

Download: http://arxiv.org/abs/2406.10196v1

A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn "class-specific" queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via "multi-head" cross-attention, INTR could identify different "attributes" of a class, making it particularly suitable for fine-grained classification and analysis, which we demonstrate on eight datasets. Our code and pre-trained models are publicly accessible at the Imageomics Institute GitHub site: https://github.com/Imageomics/INTR.
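
The mechanism can be sketched compactly; sizes here are illustrative, and the authors' repository holds the real model. Learnable class-specific queries cross-attend to patch features, each query's output is scored into that class's logit, and the attention maps double as explanations:

```python
# Sketch of class-specific queries attending to patch features, DETR-style.
import torch
import torch.nn as nn

class ClassQueryHead(nn.Module):
    def __init__(self, d_model=256, n_classes=200, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_classes, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.score = nn.Linear(d_model, 1)

    def forward(self, patch_feats):                 # (B, n_patches, d_model)
        q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        out, attn_w = self.attn(q, patch_feats, patch_feats)
        logits = self.score(out).squeeze(-1)        # (B, n_classes)
        return logits, attn_w  # attention maps localise each class's evidence
```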

Updated: 2024-06-14 17:28:14

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2311.04157v3

AstroCLIP: A Cross-Modal Foundation Model for Galaxies

We present AstroCLIP, a single, versatile model that can embed both galaxy images and spectra into a shared, physically meaningful latent space. These embeddings can then be used - without any model fine-tuning - for a variety of downstream tasks including (1) accurate in-modality and cross-modality semantic similarity search, (2) photometric redshift estimation, (3) galaxy property estimation from both images and spectra, and (4) morphology classification. Our approach to implementing AstroCLIP consists of two parts. First, we embed galaxy images and spectra separately by pretraining separate transformer-based image and spectrum encoders in self-supervised settings. We then align the encoders using a contrastive loss. We apply our method to spectra from the Dark Energy Spectroscopic Instrument and images from its corresponding Legacy Imaging Survey. Overall, we find remarkable performance on all downstream tasks, even relative to supervised baselines. For example, for a task like photometric redshift prediction, we find similar performance to a specifically-trained ResNet18, and for additional tasks like physical property estimation (stellar mass, age, metallicity, and sSFR), we beat this supervised baseline by 19\% in terms of $R^2$. We also compare our results to a state-of-the-art self-supervised single-modal model for galaxy images, and find that our approach outperforms this benchmark by roughly a factor of two on photometric redshift estimation and physical property prediction in terms of $R^2$, while remaining roughly in-line in terms of morphology classification. Ultimately, our approach represents the first cross-modal self-supervised model for galaxies, and the first self-supervised transformer-based architectures for galaxy images and spectra.
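
The alignment step is the standard symmetric contrastive objective; a minimal sketch, assuming paired image and spectrum embeddings from the two pretrained encoders:

```python
# Symmetric InfoNCE (CLIP-style) loss pulling paired galaxy image/spectrum
# embeddings together. The temperature value is an illustrative default.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, spec_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    spec = F.normalize(spec_emb, dim=-1)
    logits = img @ spec.T / temperature            # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))
```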

Updated: 2024-06-14 17:19:58

Domains: astro-ph.IM,cs.AI,cs.LG

Download: http://arxiv.org/abs/2310.03024v2

Explaining Probabilistic Models with Distributional Values

A large branch of explainable machine learning is grounded in cooperative game theory. However, research indicates that game-theoretic explanations may mislead or be hard to interpret. We argue that often there is a critical mismatch between what one wishes to explain (e.g. the output of a classifier) and what current methods such as SHAP explain (e.g. the scalar probability of a class). This paper addresses such gap for probabilistic models by generalising cooperative games and value operators. We introduce the distributional values, random variables that track changes in the model output (e.g. flipping of the predicted class) and derive their analytic expressions for games with Gaussian, Bernoulli and Categorical payoffs. We further establish several characterising properties, and show that our framework provides fine-grained and insightful explanations with case studies on vision and language models.
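
To make the Bernoulli-payoff case concrete, here is a purely illustrative Monte Carlo estimator (our own construction, not the paper's analytic formulas) in which the payoff of adding a feature is the event "the predicted class flips", so each feature's value is an empirical flip probability rather than a scalar attribution:

```python
# Illustrative-only: Shapley-style permutation sampling where the tracked payoff
# is a class-flip event, giving a per-feature flip probability.
import numpy as np

def flip_probabilities(predict, x, baseline, n_perm=200, seed=0):
    """predict(v) -> class label; x, baseline: 1-D feature arrays."""
    rng = np.random.default_rng(seed)
    d = len(x)
    flips = np.zeros(d)
    for _ in range(n_perm):
        cur = baseline.copy()
        prev = predict(cur)
        for j in rng.permutation(d):
            cur[j] = x[j]                    # feature j joins the coalition
            new = predict(cur)
            flips[j] += float(new != prev)   # payoff: did the class flip?
            prev = new
    return flips / n_perm                    # each entry is a flip probability
```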

Updated: 2024-06-14 17:18:11

Domains: cs.LG

Download: http://arxiv.org/abs/2402.09947v2

Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts

This paper presents a novel approach to extract scientifically valuable information about Earth's physical phenomena from unconventional sources, such as multi-modal social media posts. Employing a state-of-the-art large language model (LLM), Gemini 1.5 Pro (Reid et al. 2024), we estimate earthquake ground shaking intensity from these unstructured posts. The model's output, in the form of Modified Mercalli Intensity (MMI) values, aligns well with independent observational data. Furthermore, our results suggest that LLMs, trained on vast internet data, may have developed a unique understanding of physical phenomena. Specifically, Google's Gemini models demonstrate a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, accurately describing observational data even though it's not identical to established models. These findings raise intriguing questions about the extent to which Gemini's training has led to a broader understanding of the physical world and its phenomena. The ability of Generative AI models like Gemini to generate results consistent with established scientific knowledge highlights their potential to augment our understanding of complex physical phenomena like earthquakes. The flexible and effective approach proposed in this study holds immense potential for enriching our understanding of the impact of physical phenomena and improving resilience during natural disasters. This research is a significant step toward harnessing the power of social media and AI for natural disaster mitigation, opening new avenues for understanding the emerging capabilities of Generative AI and LLMs for scientific applications.

Updated: 2024-06-14 17:12:17

Domains: physics.geo-ph,cs.AI,cs.LG,physics.app-ph

Download: http://arxiv.org/abs/2405.18732v3

Out of style: Misadventures with LLMs and code style transfer

Like text, programs have styles, and certain programming styles are more desirable than others for program readability, maintainability, and performance. Code style transfer, however, is difficult to automate except for trivial style guidelines such as limits on line length. Inspired by the success of using language models for text style transfer, we investigate if code language models can perform code style transfer. Code style transfer, unlike text transfer, has rigorous requirements: the system needs to identify lines of code to change, change them correctly, and leave the rest of the program untouched. We designed CSB (Code Style Benchmark), a benchmark suite of code style transfer tasks across five categories including converting for-loops to list comprehensions, eliminating duplication in code, adding decorators to methods, etc. We then used these tests to see if large pre-trained code language models or fine-tuned models perform style transfer correctly, based on rigorous metrics to test that the transfer did occur, and the code still passes functional tests. Surprisingly, language models failed to perform all of the tasks, suggesting that they perform poorly on tasks that require code understanding. We will make available the large-scale corpora to help the community build better code models.
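
For concreteness, here is one of the benchmark's transfer categories as a toy before/after pair (our own example, not an item drawn from CSB): the system must rewrite the loop, and only the loop, while preserving behaviour:

```python
# before: accumulation via a for-loop
squares = []
for x in range(10):
    if x % 2 == 0:
        squares.append(x * x)

# after: the same logic as a list comprehension; behaviour is unchanged
squares = [x * x for x in range(10) if x % 2 == 0]
```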

Updated: 2024-06-14 17:04:56

Domains: cs.SE,cs.AI

Download: http://arxiv.org/abs/2406.10320v1

Shesha: Multi-head Microarchitectural Leakage Discovery in new-generation Intel Processors

Transient execution attacks have been one of the widely explored microarchitectural side channels since the discovery of Spectre and Meltdown. However, much of the research has been driven by manual discovery of new transient paths through well-known speculative events. Although a few attempts exist in literature on automating transient leakage discovery, such tools focus on finding variants of known transient attacks and explore a small subset of instruction set. Further, they take a random fuzzing approach that does not scale as the complexity of search space increases. In this work, we identify that the search space of bad speculation is disjointedly fragmented into equivalence classes, and then use this observation to develop a framework named Shesha, inspired by Particle Swarm Optimization, which exhibits faster convergence rates than state-of-the-art fuzzing techniques for automatic discovery of transient execution attacks. We then use Shesha to explore the vast search space of extensions to the x86 Instruction Set Architecture (ISAs), thereby focusing on previously unexplored avenues of bad speculation. As such, we report five previously unreported transient execution paths in Instruction Set Extensions (ISEs) on new generation of Intel processors. We then perform extensive reverse engineering of each of the transient execution paths and provide root-cause analysis. Using the discovered transient execution paths, we develop attack building blocks to exhibit exploitable transient windows. Finally, we demonstrate data leakage from Fused Multiply-Add instructions through SIMD buffer and extract victim data from various cryptographic implementations.

Updated: 2024-06-14 17:02:28

Domains: cs.CR

Download: http://arxiv.org/abs/2406.06034v2

Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors

Fine-tuning large language models (LLMs) requires significant memory, often exceeding the capacity of a single GPU. A common solution to this memory challenge is offloading compute and data from the GPU to the CPU. However, this approach is hampered by the limited bandwidth of commodity hardware, which constrains communication between the CPU and GPU. In this paper, we present an offloading framework, LSP_Offload, that enables near-native speed LLM fine-tuning on commodity hardware through learned subspace projectors. Our data-driven approach involves learning an efficient sparse compressor that minimizes communication with minimal precision loss. Additionally, we introduce a novel layer-wise communication schedule to maximize parallelism between communication and computation. As a result, our framework can fine-tune a 1.3 billion parameter model on a 4GB laptop GPU and a 7 billion parameter model on an NVIDIA RTX 4090 GPU with 24GB memory, achieving only a 31% slowdown compared to fine-tuning with unlimited memory. Compared to state-of-the-art offloading frameworks, our approach increases fine-tuning throughput by up to 3.33 times and reduces end-to-end fine-tuning time by 33.1%~62.5% when converging to the same accuracy.
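
The projector idea reduces CPU-GPU traffic roughly by the compression ratio; a minimal sketch follows, with `P` standing in for the learned subspace projector (in the real system it is learned to minimise precision loss):

```python
# Sketch of subspace compression around the CPU-GPU link. P is a placeholder
# for a learned (d, k) projector with k << d.
import torch

def compress(grad, P):      # grad: (d,), P: (d, k)
    return P.T @ grad       # k numbers cross the link instead of d

def decompress(code, P):
    return P @ code         # approximate gradient recovered on the other side
```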

Updated: 2024-06-14 16:59:11

Domains: cs.DC,cs.AI

Download: http://arxiv.org/abs/2406.10181v1

Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding

Large vision-language models (VLMs) have demonstrated remarkable abilities in understanding everyday content. However, their performance in the domain of art, particularly culturally rich art forms, remains less explored. As a pearl of human wisdom and creativity, art encapsulates complex cultural narratives and symbolism. In this paper, we offer the Pun Rebus Art Dataset, a multimodal dataset for art understanding deeply rooted in traditional Chinese culture. We focus on three primary tasks: identifying salient visual elements, matching elements with their symbolic meanings, and explanations for the conveyed messages. Our evaluation reveals that state-of-the-art VLMs struggle with these tasks, often providing biased and hallucinated explanations and showing limited improvement through in-context learning. By releasing the Pun Rebus Art Dataset, we aim to facilitate the development of VLMs that can better understand and interpret culturally specific content, promoting greater inclusiveness beyond English-based corpora.

Updated: 2024-06-14 16:52:00

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.10318v1

SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs

We propose SLoPe, a Double-Pruned Sparse Plus Lazy Low-rank Adapter Pretraining method for LLMs that improves the accuracy of sparse LLMs while accelerating their pretraining and inference and reducing their memory footprint. Sparse pretraining of LLMs reduces the accuracy of the model, to overcome this, prior work uses dense models during fine-tuning. SLoPe improves the accuracy of sparsely pretrained models by adding low-rank adapters in the final 1% iterations of pretraining without adding significant overheads to the model pretraining and inference. In addition, SLoPe uses a double-pruned backward pass formulation that prunes the transposed weight matrix using N:M sparsity structures to enable an accelerated sparse backward pass. SLoPe accelerates the training and inference of models with billions of parameters up to $1.14\times$ and $1.34\times$ respectively (OPT-33B and OPT-66B) while reducing their memory usage by up to $0.77\times$ and $0.51\times$ for training and inference respectively.
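
The forward pass described above can be sketched as a sparse matmul plus a cheap low-rank correction; shapes are illustrative, and the N:M pruning and double-pruned backward pass are elided:

```python
# Sketch: sparse weights plus a lazy low-rank adapter added in the final
# phase of pretraining. r is small, so the adapter overhead is modest.
import torch

def sparse_plus_lowrank(x, W_sparse, A, B):
    # x: (batch, d_in); W_sparse: (d_out, d_in), mostly zero under N:M sparsity;
    # A: (r, d_in), B: (d_out, r) form the low-rank adapter.
    return x @ W_sparse.T + (x @ A.T) @ B.T  # adapter adds O(r(d_in+d_out)) work
```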

Updated: 2024-06-14 16:43:26

标题: SLoPe: 双修剪稀疏加惰性低秩适配器LLMs的预训练

摘要: 我们提出了SLoPe,一种双修剪稀疏加懒惰低秩适配器预训练方法,用于改善稀疏LLM的准确性,同时加快它们的预训练和推理速度,并减少它们的内存占用。LLM的稀疏预训练会降低模型的准确性,为了克服这一问题,先前的研究在微调过程中使用密集模型。SLoPe通过在预训练的最后1%迭代中添加低秩适配器来提高稀疏预训练模型的准确性,而不会给模型的预训练和推理过程增加显著的开销。此外,SLoPe使用双修剪的反向传递公式,通过使用N:M稀疏结构修剪转置权重矩阵来加速稀疏反向传递。SLoPe将拥有数十亿参数的模型的训练和推理加速了最多分别为$1.14\times$和$1.34\times$(OPT-33B和OPT-66B),同时将它们的内存使用减少了最多分别为$0.77\times$和$0.51\times$(用于训练和推理)。

更新时间: 2024-06-14 16:43:26

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.16325v2

Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights

Severe data imbalance naturally exists among web-scale vision-language datasets. Despite this, we find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning, and demonstrates significant effectiveness in learning generalizable representations. With an aim to investigate the reasons behind this finding, we conduct controlled experiments to study various underlying factors, and reveal that CLIP's pretext task forms a dynamic classification problem wherein only a subset of classes is present in training. This isolates the bias from dominant classes and implicitly balances the learning signal. Furthermore, the robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts, which are inaccessible to supervised learning. Our study not only uncovers the mechanisms behind CLIP's generalizability beyond data imbalance but also provides transferable insights for the research community. The findings are validated in both supervised and self-supervised learning, enabling models trained on imbalanced data to achieve CLIP-level performance on diverse recognition tasks. Code and data are available at: https://github.com/CVMI-Lab/clip-beyond-tail.

Updated: 2024-06-14 16:42:47

标题: 超越数据不平衡的泛化:关于CLIP的可迁移见解的受控研究

摘要: 大规模视觉语言数据集中存在严重的数据不平衡。尽管如此,我们发现CLIP在此基础上预训练比监督学习表现出显著的数据不平衡鲁棒性,并展示出学习可泛化表示的显著有效性。为了调查这一发现背后的原因,我们进行了控制实验来研究各种潜在因素,并揭示了CLIP的假设任务形成了一个动态分类问题,其中只有训练中的一部分类别存在。这隔离了来自主导类别的偏见,并隐含地平衡了学习信号。此外,随着更具描述性的语言监督、更大规模的数据和更广泛的开放世界概念,CLIP的鲁棒性和可辨识性得到了改善,这些是监督学习无法访问的。我们的研究不仅揭示了CLIP在数据不平衡之外的泛化机制,还为研究社区提供了可转移的见解。这些发现在监督学习和自监督学习中得到验证,使在不平衡数据上训练的模型能够在各种识别任务上达到CLIP级别的性能。代码和数据可在以下链接找到:https://github.com/CVMI-Lab/clip-beyond-tail。

更新时间: 2024-06-14 16:42:47

领域: cs.CV,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.21070v2

Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication

Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in numerous fields, including scientific computing, graph analytics, and deep learning. These applications exploit the sparsity of matrices to reduce storage and computational demands. However, the irregular structure of sparse matrices poses significant challenges for performance optimization. Traditional hardware accelerators are tailored for specific sparsity patterns with fixed dataflow schemes (inner, outer, and row-wise), but often perform suboptimally when the actual sparsity deviates from these predetermined patterns. As the use of SpGEMM expands across various domains, each with distinct sparsity characteristics, the demand for hardware accelerators that can efficiently handle a range of sparsity patterns is increasing. This paper presents a machine learning based approach for adaptively selecting the most appropriate dataflow scheme for SpGEMM tasks with diverse sparsity patterns. By employing decision trees and deep reinforcement learning, we explore the potential of these techniques to surpass heuristic-based methods in identifying optimal dataflow schemes. We evaluate our models by comparing their performance with that of a heuristic, highlighting the strengths and weaknesses of each approach. Our findings suggest that using machine learning for dynamic dataflow selection in hardware accelerators can provide gains of up to 28 times.
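
A toy sketch of the decision-tree variant of this idea follows: cheap sparsity statistics of the operand matrices are mapped to a dataflow label. The feature set, labeling rule, and training data are hypothetical stand-ins for the offline measurements such a system would collect.

```python
# Toy ML-guided dataflow selection with a decision tree.
from sklearn.tree import DecisionTreeClassifier
import numpy as np

rng = np.random.default_rng(0)
SCHEMES = ["outer", "inner", "row-wise"]

# Hypothetical features: [density of A, density of B, row-nnz variance of A]
X = rng.random((200, 3))
# Toy labeling rule standing in for the measured best-performing dataflow
y = np.where(X[:, 0] < 0.2, 0, np.where(X[:, 1] < 0.2, 1, 2))

clf = DecisionTreeClassifier(max_depth=4).fit(X, y)
pred = clf.predict([[0.05, 0.5, 0.4]])[0]
print(SCHEMES[pred])   # expected "outer" under this toy labeling
```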

Updated: 2024-06-14 16:36:35

标题: Misam:在稀疏-稀疏矩阵乘法的数据流选择中使用机器学习

摘要: 稀疏矩阵乘法(SpGEMM)是许多领域的关键操作,包括科学计算、图分析和深度学习。这些应用利用矩阵的稀疏性来减少存储和计算需求。然而,稀疏矩阵的不规则结构对性能优化构成了重大挑战。传统的硬件加速器针对特定的稀疏模式设计了固定的数据流方案 - 内部、外部和按行,但当实际稀疏性偏离这些预定模式时往往表现不佳。随着SpGEMM在各个领域的扩展,每个领域具有不同的稀疏特性,对能够有效处理各种稀疏模式的硬件加速器的需求正在增加。本文提出了一种基于机器学习的方法,用于自适应地选择适用于具有不同稀疏模式的SpGEMM任务的最合适的数据流方案。通过使用决策树和深度强化学习,我们探讨了这些技术在识别最佳数据流方案方面超越基于启发式方法的潜力。我们通过将模型的性能与启发式方法进行比较来评估我们的模型,突出每种方法的优势和劣势。我们的研究结果表明,在硬件加速器中使用机器学习进行动态数据流选择可以提供高达28倍的收益。

更新时间: 2024-06-14 16:36:35

领域: cs.LG

下载: http://arxiv.org/abs/2406.10166v1

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created Meshes (AMs), i.e., meshes created by human artists. Specifically, current mesh extraction methods rely on dense faces and ignore geometric features, leading to inefficiencies, complicated post-processing, and lower representation quality. To address these issues, we introduce MeshAnything, a model that treats mesh extraction as a generation problem, producing AMs aligned with specified shapes. By converting 3D assets in any 3D representation into AMs, MeshAnything can be integrated with various 3D asset production methods, thereby enhancing their application across the 3D industry. The architecture of MeshAnything comprises a VQ-VAE and a shape-conditioned decoder-only transformer. We first learn a mesh vocabulary using the VQ-VAE, then train the shape-conditioned decoder-only transformer on this vocabulary for shape-conditioned autoregressive mesh generation. Our extensive experiments show that our method generates AMs with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods.

Updated: 2024-06-14 16:30:25

标题: MeshAnything:由自回归变换器生成的艺术家创建的网格

摘要: 最近,通过重建和生成创建的3D资产已经达到了手工制作资产的质量水平,突显了它们替代的潜力。然而,这种潜力主要尚未得以实现,因为这些资产总是需要转换为网格用于3D行业应用,而目前的网格提取方法产生的网格明显不如由人类艺术家创建的网格(AMs)优秀。具体来说,目前的网格提取方法依赖于密集的面并忽略几何特征,导致低效率、复杂的后处理和较低的表现质量。为了解决这些问题,我们引入了MeshAnything,一个将网格提取视为生成问题的模型,生成与指定形状对齐的AMs。通过将任何3D表示的3D资产转换为AMs,MeshAnything可以与各种3D资产生产方法集成,从而增强它们在3D行业中的应用。MeshAnything的架构包括一个VQ-VAE和一个形状条件的仅解码器变压器。我们首先使用VQ-VAE学习网格词汇,然后在此词汇上训练形状条件的仅解码器变压器进行形状条件的自回归网格生成。我们的广泛实验表明,我们的方法生成的AMs面数少几百倍,显著提高了存储、渲染和模拟效率,同时实现了与以前方法相当的精度。

更新时间: 2024-06-14 16:30:25

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.10163v1

Towards the TopMost: A Topic Modeling System Toolkit

Topic models have a rich history with various applications and have recently been reinvigorated by neural topic modeling. However, these numerous topic models adopt totally distinct datasets, implementations, and evaluations. This impedes quick utilization and fair comparisons, and thereby hinders their research progress and applications. To tackle this challenge, we in this paper propose a Topic Modeling System Toolkit (TopMost). Compared to existing toolkits, TopMost stands out by supporting more extensive features. It covers a broader spectrum of topic modeling scenarios with their complete lifecycles, including datasets, preprocessing, models, training, and evaluations. Thanks to its highly cohesive and decoupled modular design, TopMost enables rapid utilization, fair comparisons, and flexible extensions of diverse cutting-edge topic models. Our code, tutorials, and documentation are available at https://github.com/bobxwu/topmost.

Updated: 2024-06-14 16:27:24

标题: 朝向顶尖:一个主题建模系统工具包

摘要: 主题模型在各种应用中具有丰富的历史,并最近通过神经主题建模得到了重新激发。然而,这些众多主题模型采用完全不同的数据集、实现和评估方法。这阻碍了快速利用和公平比较,从而阻碍了它们的研究进展和应用。为了应对这一挑战,我们在本文中提出了一个主题建模系统工具包(TopMost)。与现有工具包相比,TopMost 通过支持更广泛的功能脱颖而出。它涵盖了更广泛的主题建模场景及其完整的生命周期,包括数据集、预处理、模型、训练和评估。由于其高度内聚且解耦的模块化设计,TopMost 可以实现快速利用、公平比较和灵活扩展各种前沿主题模型。我们的代码、教程和文档可在 https://github.com/bobxwu/topmost 上找到。

更新时间: 2024-06-14 16:27:24

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2309.06908v2

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors that are highly rewarded due to misspecified training goals. Specification gaming can range from simple behaviors like sycophancy to sophisticated and pernicious behaviors like reward-tampering, where a model directly modifies its own reward mechanism. However, these more pernicious behaviors may be too complex to be discovered via exploration. In this paper, we study whether Large Language Model (LLM) assistants which find easily discovered forms of specification gaming will generalize to perform rarer and more blatant forms, up to and including reward-tampering. We construct a curriculum of increasingly sophisticated gameable environments and find that training on early-curriculum environments leads to more specification gaming on remaining environments. Strikingly, a small but non-negligible proportion of the time, LLM assistants trained on the full curriculum generalize zero-shot to directly rewriting their own reward function. Retraining an LLM not to game early-curriculum environments mitigates, but does not eliminate, reward-tampering in later environments. Moreover, adding harmlessness training to our gameable environments does not prevent reward-tampering. These results demonstrate that LLMs can generalize from common forms of specification gaming to more pernicious reward tampering and that such behavior may be nontrivial to remove.

Updated: 2024-06-14 16:26:20

标题: 阿谀奉承到诡计:探究大型语言模型中的操纵奖励

摘要: 在强化学习中,当人工智能系统学习到由于训练目标错误而高度奖励的不良行为时,就会出现规范游戏的情况。规范游戏的范围可以从简单的奉承行为到复杂且有害的行为,比如奖励篡改,在这种情况下,模型直接修改自己的奖励机制。然而,这些更有害的行为可能太复杂,无法通过探索发现。本文研究了大型语言模型(LLM)助理是否会将容易发现的规范游戏形式推广到执行更罕见和更明显的形式,甚至包括奖励篡改。我们构建了一个逐渐复杂的可游戏环境课程,并发现在早期课程环境上训练会导致在剩余环境上出现更多的规范游戏。值得注意的是,在一小部分时间内,接受完整课程培训的LLM助理可以零-shot地直接重写自己的奖励功能。重新训练LLM以防止在早期课程环境中进行规范游戏可以减轻,但无法消除后续环境中的奖励篡改。此外,在我们可游戏的环境中添加无害性训练并不能防止奖励篡改。这些结果表明,LLM可以从常见的规范游戏形式推广到更有害的奖励篡改,并且这种行为可能难以消除。

更新时间: 2024-06-14 16:26:20

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.10162v1

Context-Aware Prediction of User Engagement on Online Social Platforms

The success of online social platforms hinges on their ability to predict and understand user behavior at scale. Here, we present data suggesting that context-aware modeling approaches may offer a holistic yet lightweight and potentially privacy-preserving representation of user engagement on online social platforms. Leveraging deep LSTM neural networks to analyze more than 100 million Snapchat sessions from almost 80,000 users, we demonstrate that patterns of active and passive use are predictable from past behavior (R2=0.345) and that the integration of context features substantially improves predictive performance compared to the behavioral baseline model (R2=0.522). Features related to smartphone connectivity status, location, temporal context, and weather were found to capture non-redundant variance in user engagement relative to features derived from histories of in-app behaviors. Further, we show that a large proportion of variance can be accounted for with minimal behavioral histories if momentary context is considered (R2=0.442). These results indicate the potential of context-aware approaches for making models more efficient and privacy-preserving by reducing the need for long data histories. Finally, we employ model explainability techniques to glean preliminary insights into the underlying behavioral mechanisms. Our findings are consistent with the notion of context-contingent, habit-driven patterns of active and passive use, underscoring the value of contextualized representations of user behavior for predicting user engagement on social platforms.
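
A minimal sketch of such a context-aware model is given below, assuming an LSTM over the behavioral session history with momentary context features fused before the regression head; the architecture and dimensions are illustrative, not the paper's exact model.

```python
# Hypothetical context-aware engagement predictor (sketch).
import torch
import torch.nn as nn

class ContextAwareEngagement(nn.Module):
    def __init__(self, behav_dim=16, ctx_dim=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(behav_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + ctx_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, behavior_seq, context):
        # behavior_seq: (B, T, behav_dim) past in-app actions
        # context:      (B, ctx_dim) connectivity, location, time, weather
        _, (h, _) = self.lstm(behavior_seq)
        return self.head(torch.cat([h[-1], context], dim=-1)).squeeze(-1)

model = ContextAwareEngagement()
pred = model(torch.randn(4, 50, 16), torch.randn(4, 8))  # engagement scores
```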

Updated: 2024-06-14 16:21:51

标题: 在线社交平台上用户参与度的上下文感知预测

摘要: 在线社交平台的成功取决于它们在规模上预测和理解用户行为的能力。在这里,我们提供的数据表明,基于上下文感知建模方法可能提供一种全面但轻量级且潜在保护隐私的用户参与在线社交平台的表示。利用深度LSTM神经网络分析了来自近80,000名用户的超过1亿次Snapchat会话,我们证明了主动和被动使用的模式可以根据过去的行为可预测(R2=0.345),并且相对于基于行为的基线模型(R2=0.522),整合上下文特征显著提高了预测性能。发现与应用内行为历史衍生的特征相比,与智能手机连接状态、位置、时间上下文和天气相关的特征能够捕捉用户参与的非冗余方差。此外,我们发现,如果考虑瞬时上下文,即使只有最少的行为历史,也可以解释大部分方差(R2=0.442)。这些结果表明,通过考虑上下文感知方法,可以使模型更高效,并通过减少对长数据历史的需求来保护隐私。最后,我们运用模型可解释性技术来初步了解潜在的行为机制。我们的发现与上下文相关、习惯驱动的主动和被动使用模式的概念一致,强调了用户行为的上下文化表达对于预测社交平台上用户参与的价值。

更新时间: 2024-06-14 16:21:51

领域: cs.LG,cs.AI,cs.HC,cs.SI

下载: http://arxiv.org/abs/2310.14533v2

NeuralClothSim: Neural Deformation Fields Meet the Thin Shell Theory

Despite existing 3D cloth simulators producing realistic results, they predominantly operate on discrete surface representations (e.g. points and meshes) with a fixed spatial resolution, which often leads to large memory consumption and resolution-dependent simulations. Moreover, back-propagating gradients through the existing solvers is difficult, and they cannot be easily integrated into modern neural architectures. In response, this paper re-thinks physically plausible cloth simulation: We propose NeuralClothSim, i.e., a new quasistatic cloth simulator using thin shells, in which surface deformation is encoded in neural network weights in the form of a neural field. Our memory-efficient solver operates on a new continuous coordinate-based surface representation called neural deformation fields (NDFs); it supervises NDF equilibria with the laws of the non-linear Kirchhoff-Love shell theory with a non-linear anisotropic material model. NDFs are adaptive: They 1) allocate their capacity to the deformation details and 2) allow surface state queries at arbitrary spatial resolutions without re-training. We show how to train NeuralClothSim while imposing hard boundary conditions and demonstrate multiple applications, such as material interpolation and simulation editing. The experimental results highlight the effectiveness of our continuous neural formulation.

Updated: 2024-06-14 16:21:39

标题: 《神经织物模拟:神经变形场遇上薄壳理论》

摘要: 尽管现有的3D布料模拟器产生了逼真的结果,但它们主要在离散的表面表示上运作(例如点和网格),具有固定的空间分辨率,这经常导致大量的内存消耗和分辨率相关的模拟。此外,通过现有求解器反向传播梯度很困难,它们不能轻松集成到现代神经结构中。为了应对这一问题,本文重新思考了物理可信的布料模拟:我们提出了NeuralClothSim,即一种使用薄壳的新型准静态布料模拟器,其中表面变形以神经网络权重的形式编码为神经场。我们高效的求解器在一种新的基于连续坐标的表面表示上运作,称为神经变形场(NDFs);它通过非线性吉尔霍夫-洛夫壳理论的定律和非线性各向异性材料模型监督NDF的平衡。NDF是自适应的:它们1)将容量分配给变形细节,2)允许在任意空间分辨率下进行表面状态查询而无需重新训练。我们展示了如何在施加硬边界条件的情况下训练NeuralClothSim,并展示了多种应用,如材料插值和模拟编辑。实验结果突出了我们连续神经形式的有效性。

更新时间: 2024-06-14 16:21:39

领域: cs.GR,cs.LG

下载: http://arxiv.org/abs/2308.12970v2

On the Computability of Robust PAC Learning

We initiate the study of computability requirements for adversarially robust learning. Adversarially robust PAC-type learnability is by now an established field of research. However, the effects of computability requirements in PAC-type frameworks are only just starting to emerge. We introduce the problem of robust computable PAC (robust CPAC) learning and provide some simple sufficient conditions for this. We then show that learnability in this setup is not implied by the combination of its components: classes that are both CPAC and robustly PAC learnable are not necessarily robustly CPAC learnable. Furthermore, we show that the novel framework exhibits some surprising effects: for robust CPAC learnability it is not required that the robust loss is computably evaluable! Towards understanding characterizing properties, we introduce a novel dimension, the computable robust shattering dimension. We prove that its finiteness is necessary, but not sufficient for robust CPAC learnability. This might yield novel insights for the corresponding phenomenon in the context of robust PAC learnability, where insufficiency of the robust shattering dimension for learnability has been conjectured, but so far a resolution has remained elusive.

Updated: 2024-06-14 16:20:04

标题: 关于鲁棒性PAC学习的可计算性

摘要: 我们开启了对对抗鲁棒学习的可计算性要求的研究。对抗鲁棒PAC类型学习现在是一个已经建立的研究领域。然而,在PAC类型框架中的计算要求的影响才刚刚开始显现。我们引入了鲁棒可计算PAC(robust CPAC)学习的问题,并提供了一些简单的充分条件。然后我们展示了在这个设置中的可学习性并不由其组成部分的组合所暗示:既是CPAC又是鲁棒PAC可学习的类并不一定是鲁棒CPAC可学习的。此外,我们展示了这个新颖框架展现出一些令人惊讶的效果:对于鲁棒CPAC的可学习性,并不要求鲁棒损失是可计算的!为了理解特征性质,我们引入了一个新颖的维度,即可计算鲁棒破碎维度。我们证明其有限性是鲁棒CPAC可学习性的必要条件,但并不充分。这可能为鲁棒PAC学习的对应现象提供新的见解,其中已经猜测了鲁棒破碎维度对于可学习性的不足,但目前还没有解决方案。

更新时间: 2024-06-14 16:20:04

领域: cs.LG

下载: http://arxiv.org/abs/2406.10161v1

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

We propose a novel one-pass joint compression and quantization approach for multiple ASR systems using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrate that the multiple ASR systems compressed in a single all-in-one model produced word error rates (WERs) comparable to, or up to 1.01\% absolute (6.98\% relative) lower than, those of individually trained systems of equal complexity. A 3.4x overall system compression and training time speed-up was achieved. Maximum model size compression ratios of 12.8x and 3.93x were obtained over the baseline Switchboard-300hr Conformer and LibriSpeech-100hr fine-tuned wav2vec2.0 models, respectively, incurring no statistically significant WER increase.

Updated: 2024-06-14 16:18:34

标题: 使用一种全能神经模型进行一次性多构象和基础语音系统压缩和量化

摘要: 我们提出了一种新颖的一次性多ASR系统联合压缩和量化方法,使用一个多功能神经模型。单次压缩周期允许同时构建多个嵌套系统,这些系统具有不同的编码器深度、宽度和量化精度设置,而无需训练和单独存储目标系统。实验一致表明,在单个多功能模型中压缩的多个ASR系统产生的字错误率(WER)与或低于同等复杂度的单独训练系统相比,绝对值相对较低高达1.01%(相对值为6.98%)。实现了3.4倍的整体系统压缩和训练时间加速。相对于基准Switchboard-300hr Conformer和LibriSpeech-100hr fine-tuned wav2vec2.0模型,最大模型尺寸压缩比分别达到12.8倍和3.93倍,没有产生统计学上显著的WER增加。

更新时间: 2024-06-14 16:18:34

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2406.10160v1

RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model

Minigolf, a game with countless court layouts and complex ball motion, constitutes a compelling real-world testbed for the study of embodied intelligence, as it not only challenges spatial and kinodynamic reasoning but also requires reflective and corrective capacities to address erroneously designed courses. We introduce RoboGolf, a framework that perceives dual-camera visual inputs with nested VLM-empowered closed-loop control and a reflective equilibrium loop. Extensive experiments demonstrate the effectiveness of RoboGolf on challenging minigolf courts, including those that are impossible to finish.

Updated: 2024-06-14 16:16:52

标题: RoboGolf: 用反思的多模态视觉语言模型掌握真实世界迷你高尔夫

摘要: 迷你高尔夫是一种拥有无数球场布局和复杂球运动的游戏,构成了一个引人入胜的现实世界试验平台,用于研究具有体现智能的机器。它不仅挑战空间和运动动力学推理,还需要反思和纠正能力来应对设计错误的球场。我们引入了RoboGolf,这是一个利用嵌套式VLM增强闭环控制和反思均衡环的框架,可以感知双摄像头视觉输入。大量实验证明了RoboGolf在挑战性迷你高尔夫球场上的有效性,包括那些不可能完成的球场。

更新时间: 2024-06-14 16:16:52

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2406.10157v1

Automated Design of Linear Bounding Functions for Sigmoidal Nonlinearities in Neural Networks

The ubiquity of deep learning algorithms in various applications has amplified the need for assuring their robustness against small input perturbations such as those occurring in adversarial attacks. Existing complete verification techniques offer provable guarantees for all robustness queries but struggle to scale beyond small neural networks. To overcome this computational intractability, incomplete verification methods often rely on convex relaxation to over-approximate the nonlinearities in neural networks. Progress in tighter approximations has been achieved for piecewise linear functions. However, robustness verification of neural networks for general activation functions (e.g., Sigmoid, Tanh) remains under-explored and poses new challenges. Typically, these networks are verified using convex relaxation techniques, which involve computing linear upper and lower bounds of the nonlinear activation functions. In this work, we propose a novel parameter search method to improve the quality of these linear approximations. Specifically, we show that using a simple search method, carefully adapted to the given verification problem through state-of-the-art algorithm configuration techniques, improves the global lower bound by 25% on average over the current state of the art on several commonly used local robustness verification benchmarks.
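
To illustrate the underlying bounding problem, the sketch below computes sound linear bounds for the sigmoid on an interval in its concave regime and runs a naive one-dimensional search over the tangent point; it is a toy stand-in for the paper's algorithm-configuration-driven parameter search.

```python
# On [l, u] with l >= 0 the sigmoid is concave, so the chord is a sound
# lower bound and any tangent line is a sound upper bound. We pick the
# tangent point that minimizes the average gap (toy search).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bounds(l, u, num_candidates=50):
    assert l >= 0.0, "concave regime assumed in this sketch"
    xs = np.linspace(l, u, 200)
    # Lower bound: chord through (l, s(l)) and (u, s(u)).
    k_lo = (sigmoid(u) - sigmoid(l)) / (u - l)
    b_lo = sigmoid(l) - k_lo * l
    # Upper bound: tangent at t, t chosen by a simple grid search.
    best = None
    for t in np.linspace(l, u, num_candidates):
        k = sigmoid(t) * (1 - sigmoid(t))         # derivative of sigmoid
        b = sigmoid(t) - k * t
        gap = np.mean(k * xs + b - sigmoid(xs))   # average looseness
        if best is None or gap < best[0]:
            best = (gap, k, b)
    return (k_lo, b_lo), best[1:]

(lo_k, lo_b), (up_k, up_b) = bounds(0.5, 3.0)
```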

Updated: 2024-06-14 16:16:26

标题: 神经网络中Sigmoid非线性的线性约束函数的自动设计

摘要: 深度学习算法在各种应用中的普及性增强了对它们在小输入扰动(如在对抗性攻击中发生的扰动)下的健壮性的保证的需求。现有的完整验证技术为所有健壮性查询提供了可证明的保证,但在小型神经网络之外很难扩展。为了克服这种计算上的难题,不完整的验证方法通常依赖于凸松弛来过度近似神经网络中的非线性。对分段线性函数的更紧密的近似取得了进展。然而,对于一般激活函数(例如Sigmoid、Tanh)的神经网络的健壮性验证仍未得到充分探讨,并提出了新的挑战。通常,这些网络使用凸松弛技术进行验证,其中涉及计算非线性激活函数的线性上下界。在这项工作中,我们提出了一种新颖的参数搜索方法来改善这些线性近似的质量。具体地,我们展示了通过使用一个简单的搜索方法,通过最先进的算法配置技术仔细调整给定的验证问题,平均可以提高常用的本地健壮性验证基准上的全局下界约25%。

更新时间: 2024-06-14 16:16:26

领域: cs.LG,cs.AI,cs.LO

下载: http://arxiv.org/abs/2406.10154v1

Trusting code in the wild: Exploring contributor reputation measures to review dependencies in the Rust ecosystem

Developers rely on open-source packages and must review dependencies to safeguard against vulnerable or malicious upstream code. In practice, a careful review of all dependency changes often does not occur. Therefore, developers need signals to inform them of dependency changes that require additional examination. The goal of this study is to help developers prioritize dependency review efforts by analyzing contributor reputation measures as a signal. We use network centrality measures to proxy contributor reputation using collaboration activity. We employ a mixed-methods methodology on the top 1,644 packages in the Rust ecosystem to build a network of 6,949 developers, survey 285 developers, and model 5 centrality measures. We find that only 24% of respondents often review dependencies before adding or updating a package, mentioning difficulties in the review process. Additionally, 51% of respondents often consider contributor reputation when reviewing dependencies. The closeness centrality measure is a significant factor in explaining how developers review dependencies. Yet, centrality measures alone do not account for how developers choose to review dependencies. We recommend that ecosystems like GitHub, Rust, and npm implement a contributor reputation badge based on our modeled coefficients to aid developers in dependency reviews.
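
As a minimal sketch of the reputation signal, the snippet below builds a contributor collaboration graph and computes closeness centrality with networkx; the edge construction and the ranking use are illustrative assumptions.

```python
# Closeness centrality as a reputation proxy (sketch).
import networkx as nx

G = nx.Graph()
# An edge connects two developers who contributed to the same package.
collaborations = [("alice", "bob"), ("bob", "carol"),
                  ("carol", "dave"), ("alice", "carol")]
G.add_edges_from(collaborations)

reputation = nx.closeness_centrality(G)
# Rank a dependency's contributors; low-reputation contributors could
# flag the change for deeper review.
for dev, score in sorted(reputation.items(), key=lambda kv: -kv[1]):
    print(f"{dev}: {score:.3f}")
```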

Updated: 2024-06-14 16:13:58

标题: 《在野外信任代码:探索Rust生态系统中的贡献者声誉措施以审查依赖关系》

摘要: 开发人员依赖开源包,并必须检查依赖项,以防止受到脆弱或恶意的上游代码的影响。在实践中,对所有依赖项变化的仔细审查往往并不经常发生。因此,开发人员需要信号来通知需要额外审查的依赖项变化。本研究的目标是通过分析贡献者声誉指标作为信号,帮助开发人员优先考虑依赖项审查工作。我们使用网络中心度指标来代理协作活动,以反映贡献者的声誉。我们采用混合方法学,从Rust生态系统中前1644个软件包中构建了6949名开发人员的网络,调查了285名开发人员,并建模了5个中心度指标。我们发现只有24%的受访者在添加或更新软件包之前经常审查依赖项,提到了审查过程中的困难。此外,51%的受访者在审查依赖项时经常考虑贡献者的声誉。接近中心度指标在解释开发人员如何审查依赖项方面是一个重要因素。然而,仅靠中心度指标不能解释开发人员选择如何审查依赖项。我们建议像GitHub、Rust和npm这样的生态系统实施基于我们建模系数的贡献者声誉徽章,以帮助开发人员进行依赖项审查。

更新时间: 2024-06-14 16:13:58

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2406.10317v1

Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems

In this paper, we introduce a policy-gradient method for model-based reinforcement learning (RL) that exploits a type of stationary distributions commonly obtained from Markov decision processes (MDPs) in stochastic networks, queueing systems, and statistical mechanics. Specifically, when the stationary distribution of the MDP belongs to an exponential family that is parametrized by policy parameters, we can improve existing policy gradient methods for average-reward RL. Our key identification is a family of gradient estimators, called score-aware gradient estimators (SAGEs), that enable policy gradient estimation without relying on value-function approximation in the aforementioned setting. This contrasts with other common policy-gradient algorithms such as actor-critic methods. We first show that policy-gradient with SAGE locally converges, including in cases when the objective function is nonconvex, presents multiple maximizers, and the state space of the MDP is not finite. Under appropriate assumptions such as starting sufficiently close to a maximizer, the policy under stochastic gradient ascent with SAGE has an overwhelming probability of converging to the associated optimal policy. Other key assumptions are that a local Lyapunov function exists, and a nondegeneracy property of the Hessian of the objective function holds locally around a maximizer. Furthermore, we conduct a numerical comparison between a SAGE-based policy-gradient method and an actor-critic method. We specifically focus on several examples inspired from stochastic networks, queueing systems, and models derived from statistical physics, where parametrizable exponential families are commonplace. Our results demonstrate that a SAGE-based method finds close-to-optimal policies faster than an actor-critic method.
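
As a hedged reconstruction of the identity such a score-aware estimator plausibly rests on (written only from the abstract's description, with a state-dependent reward shown for simplicity; the paper's exact estimator may differ):

```latex
% Exponential-family stationary distribution (assumption from the abstract):
p_\theta(s) = h(s)\,\exp\!\big(\eta(\theta)^\top \varphi(s) - A(\eta(\theta))\big),
\qquad
\nabla_\theta \log p_\theta(s)
  = \big(\nabla_\theta \eta(\theta)\big)^{\top}
    \big(\varphi(s) - \mathbb{E}_{p_\theta}[\varphi(s)]\big).
% The average-reward objective J(theta) = E_{s ~ p_theta}[r(s)] then has the
% value-function-free score-function gradient
\nabla_\theta J(\theta)
  = \mathbb{E}_{s\sim p_\theta}\!\big[\, r(s)\,\nabla_\theta \log p_\theta(s) \,\big],
% which can be estimated from stationary samples without a critic.
```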

Updated: 2024-06-14 16:10:33

标题: 基于得分感知的策略梯度方法和利用局部李亚普诺夫条件的性能保证:应用于产品形式随机网络和排队系统

摘要: 在本文中,我们介绍了一种基于模型的强化学习(RL)的政策梯度方法,利用了在随机网络、排队系统和统计力学中常见的马尔可夫决策过程(MDP)获得的一种类型的稳态分布。具体来说,当MDP的稳态分布属于由策略参数参数化的指数家族时,我们可以改进现有的用于平均回报RL的政策梯度方法。我们的关键发现是一类梯度估计器,称为得分感知梯度估计器(SAGEs),可以在不依赖值函数逼近的情况下进行政策梯度估计。这与其他常见的政策梯度算法(如演员-评论家方法)有所不同。我们首先证明了使用SAGE的政策梯度在局部收敛,包括在目标函数为非凸、存在多个最大值和MDP的状态空间不是有限的情况下。在适当的假设下,例如从一个最大值开始足够接近,使用SAGE进行随机梯度上升的策略具有收敛到相关最优策略的压倒性概率。其他关键假设是存在一个局部李雅普诺夫函数,并且目标函数的海森矩阵在最大值周围的局部非退化性属性成立。此外,我们进行了基于SAGE的政策梯度方法和演员-评论家方法之间的数值比较。我们特别关注几个受到随机网络、排队系统和统计物理模型启发的示例,其中可参数化的指数家族很常见。我们的结果表明,基于SAGE的方法比演员-评论家方法更快地找到接近最优策略。

更新时间: 2024-06-14 16:10:33

领域: cs.LG,cs.PF,math.OC,math.PR

下载: http://arxiv.org/abs/2312.02804v2

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

In recent years, the input context sizes of large language models (LLMs) have increased dramatically. However, existing evaluation methods have not kept pace, failing to comprehensively assess the efficiency of models in handling long contexts. To bridge this gap, we introduce the BABILong benchmark, designed to test language models' ability to reason across facts distributed in extremely long documents. BABILong includes a diverse set of 20 reasoning tasks, including fact chaining, simple induction, deduction, counting, and handling lists/sets. These tasks are challenging on their own, and even more demanding when the required facts are scattered across long natural text. Our evaluations show that popular LLMs effectively utilize only 10-20\% of the context and their performance declines sharply with increased reasoning complexity. Among alternatives to in-context reasoning, Retrieval-Augmented Generation methods achieve a modest 60\% accuracy on single-fact question answering, independent of context length. Among context extension methods, the highest performance is demonstrated by recurrent memory transformers, enabling the processing of lengths up to 11 million tokens. The BABILong benchmark is extendable to any length to support the evaluation of new upcoming models with increased capabilities, and we provide splits up to 1 million token lengths.

Updated: 2024-06-14 16:00:29

标题: BABILong:通过长上下文推理测试LLMs的极限

摘要: 近年来,大型语言模型(LLMs)的输入上下文大小急剧增加。然而,现有的评估方法没有跟上步伐,未能全面评估模型处理长上下文的效率。为了弥补这一差距,我们引入了BABILong基准,旨在测试语言模型在处理分布在极长文档中的事实时的推理能力。BABILong包括一个多样化的20个推理任务集,包括事实链接、简单归纳、演绎、计数以及处理列表/集合。这些任务本身就具有挑战性,而当所需事实分散在长篇自然文本中时,要求更高。我们的评估表明,流行的LLMs仅有效利用10-20%的上下文,随着推理复杂度的增加,它们的性能急剧下降。在上下文推理的替代方法中,检索增强生成方法在单一事实问题回答中实现了60%的准确率,与上下文长度无关。在上下文扩展方法中,递归记忆变换器展示了最佳性能,能够处理高达1100万个标记长度。BABILong基准可扩展到任何长度,以支持评估具有增强能力的新模型,并提供长达100万标记长度的分组。

更新时间: 2024-06-14 16:00:29

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.10149v1

A Primal-Dual-Assisted Penalty Approach to Bilevel Optimization with Coupled Constraints

Interest in bilevel optimization has grown in recent years, partially due to its applications to tackle challenging machine-learning problems. Several exciting recent works have been centered around developing efficient gradient-based algorithms that can solve bilevel optimization problems with provable guarantees. However, the existing literature mainly focuses on bilevel problems either without constraints, or featuring only simple constraints that do not couple variables across the upper and lower levels, excluding a range of complex applications. Our paper studies this challenging but less explored scenario and develops a (fully) first-order algorithm, which we term BLOCC, to tackle BiLevel Optimization problems with Coupled Constraints. We establish rigorous convergence theory for the proposed algorithm and demonstrate its effectiveness on two well-known real-world applications - hyperparameter selection in support vector machines (SVMs) and infrastructure planning in transportation networks using real data from the city of Seville.

Updated: 2024-06-14 15:59:36

标题: 一种基于原始-对偶辅助惩罚方法的带有耦合约束的双层优化研究

摘要: 近年来,人们对双层优化的兴趣日益增长,部分原因是其在解决具有挑战性的机器学习问题方面的应用。一些令人兴奋的最新工作集中在开发高效的基于梯度的算法,可以解决具有可证明保证的双层优化问题。然而,现有文献主要集中在双层问题上,要么没有约束,要么只包含简单的约束,这些约束不会跨上下层耦合变量,排除了一系列复杂的应用。我们的论文研究了这个具有挑战性但较少探索的场景,并开发了一种名为BLOCC的(完全)一阶算法,用于解决具有耦合约束的双层优化问题。我们为所提出的算法建立了严格的收敛理论,并在两个知名的真实世界应用中展示了其有效性 - 在支持向量机(SVM)中进行超参数选择和使用塞维利亚市的真实数据进行交通网络基础设施规划。

更新时间: 2024-06-14 15:59:36

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.10148v1

Future Directions in the Theory of Graph Machine Learning

Machine learning on graphs, especially using graph neural networks (GNNs), has seen a surge in interest due to the wide availability of graph data across a broad spectrum of disciplines, from life to social and engineering sciences. Despite their practical success, our theoretical understanding of the properties of GNNs remains highly incomplete. Recent theoretical advancements primarily focus on elucidating the coarse-grained expressive power of GNNs, predominantly employing combinatorial techniques. However, these studies do not perfectly align with practice, particularly in understanding the generalization behavior of GNNs when trained with stochastic first-order optimization techniques. In this position paper, we argue that the graph machine learning community needs to shift its attention to developing a balanced theory of graph machine learning, focusing on a more thorough understanding of the interplay of expressive power, generalization, and optimization.

Updated: 2024-06-14 15:54:12

标题: 图机器学习理论的未来方向

摘要: 图上的机器学习,尤其是使用图神经网络(GNNs),由于广泛可用的图数据在各个学科领域的普及,引起了人们的兴趣激增。尽管它们在实践中取得了成功,但我们对GNN的性质的理论理解仍然非常不完整。最近的理论进展主要集中在阐明GNN的粗粒度表达能力,主要采用组合技术。然而,这些研究并不完全与实践相一致,特别是在理解使用随机一阶优化技术训练时GNN的泛化行为时。在这篇立场论文中,我们认为图机器学习社区需要将注意力转向发展一个平衡的图机器学习理论,重点放在更彻底地理解表达能力、泛化和优化的相互作用上。

更新时间: 2024-06-14 15:54:12

领域: cs.LG,cs.AI,cs.DM,cs.NE,stat.ML

下载: http://arxiv.org/abs/2402.02287v4

Improving rule mining via embedding-based link prediction

Rule mining on knowledge graphs allows for explainable link prediction. Conversely, embedding-based methods for link prediction are well known for their generalization capabilities, but their predictions are not interpretable. Several approaches combining the two families have been proposed in recent years. Most of the resulting hybrid approaches are trained within a unified learning framework, which often leads to convergence issues due to the complexity of the learning task. In this work, we propose a new way to combine the two families of approaches. Specifically, we enrich a given knowledge graph by means of its pre-trained entity and relation embeddings before applying rule mining systems on the enriched knowledge graph. To validate our approach, we conduct extensive experiments on seven benchmark datasets. An analysis of the results generated by our approach suggests that we discover new valuable rules on the enriched graphs. We provide an open-source implementation of our approach as well as pretrained models and datasets at https://github.com/Jean-KOUAGOU/EnhancedRuleLearning
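
A minimal sketch of the enrichment step is shown below, assuming a TransE-style scorer and a toy threshold; the entities, embeddings, and downstream miner are illustrative assumptions, not the paper's pipeline.

```python
# Hedged sketch: score unseen triples with a TransE-style embedding and
# add high-confidence links to the KG before rule mining.
import numpy as np

rng = np.random.default_rng(0)
entities = {e: rng.normal(size=32) for e in
            ["paris", "france", "berlin", "germany"]}
relations = {"capital_of": rng.normal(size=32)}

def transe_score(h, r, t):
    # TransE: plausible triples have h + r close to t (higher = better here).
    return -np.linalg.norm(entities[h] + relations[r] - entities[t])

candidates = [("paris", "capital_of", "france"),
              ("berlin", "capital_of", "france")]
threshold = -7.5  # would be tuned on held-out data in practice
enriched = [trip for trip in candidates if transe_score(*trip) > threshold]
# `enriched` triples are appended to the KG, after which a rule miner
# (e.g., AMIE) runs on the enriched graph.
```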

Updated: 2024-06-14 15:53:30

标题: 通过基于嵌入的链接预测改进规则挖掘

摘要: 知识图谱上的规则挖掘允许可解释的链接预测。相反,基于嵌入的链接预测方法以其泛化能力而闻名,但它们的预测是不可解释的。近年来提出了几种结合这两种方法的方法。大多数结果混合方法通常在统一的学习框架内进行训练,这往往由于学习任务的复杂性而导致收敛问题。在这项工作中,我们提出了一种新的结合两种方法的方式。具体地,我们通过预先训练的实体和关系嵌入来丰富给定的知识图谱,然后在丰富的知识图谱上应用规则挖掘系统。为了验证我们的方法,我们在七个基准数据集上进行了广泛的实验。我们的方法产生的结果分析表明,我们在丰富的图上发现了新的有价值的规则。我们提供了我们方法的开源实现,以及预训练模型和数据集,网址为https://github.com/Jean-KOUAGOU/EnhancedRuleLearning。

更新时间: 2024-06-14 15:53:30

领域: cs.AI

下载: http://arxiv.org/abs/2406.10144v1

The Rise and Fall(?) of Software Engineering

Over the last ten years, the realm of Artificial Intelligence (AI) has experienced an explosion of revolutionary breakthroughs, transforming what seemed like a far-off dream into a reality that is now deeply embedded in our everyday lives. AI's widespread impact is revolutionizing virtually all aspects of human life, and software engineering (SE) is no exception. As we explore this changing landscape, we are faced with questions about what the future holds for SE and how AI will reshape the roles, duties, and methodologies within the field. The introduction of these groundbreaking technologies highlights the inevitable shift towards a new paradigm, suggesting a future where AI's capabilities may redefine the boundaries of SE, potentially even more than human input. In this paper, we aim at outlining the key elements that, based on our expertise, are vital for the smooth integration of AI into SE, all while preserving the intrinsic human creativity that has been the driving force behind the field. First, we provide a brief description of SE and AI evolution. Afterward, we delve into the intricate interplay between AI-driven automation and human innovation, exploring how these two components can work together to advance SE practices to new methods and standards.

Updated: 2024-06-14 15:50:24

标题: 软件工程的兴衰

摘要: 在过去的十年中,人工智能(AI)领域经历了一系列革命性突破,将看似遥不可及的梦想变成了现实,现在已经深深地融入到我们的日常生活中。AI的广泛影响正在彻底改变人类生活的几乎所有方面,软件工程(SE)也不例外。 随着我们探索这一变化的格局,我们面临着关于SE的未来以及AI将如何重塑该领域的角色、职责和方法论的问题。这些开创性技术的引入凸显了向新范式的不可避免转变,暗示着一个未来,其中AI的能力可能重新定义SE的边界,甚至可能比人类输入更多。 在本文中,我们旨在概述我们认为对于将AI顺利整合到SE中至关重要的关键要素,同时保留作为该领域推动力量的内在人类创造力。首先,我们简要描述了SE和AI的演变。然后,我们深入探讨了AI驱动自动化与人类创新之间的复杂相互作用,探讨这两个要素如何共同推动SE实践迈向新方法和标准。

更新时间: 2024-06-14 15:50:24

领域: cs.SE,cs.AI,D.2; I.2

下载: http://arxiv.org/abs/2406.10141v1

Compressed Sensor Caching and Collaborative Sparse Data Recovery with Anchor Alignment

This work examines the compressed sensor caching problem in wireless sensor networks and devises efficient distributed sparse data recovery algorithms to enable collaboration among multiple caches. In this problem, each cache is only allowed to access measurements from a small subset of sensors within its vicinity to reduce both cache size and data acquisition overhead. To enable reliable data recovery with limited access to measurements, we propose a distributed sparse data recovery method, called the collaborative sparse recovery by anchor alignment (CoSR-AA) algorithm, where collaboration among caches is enabled by aligning their locally recovered data at a few anchor nodes. The proposed algorithm is based on the consensus alternating direction method of multipliers (ADMM) algorithm but with message exchange that is reduced by considering the proposed anchor alignment strategy. Then, by the deep unfolding of the ADMM iterations, we further propose the Deep CoSR-AA algorithm that can be used to significantly reduce the number of iterations. We obtain a graph neural network architecture where message exchange is done more efficiently by an embedded autoencoder. Simulations are provided to demonstrate the effectiveness of the proposed collaborative recovery algorithms in terms of the improved reconstruction quality and the reduced communication overhead due to anchor alignment.

Updated: 2024-06-14 15:47:13

标题: 压缩传感器缓存与锚点对齐的协作稀疏数据恢复

摘要: 这项工作研究了无线传感器网络中的压缩传感器缓存问题,并设计了高效的分布式稀疏数据恢复算法,以促进多个缓存之间的协作。在这个问题中,每个缓存只允许访问其附近小范围传感器的测量数据,以减少缓存大小和数据采集开销。为了在有限的测量访问下实现可靠的数据恢复,我们提出了一种分布式稀疏数据恢复方法,称为基于锚点对齐的协作稀疏恢复(CoSR-AA)算法,其中通过在少数锚节点处对齐缓存的本地恢复数据来实现缓存间的协作。所提出的算法基于共识交替方向乘子(ADMM)算法,但通过考虑提出的锚点对齐策略减少了消息交换。然后,通过ADMM迭代的深度展开,我们进一步提出了Deep CoSR-AA算法,可用于显著减少迭代次数。我们获得了一个图神经网络架构,其中通过嵌入式自编码器更有效地进行消息交换。通过仿真结果展示了所提出的协作恢复算法在改善重建质量和减少由于锚点对齐而导致的通信开销方面的有效性。

更新时间: 2024-06-14 15:47:13

领域: cs.IT,cs.LG,eess.SP,math.IT

下载: http://arxiv.org/abs/2406.10137v1

Diversifying Deep Ensembles: A Saliency Map Approach for Enhanced OOD Detection, Calibration, and Accuracy

Deep ensembles are capable of achieving state-of-the-art results in classification and out-of-distribution (OOD) detection. However, their effectiveness is limited due to the homogeneity of learned patterns within ensembles. To overcome this issue, our study introduces Saliency Diversified Deep Ensemble (SDDE), a novel approach that promotes diversity among ensemble members by leveraging saliency maps. Through incorporating saliency map diversification, our method outperforms conventional ensemble techniques and improves calibration in multiple classification and OOD detection tasks. In particular, the proposed method achieves state-of-the-art OOD detection quality, calibration, and accuracy on multiple benchmarks, including CIFAR10/100 and large-scale ImageNet datasets.
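
Below is a hedged sketch of a saliency-diversification penalty: input-gradient saliency maps are computed per ensemble member and their pairwise cosine similarity is penalized. The exact loss used by SDDE may differ; this only illustrates the mechanism.

```python
import torch
import torch.nn.functional as F

def saliency(model, x, y):
    """Input-gradient saliency of the classification loss, flattened per sample."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x, create_graph=True)
    return grad.flatten(1)

def diversity_penalty(models, x, y):
    """Mean pairwise cosine similarity between members' saliency maps."""
    maps = [F.normalize(saliency(m, x, y), dim=1) for m in models]
    penalty = x.new_zeros(())
    for i in range(len(maps)):
        for j in range(i + 1, len(maps)):
            penalty = penalty + (maps[i] * maps[j]).sum(dim=1).mean()
    return penalty  # added (with a weight) to the sum of members' task losses
```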

Updated: 2024-06-14 15:46:55

标题: 深度集合的多样化:一种显著性图方法用于增强OOD检测、校准和准确性

摘要: 深度集成能够在分类和超出分布(OOD)检测方面取得最先进的结果。然而,由于集成中学习模式的同质性,它们的有效性受到限制。为了克服这个问题,我们的研究引入了Saliency Diversified Deep Ensemble (SDDE),这是一种促进集合成员之间多样性的新方法,利用显著性图来实现。通过整合显著性图多样化,我们的方法优于传统的集成技术,并在多个分类和OOD检测任务中改善了校准。特别是,所提出的方法在多个基准测试中实现了最先进的OOD检测质量、校准和准确性,包括CIFAR10/100和大规模ImageNet数据集。

更新时间: 2024-06-14 15:46:55

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2305.11616v4

Chebyshev Polynomial-Based Kolmogorov-Arnold Networks: An Efficient Architecture for Nonlinear Function Approximation

Accurate approximation of complex nonlinear functions is a fundamental challenge across many scientific and engineering domains. Traditional neural network architectures, such as Multi-Layer Perceptrons (MLPs), often struggle to efficiently capture intricate patterns and irregularities present in high-dimensional functions. This paper presents the Chebyshev Kolmogorov-Arnold Network (Chebyshev KAN), a new neural network architecture inspired by the Kolmogorov-Arnold representation theorem, incorporating the powerful approximation capabilities of Chebyshev polynomials. By utilizing learnable functions parametrized by Chebyshev polynomials on the network's edges, Chebyshev KANs enhance flexibility, efficiency, and interpretability in function approximation tasks. We demonstrate the efficacy of Chebyshev KANs through experiments on digit classification, synthetic function approximation, and fractal function generation, highlighting their superiority over traditional MLPs in terms of parameter efficiency and interpretability. Our comprehensive evaluation, including ablation studies, confirms the potential of Chebyshev KANs to address longstanding challenges in nonlinear function approximation, paving the way for further advancements in various scientific and engineering applications.
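
A minimal Chebyshev KAN layer is sketched below, following the common open-source pattern (tanh squashing into [-1, 1] plus the three-term Chebyshev recurrence); the initialization and other details are assumptions and may differ from the paper's implementation.

```python
# Each edge carries a learnable Chebyshev expansion instead of a scalar weight.
import torch
import torch.nn as nn

class ChebyKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, degree=4):
        super().__init__()
        self.degree = degree
        self.coeffs = nn.Parameter(
            torch.randn(in_dim, out_dim, degree + 1)
            / (in_dim * (degree + 1)) ** 0.5)

    def forward(self, x):                    # x: (B, in_dim)
        x = torch.tanh(x)                    # map inputs into [-1, 1]
        T = [torch.ones_like(x), x]          # T_0, T_1
        for _ in range(2, self.degree + 1):
            T.append(2 * x * T[-1] - T[-2])  # T_{k+1} = 2x T_k - T_{k-1}
        basis = torch.stack(T, dim=-1)       # (B, in_dim, degree+1)
        return torch.einsum("bid,iod->bo", basis, self.coeffs)

layer = ChebyKANLayer(8, 4)
out = layer(torch.randn(32, 8))              # (32, 4)
```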

Updated: 2024-06-14 15:46:11

标题: 基于切比雪夫多项式的科尔莫戈洛夫-阿诺德网络:一种用于非线性函数逼近的高效架构

摘要: 准确逼近复杂非线性函数是许多科学和工程领域的基本挑战。传统的神经网络架构,如多层感知器(MLPs),通常难以有效捕捉高维函数中存在的复杂模式和不规则性。本文介绍了Chebyshev Kolmogorov-Arnold Network(Chebyshev KAN),这是一种受 Kolmogorov-Arnold 表示定理启发的新型神经网络架构,结合了 Chebyshev 多项式的强大逼近能力。通过在网络的边缘上利用由 Chebyshev 多项式参数化的可学习函数,Chebyshev KAN 在函数逼近任务中提高了灵活性、效率和可解释性。通过对数字分类、合成函数逼近和分形函数生成实验的展示,突显了 Chebyshev KAN 相对于传统 MLP 在参数效率和可解释性方面的优越性。我们的综合评估,包括消融研究,证实了 Chebyshev KAN 解决非线性函数逼近长期挑战的潜力,为各种科学和工程应用的进一步发展铺平了道路。

更新时间: 2024-06-14 15:46:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.07200v3

Evaluation of Large Language Models: STEM education and Gender Stereotypes

Large Language Models (LLMs) have an increasing impact on our lives with use cases such as chatbots, study support, coding support, ideation, writing assistance, and more. Previous studies have revealed linguistic biases in pronouns used to describe professions or adjectives used to describe men vs women. These issues have to some degree been addressed in updated LLM versions, at least to pass existing tests. However, biases may still be present in the models, and repeated use of gender stereotypical language may reinforce the underlying assumptions and are therefore important to examine further. This paper investigates gender biases in LLMs in relation to educational choices through an open-ended, true to user-case experimental design and a quantitative analysis. We investigate the biases in the context of four different cultures, languages, and educational systems (English/US/UK, Danish/DK, Catalan/ES, and Hindi/IN) for ages ranging from 10 to 16 years, corresponding to important educational transition points in the different countries. We find that there are significant and large differences in the ratio of STEM to non-STEM suggested education paths provided by chatGPT when using typical girl vs boy names to prompt lists of suggested things to become. There are generally fewer STEM suggestions in the Danish, Spanish, and Indian context compared to the English. We also find subtle differences in the suggested professions, which we categorise and report.

Updated: 2024-06-14 15:42:42

标题: 大型语言模型的评估:STEM教育和性别刻板印象

摘要: 大型语言模型(LLMs)在我们的生活中越来越有影响力,例如聊天机器人、学习支持、编码支持、构思、写作帮助等使用案例。先前的研究已经揭示了用于描述职业的代词或用于描述男性和女性的形容词中存在的语言偏见。这些问题在更新的LLM版本中在一定程度上得到了解决,至少能够通过现有测试。然而,这些模型可能仍然存在偏见,而重复使用性别刻板语言可能会强化基本假设,因此需要进一步研究。本文通过一种开放式、真实用户案例的实验设计和定量分析,调查了LLMs中与教育选择相关的性别偏见。我们调查了在四种不同文化、语言和教育系统(英语/美国/英国、丹麦语/丹麦、加泰罗尼亚语/西班牙、印地语/印度)中,年龄从10到16岁不等的重要教育转折点。我们发现,当使用典型的女孩和男孩的名字来提示建议成为的事物清单时,chatGPT提供的STEM和非STEM建议教育途径的比率存在显著且较大的差异。与英语相比,丹麦、西班牙和印度的上下文中通常会给出更少的STEM建议。我们还发现了所提出的职业中存在微妙的差异,我们对其进行了分类和报告。

更新时间: 2024-06-14 15:42:42

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.10133v1

Linear Contextual Bandits with Hybrid Payoff: Revisited

We study the Linear Contextual Bandit problem in the hybrid reward setting. In this setting, every arm's reward model contains arm-specific parameters in addition to parameters shared across the reward models of all the arms. We can reduce this setting to two closely related settings: (a) Shared - no arm-specific parameters, and (b) Disjoint - only arm-specific parameters, enabling the application of two popular state-of-the-art algorithms - $\texttt{LinUCB}$ and $\texttt{DisLinUCB}$ (Algorithm 1 in (Li et al. 2010)). When the arm features are stochastic and satisfy a popular diversity condition, we provide new regret analyses for both algorithms, significantly improving on the known regret guarantees of these algorithms. Our novel analysis critically exploits the hybrid reward structure and the diversity condition. Moreover, we introduce a new algorithm $\texttt{HyLinUCB}$ that crucially modifies $\texttt{LinUCB}$ (using a new exploration coefficient) to account for sparsity in the hybrid setting. Under the same diversity assumptions, we prove that $\texttt{HyLinUCB}$ also incurs only $O(\sqrt{T})$ regret for $T$ rounds. We perform extensive experiments on synthetic and real-world datasets demonstrating strong empirical performance of $\texttt{HyLinUCB}$. When the number of arm-specific parameters is much larger than the number of shared parameters, we observe that $\texttt{DisLinUCB}$ incurs the lowest regret. In this case, the regret of $\texttt{HyLinUCB}$ is the second best and extremely competitive with $\texttt{DisLinUCB}$. In all other situations, including our real-world dataset, $\texttt{HyLinUCB}$ has significantly lower regret than $\texttt{LinUCB}$, $\texttt{DisLinUCB}$ and other SOTA baselines we considered. We also empirically observe that the regret of $\texttt{HyLinUCB}$ grows much slower with the number of arms compared to baselines, making it suitable even for very large action spaces.
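
For reference, the snippet below sketches the disjoint LinUCB building block the paper starts from (per-arm ridge statistics plus a UCB exploration bonus); HyLinUCB's modified exploration coefficient and the hybrid parametrization are not reproduced here.

```python
# Disjoint LinUCB arm selection (in the spirit of Li et al. 2010).
import numpy as np

class DisjointLinUCB:
    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(d) for _ in range(n_arms)]   # per-arm ridge Gram matrix
        self.b = [np.zeros(d) for _ in range(n_arms)]

    def select(self, x):                              # x: context, shape (d,)
        ucbs = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                         # ridge estimate
            ucbs.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(ucbs))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```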

Updated: 2024-06-14 15:41:21

标题: 线性上下文臂带与混合收益:重新审视

摘要: 我们研究了混合奖励设置下的线性上下文臂问题。 在这种设置中,每个臂的奖励模型都包含特定于臂的参数,除了所有臂的奖励模型共享的参数之外。我们可以将这种设置简化为两个密切相关的设置(a) 共享 - 没有特定于臂的参数,和(b) 不相交 - 只有特定于臂的参数,从而应用两种流行的最新算法 - $\texttt{LinUCB}$和$\texttt{DisLinUCB}$((Li et al. 2010)中的算法1)。当臂特征是随机的并且满足流行的多样性条件时,我们为这两种算法提供了新的后悔分析,显著改进了这些算法的已知后悔保证。我们的新颖分析关键地利用了混合奖励结构和多样性条件。此外,我们引入了一种新算法$\texttt{HyLinUCB}$,关键修改了$\texttt{LinUCB}$(使用新的探索系数)以考虑混合设置中的稀疏性。在相同的多样性假设下,我们证明$\texttt{HyLinUCB}$对于$T$轮只产生$O(\sqrt{T})$的后悔。我们在合成和真实数据集上进行了大量实验,展示了$\texttt{HyLinUCB}$的强大实证性能。对于特定于臂的参数数量远大于共享参数数量的情况,我们观察到$\texttt{DisLinUCB}$产生最低的后悔。在这种情况下,$\texttt{HyLinUCB}$的后悔排名第二,并且与$\texttt{DisLinUCB}$极具竞争力。在所有其他情况下,包括我们的真实数据集,$\texttt{HyLinUCB}$的后悔比$\texttt{LinUCB}$、$\texttt{DisLinUCB}$和我们考虑的其他SOTA基线显著较低。我们还在实证中观察到,与基线相比,$\texttt{HyLinUCB}$的后悔随臂数量增长得更慢,使其甚至适用于非常大的动作空间。

更新时间: 2024-06-14 15:41:21

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.10131v1

TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring

Text-to-SQL enables users to interact with databases using natural language, simplifying the retrieval and synthesis of information. Despite the remarkable success of large language models (LLMs) in translating natural language questions into SQL queries, widespread deployment remains limited due to two primary challenges. First, the effective use of text-to-SQL models depends on users' understanding of the model's capabilities - the scope of questions the model can correctly answer. Second, the absence of abstention mechanisms can lead to incorrect SQL generation going unnoticed, thereby undermining trust in the model's output. To enable wider deployment, it is crucial to address these challenges in model design and enhance model evaluation to build trust in the model's output. To this end, we introduce TrustSQL, a novel comprehensive benchmark designed to evaluate text-to-SQL reliability, defined as a model's ability to correctly handle any type of input question by generating correct SQL queries for feasible questions and abstaining from generating queries for infeasible ones (e.g., due to schema incompatibility or functionalities beyond SQL). We evaluate existing methods using a novel penalty-based scoring metric with two modeling approaches: (1) pipeline-based methods combining SQL generators with infeasible question detectors and SQL error detectors for abstention; and (2) unified methods using a single model for the entire task. Our experimental results reveal that achieving high scores under severe penalties requires significant effort and provide a new perspective on developing text-to-SQL models for safer deployment.
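
One plausible instantiation of penalty-based scoring is sketched below for concreteness; the paper defines the exact metric, outcome taxonomy, and penalty scale, which may differ from this sketch.

```python
def reliability_score(examples, c=1.0):
    total = 0.0
    for ex in examples:  # keys: feasible (bool), abstained (bool), correct (bool)
        if ex["abstained"]:
            total += 1.0 if not ex["feasible"] else 0.0  # right call to abstain?
        elif ex["feasible"] and ex["correct"]:
            total += 1.0
        else:
            total -= c  # generated SQL that was wrong or should not exist
    return total / len(examples)

print(reliability_score([
    {"feasible": True,  "abstained": False, "correct": True},
    {"feasible": False, "abstained": True,  "correct": False},
    {"feasible": True,  "abstained": False, "correct": False},
], c=2.0))  # -> 0.0
```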

Updated: 2024-06-14 15:39:28

标题: TrustSQL:基于惩罚评分的文本到SQL可靠性基准测试

摘要: Text-to-SQL允许用户使用自然语言与数据库进行交互,简化了信息的检索和综合。尽管大型语言模型(LLMs)在将自然语言问题翻译成SQL查询方面取得了显著成功,但由于两个主要挑战,广泛部署仍然受到限制。首先,text-to-SQL模型的有效使用取决于用户对模型能力的理解-模型可以正确回答的问题范围。其次,缺乏弃权机制可能导致错误的SQL生成未被注意到,从而削弱对模型输出的信任。为了实现更广泛的部署,有必要在模型设计中解决这些挑战,并加强模型评估以建立对模型输出的信任。为此,我们引入了TrustSQL,一个新颖的全面基准,旨在评估text-to-SQL的可靠性-定义为模型正确处理任何类型输入问题的能力,通过为可行问题生成正确的SQL查询并放弃为不可行问题生成SQL查询(例如,由于模式不兼容或功能超出SQL)。我们使用一种新颖的基于惩罚的评分指标评估现有方法,采用两种建模方法:(1)基于流水线的方法,将SQL生成器与不可行问题检测器和SQL错误检测器结合起来进行弃权;(2)统一方法,使用单一模型完成整个任务。我们的实验结果表明,在严格惩罚下获得高分需要大量努力,并为开发更安全部署的text-to-SQL模型提供了新的视角。

更新时间: 2024-06-14 15:39:28

领域: cs.AI

下载: http://arxiv.org/abs/2403.15879v4

Exploration by Learning Diverse Skills through Successor State Measures

The ability to perform different skills can encourage agents to explore. In this work, we aim to construct a set of diverse skills which uniformly cover the state space. We propose a formalization of this search for diverse skills, building on a previous definition based on the mutual information between states and skills. We consider the distribution of states reached by a policy conditioned on each skill and leverage the successor state measure to maximize the difference between these skill distributions. We call this approach LEADS: Learning Diverse Skills through Successor States. We demonstrate our approach on a set of maze navigation and robotic control tasks which show that our method is capable of constructing a diverse set of skills which exhaustively cover the state space without relying on reward or exploration bonuses. Our findings demonstrate that this new formalization promotes more robust and efficient exploration by combining mutual information maximization and exploration bonuses.

Updated: 2024-06-14 15:36:15

标题: 通过继任状态度量学习多样化技能的探索

摘要: 本文旨在构建一组多样化的技能,以统一覆盖状态空间,从而鼓励代理探索不同的技能。我们提出了一个形式化的多样化技能搜索方法,建立在先前基于状态和技能之间互信息的定义基础上。我们考虑受到每种技能条件下的策略达到的状态分布,并利用后继状态测量来最大化这些技能分布之间的差异。我们将这种方法称为LEADS:通过后继状态学习多样化技能。我们在一组迷宫导航和机器人控制任务上展示了我们的方法,结果表明我们的方法能够构建一个多样化的技能集,全面覆盖状态空间,而无需依赖奖励或探索奖励。我们的研究结果表明,这种新的形式化方法通过结合互信息最大化和探索奖励促进了更加健壮和高效的探索。

更新时间: 2024-06-14 15:36:15

领域: cs.AI,cs.RO

下载: http://arxiv.org/abs/2406.10127v1

GraphFM: A Comprehensive Benchmark for Graph Foundation Model

Foundation Models (FMs) serve as a general class for the development of artificial intelligence systems, offering broad potential for generalization across a spectrum of downstream tasks. Despite extensive research into self-supervised learning as the cornerstone of FMs, several outstanding issues persist in Graph Foundation Models that rely on graph self-supervised learning, namely: 1) Homogenization. The extent of generalization capability on downstream tasks remains unclear. 2) Scalability. It is unknown how effectively these models can scale to large datasets. 3) Efficiency. The training time and memory usage of these models require evaluation. 4) Training Stop Criteria. Determining the optimal stopping strategy for pre-training across multiple tasks to maximize performance on downstream tasks. To address these questions, we have constructed a rigorous benchmark that thoroughly analyzes and studies the generalization and scalability of self-supervised Graph Neural Network (GNN) models. Regarding generalization, we have implemented and compared the performance of various self-supervised GNN models, trained to generate node representations, across tasks such as node classification, link prediction, and node clustering. For scalability, we have compared the performance of various models after training using full-batch and mini-batch strategies. Additionally, we have assessed the training efficiency of these models by conducting experiments to test their GPU memory usage and throughput. Through these experiments, we aim to provide insights to motivate future research. The code for this benchmark is publicly available at https://github.com/NYUSHCS/GraphFM.

Updated: 2024-06-14 15:36:00

标题: GraphFM:图基础模型的全面基准

摘要: 基础模型(FMs)作为人工智能系统开发的一般类别,具有广泛的潜力,可在一系列下游任务中实现泛化。尽管对自监督学习作为FMs基础的广泛研究,但图基础模型仍存在一些突出问题,这些模型依赖于图自监督学习,即:1)同质化。在下游任务上的泛化能力尚不清楚。2)可扩展性。目前尚不清楚这些模型在大型数据集上的可扩展性。3)效率。这些模型的训练时间和内存使用需要评估。4)训练停止标准。确定跨多个任务进行预训练的最佳停止策略,以最大化在下游任务上的性能。为了解决这些问题,我们构建了一个严格的基准,彻底分析和研究了自监督图神经网络(GNN)模型的泛化和可扩展性。关于泛化,我们实现并比较了各种自监督GNN模型的性能,这些模型经过训练生成节点表示,跨任务如节点分类、链接预测和节点聚类。对于可扩展性,我们比较了在使用全批和小批策略训练后各种模型的性能。此外,我们通过实验评估了这些模型的训练效率,测试它们的GPU内存使用和吞吐量。通过这些实验,我们旨在提供洞察力,以激励未来的研究。这个基准的代码可以在https://github.com/NYUSHCS/GraphFM 中公开获取。

更新时间: 2024-06-14 15:36:00

领域: cs.LG

下载: http://arxiv.org/abs/2406.08310v2

Data Ethics in the Era of Healthcare Artificial Intelligence in Africa: An Ubuntu Philosophy Perspective

Data are essential in developing healthcare artificial intelligence (AI) systems. However, patient data collection, access, and use raise ethical concerns, including informed consent, data bias, data protection and privacy, data ownership, and benefit sharing. Various ethical frameworks have been proposed to ensure the ethical use of healthcare data and AI, however, these frameworks often align with Western cultural values, social norms, and institutional contexts emphasizing individual autonomy and well-being. Ethical guidelines must reflect political and cultural settings to account for cultural diversity, inclusivity, and historical factors such as colonialism. Thus, this paper discusses healthcare data ethics in the AI era in Africa from the Ubuntu philosophy perspective. It focuses on the contrast between individualistic and communitarian approaches to data ethics. The proposed framework could inform stakeholders, including AI developers, healthcare providers, the public, and policy-makers about healthcare data ethical usage in AI in Africa.

Updated: 2024-06-14 15:28:36

标题: 《在非洲卫生保健人工智能时代的数据伦理:乌布图哲学视角》

摘要: 数据对于开发医疗人工智能系统至关重要。然而,患者数据的收集、访问和使用引发了伦理关切,包括知情同意、数据偏见、数据保护和隐私、数据所有权以及利益分享等问题。已经提出了各种伦理框架,以确保医疗数据和人工智能的伦理使用,然而,这些框架往往与西方文化价值观、社会规范和制度背景相一致,强调个体自主权和福祉。伦理指南必须反映政治和文化环境,以考虑文化多样性、包容性和诸如殖民主义等历史因素。因此,本文从Ubuntu哲学的角度探讨了非洲在人工智能时代的医疗数据伦理。它着重于个人主义和社群主义方法在数据伦理上的对比。提出的框架可以为利益相关者,包括人工智能开发者、医疗提供者、公众和决策者,提供有关非洲医疗数据在人工智能中伦理使用的信息。

更新时间: 2024-06-14 15:28:36

领域: cs.CY,cs.AI,I.2.6

下载: http://arxiv.org/abs/2406.10121v1

Trustworthy Artificial Intelligence in the Context of Metrology

We review research at the National Physical Laboratory (NPL) in the area of trustworthy artificial intelligence (TAI), and more specifically trustworthy machine learning (TML), in the context of metrology, the science of measurement. We describe three broad themes of TAI: technical, socio-technical and social, which play key roles in ensuring that the developed models are trustworthy and can be relied upon to make responsible decisions. From a metrology perspective we emphasise uncertainty quantification (UQ), and its importance within the framework of TAI to enhance transparency and trust in the outputs of AI systems. We then discuss three research areas within TAI that we are working on at NPL, and examine the certification of AI systems in terms of adherence to the characteristics of TAI.

Updated: 2024-06-14 15:23:27

标题: 在计量学背景下的可信人工智能

摘要: 我们回顾了在国家物理实验室(NPL)进行的可靠人工智能(TAI)领域的研究,更具体地说是在计量学,即测量科学领域中的可靠机器学习(TML)。我们描述了TAI的三个广泛主题:技术、社会技术和社会,这些主题在确保开发的模型可信且可以依赖于做出负责任决策方面起着关键作用。从计量学的角度来看,我们强调不确定性量化(UQ)及其在TAI框架内的重要性,以增强AI系统输出的透明度和信任。然后,我们讨论了在NPL正在进行的TAI研究领域中的三个研究领域,并从符合TAI特性的角度审查AI系统的认证。

更新时间: 2024-06-14 15:23:27

领域: cs.LG

下载: http://arxiv.org/abs/2406.10117v1

On Softmax Direct Preference Optimization for Recommendation

Recommender systems aim to predict personalized rankings based on user preference data. With the rise of Language Models (LMs), LM-based recommenders have been widely explored due to their extensive world knowledge and powerful reasoning abilities. Most of the LM-based recommenders convert historical interactions into language prompts, pairing with a positive item as the target response and fine-tuning LM with a language modeling loss. However, the current objective fails to fully leverage preference data and is not optimized for personalized ranking tasks, which hinders the performance of LM-based recommenders. Inspired by the current advancement of Direct Preference Optimization (DPO) in human preference alignment and the success of softmax loss in recommendations, we propose Softmax-DPO (S-DPO) to instill ranking information into the LM to help LM-based recommenders distinguish preferred items from negatives, rather than solely focusing on positives. Specifically, we incorporate multiple negatives in user preference data and devise an alternative version of DPO loss tailored for LM-based recommenders, connected to softmax sampling strategies. Theoretically, we bridge S-DPO with the softmax loss over negative sampling and find that it has a side effect of mining hard negatives, which assures its exceptional capabilities in recommendation tasks. Empirically, extensive experiments conducted on three real-world datasets demonstrate the superiority of S-DPO to effectively model user preference and further boost recommendation performance while mitigating the data likelihood decline issue of DPO. Our codes are available at https://github.com/chenyuxin1999/S-DPO.
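
A hedged sketch of a softmax-style DPO loss over multiple negatives, written from the abstract's description, is given below; the normalization, the beta scale, and the input convention (per-item policy/reference log-ratios) are assumptions.

```python
import torch
import torch.nn.functional as F

def s_dpo_loss(logratio_pos: torch.Tensor,     # (B,)  preferred item
               logratio_negs: torch.Tensor,    # (B, K) dispreferred items
               beta: float = 1.0) -> torch.Tensor:
    # Per-item inputs are log pi(y|x) - log pi_ref(y|x).
    diffs = beta * (logratio_negs - logratio_pos.unsqueeze(1))  # (B, K)
    # -log sigmoid(-logsumexp(.)) reduces to pairwise DPO when K = 1.
    return -F.logsigmoid(-torch.logsumexp(diffs, dim=1)).mean()

loss = s_dpo_loss(torch.randn(4), torch.randn(4, 7))
```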

Updated: 2024-06-14 15:22:58

标题: 关于用Softmax直接偏好优化进行推荐的研究

摘要: 推荐系统旨在基于用户偏好数据预测个性化排名。随着语言模型(LM)的崛起,基于LM的推荐系统得到了广泛探索,因为它们具有广泛的世界知识和强大的推理能力。大多数基于LM的推荐系统将历史互动转换为语言提示,配对一个积极项目作为目标响应,并通过语言建模损失对LM进行微调。然而,当前的目标未能充分利用偏好数据,也没有针对个性化排名任务进行优化,这限制了基于LM的推荐系统的性能。受直接偏好优化(DPO)在人类偏好对齐方面的当前进展和softmax损失在推荐中的成功启发,我们提出了Softmax-DPO(S-DPO),将排名信息融入LM以帮助LM-based推荐系统区分偏好项目和负面项目,而不仅仅专注于积极项目。具体来说,我们在用户偏好数据中结合多个负面项目,并设计了一种针对基于LM的推荐系统的替代版本的DPO损失,与softmax抽样策略相连接。从理论上讲,我们将S-DPO与负采样上的softmax损失相结合,发现它具有挖掘困难负面项目的副作用,确保其在推荐任务中具有卓越的能力。从经验上讲,对三个真实数据集进行的大量实验表明,S-DPO相对于有效地建模用户偏好并进一步提升推荐性能,同时缓解了DPO数据可能性下降问题。我们的代码可在https://github.com/chenyuxin1999/S-DPO 上获得。

更新时间: 2024-06-14 15:22:58

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2406.09215v2

Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection

State-of-the-art 3D object detectors are often trained on massive labeled datasets. However, annotating 3D bounding boxes remains prohibitively expensive and time-consuming, particularly for LiDAR. Instead, recent works demonstrate that self-supervised pre-training with unlabeled data can improve detection accuracy with limited labels. Contemporary methods adapt best-practices for self-supervised learning from the image domain to point clouds (such as contrastive learning). However, publicly available 3D datasets are considerably smaller and less diverse than those used for image-based self-supervised learning, limiting their effectiveness. We do note, however, that such data is naturally collected in a multimodal fashion, often paired with images. Rather than pre-training with only self-supervised objectives, we argue that it is better to bootstrap point cloud representations using image-based foundation models trained on internet-scale image data. Specifically, we propose a shelf-supervised approach (e.g. supervised with off-the-shelf image foundation models) for generating zero-shot 3D bounding boxes from paired RGB and LiDAR data. Pre-training 3D detectors with such pseudo-labels yields significantly better semi-supervised detection accuracy than prior self-supervised pretext tasks. Importantly, we show that image-based shelf-supervision is helpful for training LiDAR-only and multi-modal (RGB + LiDAR) detectors. We demonstrate the effectiveness of our approach on nuScenes and WOD, significantly improving over prior work in limited data settings.

Updated: 2024-06-14 15:21:57

标题: 架式监督多模态预训练用于3D物体检测

摘要: 目前最先进的3D物体检测器通常是在大规模标记数据集上训练的。然而,为3D边界框标注数据仍然过于昂贵和耗时,特别是对于激光雷达。相反,最近的研究表明,利用未标记数据进行自监督预训练可以提高检测准确性,而标签数量有限。当代方法将图像领域中的自监督学习最佳实践(如对比学习)应用到点云中。然而,公开可用的3D数据集比用于基于图像的自监督学习的数据集要小得多,且多样性较低,限制了它们的有效性。然而,我们注意到,这些数据通常以多模态方式自然收集,往往与图像配对。我们认为,与其仅使用自监督目标进行预训练,不如使用在互联网规模图像数据上训练的基于图像的基础模型来引导点云表示。具体来说,我们提出了一种shelf-supervised方法(例如使用现成的图像基础模型进行监督),用于从配对的RGB和LiDAR数据中生成零样本3D边界框。使用这种伪标签预训练3D检测器比以前的自监督假设任务显著提高了半监督检测准确性。重要的是,我们展示了基于图像的shelf-监督对训练仅LiDAR和多模态(RGB + LiDAR)检测器具有帮助。我们在nuScenes和WOD上展示了我们方法的有效性,在有限的数据设置中明显改进了先前的工作。

更新时间: 2024-06-14 15:21:57

领域: cs.CV,cs.LG,cs.RO

下载: http://arxiv.org/abs/2406.10115v1

Neural Operators for PDE Backstepping Control of First-Order Hyperbolic PIDE with Recycle and Delay

The recently introduced DeepONet operator-learning framework for PDE control is extended from the results for basic hyperbolic and parabolic PDEs to an advanced hyperbolic class that involves delays on both the state and the system output or input. The PDE backstepping design produces gain functions that are outputs of a nonlinear operator, mapping functions on a spatial domain into functions on a spatial domain, and where this gain-generating operator's inputs are the PDE's coefficients. The operator is approximated with a DeepONet neural network to a degree of accuracy that is provably arbitrarily tight. Once we produce this approximation-theoretic result in infinite dimension, with it we establish stability in closed loop under feedback that employs approximate gains. In addition to supplying such results under full-state feedback, we also develop DeepONet-approximated observers and output-feedback laws and prove their own stabilizing properties under neural operator approximations. With numerical simulations we illustrate the theoretical results and quantify the numerical effort savings, which are of two orders of magnitude, thanks to replacing the numerical PDE solving with the DeepONet.
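
For readers unfamiliar with the operator-learning component, below is a minimal DeepONet sketch: the branch net encodes the input function (e.g., PDE coefficients sampled at m sensor points) and the trunk net encodes a query location, with their inner product giving the operator output (e.g., a backstepping gain) at that location. Layer sizes and names are illustrative assumptions, not the paper's architecture:

import torch
import torch.nn as nn

class DeepONet(nn.Module):
    # Branch net: encodes the input function u sampled at m sensors.
    # Trunk net: encodes a spatial query point x.
    # Inner product of the two p-dimensional codes approximates (G u)(x).
    def __init__(self, m, p=64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(m, 128), nn.Tanh(),
                                    nn.Linear(128, p))
        self.trunk = nn.Sequential(nn.Linear(1, 128), nn.Tanh(),
                                   nn.Linear(128, p))

    def forward(self, u_sensors, x):        # shapes (B, m) and (B, 1)
        return (self.branch(u_sensors) * self.trunk(x)).sum(dim=-1, keepdim=True)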

Updated: 2024-06-14 15:17:20

标题: 神经算子用于具有回收和延迟的一阶双曲型PIDE的PDE反步控制

摘要: 最近引入的DeepONet运算符学习框架用于PDE控制,从基本双曲和抛物线PDE的结果扩展到一个包括状态和系统输出或输入延迟的高级双曲类。PDE反步设计产生增益函数,这些增益函数是非线性运算符的输出,将空间域上的函数映射到空间域上的函数,其中增益生成运算符的输入是PDE的系数。该运算符被DeepONet神经网络近似到可证明任意紧的程度。一旦我们在无限维度中得到这种逼近理论结果,我们便在使用近似增益的反馈闭环下建立了稳定性。除了在全状态反馈下提供这些结果外,我们还开发了DeepONet近似观测器和输出反馈定律,并证明了它们在神经运算符逼近下的稳定性。通过数值模拟,我们展示了理论结果并量化了数值工作量节约,这要归功于用DeepONet替代数值PDE求解,其节约量为两个数量级。

更新时间: 2024-06-14 15:17:20

领域: math.OC,cs.LG,cs.SY,eess.SY,math.AP

下载: http://arxiv.org/abs/2307.11436v2

SoK: Analysis of Software Supply Chain Security by Establishing Secure Design Properties

This paper systematizes knowledge about secure software supply chain patterns. It identifies four stages of a software supply chain attack and proposes three security properties crucial for a secured supply chain: transparency, validity, and separation. The paper describes current security approaches and maps them to the proposed security properties, including research ideas and case studies of supply chains in practice. It discusses the strengths and weaknesses of current approaches relative to known attacks and details the various security frameworks put out to ensure the security of the software supply chain. Finally, the paper highlights potential gaps in actor and operation-centered supply chain security techniques.

Updated: 2024-06-14 15:16:09

标题: SoK:通过建立安全设计属性分析软件供应链安全

摘要: 本文系统化地总结了关于安全软件供应链模式的知识。它识别出软件供应链攻击的四个阶段,并提出了三个对于安全供应链至关重要的安全属性:透明度、有效性和分离性。本文描述了当前的安全方法,并将它们映射到提出的安全属性,包括研究思路和实践中供应链的案例研究。它讨论了当前方法相对已知攻击的优势和劣势,并详细说明了各种安全框架的发布以确保软件供应链的安全。最后,本文强调了在以行为者和操作为中心的供应链安全技术中存在的潜在差距。

更新时间: 2024-06-14 15:16:09

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2406.10109v1

Precipitation Nowcasting Using Physics Informed Discriminator Generative Models

Nowcasting leverages real-time atmospheric conditions to forecast weather over short periods. State-of-the-art models, including PySTEPS, encounter difficulties in accurately forecasting extreme weather events because of their unpredictable distribution patterns. In this study, we design a physics-informed neural network to perform precipitation nowcasting using the precipitation and meteorological data from the Royal Netherlands Meteorological Institute (KNMI). This model draws inspiration from the novel Physics-Informed Discriminator GAN (PID-GAN) formulation, directly integrating physics-based supervision within the adversarial learning framework. The proposed model adopts a GAN structure, featuring a Vector Quantization Generative Adversarial Network (VQ-GAN) and a Transformer as the generator, with a temporal discriminator serving as the discriminator. Our findings demonstrate that the PID-GAN model outperforms numerical and SOTA deep generative models in terms of precipitation nowcasting downstream metrics.

Updated: 2024-06-14 15:12:53

标题: 使用物理知识的鉴别器生成模型进行降水现在预测

摘要: 现在预报利用实时大气条件来预测短期天气。包括PySTEPS在内的最先进模型在准确预测极端天气事件方面遇到困难,因为它们的分布模式不可预测。在这项研究中,我们设计了一个物理信息神经网络,利用荷兰皇家气象研究所(KNMI)的降水和气象数据进行降水现在预测。该模型受到新颖的基于物理信息判别器GAN(PID-GAN)公式的启发,直接在对抗学习框架中集成基于物理的监督。所提出的模型采用了一个包含向量量化生成对抗网络(VQ-GAN)和变压器的生成器的GAN结构,具有一个时间鉴别器作为鉴别器。我们的研究结果表明,PID-GAN模型在降水现在预测下游指标方面优于数值和最先进的深度生成模型。

更新时间: 2024-06-14 15:12:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.10108v1

SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

Remote Sensing Large Multi-Modal Models (RSLMMs) are developing rapidly and showcase significant capabilities in remote sensing imagery (RSI) comprehension. However, due to the limitations of existing datasets, RSLMMs have shortcomings in understanding the rich semantic relations among objects in complex remote sensing scenes. To unlock RSLMMs' complex comprehension ability, we propose a large-scale instruction tuning dataset FIT-RS, containing 1,800,851 instruction samples. FIT-RS covers common interpretation tasks and innovatively introduces several complex comprehension tasks of escalating difficulty, ranging from relation reasoning to image-level scene graph generation. Based on FIT-RS, we build the FIT-RSFG benchmark. Furthermore, we establish a new benchmark to evaluate the fine-grained relation comprehension capabilities of LMMs, named FIT-RSRC. Based on combined instruction data, we propose SkySenseGPT, which achieves outstanding performance on both public datasets and FIT-RSFG, surpassing existing RSLMMs. We hope the FIT-RS dataset can enhance the relation comprehension capability of RSLMMs and provide a large-scale fine-grained data source for the remote sensing community. The dataset will be available at https://github.com/Luo-Z13/SkySenseGPT

Updated: 2024-06-14 14:57:07

标题: SkySenseGPT: 一种用于遥感视觉语言理解的细粒度指令调优数据集和模型

摘要: 遥感大型多模态模型(RSLMMs)正在迅速发展,并展示了在遥感图像(RSI)理解方面的重要能力。然而,由于现有数据集的限制,RSLMMs在理解复杂遥感场景中对象之间丰富的语义关系方面存在缺陷。为了释放RSLMMs的复杂理解能力,我们提出了一个大规模指令调整数据集FIT-RS,包含1,800,851个指令样本。FIT-RS涵盖了常见的解释任务,并创新地引入了几个逐渐增加难度的复杂理解任务,从关系推理到图像级场景图生成。基于FIT-RS,我们建立了FIT-RSFG基准。此外,我们建立了一个新的基准来评估LMMs的细粒度关系理解能力,名为FIT-RSRC。基于综合指令数据,我们提出了SkySenseGPT,它在公共数据集和FIT-RSFG上取得了优秀的表现,超过了现有的RSLMMs。我们希望FIT-RS数据集可以增强RSLMMs的关系理解能力,并为遥感社区提供一个大规模细粒度数据源。该数据集将在https://github.com/Luo-Z13/SkySenseGPT上提供。

更新时间: 2024-06-14 14:57:07

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.10100v1

ECGMamba: Towards Efficient ECG Classification with BiSSM

Electrocardiogram (ECG) signal analysis represents a pivotal technique in the diagnosis of cardiovascular diseases. Although transformer-based models have made significant progress in ECG classification, they exhibit inefficiencies in the inference phase. The issue is primarily attributable to the quadratic computational complexity of the Transformer's self-attention mechanism, particularly when processing lengthy sequences. To address this issue, we propose a novel model, ECGMamba, which employs a bidirectional state-space model (BiSSM) to enhance classification efficiency. ECGMamba is based on an innovative Mamba-based block, which incorporates a range of time series modeling techniques to enhance performance while maintaining the efficiency of inference. The experimental results on two publicly available ECG datasets demonstrate that ECGMamba effectively balances the effectiveness and efficiency of classification, achieving competitive performance. This study not only contributes to the body of knowledge in the field of ECG classification but also provides a new research path for efficient and accurate ECG signal analysis. This is of guiding significance for the development of diagnostic models for cardiovascular diseases.
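
As a rough illustration of the bidirectional idea, the sketch below runs a linear-time recurrent scan over the sequence and its reverse and fuses the two passes; a GRU stands in for the Mamba-style selective scan, whose kernel is not reproduced here, and all sizes are illustrative assumptions:

import torch
import torch.nn as nn

class BiSSMBlock(nn.Module):
    # A causal, linear-time recurrent scan is run left-to-right and
    # right-to-left; the two hidden sequences are concatenated and
    # projected back to the model width.
    def __init__(self, d_model):
        super().__init__()
        self.fwd = nn.GRU(d_model, d_model, batch_first=True)
        self.bwd = nn.GRU(d_model, d_model, batch_first=True)
        self.proj = nn.Linear(2 * d_model, d_model)

    def forward(self, x):                    # x: (B, L, D)
        h_f, _ = self.fwd(x)                 # left-to-right pass, O(L)
        h_b, _ = self.bwd(torch.flip(x, dims=[1]))
        h_b = torch.flip(h_b, dims=[1])      # re-align right-to-left pass
        return self.proj(torch.cat([h_f, h_b], dim=-1))

Unlike self-attention, whose cost grows quadratically with the sequence length L, each pass here costs O(L), which is the efficiency argument made above.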

Updated: 2024-06-14 14:55:53

标题: ECGMamba:朝着具有BiSSM的高效ECG分类的方向

摘要: 心电图(ECG)信号分析在心血管疾病诊断中起着至关重要的作用。尽管基于Transformer的模型在ECG分类方面取得了显著进展,但它们在推断阶段表现出效率低下的问题。这个问题主要归因于Transformer自注意力机制的二次计算复杂度,特别是在处理长序列时。为了解决这个问题,我们提出了一种新颖的模型ECGMamba,它采用双向状态空间模型(BiSSM)来增强分类效率。ECGMamba基于创新的基于Mamba的模块,结合了一系列时间序列建模技术,以提高性能同时保持推断效率。对两个公开可用的ECG数据集的实验结果表明,ECGMamba有效地平衡了分类的效果和效率,实现了竞争性的性能。这项研究不仅为ECG分类领域的知识体系做出了贡献,还为高效准确的ECG信号分析提供了新的研究路径。这对于心血管疾病诊断模型的发展具有指导意义。

更新时间: 2024-06-14 14:55:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.10098v1

To what extent can ASV systems naturally defend against spoofing attacks?

The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV systems. This study investigates whether ASV effortlessly acquires robustness against spoofing attacks (i.e., zero-shot capability) by systematically exploring diverse ASV systems and spoofing attacks, ranging from traditional to cutting-edge techniques. Through extensive analyses conducted on eight distinct ASV systems and 29 spoofing attack systems, we demonstrate that the evolution of ASV inherently incorporates defense mechanisms against spoofing attacks. Nevertheless, our findings also underscore that the advancement of spoofing attacks far outpaces that of ASV systems, hence necessitating further research on spoofing-robust ASV methodologies.

Updated: 2024-06-14 14:51:16

标题: ASV系统能够自然地抵御欺骗攻击到什么程度?

摘要: 目前的自动说话人验证(ASV)任务涉及对两种类型的试验进行二元决策:目标和非目标。然而,语音生成技术的新进展对ASV系统的可靠性构成重大威胁。本研究调查了ASV系统是否能轻松地获得对抗欺骗攻击的鲁棒性(即零样本能力),通过系统地探索各种ASV系统和欺骗攻击,从传统到尖端技术。通过对八种不同ASV系统和29种欺骗攻击系统进行了广泛的分析,我们证明了ASV的演变本质上包含了对抗欺骗攻击的防御机制。然而,我们的发现也强调了欺骗攻击的进步远远超过了ASV系统的进步,因此需要进一步研究抗欺骗的ASV方法。

更新时间: 2024-06-14 14:51:16

领域: eess.AS,cs.AI

下载: http://arxiv.org/abs/2406.05339v2

Generative AI to Generate Test Data Generators

Generating fake data is an essential dimension of modern software testing, as demonstrated by the number and significance of data faking libraries. Yet, developers of faking libraries cannot keep up with the wide range of data to be generated for different natural languages and domains. In this paper, we assess the ability of generative AI for generating test data in different domains. We design three types of prompts for Large Language Models (LLMs), which perform test data generation tasks at different levels of integrability: 1) raw test data generation, 2) synthesizing programs in a specific language that generate useful test data, and 3) producing programs that use state-of-the-art faker libraries. We evaluate our approach by prompting LLMs to generate test data for 11 domains. The results show that LLMs can successfully generate realistic test data generators in a wide range of domains at all three levels of integrability.
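
The three integrability levels can be pictured as three prompt templates; the wording below is an illustrative assumption, not the paper's actual prompts:

# Level 1: raw data; level 2: a generator program; level 3: a program
# built on an off-the-shelf faker library.
PROMPTS = {
    "raw_data": "Generate 10 realistic {domain} test records as JSON.",
    "generator_program": "Write a {language} program that generates "
                         "realistic {domain} test data.",
    "faker_library": "Write a {language} program that uses a faker "
                     "library to generate {domain} test data.",
}

prompt = PROMPTS["faker_library"].format(language="Python",
                                         domain="postal address")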

Updated: 2024-06-14 14:49:12

标题: 生成式人工智能用于生成测试数据生成器

摘要: 生成虚假数据是现代软件测试的一个重要维度,这一点由众多数据伪造库的数量和重要性所证明。然而,数据伪造库的开发者无法跟上不同自然语言和领域所需生成的广泛数据范围。在本文中,我们评估了生成人工智能在不同领域生成测试数据的能力。我们设计了三种类型的提示符供大型语言模型(LLMs)使用,这些提示符在不同集成级别上执行测试数据生成任务:1)原始测试数据生成,2)在特定语言中合成生成有用的测试数据的程序,以及3)生成使用最先进的伪造库的程序。我们通过提示LLMs在11个领域生成测试数据来评估我们的方法。结果表明,LLMs可以成功地在所有三个集成级别的广泛领域生成逼真的测试数据生成器。

更新时间: 2024-06-14 14:49:12

领域: cs.SE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2401.17626v2

BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation

Bimanual manipulation tasks typically involve multiple stages which require efficient interactions between two arms, posing step-wise and stage-wise challenges for imitation learning systems. Specifically, failure and delay of one step will broadcast through time, hinder success and efficiency of each sub-stage task, and thereby overall task performance. Although recent works have made strides in addressing certain challenges, few approaches explicitly consider the multi-stage nature of bimanual tasks while simultaneously emphasizing the importance of inference speed. In this paper, we introduce a novel keypose-conditioned consistency policy tailored for bimanual manipulation. It is a hierarchical imitation learning framework that consists of a high-level keypose predictor and a low-level trajectory generator. The predicted keyposes provide guidance for trajectory generation and also mark the completion of one sub-stage task. The trajectory generator is designed as a consistency model trained from scratch without distillation, which generates action sequences conditioning on current observations and predicted keyposes with fast inference speed. Simulated and real-world experimental results demonstrate that the proposed approach surpasses baseline methods in terms of success rate and operational efficiency.

Updated: 2024-06-14 14:49:12

标题: BiKC:双手机器人操作的关键姿势条件一致性策略

摘要: 双手操作任务通常包含多个阶段,需要双臂之间的高效交互,这给模仿学习系统带来了逐步和分阶段的挑战。具体来说,某一步骤的失败或延迟会随时间传播,影响各子阶段任务的成功率和效率,进而影响整体任务表现。尽管近期工作在解决某些挑战方面取得了进展,但很少有方法在明确考虑双手任务多阶段特性的同时强调推理速度的重要性。在本文中,我们提出了一种专为双手操作设计的新颖的关键姿态条件一致性策略。它是一个分层模仿学习框架,由高层关键姿态预测器和低层轨迹生成器组成。预测的关键姿态为轨迹生成提供指导,并标志着一个子阶段任务的完成。轨迹生成器被设计为从头训练、无需蒸馏的一致性模型,以当前观测和预测的关键姿态为条件快速生成动作序列。仿真和真实世界的实验结果表明,所提出的方法在成功率和操作效率方面均优于基线方法。

更新时间: 2024-06-14 14:49:12

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2406.10093v1

Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis

Thanks to their extensive capacity, over-parameterized neural networks exhibit superior predictive capabilities and generalization. However, having a large parameter space is considered one of the main suspects behind neural networks' vulnerability to adversarial examples -- input samples crafted ad-hoc to induce a desired misclassification. The relevant literature has made contradictory claims for and against the robustness of over-parameterized networks. These contradictory findings might be due to the failure of the attack employed to evaluate the networks' robustness. Previous research has demonstrated that, depending on the considered model, the algorithm employed to generate adversarial examples may not function properly, leading to overestimating the model's robustness. In this work, we empirically study the robustness of over-parameterized networks against adversarial examples. Unlike previous works, we also evaluate the employed attack's reliability to support the veracity of the results. Our results show that over-parameterized networks are robust against adversarial attacks, as opposed to their under-parameterized counterparts.
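
Robustness evaluations of this kind typically rely on gradient-based attacks such as projected gradient descent (PGD); the sketch below is a standard L-infinity PGD loop with common default hyperparameters, not the specific attack configuration used in the paper:

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Standard L-infinity PGD; x is assumed to lie in [0, 1].
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

The paper's point about attack reliability amounts to checking that such a loop actually drives the loss up (and accuracy down) before drawing conclusions about a model's robustness.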

Updated: 2024-06-14 14:47:06

标题: 神经网络中的过参数化与对抗性鲁棒性:概述与实证分析

摘要: 由于其广泛的容量,过参数化神经网络表现出卓越的预测能力和泛化能力。然而,拥有大量参数空间被认为是神经网络容易受到对抗样本攻击的主要原因之一 - 即专门设计的输入样本,以诱导所需的错误分类。相关文献中既有支持也有反对过参数化网络鲁棒性的相互矛盾的论断。这些矛盾的发现可能是由于用于评估网络鲁棒性的攻击失败。先前的研究表明,根据考虑的模型,用于生成对抗样本的算法可能无法正常工作,导致高估模型的鲁棒性。在这项工作中,我们以实证方法研究了过参数化网络对对抗样本的鲁棒性。与之前的研究不同,我们还评估了所采用攻击的可靠性,以支持结果的真实性。我们的结果显示,与欠参数化网络相比,过参数化网络对对抗性攻击具有鲁棒性。

更新时间: 2024-06-14 14:47:06

领域: cs.LG,68T10,I.5

下载: http://arxiv.org/abs/2406.10090v1

Enhancing Security in Millimeter Wave SWIPT Networks

Millimeter wave (mmWave) communication encounters a major issue of extremely high power consumption. To address this problem, simultaneous wireless information and power transfer (SWIPT) could be a promising technology. The mmWave frequencies are more appropriate for SWIPT than current low-frequency wireless transmissions, since mmWave base stations (BSs) can pack large antenna arrays to achieve significant array gains and high-speed short-distance transmissions. Unfortunately, implementing SWIPT in wireless communication may lead to increased vulnerability to eavesdropping due to high transmission power and data leakage. It is conventionally believed that a narrow beam offers inherent information-theoretic security against eavesdropping, because only eavesdroppers on the line-of-sight path between the legitimate transmitter and receiver can receive sufficiently strong signals. However, some mmWave experiments have shown that even with highly directional mmWaves, reflections caused by objects in the environment can be beneficial to eavesdroppers. This paper studies the security performance of general mmWave SWIPT networks and investigates the probability of successful eavesdropping under different attack models. Analytical expressions for the eavesdropping success probability (ESP) of both independent and colluding eavesdroppers are derived by incorporating the random reflection paths in the environment. Theoretical analysis and simulation results reveal the effects of key parameters on the ESP, such as the time-switching strategy in SWIPT, the density of mmWave BSs, and carrier frequencies. Based on the numerical and simulation results, some design suggestions for mmWave SWIPT are provided to defend against eavesdropping attacks and achieve secure communication in practice.

Updated: 2024-06-14 14:45:16

标题: 在毫米波SWIPT网络中加强安全性

摘要: 毫米波(mmWave)通信面临着极高的功耗问题。为了解决这个问题,同时进行无线信息和能量传输(SWIPT)可能是一种有前途的技术。与当前低频无线传输相比,mmWave频率更适合用于SWIPT,因为mmWave基站(BS)可以搭载大型天线阵列,实现显著的阵列增益和高速短距离传输。不幸的是,由于高传输功率和数据泄漏,在无线通信中实施SWIPT可能会使系统更容易遭受窃听。传统上认为窄波束对抗窃听具有固有的信息理论安全性,因为只有位于合法发射机和接收机之间视距路径上的窃听者才能接收到足够强的信号。然而,一些mmWave实验表明,即使使用高度定向的mmWave,环境中物体引起的反射信号对窃听者也可能有利。本文研究了一般mmWave SWIPT网络中的安全性能,并研究了在不同攻击模型下成功窃听的概率。通过将环境中的随机反射路径纳入考虑,导出了独立和勾结窃听者的窃听成功概率(ESP)的分析表达式。理论分析和仿真结果揭示了一些关键参数对ESP的影响,如SWIPT中的时间切换策略、mmWave BS的密度和载波频率等。基于数值和仿真结果,提供了一些针对窃听攻击的mmWave SWIPT设计建议,以实现安全通信。

更新时间: 2024-06-14 14:45:16

领域: cs.CR

下载: http://arxiv.org/abs/2406.10089v1

Biomarker based Cancer Classification using an Ensemble with Pre-trained Models

Certain cancer types, notably pancreatic cancer, are difficult to detect at an early stage, underscoring the importance of discovering causal relationships between biomarkers and cancer to identify cancer efficiently. By allowing for the detection and monitoring of specific biomarkers through a non-invasive method, liquid biopsies enhance the precision and efficacy of medical interventions, advocating the move towards personalized healthcare. Several machine learning algorithms, such as Random Forest and SVM, have been utilized for classification, yet they are inefficient due to the need for hyperparameter tuning. We leverage a meta-trained Hyperfast model for classifying cancer, accomplishing the highest AUC of 0.9929 and simultaneously achieving robustness, especially on highly imbalanced datasets, compared to other ML algorithms in several binary classification tasks (e.g. breast invasive carcinoma; BRCA vs. non-BRCA). We also propose a novel ensemble model combining the pre-trained Hyperfast model, XGBoost, and LightGBM for multi-class classification tasks, achieving an incremental increase in accuracy (0.9464) while using merely 500 PCA features; distinguishable from previous studies that used more than 2,000 features for similar results.
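
A minimal sketch of the described pipeline, assuming the usual scikit-learn-style interfaces; the pre-trained Hyperfast model would join as a third estimator provided its wrapper exposes fit/predict_proba, which is an assumption here:

from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# X: biomarker feature matrix, y: cancer-type labels (assumed given).
X_500 = PCA(n_components=500).fit_transform(X)

ensemble = VotingClassifier(
    estimators=[("xgb", XGBClassifier()), ("lgbm", LGBMClassifier())],
    voting="soft",  # average class probabilities across members
)
ensemble.fit(X_500, y)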

Updated: 2024-06-14 14:43:59

标题: 基于生物标志物的癌症分类:使用预先训练模型的集成方法

摘要: 某些癌症类型,特别是胰腺癌,很难在早期阶段检测到;这突显了发现生物标志物与癌症之间因果关系的重要性,以便高效地识别癌症。通过允许通过非侵入性方法检测和监测特定生物标志物,液体活检提升了医疗干预的精确性和效力,倡导向个性化医疗迈进。虽然使用Random Forest、SVM等几种机器学习算法进行分类,但由于需要进行超参数调整,导致效率低下。我们利用经元训练的Hyperfast模型对癌症进行分类,取得了0.9929的最高AUC,并与其他ML算法相比,在几个二元分类任务(例如乳腺浸润性癌;BRCA vs.非BRCA)中特别在高度不平衡的数据集上实现了鲁棒性。我们还提出了一种新颖的集成模型,结合了预训练的Hyperfast模型、XGBoost和LightGBM,用于多类别分类任务,仅使用500个PCA特征就实现了准确性的递增增加(0.9464),与之前研究使用超过2,000个特征获得类似结果的情况不同。

更新时间: 2024-06-14 14:43:59

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.10087v1

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Current multimodal and multitask foundation models like 4M or UnifiedIO show promising results, but in practice their out-of-the-box abilities to accept diverse inputs and perform diverse tasks are limited by the (usually rather small) number of modalities and tasks they are trained on. In this paper, we expand upon their capabilities by training a single model on tens of highly diverse modalities and by performing co-training on large-scale multimodal datasets and text corpora. This includes training on several semantic and geometric modalities, feature maps from recent state of the art models like DINOv2 and ImageBind, pseudo labels of specialist models like SAM and 4DHumans, and a range of new modalities that allow for novel ways to interact with the model and steer the generation, for example image metadata or color palettes. A crucial step in this process is performing discrete tokenization on various modalities, whether they are image-like, neural network feature maps, vectors, structured data like instance segmentation or human poses, or data that can be represented as text. Through this, we expand on the out-of-the-box capabilities of multimodal models and specifically show the possibility of training one model to solve at least 3x more tasks/modalities than existing ones and doing so without a loss in performance. This enables more fine-grained and controllable multimodal generation capabilities and allows us to study the distillation of models trained on diverse data and objectives into a unified model. We successfully scale the training to a three billion parameter model using tens of modalities and different datasets. The resulting models and training code are open sourced at 4m.epfl.ch.

Updated: 2024-06-14 14:43:26

标题: 4M-21:一种适用于数十种任务和模态的任意到任意视觉模型

摘要: 当前的多模态和多任务基础模型,如4M或UnifiedIO,显示出令人期待的结果,但实际上,它们的开箱即用能力受到训练的模态和任务数量(通常相对较少)的限制。在本文中,我们通过在数十种高度多样化的模态上训练单一模型,并在大规模多模态数据集和文本语料库上进行联合训练,扩展了它们的功能。这包括在几种语义和几何模态上训练,来自最新技术模型(如DINOv2和ImageBind)的特征图,像SAM和4DHumans这样的专家模型的伪标签,以及一系列新的模态,可以使用新颖的方式与模型互动并引导生成,例如图像元数据或调色板。这个过程中的一个关键步骤是对各种模态进行离散的标记化,无论它们是类似图像的、神经网络特征图、向量、像实例分割或人体姿势这样的结构化数据,或者可以表示为文本的数据。通过这一过程,我们扩展了多模态模型的开箱即用功能,并明确展示了训练一个模型以解决至少3倍更多的任务/模态的可能性,而不会影响性能。这使得更精细和可控的多模态生成能力成为可能,并让我们研究将以不同数据和目标训练的模型精炼成一个统一模型。我们成功地将训练扩展到一个三十亿参数的模型,使用数十种模态和不同的数据集。由此产生的模型和训练代码在4m.epfl.ch上开源。

更新时间: 2024-06-14 14:43:26

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.09406v2

In-Context Reinforcement Learning for Variable Action Spaces

Recently, it has been shown that transformers pre-trained on diverse datasets with multi-episode contexts can generalize to new reinforcement learning tasks in-context. A key limitation of previously proposed models is their reliance on a predefined action space size and structure. The introduction of a new action space often requires data re-collection and model re-training, which can be costly for some applications. In our work, we show that it is possible to mitigate this issue by proposing the Headless-AD model that, despite being trained only once, is capable of generalizing to discrete action spaces of variable size, semantic content and order. By experimenting with Bernoulli and contextual bandits, as well as a gridworld environment, we show that Headless-AD exhibits significant capability to generalize to action spaces it has never encountered, even outperforming specialized models trained for a specific set of actions on several environment configurations.

Updated: 2024-06-14 14:42:43

标题: 上下文强化学习在可变动作空间中的应用

摘要: 最近的研究表明,在具有多情节上下文的多样化数据集上预训练的Transformer可以在上下文中泛化到新的强化学习任务。先前提出的模型的一个关键局限是它们依赖于预定义的行动空间大小和结构。引入新的行动空间通常需要重新收集数据和重新训练模型,这对某些应用来说可能代价高昂。在我们的工作中,我们展示了通过提出Headless-AD模型可以缓解这个问题,尽管只训练一次,它可以泛化到大小、语义内容和顺序各不相同的离散行动空间。通过在伯努利赌博机、上下文赌博机以及网格世界环境上进行实验,我们展示了Headless-AD具有显著的能力,能够泛化到从未遇到过的行动空间,甚至在多种环境配置中超越专门针对一组特定行动训练的模型。

更新时间: 2024-06-14 14:42:43

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2312.13327v4

Discovering influential text using convolutional neural networks

Experimental methods for estimating the impacts of text on human evaluation have been widely used in the social sciences. However, researchers in experimental settings are usually limited to testing a small number of pre-specified text treatments. While efforts to mine unstructured texts for features that causally affect outcomes have been ongoing in recent years, these models have primarily focused on the topics or specific words of text, which may not always be the mechanism of the effect. We connect these efforts with NLP interpretability techniques and present a method for flexibly discovering clusters of similar text phrases that are predictive of human reactions to texts using convolutional neural networks. When used in an experimental setting, this method can identify text treatments and their effects under certain assumptions. We apply the method to two datasets. The first enables direct validation of the model's ability to detect phrases known to cause the outcome. The second demonstrates its ability to flexibly discover text treatments with varying textual structures. In both cases, the model learns a greater variety of text treatments compared to benchmark methods, and these text features quantitatively meet or exceed the ability of benchmark methods to predict the outcome.
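
A minimal sketch of such a phrase-detecting CNN; filter widths, sizes, and names are illustrative assumptions, and the interpretability step (reading off the spans that maximally activate each filter) is only indicated in the comments:

import torch
import torch.nn as nn

class PhraseCNN(nn.Module):
    # 1D convolutional filters over token embeddings act as soft
    # detectors for short phrases; max-pooling keeps the strongest match
    # per filter, so the spans that most activate a filter can later be
    # inspected as candidate influential phrases.
    def __init__(self, vocab_size, dim=128, n_filters=100, width=5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.conv = nn.Conv1d(dim, n_filters, kernel_size=width)
        self.head = nn.Linear(n_filters, 1)

    def forward(self, tokens):                    # tokens: (B, L)
        h = self.emb(tokens).transpose(1, 2)      # (B, dim, L)
        h = torch.relu(self.conv(h)).amax(dim=2)  # max-pool over positions
        return self.head(h)                       # predicted reaction score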

Updated: 2024-06-14 14:41:44

标题: 利用卷积神经网络发现有影响力的文本

摘要: 实验方法用于评估文本对人类评价的影响在社会科学中被广泛使用。然而,在实验环境中,研究人员通常只能测试少量预先指定的文本处理方式。尽管近年来一直在努力挖掘非结构化文本中因果影响结果的特征,但这些模型主要集中在文本的主题或特定词汇上,这并不总是效果的机制。我们将这些努力与自然语言处理可解释性技术联系起来,并提出一种利用卷积神经网络灵活发现预测人类对文本反应的类似文本短语簇的方法。在实验环境中使用时,该方法可以在一定假设下识别文本处理方式及其效果。我们将该方法应用于两个数据集。第一个数据集能够直接验证模型检测已知导致结果的短语的能力。第二个数据集展示了该方法能够灵活发现具有不同文本结构的文本处理方式的能力。在这两种情况下,与基准方法相比,模型学习到更多种类的文本处理方式,这些文本特征在定量上达到或超过基准方法预测结果的能力。

更新时间: 2024-06-14 14:41:44

领域: cs.CL,cs.LG,stat.ME

下载: http://arxiv.org/abs/2406.10086v1

Development and Validation of a Machine Learning Algorithm for Clinical Wellness Visit Classification in Cats and Dogs

Early disease detection in veterinary care relies on identifying subclinical abnormalities in asymptomatic animals during wellness visits. This study introduces an algorithm designed to distinguish between wellness and other veterinary visits. The purpose of this study is to validate the use of a visit classification algorithm compared to manual classification of veterinary visits by three board-certified veterinarians. Using a dataset of 11,105 clinical visits from 2012 to 2017 involving 655 animals (85.3% canines and 14.7% felines) across 544 U.S. veterinary establishments, the model was trained using a Gradient Boosting Machine model. Three validators were tasked with classifying 400 visits, including both wellness and other types of visits, selected randomly from the same database used for initial algorithm training, aiming to maintain consistency and relevance between the training and application phases; visit classifications were subsequently categorized into "wellness" or "other" based on majority consensus among validators to assess the algorithm's performance in identifying wellness visits. The algorithm demonstrated a specificity of 0.94 (95% CI: 0.91 to 0.96), implying its accuracy in distinguishing non-wellness visits. The algorithm had a sensitivity of 0.86 (95% CI: 0.80 to 0.92), indicating its ability to correctly identify wellness visits as compared to the annotations provided by veterinary experts. The balanced accuracy, calculated as 0.90 (95% CI: 0.87 to 0.93), further confirms the algorithm's overall effectiveness. The algorithm exhibits strong specificity and sensitivity, ensuring accurate identification of a high proportion of wellness visits. Overall, this algorithm holds promise for advancing research on preventive care's role in subclinical disease identification, but prospective studies are needed for validation.
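
The reported metrics follow directly from the confusion matrix of algorithm predictions against the validators' majority-vote labels; a minimal sketch, with variable names assumed:

from sklearn.metrics import confusion_matrix

# y_true: majority-vote labels from the three validators (1 = wellness),
# y_pred: the gradient-boosting model's predictions on the 400 visits.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                          # reported as 0.86
specificity = tn / (tn + fp)                          # reported as 0.94
balanced_accuracy = (sensitivity + specificity) / 2   # reported as 0.90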

Updated: 2024-06-14 14:38:15

标题: 猫狗临床健康检查分类的机器学习算法的开发与验证

摘要: 兽医保健中早期疾病检测依赖于在健康检查期间识别无症状动物中的潜在异常。本研究介绍了一个旨在区分健康检查和其他兽医访问的算法。本研究的目的是验证使用访问分类算法相对于三名兽医专家的手动分类兽医访问的有效性。使用2012年至2017年间包括655只动物(85.3%犬和14.7%猫)在544家美国兽医设施中的11,105次临床访问的数据集,该模型使用梯度提升机模型进行训练。三名验证者被要求对400次访问进行分类,包括健康检查和其他类型的访问,这些访问是从用于初始算法训练的相同数据库中随机选择的,旨在保持训练和应用阶段之间的一致性和相关性;根据验证者之间的多数一致性将访问分类为“健康”或“其他”,以评估算法在识别健康访问方面的表现。该算法表现出0.94(95%CI:0.91至0.96)的特异性,表明其在区分非健康访问方面的准确性。该算法具有0.86(95%CI:0.80至0.92)的敏感性,表明其能够正确识别健康访问,并与兽医专家提供的注释进行比较。平衡准确性计算为0.90(95%CI:0.87至0.93),进一步确认了该算法的整体有效性。该算法表现出良好的特异性和敏感性,确保准确识别高比例的健康访问。总的来说,该算法有望推进预防保健在潜在疾病识别中的作用研究,但需要进行前瞻性研究以进行验证。

更新时间: 2024-06-14 14:38:15

领域: cs.LG,cs.CY

下载: http://arxiv.org/abs/2406.10314v1

POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models

Large language models (LLMs) have exhibited impressive abilities for multimodal content comprehension and reasoning with proper prompting in zero- or few-shot settings. Despite the proliferation of interactive systems developed to support prompt engineering for LLMs across various tasks, most have primarily focused on textual or visual inputs, thus neglecting the complex interplay between modalities within multimodal inputs. This oversight hinders the development of effective prompts that guide model multimodal reasoning processes by fully exploiting the rich context provided by multiple modalities. In this paper, we present POEM, a visual analytics system to facilitate efficient prompt engineering for enhancing the multimodal reasoning performance of LLMs. The system enables users to explore the interaction patterns across modalities at varying levels of detail for a comprehensive understanding of the multimodal knowledge elicited by various prompts. Through diverse recommendations of demonstration examples and instructional principles, POEM supports users in iteratively crafting and refining prompts to better align and enhance model knowledge with human insights. The effectiveness and efficiency of our system are validated through two case studies and interviews with experts.

Updated: 2024-06-14 14:36:58

标题: POEM:用于增强大型语言模型多模态推理的交互式提示优化

摘要: 大型语言模型(LLMs)在零或少样本设置中通过适当提示展现出对多模态内容理解和推理的印象深刻的能力。尽管已经开发了许多互动系统来支持LLMs的提示工程,但大多数系统主要关注文本或视觉输入,因此忽视了多模态输入中各种模态之间复杂的相互作用。这一疏忽妨碍了通过充分利用多模态提供的丰富上下文来引导模型多模态推理过程的有效提示的开发。在本文中,我们提出了POEM,这是一个视觉分析系统,旨在促进为增强LLMs的多模态推理性能而进行高效提示工程。该系统使用户能够探索各种提示引发的多模态知识在不同详细级别上的交互模式,从而全面理解多模态知识。通过多样的演示示例和指导原则的推荐,POEM支持用户迭代地制定和完善提示,以更好地调整和增强模型知识与人类见解之间的一致性。我们的系统的有效性和效率通过两个案例研究和专家访谈得到验证。

更新时间: 2024-06-14 14:36:58

领域: cs.HC,cs.AI,68,H.5; I.2.1

下载: http://arxiv.org/abs/2406.03843v2

Run LoRA Run: Faster and Lighter LoRA Implementations

LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers. This technique is used both for fine-tuning and for full training of large language models. This paper presents the RunLoRA framework for efficient implementations of LoRA that significantly improves the speed of neural network training and fine-tuning using low-rank adapters. The proposed implementation optimizes the computation of LoRA operations based on the dimensions of the corresponding linear layer, the layer input dimensions, and the LoRA rank, choosing the best forward and backward computation graphs based on FLOP and time estimates, resulting in faster training without sacrificing accuracy. The experimental results show up to a 28% speedup on language modeling networks.
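
The core idea of choosing a computation graph can be illustrated with rough matmul FLOP counts for the two standard ways of applying a LoRA adapter with factors A (d_in x r) and B (r x d_out); this back-of-the-envelope sketch is an illustration, not RunLoRA's actual cost model:

def cheaper_lora_graph(b, d_in, d_out, r):
    # Graph 1: materialize W + A @ B once, then one dense matmul.
    fused = 2 * d_in * r * d_out + 2 * b * d_in * d_out
    # Graph 2: keep the factors, x @ W + (x @ A) @ B.
    factored = 2 * b * d_in * d_out + 2 * b * r * (d_in + d_out)
    return min([("fused", fused), ("factored", factored)],
               key=lambda g: g[1])

For small batches the factored form's extra 2*b*r*(d_in + d_out) term is cheap, while for large batches fusing the weights once can win; the framework makes this kind of choice per layer from FLOP and time estimates.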

Updated: 2024-06-14 14:36:45

标题: 《Run LoRA Run:更快更轻的LoRA实现》

摘要: LoRA是一种通过向线性层引入低秩适配器来减少神经网络中可训练参数数量的技术。该技术既用于大型语言模型的微调,也用于完全训练。本文提出了RunLoRA框架,用于高效实现LoRA,显著提高了使用低秩适配器进行神经网络训练和微调的速度。所提出的实现根据相应线性层的维度、层输入维度和LoRA秩选择最佳的前向和后向计算图来优化LoRA操作的计算,从而实现更快的训练速度,而不牺牲准确性。实验结果显示,在语言建模网络上可实现高达28%的加速。

更新时间: 2024-06-14 14:36:45

领域: cs.LG

下载: http://arxiv.org/abs/2312.03415v2

Localizing Events in Videos with Multimodal Queries

Video understanding is a pivotal task in the digital era, yet the dynamic and multievent nature of videos makes them labor-intensive and computationally demanding to process. Thus, localizing a specific event given a semantic query has gained importance in both user-oriented applications like video search and academic research into video foundation models. A significant limitation in current research is that semantic queries are typically in natural language that depicts the semantics of the target event. This setting overlooks the potential for multimodal semantic queries composed of images and texts. To address this gap, we introduce a new benchmark, ICQ, for localizing events in videos with multimodal queries, along with a new evaluation dataset ICQ-Highlight. Our new benchmark aims to evaluate how well models can localize an event given a multimodal semantic query that consists of a reference image, which depicts the event, and a refinement text to adjust the images' semantics. To systematically benchmark model performance, we include 4 styles of reference images and 5 types of refinement texts, allowing us to explore model performance across different domains. We propose 3 adaptation methods that tailor existing models to our new setting and evaluate 10 SOTA models, ranging from specialized to large-scale foundation models. We believe this benchmark is an initial step toward investigating multimodal queries in video event localization.

Updated: 2024-06-14 14:35:58

标题: 使用多模态查询在视频中定位事件

摘要: 视频理解是数字时代中的一个关键任务,然而视频的动态和多事件性质使其在处理过程中需要大量的人力和计算资源。因此,针对语义查询定位特定事件在用户导向的应用(如视频搜索)和视频基础模型的学术研究中变得越来越重要。当前研究中一个显著的限制是语义查询通常是自然语言,描述了目标事件的语义。这种设置忽略了由图像和文本组成的多模态语义查询的潜力。为了填补这一空白,我们引入了一个新的基准ICQ,用于通过多模态查询本地化视频中的事件,并提供了一个新的评估数据集ICQ-Highlight。我们的新基准旨在评估模型在给定包含参考图像和用于调整图像语义的细化文本的多模态语义查询的情况下定位事件的能力。为了系统地评估模型性能,我们包含了4种风格的参考图像和5种类型的细化文本,使我们能够探索模型在不同领域中的性能。我们提出了3种适应方法,用于调整现有模型以适应我们的新设置,并评估了10种领先的模型,从专门化到大规模基础模型。我们认为这个基准是探索视频事件本地化中多模态查询的一个初始步骤。

更新时间: 2024-06-14 14:35:58

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.10079v1

D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video

Dynamic reconstruction and spatiotemporal novel-view synthesis of non-rigidly deforming scenes recently gained increased attention. While existing work achieves impressive quality and performance on multi-view or teleporting camera setups, most methods fail to efficiently and faithfully recover motion and appearance from casual monocular captures. This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as casual smartphone captures. Our approach represents the scene as a $\textit{dynamic neural point cloud}$, an implicit time-conditioned point distribution that encodes local geometry and appearance in separate hash-encoded neural feature grids for static and dynamic regions. By sampling a discrete point cloud from our model, we can efficiently render high-quality novel views using a fast differentiable rasterizer and neural rendering network. Similar to recent work, we leverage advances in neural scene analysis by incorporating data-driven priors like monocular depth estimation and object segmentation to resolve motion and depth ambiguities originating from the monocular captures. In addition to guiding the optimization process, we show that these priors can be exploited to explicitly initialize our scene representation to drastically improve optimization speed and final image quality. As evidenced by our experimental evaluation, our dynamic point cloud model not only enables fast optimization and real-time frame rates for interactive applications, but also achieves competitive image quality on monocular benchmark sequences. Our project page is available at https://moritzkappel.github.io/projects/dnpc.

Updated: 2024-06-14 14:35:44

标题: D-NPC: 动态神经点云用于单眼视频的非刚性视角合成

摘要: 动态重建和时空新视角合成非刚性变形场景最近引起了更多关注。虽然现有工作在多视角或传送相机设置上取得了令人印象深刻的质量和性能,但大多数方法在从普通单目捕获中高效且忠实地恢复运动和外观方面失败。本文通过引入一种新方法,为从单目视频(如普通智能手机捕获)中合成动态新视角做出贡献。 我们的方法将场景表示为$\textit{动态神经点云}$,一个隐式的时间条件点分布,通过分别为静态和动态区域编码本地几何和外观的哈希编码神经特征网格来表示。通过从我们的模型中采样离散点云,我们可以使用快速可微光栅化器和神经渲染网络高效地渲染高质量的新视角。类似于最近的工作,我们利用神经场景分析的进展,通过将单目深度估计和对象分割等数据驱动的先验纳入来解决源自单目捕获的运动和深度模糊。除了引导优化过程外,我们展示这些先验可以被利用来明确初始化我们的场景表示,从而显著改善优化速度和最终图像质量。正如我们的实验评估所证明的那样,我们的动态点云模型不仅可以实现快速优化和实时帧率用于交互应用程序,还可以在单目基准序列上实现具有竞争力的图像质量。 我们的项目页面可在https://moritzkappel.github.io/projects/dnpc 上查看。

更新时间: 2024-06-14 14:35:44

领域: cs.CV,cs.GR,cs.LG

下载: http://arxiv.org/abs/2406.10078v1

Neural Networks and Friction: Slide, Hold, Learn

In this study, it is demonstrated that Recurrent Neural Networks (RNNs), specifically those utilizing Gated Recurrent Unit (GRU) architecture, possess the capability to learn the complex dynamics of rate-and-state friction laws from synthetic data. The data employed for training the network is generated through the application of traditional rate-and-state friction equations coupled with the aging law for state evolution. A novel aspect of our approach is the formulation of a loss function that explicitly accounts for the direct effect by means of automatic differentiation. It is found that the RNN, with its GRU architecture, effectively learns to predict changes in the friction coefficient resulting from velocity jumps (with and without noise in the target data), thereby showcasing the potential of machine learning models in understanding and simulating the physics of frictional processes.
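
For concreteness, synthetic data of the kind described can be generated as follows; parameter values are typical laboratory-scale choices and the names are assumptions, not the paper's exact setup:

import numpy as np

def simulate_mu(V, dt=1e-3, mu0=0.6, a=0.010, b=0.015, Dc=1e-5, V0=1e-6):
    # Rate-and-state friction, mu = mu0 + a*ln(V/V0) + b*ln(V0*theta/Dc),
    # with the aging law d(theta)/dt = 1 - V*theta/Dc, integrated by
    # explicit Euler.
    theta = Dc / V[0]          # steady state for the initial velocity
    mu = np.empty_like(V)
    for i, v in enumerate(V):
        theta += dt * (1.0 - v * theta / Dc)
        mu[i] = mu0 + a * np.log(v / V0) + b * np.log(V0 * theta / Dc)
    return mu

# A velocity-step protocol (jumps between 1 and 10 micron/s) of the kind
# used to probe the direct effect and the subsequent state evolution:
V = np.concatenate([np.full(5000, 1e-6), np.full(5000, 1e-5)])
mu = simulate_mu(V)            # training target for the recurrent network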

Updated: 2024-06-14 14:27:40

标题: 神经网络与摩擦:滑动、保持、学习

摘要: 在这项研究中,我们证明了循环神经网络(RNNs),特别是利用门控循环单元(GRU)结构的网络,具有学习合成数据中速率-状态摩擦定律复杂动态的能力。用于训练网络的数据是通过应用传统速率-状态摩擦方程和状态演化的老化定律生成的。我们方法的一个新颖之处是制定了一个损失函数,明确考虑了通过自动微分产生的直接影响。发现RNN,以其GRU结构,有效地学习预测由于速度跳跃而导致的摩擦系数变化(带有和不带有目标数据中的噪声),从而展示了机器学习模型在理解和模拟摩擦过程物理学方面的潜力。

更新时间: 2024-06-14 14:27:40

领域: physics.geo-ph,cs.LG

下载: http://arxiv.org/abs/2402.14148v4

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

Diffusion models (DMs) usher in a new era of generative modeling and offer more opportunities for efficiently generating high-quality and realistic data samples. However, their widespread use has also brought forth new challenges in model security, which motivates the creation of more effective adversarial attackers on DMs to understand their vulnerability. We propose CAAT, a simple but generic and efficient approach that does not require costly training to effectively fool latent diffusion models (LDMs). The approach is based on the observation that cross-attention layers exhibit higher sensitivity to gradient change, allowing subtle perturbations on published images to be leveraged to significantly corrupt the generated images. We show that a subtle perturbation on an image can significantly impact the cross-attention layers, thus changing the mapping between text and image during the fine-tuning of customized diffusion models. Extensive experiments demonstrate that CAAT is compatible with diverse diffusion models and outperforms baseline attack methods in a more effective (more noise) and efficient (twice as fast as Anti-DreamBooth and Mist) manner.

Updated: 2024-06-14 14:26:38

标题: 扰乱注意力带来更多利益:有效欺骗定制扩散模型的微小成像扰动

摘要: 扩散模型(DMs)开启了生成建模的新时代,为有效生成高质量和真实数据样本提供了更多机会。然而,它们的广泛应用也带来了模型安全方面的新挑战,这促使了对DMs创建更有效的对抗性攻击者,以了解其脆弱性。我们提出了CAAT,这是一种简单但通用且高效的方法,不需要昂贵的训练即可有效地欺骗潜在的扩散模型(LDMs)。该方法基于交叉注意力层对梯度变化的更高敏感性的观察,从而允许利用对已发布图片的微小扰动,显著破坏生成的图片。我们展示了对一幅图像的微小扰动可以显著影响交叉注意力层,从而在定制扩散模型的微调过程中改变文本和图像之间的映射。大量实验证明,CAAT与各种扩散模型兼容,并在更有效(更多噪音)和更高效(比Anti-DreamBooth和Mist快两倍)的方式下优于基线攻击方法。

更新时间: 2024-06-14 14:26:38

领域: cs.CV,cs.CR,cs.LG

下载: http://arxiv.org/abs/2404.15081v2

Architectural Blueprint For Heterogeneity-Resilient Federated Learning

This paper proposes a novel three-tier architecture for federated learning to optimize edge computing environments. The proposed architecture addresses the challenges associated with client data heterogeneity and computational constraints. It introduces a scalable, privacy-preserving framework that enhances the efficiency of distributed machine learning. Through experimentation, the paper demonstrates the architecture's capability to manage non-IID data sets more effectively than traditional federated learning models. Additionally, the paper highlights the potential of this innovative approach to significantly improve model accuracy, reduce communication overhead, and facilitate broader adoption of federated learning technologies.

Updated: 2024-06-14 14:25:29

标题: 建筑蓝图:抗异构性联邦学习

摘要: 本文提出了一种新颖的三层架构,用于优化边缘计算环境中的联合学习。所提出的架构解决了与客户端数据异构性和计算约束相关的挑战。它引入了一个可扩展的、隐私保护的框架,提高了分布式机器学习的效率。通过实验,本文展示了该架构能够比传统的联合学习模型更有效地管理非独立同分布数据集。此外,本文强调了这种创新方法显著提高模型准确性、减少通信开销,并促进联合学习技术的更广泛应用的潜力。

更新时间: 2024-06-14 14:25:29

领域: cs.LG,cs.DC,cs.NI

下载: http://arxiv.org/abs/2403.04546v2

TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data

The growing availability of well-organized Electronic Health Records (EHR) data has enabled the development of various machine learning models towards disease risk prediction. However, existing risk prediction methods overlook the heterogeneity of complex diseases, failing to model the potential disease subtypes regarding their corresponding patient visits and clinical concept subgroups. In this work, we introduce TACCO, a novel framework that jointly discovers clusters of clinical concepts and patient visits based on a hypergraph modeling of EHR data. Specifically, we develop a novel self-supervised co-clustering framework that can be guided by the risk prediction task of specific diseases. Furthermore, we enhance the hypergraph model of EHR data with textual embeddings and enforce the alignment between the clusters of clinical concepts and patient visits through a contrastive objective. Comprehensive experiments conducted on the public MIMIC-III dataset and Emory internal CRADLE dataset over the downstream clinical tasks of phenotype classification and cardiovascular risk prediction demonstrate an average 31.25% performance improvement compared to traditional ML baselines and a 5.26% improvement on top of the vanilla hypergraph model without our co-clustering mechanism. In-depth model analysis, clustering results analysis, and clinical case studies further validate the improved utilities and insightful interpretations delivered by TACCO. Code is available at https://github.com/PericlesHat/TACCO.

Updated: 2024-06-14 14:18:38

标题: TACCO:基于电子健康记录数据的临床概念和患者就诊的任务引导共聚类,用于疾病亚型划分

摘要: 随着良好组织的电子健康记录(EHR)数据日益丰富,各种机器学习模型的发展促进了疾病风险预测。然而,现有的风险预测方法忽视了复杂疾病的异质性,未能就其对应的患者就诊和临床概念子群对潜在的疾病亚型进行建模。在这项工作中,我们介绍了TACCO,这是一个新颖的框架,基于EHR数据的超图建模,共同发现临床概念和患者就诊的聚类。具体来说,我们开发了一个新颖的自监督共聚类框架,可以根据特定疾病的风险预测任务进行引导。此外,我们通过文本嵌入增强了EHR数据的超图模型,并通过对比目标强化了临床概念和患者就诊之间的对齐。在公共MIMIC-III数据集和Emory内部CRADLE数据集上进行的全面实验,针对表型分类和心血管风险预测的下游临床任务,展示了与传统机器学习基线相比平均性能提升31.25%,以及比普通超图模型提高5.26%的结果。深入的模型分析、聚类结果分析和临床案例研究进一步验证了TACCO提供的改进效用和有见地的解释。代码可在https://github.com/PericlesHat/TACCO 上获取。

更新时间: 2024-06-14 14:18:38

领域: cs.LG

下载: http://arxiv.org/abs/2406.10061v1

PRIMER: Perception-Aware Robust Learning-based Multiagent Trajectory Planner

In decentralized multiagent trajectory planners, agents need to communicate and exchange their positions to generate collision-free trajectories. However, due to localization errors/uncertainties, trajectory deconfliction can fail even if trajectories are perfectly shared between agents. To address this issue, we first present PARM and PARM*, perception-aware, decentralized, asynchronous multiagent trajectory planners that enable a team of agents to navigate uncertain environments while deconflicting trajectories and avoiding obstacles using perception information. PARM* differs from PARM in that it is less conservative, using more computation to find closer-to-optimal solutions. While these methods achieve state-of-the-art performance, they suffer from high computational costs, as they need to solve large optimization problems onboard, making it difficult for agents to replan at high rates. To overcome this challenge, we present our second key contribution, PRIMER, a learning-based planner trained with imitation learning (IL) using PARM* as the expert demonstrator. PRIMER leverages the low computational requirements of neural networks at deployment and achieves a computation speed-up of up to 5500 times over optimization-based approaches.

Updated: 2024-06-14 14:16:39

标题: PRIMER:感知感知鲁棒学习的多智能体轨迹规划器

摘要: 在分散式多智能体轨迹规划器中,智能体需要通信并交换它们的位置以生成无碰撞轨迹。然而,由于定位误差/不确定性,即使轨迹在智能体之间完美共享,轨迹解冲突也可能失败。为了解决这个问题,我们首先提出了PARM和PARM*,感知意识、分散式、异步多智能体轨迹规划器,使一组智能体能够在不确定环境中导航,同时利用感知信息解决轨迹冲突并避开障碍物。PARM*与PARM不同之处在于它更少保守,使用更多计算来找到更接近最优解。虽然这些方法实现了最先进的性能,但由于它们需要在机载解决大型优化问题,因此面临着高计算成本的问题,使得智能体难以以高速率重新规划。为了克服这一挑战,我们提出了我们的第二个关键贡献,PRIMER,一个基于学习的规划器,使用PARM*作为专家示范者进行模仿学习(IL)训练。PRIMER利用神经网络在部署时的低计算需求,并实现了比基于优化的方法快5500倍的计算速度。

更新时间: 2024-06-14 14:16:39

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2406.10060v1

First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models

With the development of multimodal large language model (MLLM) technology, its general capabilities are increasingly powerful. To evaluate the various abilities of MLLMs, numerous evaluation systems have emerged. However, there is still no comprehensive method to evaluate MLLMs on tasks related to flowcharts, which are very important in daily life and work. We propose the first comprehensive method, FlowCE, to assess MLLMs across various dimensions of flowchart-related tasks. It encompasses evaluating MLLMs' abilities in Reasoning, Localization Recognition, Information Extraction, Logical Verification, and Summarization on flowcharts. However, we find that even the GPT4o model achieves only a score of 56.63. Among open-source models, Phi-3-Vision obtained the highest score of 49.97. We hope that FlowCE can contribute to future research on multimodal large language models (MLLMs) for tasks based on flowcharts. We are open-sourcing this project: \url{https://github.com/360AILAB-NLP/FlowCE}

Updated: 2024-06-14 14:15:35

标题: 大型多模态语言模型对流程图理解的首次多维评估

摘要: 随着多模态大型语言模型(MLLMs)技术的发展,其通用能力越来越强大。为了评估MLLMs的各种能力,出现了许多评估系统。但目前仍然缺乏一个全面的方法来评估与流程图相关的任务中的MLLMs,这些任务在日常生活和工作中非常重要。我们提出了第一个全面的方法FlowCE,用于评估与流程图相关任务中的MLLMs在各个维度上的能力。它涵盖了在流程图上评估MLLMs在推理、定位识别、信息提取、逻辑验证和总结方面的能力。然而,我们发现即使GPT4o模型的得分也仅为56.63。在开源模型中,Phi-3-Vision获得了最高的得分49.97。我们希望FlowCE能为未来基于流程图的多模态大型语言模型(MLLMs)的研究做出贡献。我们正在开源这个项目:\url{https://github.com/360AILAB-NLP/FlowCE}

更新时间: 2024-06-14 14:15:35

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.10057v1

SmartOracle: Generating Smart Contract Oracle via Fine-Grained Invariant Detection

As decentralized applications (DApps) proliferate, the increased complexity and usage of smart contracts have heightened their susceptibility to security incidents and financial losses. Although various vulnerability detection tools have been developed to mitigate these issues, they often suffer from poor performance in detecting vulnerabilities, as they either rely on simplistic and general-purpose oracles that may be inadequate for vulnerability detection, or require user-specified oracles, which are labor-intensive to create. In this paper, we introduce SmartOracle, a dynamic invariant detector that automatically generates fine-grained invariants as application-specific oracles for vulnerability detection. From historical transactions, SmartOracle uses pattern-based detection and advanced inference to construct comprehensive properties, and mines multi-layer likely invariants to accommodate complicated contract functionalities. After that, SmartOracle identifies smart contract vulnerabilities by hunting for violated invariants in new transactions. In the field of invariant detection, SmartOracle detects 50% more ERC20 invariants than existing dynamic invariant detection and achieves a 96% precision rate. Furthermore, we build a dataset that contains vulnerable contracts from real-world security incidents. SmartOracle successfully detects 466 abnormal transactions with an acceptable precision rate of 96%, involving 31 vulnerable contracts. The experimental results demonstrate its effectiveness in detecting smart contract vulnerabilities, especially those related to complicated contract functionalities.
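
As a toy illustration of invariant hunting, the sketch below checks one classic ERC20-style property (balances summing to totalSupply) against per-transaction post-states; the data layout is an assumption, and the tool's actual invariants are mined automatically and are far finer-grained:

# post_states: {tx_hash: token state recorded after replaying that
# transaction}; layout and names are assumptions for this sketch.
def violates_supply_invariant(state):
    # A classic ERC20-level invariant: balances must sum to totalSupply.
    return sum(state["balances"].values()) != state["totalSupply"]

flagged = [tx for tx, state in post_states.items()
           if violates_supply_invariant(state)]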

Updated: 2024-06-14 14:09:20

标题: 智能Oracle:通过细粒度不变性检测生成智能合约Oracle

摘要: 随着去中心化应用程序(DApps)的蓬勃发展,智能合约的复杂性和使用率增加,使得它们更容易受到安全事件和财务损失的影响。尽管已经开发了各种漏洞检测工具来缓解这些问题,但它们通常在检测漏洞方面性能不佳,因为它们要么依赖于简单和通用的预言机,这可能不足以进行漏洞检测,要么需要用户指定的预言机,这需要耗费大量人力。在本文中,我们介绍了SmartOracle,这是一个动态不变量检测器,可自动生成精细的不变量作为特定于应用程序的预言机,用于漏洞检测。通过历史交易,SmartOracle使用基于模式的检测和高级推理构建全面的属性,并挖掘多层次的可能不变量以适用于复杂的合约功能。然后,SmartOracle通过在新交易中查找违反的不变量来识别智能合约的漏洞。在不变量检测领域,SmartOracle比现有的动态不变量检测多检测了50%的ERC20不变量,并实现了96%的精度率。此外,我们构建了一个包含来自真实世界安全事件的脆弱合约的数据集。SmartOracle成功检测到了466笔异常交易,精确率达到96%,涉及31个脆弱合约。实验结果表明其在检测智能合约漏洞方面的有效性,特别是与复杂合约功能相关的漏洞。

更新时间: 2024-06-14 14:09:20

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2406.10054v1

Lost in Latent Space: Disentangled Models and the Challenge of Combinatorial Generalisation

Recent research has shown that generative models with highly disentangled representations fail to generalise to unseen combinations of generative factor values. These findings contradict earlier research, which showed improved performance in out-of-training-distribution settings when compared to entangled representations. Additionally, it is not clear whether the reported failures are due to (a) encoders failing to map novel combinations to the proper regions of the latent space or (b) novel combinations being mapped correctly while the decoder/downstream process is unable to render the correct output for the unseen combinations. We investigate these alternatives by testing several models on a range of datasets and training settings. We find that (i) when models fail, their encoders also fail to map unseen combinations to correct regions of the latent space and (ii) when models succeed, it is either because the test conditions do not exclude enough examples, or because excluded generative factors determine independent parts of the output image. Based on these results, we argue that to generalise properly, models not only need to capture factors of variation, but also understand how to invert the generative process that was used to generate the data.

Updated: 2024-06-14 14:09:18

标题: 在潜在空间中迷失:解耦模型与组合泛化挑战

摘要: 最近的研究表明,具有高度解耦表示的生成模型无法推广到未见过的生成因子值的组合。这些发现与早期研究相矛盾,早期研究显示在训练分布设置之外的性能得到改善时,与纠缠表示相比。此外,尚不清楚报告的失败是由于(a)编码器未能将新组合映射到潜在空间的正确区域还是(b)新组合被正确映射但解码器/下游过程无法呈现未见组合的正确输出。我们通过在一系列数据集和训练设置上测试几个模型来研究这些替代方案。我们发现(i)当模型失败时,它们的编码器也未能将未见组合映射到潜在空间的正确区域,(ii)当模型成功时,要么是因为测试条件没有排除足够的示例,要么是因为被排除的生成因子决定了输出图像的独立部分。基于这些结果,我们认为为了正确推广,模型不仅需要捕捉变化因素,还需要理解如何反转生成过程,该过程用于生成数据。

更新时间: 2024-06-14 14:09:18

领域: cs.LG,cs.AI,cs.CV,I.2.6; I.2.10; I.4.5; I.4.10; I.5.1; I.5.3

下载: http://arxiv.org/abs/2204.02283v2

Exponential Expressivity of ReLU$^k$ Neural Networks on Gevrey Classes with Point Singularities

We analyze deep Neural Network emulation rates of smooth functions with point singularities in bounded, polytopal domains $\mathrm{D} \subset \mathbb{R}^d$, $d=2,3$. We prove exponential emulation rates in Sobolev spaces in terms of the number of neurons and in terms of the number of nonzero coefficients for Gevrey-regular solution classes defined in terms of weighted Sobolev scales in $\mathrm{D}$, comprising the countably-normed spaces of I.M. Babu\v{s}ka and B.Q. Guo. As an intermediate result, we prove that continuous, piecewise polynomial high order (``$p$-version'') finite elements with elementwise polynomial degree $p\in\mathbb{N}$ on arbitrary, regular, simplicial partitions of polyhedral domains $\mathrm{D} \subset \mathbb{R}^d$, $d\geq 2$ can be exactly emulated by neural networks combining ReLU and ReLU$^2$ activations. On shape-regular, simplicial partitions of polytopal domains $\mathrm{D}$, both the number of neurons and the number of nonzero parameters are proportional to the number of degrees of freedom of the finite element space, in particular for the $hp$-Finite Element Method of I.M. Babu\v{s}ka and B.Q. Guo.
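
The exact-emulation claim rests on a standard identity used in such ReLU$^2$ constructions, included here for the reader's convenience: squares, and hence products, are exactly representable,

\sigma_2(t) := \max(0,t)^2, \qquad
t^2 = \sigma_2(t) + \sigma_2(-t), \qquad
xy = \tfrac{1}{2}\bigl((x+y)^2 - x^2 - y^2\bigr),

so multiplication gates, and therefore elementwise polynomials of any fixed degree, can be built from finitely many ReLU$^2$ units without approximation error.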

Updated: 2024-06-14 14:02:12

标题: ReLU$^k$神经网络在具有点奇异性的Gevrey类上的指数表达能力

摘要: 我们分析了在有界的多面体域$\mathrm{D} \subset \mathbb{R}^d$中具有点奇异性的光滑函数的深度神经网络仿真速率,其中$d=2,3$。我们证明了在Sobolev空间中的指数仿真速率,这些速率与神经元的数量和非零系数的数量有关,这些系数是根据在$\mathrm{D}$中的加权Sobolev尺度中定义的Gevrey正则解类而确定的,这些解类包括I.M. Babu\v{s}ka和B.Q. Guo的可计数范数空间。 作为中间结果,我们证明了在任意正则的单纯形分区的多面体域$\mathrm{D} \subset \mathbb{R}^d$上,具有元素多项式度$p\in\mathbb{N}$的连续、分段多项式高阶(“$p$-版本”)有限元可以通过结合ReLU和ReLU$^2$激活的神经网络精确仿真。在多面体域$\mathrm{D}$的形状正则的单纯形分区中,神经元的数量和非零参数的数量都与有限元空间的自由度数量成比例,特别适用于I.M. Babu\v{s}ka和B.Q. Guo的$hp$有限元方法。

更新时间: 2024-06-14 14:02:12

领域: math.NA,cs.LG,cs.NA,65N30, 41A25

下载: http://arxiv.org/abs/2403.02035v2

Comparison of fine-tuning strategies for transfer learning in medical image classification

In the context of medical imaging and machine learning, one of the most pressing challenges is the effective adaptation of pre-trained models to specialized medical contexts. Despite the availability of advanced pre-trained models, their direct application to the highly specialized and diverse field of medical imaging often falls short due to the unique characteristics of medical data. This study provides a comprehensive analysis on the performance of various fine-tuning methods applied to pre-trained models across a spectrum of medical imaging domains, including X-ray, MRI, Histology, Dermoscopy, and Endoscopic surgery. We evaluated eight fine-tuning strategies, including standard techniques such as fine-tuning all layers or fine-tuning only the classifier layers, alongside methods such as gradually unfreezing layers, regularization based fine-tuning and adaptive learning rates. We selected three well-established CNN architectures (ResNet-50, DenseNet-121, and VGG-19) to cover a range of learning and feature extraction scenarios. Although our results indicate that the efficacy of these fine-tuning methods significantly varies depending on both the architecture and the medical imaging type, strategies such as combining Linear Probing with Full Fine-tuning resulted in notable improvements in over 50% of the evaluated cases, demonstrating general effectiveness across medical domains. Moreover, Auto-RGN, which dynamically adjusts learning rates, led to performance enhancements of up to 11% for specific modalities. Additionally, the DenseNet architecture showed more pronounced benefits from alternative fine-tuning approaches compared to traditional full fine-tuning. This work not only provides valuable insights for optimizing pre-trained models in medical image analysis but also suggests the potential for future research into more advanced architectures and fine-tuning methods.
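
As an example of one of the better-performing strategies, here is a minimal PyTorch sketch of Linear Probing followed by Full Fine-tuning (LP-FT); the backbone choice, class count, and schedule are illustrative assumptions:

import torch.nn as nn
from torchvision.models import resnet50

num_classes = 10  # assumed size of the target label set

# Start from an ImageNet-pretrained backbone with a new head.
model = resnet50(weights="IMAGENET1K_V2")
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Phase 1: linear probing -- freeze everything except the classifier.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
# ... train the head for a few epochs ...

# Phase 2: full fine-tuning -- unfreeze all layers, use a small LR.
for p in model.parameters():
    p.requires_grad = True
# ... continue training the whole network ...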

Updated: 2024-06-14 14:00:02

标题: 医学图像分类中迁移学习微调策略的比较

摘要: 在医学影像和机器学习的背景下,最紧迫的挑战之一是有效地将预训练模型适应专业化的医学背景。尽管有先进的预训练模型可用,但它们直接应用于高度专业化和多样化的医学影像领域往往由于医学数据的独特特征而表现不佳。本研究对应用于预训练模型的各种微调方法在医学影像领域的性能进行了全面分析,包括X射线、MRI、组织学、皮肤镜和内窥镜手术等。我们评估了八种微调策略,包括标准技术,如微调所有层或仅微调分类器层,以及逐渐解冻层、基于正则化的微调和自适应学习率等方法。我们选择了三种成熟的CNN架构(ResNet-50、DenseNet-121和VGG-19)来涵盖一系列学习和特征提取场景。尽管我们的结果表明,这些微调方法的有效性在很大程度上取决于架构和医学影像类型,但将线性探测与完全微调相结合的策略在评估案例中超过50%中取得显著改进,展示了在医学领域的普遍有效性。此外,动态调整学习率的Auto-RGN导致特定模态的性能提升高达11%。此外,与传统的完全微调相比,DenseNet架构从替代微调方法中获得更明显的好处。这项工作不仅为优化医学图像分析中的预训练模型提供了宝贵的见解,还建议未来研究更先进的架构和微调方法的潜力。

更新时间: 2024-06-14 14:00:02

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.10050v1

EUROPA: A Legal Multilingual Keyphrase Generation Dataset

Keyphrase generation has primarily been explored within the context of academic research articles, with a particular focus on scientific domains and the English language. In this work, we present EUROPA, a dataset for multilingual keyphrase generation in the legal domain. It is derived from legal judgments from the Court of Justice of the European Union (EU), and contains instances in all 24 EU official languages. We run multilingual models on our corpus and analyze the results, showing room for improvement on a domain-specific multilingual corpus such as the one we present.

Updated: 2024-06-14 13:51:01

标题: 欧洲:一个法律多语关键短语生成数据集

摘要: 关键词生成主要在学术研究文章中进行探索,特别关注科学领域和英语。在这项工作中,我们提出了EUROPA,一个用于法律领域的多语言关键词生成数据集。它源自欧洲联盟(EU)法院的法律判决,并包含所有24种EU官方语言的实例。我们在我们的语料库上运行多语言模型并分析结果,展示了在像我们提出的这样的领域特定多语言语料库上改进的空间。

更新时间: 2024-06-14 13:51:01

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.00252v2

Bridging the Communication Gap: Artificial Agents Learning Sign Language through Imitation

Artificial agents, particularly humanoid robots, interact with their environment, objects, and people using cameras, actuators, and physical presence. Their communication methods are often pre-programmed, limiting their actions and interactions. Our research explores acquiring non-verbal communication skills through learning from demonstrations, with potential applications in sign language comprehension and expression. In particular, we focus on imitation learning for artificial agents, exemplified by teaching a simulated humanoid American Sign Language. We use computer vision and deep learning to extract information from videos, and reinforcement learning to enable the agent to replicate observed actions. Compared to other methods, our approach eliminates the need for additional hardware to acquire information. We demonstrate how the combination of these different techniques offers a viable way to learn sign language. Our methodology successfully teaches 5 different signs involving the upper body (i.e., arms and hands). This research paves the way for advanced communication skills in artificial agents.

Updated: 2024-06-14 13:50:29

标题: 弥合沟通差距:通过模仿学习手语的人工智能代理

摘要: 人工智能代理,特别是类人机器人,通过摄像头、执行器和物理存在与环境、物体和人进行互动。它们的通信方法通常是预先编程的,限制了它们的行动和互动。我们的研究探讨通过从示范中学习获得非语言沟通技能,潜在应用包括手语理解和表达。我们特别关注于为人工智能代理进行模仿学习,例如教授一个模拟类人机器人美国手语。我们使用计算机视觉和深度学习从视频中提取信息,并使用强化学习使代理能够复制观察到的动作。与其他方法相比,我们的方法消除了获取信息所需的额外硬件。我们展示了如何结合这些不同技术提供了一种学习手语的可行方式。我们的方法成功地教授了涉及上半身(即手臂和手部)的5种不同手势。这项研究为人工智能代理的高级沟通技能铺平了道路。

更新时间: 2024-06-14 13:50:29

领域: cs.AI,cs.GR,cs.HC,cs.LG,cs.RO

下载: http://arxiv.org/abs/2406.10043v1

FZI-WIM at SemEval-2024 Task 2: Self-Consistent CoT for Complex NLI in Biomedical Domain

This paper describes the inference system of FZI-WIM at the SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials. Our system utilizes the chain of thought (CoT) paradigm to tackle this complex reasoning problem and further improves the CoT performance with self-consistency. Instead of greedy decoding, we sample multiple reasoning chains with the same prompt and make the final verification with majority voting. The self-consistent CoT system achieves a baseline F1 score of 0.80 (1st), faithfulness score of 0.90 (3rd), and consistency score of 0.73 (12th). We release the code and data publicly at https://github.com/jens5588/FZI-WIM-NLI4CT.
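
The self-consistency step described above reduces to sampling several reasoning chains and voting on their verdicts. A minimal sketch of that loop (the `sample_chain` and `extract_label` callables are hypothetical stand-ins for the system's actual LLM call and answer parser, which the abstract does not specify):

```python
import collections
from typing import Callable, List

def self_consistent_vote(
    prompt: str,
    sample_chain: Callable[[str], str],   # returns one sampled reasoning chain
    extract_label: Callable[[str], str],  # parses the final verdict from a chain
    n_samples: int = 10,
) -> str:
    """Sample several CoT chains for the same prompt and majority-vote the labels."""
    labels: List[str] = []
    for _ in range(n_samples):
        chain = sample_chain(prompt)      # temperature sampling, not greedy decoding
        labels.append(extract_label(chain))
    # Majority voting over the per-chain verdicts gives the final prediction.
    return collections.Counter(labels).most_common(1)[0][0]
```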

Updated: 2024-06-14 13:49:07

标题: FZI-WIM在SemEval-2024任务2中的表现:生物医学领域复杂NLI的自洽CoT

摘要: 本文描述了FZI-WIM在SemEval-2024任务2中的推理系统:用于临床试验的安全生物医学自然语言推理。我们的系统利用了思维链(CoT)范式来解决这个复杂的推理问题,并通过自一致性进一步提高了CoT的性能。我们不采用贪婪解码,而是对相同提示进行多个推理链的抽样,并通过多数投票进行最终验证。这个自一致的CoT系统实现了基线F1得分为0.80(第1位),忠实度得分为0.90(第3位),一致性得分为0.73(第12位)。我们公开发布了代码和数据https://github.com/jens5588/FZI-WIM-NLI4CT。

更新时间: 2024-06-14 13:49:07

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.10040v1

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam search algorithm is designed to leverage a dynamic fusion of CTC, AR Decoder, and AMD probabilities. Experiments on the LibriSpeech-100hr corpus suggest the tripartite Decoder incorporating the AMD module produces a maximum decoding speed-up ratio of 1.73x over the baseline CTC+AR decoding, while incurring no statistically significant word error rate (WER) increase on the test sets. When operating with the same decoding real time factors, statistically significant WER reductions of up to 0.7% and 0.3% absolute (5.3% and 6.1% relative) were obtained over the CTC+AR baseline.

Updated: 2024-06-14 13:42:38

标题: 朝向有效和高效的基于块注意力掩码的非自回归解码

摘要: 本文提出了一种新颖的非自回归(NAR)基于块的注意力掩码解码器(AMD),灵活地平衡Conformer ASR系统的性能效率权衡。AMD在使用注意力掩码隐藏的连续输出标签块内执行并行的NAR推断,同时在块之间进行从左到右的AR预测和历史上下文融合。设计了一个束搜索算法,利用CTC、AR解码器和AMD概率的动态融合。在LibriSpeech-100hr语料库上的实验表明,包含AMD模块的三部分解码器在基线CTC+AR解码的基础上能够实现最大的解码加速比为1.73倍,同时在测试集上没有引起统计显著的词错误率(WER)增加。在相同的解码实时因子下运行时,相对于CTC+AR基线获得了高达0.7%和0.3%的绝对(5.3%和6.1%的相对)的统计显著WER降低。

更新时间: 2024-06-14 13:42:38

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2406.10034v1

Interpretative Deep Learning using Domain Adaptation for Fluorescence Spectroscopy

Fluorescence spectroscopy is a fundamental tool in life sciences and chemistry, widely used for applications such as environmental monitoring, food quality control, and biomedical diagnostics. However, analysis of spectroscopic data with deep learning, in particular of fluorescence excitation-emission matrices (EEMs), presents significant challenges due mainly to the typically small and sparse datasets available. Furthermore, the analysis of EEMs is difficult due to their high dimensionality and overlapping spectral features. This study proposes a new approach that exploits domain adaptation with pretrained vision models, alongside a novel interpretability algorithm to address these challenges. Thanks to specialised feature engineering of the neural networks described in this work, we are now able to provide deeper and meaningful insights into the physico-chemical processes underlying the data. The proposed approach is demonstrated through the analysis of the oxidation process in extra virgin olive oil (EVOO), showing its effectiveness in predicting quality indicators and identifying relevant spectral bands. This work describes significantly innovative results in the use of deep learning for spectroscopy, transforming it from a black box into a tool for understanding complex biological and chemical processes.

Updated: 2024-06-14 13:41:21

标题: 利用领域适应进行荧光光谱学的解释性深度学习

摘要: 荧光光谱学是生命科学和化学中的基本工具,被广泛用于环境监测、食品质量控制和生物医学诊断等应用。然而,利用深度学习分析光谱数据,特别是荧光激发-发射矩阵(EEMs),面临着重要挑战,主要是由于通常可用的数据集规模小且稀疏。此外,由于EEMs具有高维度和重叠的光谱特征,其分析也很困难。本研究提出了一种新方法,利用预先训练的视觉模型进行领域适应,同时结合一种新的解释算法来解决这些挑战。由于在本研究中描述的神经网络的专门特征工程,我们现在能够更深入和有意义地洞察数据背后的物理化学过程。提出的方法通过对特级初榨橄榄油(EVOO)中的氧化过程进行分析来进行验证,展示了其在预测质量指标和识别相关光谱带方面的有效性。本研究描述了利用深度学习进行光谱学研究的显著创新成果,将其从黑匣子转变为理解复杂生物和化学过程的工具。

更新时间: 2024-06-14 13:41:21

领域: cs.LG,cs.AI,physics.data-an,physics.optics

下载: http://arxiv.org/abs/2406.10031v1

Off-Policy Evaluation from Logged Human Feedback

Learning from human feedback has been central to recent advances in artificial intelligence and machine learning. Since the collection of human feedback is costly, a natural question to ask is whether new feedback always needs to be collected, or whether we could evaluate a new model with the human feedback on responses of another model. This motivates us to study off-policy evaluation from logged human feedback. We formalize the problem, propose both model-based and model-free estimators for policy values, and show how to optimize them. We analyze unbiasedness of our estimators and evaluate them empirically. Our estimators can predict the absolute values of evaluated policies, rank them, and be optimized.
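
The abstract does not spell out the estimators' exact form; as a generic illustration of model-free off-policy evaluation on logged feedback, the sketch below uses a standard inverse-propensity-score (IPS) estimator, which is textbook material rather than the paper's specific construction:

```python
import numpy as np

def ips_value_estimate(rewards, logged_probs, target_probs):
    """Standard IPS estimate of a target policy's value from logged data.

    rewards[i]      : feedback observed for the i-th logged response
    logged_probs[i] : probability the logging policy assigned to that response
    target_probs[i] : probability the evaluated policy assigns to it
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logged_probs, dtype=float)
    return float(np.mean(weights * rewards))

# Toy usage: three logged responses with binary human feedback.
print(ips_value_estimate(rewards=[1.0, 0.0, 1.0],
                         logged_probs=[0.5, 0.25, 0.25],
                         target_probs=[0.6, 0.2, 0.2]))
```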

Updated: 2024-06-14 13:38:18

标题: 基于已记录人类反馈的离策略评估

摘要: 从人类反馈中学习一直是人工智能和机器学习近期进展的核心。由于收集人类反馈是昂贵的,一个自然的问题是是否总是需要收集新的反馈,或者我们是否可以用另一个模型的响应上收集到的人类反馈来评估一个新模型。这激励我们研究基于已记录人类反馈的离策略评估。我们形式化了问题,提出了基于模型和无模型的策略值估计器,并展示了如何优化它们。我们分析了我们的估计器的无偏性,并在实证上评估它们。我们的估计器可以预测评估策略的绝对值,对它们进行排名,并进行优化。

更新时间: 2024-06-14 13:38:18

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.10030v1

Towards Robust Instruction Tuning on Multimodal Large Language Models

Fine-tuning large language models (LLMs) on multi-task instruction-following data has been proven to be a powerful learning paradigm for improving their zero-shot capabilities on new tasks. Recent works about high-quality instruction-following data generation and selection require large amounts of human labor to conceive model-understandable instructions for the given tasks and carefully filter the LLM-generated data. In this work, we introduce an automatic instruction augmentation method named INSTRAUG in multimodal tasks. It starts from a handful of basic and straightforward meta instructions but can expand an instruction-following dataset by 30 times. Results on two popular multimodal instruction-following benchmarks, MULTIINSTRUCT and InstructBLIP, show that INSTRAUG can significantly improve the alignment of multimodal large language models (MLLMs) across 12 multimodal tasks, which is even equivalent to the benefits of scaling up training data multiple times.

Updated: 2024-06-14 13:38:14

标题: 迈向多模态大型语言模型的稳健指令微调

摘要: 将大型语言模型(LLMs)在多任务指令遵循数据上进行微调已被证明是一种强大的学习范式,可以改善它们在新任务上的零样本能力。关于高质量指令遵循数据生成和选择的最近研究需要大量人力,为给定任务构思模型可理解的指令,并仔细过滤LLM生成的数据。在这项工作中,我们介绍了一种名为INSTRAUG的自动指令增强方法,用于多模态任务。它从少量基本和直接的元指令开始,但可以将一个指令遵循数据集扩展30倍。在两个流行的多模态指令遵循基准数据集MULTIINSTRUCT和InstructBLIP上的结果显示,INSTRAUG可以显著提高跨12个多模态任务的多模态大型语言模型(MLLMs)的对齐性,甚至相当于多次扩大训练数据的好处。

更新时间: 2024-06-14 13:38:14

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.14492v2

ProtoS-ViT: Visual foundation models for sparse self-explainable classifications

Prototypical networks aim to build intrinsically explainable models based on the linear summation of concepts. However, important challenges remain in the transparency, compactness, and meaningfulness of the explanations provided by these models. This work demonstrates how frozen pre-trained ViT backbones can be effectively turned into prototypical models for both general and domain-specific tasks, in our case biomedical image classifiers. By leveraging strong spatial features combined with a novel prototypical head, ProtoS-ViT surpasses existing prototypical models showing strong performance in terms of accuracy, compactness, and explainability. Model explainability is evaluated through an extensive set of quantitative and qualitative metrics which serve as a general benchmark for the development of prototypical models. Code is available at https://github.com/hturbe/protosvit.

Updated: 2024-06-14 13:36:30

标题: ProtoS-ViT:稀疏自解释分类的视觉基础模型

摘要: 原型网络旨在基于概念的线性求和构建内在可解释的模型。然而,这些模型提供的解释在透明度、紧凑性和意义性方面仍存在重要挑战。本文展示了如何将冻结的预训练 ViT 骨干有效地转化为通用和领域特定任务的原型模型,本文以生物医学图像分类器为例。通过利用强大的空间特征结合新颖的原型头,ProtoS-ViT 在准确性、紧凑性和可解释性方面超越了现有的原型模型,表现出良好的性能。模型的可解释性通过一系列定量和定性指标进行评估,这些指标可作为原型模型开发的一般基准。代码可在 https://github.com/hturbe/protosvit 获取。

更新时间: 2024-06-14 13:36:30

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.10025v1

Deep Bayesian Active Learning for Preference Modeling in Large Language Models

Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling are still a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and unleash the further development of LLMs. Bayesian Active Learning provides a principled framework for addressing this challenge and has demonstrated remarkable success in diverse settings. However, previous attempts to employ it for Preference Modeling did not meet such expectations. In this work, we identify that naive epistemic uncertainty estimation leads to the acquisition of redundant samples. We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM. Notably, our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels in two popular human preference datasets and exceeds previous stochastic Bayesian acquisition policies.
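
A hypothetical rendering of the acquisition rule described above, scoring candidates by epistemic uncertainty plus a diversity bonus that stands in for the entropy of the acquired prompt distribution in the LLM's feature space; the nearest-neighbor proxy and all names here are illustrative assumptions, not the paper's estimator:

```python
import numpy as np

def bal_pm_style_score(features, uncertainty, acquired, beta=1.0):
    """Hypothetical acquisition score in the spirit of BAL-PM: epistemic
    uncertainty plus a diversity bonus standing in for the entropy gain of
    the acquired prompt distribution in the LLM's feature space."""
    score = uncertainty.copy()
    if acquired:
        acq = features[acquired]                          # (m, d)
        # Distance to the nearest already-acquired prompt: a crude proxy for
        # how much a candidate would raise the entropy of the acquired set.
        d = np.linalg.norm(features[:, None, :] - acq[None, :, :], axis=-1)
        score += beta * d.min(axis=1)
    return score

rng = np.random.default_rng(0)
X, u = rng.normal(size=(100, 16)), rng.random(100)        # embeddings, uncertainty
picked = []
for _ in range(5):                                        # greedy batch selection
    s = bal_pm_style_score(X, u, picked)
    s[picked] = -np.inf                                   # never re-select a prompt
    picked.append(int(np.argmax(s)))
print(picked)
```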

Updated: 2024-06-14 13:32:43

标题: 大型语言模型中偏好建模的深度贝叶斯主动学习

摘要: 利用人类偏好来引导大型语言模型(LLMs)的行为在近年来取得了显著成功。然而,数据选择和标记仍然是这些系统的瓶颈,特别是在大规模情况下。因此,选择获取人类反馈的最具信息量的点可能会大大降低偏好标记的成本,并推动LLMs的进一步发展。贝叶斯主动学习为解决这一挑战提供了一个原则性框架,并在不同环境中取得了显著成功。然而,先前尝试将其用于偏好建模并未达到预期效果。在这项工作中,我们发现朴素的认知不确定性估计会导致获取冗余样本。我们通过提出用于偏好建模的贝叶斯主动学习器(BAL-PM)来解决这个问题,这是一种新颖的随机获取策略,不仅根据偏好模型针对高认知不确定性的点,还力求在由所使用的LLM张成的特征空间中最大化获取提示分布的熵。值得注意的是,我们的实验表明,在两个流行的人类偏好数据集中,BAL-PM所需的偏好标签数量减少了33%至68%,超过了先前的随机贝叶斯获取策略。

更新时间: 2024-06-14 13:32:43

领域: cs.LG,cs.CL,stat.ML

下载: http://arxiv.org/abs/2406.10023v1

Group and Shuffle: Efficient Structured Orthogonal Parametrization

The increasing size of neural networks has led to a growing demand for methods of efficient fine-tuning. Recently, an orthogonal fine-tuning paradigm was introduced that uses orthogonal matrices for adapting the weights of a pretrained model. In this paper, we introduce a new class of structured matrices, which unifies and generalizes structured classes from previous works. We examine properties of this class and build a structured orthogonal parametrization upon it. We then use this parametrization to modify the orthogonal fine-tuning framework, improving parameter and computational efficiency. We empirically validate our method on different domains, including adapting of text-to-image diffusion models and downstream task fine-tuning in language modeling. Additionally, we adapt our construction for orthogonal convolutions and conduct experiments with 1-Lipschitz neural networks.
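
As a loose illustration of the general recipe (grouped orthogonal blocks composed with permutations stay orthogonal), consider the sketch below; the construction and function names are illustrative assumptions, not the paper's exact matrix class:

```python
import numpy as np

def random_orthogonal(k, rng):
    """Sample a k x k orthogonal matrix via QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(k, k)))
    return q

def group_and_shuffle_matrix(n, block, rng):
    """Illustrative structured orthogonal matrix: shuffle, apply per-group
    orthogonal blocks, shuffle again. Products of orthogonal matrices and
    permutations stay orthogonal, so the result is orthogonal by construction."""
    assert n % block == 0
    B = np.zeros((n, n))
    for i in range(n // block):
        B[i*block:(i+1)*block, i*block:(i+1)*block] = random_orthogonal(block, rng)
    P1 = np.eye(n)[rng.permutation(n)]
    P2 = np.eye(n)[rng.permutation(n)]
    return P2 @ B @ P1

rng = np.random.default_rng(0)
M = group_and_shuffle_matrix(8, 4, rng)
print(np.allclose(M.T @ M, np.eye(8)))   # True: M is orthogonal
```

The appeal of such structure is that the small blocks carry far fewer parameters than a dense n x n orthogonal matrix while the permutations mix information across groups.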

Updated: 2024-06-14 13:29:36

标题: 分组和混洗:高效的结构正交参数化

摘要: 神经网络规模的不断增大导致对高效微调方法的需求日益增长。最近,引入了一种使用正交矩阵来调整预训练模型权重的正交微调范式。在本文中,我们介绍了一种新的结构化矩阵类,将之前的工作中的结构化类统一并泛化。我们研究了这一类的属性,并在其基础上构建了一个结构化正交参数化。然后,我们使用这种参数化来修改正交微调框架,提高参数和计算效率。我们在不同领域对我们的方法进行了实证验证,包括调整文本到图像扩散模型和语言建模中的下游任务微调。此外,我们将我们的构造推广到正交卷积,并在1-李普希茨神经网络上进行了实验。

更新时间: 2024-06-14 13:29:36

领域: cs.LG,cs.AI,cs.CL,cs.CV,cs.NA,math.NA

下载: http://arxiv.org/abs/2406.10019v1

Vulnerable Road User Detection and Safety Enhancement: A Comprehensive Survey

Traffic incidents involving vulnerable road users (VRUs) constitute a significant proportion of global road accidents. Advances in traffic communication ecosystems, coupled with sophisticated signal processing and machine learning techniques, have facilitated the utilization of data from diverse sensors. Despite these advancements and the availability of extensive datasets, substantial progress is required to mitigate traffic casualties. This paper provides a comprehensive survey of state-of-the-art technologies and methodologies to enhance the safety of VRUs. The study delves into the communication networks between vehicles and VRUs, emphasizing the integration of advanced sensors and the availability of relevant datasets. It explores preprocessing techniques and data fusion methods to enhance sensor data quality. Furthermore, our study assesses critical simulation environments essential for developing and testing VRU safety systems. Our research also highlights recent advances in VRU detection and classification algorithms, addressing challenges such as variable environmental conditions. Additionally, we cover cutting-edge research in predicting VRU intentions and behaviors, which is crucial for proactive collision avoidance strategies. Through this survey, we aim to provide a comprehensive understanding of the current landscape of VRU safety technologies, identifying areas of progress and areas needing further research and development.

Updated: 2024-06-14 13:28:43

标题: 易受伤害道路用户的检测和安全增强:一项全面调查

摘要: 涉及易受伤者的交通事故构成全球道路事故的重要部分。交通通信生态系统的进步,加上复杂的信号处理和机器学习技术,促进了来自多种传感器的数据的利用。尽管取得了这些进展并且有大量数据集可用,但仍需要大量进展来减少交通伤亡。本文提供了一份全面的调查,介绍了增强易受伤者安全性的最新技术和方法。研究探讨了车辆和易受伤者之间的通信网络,强调了先进传感器的整合和相关数据集的可用性。它探讨了预处理技术和数据融合方法,以提高传感器数据质量。此外,我们的研究评估了对开发和测试易受伤者安全系统至关重要的关键模拟环境。我们的研究还强调了易受伤者检测和分类算法方面的最新进展,解决了变化环境条件等挑战。此外,我们还涵盖了预测易受伤者意图和行为的尖端研究,这对于主动碰撞避免策略至关重要。通过这项调查,我们旨在提供对当前易受伤者安全技术的全面了解,确定进展和需要进一步研究和发展的领域。

更新时间: 2024-06-14 13:28:43

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19202v3

Tilt and Average : Geometric Adjustment of the Last Layer for Recalibration

After the revelation that neural networks tend to produce overconfident predictions, the problem of calibration, which aims to align confidence with accuracy to enhance the reliability of predictions, has gained significant importance. Several solutions based on calibration maps have been proposed to address the problem of recalibrating a trained classifier using additional datasets. In this paper, we offer an algorithm that transforms the weights of the last layer of the classifier, distinct from the calibration-map-based approach. We concentrate on the geometry of the final linear layer, specifically its angular aspect, and adjust the weights of the corresponding layer. We name the method Tilt and Average (TNA), and validate the calibration effect empirically and theoretically. Through this, we demonstrate that our approach, in addition to the existing calibration-map-based techniques, can yield improved calibration performance. Code available: https://github.com/GYYYYYUUUUU/TNA_Angular_Scaling.

Updated: 2024-06-14 13:27:56

标题: 倾斜和平均:最后一层的几何调整用于重新校准

摘要: 在发现神经网络倾向于产生过于自信的预测后,校准问题变得尤为重要,这一问题旨在使置信度与准确性对齐,以增强预测的可靠性。已经提出了几种基于校准映射的解决方案,以解决使用额外数据集重新校准训练过的分类器的问题。在本文中,我们提出了一种算法,通过调整分类器最后一层的权重来解决这一问题,与基于校准映射的方法不同。我们专注于最后线性层的几何结构,特别是其角度方面,并调整相应层的权重。我们将该方法命名为Tilt and Average(TNA),并通过实证和理论验证了校准效果。通过这一过程,我们展示了我们的方法,除了现有的基于校准映射的技术外,还可以产生更好的校准性能。代码可在以下链接找到:https://github.com/GYYYYYUUUUU/TNA_Angular_Scaling。

更新时间: 2024-06-14 13:27:56

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.10017v1

Deobfuscation of Semi-Linear Mixed Boolean-Arithmetic Expressions

Mixed Boolean-Arithmetic (MBA) obfuscation is a common technique used to transform simple expressions into semantically equivalent but more complex combinations of boolean and arithmetic operators. Its widespread usage in DRM systems, malware, and software protectors is well documented. In 2021, Liu et al. proposed a groundbreaking method of simplifying linear MBAs, utilizing a hidden two-way transformation between 1-bit and n-bit variables. In 2022, Reichenwallner et al. proposed a similar but more effective method of simplifying linear MBAs, SiMBA, relying on a similar but more involved theorem. However, because current linear MBA simplifiers operate in 1-bit space, they cannot handle expressions which utilize constants inside of their bitwise operands, e.g. (x&1), (x&1111) + (y&1111). We propose an extension to SiMBA that enables simplification of this broader class of expressions. It surpasses peer tools, achieving efficient simplification of a class of MBAs that current simplifiers struggle with.
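
For context, a linear MBA rewrite replaces a simple expression with an equivalent mix of boolean and arithmetic operators; a textbook instance is x + y = (x ^ y) + 2*(x & y). The snippet below checks this identity exhaustively at 8-bit width (the identity is standard; the checking harness is ours for illustration):

```python
from itertools import product

def obfuscated_add(x, y):
    # A textbook linear MBA rewriting of x + y.
    return (x ^ y) + 2 * (x & y)

# Exhaustive check over all 8-bit pairs, with arithmetic taken mod 2**8.
MASK = (1 << 8) - 1
assert all((obfuscated_add(x, y) & MASK) == ((x + y) & MASK)
           for x, y in product(range(256), repeat=2))
print("identity holds for all 8-bit inputs")
```

The semi-linear class targeted by the paper additionally allows constants inside the bitwise operands, as in (x & 1) or (x & 1111) + (y & 1111), which breaks the 1-bit-space assumption of earlier simplifiers.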

Updated: 2024-06-14 13:27:40

标题: 半线性混合布尔-算术表达式的去混淆

摘要: 混合布尔-算术(MBA)混淆是一种常用的技术,用于将简单表达式转换为语义等效但更复杂的布尔和算术运算符的组合。它在数字版权管理系统、恶意软件和软件保护器中的广泛使用已有充分记录。 2021年,刘等提出了一种突破性的简化线性MBA的方法,利用1位和n位变量之间的隐藏双向转换。2022年,Reichenwallner等人提出了一种类似但更有效的简化线性MBA的方法SiMBA,依赖于一个类似但更复杂的定理。然而,由于当前线性MBA简化器在1位空间中运行,它们无法处理在位运算符内部使用常量的表达式,例如(x&1)、(x&1111) + (y&1111)。 我们提出了SiMBA的一个扩展,使其能够简化这一更广泛类别的表达式。它超越了同行工具,实现了一类当前简化器难以处理的MBA的高效简化。

更新时间: 2024-06-14 13:27:40

领域: cs.CR

下载: http://arxiv.org/abs/2406.10016v1

FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models

We introduce FinTral, a suite of state-of-the-art multimodal large language models (LLMs) built upon the Mistral-7b model and tailored for financial analysis. FinTral integrates textual, numerical, tabular, and image data. We enhance FinTral with domain-specific pretraining, instruction fine-tuning, and RLAIF training by exploiting a large collection of textual and visual datasets we curate for this work. We also introduce an extensive benchmark featuring nine tasks and 25 datasets for evaluation, including hallucinations in the financial domain. Our FinTral model trained with direct preference optimization employing advanced Tools and Retrieval methods, dubbed FinTral-DPO-T&R, demonstrates an exceptional zero-shot performance. It outperforms ChatGPT-3.5 in all tasks and surpasses GPT-4 in five out of nine tasks, marking a significant advancement in AI-driven financial technology. We also demonstrate that FinTral has the potential to excel in real-time analysis and decision-making in diverse financial contexts. The GitHub repository for FinTral is available at https://github.com/UBC-NLP/fintral.

Updated: 2024-06-14 13:26:47

标题: FinTral:一系列GPT-4级别的多模态金融大型语言模型

摘要: 我们介绍了FinTral,这是一套基于Mistral-7b模型构建的最先进的多模态大型语言模型(LLMs),专为金融分析定制。FinTral整合了文本、数字、表格和图像数据。我们通过利用我们为这项工作精心策划的大量文本和视觉数据集,增强了FinTral的领域特定预训练、指令微调和RLAIF训练。我们还引入了一个包含九个任务和25个数据集的广泛基准测试,包括金融领域的幻觉。我们的FinTral模型通过直接偏好优化训练,采用先进的工具和检索方法,被命名为FinTral-DPO-T&R,展现出卓越的零样本表现。在所有任务中,它的表现均优于ChatGPT-3.5,并在九个任务中的五个任务中超越了GPT-4,标志着人工智能驱动的金融技术的重大进步。我们还展示了FinTral在各种金融情境下具有实时分析和决策能力的潜力。FinTral的GitHub存储库可在https://github.com/UBC-NLP/fintral找到。

更新时间: 2024-06-14 13:26:47

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.10986v3

Gradient-based Learning in State-based Potential Games for Self-Learning Production Systems

In this paper, we introduce novel gradient-based optimization methods for state-based potential games (SbPGs) within self-learning distributed production systems. SbPGs are recognised for their efficacy in enabling self-optimizing distributed multi-agent systems and offer a proven convergence guarantee, which facilitates collaborative player efforts towards global objectives. Our study strives to replace conventional ad-hoc random exploration-based learning in SbPGs with contemporary gradient-based approaches, which aim for faster convergence and smoother exploration dynamics, thereby shortening training duration while upholding the efficacy of SbPGs. Moreover, we propose three distinct variants for estimating the objective function of gradient-based learning, each developed to suit the unique characteristics of the systems under consideration. To validate our methodology, we apply it to a laboratory testbed, namely Bulk Good Laboratory Plant, which represents a smart and flexible distributed multi-agent production system. The incorporation of gradient-based learning in SbPGs reduces training times and achieves more optimal policies than its baseline.

Updated: 2024-06-14 13:26:36

标题: 面向自学习生产系统的基于状态势博弈中的梯度学习

摘要: 在本文中,我们介绍了一种针对基于状态的势博弈(SbPGs)的全新基于梯度的优化方法,适用于自学习的分布式生产系统。SbPGs以其在实现自优化的分布式多智能体系统中的高效性而闻名,并提供了经过验证的收敛保证,促进了协作玩家努力实现全局目标。我们的研究旨在用当代基于梯度的方法取代SbPGs中传统的基于随机探索的临时性学习,旨在实现更快的收敛速度和更平滑的探索动态,从而缩短训练时间同时保持SbPGs的高效性。此外,我们提出了三种不同的变体来估计基于梯度学习的目标函数,每种变体都是根据所考虑系统的独特特征而开发的。为了验证我们的方法,我们将其应用于实验室测试平台,即Bulk Good Laboratory Plant,该平台代表了一个智能灵活的分布式多智能体生产系统。将基于梯度学习纳入SbPGs中可以缩短训练时间,并实现比基线更优化的策略。

更新时间: 2024-06-14 13:26:36

领域: cs.LG,cs.AI,cs.GT

下载: http://arxiv.org/abs/2406.10015v1

Beyond Slow Signs in High-fidelity Model Extraction

Deep neural networks, costly to train and rich in intellectual property value, are increasingly threatened by model extraction attacks that compromise their confidentiality. Previous attacks have succeeded in reverse-engineering model parameters up to a precision of float64 for models trained on random data with at most three hidden layers using cryptanalytical techniques. However, the process was identified to be very time consuming and not feasible for larger and deeper models trained on standard benchmarks. Our study evaluates the feasibility of parameter extraction methods of Carlini et al. [1] further enhanced by Canales-Martínez et al. [2] for models trained on standard benchmarks. We introduce a unified codebase that integrates previous methods and reveal that computational tools can significantly influence performance. We develop further optimisations to the end-to-end attack and improve the efficiency of extracting weight signs by up to 14.8 times compared to former methods through the identification of easier and harder to extract neurons. Contrary to prior assumptions, we identify extraction of weights, not extraction of weight signs, as the critical bottleneck. With our improvements, a 16,721 parameter model with 2 hidden layers trained on MNIST is extracted within only 98 minutes compared to at least 150 minutes previously. Finally, addressing methodological deficiencies observed in previous studies, we propose new ways of robust benchmarking for future model extraction attacks.

Updated: 2024-06-14 13:24:07

标题: 高保真模型提取中超越缓慢的权重符号提取

摘要: 深度神经网络在训练成本高、知识产权价值丰富的同时,越来越受到模型提取攻击的威胁,这些攻击危及了它们的保密性。先前的攻击成功地利用密码分析技术对在随机数据上训练的最多三个隐藏层的模型的参数进行了逆向工程,精度达到了float64。然而,这一过程被发现非常耗时,对于在标准基准上训练的更大更深的模型来说并不可行。我们的研究评估了Carlini等人[1]提出、并由Canales-Martínez等人[2]进一步增强的参数提取方法在标准基准训练模型上的可行性。我们引入了一个统一的代码库,集成了以前的方法,并揭示了计算工具可以显著影响性能。我们进一步优化了端到端攻击,并通过识别更容易和更难提取的神经元,将提取权重符号的效率提高了最多14.8倍,与以前的方法相比。与先前的假设相反,我们确定了权重的提取,而不是权重符号的提取,作为关键瓶颈。通过我们的改进,一个在MNIST上训练的具有16,721个参数和2个隐藏层的模型仅在98分钟内提取,而此前至少需要150分钟。最后,针对先前研究中观察到的方法论缺陷,我们提出了未来模型提取攻击的稳健基准测试的新方法。

更新时间: 2024-06-14 13:24:07

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2406.10011v1

Gradient Coding in Decentralized Learning for Evading Stragglers

In this paper, we consider a decentralized learning problem in the presence of stragglers. Although gradient coding techniques have been developed for distributed learning to evade stragglers, where the devices send encoded gradients with redundant training data, it is difficult to apply those techniques directly to decentralized learning scenarios. To deal with this problem, we propose a new gossip-based decentralized learning method with gradient coding (GOCO). In the proposed method, to avoid the negative impact of stragglers, the parameter vectors are updated locally using encoded gradients based on the framework of stochastic gradient coding and then averaged in a gossip-based manner. We analyze the convergence performance of GOCO for strongly convex loss functions. And we also provide simulation results to demonstrate the superiority of the proposed method in terms of learning performance compared with the baseline methods.
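
For background on the coding ingredient, the classical cyclic gradient code (in the style of Tandon et al.) for three workers shows how the full gradient survives any single straggler; this is a background sketch of gradient coding in general, not GOCO itself:

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=(3, 5))            # per-partition gradients g1, g2, g3

# Each worker sends one coded combination (redundancy factor 2):
b = np.stack([g[0] / 2 + g[1],         # worker 1
              g[1] - g[2],             # worker 2
              g[0] / 2 + g[2]])        # worker 3

# Decoding vectors: row s recovers the full gradient when worker s straggles.
A = np.array([[0.0, 1.0, 2.0],         # worker 1 missing
              [1.0, 0.0, 1.0],         # worker 2 missing
              [2.0, -1.0, 0.0]])       # worker 3 missing

full = g.sum(axis=0)
for s in range(3):
    recovered = A[s] @ b               # uses only the two non-stragglers
    assert np.allclose(recovered, full)
print("full gradient recovered under any single straggler")
```

In the decentralized setting of the paper, such encoded gradients drive the local updates, and the parameter vectors are then averaged with neighbors in gossip rounds.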

Updated: 2024-06-14 13:22:31

标题: 分散学习中的梯度编码以规避滞后者

摘要: 在本文中,我们考虑了存在滞后者(stragglers)的情况下的去中心化学习问题。尽管梯度编码技术已经被开发用于分布式学习以规避滞后者,其中设备发送带有冗余训练数据的编码梯度,但直接将这些技术应用于去中心化学习场景是困难的。为了解决这个问题,我们提出了一种新的基于流言(gossip)传播、带有梯度编码的去中心化学习方法(GOCO)。在所提出的方法中,为了避免滞后者的负面影响,参数向量基于随机梯度编码框架使用编码梯度进行本地更新,然后以流言传播的方式进行平均。我们分析了GOCO对于强凸损失函数的收敛性能。我们还提供了模拟结果,以证明所提出的方法在学习性能方面相对于基线方法的优越性。

更新时间: 2024-06-14 13:22:31

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2402.04193v3

Improved Particle Approximation Error for Mean Field Neural Networks

Mean-field Langevin dynamics (MFLD) minimizes an entropy-regularized nonlinear convex functional defined over the space of probability distributions. MFLD has gained attention due to its connection with noisy gradient descent for mean-field two-layer neural networks. Unlike standard Langevin dynamics, the nonlinearity of the objective functional induces particle interactions, necessitating multiple particles to approximate the dynamics in a finite-particle setting. Recent works (Chen et al., 2022; Suzuki et al., 2023b) have demonstrated the uniform-in-time propagation of chaos for MFLD, showing that the gap between the particle system and its mean-field limit uniformly shrinks over time as the number of particles increases. In this work, we improve the dependence on logarithmic Sobolev inequality (LSI) constants in their particle approximation errors, which can exponentially deteriorate with the regularization coefficient. Specifically, we establish an LSI-constant-free particle approximation error concerning the objective gap by leveraging the problem structure in risk minimization. As the application, we demonstrate improved convergence of MFLD, sampling guarantee for the mean-field stationary distribution, and uniform-in-time Wasserstein propagation of chaos in terms of particle complexity.
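
Concretely, the finite-particle MFLD update is noisy gradient descent on the particles, with Gaussian noise scaled by the entropic regularization. A toy sketch for a mean-field two-layer network fitting a 1D target (step size, regularization strength, and architecture are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-2, 2, 64)
Y = np.sin(2 * X)
m = len(X)

N, eta, lam = 512, 0.1, 1e-3           # particles, step size, entropic reg.
a = rng.normal(size=N)                 # output weight of each particle (neuron)
w = rng.normal(size=N)                 # input weight of each particle

def predict(a, w, X):
    # Mean-field two-layer net: f(x) = (1/N) * sum_j a_j * tanh(w_j * x)
    return a @ np.tanh(np.outer(w, X)) / N

for _ in range(3000):
    h = np.tanh(np.outer(w, X))                        # hidden activations (N, m)
    r = a @ h / N - Y                                  # residuals, shape (m,)
    # Gradient of the first variation of the objective at each particle,
    # plus the L2 term: this is the drift of the finite-particle MFLD.
    ga = h @ r / m + lam * a
    gw = (a[:, None] * (1 - h**2) * X[None, :]) @ r / m + lam * w
    # Langevin step: drift plus sqrt(2 * eta * lam) Gaussian noise.
    a += -eta * ga + np.sqrt(2 * eta * lam) * rng.normal(size=N)
    w += -eta * gw + np.sqrt(2 * eta * lam) * rng.normal(size=N)

print(f"final MSE: {np.mean((predict(a, w, X) - Y) ** 2):.4f}")
```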

Updated: 2024-06-14 13:20:06

标题: 均场神经网络的粒子逼近误差改进

摘要: 平均场朗之万动力学(MFLD)最小化定义在概率分布空间上的熵正则化非线性凸泛函。由于与平均场两层神经网络的含噪梯度下降有关,MFLD引起了人们的关注。与标准朗之万动力学不同,目标泛函的非线性引起了粒子之间的相互作用,因此在有限粒子设置下需要多个粒子来近似该动力学。最近的研究(Chen 等,2022年;Suzuki 等,2023b年)证明了 MFLD 的时间一致混沌传播,表明随着粒子数量的增加,粒子系统与其平均场极限之间的差距会随时间一致地缩小。在这项工作中,我们改进了其粒子逼近误差对对数 Sobolev 不等式(LSI)常数的依赖性,这些常数可能随正则化系数呈指数恶化。具体来说,我们通过利用风险最小化中的问题结构,建立了一个关于目标差距的、不依赖 LSI 常数的粒子逼近误差。作为应用,我们展示了 MFLD 的改进收敛性、平均场平稳分布的采样保证,以及粒子复杂度意义下的时间一致 Wasserstein 混沌传播。

更新时间: 2024-06-14 13:20:06

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.15767v2

An elementary proof of a universal approximation theorem

In this short note, we give an elementary proof of a universal approximation theorem for neural networks with three hidden layers and increasing, continuous, bounded activation function. The result is weaker than the best known results, but the proof is elementary in the sense that no machinery beyond undergraduate analysis is used.
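
For reference, the shape of the statement being proven is sketched below in LaTeX; the hypotheses follow the abstract, while the precise formulation and constants are those of the note itself:

```latex
% Statement schema of the result described above; hypotheses follow the
% abstract (three hidden layers; increasing, continuous, bounded activation).
Let $\sigma:\mathbb{R}\to\mathbb{R}$ be increasing, continuous, and bounded,
let $K\subset\mathbb{R}^d$ be compact, and let $f:K\to\mathbb{R}$ be
continuous. Then for every $\varepsilon>0$ there is a feedforward network
$g$ with three hidden layers and activation $\sigma$ such that
\[
  \sup_{x\in K}\,\lvert f(x)-g(x)\rvert \;<\; \varepsilon .
\]
```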

Updated: 2024-06-14 13:16:48

标题: 一个通用逼近定理的初等证明

摘要: 在这个简短的说明中,我们给出了一个对于具有三个隐藏层和递增、连续、有界激活函数的神经网络的通用逼近定理的初等证明。该结果比已知的最佳结果要弱,但证明是初等的,即没有使用本科分析之外的工具。

更新时间: 2024-06-14 13:16:48

领域: cs.LG

下载: http://arxiv.org/abs/2406.10002v1

Relating tSNE and UMAP to Classical Dimensionality Reduction

It has become standard to use gradient-based dimensionality reduction (DR) methods like tSNE and UMAP when explaining what AI models have learned. This makes sense: these methods are fast, robust, and have an uncanny ability to find semantic patterns in high-dimensional data without supervision. Despite this, gradient-based DR methods lack the most important quality that an explainability method should possess: themselves being explainable. That is, given a UMAP output, it is currently unclear what one can say about the corresponding input. We work towards closing this question by relating UMAP to classical DR techniques. Specifically, we show that one can fully recover methods like PCA, MDS, and ISOMAP in the modern DR paradigm: by applying attractions and repulsions onto a randomly initialized dataset. We also show that, with a small change, Locally Linear Embeddings (LLE) can indistinguishably reproduce UMAP outputs. This implies that the UMAP effective objective is minimized by this modified version of LLE (and vice versa). Given this, we discuss what must be true of UMAP embeddings and present avenues for future work.
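
To make the "attractions and repulsions on a randomly initialized dataset" framing concrete, the toy loop below recovers a metric-MDS-style embedding by pulling together pairs placed farther apart than their input distance and pushing apart pairs placed too close; an illustration of the paradigm, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                            # high-dimensional inputs
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)     # pairwise target distances

Y = rng.normal(size=(50, 2))                             # random low-dim initialization
for _ in range(1000):
    diff = Y[:, None] - Y[None, :]                       # (n, n, 2)
    d = np.linalg.norm(diff, axis=-1) + 1e-9
    # Pairs farther apart than in input space attract (coeff > 0 pulls them
    # together under the descent step); pairs too close repel (coeff < 0).
    coeff = (d - D) / d
    Y -= 0.05 * (coeff[:, :, None] * diff).mean(axis=1)  # stress gradient step

stress = np.sum((np.linalg.norm(Y[:, None] - Y[None, :], axis=-1) - D) ** 2)
print(f"final stress: {stress:.2f}")
```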

Updated: 2024-06-14 13:16:00

标题: 将tSNE和UMAP与经典降维方法联系起来

摘要: 在解释AI模型学到了什么时,使用tSNE和UMAP等基于梯度的降维(DR)方法已成为标准做法。这是有道理的:这些方法快速、稳健,并且具有在高维数据中无监督地发现语义模式的神奇能力。尽管如此,基于梯度的降维方法缺乏可解释性方法应具备的最重要品质:即本身具有可解释性。也就是说,给定一个UMAP输出,目前尚不清楚对应输入可以说什么。我们致力于通过将UMAP与经典DR技术联系起来来解决这个问题。具体来说,我们展示了可以在现代DR范式中完全恢复像PCA、MDS和ISOMAP这样的方法:通过将吸引和排斥施加到一个随机初始化的数据集上。我们还展示了,通过微小改变,局部线性嵌入(LLE)可以无法区分地重现UMAP输出。这意味着UMAP的有效目标通过这种修改版本的LLE被最小化(反之亦然)。鉴于此,我们讨论了UMAP嵌入必须符合的条件,并提出了未来工作的方向。

更新时间: 2024-06-14 13:16:00

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2306.11898v2

Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors

While various sensors have been deployed to monitor vehicular flows, sensing pedestrian movement is still nascent. Yet walking is a significant mode of travel in many cities, especially those in Europe, Africa, and Asia. Understanding pedestrian volumes and flows is essential for designing safer and more attractive pedestrian infrastructure and for controlling periodic overcrowding. This study discusses a new approach to scale up urban sensing of people with the help of novel audio-based technology. It assesses the benefits and limitations of microphone-based sensors as compared to other forms of pedestrian sensing. A large-scale dataset called ASPED is presented, which includes high-quality audio recordings along with video recordings used for labeling the pedestrian count data. The baseline analyses highlight the promise of using audio sensors for pedestrian tracking, although algorithmic and technological improvements to make the sensors practically usable continue. This study also demonstrates how the data can be leveraged to predict pedestrian trajectories. Finally, it discusses the use cases and scenarios where audio-based pedestrian sensing can support better urban and transportation planning.

Updated: 2024-06-14 13:15:18

标题: 利用城市感知技术理解行人运动:基于音频的传感器的前景

摘要: 尽管各种传感器已经部署用于监测车辆流量,但监测行人活动仍处于初级阶段。然而,步行在许多城市,特别是欧洲、非洲和亚洲的城市中是一种重要的出行方式。了解行人数量和流量对设计更安全、更具吸引力的行人基础设施以及控制周期性拥挤至关重要。本研究讨论了一种利用新型基于音频技术扩大城市感知人群的方法。它评估了基于麦克风传感器与其他形式的行人感知相比的优缺点。介绍了一个名为ASPED的大规模数据集,其中包括高质量的音频记录以及用于标记行人计数数据的视频记录。基线分析突显了使用音频传感器进行行人跟踪的潜力,尽管算法和技术的改进仍在继续以使传感器实际可用。该研究还展示了如何利用数据来预测行人轨迹。最后,它讨论了基于音频的行人感知可以支持更好的城市和交通规划的用例和场景。

更新时间: 2024-06-14 13:15:18

领域: eess.AS,cs.AI,cs.LG,cs.MM,cs.SD

下载: http://arxiv.org/abs/2406.09998v1

Towards Scalable and Versatile Weight Space Learning

Learning representations of well-trained neural network models holds the promise to provide an understanding of the inner workings of those models. However, previous work has either faced limitations when processing larger networks or was task-specific to either discriminative or generative tasks. This paper introduces the SANE approach to weight-space learning. SANE overcomes previous limitations by learning task-agnostic representations of neural networks that are scalable to larger models of varying architectures and that show capabilities beyond a single task. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights, thus allowing one to embed larger neural networks as a set of tokens into the learned representation space. SANE reveals global model information from layer-wise embeddings, and it can sequentially generate unseen neural network models, which was unattainable with previous hyper-representation learning methods. Extensive empirical evaluation demonstrates that SANE matches or exceeds state-of-the-art performance on several weight representation learning benchmarks, particularly in initialization for new tasks and larger ResNet architectures.

Updated: 2024-06-14 13:12:07

标题: 朝向可扩展和多功能的权重空间学习

摘要: 学习经过良好训练的神经网络模型的表示形式有望理解这些模型的内部运作方式。然而,先前的工作在处理更大的网络时要么面临限制,要么限于区分性或生成性任务。本文介绍了SANE方法来学习权重空间。SANE通过学习可扩展到不同结构的更大模型的任务无关表示形式,克服了先前的限制,并展现出超越单一任务的能力。我们的方法将超级表示法的思想扩展到对神经网络权重子集的顺序处理,从而使人能够将更大的神经网络嵌入到学习表示空间中作为一组令牌。SANE从逐层嵌入中揭示了全局模型信息,并能够顺序生成之前无法实现的神经网络模型,这是以前的超级表示学习方法所无法做到的。大量的实证评估表明,SANE在几个权重表示学习基准测试中与或超过了最先进的性能,特别是在新任务的初始化和更大的ResNet架构中。

更新时间: 2024-06-14 13:12:07

领域: cs.LG

下载: http://arxiv.org/abs/2406.09997v1

A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models

As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-Generated Content (AIGC), the powerful capacity of retrieval in providing additional knowledge enables RAG to assist existing generative AI in producing high-quality outputs. Recently, Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation, while still facing inherent limitations, such as hallucinations and out-of-date internal knowledge. Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, Retrieval-Augmented Large Language Models (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than solely relying on the model's internal knowledge, to augment the generation quality of LLMs. In this survey, we comprehensively review existing research studies in RA-LLMs, covering three primary technical perspectives: architectures, training strategies, and applications. As the preliminary knowledge, we briefly introduce the foundations and recent advances of LLMs. Then, to illustrate the practical significance of RAG for LLMs, we systematically review mainstream relevant work by their architectures, training strategies, and application areas, detailing specifically the challenges of each and the corresponding capabilities of RA-LLMs. Finally, to deliver deeper insights, we discuss current limitations and several promising directions for future research. Updated information about this survey can be found at https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/

Updated: 2024-06-14 13:07:27

标题: RAG与LLM相遇的综述:迈向检索增强的大型语言模型

摘要: 作为人工智能中最先进的技术之一,检索增强生成(RAG)可以提供可靠和最新的外部知识,为许多任务提供巨大便利。特别是在人工智能生成内容(AIGC)时代,检索在提供额外知识方面的强大能力使RAG能够协助现有的生成型人工智能产生高质量的输出。最近,大型语言模型(LLMs)在语言理解和生成方面展示了革命性的能力,但仍面临固有的局限,如幻觉和过时的内部知识。鉴于RAG提供最新和有用辅助信息的强大能力,检索增强大型语言模型(RA-LLMs)已经出现,以利用外部和权威知识库,而不仅仅依赖于模型的内部知识,以增强LLMs的生成质量。在这份调查中,我们全面审查了RA-LLMs中现有研究,涵盖了三个主要技术角度:架构、训练策略和应用。作为初步知识,我们简要介绍了LLMs的基础和最新进展。然后,为了说明RAG对LLMs的实际重要性,我们系统地审查了主流相关工作,通过它们的架构、训练策略和应用领域,具体详细介绍了每个挑战以及RA-LLMs的相应能力。最后,为了提供更深入的见解,我们讨论了目前的限制以及未来研究的几个有希望的方向。关于这项调查的更新信息可以在https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/找到。

更新时间: 2024-06-14 13:07:27

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2405.06211v2

Provably Safe Neural Network Controllers via Differential Dynamic Logic

While neural networks (NNs) have potential as autonomous controllers for Cyber-Physical Systems, verifying the safety of NN based control systems (NNCSs) poses significant challenges for the practical use of NNs, especially when safety is needed for unbounded time horizons. One reason is the intractability of analyzing NNs, ODEs and hybrid systems. To this end, we introduce VerSAILLE (Verifiably Safe AI via Logically Linked Envelopes): The first general approach that allows reusing control theory results for NNCS verification. By joining forces, we exploit the efficiency of NN verification tools while retaining the rigor of differential dynamic logic (dL). Based on provably safe control envelopes in dL, we derive specifications for the NN which is proven via NN verification. We show that a proof of the NN adhering to the specification is mirrored by a dL proof on the infinite-time safety of the NNCS. The NN verification properties resulting from hybrid systems typically contain nonlinear arithmetic and arbitrary logical structures while efficient NN verification merely supports linear constraints. To overcome this divide, we present Mosaic: An efficient, sound and complete verification approach for polynomial real arithmetic properties on piece-wise linear NNs. Mosaic partitions complex verification queries into simple queries and lifts off-the-shelf linear constraint tools to the nonlinear setting in a completeness-preserving manner by combining approximation with exact reasoning for counterexample regions. Our evaluation demonstrates the versatility of VerSAILLE and Mosaic: We prove infinite-time safety on the classical Vertical Airborne Collision Avoidance NNCS verification benchmark for two scenarios while (exhaustively) enumerating counterexample regions in unsafe scenarios. We also show that our approach significantly outperforms State-of-the-Art tools in closed-loop NNV.

Updated: 2024-06-14 13:05:01

标题: 《利用差分动态逻辑实现神经网络控制器的可证安全性》

摘要: 尽管神经网络(NNs)作为信息物理系统(Cyber-Physical Systems)的自主控制器具有潜力,但验证基于神经网络的控制系统(NNCSs)的安全性对于实际应用神经网络而言存在重大挑战,特别是当需要针对无限时间范围进行安全性验证时。一个原因是分析神经网络、ODEs和混合系统的复杂性。为此,我们引入了VerSAILLE(通过逻辑链接的信封可靠地验证人工智能):这是第一个通用方法,允许重复使用控制理论结果用于NNCS验证。通过合作,我们利用了NN验证工具的效率,同时保留了微分动态逻辑(dL)的严谨性。基于dL中的可证安全控制信封,我们推导出NN的规范,通过NN验证来证明。我们表明,NN遵守该规范的证明与NNCS无限时间安全性的dL证明相互对应。 由混合系统导致的NN验证属性通常包含非线性算术和任意逻辑结构,而高效的NN验证仅支持线性约束。为了克服这一障碍,我们提出了Mosaic:一种用于分段线性NN的多项式实数算术性质的高效、可靠且完备的验证方法。Mosaic将复杂的验证查询划分为简单查询,并通过将近似与针对反例区域的精确推理相结合,以保持完备性的方式将现成的线性约束工具提升到非线性设置。我们的评估展示了VerSAILLE和Mosaic的多功能性:我们在经典的垂直空中避撞NNCS验证基准测试中证明了两种情况下的无限时间安全性,同时(全面地)枚举了不安全情况下的反例区域。我们还展示了我们的方法在闭环NNV中显著优于现有技术工具。

更新时间: 2024-06-14 13:05:01

领域: cs.SY,cs.AI,cs.LG,cs.LO

下载: http://arxiv.org/abs/2402.10998v2

Walking Noise: On Layer-Specific Robustness of Neural Architectures against Noisy Computations and Associated Characteristic Learning Dynamics

Deep neural networks are extremely successful in various applications, however they exhibit high computational demands and energy consumption. This is exacerbated by stuttering technology scaling, prompting the need for novel approaches to handle increasingly complex neural architectures. At the same time, alternative computing technologies such as analog computing, which promise groundbreaking improvements in energy efficiency, are inevitably fraught with noise and inaccurate calculations. Such noisy computations are more energy efficient, and, given a fixed power budget, also more time efficient. However, like any kind of unsafe optimization, they require countermeasures to ensure functionally correct results. This work considers noisy computations in an abstract form, and gears to understand the implications of such noise on the accuracy of neural network classifiers as an exemplary workload. We propose a methodology called Walking Noise which injects layer-specific noise to measure the robustness and to provide insights on the learning dynamics. In more detail, we investigate the implications of additive, multiplicative and mixed noise for different classification tasks and model architectures. While noisy training significantly increases robustness for all noise types, we observe in particular that it results in increased weight magnitudes and thus inherently improves the signal-to-noise ratio for additive noise injection. Contrarily, training with multiplicative noise can lead to a form of self-binarization of the model parameters, leading to extreme robustness. We conclude with a discussion of the use of this methodology in practice, among others, discussing its use for tailored multi-execution in noisy environments.
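
A minimal sketch of layer-specific noise injection in the spirit of the described methodology: additive or multiplicative noise is applied to the activations of one chosen layer of a toy MLP while the other layers stay clean, and the perturbation is "walked" through the network layer by layer; sizes and noise levels are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_forward(x, weights, noisy_layer, sigma_add=0.0, sigma_mul=0.0):
    """Forward pass of a small MLP with noise injected at one specific layer,
    which lets one 'walk' the noise through the network layer by layer."""
    h = x
    for i, W in enumerate(weights):
        h = np.maximum(h @ W, 0.0)                 # ReLU layer
        if i == noisy_layer:
            if sigma_mul > 0:                      # multiplicative noise
                h = h * (1.0 + sigma_mul * rng.normal(size=h.shape))
            if sigma_add > 0:                      # additive noise
                h = h + sigma_add * rng.normal(size=h.shape)
    return h

weights = [rng.normal(size=(8, 16)), rng.normal(size=(16, 16)), rng.normal(size=(16, 4))]
x = rng.normal(size=(32, 8))
clean = noisy_forward(x, weights, noisy_layer=-1)   # no layer is perturbed
for layer in range(3):                              # walk the noise through layers
    out = noisy_forward(x, weights, noisy_layer=layer, sigma_add=0.5)
    print(layer, np.mean((out - clean) ** 2))       # layer-wise sensitivity
```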

Updated: 2024-06-14 13:04:54

标题: 行走的噪音:神经架构对嘈杂计算的层特定稳健性及相关特征学习动态

摘要: 深度神经网络在各种应用中非常成功,但它们表现出高计算需求和能源消耗。技术微缩进程的停滞加剧了这一问题,促使需要新颖的方法来处理越来越复杂的神经结构。与此同时,诸如模拟计算之类的替代计算技术承诺在能源效率方面取得突破性改进,但不可避免地存在噪声和不准确的计算。这些嘈杂的计算更加节能,并且在固定功耗预算的情况下,也更省时。然而,像任何不安全的优化一样,它们需要采取措施来确保功能正确的结果。 本文将嘈杂的计算抽象化,并致力于理解这种噪声对神经网络分类器准确性的影响,作为一个典型的工作负载。我们提出了一种称为Walking Noise的方法,通过注入特定于层的噪声来衡量鲁棒性,并提供关于学习动态的见解。更详细地,我们调查了对不同分类任务和模型架构的加法、乘法和混合噪声的影响。虽然嘈杂训练显著增加了所有噪声类型的鲁棒性,但我们特别观察到,它导致了增加的权重幅度,从而在加法噪声注入时从根本上改善了信噪比。相反,使用乘法噪声进行训练可能导致模型参数的一种自我二值化,从而导致极端的鲁棒性。我们最后讨论了这种方法在实践中的应用,其中包括讨论其在嘈杂环境中定制多执行的用途。

更新时间: 2024-06-14 13:04:54

领域: cs.LG,cs.AR,cs.ET

下载: http://arxiv.org/abs/2212.10430v2

GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions

We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections. Most existing approaches predict volumetric density to render multi-view consistent images. By employing volumetric rendering using neural radiance fields, they inherit a key limitation: the generated geometry is noisy and unconstrained, limiting the quality and utility of the output meshes. To address this issue, we propose GeoGen, a new SDF-based 3D generative model trained in an end-to-end manner. Initially, we reinterpret the volumetric density as a Signed Distance Function (SDF). This allows us to introduce useful priors to generate valid meshes. However, those priors prevent the generative model from learning details, limiting the applicability of the method to real-world scenarios. To alleviate that problem, we make the transformation learnable and constrain the rendered depth map to be consistent with the zero-level set of the SDF. Through the lens of adversarial training, we encourage the network to produce higher fidelity details on the output meshes. For evaluation, we introduce a synthetic dataset of human avatars captured from 360-degree camera angles, to overcome the challenges presented by real-world datasets, which often lack 3D consistency and do not cover all camera angles. Our experiments on multiple datasets show that GeoGen produces visually and quantitatively better geometry than the previous generative models based on neural radiance fields.

Updated: 2024-06-14 12:58:05

标题: GeoGen:通过符号距离函数的几何意识生成建模

摘要: 我们引入了一种新的生成方法,用于从单视图集合中合成3D几何形状和图像。大多数现有方法预测体积密度以渲染多视图一致的图像。通过使用神经辐射场进行体积渲染,它们继承了一个关键的限制:生成的几何形状具有噪声和不受约束,限制了输出网格的质量和实用性。为了解决这个问题,我们提出了GeoGen,一种新的基于SDF的3D生成模型,以端到端的方式进行训练。最初,我们将体积密度重新解释为带符号距离函数(SDF)。这使我们能够引入有用的先验信息以生成有效的网格。然而,这些先验条件阻止了生成模型学习细节,限制了该方法对真实世界场景的适用性。为了缓解这个问题,我们使变换可学习,并约束渲染的深度图与SDF的零级集保持一致。通过对抗训练的视角,我们鼓励网络在输出网格上产生更高保真度的细节。为了评估,我们引入了一个从360度摄像机角度捕获的合成人体化身数据集,以克服真实世界数据集所面临的挑战,这些数据集通常缺乏3D一致性并且不涵盖所有摄像机角度。我们在多个数据集上的实验表明,GeoGen比基于神经辐射场的先前生成模型产生了在视觉和数量上更好的几何形状。

更新时间: 2024-06-14 12:58:05

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.04254v3

BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification

Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata which includes the gender and age of patients, type of recording devices, and recording location on the patient's body. Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%. This result validates the effectiveness of leveraging metadata and respiratory sound samples in enhancing RSC performance. Additionally, we investigate the model performance in the case where metadata is partially unavailable, which may occur in real-world clinical setting.

Updated: 2024-06-14 12:57:53

标题: BTS:桥接文本与声音模态以实现元数据辅助的呼吸音分类

摘要: 呼吸声分类(RSC)具有挑战性,主要受患者人口统计信息和录音环境的影响,导致声音具有多样的声学特征。为解决这一问题,我们引入了一种文本-音频多模态模型,利用呼吸声的元数据,为RSC提供有用的补充信息。具体而言,我们使用从声音样本的元数据中提取的自由文本描述来微调预训练的文本-音频多模态模型,其中包括患者的性别和年龄、录音设备类型以及患者身体上的录音位置。我们的方法在ICBHI数据集上实现了最先进的性能,超过先前最佳结果1.17%。这一结果验证了利用元数据和呼吸声样本增强RSC性能的有效性。此外,我们还研究了在部分元数据不可用的情况下模型的性能,这可能在真实临床设置中出现。

更新时间: 2024-06-14 12:57:53

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2406.06786v2

Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning

The state of an object reflects its current status or condition and is important for a robot's task planning and manipulation. However, detecting an object's state and generating a state-sensitive plan for robots is challenging. Recently, pre-trained Large Language Models (LLMs) and Vision-Language Models (VLMs) have shown impressive capabilities in generating plans. However, to the best of our knowledge, there is hardly any investigation on whether LLMs or VLMs can also generate object state-sensitive plans. To study this, we introduce an Object State-Sensitive Agent (OSSA), a task-planning agent empowered by pre-trained neural networks. We propose two methods for OSSA: (i) a modular model consisting of a pre-trained vision processing module (dense captioning model, DCM) and a natural language processing model (LLM), and (ii) a monolithic model consisting only of a VLM. To quantitatively evaluate the performances of the two methods, we use tabletop scenarios where the task is to clear the table. We contribute a multimodal benchmark dataset that takes object states into consideration. Our results show that both methods can be used for object state-sensitive tasks, but the monolithic approach outperforms the modular approach. The code for OSSA is available at https://github.com/Xiao-wen-Sun/OSSA.

Updated: 2024-06-14 12:52:42

标题: 细节至关重要:对象状态敏感的神经机器人任务规划

摘要: 物体的状态反映了其当前的状态或状况,对于机器人的任务规划和操作至关重要。然而,检测物体的状态并为机器人生成一个与状态相关的计划是具有挑战性的。最近,预训练的大型语言模型(LLMs)和视觉语言模型(VLMs)展现出在生成计划方面令人印象深刻的能力。然而,据我们所知,几乎没有对LLMs或VLMs能否生成物体状态敏感计划进行过研究。为了研究这一点,我们引入了一个物体状态敏感代理(OSSA),这是一个由预训练神经网络赋能的任务规划代理。我们提出了两种方法:(i)一个模块化模型,由一个预训练视觉处理模块(密集字幕模型,DCM)和一个自然语言处理模型(LLM)组成;(ii)一个仅由VLM组成的单块模型。为了定量评估这两种方法的性能,我们使用桌面场景,任务是清理桌子。我们贡献了一个多模态基准数据集,考虑了物体的状态。我们的结果显示,这两种方法都可以用于物体状态敏感任务,但单块方法优于模块化方法。OSSA的代码可在https://github.com/Xiao-wen-Sun/OSSA上找到。

更新时间: 2024-06-14 12:52:42

领域: cs.AI,cs.CL,cs.RO

下载: http://arxiv.org/abs/2406.09988v1

Self-Supervised and Few-Shot Learning for Robust Bioaerosol Monitoring

Real-time bioaerosol monitoring is improving the quality of life for people affected by allergies, but it often relies on deep-learning models which pose challenges for widespread adoption. These models are typically trained in a supervised fashion and require considerable effort to produce large amounts of annotated data, an effort that must be repeated for new particles, geographical regions, or measurement systems. In this work, we show that self-supervised learning and few-shot learning can be combined to classify holographic images of bioaerosol particles using a large collection of unlabelled data and only a few examples for each particle type. We first demonstrate that self-supervision on pictures of unidentified particles from ambient air measurements enhances identification even when labelled data is abundant. Most importantly, it greatly improves few-shot classification when only a handful of labelled images are available. Our findings suggest that real-time bioaerosol monitoring workflows can be substantially optimized, and the effort required to adapt models for different situations considerably reduced.

Updated: 2024-06-14 12:48:26

标题: 自监督学习和少样本学习用于稳健的生物气溶胶监测

摘要: 实时生物气溶胶监测正在改善受过敏影响的人们的生活质量,但通常依赖于深度学习模型,这对于广泛采用提出挑战。这些模型通常是以监督方式训练的,并需要大量的标注数据,这需要大量的工作量,必须为新的颗粒、地理区域或测量系统重复这一工作。在这项工作中,我们展示了自监督学习和少样本学习可以结合使用,使用大量未标记数据和每种颗粒类型仅几个示例来对生物气溶胶颗粒的全息图像进行分类。我们首先证明,对来自环境空气测量的未知颗粒的图片进行自监督可以增强识别,即使标记数据丰富。最重要的是,当只有少数标记图像可用时,它极大地改进了少样本分类。我们的发现表明,可以大幅优化实时生物气溶胶监测工作流程,并显著减少为不同情况调整模型所需的工作量。

更新时间: 2024-06-14 12:48:26

领域: cs.LG

下载: http://arxiv.org/abs/2406.09984v1

Challenges in explaining deep learning models for data with biological variation

Much machine learning research progress is based on developing models and evaluating them on a benchmark dataset (e.g., ImageNet for images). However, applying such benchmark-successful methods to real-world data often does not work as expected. This is particularly the case for biological data where we expect variability at multiple time and spatial scales. In this work, we are using grain data and the goal is to detect diseases and damages. Pink fusarium, skinned grains, and other diseases and damages are key factors in setting the price of grains or excluding dangerous grains from food production. Apart from challenges stemming from differences of the data from the standard toy datasets, we also present challenges that need to be overcome when explaining deep learning models. For example, explainability methods have many hyperparameters that can give different results, and the ones published in the papers do not work on dissimilar images. Other challenges are more general: problems with visualization of the explanations and their comparison since the magnitudes of their values differ from method to method. An open fundamental question also is: How to evaluate explanations? It is a non-trivial task because the "ground truth" is usually missing or ill-defined. Also, human annotators may create what they think is an explanation of the task at hand, yet the machine learning model might solve it in a different and perhaps counter-intuitive way. We discuss several of these challenges and evaluate various post-hoc explainability methods on grain data. We focus on robustness, quality of explanations, and similarity to particular "ground truth" annotations made by experts. The goal is to find the methods that overall perform well and could be used in this challenging task. We hope the proposed pipeline will be used as a framework for evaluating explainability methods in specific use cases.

Updated: 2024-06-14 12:44:04

标题: 解释具有生物变异数据的深度学习模型面临的挑战

摘要: 许多机器学习研究进展是基于开发模型并在基准数据集上评估它们(例如,对于图像来说是ImageNet)。然而,将这些基准成功的方法应用于真实世界的数据往往并不像预期的那样有效。这在生物数据中尤为明显,因为我们预期在多个时间和空间尺度上存在变异性。在这项工作中,我们正在使用粮食数据,目标是检测疾病和损害。粉红色镰刀菌、剥皮谷物和其他疾病和损害是确定谷物价格或排除食品生产中危险谷物的关键因素。除了源自数据与标准玩具数据集的差异带来的挑战,我们还提出需要克服的解释深度学习模型的挑战。例如,解释方法有许多可以产生不同结果的超参数,而且论文中发布的方法不适用于不同的图像。其他挑战更一般化:解释的可视化问题以及它们的比较,因为其值的幅度因方法而异。一个重要的基本问题是:如何评估解释?这是一个非平凡的任务,因为“地面真相”通常缺失或定义不清。此外,人类标注者可能会创建他们认为是任务解释的内容,然而机器学习模型可能会以一种不同甚至违反直觉的方式解决问题。我们讨论了这些挑战中的几个,并在粮食数据上评估了各种后期解释方法。我们关注鲁棒性、解释质量以及与专家制定的特定“地面真相”注释的相似性。目标是找到总体表现良好并且可以在这项具有挑战性任务中使用的方法。我们希望所提出的流程将被用作评估特定用例中解释方法的框架。

更新时间: 2024-06-14 12:44:04

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.09981v1

HIRO: Hierarchical Information Retrieval Optimization

Large Language Models (LLMs) excel in natural language tasks but face limitations due to static training datasets, resulting in outdated or contextually shallow responses. Retrieval-Augmented Generation (RAG) addresses this by integrating real-time external knowledge, enhancing model accuracy and credibility, especially for knowledge-intensive tasks. However, RAG-enhanced LLMs struggle with long contexts, causing them to "choke" on information overload, compromising response quality. Recent RAG applications use hierarchical data structures for storing documents, organized at various levels of summarization and information density. In this context, we introduce HIRO (Hierarchical Information Retrieval Optimization), a novel querying approach for RAG applications using hierarchical structures for storing documents. HIRO employs DFS-based recursive similarity score calculation and branch pruning to minimize the context returned to the LLM without informational loss. HIRO outperforms existing querying mechanisms on the NarrativeQA dataset by an absolute performance gain of 10.85%.
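
A sketch of the querying idea as described: depth-first traversal of the document hierarchy with a recursive similarity score, pruning branches whose score falls below a threshold so that only a compact context reaches the LLM. The node structure, the discounted recursion, the toy embedding, and the threshold are all assumptions for illustration:

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Node:
    text: str                                  # summary at this hierarchy level
    children: List["Node"] = field(default_factory=list)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def hiro_retrieve(node, q_emb, embed, threshold=0.3, score=1.0):
    """DFS with a recursive similarity score: a child's score is discounted
    by its parent's, and branches under the threshold are pruned entirely."""
    s = score * cosine(embed(node.text), q_emb)
    if s < threshold:
        return []                              # prune this whole branch
    if not node.children:
        return [(s, node.text)]                # leaf chunk survives
    hits = []
    for child in node.children:
        hits.extend(hiro_retrieve(child, q_emb, embed, threshold, s))
    return hits

# Toy usage with a bag-of-words "embedding" standing in for a real encoder.
VOCAB = ["ship", "storm", "harbor", "market", "price"]
def embed(text):
    return np.array([text.count(w) for w in VOCAB], dtype=float) + 1e-3

tree = Node("ship storm harbor market", [
    Node("ship storm", [Node("the ship sailed into the storm")]),
    Node("market price", [Node("harbor market prices rose")]),
])
print(hiro_retrieve(tree, embed("ship storm"), embed))
```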

Updated: 2024-06-14 12:41:07

标题: HIRO: 分层信息检索优化

摘要: 大型语言模型(LLMs)在自然语言任务中表现出色,但由于静态训练数据集的限制,导致过时或上下文浅薄的响应。检索增强生成(RAG)通过整合实时外部知识来解决这一问题,提高模型的准确性和可信度,特别适用于知识密集型任务。然而,RAG增强的LLMs在处理长上下文时面临困难,导致它们在信息过载时“窒息”,影响响应质量。最近的RAG应用使用分层数据结构来存储文档,以不同层次的总结和信息密度进行组织。在这种情况下,我们引入了HIRO(分层信息检索优化),这是一种新颖的查询方法,用于使用分层结构存储文档的RAG应用。HIRO采用基于DFS的递归相似度得分计算和分支修剪,以最小化返回给LLM的上下文而不丢失信息。HIRO在NarrativeQA数据集上的表现优于现有的查询机制,绝对性能提升了10.85%。

更新时间: 2024-06-14 12:41:07

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2406.09979v1

Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model

Reinforcement learning has demonstrated impressive performance in various challenging problems such as robotics, board games, and classical arcade games. However, its real-world applications can be hindered by the absence of robustness and safety in the learned policies. More specifically, an RL agent that trains in a certain Markov decision process (MDP) often struggles to perform well in nearly identical MDPs. To address this issue, we employ the framework of Robust MDPs (RMDPs) in a model-based setting and introduce a novel learned transition model. Our method specifically incorporates an auxiliary pessimistic model, updated adversarially, to estimate the worst-case MDP within a Kullback-Leibler uncertainty set. In comparison to several existing works, our work does not impose any additional conditions on the training environment, such as the need for a parametric simulator. To test the effectiveness of the proposed pessimistic model in enhancing policy robustness, we integrate it into a practical RL algorithm, called Robust Model-Based Policy Optimization (RMBPO). Our experimental results indicate a notable improvement in policy robustness on high-dimensional MuJoCo control tasks, with the auxiliary model enhancing the performance of the learned policy in distorted MDPs. We further explore the learned deviation between the proposed auxiliary world model and the nominal model, to examine how pessimism is achieved. By learning a pessimistic world model and demonstrating its role in improving policy robustness, our research contributes towards making (model-based) RL more robust.

Updated: 2024-06-14 12:37:08

标题: 带有对抗性辅助模型的鲁棒基于模型的强化学习

摘要: 强化学习在各种具有挑战性的问题中展现出令人印象深刻的表现,比如机器人技术、棋盘游戏和经典街机游戏。然而,其在现实世界中的应用可能会受到学习策略缺乏稳健性和安全性的阻碍。更具体地说,在某个马尔可夫决策过程(MDP)中训练的强化学习代理通常会在几乎相同的MDP中表现不佳。为了解决这个问题,我们在基于模型的设置中采用了Robust MDPs(RMDPs)框架,并引入了一种新颖的学习转移模型。我们的方法特别包括一个辅助悲观模型,通过对抗更新来估计Kullback-Leibler不确定性集合中的最坏情况MDP。与几种现有作品相比,我们的工作不对训练环境施加任何额外条件,比如需要一个参数化模拟器。为了测试所提出的悲观模型在增强策略稳健性方面的有效性,我们将其集成到一个实用的强化学习算法中,称为Robust Model-Based Policy Optimization(RMBPO)。我们的实验结果表明,在高维度MuJoCo控制任务中,策略稳健性得到了显著改善,辅助模型提高了在扭曲的MDP中学到的策略的性能。我们进一步探讨了所提出的辅助世界模型与名义模型之间的学习偏差,以检验悲观主义是如何实现的。通过学习悲观的世界模型并展示其在改善策略稳健性方面的作用,我们的研究有助于使(基于模型的)强化学习更加稳健。

更新时间: 2024-06-14 12:37:08

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.09976v1

Robust Latent Representation Tuning for Image-text Classification

Large models have demonstrated exceptional generalization capabilities in computer vision and natural language processing. Recent efforts have focused on enhancing these models with multimodal processing abilities. However, addressing the challenges posed by scenarios where one modality is absent remains a significant hurdle. In response to this issue, we propose a robust latent representation tuning method for large models. Specifically, our approach introduces a modality latent translation module to maximize the correlation between modalities, resulting in a robust representation. Following this, a newly designed fusion module is employed to facilitate information interaction between the modalities. Within this framework, common semantics are refined during training, and robust performance is achieved even in the absence of one modality. Importantly, our method maintains the frozen state of the image and text foundation models to preserve their capabilities acquired through large-scale pretraining. We conduct experiments on several public datasets, and the results underscore the effectiveness of our proposed method.

Updated: 2024-06-14 12:29:19

标题: 稳健的潜在表示调整用于图像文本分类

摘要: 大型模型在计算机视觉和自然语言处理中展现出了出色的泛化能力。最近的研究工作集中在通过多模态处理能力来增强这些模型。然而,在缺失一个模态的情况下,解决由此带来的挑战仍然是一个重要障碍。针对这个问题,我们提出了一种用于大型模型的稳健的潜在表示调整方法。具体地,我们的方法引入了一个模态潜在翻译模块,以最大化模态之间的相关性,从而产生一个稳健的表示。随后,我们采用了一个新设计的融合模块来促进模态之间的信息交互。在这个框架内,共同的语义在训练过程中被细化,并且即使缺失一个模态,也能够实现稳健的性能。重要的是,我们的方法保持了图像和文本基础模型的冻结状态,以保留它们通过大规模预训练获得的能力。我们在几个公共数据集上进行了实验,结果强调了我们提出方法的有效性。

更新时间: 2024-06-14 12:29:19

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2406.06048v2

Deep learning empowered sensor fusion to improve infant movement classification

There is a recent boom in the development of AI solutions to facilitate and enhance diagnostic procedures for established clinical tools. To assess the integrity of the developing nervous system, the Prechtl general movement assessment (GMA) is recognized for its clinical value in diagnosing neurological impairments in early infancy. GMA has been increasingly augmented through machine learning approaches intending to scale-up its application, circumvent costs in the training of human assessors and further standardize classification of spontaneous motor patterns. Available deep learning tools, all of which are based on single sensor modalities, are however still considerably inferior to that of well-trained human assessors. These approaches are hardly comparable as all models are designed, trained and evaluated on proprietary/silo-data sets. With this study we propose a sensor fusion approach for assessing fidgety movements (FMs) comparing three different sensor modalities (pressure, inertial, and visual sensors). Various combinations and two sensor fusion approaches (late and early fusion) for infant movement classification were tested to evaluate whether a multi-sensor system outperforms single modality assessments. The performance of the three-sensor fusion (classification accuracy of 94.5%) was significantly higher than that of any single modality evaluated, suggesting the sensor fusion approach is a promising avenue for automated classification of infant motor patterns. The development of a robust sensor fusion system may significantly enhance AI-based early recognition of neurofunctions, ultimately facilitating automated early detection of neurodevelopmental conditions.

Updated: 2024-06-14 12:24:54

标题: 深度学习增强的传感器融合以改善婴儿运动分类

摘要: 最近,用于促进和增强现有临床工具诊断流程的人工智能解决方案发展迅速。为了评估发育中的神经系统的完整性,Prechtl一般运动评估(GMA)被认为在早期婴儿诊断神经系统损伤方面具有临床价值。通过机器学习方法不断增强GMA,旨在扩大其应用范围,避免培训人类评估者的成本,并进一步标准化自发运动模式的分类。然而,目前所有基于单一传感器模态的可用深度学习工具仍然远远不及经过良好训练的人类评估者。这些方法很难相互比较,因为所有模型都是在专有/孤立数据集上设计、训练和评估的。通过这项研究,我们提出了一种传感器融合方法,用于评估不安运动(fidgety movements, FMs),比较了三种不同的传感器模态(压力、惯性和视觉传感器)。我们测试了各种组合以及两种传感器融合方法(晚期融合和早期融合)用于婴儿运动分类,以评估多传感器系统是否优于单一模态评估。三传感器融合的性能(94.5%的分类准确率)显著高于任何单一模态评估的性能,表明传感器融合方法是自动分类婴儿运动模式的有希望途径。建立一个强大的传感器融合系统可能显著提升基于人工智能的早期神经功能识别,最终促进神经发育状况的自动早期检测。

更新时间: 2024-06-14 12:24:54

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.09014v2

Adaptive Robust Learning using Latent Bernoulli Variables

We present an adaptive approach for robust learning from corrupted training sets. We identify corrupted and non-corrupted samples with latent Bernoulli variables and thus formulate the learning problem as maximization of the likelihood where latent variables are marginalized. The resulting problem is solved via variational inference, using an efficient Expectation-Maximization based method. The proposed approach improves over the state-of-the-art by automatically inferring the corruption level, while adding minimal computational overhead. We demonstrate our robust learning method and its parameter-free nature on a wide variety of machine learning tasks including online learning and deep learning where it adapts to different levels of noise and maintains high prediction accuracy.
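
As an illustration of the core mechanism, below is a hedged EM sketch with latent Bernoulli corruption indicators, applied to robust linear regression rather than the paper's general variational setting; the two-component noise model and its initial scales are assumptions:

```python
# EM for robust linear regression: a latent Bernoulli variable marks each
# sample as clean or corrupted, and the corruption rate is inferred.
import numpy as np

rng = np.random.default_rng(1)
n, d = 300, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=n)
corrupt = rng.random(n) < 0.2                  # 20% corrupted labels
y[corrupt] += rng.normal(scale=5.0, size=corrupt.sum())

def normal_pdf(r, s):
    return np.exp(-0.5 * (r / s) ** 2) / (s * np.sqrt(2 * np.pi))

eps, s_clean, s_bad = 0.5, 0.5, 3.0            # corruption rate, noise scales
w = np.linalg.lstsq(X, y, rcond=None)[0]
for _ in range(50):
    r = y - X @ w
    # E-step: posterior probability that each sample is clean.
    p_clean = (1 - eps) * normal_pdf(r, s_clean)
    p_bad = eps * normal_pdf(r, s_bad)
    gamma = p_clean / (p_clean + p_bad + 1e-12)
    # M-step: weighted least squares plus closed-form noise updates.
    W = np.diag(gamma)
    w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    r = y - X @ w
    s_clean = np.sqrt((gamma * r ** 2).sum() / gamma.sum())
    s_bad = np.sqrt(((1 - gamma) * r ** 2).sum() / (1 - gamma).sum())
    eps = 1.0 - gamma.mean()

print("inferred corruption rate:", round(eps, 3), "| weights:", np.round(w, 2))
```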

Updated: 2024-06-14 12:19:30

标题: 使用潜在伯努利变量的自适应稳健学习

摘要: 我们提出了一种自适应方法,用于从受损的训练集中进行稳健学习。我们使用潜在的伯努利变量来识别受损和非受损样本,从而将学习问题表述为对潜在变量进行边缘化的似然最大化问题。我们通过变分推断,使用一种高效的基于期望最大化(EM)的方法来求解由此得到的问题。所提出的方法通过自动推断损坏水平来改进最新技术,同时仅增加极小的计算开销。我们在各种机器学习任务上展示了我们的稳健学习方法及其免调参的特性,包括在线学习和深度学习,在这些任务中,该方法能够适应不同水平的噪声并保持高预测准确性。

更新时间: 2024-06-14 12:19:30

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2312.00585v2

Impact of Speech Mode in Automatic Pathological Speech Detection

Automatic pathological speech detection approaches yield promising results in identifying various pathologies. These approaches are typically designed and evaluated for phonetically-controlled speech scenarios, where speakers are prompted to articulate identical phonetic content. While gathering controlled speech recordings can be laborious, spontaneous speech can be conveniently acquired as potential patients navigate their daily routines. Further, spontaneous speech can be valuable in detecting subtle and abstract cues of pathological speech. Nonetheless, the efficacy of automatic pathological speech detection for spontaneous speech remains unexplored. This paper analyzes the influence of speech mode on pathological speech detection approaches, examining two distinct categories of approaches, i.e., classical machine learning and deep learning. Results indicate that classical approaches may struggle to capture pathology-discriminant cues in spontaneous speech. In contrast, deep learning approaches demonstrate superior performance, managing to extract additional cues that were previously inaccessible in non-spontaneous speech.

Updated: 2024-06-14 12:19:18

标题: 语音模式对自动病态语音检测的影响

摘要: 自动病理语音检测方法在识别各种病理方面取得了令人满意的结果。这些方法通常是为音素受控语音场景而设计和评估的,其中发言者被提示表达相同的音素内容。虽然收集受控语音录音可能是费力的,但自发语音可以在潜在患者的日常活动中方便地采集。此外,自发语音在检测病理语音的微妙和抽象线索方面可能是有价值的。然而,自动病理语音检测对自发语音的功效仍未被探讨。本文分析了语音模式对病理语音检测方法的影响,考察了两种不同类别的方法,即传统机器学习和深度学习。结果表明,传统方法可能难以捕捉自发语音中的病理鉴别线索。相反,深度学习方法表现出优越的性能,能够提取在非自发语音中以前无法获取的额外线索。

更新时间: 2024-06-14 12:19:18

领域: cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2406.09968v1

Outlier detection in maritime environments using AIS data and deep recurrent architectures

A methodology based on deep recurrent models for maritime surveillance, over publicly available Automatic Identification System (AIS) data, is presented in this paper. The setup employs a deep Recurrent Neural Network (RNN)-based model, for encoding and reconstructing the observed ships' motion patterns. Our approach is based on a thresholding mechanism, over the calculated errors between observed and reconstructed motion patterns of maritime vessels. Specifically, a deep-learning framework, i.e. an encoder-decoder architecture, is trained using the observed motion patterns, enabling the models to learn and predict the expected trajectory, which is then compared to the actual one. Our models, particularly the bidirectional GRU with recurrent dropouts, showcased superior performance in capturing the temporal dynamics of maritime data, illustrating the potential of deep learning to enhance maritime surveillance capabilities. Our work lays a solid foundation for future research in this domain, highlighting a path toward improved maritime safety through the innovative application of technology.
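
A minimal sketch of the reconstruction-and-threshold mechanism, assuming AIS tracks have been preprocessed into fixed-length windows of four kinematic features; the architecture below is a simplified stand-in for the paper's bidirectional GRU encoder-decoder:

```python
# GRU autoencoder: flag windows whose reconstruction error exceeds a threshold.
import torch
import torch.nn as nn

class TrajAutoencoder(nn.Module):
    def __init__(self, n_feat=4, hidden=32):
        super().__init__()
        self.encoder = nn.GRU(n_feat, hidden, batch_first=True, bidirectional=True)
        self.decoder = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_feat)

    def forward(self, x):                       # x: (batch, time, n_feat)
        _, h = self.encoder(x)                  # h: (2, batch, hidden)
        ctx = torch.cat([h[0], h[1]], dim=-1)   # summary of the whole track
        ctx = ctx.unsqueeze(1).repeat(1, x.size(1), 1)
        out, _ = self.decoder(ctx)
        return self.head(out)

def fit_and_threshold(model, windows, epochs=50, k=3.0):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(windows), windows)
        loss.backward()
        opt.step()
    with torch.no_grad():
        err = ((model(windows) - windows) ** 2).mean(dim=(1, 2))
    return err.mean() + k * err.std()           # flag windows above this

windows = torch.randn(64, 20, 4)                # toy stand-in for AIS windows
tau = fit_and_threshold(TrajAutoencoder(), windows)
print("anomaly threshold:", float(tau))
```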

Updated: 2024-06-14 12:15:15

标题: 在海上环境中使用AIS数据和深度递归结构进行异常值检测

摘要: 本文提出了一种基于深度递归模型的海洋监视方法,利用公开可获得的自动识别系统(AIS)数据。该设置采用了基于深度递归神经网络(RNN)的模型,用于编码和重建观察到的船舶运动模式。我们的方法基于观测到的海上船舶运动模式和重建运动模式之间的计算误差的阈值机制。具体来说,一个深度学习框架,即编码器-解码器架构,通过使用观察到的运动模式进行训练,使模型能够学习和预测预期的轨迹,然后与实际轨迹进行比较。我们的模型,特别是带有循环dropout(recurrent dropout)的双向GRU,在捕捉海洋数据的时间动态方面表现出优越的性能,显示了深度学习提升海洋监视能力的潜力。我们的工作为这一领域的未来研究奠定了坚实基础,强调了通过技术的创新应用改善海上安全的道路。

更新时间: 2024-06-14 12:15:15

领域: cs.LG,cs.AI,68T10

下载: http://arxiv.org/abs/2406.09966v1

H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent

In this study, we introduce a novel adaptive optimizer, H-Fac, which incorporates a factorized approach to momentum and scaling parameters. Our algorithm demonstrates competitive performances on both ResNets and Vision Transformers, while achieving sublinear memory costs through the use of rank-1 parameterizations for moment estimators. We develop our algorithms based on principles derived from Hamiltonian dynamics, providing robust theoretical underpinnings. These optimization algorithms are designed to be both straightforward and adaptable, facilitating easy implementation in diverse settings.
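
The sublinear memory cost comes from the rank-1 parameterization of the moment estimators. Below is a hedged, Adafactor-style sketch of that idea for the second moment only; H-Fac also factorizes momentum and derives its updates from Hamiltonian dynamics, both of which this toy omits:

```python
# Rank-1 factored second-moment scaling: store O(m+n) statistics for an
# m-by-n weight matrix instead of the O(mn) full second moment.
import numpy as np

class FactoredScaler:
    def __init__(self, shape, beta=0.999, eps=1e-8):
        m, n = shape
        self.r = np.zeros(m)          # per-row second-moment estimate
        self.c = np.zeros(n)          # per-column second-moment estimate
        self.beta, self.eps, self.t = beta, eps, 0

    def step(self, w, grad, lr=1e-2):
        self.t += 1
        g2 = grad ** 2
        self.r = self.beta * self.r + (1 - self.beta) * g2.mean(axis=1)
        self.c = self.beta * self.c + (1 - self.beta) * g2.mean(axis=0)
        bc = 1 - self.beta ** self.t  # bias correction for the zero init
        r_hat, c_hat = self.r / bc, self.c / bc
        # Rank-1 reconstruction of the full second-moment matrix.
        v = np.outer(r_hat, c_hat) / max(r_hat.mean(), self.eps)
        return w - lr * grad / (np.sqrt(v) + self.eps)

# Toy usage: minimize ||W - T||^2 for a random target T.
rng = np.random.default_rng(0)
W, T = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
scaler = FactoredScaler(W.shape)
for _ in range(500):
    W = scaler.step(W, 2 * (W - T))
print("final loss:", round(float(((W - T) ** 2).sum()), 4))
```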

Updated: 2024-06-14 12:05:17

标题: H-Fac:使用分解哈密顿下降的内存高效优化

摘要: 在这项研究中,我们介绍了一种新颖的自适应优化器H-Fac,它对动量和缩放参数采用了因子化方法。我们的算法在ResNets和Vision Transformers上表现出有竞争力的性能,并通过对矩估计器采用秩-1参数化实现了亚线性的内存成本。我们基于从哈密顿动力学导出的原则开发算法,提供了坚实的理论基础。这些优化算法旨在既简单又具适应性,便于在不同环境中轻松实现。

更新时间: 2024-06-14 12:05:17

领域: cs.LG

下载: http://arxiv.org/abs/2406.09958v1

Inferring State Machine from the Protocol Implementation via Large Language Model

State machines play a pivotal role in augmenting the efficacy of protocol analyzing to unveil more vulnerabilities. However, the task of inferring state machines from network protocol implementations presents significant challenges. Traditional methods based on dynamic analysis often overlook crucial state transitions due to limited coverage, while static analysis faces difficulties with complex code structures and behaviors. To address these limitations, we propose an innovative state machine inference approach powered by Large Language Models (LLMs). Utilizing text-embedding technology, this method allows LLMs to dissect and analyze the intricacies of protocol implementation code. Through targeted prompt engineering, we systematically identify and infer the underlying state machines. Our evaluation across six protocol implementations demonstrates the method's high efficacy, achieving an accuracy rate exceeding 90% and successfully delineating differences on state machines among various implementations of the same protocol. Importantly, integrating this approach with protocol fuzzing has notably enhanced AFLNet's code coverage by 10% over RFCNLP, showcasing the considerable potential of LLMs in advancing network protocol security analysis. Our proposed method not only marks a significant step forward in accurate state machine inference but also opens new avenues for improving the security and reliability of protocol implementations.

Updated: 2024-06-14 12:03:56

标题: 通过大型语言模型推断协议实现的状态机

摘要: 状态机在增强协议分析的效力方面发挥着关键作用,以揭示更多的漏洞。然而,从网络协议实现中推断状态机的任务面临着重大挑战。基于动态分析的传统方法由于覆盖范围有限,往往会忽略关键的状态转换,而静态分析则在复杂的代码结构和行为面前面临困难。为了解决这些限制,我们提出了一种由大型语言模型(LLMs)提供支持的创新状态机推断方法。利用文本嵌入技术,这种方法允许LLMs剖析和分析协议实现代码的复杂性。通过有针对性的提示工程,我们系统地识别并推断出底层状态机。我们在六个协议实现上进行的评估显示了该方法的高效性,实现了超过90%的准确率,并成功地刻画了同一协议不同实现之间状态机的差异。重要的是,将这种方法与协议模糊测试相结合,使AFLNet的代码覆盖率比RFCNLP显著提高了10%,展示了LLMs在推进网络协议安全分析方面的巨大潜力。我们提出的方法不仅在准确的状态机推断方面迈出了重要的一步,还为改进协议实现的安全性和可靠性打开了新的途径。

更新时间: 2024-06-14 12:03:56

领域: cs.CR

下载: http://arxiv.org/abs/2405.00393v2

Rule Based Learning with Dynamic (Graph) Neural Networks

A common problem of classical neural network architectures is that additional information or expert knowledge cannot be naturally integrated into the learning process. To overcome this limitation, we propose a two-step approach consisting of (1) generating rule functions from knowledge and (2) using these rules to define rule based layers -- a new type of dynamic neural network layer. The focus of this work is on the second step, i.e., rule based layers that are designed to dynamically arrange learnable parameters in the weight matrices and bias vectors depending on the input samples. Indeed, we prove that our approach generalizes classical feed-forward layers such as fully connected and convolutional layers by choosing appropriate rules. As a concrete application we present rule based graph neural networks (RuleGNNs) that overcome some limitations of ordinary graph neural networks. Our experiments show that the predictive performance of RuleGNNs is comparable to state-of-the-art graph classifiers using simple rules based on Weisfeiler-Leman labeling and pattern counting. Moreover, we introduce new synthetic benchmark graph datasets to show how to integrate expert knowledge into RuleGNNs making them more powerful than ordinary graph neural networks.

Updated: 2024-06-14 12:01:18

标题: 基于规则的动态(图)神经网络学习

摘要: 经典神经网络结构的一个常见问题是无法自然地将额外信息或专家知识集成到学习过程中。为了克服这一限制,我们提出了一个两步方法,包括(1)从知识生成规则函数和(2)使用这些规则来定义基于规则的层 - 一种新型动态神经网络层。本文的重点在于第二步,即基于规则的层:这类层根据输入样本,动态地在权重矩阵和偏置向量中排列可学习参数。事实上,我们证明了通过选择适当的规则,我们的方法可以推广全连接层和卷积层等经典前馈层。作为一个具体的应用,我们提出了基于规则的图神经网络(RuleGNNs),克服了普通图神经网络的一些限制。我们的实验表明,使用基于Weisfeiler-Leman标记和模式计数的简单规则,RuleGNNs的预测性能与最先进的图分类器相当。此外,我们引入了新的合成基准图数据集,展示如何将专家知识集成到RuleGNNs中,使它们比普通图神经网络更强大。

更新时间: 2024-06-14 12:01:18

领域: cs.LG

下载: http://arxiv.org/abs/2406.09954v1

DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning

Dual-arm robots offer enhanced versatility and efficiency over single-arm counterparts by enabling concurrent manipulation of multiple objects or cooperative execution of tasks using both arms. However, effectively coordinating the two arms for complex long-horizon tasks remains a significant challenge. Existing task planning methods predominantly focus on single-arm robots or rely on predefined bimanual operations, failing to fully leverage the capabilities of dual-arm systems. To address this limitation, we introduce DAG-Plan, a structured task planning framework tailored for dual-arm robots. DAG-Plan harnesses large language models (LLMs) to decompose intricate tasks into actionable sub-tasks represented as nodes within a directed acyclic graph (DAG). Critically, DAG-Plan dynamically assigns these sub-tasks to the appropriate arm based on real-time environmental observations, enabling parallel and adaptive execution. We evaluate DAG-Plan on the novel Dual-Arm Kitchen Benchmark, comprising 9 sequential tasks with 78 sub-tasks and 26 objects. Extensive experiments demonstrate the superiority of DAG-Plan over directly using LLM to generate plans, achieving nearly 50% higher efficiency compared to the single-arm task planning baseline and nearly double the success rate of the dual-arm task planning baseline.

Updated: 2024-06-14 11:58:51

标题: DAG-Plan: 生成双臂协作规划的有向无环依赖图

摘要: 双臂机器人相较于单臂机器人具有更强的灵活性和效率,能够同时操纵多个物体或用双臂协同执行任务。然而,有效地协调两臂完成复杂的长时程任务仍然是一个重大挑战。现有的任务规划方法主要集中在单臂机器人上,或依赖于预定义的双臂操作,未能充分发挥双臂系统的能力。为了解决这一局限性,我们引入了DAG-Plan,这是一个专为双臂机器人量身定制的结构化任务规划框架。DAG-Plan利用大型语言模型(LLMs)将复杂任务分解为可操作的子任务,表示为有向无环图(DAG)中的节点。关键是,DAG-Plan根据实时环境观察动态地将这些子任务分配给适当的手臂,实现并行和自适应执行。我们在新颖的双臂厨房基准测试中评估了DAG-Plan,该基准测试包括9个顺序任务,涉及78个子任务和26个物体。广泛的实验表明,DAG-Plan优于直接使用LLM生成计划:其效率比单臂任务规划基线高出近50%,成功率接近双臂任务规划基线的两倍。

更新时间: 2024-06-14 11:58:51

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2406.09953v1

BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval

Existing Vision-Language Compositionality (VLC) benchmarks like SugarCrepe are formulated as image-to-text retrieval problems, where, given an image, the models need to select between the correct textual description and a synthetic hard negative text. In this work we present the Bidirectional Vision-Language Compositionality (BiVLC) dataset. The novelty of BiVLC is to add a synthetic hard negative image generated from the synthetic text, resulting in two image-to-text retrieval examples (one for each image) and, more importantly, two text-to-image retrieval examples (one for each text). Human annotators filter out ill-formed examples ensuring the validity of the benchmark. The experiments on BiVLC uncover a weakness of current multimodal models, as they perform poorly in the text-to-image direction. In fact, when considering both retrieval directions, the conclusions obtained in previous works change significantly. In addition to the benchmark, we show that a contrastive model trained using synthetic images and texts improves the state of the art in SugarCrepe and in BiVLC for both retrieval directions. The gap to human performance in BiVLC confirms that Vision-Language Compositionality is still a challenging problem. BiVLC and code are available at https://imirandam.github.io/BiVLC_project_page.

Updated: 2024-06-14 11:58:49

标题: BiVLC:利用文本到图像检索扩展视觉-语言组合性评估

摘要: 现有的视觉-语言组合性(VLC)基准(如SugarCrepe)被表述为图像到文本检索问题:给定一幅图像,模型需要在正确的文本描述和一个合成的困难负例文本之间进行选择。在这项工作中,我们提出了双向视觉-语言组合性(BiVLC)数据集。BiVLC的新颖之处在于添加了一张由合成文本生成的合成困难负例图像,从而产生两个图像到文本检索示例(每幅图像一个),更重要的是,还产生了两个文本到图像检索示例(每条文本一个)。人类标注员过滤掉格式不良的示例,确保基准的有效性。在BiVLC上的实验揭示了当前多模态模型的一个弱点:它们在文本到图像方向上表现不佳。事实上,当同时考虑两个检索方向时,之前研究得出的结论发生了显著变化。除了基准之外,我们还展示了使用合成图像和文本训练的对比模型在SugarCrepe和BiVLC的两个检索方向上都改进了最新技术。BiVLC中与人类表现之间的差距证实了视觉-语言组合性仍然是一个具有挑战性的问题。BiVLC和代码可在https://imirandam.github.io/BiVLC_project_page上获得。

更新时间: 2024-06-14 11:58:49

领域: cs.CV,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.09952v1

Secure Aggregation is Not Private Against Membership Inference Attacks

Secure aggregation (SecAgg) is a commonly-used privacy-enhancing mechanism in federated learning, affording the server access only to the aggregate of model updates while safeguarding the confidentiality of individual updates. Despite widespread claims regarding SecAgg's privacy-preserving capabilities, a formal analysis of its privacy is lacking, making such presumptions unjustified. In this paper, we delve into the privacy implications of SecAgg by treating it as a local differential privacy (LDP) mechanism for each local update. We design a simple attack wherein an adversarial server seeks to discern which update vector a client submitted, out of two possible ones, in a single training round of federated learning under SecAgg. By conducting privacy auditing, we assess the success probability of this attack and quantify the LDP guarantees provided by SecAgg. Our numerical results unveil that, contrary to prevailing claims, SecAgg offers weak privacy against membership inference attacks even in a single training round. Indeed, it is difficult to hide a local update by adding other independent local updates when the updates are of high dimension. Our findings underscore the imperative for additional privacy-enhancing mechanisms, such as noise injection, in federated learning.
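
The distinguishing attack is easy to simulate. In the toy model below (an assumption-laden stand-in for the paper's auditing setup), the server knows the target's two candidate update vectors, observes only the aggregate, and guesses with a nearest-candidate rule; as the abstract notes, hiding a high-dimensional update behind other independent updates is hard, and the success rate indeed climbs with dimension:

```python
# Membership-style inference against an idealized secure-aggregation round.
import numpy as np

def attack_success(dim, n_others, trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        u0, u1 = rng.normal(size=(2, dim))   # known candidate updates
        b = rng.integers(0, 2)               # target's secret choice
        # Sum of n_others standard-normal updates ~ N(0, n_others * I).
        others = rng.normal(scale=np.sqrt(n_others), size=dim)
        agg = (u1 if b else u0) + others     # all that SecAgg reveals
        # Pick the candidate whose residual has the smaller norm.
        guess = int(np.linalg.norm(agg - u1) < np.linalg.norm(agg - u0))
        hits += guess == b
    return hits / trials

for dim in (10, 100, 1000, 10000):
    print(f"dim={dim:>6}: attack success rate {attack_success(dim, n_others=9):.3f}")
```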

Updated: 2024-06-14 11:57:53

标题: 安全聚合对成员推断攻击并不私密

摘要: Secure aggregation(SecAgg)是联邦学习中常用的隐私增强机制,它使服务器仅能访问模型更新的汇总,同时保护个体更新的保密性。尽管关于SecAgg隐私保护能力的广泛声明,但对其隐私的形式分析尚不足,从而使这些假设变得不合理。本文通过将SecAgg视为每个本地更新的局部差分隐私(LDP)机制,深入探讨了SecAgg的隐私影响。我们设计了一种简单的攻击方式,其中对手服务器试图在SecAgg下的单一训练轮次中辨别客户端提交的两个可能更新向量中的哪一个。通过进行隐私审计,我们评估了这种攻击的成功概率,并量化了SecAgg提供的LDP保证。我们的数值结果揭示,与普遍声明相反,即使在单一训练轮次中,SecAgg也对成员推断攻击提供了较弱的隐私保护。事实上,在更新具有高维度时,通过添加其他独立的本地更新来隐藏一个本地更新是困难的。我们的研究结果强调了在联邦学习中需要额外的隐私增强机制,例如噪声注入。

更新时间: 2024-06-14 11:57:53

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2403.17775v2

PID: Prompt-Independent Data Protection Against Latent Diffusion Models

The few-shot fine-tuning of Latent Diffusion Models (LDMs) has enabled them to grasp new concepts from a limited number of images. However, given the vast amount of personal images accessible online, this capability raises critical concerns about civil privacy. While several previous defense methods have been developed to prevent such misuse of LDMs, they typically assume that the textual prompts used by data protectors exactly match those employed by data exploiters. In this paper, we first empirically demonstrate that breaking this assumption, i.e., in cases where discrepancies exist between the textual conditions used by protectors and exploiters, could substantially reduce the effectiveness of these defenses. Furthermore, considering the visual encoder's independence from textual prompts, we delve into the visual encoder and thoroughly investigate how manipulating the visual encoder affects the few-shot fine-tuning process of LDMs. Drawing on these insights, we propose a simple yet effective method called \textbf{Prompt-Independent Defense (PID)} to safeguard privacy against LDMs. We show that PID can act as a strong privacy shield on its own while requiring significantly less computational power. We believe our studies, along with the comprehensive understanding and new defense method, provide a notable advance toward reliable data protection against LDMs.

Updated: 2024-06-14 11:56:42

标题: PID:针对潜在扩散模型的提示无关数据保护

摘要: 潜在扩散模型(LDM)的少样本微调使其能够从有限数量的图像中掌握新概念。然而,鉴于在线可获取的大量个人图像,这种能力引发了对公民隐私的重要担忧。尽管此前已经开发了几种防御方法来防止LDM的这类滥用,但它们通常假设数据保护者使用的文本提示与数据利用者使用的文本提示完全匹配。在本文中,我们首先通过实证证明,打破这一假设,即在保护者和利用者使用的文本条件存在差异的情况下,这些防御的有效性可能会大大降低。此外,考虑到视觉编码器与文本提示的独立性,我们深入研究视觉编码器,并全面考察操纵视觉编码器如何影响LDM的少样本微调过程。基于这些见解,我们提出了一种简单而有效的方法,称为\textbf{Prompt-Independent Defense(PID)},以保护隐私免受LDM的侵害。我们展示了PID本身就可以作为一个强大的隐私屏障,同时所需算力显著更少。我们相信我们的研究、由此带来的全面理解以及新的防御方法,为可靠地抵御LDM的数据保护迈出了重要一步。

更新时间: 2024-06-14 11:56:42

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2406.15305v1

Neural Concept Binder

The challenge in object-based visual reasoning lies in generating descriptive yet distinct concept representations. Moreover, doing this in an unsupervised fashion requires human users to understand a model's learned concepts and potentially revise false concepts. In addressing this challenge, we introduce the Neural Concept Binder, a new framework for deriving discrete concept representations resulting in what we term "concept-slot encodings". These encodings leverage both "soft binding" via object-centric block-slot encodings and "hard binding" via retrieval-based inference. The Neural Concept Binder facilitates straightforward concept inspection and direct integration of external knowledge, such as human input or insights from other AI models like GPT-4. Additionally, we demonstrate that incorporating the hard binding mechanism does not compromise performance; instead, it enables seamless integration into both neural and symbolic modules for intricate reasoning tasks, as evidenced by evaluations on our newly introduced CLEVR-Sudoku dataset.

Updated: 2024-06-14 11:52:09

标题: 神经概念绑定器

摘要: 基于对象的视觉推理的挑战在于生成描述性但又彼此区分的概念表示。此外,在无监督的情况下做到这一点需要人类用户理解模型学习的概念,并可能修订错误的概念。为了解决这一挑战,我们引入了神经概念绑定器,这是一个用于推导离散概念表示的新框架,从而产生我们所称的“概念槽编码”。这些编码既通过以对象为中心的块-槽编码实现“软绑定”,也通过基于检索的推理实现“硬绑定”。神经概念绑定器便于直接检查概念,并可直接整合外部知识,例如人类输入或来自其他AI模型(如GPT-4)的见解。此外,我们证明了引入硬绑定机制并不会影响性能;相反,它能够无缝集成到神经和符号模块中,用于复杂的推理任务,这一点在我们新引入的CLEVR-Sudoku数据集上的评估中得到了证明。

更新时间: 2024-06-14 11:52:09

领域: cs.AI,cs.LG,cs.SC

下载: http://arxiv.org/abs/2406.09949v1

NoiseNCA: Noisy Seed Improves Spatio-Temporal Continuity of Neural Cellular Automata

Neural Cellular Automata (NCA) is a class of Cellular Automata where the update rule is parameterized by a neural network that can be trained using gradient descent. In this paper, we focus on NCA models used for texture synthesis, where the update rule is inspired by partial differential equations (PDEs) describing reaction-diffusion systems. To train the NCA model, the spatio-temporal domain is discretized, and Euler integration is used to numerically simulate the PDE. However, whether a trained NCA truly learns the continuous dynamic described by the corresponding PDE or merely overfits the discretization used in training remains an open question. We study NCA models at the limit where space-time discretization approaches continuity. We find that existing NCA models tend to overfit the training discretization, especially in the proximity of the initial condition, also called "seed". To address this, we propose a solution that utilizes uniform noise as the initial condition. We demonstrate the effectiveness of our approach in preserving the consistency of NCA dynamics across a wide range of spatio-temporal granularities. Our improved NCA model enables two new test-time interactions by allowing continuous control over the speed of pattern formation and the scale of the synthesized patterns. We demonstrate this new NCA feature in our interactive online demo. Our work reveals that NCA models can learn continuous dynamics and opens new venues for NCA research from a dynamical system's perspective.

Updated: 2024-06-14 11:48:51

标题: NoiseNCA:噪声种子改善神经细胞自动机的时空连续性

摘要: 神经细胞自动机(NCA)是一类细胞自动机,其更新规则由可以使用梯度下降进行训练的神经网络参数化。本文重点研究了用于纹理合成的NCA模型,其中更新规则受到描述反应扩散系统的偏微分方程(PDEs)启发。为了训练NCA模型,将时空域离散化,并使用欧拉积分来数值模拟PDE。然而,训练后的NCA是否真正学习了相应PDE描述的连续动态,还是仅仅过拟合了训练中使用的离散化,仍然是一个开放问题。我们研究了时空离散化趋于连续极限时的NCA模型。我们发现现有的NCA模型倾向于过拟合训练时的离散化,特别是在初始条件(也称为“种子”)附近。为了解决这个问题,我们提出了一种利用均匀噪声作为初始条件的解决方案。我们展示了我们的方法在保持NCA动态在各种时空粒度范围内的一致性方面的有效性。我们改进的NCA模型支持两种新的测试时交互:对模式形成速度和合成模式尺度的连续控制。我们在交互式在线演示中展示了这一新的NCA特性。我们的工作揭示了NCA模型可以学习连续动态,并从动力系统的角度为NCA研究开辟了新的途径。

更新时间: 2024-06-14 11:48:51

领域: cs.CV,cs.AI,cs.GR,cs.MA

下载: http://arxiv.org/abs/2404.06279v3

Finite-Time Analysis of Simultaneous Double Q-learning

$Q$-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias in the $Q$-learning update. To address this issue, double $Q$-learning employs two independent $Q$-estimators which are randomly selected and updated during the learning process. This paper proposes a modified double $Q$-learning, called simultaneous double $Q$-learning (SDQ), with its finite-time analysis. SDQ eliminates the need for random selection between the two $Q$-estimators, and this modification allows us to analyze double $Q$-learning through the lens of a novel switching system framework facilitating efficient finite-time analysis. Empirical studies demonstrate that SDQ converges faster than double $Q$-learning while retaining the ability to mitigate the maximization bias. Finally, we derive a finite-time expected error bound for SDQ.
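
To illustrate the structural difference from vanilla double Q-learning, here is a tabular sketch in which both tables are updated at every step with cross-referenced targets instead of randomly updating one of them; the chain MDP and hyperparameters are illustrative, and the paper's exact SDQ update may differ in detail:

```python
# Simultaneous double Q-learning on a toy chain MDP.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha = 6, 2, 0.95, 0.1

def env_step(s, a):
    """Action 1 moves right (reward 1 at the end); action 0 resets to 0."""
    if a == 0:
        return 0, 0.0, False
    if s == n_states - 1:
        return 0, 1.0, True
    return s + 1, 0.0, False

QA = np.zeros((n_states, n_actions))
QB = np.zeros((n_states, n_actions))
s = 0
for _ in range(20000):
    q = QA[s] + QB[s]                       # act on the combined estimate
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(np.argmax(q))
    s2, r, done = env_step(s, a)
    # Both tables update every step; each evaluates the other's argmax.
    a_star_A = int(np.argmax(QA[s2]))
    a_star_B = int(np.argmax(QB[s2]))
    QA[s, a] += alpha * (r + gamma * QB[s2, a_star_A] * (not done) - QA[s, a])
    QB[s, a] += alpha * (r + gamma * QA[s2, a_star_B] * (not done) - QB[s, a])
    s = 0 if done else s2

print("greedy policy:", np.argmax(QA + QB, axis=1))   # expect all 1s (move right)
```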

Updated: 2024-06-14 11:47:25

标题: 同时双Q学习(Simultaneous Double Q-learning)的有限时间分析

摘要: $Q$-learning是最基本的强化学习算法之一。尽管在各种应用中取得了广泛的成功,但它容易出现$Q$-learning更新中的过度估计偏差。为了解决这个问题,双$Q$-learning采用两个独立的$Q$估计器,在学习过程中随机选择并更新。本文提出了一种改进的双$Q$-learning,称为同时双$Q$-learning(SDQ),并进行了有限时间分析。SDQ消除了在两个$Q$估计器之间随机选择的需要,这种修改使我们能够通过新颖的切换系统框架来分析双$Q$-learning,从而促进了有效的有限时间分析。经验研究表明,SDQ比双$Q$-learning收敛更快,同时保留了减轻最大化偏差的能力。最后,我们为SDQ推导了有限时间的预期误差界。

更新时间: 2024-06-14 11:47:25

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2406.09946v1

An Empirical Study Into What Matters for Calibrating Vision-Language Models

Vision-Language Models (VLMs) have emerged as the dominant approach for zero-shot recognition, adept at handling diverse scenarios and significant distribution changes. However, their deployment in risk-sensitive areas requires a deeper understanding of their uncertainty estimation capabilities, a relatively uncharted area. In this study, we explore the calibration properties of VLMs across different architectures, datasets, and training strategies. In particular, we analyze the uncertainty estimation performance of VLMs when calibrated in one domain, label set or hierarchy level, and tested in a different one. Our findings reveal that while VLMs are not inherently calibrated for uncertainty, temperature scaling significantly and consistently improves calibration, even across shifts in distribution and changes in label set. Moreover, VLMs can be calibrated with a very small set of examples. Through detailed experimentation, we highlight the potential applications and importance of our insights, aiming for more reliable and effective use of VLMs in critical, real-world scenarios.
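
Temperature scaling itself is a one-parameter post-hoc fix and is easy to sketch. The logits below are toy stand-ins for VLM image-text similarity scores, and the injected label noise is only there to give the fit a finite optimum:

```python
# Fit a single temperature T on held-out (logits, labels), then reuse it.
import torch

def fit_temperature(logits, labels, iters=100):
    log_t = torch.zeros(1, requires_grad=True)   # optimize log T, so T > 0
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=iters)
    nll = torch.nn.CrossEntropyLoss()

    def closure():
        opt.zero_grad()
        loss = nll(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

torch.manual_seed(0)
val_logits = 3.0 * torch.randn(500, 10)          # toy, overconfident logits
val_labels = torch.where(torch.rand(500) < 0.7,  # 70% "correct", 30% noise
                         val_logits.argmax(dim=1),
                         torch.randint(0, 10, (500,)))
T = fit_temperature(val_logits, val_labels)
print(f"fitted temperature: {T:.2f}")
calibrated = val_logits / T                      # apply the same T at test time
```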

Updated: 2024-06-14 11:40:31

标题: 一个关于校准视觉-语言模型重要因素的实证研究

摘要: 视觉-语言模型(VLMs)已经成为零样本识别的主要方法,能够处理多种情景和显著的分布变化。然而,在风险敏感领域部署它们需要更深入地了解它们的不确定性估计能力,这是一个相对未被探索的领域。在这项研究中,我们探讨了VLMs在不同体系结构、数据集和训练策略下的校准特性。特别是,我们分析了VLMs在某一领域、标签集或层次级别上校准后,在另一领域、标签集或层次级别上测试时的不确定性估计性能。我们的研究结果显示,虽然VLMs本身并未针对不确定性进行校准,但温度缩放能够显著且一致地改善校准,即使在分布变化和标签集变化的情况下也是如此。此外,VLMs只需非常少量的示例即可完成校准。通过详细的实验,我们强调了这些见解的潜在应用和重要性,旨在使VLMs在关键的现实场景中得到更可靠和有效的使用。

更新时间: 2024-06-14 11:40:31

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.07417v2

Bayesian uncertainty-weighted loss for improved generalisability on polyp segmentation task

While several previous studies have devised methods for segmentation of polyps, most of these methods are not rigorously assessed on multi-center datasets. Variability due to appearance of polyps from one center to another, difference in endoscopic instrument grades, and acquisition quality result in methods with good performance on in-distribution test data, and poor performance on out-of-distribution or underrepresented samples. Unfair models have serious implications and pose a critical challenge to clinical applications. We adapt an implicit bias mitigation method which leverages Bayesian predictive uncertainties during training to encourage the model to focus on underrepresented sample regions. We demonstrate the potential of this approach to improve generalisability without sacrificing state-of-the-art performance on a challenging multi-center polyp segmentation dataset (PolypGen) with different centers and image modalities.
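
A hedged sketch of the weighting idea follows: per-sample predictive entropy from MC-dropout forward passes up-weights the loss on uncertain (often underrepresented) samples. The paper's exact Bayesian weighting scheme may differ; the toy classifier and the 1 + entropy weight are assumptions:

```python
# Uncertainty-weighted loss via MC-dropout predictive entropy.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                      nn.Dropout(0.5), nn.Linear(64, 2))

def mc_entropy(model, x, n_samples=10):
    model.train()                        # keep dropout active for MC sampling
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=1) for _ in range(n_samples)])
    p = probs.mean(dim=0)                # predictive distribution per sample
    return -(p * p.clamp_min(1e-8).log()).sum(dim=1)

def uncertainty_weighted_loss(model, x, y):
    w = 1.0 + mc_entropy(model, x)       # higher entropy -> larger weight
    ce = nn.functional.cross_entropy(model(x), y, reduction="none")
    return (w * ce).mean()

x, y = torch.randn(32, 16), torch.randint(0, 2, (32,))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = uncertainty_weighted_loss(model, x, y)
loss.backward()
opt.step()
print("weighted loss:", float(loss))
```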

Updated: 2024-06-14 11:39:01

标题: 贝叶斯不确定性加权损失在息肉分割任务上提高泛化能力

摘要: 尽管先前的一些研究已经提出了用于息肉分割的方法,但这些方法大多未在多中心数据集上进行严格评估。不同中心之间息肉外观的差异、内窥镜仪器等级的不同以及采集质量的差异,导致许多方法在分布内测试数据上表现良好,而在分布外或代表性不足的样本上表现不佳。不公平的模型会带来严重的影响,并对临床应用构成重大挑战。我们采用了一种隐式偏差缓解方法,在训练过程中利用贝叶斯预测不确定性来鼓励模型关注代表性不足的样本区域。我们展示了这种方法在改善泛化能力方面的潜力,同时在具有不同中心和图像模态、具有挑战性的多中心息肉分割数据集(PolypGen)上保持最先进的性能。

更新时间: 2024-06-14 11:39:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2309.06807v2

Implementing engrams from a machine learning perspective: XOR as a basic motif

We have previously presented the idea of how complex multimodal information could be represented in our brains in a compressed form, following mechanisms similar to those employed in machine learning tools, like autoencoders. In this short comment note we reflect, mainly with a didactical purpose, upon the basic question for a biological implementation: what could be the mechanism working as a loss function, and how it could be connected to a neuronal network providing the required feedback to build a simple training configuration. We present our initial ideas based on a basic motif that implements an XOR switch, using few excitatory and inhibitory neurons. Such motif is guided by a principle of homeostasis, and it implements a loss function that could provide feedback to other neuronal structures, establishing a control system. We analyse the presence of this XOR motif in the connectome of C.Elegans, and indicate the relationship with the well-known lateral inhibition motif. We then explore how to build a basic biological neuronal structure with learning capacity integrating this XOR motif. Guided by the computational analogy, we show an initial example that indicates the feasibility of this approach, applied to learning binary sequences, like it is the case for simple melodies. In summary, we provide didactical examples exploring the parallelism between biological and computational learning mechanisms, identifying basic motifs and training procedures, and how an engram encoding a melody could be built using a simple recurrent network involving both excitatory and inhibitory neurons.
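
The XOR motif itself is small enough to write down directly. In the toy below, an inhibitory unit fires only when both excitatory inputs are active and suppresses the output; the weights and thresholds are illustrative choices, not values fitted to the C.Elegans connectome:

```python
# XOR from two excitatory inputs, one inhibitory interneuron, one output.
def xor_motif(a, b):
    inhib = int(a + b >= 2)              # inhibitory unit: fires iff both inputs fire
    out = int(a + b - 2 * inhib >= 1)    # output: excitation minus inhibition
    return out

for a in (0, 1):
    for b in (0, 1):
        print(f"{a} XOR {b} = {xor_motif(a, b)}")
```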

Updated: 2024-06-14 11:36:49

标题: 从机器学习的角度实现记忆印迹(engram):XOR作为一种基本模式

摘要: 我们先前提出了一个想法,即复杂的多模态信息如何以压缩形式在我们的大脑中表示,遵循类似于机器学习工具(如自动编码器)所采用的机制。在这篇简短的评论中,我们主要以教学为目的,反思一个生物实现的基本问题:作为损失函数的机制可能是什么,以及如何将其与神经网络连接起来,提供必要的反馈来构建一个简单的训练配置。我们基于一个实施XOR开关的基本模式提出了我们的初步想法,该模式使用少量兴奋和抑制性神经元。这种模式是基于稳态原理的,并且实施了一个损失函数,该函数可以向其他神经结构提供反馈,建立一个控制系统。我们分析了C.Elegans的连接组中此XOR模式的存在,并指出了与著名的侧抑制模式之间的关系。然后,我们探讨如何构建一个具有学习能力的基本生物神经结构,其中集成了这种XOR模式。在计算类比的指导下,我们展示了一个初步示例,表明这种方法的可行性,应用于学习二进制序列,例如简单旋律。总之,我们提供了教学示例,探索生物和计算学习机制之间的平行关系,识别基本模式和训练程序,以及如何使用既包括兴奋性又包括抑制性神经元的简单循环网络构建编码旋律的印记。

更新时间: 2024-06-14 11:36:49

领域: q-bio.NC,cs.AI,cs.NE

下载: http://arxiv.org/abs/2406.09940v1

Experiments in News Bias Detection with Pre-Trained Neural Transformers

The World Wide Web provides unrivalled access to information globally, including factual news reporting and commentary. However, state actors and commercial players increasingly spread biased (distorted) or fake (non-factual) information to promote their agendas. We compare several large, pre-trained language models on the task of sentence-level news bias detection and sub-type classification, providing quantitative and qualitative results.

Updated: 2024-06-14 11:34:36

标题: 使用预训练神经变换器检测新闻偏见的实验

摘要: 全球网络为全球信息提供了无与伦比的访问,包括事实新闻报道和评论。然而,国家行为者和商业参与者越来越多地传播有偏见(扭曲)或虚假(非事实)信息,以推动他们的议程。我们比较了几种大型、预训练的语言模型在句子级新闻偏见检测和子类型分类任务上的表现,提供了定量和定性结果。

更新时间: 2024-06-14 11:34:36

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.09938v1

Next-Generation Simulation Illuminates Scientific Problems of Organised Complexity

As artificial intelligence becomes increasingly prevalent in scientific research, data-driven methodologies appear to overshadow traditional approaches in resolving scientific problems. In this Perspective, we revisit a classic classification of scientific problems and acknowledge that a series of unresolved problems remain. Throughout the history of researching scientific problems, scientists have continuously formed new paradigms facilitated by advances in data, algorithms, and computational power. To better tackle unresolved problems, especially those of organised complexity, a novel paradigm is necessitated. While recognising that the strengths of new paradigms have expanded the scope of resolvable scientific problems, we are aware that the continued advancement of data, algorithms, and computational power alone is hardly enough to bring a new paradigm. We posit that the integration of paradigms, which capitalises on the strengths of each, represents a promising approach. Specifically, we focus on next-generation simulation (NGS), which can serve as a platform to integrate methods from different paradigms. We propose a methodology, sophisticated behavioural simulation (SBS), to realise it. SBS represents a higher level of paradigm integration based on foundational models to simulate complex systems, such as social systems involving sophisticated human strategies and behaviours. NGS extends beyond the capabilities of traditional mathematical modelling simulations and agent-based modelling simulations, and therefore, positions itself as a potential solution to problems of organised complexity in complex systems.

Updated: 2024-06-14 11:31:43

标题: 下一代模拟揭示了有组织复杂性的科学问题

摘要: 随着人工智能在科学研究中日益普及,数据驱动的方法似乎正在取代传统方法来解决科学问题。在这篇观点文章中,我们重新审视了科学问题的经典分类,并承认一系列未解决的问题仍然存在。在研究科学问题的历史中,科学家们不断形成新的范例,这得益于数据、算法和计算能力的进步。为了更好地解决未解决的问题,特别是那些有组织复杂性的问题,需要一种新的范例。虽然我们认识到新范例的优势扩大了可解决的科学问题的范围,但我们意识到仅仅依靠数据、算法和计算能力的持续进步还不足以带来新的范例。我们认为整合范例,充分发挥各自的优势,是一种有希望的方法。具体来说,我们关注下一代模拟(NGS),它可以作为整合不同范例方法的平台。我们提出了一种方法论,即复杂行为模拟(SBS),来实现它。SBS代表了基于基础模型的更高级别的范例整合,用于模拟复杂系统,例如涉及复杂人类策略和行为的社会系统。NGS超越了传统数学建模模拟和基于代理的建模模拟的能力,因此,它定位为复杂系统中有组织复杂性问题的潜在解决方案。

更新时间: 2024-06-14 11:31:43

领域: cs.AI

下载: http://arxiv.org/abs/2401.09851v4

Forgetting Order of Continual Learning: Examples That are Learned First are Forgotten Last

Catastrophic forgetting poses a significant challenge in continual learning, where models often forget previous tasks when trained on new data. Our empirical analysis reveals a strong correlation between catastrophic forgetting and the learning speed of examples: examples learned early are rarely forgotten, while those learned later are more susceptible to forgetting. We demonstrate that replay-based continual learning methods can leverage this phenomenon by focusing on mid-learned examples for rehearsal. We introduce Goldilocks, a novel replay buffer sampling method that filters out examples learned too quickly or too slowly, keeping those learned at an intermediate speed. Goldilocks improves existing continual learning algorithms, leading to state-of-the-art performance across several image classification tasks.
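
The selection rule is simple to sketch: rank examples by how quickly they were learned (for instance, the epoch at which each was first classified correctly), drop both tails, and fill the replay buffer from the middle. The quantile cutoffs below are assumptions, not the paper's tuned values:

```python
# Goldilocks-style replay sampling: keep mid-learned examples only.
import numpy as np

def goldilocks_buffer(learned_epoch, buffer_size, q_low=0.2, q_high=0.8, seed=0):
    """learned_epoch[i] = epoch at which example i was first learned."""
    rng = np.random.default_rng(seed)
    lo, hi = np.quantile(learned_epoch, [q_low, q_high])
    mid = np.flatnonzero((learned_epoch >= lo) & (learned_epoch <= hi))
    return rng.choice(mid, size=min(buffer_size, mid.size), replace=False)

rng = np.random.default_rng(1)
learned_epoch = rng.integers(1, 51, size=10_000)   # toy per-example statistics
buffer = goldilocks_buffer(learned_epoch, buffer_size=500)
print("buffer size:", buffer.size,
      "| epochs kept:", learned_epoch[buffer].min(), "-", learned_epoch[buffer].max())
```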

Updated: 2024-06-14 11:31:12

标题: 持续学习中的遗忘顺序:首先学习的例子最后被遗忘

摘要: 灾难性遗忘在持续学习中构成了一个重要挑战,模型经常在训练新数据时忘记先前的任务。我们的实证分析揭示了灾难性遗忘与示例学习速度之间的强相关性:早期学习的示例很少被遗忘,而后期学习的示例更容易被遗忘。我们证明基于重播的持续学习方法可以利用这一现象,通过关注中期学习的示例进行排练。我们引入了Goldilocks,一种新颖的重播缓冲区采样方法,可以过滤出学习过快或过慢的示例,保留那些以中等速度学习的示例。Goldilocks改进了现有的持续学习算法,在多个图像分类任务中取得了最先进的性能。

更新时间: 2024-06-14 11:31:12

领域: cs.LG

下载: http://arxiv.org/abs/2406.09935v1

Beyond Gut Feel: Using Time Series Transformers to Find Investment Gems

This paper addresses the growing application of data-driven approaches within the Private Equity (PE) industry, particularly in sourcing investment targets (i.e., companies) for Venture Capital (VC) and Growth Capital (GC). We present a comprehensive review of the relevant approaches and propose a novel approach leveraging a Transformer-based Multivariate Time Series Classifier (TMTSC) for predicting the success likelihood of any candidate company. The objective of our research is to optimize sourcing performance for VC and GC investments by formally defining the sourcing problem as a multivariate time series classification task. We consecutively introduce the key components of our implementation which collectively contribute to the successful application of TMTSC in VC/GC sourcing: input features, model architecture, optimization target, and investor-centric data processing. Our extensive experiments on two real-world investment tasks, benchmarked towards three popular baselines, demonstrate the effectiveness of our approach in improving decision making within the VC and GC industry.

Updated: 2024-06-14 11:30:25

标题: 超越直觉:利用时间序列变换器发现投资宝石

摘要: 本文讨论了数据驱动方法在私募股权(PE)行业中日益广泛的应用,特别是在为风险投资(VC)和成长资本(GC)寻源投资标的(即公司)方面。我们对相关方法进行了全面的综述,并提出了一种新颖的方法,利用基于Transformer的多变量时间序列分类器(TMTSC)来预测任何候选公司的成功可能性。我们研究的目标是通过将寻源问题正式定义为多变量时间序列分类任务,来优化VC和GC投资的寻源表现。我们依次介绍了我们实现中的关键组件,这些组件共同促成了TMTSC在VC/GC寻源中的成功应用:输入特征、模型架构、优化目标和以投资者为中心的数据处理。我们在两个真实投资任务上进行了广泛的实验,并与三个流行的基线进行了对比,展示了我们的方法在改善VC和GC行业决策方面的有效性。

更新时间: 2024-06-14 11:30:25

领域: cs.LG,cs.AI,cs.CE,q-fin.PM,91B84 (Primary) 68T07 (Secondary),I.2.6; I.2.1; H.4.0

下载: http://arxiv.org/abs/2309.16888v3

What Does it Take to Generalize SER Model Across Datasets? A Comprehensive Benchmark

Speech emotion recognition (SER) is essential for enhancing human-computer interaction in speech-based applications. Despite improvements in specific emotional datasets, there is still a research gap in SER's capability to generalize across real-world situations. In this paper, we investigate approaches to generalize the SER system across different emotion datasets. In particular, we incorporate 11 emotional speech datasets and illustrate a comprehensive benchmark on the SER task. We also address the challenge of imbalanced data distribution using over-sampling methods when combining SER datasets for training. Furthermore, we explore various evaluation protocols to assess how well SER systems generalize. Building on this, we explore the potential of Whisper for SER, emphasizing the importance of thorough evaluation. Our approach is designed to advance SER technology by integrating speaker-independent methods.

Updated: 2024-06-14 11:27:19

标题: 跨数据集泛化SER模型需要什么?一个全面的基准测试

摘要: 语音情感识别(SER)对于提升基于语音的应用中的人机交互至关重要。尽管在特定情感数据集上有所改进,但SER在真实世界情境中的泛化能力仍存在研究空白。本文研究了让SER系统跨不同情感数据集泛化的方法。具体来说,我们整合了11个情感语音数据集,并在SER任务上给出了全面的基准测试。此外,在合并SER数据集进行训练时,我们采用过采样方法来应对数据分布不平衡的挑战。我们还探索了多种评估协议,以考察SER系统的泛化能力。在此基础上,我们研究了Whisper在SER中的潜力,强调了全面评估的重要性。我们的方法旨在通过集成与说话者无关的方法推进SER技术。

更新时间: 2024-06-14 11:27:19

领域: cs.SD,cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2406.09933v1

SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redundant feature extraction when processing high-dimensional microimage data. We propose a novel fine-grained classification model, SCKansformer, for bone marrow blood cells, which addresses these challenges and enhances classification accuracy and efficiency. The model integrates the Kansformer Encoder, SCConv Encoder, and Global-Local Attention Encoder. The Kansformer Encoder replaces the traditional MLP layer with the KAN, improving nonlinear feature representation and interpretability. The SCConv Encoder, with its Spatial and Channel Reconstruction Units, enhances feature representation and reduces redundancy. The Global-Local Attention Encoder combines Multi-head Self-Attention with a Local Part module to capture both global and local features. We validated our model using the Bone Marrow Blood Cell Fine-Grained Classification Dataset (BMCD-FGCD), comprising over 10,000 samples and nearly 40 classifications, developed with a partner hospital. Comparative experiments on our private dataset, as well as the publicly available PBC and ALL-IDB datasets, demonstrate that SCKansformer outperforms both typical and advanced microcell classification methods across all datasets. Our source code and private BMCD-FGCD dataset are available at https://github.com/JustlfC03/SCKansformer.

Updated: 2024-06-14 11:25:53

标题: SCKansformer:基于Kansformer骨干网络和分层注意力机制的骨髓细胞细粒度分类

摘要: 急性白血病等恶性肿瘤的发病率和死亡率显著上升。在临床上,医院依赖对外周血和骨髓涂片进行细胞学检查来诊断恶性肿瘤,准确的血细胞计数至关重要。现有的自动化方法在处理高维显微图像数据时面临挑战,如特征表达能力低、可解释性差以及特征提取冗余。我们提出了一种新颖的骨髓血细胞细粒度分类模型SCKansformer,以解决这些挑战并提高分类准确性和效率。该模型集成了Kansformer编码器、SCConv编码器和全局-局部注意力编码器。Kansformer编码器用KAN替代了传统的MLP层,改善了非线性特征表示和可解释性。SCConv编码器通过其空间和通道重构单元增强特征表示并减少冗余。全局-局部注意力编码器结合了多头自注意力和局部(Local Part)模块,以捕捉全局和局部特征。我们使用与一家合作医院共同开发的骨髓血细胞细粒度分类数据集(BMCD-FGCD)验证了模型,该数据集包括超过10,000个样本和近40个类别。在我们的私有数据集以及公开可用的PBC和ALL-IDB数据集上的对比实验表明,SCKansformer在所有数据集上均优于典型和先进的微细胞分类方法。我们的源代码和私有BMCD-FGCD数据集可在https://github.com/JustlfC03/SCKansformer 上获得。

更新时间: 2024-06-14 11:25:53

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.09931v1

LimGen: Probing the LLMs for Generating Suggestive Limitations of Research Papers

Examining limitations is a crucial step in the scholarly research reviewing process, revealing aspects where a study might lack decisiveness or require enhancement. This aids readers in considering broader implications for further research. In this article, we present a novel and challenging task of Suggestive Limitation Generation (SLG) for research papers. We compile a dataset called \textbf{\textit{LimGen}}, encompassing 4068 research papers and their associated limitations from the ACL anthology. We investigate several approaches to harness large language models (LLMs) for producing suggestive limitations, by thoroughly examining the related challenges, practical insights, and potential opportunities. Our LimGen dataset and code can be accessed at \url{https://github.com/arbmf/LimGen}.

Updated: 2024-06-14 11:19:26

标题: LimGen: 探究LLMs以生成研究论文的建议性限制

摘要: 检查限制是学术研究审阅过程中的一个关键步骤,揭示了研究可能缺乏决断力或需要改进的方面。这有助于读者考虑进一步研究的更广泛影响。在本文中,我们提出了一个新颖且具有挑战性的任务,即研究论文的建议性限制生成(SLG)。我们编制了一个名为LimGen的数据集,包括4068篇来自ACL文集的研究论文及其相关的限制。我们研究了利用大型语言模型(LLMs)生成建议性限制的几种方法,通过彻底研究相关挑战、实用见解和潜在机会。我们的LimGen数据集和代码可以在\url{https://github.com/arbmf/LimGen} 上访问。

更新时间: 2024-06-14 11:19:26

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2403.15529v2

Personalized Speech Enhancement Without a Separate Speaker Embedding Model

Personalized speech enhancement (PSE) models can improve the audio quality of teleconferencing systems by adapting to the characteristics of a speaker's voice. However, most existing methods require a separate speaker embedding model to extract a vector representation of the speaker from enrollment audio, which adds complexity to the training and deployment process. We propose to use the internal representation of the PSE model itself as the speaker embedding, thereby avoiding the need for a separate model. We show that our approach performs equally well or better than the standard method of using a pre-trained speaker embedding model on noise suppression and echo cancellation tasks. Moreover, our approach surpasses the ICASSP 2023 Deep Noise Suppression Challenge winner by 0.15 in Mean Opinion Score.

Updated: 2024-06-14 11:16:46

标题: 个性化语音增强无需单独的说话者嵌入模型

摘要: 个性化语音增强(PSE)模型可以通过适应说话者的声音特征来提高电话会议系统的音频质量。然而,大多数现有方法需要一个单独的说话者嵌入模型,从注册音频中提取说话者的向量表示,这增加了训练和部署过程的复杂性。我们建议使用PSE模型本身的内部表示作为说话者嵌入,从而避免对单独模型的需求。我们展示了在噪声抑制和回声消除任务上,我们的方法与使用预训练说话者嵌入模型的标准方法相比表现相当甚至更好。此外,我们的方法在平均意见得分(MOS)上比ICASSP 2023深度噪声抑制挑战赛的冠军高出0.15分。

更新时间: 2024-06-14 11:16:46

领域: cs.SD,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2406.09928v1

POWN: Prototypical Open-World Node Classification

We consider the problem of \textit{true} open-world semi-supervised node classification, in which nodes in a graph either belong to known or new classes, with the latter not present during training. Existing methods detect and reject new classes but fail to distinguish between different new classes. We adapt existing methods and show they do not solve the problem sufficiently. We introduce a novel end-to-end approach for classification into known classes and new classes based on class prototypes, which we call Prototypical Open-World Learning for Node Classification (POWN). Our method combines graph semi-supervised learning, self-supervised learning, and pseudo-labeling to learn prototype representations of new classes in a zero-shot way. In contrast to existing solutions from the vision domain, POWN does not require data augmentation techniques for node classification. Experiments on benchmark datasets demonstrate the effectiveness of POWN, where it outperforms baselines by up to $20\%$ accuracy on the small and up to $30\%$ on the large datasets. Source code is available at https://github.com/Bobowner/POWN.
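
A much-simplified sketch of the prototype mechanism: known classes get prototypes from labeled embeddings, unlabeled nodes near a prototype are pseudo-labeled, and nodes far from every prototype seed a new class. POWN itself learns the embeddings with graph semi- and self-supervision; the distance threshold and toy embeddings here are assumptions:

```python
# Prototype-based open-world assignment in embedding space.
import numpy as np

def open_world_assign(emb_l, y_l, emb_u, tau=1.5):
    protos = [emb_l[y_l == c].mean(axis=0) for c in np.unique(y_l)]
    labels = []
    for z in emb_u:
        d = [np.linalg.norm(z - p) for p in protos]
        if min(d) > tau:                 # far from everything: new class
            protos.append(z.copy())      # seed a fresh prototype
            labels.append(len(protos) - 1)
        else:
            labels.append(int(np.argmin(d)))
    return np.array(labels), protos

rng = np.random.default_rng(0)
emb_l = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(3, 0.2, (20, 2))])
y_l = np.array([0] * 20 + [1] * 20)      # two known classes
emb_u = np.vstack([rng.normal(0, 0.2, (10, 2)), rng.normal(-4, 0.2, (10, 2))])
labels, protos = open_world_assign(emb_l, y_l, emb_u)
print("assigned classes:", sorted(set(labels)), "| #prototypes:", len(protos))
```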

Updated: 2024-06-14 11:14:01

标题: POWN:典型的开放世界节点分类

摘要: 我们考虑真正的开放世界半监督节点分类问题,其中图中的节点要么属于已知类别,要么属于新类别,后者在训练过程中不存在。现有方法可以检测和拒绝新类别,但无法区分不同的新类别。我们改编了现有方法,并表明它们不能充分解决这个问题。我们引入了一种基于类别原型、可对已知类别和新类别进行分类的新型端到端方法,称为面向节点分类的原型开放世界学习(POWN)。我们的方法结合了图半监督学习、自监督学习和伪标注,以零样本(zero-shot)的方式学习新类别的原型表示。与视觉领域的现有解决方案不同,POWN不需要数据增强技术来进行节点分类。基准数据集上的实验表明了POWN的有效性:在小型数据集上其准确率比基线最多高出20%,在大型数据集上最多高出30%。源代码可在https://github.com/Bobowner/POWN 上找到。

更新时间: 2024-06-14 11:14:01

领域: cs.LG

下载: http://arxiv.org/abs/2406.09926v1

Fundamental operating regimes, hyper-parameter fine-tuning and glassiness: towards an interpretable replica-theory for trained restricted Boltzmann machines

We consider restricted Boltzmann machines with a binary visible layer and a Gaussian hidden layer trained by an unlabelled dataset composed of noisy realizations of a single ground pattern. We develop a statistical mechanics framework to describe the network generative capabilities, by exploiting the replica trick and assuming self-averaging of the underlying order parameters (i.e., replica symmetry). In particular, we outline the effective control parameters (e.g., the relative number of weights to be trained, the regularization parameter), whose tuning can yield qualitatively-different operative regimes. Further, we provide analytical and numerical evidence for the existence of a sub-region in the space of the hyperparameters where replica-symmetry breaking occurs.

Updated: 2024-06-14 11:12:00

标题: 基本工作区域、超参数微调与玻璃性:迈向训练后受限玻尔兹曼机的可解释复本理论

摘要: 我们考虑具有二进制可见层和高斯隐藏层的受限玻尔兹曼机,其训练数据是由单一基础模式的含噪实现组成的无标签数据集。我们开发了一个统计力学框架来描述网络的生成能力,利用复本技巧(replica trick)并假设基础序参量的自平均性(即复本对称)。特别地,我们刻画了有效控制参数(例如,待训练权重的相对数量、正则化参数),其调节可以产生定性不同的工作区域。此外,我们提供了分析和数值证据,证明在超参数空间中存在一个发生复本对称性破缺的子区域。

更新时间: 2024-06-14 11:12:00

领域: cond-mat.dis-nn,cond-mat.stat-mech,cs.LG

下载: http://arxiv.org/abs/2406.09924v1

CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions

The integration of Artificial Intelligence (AI), especially Large Language Models (LLMs), into the clinical diagnosis process offers significant potential to improve the efficiency and accessibility of medical care. While LLMs have shown some promise in the medical domain, their application in clinical diagnosis remains underexplored, especially in real-world clinical practice, where highly sophisticated, patient-specific decisions need to be made. Current evaluations of LLMs in this field are often narrow in scope, focusing on specific diseases or specialties and employing simplified diagnostic tasks. To bridge this gap, we introduce CliBench, a novel benchmark developed from the MIMIC IV dataset, offering a comprehensive and realistic assessment of LLMs' capabilities in clinical diagnosis. This benchmark not only covers diagnoses from a diverse range of medical cases across various specialties but also incorporates tasks of clinical significance: treatment procedure identification, lab test ordering and medication prescriptions. Supported by structured output ontologies, CliBench enables a precise and multi-granular evaluation, offering an in-depth understanding of LLM's capability on diverse clinical tasks of desired granularity. We conduct a zero-shot evaluation of leading LLMs to assess their proficiency in clinical decision-making. Our preliminary results shed light on the potential and limitations of current LLMs in clinical settings, providing valuable insights for future advancements in LLM-powered healthcare.

Updated: 2024-06-14 11:10:17

标题: CliBench:对临床决策中大型语言模型在诊断、程序、实验室检测订单和处方方面的多方面评估

摘要: 将人工智能(AI),尤其是大型语言模型(LLMs),整合到临床诊断过程中,为提高医疗服务的效率和可及性提供了巨大潜力。虽然LLMs在医疗领域显示出一定潜力,但它们在临床诊断中的应用仍未得到充分探索,特别是在现实世界的临床实践中,需要做出高度复杂、针对患者的决策。目前该领域对LLMs的评估往往范围狭窄,专注于特定疾病或专科,并采用简化的诊断任务。为了弥补这一差距,我们引入了CliBench,这是一个基于MIMIC IV数据集开发的新型基准,提供了对LLMs临床诊断能力的全面而现实的评估。该基准不仅涵盖各专科多种医疗案例的诊断,还包括具有临床意义的任务:治疗程序识别、实验室检测开单和药物处方。借助结构化输出本体,CliBench使精确且多粒度的评估成为可能,帮助我们深入理解LLM在所需粒度的多样临床任务上的能力。我们对领先的LLMs进行了零样本评估,以评估它们在临床决策中的熟练程度。我们的初步结果揭示了当前LLMs在临床环境中的潜力和局限性,为未来LLM驱动的医疗保健的进步提供了宝贵的见解。

更新时间: 2024-06-14 11:10:17

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.09923v1

When Graph Neural Network Meets Causality: Opportunities, Methodologies and An Outlook

Graph Neural Networks (GNNs) have emerged as powerful representation learning tools for capturing complex dependencies within diverse graph-structured data. Despite their success in a wide range of graph mining tasks, GNNs have raised serious concerns regarding their trustworthiness, including susceptibility to distribution shift, biases towards certain populations, and lack of explainability. Recently, integrating causal learning techniques into GNNs has sparked numerous ground-breaking studies since many GNN trustworthiness issues can be alleviated by capturing the underlying data causality rather than superficial correlations. In this survey, we comprehensively review recent research efforts on Causality-Inspired GNNs (CIGNNs). Specifically, we first employ causal tools to analyze the primary trustworthiness risks of existing GNNs, underscoring the necessity for GNNs to comprehend the causal mechanisms within graph data. Moreover, we introduce a taxonomy of CIGNNs based on the type of causal learning capability they are equipped with, i.e., causal reasoning and causal representation learning. Besides, we systematically introduce typical methods within each category and discuss how they mitigate trustworthiness risks. Finally, we summarize useful resources and discuss several future directions, hoping to shed light on new research opportunities in this emerging field. The representative papers, along with open-source data and codes, are available in https://github.com/usail-hkust/Causality-Inspired-GNNs.

Updated: 2024-06-14 11:08:54

标题: 当图神经网络遇到因果关系:机会、方法论和展望

摘要: 图神经网络(GNNs)已经成为强大的表示学习工具,用于捕获各种图结构数据中的复杂依赖关系。尽管它们在各种图挖掘任务中取得了成功,但GNNs的可信度引发了严重关注,包括对分布偏移的敏感性、对某些人群的偏见以及缺乏可解释性。最近,将因果学习技术整合到GNNs中已经催生了许多开创性研究,因为许多GNN可信度问题可以通过捕获数据的潜在因果关系而非表面相关性来缓解。在本综述中,我们全面回顾了最近关于因果启发的GNNs(CIGNNs)的研究工作。具体而言,我们首先使用因果工具分析现有GNNs的主要可信度风险,强调GNNs理解图数据中因果机制的必要性。此外,我们根据它们所具备的因果学习能力类型,即因果推理和因果表示学习,提出了CIGNNs的分类法。随后,我们系统地介绍了每个类别中的典型方法,并讨论它们如何缓解可信度风险。最后,我们总结了有用的资源,并讨论了几个未来方向,希望为这一新兴领域的新研究机会提供启示。代表性论文以及开源数据和代码可在https://github.com/usail-hkust/Causality-Inspired-GNNs上找到。

更新时间: 2024-06-14 11:08:54

领域: cs.LG,cs.AI,stat.ME

下载: http://arxiv.org/abs/2312.12477v2

Immunohistochemistry guided segmentation of benign epithelial cells, in situ lesions, and invasive epithelial cells in breast cancer slides

Digital pathology enables automatic analysis of histopathological sections using artificial intelligence (AI). Automatic evaluation could improve diagnostic efficiency and help find associations between morphological features and clinical outcome. For development of such prediction models, identifying invasive epithelial cells, and separating these from benign epithelial cells and in situ lesions would be the first step. In this study, we aimed to develop an AI model for segmentation of epithelial cells in sections from breast cancer. We generated epithelial ground truth masks by restaining hematoxylin and eosin (HE) sections with cytokeratin (CK) AE1/AE3, and by pathologists' annotations. HE/CK image pairs were used to train a convolutional neural network, and data augmentation was used to make the model more robust. Tissue microarrays (TMAs) from 839 patients, and whole slide images from two patients were used for training and evaluation of the models. The sections were derived from four cohorts of breast cancer patients. TMAs from 21 patients from a fifth cohort was used as a second test set. In quantitative evaluation, a mean Dice score of 0.70, 0.79, and 0.75 for invasive epithelial cells, benign epithelial cells, and in situ lesions, respectively, were achieved. In qualitative scoring (0-5) by pathologists, results were best for all epithelium and invasive epithelium, with scores of 4.7 and 4.4. Scores for benign epithelium and in situ lesions were 3.7 and 2.0. The proposed model segmented epithelial cells in HE stained breast cancer slides well, but further work is needed for accurate division between the classes. Immunohistochemistry, together with pathologists' annotations, enabled the creation of accurate ground truths. The model is made freely available in FastPathology and the code is available at https://github.com/AICAN-Research/breast-epithelium-segmentation

Updated: 2024-06-14 11:04:12

标题: 免疫组织化学引导下的乳腺癌组织切片中良性上皮细胞、原位病变和浸润性上皮细胞的分割

摘要: 数字病理学利用人工智能(AI)实现对组织病理切片的自动分析。自动评估可以提高诊断效率,并帮助发现形态特征与临床结果之间的关联。为了开发这类预测模型,第一步是识别浸润性上皮细胞,并将其与良性上皮细胞和原位病变区分开来。本研究旨在开发一个用于乳腺癌切片上皮细胞分割的AI模型。我们通过对苏木精-伊红(HE)染色切片用细胞角蛋白(CK)AE1/AE3进行复染,并结合病理学家的标注,生成了上皮细胞的真实标注掩膜。HE/CK图像对被用来训练一个卷积神经网络,并通过数据增强使模型更加稳健。来自839名患者的组织微阵列(TMAs)和两名患者的全切片图像被用于模型的训练和评估。这些切片来自四个乳腺癌患者队列。来自第五个队列的21名患者的TMAs被用作第二个测试集。在定量评估中,浸润性上皮细胞、良性上皮细胞和原位病变分别达到了0.70、0.79和0.75的平均Dice分数。在病理学家的定性评分(0-5)中,全部上皮和浸润性上皮的结果最佳,得分分别为4.7和4.4。良性上皮和原位病变的得分分别为3.7和2.0。所提出的模型在HE染色的乳腺癌切片中较好地分割了上皮细胞,但要准确区分各类别仍需进一步工作。免疫组化结合病理学家的标注,使得创建准确的真实标注成为可能。该模型在FastPathology中免费提供,代码可在https://github.com/AICAN-Research/breast-epithelium-segmentation获取。

更新时间: 2024-06-14 11:04:12

领域: eess.IV,cs.CV,cs.LG,I.4.6,I.4.6; I.4.9; I.5.4; J.3

下载: http://arxiv.org/abs/2311.13261v3

Knowledge Editing in Language Models via Adapted Direct Preference Optimization

Large Language Models (LLMs) can become outdated over time as they may lack updated world knowledge, leading to factual knowledge errors and gaps. Knowledge Editing (KE) aims to overcome this challenge using weight updates that do not require expensive retraining. We propose treating KE as an LLM alignment problem. Toward this goal, we introduce Knowledge Direct Preference Optimization (KDPO), a variation of the Direct Preference Optimization (DPO) that is more effective for knowledge modifications. Our method is based on an online approach that continually updates the knowledge stored in the model. We use the current knowledge as a negative sample and the new knowledge we want to introduce as a positive sample in a process called DPO. We also use teacher-forcing for negative sample generation and optimize using the positive sample, which helps maintain localized changes. We tested our KE method on various datasets and models, comparing it to several cutting-edge methods, with 100 and 500 sequential edits. Additionally, we conducted an ablation study comparing our method to the standard DPO approach. Our experimental results show that our modified DPO method allows for more refined KE, achieving similar or better performance compared to previous methods.
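
KDPO builds on the standard DPO objective, with the new knowledge playing the positive role and the model's current answer the negative one. The sketch below implements the vanilla DPO loss on toy sequence log-probabilities; KDPO's online updates and teacher-forced negative generation are omitted:

```python
# Standard DPO loss over (positive, negative) sequence log-probabilities.
import torch
import torch.nn.functional as F

def dpo_loss(pi_pos, pi_neg, ref_pos, ref_neg, beta=0.1):
    """pi_* / ref_*: sequence log-probs under the policy / frozen reference."""
    logits = beta * ((pi_pos - ref_pos) - (pi_neg - ref_neg))
    return -F.logsigmoid(logits).mean()

# Toy batch of 4 edit pairs; in practice these come from summing token
# log-likelihoods of the new-knowledge and current-answer sequences.
pi_pos = torch.tensor([-12.0, -9.5, -15.2, -11.0], requires_grad=True)
pi_neg = torch.tensor([-10.0, -9.0, -14.0, -12.5], requires_grad=True)
ref_pos = torch.tensor([-12.5, -10.0, -15.0, -11.2])
ref_neg = torch.tensor([-9.8, -8.9, -14.1, -12.3])

loss = dpo_loss(pi_pos, pi_neg, ref_pos, ref_neg)
loss.backward()       # gradients push pi_pos up and pi_neg down
print("DPO loss:", float(loss))
```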

Updated: 2024-06-14 11:02:21

标题: 通过适应的直接偏好优化在语言模型中进行知识编辑

摘要: 大型语言模型(LLMs)可能会随着时间的推移而过时,因为它们可能缺乏更新的世界知识,导致事实知识错误和缺口。知识编辑(KE)旨在通过不需要昂贵的重新训练的权重更新来克服这一挑战。我们提出将KE视为LLM对齐问题。为实现这一目标,我们引入了知识直接偏好优化(KDPO),这是对直接偏好优化(DPO)的一种更有效的知识修改方法。我们的方法基于一种在线方法,不断更新模型中存储的知识。我们使用当前知识作为负样本,将要引入的新知识作为正样本,在一个称为DPO的过程中进行。我们还使用teacher-forcing来生成负样本,并使用正样本进行优化,这有助于保持局部化的变化。我们在各种数据集和模型上测试了我们的KE方法,将其与几种前沿方法进行比较,进行了100和500个连续编辑。此外,我们进行了一项消融研究,将我们的方法与标准的DPO方法进行了比较。我们的实验结果表明,我们修改后的DPO方法可以实现更精细的KE,与以前的方法相比表现相似或更好。

更新时间: 2024-06-14 11:02:21

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.09920v1

What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?

This work aims to develop a measure that can accurately rank the performance of various classifiers when they are tested on unlabeled data from out-of-distribution (OOD) distributions. We commence by demonstrating that conventional uncertainty metrics, notably the maximum Softmax prediction probability, possess inherent utility in forecasting model generalization across certain OOD contexts. Building on this insight, we introduce a new measure called Softmax Correlation (SoftmaxCorr). It calculates the cosine similarity between a class-class correlation matrix, constructed from Softmax output vectors across an unlabeled test dataset, and a predefined reference matrix that embodies ideal class correlations. A high resemblance of predictions to the reference matrix signals that the model delivers confident and uniform predictions across all categories, reflecting minimal uncertainty and confusion. Through rigorous evaluation across a suite of datasets, including ImageNet, CIFAR-10, and WILDS, we affirm the predictive validity of SoftmaxCorr in accurately forecasting model performance within both in-distribution (ID) and OOD settings. Furthermore, we discuss the limitations of our proposed measure and suggest avenues for future research.
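
The measure is straightforward to compute from the description above: a class-class co-activation matrix built from softmax outputs, compared by cosine similarity to a reference matrix of ideal correlations. The uniform-diagonal reference below is an assumption about the paper's exact construction:

```python
# SoftmaxCorr-style score from unlabeled softmax outputs.
import numpy as np

def softmax_corr(probs, ref=None):
    """probs: (N, C) softmax outputs on the unlabeled test set."""
    n, c = probs.shape
    corr = probs.T @ probs / n               # class-class co-activation matrix
    if ref is None:                          # ideal: confident, uniform classes
        ref = np.eye(c) / c
    return (corr * ref).sum() / (np.linalg.norm(corr) * np.linalg.norm(ref))

rng = np.random.default_rng(0)
confident = rng.dirichlet([0.05] * 10, size=1000)   # peaked predictions
confused = rng.dirichlet([5.0] * 10, size=1000)     # diffuse predictions
print("SoftmaxCorr (confident):", round(softmax_corr(confident), 3))
print("SoftmaxCorr (confused): ", round(softmax_corr(confused), 3))
```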

Updated: 2024-06-14 10:36:26

标题: Softmax概率告诉我们关于分类器在不同测试条件下排名的信息是什么?

摘要: 这项工作旨在开发一种度量,当各种分类器在来自分布外(OOD)的未标记数据上进行测试时,能够准确地对它们的性能进行排名。我们首先证明,传统的不确定性度量,尤其是最大Softmax预测概率,在某些OOD情境下预测模型泛化能力方面具有固有的实用性。基于这一洞察,我们引入了一种新的度量,称为Softmax相关性(SoftmaxCorr)。它计算由未标记测试数据集上的Softmax输出向量构建的类-类相关性矩阵,与一个体现理想类别相关性的预定义参考矩阵之间的余弦相似度。预测与参考矩阵高度相似,表明该模型在所有类别上都给出自信且均衡的预测,反映出最小的不确定性和混淆。通过对一系列数据集(包括ImageNet、CIFAR-10和WILDS)的严格评估,我们确认了SoftmaxCorr在分布内(ID)和分布外设置中准确预测模型性能的有效性。此外,我们讨论了所提度量的局限性,并提出了未来研究的方向。

更新时间: 2024-06-14 10:36:26

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.09908v1

QQQ: Quality Quattuor-Bit Quantization for Large Language Models

Quantization is a proven effective method for compressing large language models. Although popular techniques like W8A8 and W4A16 effectively maintain model performance, they often fail to concurrently speed up the prefill and decoding stages of inference. W4A8 is a promising strategy to accelerate both of them but usually leads to a significant performance degradation. To address these issues, we present QQQ, a Quality Quattuor-bit Quantization method with 4-bit weights and 8-bit activations. QQQ employs adaptive smoothing and Hessian-based compensation, significantly enhancing the performance of quantized models without extensive training. Furthermore, we meticulously engineer W4A8 GEMM kernels to increase inference speed. Our specialized per-channel W4A8 GEMM and per-group W4A8 GEMM achieve impressive speed increases of 3.67$\times$ and 3.29 $\times$ over FP16 GEMM. Our extensive experiments show that QQQ achieves performance on par with existing state-of-the-art LLM quantization methods while significantly accelerating inference, achieving speed boosts up to 2.24 $\times$, 2.10$\times$, and 1.25$\times$ compared to FP16, W8A8, and W4A16, respectively.
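
The storage format behind W4A8 can be illustrated numerically: symmetric 4-bit per-channel weights and 8-bit per-tensor activations, with an integer matmul followed by rescaling. QQQ's adaptive smoothing, Hessian-based compensation, and fused GEMM kernels are all omitted from this sketch:

```python
# W4A8-style quantized matmul: int4 weights (stored in int8), int8 activations.
import numpy as np

def quant_sym(x, n_bits, axis=None):
    qmax = 2 ** (n_bits - 1) - 1                     # 7 for 4-bit, 127 for 8-bit
    scale = np.max(np.abs(x), axis=axis, keepdims=True) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))    # weights: per-output-channel (column) scales
A = rng.normal(size=(8, 64))      # activations: one per-tensor scale

qW, sW = quant_sym(W, n_bits=4, axis=0)
qA, sA = quant_sym(A, n_bits=8)

# Integer matmul, then rescale -- what a fused W4A8 kernel does in one pass.
out_int = qA.astype(np.int32) @ qW.astype(np.int32)
out = out_int * (sA * sW)

err = np.abs(out - A @ W).mean() / np.abs(A @ W).mean()
print(f"mean relative error of the W4A8 matmul: {err:.3f}")
```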

Updated: 2024-06-14 10:23:45

标题: QQQ:大型语言模型的质量四位量化

摘要: 量化是一种已被证实有效的压缩大型语言模型的方法。尽管流行的技术如W8A8和W4A16有效地保持了模型的性能,但它们通常无法同时加速推理的预填充和解码阶段。W4A8是一种有希望加速这两个阶段的策略,但通常会导致显著的性能下降。为了解决这些问题,我们提出了QQQ,一种具有4位权重和8位激活的质量四位量化方法。QQQ采用自适应平滑和基于Hessian的补偿,显著提升了量化模型的性能,而无需大量训练。此外,我们精心设计了W4A8 GEMM内核,以提高推理速度。我们的专门每通道W4A8 GEMM和每组W4A8 GEMM分别比FP16 GEMM获得了令人瞩目的3.67倍和3.29倍的速度提升。我们广泛的实验表明,QQQ在达到与现有最先进LLM量化方法相媲美的性能的同时,显著加速了推理,相较于FP16、W8A8和W4A16分别获得了最高2.24倍、2.10倍和1.25倍的速度提升。

更新时间: 2024-06-14 10:23:45

领域: cs.LG

下载: http://arxiv.org/abs/2406.09904v1

FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models

Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks. However, their proficiency and reliability in the specialized domain of financial data analysis, particularly focusing on data-driven thinking, remain uncertain. To bridge this gap, we introduce \texttt{FinDABench}, a comprehensive benchmark designed to evaluate the financial data analysis capabilities of LLMs within this context. \texttt{FinDABench} assesses LLMs across three dimensions: 1) \textbf{Foundational Ability}, evaluating the models' ability to perform financial numerical calculation and corporate sentiment risk assessment; 2) \textbf{Reasoning Ability}, determining the models' ability to quickly comprehend textual information and analyze abnormal financial reports; and 3) \textbf{Technical Skill}, examining the models' use of technical knowledge to address real-world data analysis challenges involving analysis generation and chart visualization from multiple perspectives. We will release \texttt{FinDABench} and the evaluation scripts at \url{https://github.com/cubenlp/BIBench}. \texttt{FinDABench} aims to provide a measure for in-depth analysis of LLM abilities and foster the advancement of LLMs in the field of financial data analysis.

Updated: 2024-06-14 10:17:40

标题: FinDABench:大型语言模型在金融数据分析能力的基准测试

摘要: 大型语言模型(LLMs)已经在各种任务中展示出令人印象深刻的能力。然而,在金融数据分析这一专业领域中,特别是专注于数据驱动思维的能力,它们的熟练性和可靠性仍然存在不确定性。为了弥补这一差距,我们引入了\texttt{FinDABench},这是一个旨在评估LLMs在金融数据分析领域能力的综合基准。 \texttt{FinDABench} 从三个维度评估LLMs:1)\textbf{基础能力},评估模型执行金融数字计算和企业情绪风险评估的能力;2)\textbf{推理能力},确定模型快速理解文本信息和分析异常财务报告的能力;和3)\textbf{技术技能},检查模型使用技术知识解决涉及分析生成和多角度图表可视化的现实数据分析挑战。我们将发布\texttt{FinDABench},以及评估脚本,网址为 \url{https://github.com/cubenlp/BIBench}。 \texttt{FinDABench} 旨在提供对LLM能力进行深入分析的度量,并促进LLMs在金融数据分析领域的进步。

更新时间: 2024-06-14 10:17:40

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2401.02982v4

Learning Solution-Aware Transformers for Efficiently Solving Quadratic Assignment Problem

Recently, various optimization problems, such as Mixed Integer Linear Programming Problems (MILPs), have undergone comprehensive investigation, leveraging the capabilities of machine learning. This work focuses on learning-based solutions for efficiently solving the Quadratic Assignment Problem (QAP), which stands as a formidable challenge in combinatorial optimization. While many instances of simpler problems admit a fully polynomial-time approximation scheme (FPTAS), QAP is shown to be strongly NP-hard. Even finding an FPTAS for QAP is difficult, in the sense that the existence of an FPTAS implies $P = NP$. Current research on QAPs suffers from limited scale and computational inefficiency. To attack the aforementioned issues, we here propose the first solution of its kind for QAP in the learn-to-improve category. This work encodes facility and location nodes separately, instead of forming the computationally intensive association graphs prevalent in current approaches. This design choice enables scalability to larger problem sizes. Furthermore, a \textbf{S}olution \textbf{AW}are \textbf{T}ransformer (SAWT) architecture integrates the incumbent solution matrix with the attention score to effectively capture higher-order information of the QAPs. Our model's effectiveness is validated through extensive experiments on self-generated QAP instances of varying sizes and the QAPLIB benchmark.
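
For reference, the Koopmans-Beckmann objective that QAP solvers minimize, the sum over i, j of F[i, j] * D[perm[i], perm[j]], can be evaluated in a few lines; the matrices below are toy data.

```python
import numpy as np

def qap_cost(F, D, perm):
    """Koopmans-Beckmann QAP cost of assigning facility i to location perm[i]."""
    return float((F * D[np.ix_(perm, perm)]).sum())

F = np.array([[0, 5, 2], [5, 0, 3], [2, 3, 0]])  # flows between facilities
D = np.array([[0, 1, 4], [1, 0, 2], [4, 2, 0]])  # distances between locations
print(qap_cost(F, D, np.array([2, 0, 1])))       # cost of one candidate assignment
```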

Updated: 2024-06-14 10:15:03

标题: 学习解决方案感知的变压器来高效解决二次分配问题

摘要: 最近,各种优化问题,如混合整数线性规划问题(MILPs),已经借助机器学习的能力得到了全面的研究。本文侧重于基于学习的解决方案,以高效地解决二次分配问题(QAP),这在组合优化中是一个艰巨的挑战。虽然许多较简单问题的实例存在全多项式时间近似方案(FPTAS),但QAP被证明是强NP-难的。即使为QAP找到一个FPTAS也很困难,因为FPTAS的存在意味着P = NP。目前关于QAP的研究受到规模有限和计算效率低下的困扰。为了解决上述问题,我们在此提出了QAP在学习改进(learn-to-improve)类别中的首个解决方案。本工作将设施和位置节点分别编码,而不是形成当前方法中普遍存在的计算密集型关联图。这种设计选择使其能够扩展到更大的问题规模。此外,SAWT(Solution Aware Transformer)架构将现有解矩阵与注意力分数相结合,以有效捕获QAP的高阶信息。通过在自行生成的不同规模QAP实例和QAPLIB基准上的广泛实验,验证了我们模型的有效性。

更新时间: 2024-06-14 10:15:03

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.09899v1

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Updated: 2024-06-14 10:14:10

标题: Gemini 1.5:解锁跨越数百万上下文标记的多模态理解

摘要: 在这份报告中,我们介绍了Gemini 1.5系列模型,代表了下一代高度计算效率的多模态模型,能够从数百万个上下文标记中召回和推理细粒度信息,包括多个长文档和数小时的视频和音频。该系列包括两个新模型:(1)更新的Gemini 1.5 Pro,在绝大多数能力和基准测试上超过了二月版本;(2)Gemini 1.5 Flash,一种为效率而设计的更轻量级变体,质量仅有极小的下降。Gemini 1.5模型在跨模态的长上下文检索任务上实现了近乎完美的召回,在长文档QA、长视频QA和长上下文ASR方面刷新了最新技术水平,并在一系列广泛的基准测试中达到或超过了Gemini 1.0 Ultra的最新性能。在研究Gemini 1.5长上下文能力的极限时,我们发现其下一个标记预测能力持续提升,并且在至少1000万个标记的范围内保持近乎完美的检索(>99%),相比Claude 3.0(200k)和GPT-4 Turbo(128k)等现有模型是一次代际飞跃。最后,我们强调了真实世界的用例,例如Gemini 1.5与专业人士协作完成任务,在10个不同的工作类别中节省了26%至75%的时间;以及大型语言模型在前沿令人惊讶的新能力:当给定一本Kalamang语的语法手册(一种全球使用者不足200人的语言)时,模型学会了将英语翻译成Kalamang,水平与从相同材料中学习的人相近。

更新时间: 2024-06-14 10:14:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.05530v3

Positive-Unlabelled Learning for Identifying New Candidate Dietary Restriction-related Genes among Ageing-related Genes

Dietary Restriction (DR) is one of the most popular anti-ageing interventions, prompting exhaustive research into genes associated with its mechanisms. Recently, Machine Learning (ML) has been explored to identify potential DR-related genes among ageing-related genes, aiming to minimize costly wet lab experiments needed to expand our knowledge on DR. However, to train a model from positive (DR-related) and negative (non-DR-related) examples, existing ML methods naively label genes without known DR relation as negative examples, assuming that lack of DR-related annotation for a gene represents evidence of absence of DR-relatedness, rather than absence of evidence; this hinders the reliability of the negative examples (non-DR-related genes) and the method's ability to identify novel DR-related genes. This work introduces a novel gene prioritization method based on the two-step Positive-Unlabelled (PU) Learning paradigm: using a similarity-based, KNN-inspired approach, our method first selects reliable negative examples among the genes without known DR associations. Then, these reliable negatives and all known positives are used to train a classifier that effectively differentiates DR-related and non-DR-related genes, which is finally employed to generate a more reliable ranking of promising genes for novel DR-relatedness. Our method significantly outperforms the existing state-of-the-art non-PU approach for DR-relatedness prediction in three relevant performance metrics. In addition, curation of existing literature finds support for the top-ranked candidate DR-related genes identified by our model.
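
A minimal sketch of the two-step PU pipeline described above, under simplifying assumptions: average distance to the known positives stands in for the paper's KNN-inspired similarity criterion, and the classifier and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import RandomForestClassifier

def two_step_pu(X, is_positive, k=10, neg_fraction=0.5):
    """Two-step PU learning sketch: (1) pick the unlabelled samples that are
    farthest, in feature space, from all known positives as reliable
    negatives; (2) train a standard classifier on positives vs. these
    reliable negatives and use it to rank the remaining unlabelled genes."""
    pos, unl = X[is_positive], X[~is_positive]
    nn = NearestNeighbors(n_neighbors=min(k, len(pos))).fit(pos)
    dist, _ = nn.kneighbors(unl)
    avg_dist = dist.mean(axis=1)
    # Step 1: the unlabelled samples most dissimilar to the positives.
    n_neg = int(neg_fraction * len(unl))
    reliable_neg = unl[np.argsort(avg_dist)[-n_neg:]]
    # Step 2: ordinary supervised training.
    Xtr = np.vstack([pos, reliable_neg])
    ytr = np.r_[np.ones(len(pos)), np.zeros(len(reliable_neg))]
    return RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
```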

Updated: 2024-06-14 10:14:01

标题: 正标记-未标记学习用于在衰老相关基因中识别新的候选膳食限制相关基因

摘要: 膳食限制(DR)是最受欢迎的抗衰老干预措施之一,促使人们对与其机制相关的基因进行详尽研究。最近,机器学习(ML)被用于在与衰老相关的基因中识别潜在的与DR相关的基因,旨在减少扩展我们对DR知识所需的昂贵湿实验。然而,为了训练一个模型从正例(与DR相关)和负例(非DR相关)示例中,现有的机器学习方法朴素地将没有已知DR关联的基因标记为负例,假设对于一个基因而言缺乏DR相关标注代表缺乏DR相关性的证据,而不是证据的缺乏;这影响了负例(非DR相关基因)的可靠性以及该方法识别新的DR相关基因的能力。本研究介绍了一种基于两步骤正样本-未标记(PU)学习范式的新型基因优先级排序方法:使用基于相似性的、受KNN启发的方法,我们的方法首先在没有已知DR关联的基因中选择可靠的负例。然后,这些可靠的负例和所有已知的正例被用于训练一个分类器,该分类器有效区分DR相关和非DR相关的基因,最终用于生成更可靠的有关新的DR相关性有前景基因的排名。我们的方法在三个相关性能指标上显著优于现有的最先进非PU方法用于DR相关性预测。此外,对现有文献的整理发现,我们的模型识别出的排名靠前的候选DR相关基因得到了支持。

更新时间: 2024-06-14 10:14:01

领域: cs.LG

下载: http://arxiv.org/abs/2406.09898v1

Sparse Graphical Linear Dynamical Systems

Time-series datasets are central in machine learning with applications in numerous fields of science and engineering, such as biomedicine, Earth observation, and network analysis. Extensive research exists on state-space models (SSMs), which are powerful mathematical tools that allow for probabilistic and interpretable learning on time series. Learning the model parameters in SSMs is arguably one of the most complicated tasks, and the inclusion of prior knowledge is known to both ease the interpretation but also to complicate the inferential tasks. Very recent works have attempted to incorporate a graphical perspective on some of those model parameters, but they present notable limitations that this work addresses. More generally, existing graphical modeling tools are designed to incorporate either static information, focusing on statistical dependencies among independent random variables (e.g., graphical Lasso approach), or dynamic information, emphasizing causal relationships among time series samples (e.g., graphical Granger approaches). However, there are no joint approaches combining static and dynamic graphical modeling within the context of SSMs. This work proposes a novel approach to fill this gap by introducing a joint graphical modeling framework that bridges the graphical Lasso model and a causal-based graphical approach for the linear-Gaussian SSM. We present DGLASSO (Dynamic Graphical Lasso), a new inference method within this framework that implements an efficient block alternating majorization-minimization algorithm. The algorithm's convergence is established using modern tools from nonlinear analysis. Experimental validation on various synthetic data showcases the effectiveness of the proposed model and inference algorithm.

Updated: 2024-06-14 10:13:02

标题: 稀疏图形线性动力系统

摘要: 时间序列数据集在机器学习中起着核心作用,在生物医学、地球观测和网络分析等许多科学和工程领域都有应用。关于状态空间模型(SSMs)的研究非常广泛,这是一种强大的数学工具,可以实现对时间序列的概率性和可解释性学习。在SSMs中学习模型参数可以说是最复杂的任务之一,先验知识的引入既能简化解释,也能使推断任务复杂化。最近的一些工作尝试在某些模型参数上引入图形视角,但它们存在明显的局限性,本文正是针对这些限制进行研究。更普遍地说,现有的图形建模工具要么设计用于整合静态信息,侧重于独立随机变量之间的统计依赖关系(例如,图形Lasso方法),要么侧重于动态信息,强调时间序列样本之间的因果关系(例如,图形Granger方法)。然而,在SSMs的背景下,尚无结合静态和动态图形建模的联合方法。本文提出了一种新颖的方法,通过引入一个联合图形建模框架,将图形Lasso模型与基于因果关系的图形方法结合起来,应用于线性高斯SSM。我们提出了DGLASSO(动态图形Lasso),这是一个在该框架内实现的新的推断方法,采用了高效的块交替主导极小化算法。该算法的收敛性借助非线性分析中的现代工具得以证明。在各种合成数据的实验验证中展示了所提出的模型和推断算法的有效性。

更新时间: 2024-06-14 10:13:02

领域: cs.LG,math.OC,stat.CO

下载: http://arxiv.org/abs/2307.03210v2

Forecasting Four Business Cycle Phases Using Machine Learning: A Case Study of US and EuroZone

Understanding the business cycle is crucial for building economic stability, guiding business planning, and informing investment decisions. The business cycle refers to the recurring pattern of expansion and contraction in economic activity over time. Economic analysis is inherently complex, incorporating a myriad of factors (such as macroeconomic indicators and political decisions). This complexity makes it challenging to fully account for all variables when determining the current state of the economy and predicting its trajectory over the upcoming months. The objective of this study is to investigate the capacity of machine learning models to automatically analyze the state of the economy, with the goal of forecasting business phases (expansion, slowdown, recession and recovery) in the United States and the EuroZone. We compared three different machine learning approaches to classify the phases of the business cycle; among them, Multinomial Logistic Regression (MLR) achieved the best results, with an accuracy of 65.25% (Top-1) and 84.74% (Top-2) for the EuroZone and 75% (Top-1) and 92.14% (Top-2) for the United States. These results demonstrate the potential of machine learning techniques to predict business cycles accurately, which can aid in making informed decisions in the fields of economics and finance.
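
A minimal, self-contained sketch of the winning approach, a multinomial logistic regression evaluated with Top-1/Top-2 accuracy; the synthetic features below stand in for the macroeconomic indicators used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import top_k_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for macroeconomic indicator features; 4 classes for
# expansion / slowdown / recession / recovery (illustrative encoding).
X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # multinomial by default
proba = clf.predict_proba(X_te)
print("Top-1:", top_k_accuracy_score(y_te, proba, k=1, labels=clf.classes_))
print("Top-2:", top_k_accuracy_score(y_te, proba, k=2, labels=clf.classes_))
```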

Updated: 2024-06-14 10:10:13

标题: 使用机器学习预测四个商业周期阶段:美国和欧元区的案例研究

摘要: 理解商业周期对于建立经济稳定、指导商业规划和提供投资决策至关重要。商业周期指的是经济活动在一段时间内的扩张和收缩的周期性模式。经济分析本质上是复杂的,涵盖了诸多因素(如宏观经济指标、政治决策)。这种复杂性使得在确定经济当前状态并预测未来几个月的轨迹时,很难完全考虑所有变量。本研究的目标是调查机器学习模型在自动分析经济状态方面的能力,以预测美国和欧元区的商业周期阶段(扩张、放缓、衰退和复苏)。我们比较了三种不同的机器学习方法来分类商业周期的阶段,其中,多项逻辑回归(MLR)取得了最佳结果。具体而言,MLR在欧元区获得了65.25%(Top1)和84.74%(Top2)的准确率,而在美国获得了75%(Top1)和92.14%(Top2)的准确率。这些结果表明机器学习技术有望准确预测商业周期,从而有助于在经济和金融领域做出明智的决策。

更新时间: 2024-06-14 10:10:13

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.17170v2

Heuristic Learning with Graph Neural Networks: A Unified Framework for Link Prediction

Link prediction is a fundamental task in graph learning, inherently shaped by the topology of the graph. While traditional heuristics are grounded in graph topology, they encounter challenges in generalizing across diverse graphs. Recent research efforts have aimed to leverage the potential of heuristics, yet a unified formulation accommodating both local and global heuristics remains undiscovered. Drawing insights from the fact that both local and global heuristics can be represented by adjacency matrix multiplications, we propose a unified matrix formulation to accommodate and generalize various heuristics. We further propose the Heuristic Learning Graph Neural Network (HL-GNN) to efficiently implement the formulation. HL-GNN adopts intra-layer propagation and inter-layer connections, allowing it to reach a depth of around 20 layers with lower time complexity than GCN. Extensive experiments on the Planetoid, Amazon, and OGB datasets underscore the effectiveness and efficiency of HL-GNN. It outperforms existing methods by a large margin in prediction performance. Additionally, HL-GNN is several orders of magnitude faster than heuristic-inspired methods while requiring only a few trainable parameters. The case study further demonstrates that the generalized heuristics and learned weights are highly interpretable.
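
To make the key observation concrete, the sketch below expresses local and global link-prediction heuristics as weighted sums of adjacency-matrix powers; HL-GNN learns such layer weights, whereas they are fixed by hand here for illustration.

```python
import numpy as np

def heuristic_scores(A, betas):
    """Link-prediction heuristics as a weighted sum of adjacency powers:
    sum_l betas[l-1] * A^l. Common Neighbours is the A^2 term alone;
    a truncated Katz index uses geometrically decaying weights."""
    S, Al = np.zeros_like(A, dtype=float), np.eye(len(A))
    for beta in betas:
        Al = Al @ A      # A^l after l iterations
        S += beta * Al
    return S

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
cn = heuristic_scores(A, [0.0, 1.0])             # weight only A^2: common neighbours
katz = heuristic_scores(A, [0.1, 0.01, 0.001])   # truncated Katz-style global scores
```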

Updated: 2024-06-14 10:06:38

标题: 基于图神经网络的启发式学习:链路预测的统一框架

摘要: 链路预测是图学习中的一个基本任务,其本质上受图的拓扑结构影响。虽然传统的启发式方法基于图的拓扑结构,但它们在跨不同图泛化时遇到挑战。最近的研究工作旨在利用启发式方法的潜力,然而尚未出现能够同时容纳局部和全局启发式方法的统一表述。基于局部和全局启发式方法均可通过邻接矩阵乘法表示这一事实,我们提出了一个统一的矩阵表述,以容纳和泛化各种启发式方法。我们进一步提出了启发式学习图神经网络(HL-GNN)来高效实现这一表述。HL-GNN采用层内传播和层间连接,使其可以达到约20层的深度,同时时间复杂度低于GCN。在Planetoid、Amazon和OGB数据集上进行的大量实验强调了HL-GNN的有效性和效率。它在预测性能方面远远优于现有方法。此外,HL-GNN比受启发式启发的方法快几个数量级,同时只需要少量可训练参数。案例研究进一步证明了泛化的启发式方法和学习到的权重具有高度可解释性。

更新时间: 2024-06-14 10:06:38

领域: cs.LG

下载: http://arxiv.org/abs/2406.07979v2

Any2Graph: Deep End-To-End Supervised Graph Prediction With An Optimal Transport Loss

We propose Any2graph, a generic framework for end-to-end Supervised Graph Prediction (SGP) i.e. a deep learning model that predicts an entire graph for any kind of input. The framework is built on a novel Optimal Transport loss, the Partially-Masked Fused Gromov-Wasserstein, that exhibits all necessary properties (permutation invariance, differentiability and scalability) and is designed to handle any-sized graphs. Numerical experiments showcase the versatility of the approach that outperform existing competitors on a novel challenging synthetic dataset and a variety of real-world tasks such as map construction from satellite image (Sat2Graph) or molecule prediction from fingerprint (Fingerprint2Graph).

Updated: 2024-06-14 10:06:23

标题: Any2Graph:基于最优传输损失的深度端到端监督图预测

摘要: 我们提出了Any2graph,一个通用的框架,用于端到端的监督图预测(SGP),即一种深度学习模型,用于预测任何类型的输入的整个图。该框架建立在一种新颖的最优传输损失上,即部分遮罩融合Gromov-Wasserstein,该损失具有所有必要的属性(置换不变性、可微性和可扩展性),设计用于处理任意大小的图形。数值实验展示了该方法的多功能性,在新颖的具有挑战性的合成数据集和各种真实世界任务(如从卫星图像中构建地图(Sat2Graph)或从指纹中预测分子(Fingerprint2Graph))上优于现有竞争对手。

更新时间: 2024-06-14 10:06:23

领域: cs.LG

下载: http://arxiv.org/abs/2402.12269v3

Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming

Generative models have demonstrated human-level proficiency in various benchmarks across domains like programming, natural sciences, and general knowledge. Despite these promising results on competitive benchmarks, they still struggle with seemingly simple problem-solving tasks typically carried out by elementary-level students. How do state-of-the-art models perform on standardized tests designed to assess computational thinking and problem-solving skills at schools? In this paper, we curate a novel benchmark involving computational thinking tests grounded in elementary visual programming domains. Our initial results show that state-of-the-art models like GPT-4o and Llama3 barely match the performance of an average school student. To further boost the performance of these models, we fine-tune them using a novel synthetic data generation methodology. The key idea is to develop a comprehensive dataset using symbolic methods that capture different skill levels, ranging from recognition of visual elements to multi-choice quizzes to synthesis-style tasks. We showcase how various aspects of symbolic information in synthetic data help improve fine-tuned models' performance. We will release the full implementation and datasets to facilitate further research on enhancing computational thinking in generative models.

Updated: 2024-06-14 10:02:52

标题: 在小学视觉编程中基于计算思维测试对生成模型进行基准测试

摘要: 生成模型已经在编程、自然科学和一般知识领域的各种基准测试中展示出与人类水平相当的熟练度。尽管在竞争性基准测试中取得了令人鼓舞的成绩,但它们仍然在看似简单的问题解决任务上表现出困难,这些任务通常由小学生完成。最先进的模型在设计用于评估学校计算思维和问题解决能力的标准化测试中表现如何?本文中,我们精心策划了一个新颖的基准测试,涉及基于基础可视化编程领域的计算思维测试。我们的初步结果显示,像GPT-4o和Llama3这样的最先进模型几乎无法与普通学生的表现相匹配。为了进一步提高这些模型的性能,我们使用一种新颖的合成数据生成方法对它们进行微调。关键思想是利用符号方法开发一个全面的数据集,捕捉不同技能水平,从识别可视元素到多选测验再到综合式任务。我们展示了合成数据中符号信息的各个方面如何帮助改善微调模型的性能。我们将发布完整的实现和数据集,以促进进一步研究,提高生成模型中的计算思维。

更新时间: 2024-06-14 10:02:52

领域: cs.AI

下载: http://arxiv.org/abs/2406.09891v1

L2XGNN: Learning to Explain Graph Neural Networks

Graph Neural Networks (GNNs) are a popular class of machine learning models. Inspired by the learning to explain (L2X) paradigm, we propose L2XGNN, a framework for explainable GNNs which provides faithful explanations by design. L2XGNN learns a mechanism for selecting explanatory subgraphs (motifs) which are exclusively used in the GNNs message-passing operations. L2XGNN is able to select, for each input graph, a subgraph with specific properties such as being sparse and connected. Imposing such constraints on the motifs often leads to more interpretable and effective explanations. Experiments on several datasets suggest that L2XGNN achieves the same classification accuracy as baseline methods using the entire input graph while ensuring that only the provided explanations are used to make predictions. Moreover, we show that L2XGNN is able to identify motifs responsible for the graph's properties it is intended to predict.

Updated: 2024-06-14 09:54:58

标题: L2XGNN:学习解释图神经网络

摘要: 图神经网络(GNNs)是一类流行的机器学习模型。受学习解释(L2X)范式的启发,我们提出了L2XGNN,这是一个可解释的GNN框架,通过设计提供忠实的解释。L2XGNN学习了一种选择解释性子图(模式)的机制,这些子图专门用于GNN的消息传递操作。L2XGNN能够为每个输入图选择具有特定属性的子图,如稀疏和连通。对模式施加这样的约束通常会带来更具解释性和有效性的解释。在几个数据集上的实验证明,L2XGNN实现了与使用整个输入图的基线方法相同的分类准确性,同时确保只使用所提供的解释来进行预测。此外,我们展示了L2XGNN能够识别出决定其所要预测的图属性的模式。

更新时间: 2024-06-14 09:54:58

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2209.14402v4

Harm Mitigation in Recommender Systems under User Preference Dynamics

We consider a recommender system that takes into account the interplay between recommendations, the evolution of user interests, and harmful content. We model the impact of recommendations on user behavior, particularly the tendency to consume harmful content. We seek recommendation policies that establish a tradeoff between maximizing click-through rate (CTR) and mitigating harm. We establish conditions under which the user profile dynamics have a stationary point, and propose algorithms for finding an optimal recommendation policy at stationarity. We experiment on a semi-synthetic movie recommendation setting initialized with real data and observe that our policies outperform baselines at simultaneously maximizing CTR and mitigating harm.

Updated: 2024-06-14 09:52:47

标题: 在用户偏好动态下的推荐系统中的危害缓解

摘要: 我们考虑一个推荐系统,考虑到推荐之间的互动、用户兴趣的演变以及有害内容。我们模拟了推荐对用户行为的影响,特别是倾向于消费有害内容的趋势。我们寻求建立一个在最大化点击率(CTR)和减少伤害之间进行权衡的推荐策略。我们建立了用户画像动态具有稳定点的条件,并提出了在稳定状态下找到最优推荐策略的算法。我们在一个半合成的电影推荐设置上进行实验,初始化了真实数据,并观察到我们的策略在同时最大化点击率和减少伤害方面优于基准线。

更新时间: 2024-06-14 09:52:47

领域: cs.IR,cs.CY,cs.LG

下载: http://arxiv.org/abs/2406.09882v1

Automatic Counting and Classification of Mosquito Eggs in Field Traps

The analysis of the field traps where mosquitoes lay their eggs is vital to verify that the sterile insect technique (SIT) is working properly, because the number of hatched eggs may indicate that the sterile males are not competing with the wild ones. Nowadays, the study of the traps is done manually under a microscope and is very time-consuming and prone to human error. This paper presents an automated trap survey method. For this purpose, a device has been designed that automatically scans the slat, obtaining several overlapping photos. Subsequently, the images are analyzed by a Mask R-CNN neural network that segments the eggs and classifies them into two classes: full or hatched.
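
A hedged sketch of the kind of model the paper describes: a torchvision Mask R-CNN whose heads are resized for background plus the two egg classes. The class count, input size, and variable names are assumptions based on the abstract, not the authors' code.

```python
import torch, torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 3  # background + "full" + "hatched" (assumed from the abstract)

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, NUM_CLASSES)
in_feat_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feat_mask, 256, NUM_CLASSES)

model.eval()
with torch.no_grad():
    # One of the overlapping scan photos produced by the device.
    out = model([torch.rand(3, 512, 512)])[0]  # dict with boxes, labels, scores, masks
```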

Updated: 2024-06-14 09:46:14

标题: 田间陷阱中蚊子卵的自动计数和分类

摘要: 对蚊子产卵的野外陷阱进行分析,对于检验昆虫不育技术(SIT)是否正常运作至关重要。这是因为孵化卵的数量可能表明不育雄性未能与野生雄性竞争。目前,对陷阱的研究是通过显微镜手动完成的,非常耗时且容易出现人为错误。本文提出了一种自动化的陷阱检查方法。为此,设计了一种自动扫描板条的设备,可获取多张相互重叠的照片。随后,通过Mask-RCNN神经网络分析图像,对卵进行分割并分类为两类:完整或已孵化。

更新时间: 2024-06-14 09:46:14

领域: cs.AI

下载: http://arxiv.org/abs/2405.20656v2

Federated Learning with Flexible Architectures

Traditional federated learning (FL) methods have limited support for clients with varying computational and communication abilities, leading to inefficiencies and potential inaccuracies in model training. This limitation hinders the widespread adoption of FL in diverse and resource-constrained environments, such as those with client devices ranging from powerful servers to mobile devices. To address this need, this paper introduces Federated Learning with Flexible Architectures (FedFA), an FL training algorithm that allows clients to train models of different widths and depths. Each client can select a network architecture suitable for its resources, with shallower and thinner networks requiring fewer computing resources for training. Unlike prior work in this area, FedFA incorporates the layer grafting technique to align clients' local architectures with the largest network architecture in the FL system during model aggregation. Layer grafting ensures that all client contributions are uniformly integrated into the global model, thereby minimizing the risk of any individual client's data skewing the model's parameters disproportionately and introducing security benefits. Moreover, FedFA introduces the scalable aggregation method to manage scale variations in weights among different network architectures. Experimentally, FedFA outperforms previous width and depth flexible aggregation strategies. Furthermore, FedFA demonstrates increased robustness against performance degradation in backdoor attack scenarios compared to earlier strategies.

Updated: 2024-06-14 09:44:46

标题: 使用灵活架构的联邦学习

摘要: 传统的联邦学习(FL)方法对具有不同计算和通信能力的客户端的支持有限,导致模型训练中的低效率和潜在的不准确性。这种限制阻碍了FL在多样化和资源受限环境中的广泛应用,例如客户端设备从强大服务器到移动设备不等的环境。为了满足这一需求,本文介绍了具有灵活架构的联邦学习(FedFA),这是一种允许客户端训练不同宽度和深度模型的FL训练算法。每个客户端可以选择适合其资源的网络架构,较浅和较窄的网络训练所需的计算资源更少。与该领域的先前工作不同,FedFA在模型聚合过程中引入了层嫁接技术,使客户端的本地架构与FL系统中最大的网络架构对齐。层嫁接确保所有客户端的贡献均匀地集成到全局模型中,从而将任何单个客户端的数据不成比例地扭曲模型参数的风险降至最低,并带来安全性方面的好处。此外,FedFA引入了可扩展的聚合方法,以管理不同网络架构之间权重的尺度差异。实验表明,FedFA优于先前的宽度和深度灵活聚合策略。此外,与早期策略相比,FedFA在后门攻击场景中对性能下降表现出更强的鲁棒性。

更新时间: 2024-06-14 09:44:46

领域: cs.LG,cs.AI,cs.DC

下载: http://arxiv.org/abs/2406.09877v1

Sailing in high-dimensional spaces: Low-dimensional embeddings through angle preservation

Low-dimensional embeddings (LDEs) of high-dimensional data are ubiquitous in science and engineering. They allow us to quickly understand the main properties of the data, identify outliers and processing errors, and inform the next steps of data analysis. As such, LDEs have to be faithful to the original high-dimensional data, i.e., they should represent the relationships that are encoded in the data, both at a local as well as global scale. The current generation of LDE approaches focuses on reconstructing local distances between any pair of samples correctly, often out-performing traditional approaches aiming at all distances. For these approaches, global relationships are, however, usually strongly distorted, which is often argued to be an inherent trade-off between local and global structure learning for embeddings. We suggest a new perspective on LDE learning, reconstructing angles between data points. We show that this approach, Mercat, yields good reconstruction across a diverse set of experiments and metrics, and preserves structure well across all scales. Compared to existing work, our approach also has a simple formulation, facilitating future theoretical analysis and algorithmic improvements.
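
A minimal sketch of an angle-preservation objective in the spirit of Mercat: sample triplets and match the cosine of the angle at the anchor point between the original space and the embedding. The triplet sampling and squared-error form are simplifying assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def angle_loss(X, Y, n_triplets=1024):
    """Sample triplets (i, j, k) and match the cosine of the angle at i
    between (x_j - x_i) and (x_k - x_i) in data space X and embedding Y."""
    n = X.shape[0]
    i, j, k = (torch.randint(0, n, (n_triplets,)) for _ in range(3))
    def cos_angle(Z):
        return F.cosine_similarity(Z[j] - Z[i], Z[k] - Z[i], dim=1, eps=1e-8)
    return ((cos_angle(X) - cos_angle(Y)) ** 2).mean()

X = torch.randn(500, 50)                      # high-dimensional data
Y = torch.randn(500, 2, requires_grad=True)   # low-dimensional embedding
opt = torch.optim.Adam([Y], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = angle_loss(X, Y)
    loss.backward()
    opt.step()
```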

Updated: 2024-06-14 09:44:06

标题: 在高维空间中航行:通过角度保持实现低维嵌入

摘要: 高维数据的低维嵌入(LDEs)在科学和工程中是无处不在的。它们使我们能够快速了解数据的主要特性,识别异常值和处理错误,并指导数据分析的下一步。因此,LDEs必须忠实于原始的高维数据,即它们应该代表数据中编码的关系,无论是在局部还是全局尺度上。当前一代LDE方法专注于正确重建任意一对样本之间的局部距离,通常优于旨在获得所有距离的传统方法。然而,对于这些方法来说,全局关系通常会被强烈扭曲,常被认为是嵌入的局部和全局结构学习之间的固有权衡。我们提出了一个新的LDE学习视角,重建数据点之间的角度。我们展示了这种方法Mercat,在各种实验和指标中都能实现良好的重建,并且在各种尺度上保持结构。与现有工作相比,我们的方法还具有简单的表达形式,有助于未来的理论分析和算法改进。

更新时间: 2024-06-14 09:44:06

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.09876v1

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

Disordered speech recognition has profound implications for improving the quality of life of individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model. We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs, to improve model recognition of Chinese dysarthric speech. Experimental results on our Chinese dysarthric speech dataset demonstrate consistent improvements in recognition performance with Perceiver-Prompt. A relative reduction of up to 13.04% in CER is obtained over the fine-tuned Whisper.

Updated: 2024-06-14 09:36:46

标题: Perceiver-Prompt:基于Whisper的中文构音障碍语音识别中的灵活说话人自适应

摘要: 障碍语音识别对于改善构音障碍等患者的生活质量具有深远意义。构音障碍语音识别面临诸多挑战,包括数据有限、构音障碍说话人与正常说话人之间的显著差异,以及由障碍本身导致的显著说话人差异。本文介绍了Perceiver-Prompt,一种在Whisper大规模模型上利用P-Tuning进行说话人自适应的方法。我们首先使用LoRA对Whisper进行微调,然后集成一个可训练的Perceiver,从可变长度的输入生成固定长度的说话人提示,以提高模型对中文构音障碍语音的识别。我们在中文构音障碍语音数据集上的实验结果表明,Perceiver-Prompt在识别性能上取得了一致的改进。相对于微调后的Whisper,字符错误率(CER)最多相对降低了13.04%。

更新时间: 2024-06-14 09:36:46

领域: eess.AS,cs.AI,cs.SD

下载: http://arxiv.org/abs/2406.09873v1

A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models

Large Vision-Language Models (LVLMs), despite their recent success, are hardly comprehensively tested for their cognitive abilities. Inspired by the prevalent use of the "Cookie Theft" task in human cognition test, we propose a novel evaluation benchmark to evaluate high-level cognitive ability of LVLMs using images with rich semantics. It defines eight reasoning capabilities and consists of an image description task and a visual question answering task. Our evaluation on well-known LVLMs shows that there is still a large gap in cognitive ability between LVLMs and humans.

Updated: 2024-06-14 09:35:57

标题: 一个用于大型视觉-语言模型图像推理和描述的认知评估基准

摘要: 尽管大型视觉-语言模型(LVLMs)最近取得了成功,但它们的认知能力很少得到全面测试。受人类认知测试中广泛使用的“饼干被盗”任务的启发,我们提出了一个新颖的评估基准,用于评估LVLMs的高级认知能力,使用具有丰富语义的图像。它定义了八种推理能力,并包括图像描述任务和视觉问答任务。我们对知名的LVLMs进行的评估显示,LVLMs与人类之间的认知能力仍存在很大差距。

更新时间: 2024-06-14 09:35:57

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2402.18409v3

IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning

Deep graph learning has gained great popularity over the past years due to its versatility and success in representing graph data across a wide range of domains. However, the pervasive issue of imbalanced graph data distributions, where certain parts exhibit disproportionately abundant data while others remain sparse, undermines the efficacy of conventional graph learning algorithms, leading to biased outcomes. To address this challenge, Imbalanced Graph Learning (IGL) has garnered substantial attention, enabling more balanced data distributions and better task performance. Despite the proliferation of IGL algorithms, the absence of consistent experimental protocols and fair performance comparisons poses a significant barrier to comprehending advancements in this field. To bridge this gap, we introduce IGL-Bench, a foundational comprehensive benchmark for imbalanced graph learning, covering 16 diverse graph datasets and 24 distinct IGL algorithms with uniform data processing and splitting strategies. Specifically, IGL-Bench systematically investigates state-of-the-art IGL algorithms in terms of effectiveness, robustness, and efficiency on node-level and graph-level tasks, covering both class-imbalance and topology-imbalance. Extensive experiments demonstrate the potential benefits of IGL algorithms on various imbalanced conditions, offering insights and opportunities in the IGL field. Further, we have developed an open-sourced and unified package to facilitate reproducible evaluation and inspire further innovative research, which is available at https://github.com/RingBDStack/IGL-Bench.

Updated: 2024-06-14 09:30:18

标题: IGL-Bench:建立不平衡图学习的综合基准

摘要: 深度图学习近年来因其在各个领域中代表图数据的多功能性和成功而备受青睐。然而,不平衡的图数据分布是一个普遍存在的问题,某些部分的数据过多,而其他部分仍然稀疏,这会削弱传统图学习算法的有效性,导致结果出现偏差。为了解决这一挑战,不平衡图学习(IGL)引起了广泛关注,实现了更平衡的数据分布和更好的任务表现。尽管IGL算法大量涌现,但缺乏一致的实验协议和公平的性能比较,构成了理解该领域进展的重要障碍。为了弥合这一差距,我们引入了IGL-Bench,这是一个面向不平衡图学习的基础性综合基准,涉及16个不同的图数据集和24种独特的IGL算法,具有统一的数据处理和分割策略。具体而言,IGL-Bench系统地研究了最先进的IGL算法在节点级和图级任务上的有效性、鲁棒性和效率,涵盖了类别不平衡和拓扑不平衡的范围。广泛的实验展示了IGL算法在各种不平衡条件下的潜在好处,为IGL领域提供了见解和机遇。此外,我们开发了一个开源和统一的软件包,以促进可重现的评估并激发进一步的创新研究,可在https://github.com/RingBDStack/IGL-Bench找到。

更新时间: 2024-06-14 09:30:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.09870v1

Empowering Character-level Text Infilling by Eliminating Sub-Tokens

In infilling tasks, sub-tokens, representing instances where a complete token is segmented into two parts, often emerge at the boundaries of prefixes, middles, and suffixes. Traditional methods focused on training models at the token level, leading to sub-optimal performance in character-level infilling tasks during the inference stage. Alternatively, some approaches considered character-level infilling but relied on predicting sub-tokens at inference; this strategy diminishes the model's ability on character-level infilling tasks due to its large perplexity on sub-tokens. In this paper, we introduce FIM-SE, which stands for Fill-In-the-Middle with both Starting and Ending character constraints. The proposed method addresses character-level infilling tasks by utilizing a line-level format to avoid predicting any sub-token at inference. In addition, we incorporate two special tokens to signify the rest of the incomplete lines, thereby enhancing generation guidance. Extensive experiments demonstrate that our proposed approach surpasses previous methods, offering a significant advantage. Code is available at https://github.com/SenseLLM/FIM-SE.
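
A hedged sketch of the line-level format described above: the prefix is cut back to whole lines, and the partial first/last-line characters around the hole are exposed through special tokens so the model only ever generates complete lines. The token names are illustrative, not necessarily those FIM-SE uses.

```python
def build_fim_se_example(code: str, start: int, end: int):
    """Build a Fill-In-the-Middle example with Start/End character
    constraints. The hole is code[start:end]; the target regenerates the
    full lines containing the hole, so no sub-token prediction is needed."""
    prefix, middle, suffix = code[:start], code[start:end], code[end:]
    cut_p = prefix.rfind("\n") + 1        # first char of the line the hole starts in
    cut_s = suffix.find("\n")
    if cut_s == -1:
        cut_s = len(suffix)
    l_prefix, r_suffix = prefix[cut_p:], suffix[:cut_s]
    prompt = (f"<PRE>{prefix[:cut_p]}<SUF>{suffix[cut_s:]}"
              f"<START>{l_prefix}<END>{r_suffix}<MID>")
    target = l_prefix + middle + r_suffix  # the model completes whole lines only
    return prompt, target
```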

Updated: 2024-06-14 09:26:41

标题: 通过消除子标记来赋予字符级文本填充更强大的能力

摘要: 在填充任务中,子标记代表将完整标记分割为两部分的实例,通常出现在前缀、中间和后缀的边界上。传统方法着重于在标记级别训练模型,在推断阶段导致字符级填充任务性能不佳。另外,一些方法考虑了字符级填充,但它们依赖于在推断中预测子标记,然而这种策略由于模型对子标记的困惑度较大而降低了字符级填充任务的能力。在本文中,我们引入了FIM-SE,它代表Fill-In-the-Middle,具有起始和结束字符约束。所提出的方法通过使用行级格式来避免在推断中预测任何子标记来解决字符级填充任务。此外,我们还引入了两个特殊标记来表示不完整行的其余部分,从而增强了生成指导。大量实验证明我们提出的方法超越了以前的方法,提供了显著优势。代码可在https://github.com/SenseLLM/FIM-SE找到。

更新时间: 2024-06-14 09:26:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.17103v2

LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data

Multimodal Deep Learning enhances decision-making by integrating diverse information sources, such as texts, images, audio, and videos. To develop trustworthy multimodal approaches, it is essential to understand how uncertainty impacts these models. We introduce LUMA, a unique benchmark dataset, featuring audio, image, and textual data from 50 classes, for learning from uncertain and multimodal data. It extends the well-known CIFAR 10/100 dataset with audio samples extracted from three audio corpora, and text data generated using the Gemma-7B Large Language Model (LLM). The LUMA dataset enables the controlled injection of varying types and degrees of uncertainty to achieve and tailor specific experiments and benchmarking initiatives. LUMA is also available as a Python package including the functions for generating multiple variants of the dataset with controlling the diversity of the data, the amount of noise for each modality, and adding out-of-distribution samples. A baseline pre-trained model is also provided alongside three uncertainty quantification methods: Monte-Carlo Dropout, Deep Ensemble, and Reliable Conflictive Multi-View Learning. This comprehensive dataset and its tools are intended to promote and support the development and benchmarking of trustworthy and robust multimodal deep learning approaches.
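
As an example of the first of the bundled uncertainty-quantification baselines, a minimal Monte-Carlo Dropout sketch is given below; it assumes only that the model contains dropout layers and is not tied to LUMA's actual package API.

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=30):
    """Monte-Carlo Dropout: keep dropout active at test time and average
    several stochastic forward passes; the predictive entropy of the mean
    serves as an uncertainty score."""
    model.train()  # enables dropout (in practice, freeze batch-norm layers)
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    mean = probs.mean(dim=0)
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
    return mean, entropy
```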

Updated: 2024-06-14 09:22:07

标题: LUMA:用于学习不确定和多模态数据的基准数据集

摘要: 多模态深度学习通过整合不同的信息来源,如文本、图像、音频和视频,提升了决策能力。为了开发可信赖的多模态方法,了解不确定性如何影响这些模型至关重要。我们引入了LUMA,一个独特的基准数据集,其中包含来自50个类别的音频、图像和文本数据,用于从不确定和多模态数据中学习。它扩展了著名的CIFAR 10/100数据集,加入了从三个音频语料库中提取的音频样本,以及使用Gemma-7B大型语言模型(LLM)生成的文本数据。LUMA数据集使得可以有控制地注入不同类型和程度的不确定性,以实现和定制特定的实验和基准倡议。LUMA还作为一个Python软件包提供,包括用于生成数据集多个变体的功能,可控制数据的多样性、每种模态的噪声量,以及添加分布外样本。还提供了一个基线预训练模型,以及三种不确定性量化方法:蒙特卡洛Dropout、深度集成和可靠冲突多视图学习。这个全面的数据集及其工具旨在促进和支持可信赖且稳健的多模态深度学习方法的开发和基准测试。

更新时间: 2024-06-14 09:22:07

领域: cs.LG,cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2406.09864v1

Dataset Condensation with Latent Quantile Matching

Dataset condensation (DC) methods aim to learn a smaller synthesized dataset with informative data records to accelerate the training of machine learning models. Current distribution matching (DM) based DC methods learn a synthesized dataset by matching the mean of the latent embeddings between the synthetic and the real dataset. However, two distributions with the same mean can still be vastly different. In this work, we demonstrate the shortcomings of using Maximum Mean Discrepancy to match latent distributions, i.e., the weak matching power and the lack of outlier regularization. To alleviate these shortcomings, we propose our new method: Latent Quantile Matching (LQM), which matches the quantiles of the latent embeddings to minimize the goodness-of-fit test statistic between two distributions. Empirical experiments on both image and graph-structured datasets show that LQM matches or outperforms the previous state of the art in distribution matching based DC. Moreover, we show that LQM improves performance in the continual graph learning (CGL) setting, where memory efficiency and privacy can be important. Our work sheds light on the application of DM based DC for CGL.
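
A minimal sketch of the quantile-matching idea: instead of matching only the latent mean, match several quantiles per latent dimension between the synthetic and real batches. The number of quantiles and the squared-error form are illustrative; the paper frames the objective via a goodness-of-fit statistic.

```python
import torch

def latent_quantile_loss(syn_emb, real_emb, n_q=16):
    """syn_emb, real_emb: (N, D) latent embeddings of the synthetic and
    real batches. Penalizes the gap between their per-dimension quantiles,
    which constrains the full distribution rather than just its mean."""
    q = torch.linspace(0.0, 1.0, n_q, device=real_emb.device)
    sq = torch.quantile(syn_emb, q, dim=0)   # (n_q, D) synthetic quantiles
    rq = torch.quantile(real_emb, q, dim=0)  # (n_q, D) real quantiles
    return ((sq - rq) ** 2).mean()
```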

Updated: 2024-06-14 09:20:44

标题: 使用潜在分位数匹配进行数据集压缩

摘要: 数据集压缩(DC)方法旨在学习一个更小的综合数据集,其中包含信息丰富的数据记录,以加速机器学习模型的训练。当前基于分布匹配(DM)的DC方法通过匹配合成数据集和真实数据集之间的潜在嵌入的均值来学习一个综合数据集。然而,具有相同均值的两个分布仍然可能大不相同。在这项工作中,我们展示了使用最大均值差异匹配潜在分布的局限性,即弱匹配能力和缺乏异常值正则化。为了缓解这些缺点,我们提出了我们的新方法:潜在分位数匹配(LQM),它通过匹配潜在嵌入的分位数来最小化两个分布之间的拟合度检验统计量。对图像和图结构数据集的实证实验表明,LQM在基于分布匹配的DC中匹配或优于先前的最新技术。此外,我们展示了LQM在连续图学习(CGL)设置中提高了性能,其中内存效率和隐私可能很重要。我们的工作为基于DM的DC在CGL中的应用提供了启示。

更新时间: 2024-06-14 09:20:44

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.09860v1

Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge

Capturing molecular knowledge with representation learning approaches holds significant potential in vast scientific fields such as chemistry and life science. An effective and generalizable molecular representation is expected to capture the consensus and complementary molecular expertise from diverse views and perspectives. However, existing works fall short in learning multi-view molecular representations, due to challenges in explicitly incorporating view information and handling molecular knowledge from heterogeneous sources. To address these issues, we present MV-Mol, a molecular representation learning model that harvests multi-view molecular expertise from chemical structures, unstructured knowledge from biomedical texts, and structured knowledge from knowledge graphs. We utilize text prompts to model view information and design a fusion architecture to extract view-based molecular representations. We develop a two-stage pre-training procedure, exploiting heterogeneous data of varying quality and quantity. Through extensive experiments, we show that MV-Mol provides improved representations that substantially benefit molecular property prediction. Additionally, MV-Mol exhibits state-of-the-art performance in multi-modal comprehension of molecular structures and texts. Code and data are available at https://github.com/PharMolix/OpenBioMed.

Updated: 2024-06-14 08:48:10

标题: 学习使用结构化和非结构化知识进行多视图分子表示

摘要: 使用表示学习方法捕获分子知识在化学和生命科学等广泛科学领域具有重要潜力。预期具有有效且可泛化的分子表示将从不同视角和角度捕捉共识和互补的分子专业知识。然而,由于明确融合视图信息和处理来自异质来源的分子知识的挑战,现有作品在学习多视图分子表示方面存在不足。为了解决这些问题,我们提出了MV-Mol,一种分子表示学习模型,从化学结构、生物医学文本中的非结构化知识和知识图中的结构化知识中收集多视图分子专业知识。我们利用文本提示来建模视图信息,并设计了融合架构来提取基于视图的分子表示。我们开发了一个两阶段的预训练过程,利用质量和数量各异的异质数据。通过广泛的实验,我们展示了MV-Mol提供的改进表示显著有益于分子性质预测。此外,MV-Mol在分子结构和文本的多模式理解方面表现出尖端性能。代码和数据可在https://github.com/PharMolix/OpenBioMed 上获得。

更新时间: 2024-06-14 08:48:10

领域: cs.LG,q-bio.BM

下载: http://arxiv.org/abs/2406.09841v1

Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps

Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, thereby introducing a more precise and automated solution. Leveraging Vision-Language Models (VLM) to simultaneously process visual and textual data, we offer an effective aid to enhance the analysis process of weather heatmaps. Our initial assessment of general-purpose VLMs (e.g., GPT-4-Vision) on EWED revealed poor performance, characterized by low accuracy and frequent hallucinations due to inadequate color differentiation and insufficient meteorological knowledge. To address these challenges, we introduce ClimateIQA, the first meteorological VQA dataset, which includes 8,760 wind gust heatmaps and 254,040 question-answer pairs covering four question types, both generated from the latest climate reanalysis data. We also propose Sparse Position and Outline Tracking (SPOT), an innovative technique that leverages OpenCV and K-Means clustering to capture and depict color contours in heatmaps, providing ClimateIQA with more accurate color spatial location information. Finally, we present Climate-Zoo, the first meteorological VLM collection, which adapts VLMs to meteorological applications using the ClimateIQA dataset. Experiment results demonstrate that models from Climate-Zoo substantially outperform state-of-the-art general VLMs, achieving an accuracy increase from 0% to over 90% in EWED verification. The datasets and models in this study are publicly available for future climate science research: https://github.com/AlexJJJChen/Climate-Zoo.
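
A hedged sketch of the SPOT idea as described in the abstract, K-Means color quantization followed by OpenCV contour extraction; the cluster count and preprocessing are assumptions, not the paper's exact settings.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def spot_contours(heatmap_bgr, n_colors=6):
    """Quantize heatmap colors with K-Means, then trace the outline of each
    color band with OpenCV so its spatial location can be verbalized."""
    h, w = heatmap_bgr.shape[:2]
    pixels = heatmap_bgr.reshape(-1, 3).astype(np.float32)
    labels = KMeans(n_clusters=n_colors, n_init=4, random_state=0).fit_predict(pixels)
    outlines = {}
    for c in range(n_colors):
        mask = (labels.reshape(h, w) == c).astype(np.uint8) * 255
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        outlines[c] = contours  # polygons tracing one color band each
    return outlines
```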

Updated: 2024-06-14 08:46:44

标题: 视觉-语言模型遇见气象学:使用热图开发极端天气事件检测模型

摘要: 实时检测和预测极端天气有助于保护人类生命和基础设施。传统方法依赖于数值阈值设置和对地理信息系统(GIS)中天气热图的手动解释,这可能会很慢且容易出错。我们的研究通过将极端天气事件检测(EWED)重新定义为视觉问答(VQA)问题,从而引入了更精确和自动化的解决方案。利用视觉-语言模型(VLM)同时处理视觉和文本数据,我们提供了有效的辅助工具,以增强天气热图的分析过程。我们对通用VLM(例如GPT-4-Vision)在EWED上的初步评估显示性能不佳,表现为低准确性和频繁的幻觉,原因是颜色差异不足和气象知识不足。为了解决这些挑战,我们引入了ClimateIQA,这是第一个气象VQA数据集,包括8,760个阵风热图和254,040个问题-答案对,涵盖了四种问题类型,均由最新的气候再分析数据生成。我们还提出了Sparse Position and Outline Tracking(SPOT),这是一种创新技术,利用OpenCV和K-Means聚类来捕捉和描述热图中的颜色轮廓,为ClimateIQA提供更准确的颜色空间位置信息。最后,我们介绍了Climate-Zoo,这是第一个气象VLM集合,利用ClimateIQA数据集将VLMs调整到气象应用中。实验结果表明,来自Climate-Zoo的模型明显优于最先进的通用VLM,使EWED验证的准确性从0%提高到超过90%。本研究中的数据集和模型已公开提供给未来的气候科学研究:https://github.com/AlexJJJChen/Climate-Zoo。

更新时间: 2024-06-14 08:46:44

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.09838v1

TabularFM: An Open Framework For Tabular Foundational Models

Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task, saving both time and resources by leveraging the broad knowledge base established during pretraining. Most research on FMs has primarily focused on unstructured data, such as text and images, or semi-structured data, like time-series. However, there has been limited attention to structured data, such as tabular data, which, despite its prevalence, remains under-studied due to a lack of clean datasets and insufficient research on the transferability of FMs for various tabular data tasks. In response to this gap, we introduce a framework called TabularFM (\url{https://tabularfm.github.io/}), which incorporates state-of-the-art methods for developing FMs specifically for tabular data. This includes variations of neural architectures such as GANs, VAEs, and Transformers. We have curated a million tabular datasets and released cleaned versions to facilitate the development of tabular FMs. We pretrained FMs on this curated data, benchmarked various learning methods on these datasets, and released the pretrained models along with leaderboards for future comparative studies. Our fully open-sourced system provides a comprehensive analysis of the transferability of tabular FMs. By releasing these datasets, pretrained models, and leaderboards, we aim to enhance the validity and usability of tabular FMs in the near future.

Updated: 2024-06-14 08:46:33

标题: TabularFM:一个用于表格基础模型的开放框架

摘要: 基础模型(FMs)是使用自监督技术在广泛数据集上预训练的,能够从大量数据中学习广义模式。这减少了每个新任务所需的大量标记数据集,通过利用预训练期间建立的广泛知识基础,节约了时间和资源。大多数关于FMs的研究主要集中在非结构化数据,如文本和图像,或半结构化数据,如时间序列。然而,对于结构化数据,如表格数据,却鲜有关注,尽管其普遍存在,由于缺乏干净的数据集和关于FMs在各种表格数据任务中可转移性的不足研究而鲜有研究。为填补这一空白,我们介绍了一个名为TabularFM的框架(https://tabularfm.github.io/),该框架融合了专门为表格数据开发FMs的最先进方法,包括GANs、VAEs和Transformers等神经架构的变体。我们已经整理了数百万个表格数据集并发布了经过清理的版本,以促进表格FMs的发展。我们在这些整理过的数据上预训练了FMs,对这些数据集上的各种学习方法进行了基准测试,并发布了预训练模型以及未来比较研究的排行榜。我们的完全开源系统提供了对表格FMs的可转移性的全面分析。通过发布这些数据集、预训练模型和排行榜,我们旨在增强未来表格FMs的有效性和可用性。

更新时间: 2024-06-14 08:46:33

领域: cs.LG

下载: http://arxiv.org/abs/2406.09837v1

Robustness-Inspired Defense Against Backdoor Attacks on Graph Neural Networks

Graph Neural Networks (GNNs) have achieved promising results in tasks such as node classification and graph classification. However, recent studies reveal that GNNs are vulnerable to backdoor attacks, posing a significant threat to their real-world adoption. Despite initial efforts to defend against specific graph backdoor attacks, there is no work on defending against various types of backdoor attacks where generated triggers have different properties. Hence, we first empirically verify that prediction variance under edge dropping is a crucial indicator for identifying poisoned nodes. With this observation, we propose using random edge dropping to detect backdoors and theoretically show that it can efficiently distinguish poisoned nodes from clean ones. Furthermore, we introduce a novel robust training strategy to efficiently counteract the impact of the triggers. Extensive experiments on real-world datasets show that our framework can effectively identify poisoned nodes, significantly degrade the attack success rate, and maintain clean accuracy when defending against various types of graph backdoor attacks with different properties.
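
A minimal sketch of the detection signal described above: repeatedly drop random edges and score each node by the variance of its predictions; high-variance nodes are candidates for poisoning. The GNN call signature and hyperparameters are illustrative assumptions.

```python
import torch

def edge_drop_variance(model, x, edge_index, n_rounds=20, p_drop=0.3):
    """Score each node by the variance of its class probabilities across
    several random edge-dropped views of the graph. Poisoned nodes tend to
    show high variance because their trigger edges are easily removed."""
    probs = []
    with torch.no_grad():
        for _ in range(n_rounds):
            keep = torch.rand(edge_index.size(1)) > p_drop
            probs.append(model(x, edge_index[:, keep]).softmax(dim=-1))
    # Sum per-class variances into a single per-node score of shape (num_nodes,).
    return torch.stack(probs).var(dim=0).sum(dim=-1)
```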

Updated: 2024-06-14 08:46:26

标题: 受鲁棒性启发的对抗图神经网络后门攻击的防御措施

摘要: 图神经网络(GNNs)在节点分类和图分类等任务中取得了令人期待的结果。然而,最近的研究表明,GNNs容易受到后门攻击的影响,对其在现实世界中的应用构成了重大威胁。尽管已经有初步的工作来抵御特定的图后门攻击,但目前还没有针对生成的触发器具有不同属性的各种类型后门攻击的防御工作。因此,我们首先通过实证验证了边缘删除下的预测方差对于识别被污染节点是一个关键指标。基于这一观察,我们提出使用随机边缘删除来检测后门,并理论上证明它能够有效区分被污染节点和干净节点。此外,我们引入了一种新颖的强化训练策略,以有效地对抗触发器的影响。在真实世界数据集上的大量实验表明,我们的框架能够有效识别被污染节点,在抵御具有不同属性的各种类型图后门攻击时显著降低攻击成功率,并保持干净准确率。

更新时间: 2024-06-14 08:46:26

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2406.09836v1

I Know How: Combining Prior Policies to Solve New Tasks

Multi-Task Reinforcement Learning aims at developing agents that are able to continually evolve and adapt to new scenarios. However, this goal is challenging to achieve due to the phenomenon of catastrophic forgetting and the high demand of computational resources. Learning from scratch for each new task is not a viable or sustainable option, and thus agents should be able to collect and exploit prior knowledge while facing new problems. While several methodologies have attempted to address the problem from different perspectives, they lack a common structure. In this work, we propose a new framework, I Know How (IKH), which provides a common formalization. Our methodology focuses on modularity and compositionality of knowledge in order to achieve and enhance agent's ability to learn and adapt efficiently to dynamic environments. To support our framework definition, we present a simple application of it in a simulated driving environment and compare its performance with that of state-of-the-art approaches.

Updated: 2024-06-14 08:44:51

标题: 我知道如何:将先前的策略结合起来解决新任务

摘要: 多任务强化学习旨在开发能够不断进化和适应新情境的智能体。然而,由于灾难性遗忘现象和对计算资源的高需求,这一目标很难实现。每个新任务都从零开始学习并不是可行或可持续的选择,因此智能体应该能够在面对新问题时收集和利用先前的知识。虽然有几种方法论尝试从不同角度解决这个问题,但它们缺乏一个共同的结构。在这项工作中,我们提出了一个新的框架,称为I Know How(IKH),它提供了一个共同的形式化。我们的方法论专注于知识的模块化和组合性,以实现和增强智能体学习和高效适应动态环境的能力。为了支持我们的框架定义,我们在一个模拟驾驶环境中展示了它的一个简单应用,并将其性能与最先进的方法进行了比较。

更新时间: 2024-06-14 08:44:51

领域: cs.LG

下载: http://arxiv.org/abs/2406.09835v1

SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering

The Audio-Visual Question Answering (AVQA) task holds significant potential for applications. Compared to traditional unimodal approaches, the multi-modal input of AVQA makes feature extraction and fusion processes more challenging. Euclidean space is difficult to effectively represent multi-dimensional relationships of data. Especially when extracting and processing data with a tree structure or hierarchical structure, Euclidean space is not suitable as an embedding space. Additionally, the self-attention mechanism in Transformers is effective in capturing the dynamic relationships between elements in a sequence. However, the self-attention mechanism's limitations in window modeling and quadratic computational complexity reduce its effectiveness in modeling long sequences. To address these limitations, we propose SHMamba: Structured Hyperbolic State Space Model to integrate the advantages of hyperbolic geometry and state space models. Specifically, SHMamba leverages the intrinsic properties of hyperbolic space to represent hierarchical structures and complex relationships in audio-visual data. Meanwhile, the state space model captures dynamic changes over time by globally modeling the entire sequence. Furthermore, we introduce an adaptive curvature hyperbolic alignment module and a cross fusion block to enhance the understanding of hierarchical structures and the dynamic exchange of cross-modal information, respectively. Extensive experiments demonstrate that SHMamba outperforms previous methods with fewer parameters and computational costs. Our learnable parameters are reduced by 78.12\%, while the average performance improves by 2.53\%. Experiments show that our method demonstrates superiority among all current major methods and is more suitable for practical application scenarios.

Updated: 2024-06-14 08:43:31

标题: SHMamba:用于音频-视觉问答的结构化双曲状态空间模型

摘要: 音频视觉问答(AVQA)任务在应用中具有重要潜力。与传统的单模态方法相比,AVQA的多模态输入使特征提取和融合过程更具挑战性。欧几里得空间难以有效表示数据的多维关系。特别是在提取和处理具有树状结构或层次结构的数据时,欧几里得空间并不适合作为嵌入空间。此外,Transformer中的自注意力机制在捕捉序列中元素之间的动态关系方面非常有效。然而,自注意力机制在窗口建模和二次计算复杂性方面的局限性降低了其在建模长序列方面的有效性。为了解决这些限制,我们提出了SHMamba:结构化双曲状态空间模型,以整合双曲几何和状态空间模型的优势。具体而言,SHMamba利用双曲空间的固有属性来表示音频视觉数据中的层次结构和复杂关系。同时,状态空间模型通过全局建模整个序列来捕捉随时间的动态变化。此外,我们引入了自适应曲率双曲对齐模块和交叉融合块,分别增强了对层次结构的理解和跨模态信息的动态交互。大量实验证明,SHMamba在参数和计算成本更少的情况下胜过先前的方法。我们的可学习参数减少了78.12%,而平均性能提高了2.53%。实验证明,我们的方法在所有当前主要方法中表现出优势,并更适合实际应用场景。

更新时间: 2024-06-14 08:43:31

领域: cs.AI,cs.MM,cs.SD,eess.AS

下载: http://arxiv.org/abs/2406.09833v1

Federated Learning driven Large Language Models for Swarm Intelligence: A Survey

Federated learning (FL) offers a compelling framework for training large language models (LLMs) while addressing data privacy and decentralization challenges. This paper surveys recent advancements in the federated learning of large language models, with a particular focus on machine unlearning, a crucial aspect for complying with privacy regulations like the Right to be Forgotten. Machine unlearning in the context of federated LLMs involves systematically and securely removing individual data contributions from the learned model without retraining from scratch. We explore various strategies that enable effective unlearning, such as perturbation techniques, model decomposition, and incremental learning, highlighting their implications for maintaining model performance and data privacy. Furthermore, we examine case studies and experimental results from recent literature to assess the effectiveness and efficiency of these approaches in real-world scenarios. Our survey reveals a growing interest in developing more robust and scalable federated unlearning methods, suggesting a vital area for future research in the intersection of AI ethics and distributed machine learning technologies.

Updated: 2024-06-14 08:40:58

标题: 联邦学习驱动的面向群体智能的大型语言模型:一项综述

摘要: 联邦学习(FL)为训练大型语言模型(LLMs)提供了一个具有吸引力的框架,同时解决了数据隐私和去中心化挑战。本文调查了最近在大型语言模型的联邦学习方面的进展,特别关注机器遗忘,这是遵守隐私法规如被遗忘权的一个关键方面。在联邦LLMs的背景下,机器遗忘涉及有系统地和安全地从学习模型中移除个体数据贡献,而无需从头开始重新训练。我们探讨了各种策略,使得有效的遗忘成为可能,例如扰动技术、模型分解和增量学习,突出它们对维持模型性能和数据隐私的影响。此外,我们审查了最近文献中的案例研究和实验结果,以评估这些方法在现实场景中的有效性和效率。我们的调查显示了在开发更加稳健和可扩展的联邦遗忘方法方面的日益增长的兴趣,表明这是未来研究中人工智能伦理和分布式机器学习技术交叉领域的一个关键领域。

更新时间: 2024-06-14 08:40:58

领域: cs.LG,cs.AI,cs.CL,cs.NE

下载: http://arxiv.org/abs/2406.09831v1

M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark

As recent multi-modality large language models (MLLMs) have shown formidable proficiency on various complex tasks, there has been increasing attention on debating whether these models could eventually mirror human intelligence. However, existing benchmarks mainly focus on evaluating task performance alone, such as the accuracy of identifying the attribute of an object. Combining well-developed cognitive science to understand the intelligence of MLLMs beyond superficial achievements remains largely unexplored. To this end, we introduce the first cognition-driven multi-lingual and multi-modal benchmark to evaluate the general intelligence ability of MLLMs, dubbed M3GIA. Specifically, we identify five key cognitive factors based on the well-recognized Cattell-Horn-Carroll (CHC) model of intelligence and propose a novel evaluation metric. In addition, since most MLLMs are trained to perform in different languages, a natural question arises: is language a key factor influencing the cognitive ability of MLLMs? As such, we go beyond English to encompass other languages based on their popularity, including Chinese, French, Spanish, Portuguese and Korean, to construct our M3GIA. We make sure all the data relevant to the cultural backgrounds are collected from their native context to avoid English-centric bias. We collected a significant corpus of data from human participants, revealing that the most advanced MLLM reaches the lower boundary of human intelligence in English. Yet, there remains a pronounced disparity in the other five languages assessed. We also reveal an interesting winner-takes-all phenomenon that aligns with findings in cognitive studies. Our benchmark will be open-sourced, with the aspiration of facilitating the enhancement of cognitive capabilities in MLLMs.

Updated: 2024-06-14 08:35:06

标题: M3GIA:一种受认知启发的多语言和多模态通用智能能力基准

摘要: 随着最近的多模态大型语言模型(MLLMs)在各种复杂任务上展现出强大的能力,人们开始越来越关注讨论这些模型是否最终能够反映人类智能。然而,现有的基准主要集中在评估任务性能,如识别对象属性的准确性等方面。结合发展完善的认知科学来理解MLLMs的智能能力超越表面成就的状况仍然很少被探讨。为此,我们引入了第一个基于认知驱动的多语言和多模态基准M3GIA,用于评估MLLMs的普遍智能能力。具体来说,我们基于广为人知的卡特尔-霍恩-卡罗尔(CHC)智力模型确定了五个关键认知因素,并提出了一种新颖的评估指标。此外,由于大多数MLLMs都是在不同语言中进行训练的,一个自然的问题出现了:语言是否是影响MLLMs认知能力的关键因素?因此,我们不仅限于英语,还包括其他流行语言(包括中文、法语、西班牙语、葡萄牙语和韩语)来构建我们的M3GIA。我们确保所有与文化背景相关的数据都是从其本地环境中收集的,以避免英语中心偏见。我们从人类参与者中收集了大量数据,揭示出最先进的MLLM在英语中达到了人类智能的下限。然而,在其他五种语言中仍存在明显的差距。我们还揭示了一个有趣的胜者通吃现象,与认知研究中的发现相一致。我们的基准将开源,旨在促进MLLMs认知能力的提升。

更新时间: 2024-06-14 08:35:06

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.05343v2

Generative AI-based Prompt Evolution Engineering Design Optimization With Vision-Language Model

Engineering design optimization requires an efficient combination of a 3D shape representation, an optimization algorithm, and a design performance evaluation method, which is often computationally expensive. We present a prompt evolution design optimization (PEDO) framework contextualized in a vehicle design scenario that leverages a vision-language model for penalizing impractical car designs synthesized by a generative model. The backbone of our framework is an evolutionary strategy coupled with an optimization objective function that comprises a physics-based solver and a vision-language model for practical or functional guidance in the generated car designs. In the prompt evolutionary search, the optimizer iteratively generates a population of text prompts, which embed user specifications on the aerodynamic performance and visual preferences of the 3D car designs. Then, in addition to the computational fluid dynamics simulations, the pre-trained vision-language model is used to penalize impractical designs and, thus, foster the evolutionary algorithm to seek more viable designs. Our investigations on a car design optimization problem show a wide spread of potential car designs generated at the early phase of the search, which indicates a good diversity of designs in the initial populations, and an increase of over 20\% in the probability of generating practical designs compared to a baseline framework without using a vision-language model. Visual inspection of the designs against the performance results demonstrates prompt evolution as a very promising paradigm for finding novel designs with good optimization performance while providing ease of use in specifying design specifications and preferences via a natural language interface.

Updated: 2024-06-14 08:33:11

标题: 基于生成式AI的提示演变工程设计优化与视觉语言模型

摘要: 工程设计优化需要高效地结合3D形状表示、优化算法和设计性能评估方法,这通常是计算昂贵的。我们提出了一个提示演化设计优化(PEDO)框架,将其置于车辆设计场景中,利用视觉-语言模型来惩罚由生成模型合成的不切实际的汽车设计。我们框架的骨干是一种进化策略,结合了一个包含基于物理的求解器和视觉-语言模型的优化目标函数,用于为生成的汽车设计提供实用性或功能性指导。在提示演化搜索中,优化器迭代地生成一组文本提示,其中嵌入了用户对3D汽车设计的空气动力性能和视觉偏好的规格。然后,除了计算流体动力学模拟外,预训练的视觉-语言模型用于惩罚不切实际的设计,从而促进进化算法寻找更具可行性的设计。我们对汽车设计优化问题的研究表明,在搜索早期阶段生成了大量潜在的汽车设计,这表明初始群体中设计的多样性很好,并且与不使用视觉-语言模型的基线框架相比,生成实用设计的概率增加了超过20%。设计与性能结果的视觉检查表明,提示演化是一种非常有前途的范式,能够找到优化性能优良的新设计,同时通过自然语言界面提供了指定设计规格和偏好的便利。

更新时间: 2024-06-14 08:33:11

领域: cs.AI,cs.CE,cs.CV,cs.LG,cs.NE

下载: http://arxiv.org/abs/2406.09143v2

HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning

In modern large language models (LLMs), increasing sequence lengths is a crucial challenge for enhancing their comprehension and coherence in handling complex tasks such as multi-modal question answering. However, handling long context sequences with LLMs is prohibitively costly due to the conventional attention mechanism's quadratic time and space complexity, and the context window size is limited by the GPU memory. Although recent works have proposed linear and sparse attention mechanisms to address this issue, their real-world applicability is often limited by the need to re-train pre-trained models. In response, we propose a novel approach, Hierarchically Pruned Attention (HiP), which simultaneously reduces the training and inference time complexity from $O(T^2)$ to $O(T \log T)$ and the space complexity from $O(T^2)$ to $O(T)$. To this end, we devise a dynamic sparse attention mechanism that generates an attention mask through a novel tree-search-like algorithm for a given query on the fly. HiP is training-free as it only utilizes the pre-trained attention scores to spot the positions of the top-$k$ most significant elements for each query. Moreover, it ensures that no token is overlooked, unlike the sliding window-based sub-quadratic attention methods, such as StreamingLLM. Extensive experiments on diverse real-world benchmarks demonstrate that HiP significantly reduces prompt (i.e., prefill) and decoding latency and memory usage while maintaining high generation performance with little or no degradation. As HiP allows pretrained LLMs to scale to millions of tokens on commodity GPUs with no additional engineering due to its easy plug-and-play deployment, we believe that our work will have a large practical impact, opening up the possibility to many long-context LLM applications previously infeasible.
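
For intuition, the sketch below shows a dense O(T^2) version of a per-query top-k attention mask; HiP's contribution is finding (approximately) the same top-k positions with a hierarchical tree search in O(T log T) and without training, which this simplified sketch does not implement.

```python
import torch

def topk_sparse_attention(q, k, v, k_keep=64):
    """Each query attends only to its k_keep highest-scoring keys.
    Here the mask is found by scoring all pairs (quadratic); HiP instead
    locates these positions with a tree-search-like algorithm on the fly."""
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5  # (..., Tq, Tk)
    k_keep = min(k_keep, scores.size(-1))
    idx = scores.topk(k_keep, dim=-1).indices
    # -inf everywhere except the selected top-k positions.
    mask = torch.full_like(scores, float("-inf")).scatter(-1, idx, 0.0)
    return torch.softmax(scores + mask, dim=-1) @ v

out = topk_sparse_attention(torch.randn(1, 128, 64), torch.randn(1, 128, 64),
                            torch.randn(1, 128, 64), k_keep=16)
```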

Updated: 2024-06-14 08:32:45

标题: HiP 注意力:稀疏次二次方注意力与分层注意力修剪

摘要: 在现代大型语言模型(LLMs)中,增加序列长度是增强它们在处理诸如多模态问答之类的复杂任务中的理解和连贯性的关键挑战。然而,由于传统的注意力机制的二次时间和空间复杂度,以及GPU内存限制上下文窗口大小,使用LLMs处理长上下文序列是成本高昂的。尽管最近的研究提出了线性和稀疏的注意力机制来解决这个问题,但它们在现实世界中的适用性通常受到重新训练预训练模型的限制。为此,我们提出了一种新方法,称为Hierarchically Pruned Attention(HiP),它同时将训练和推断时间复杂度从$O(T^2)$降低到$O(T \log T)$,将空间复杂度从$O(T^2)$降低到$O(T)$。为此,我们设计了一种动态稀疏的注意力机制,通过一种类似树搜索的算法实时生成一个针对特定查询的注意力掩码。HiP是无需训练的,因为它仅利用预训练的注意力分数来定位每个查询的前$k$个最重要元素的位置。此外,与基于滑动窗口的次二次注意力方法(如StreamingLLM)不同,它确保没有任何标记被忽略。在各种真实世界基准测试上进行的广泛实验表明,HiP显着降低了提示(即预填充)和解码延迟以及内存使用,同时保持高生成性能,几乎没有降级。由于HiP允许预训练的LLMs在商品GPU上扩展到数百万个标记,且无需额外的工程部署,我们相信我们的工作将产生重大实际影响,为许多先前不可行的长上下文LLM应用打开可能性。

更新时间: 2024-06-14 08:32:45

领域: cs.CL,cs.CV,cs.DC,cs.LG

下载: http://arxiv.org/abs/2406.09827v1

Unraveling Anomalies in Time: Unsupervised Discovery and Isolation of Anomalous Behavior in Bio-regenerative Life Support System Telemetry

The detection of abnormal or critical system states is essential in condition monitoring. While much attention is given to promptly identifying anomalies, a retrospective analysis of these anomalies can significantly enhance our comprehension of the underlying causes of observed undesired behavior. This aspect becomes particularly critical when the monitored system is deployed in a vital environment. In this study, we delve into anomalies within the domain of Bio-Regenerative Life Support Systems (BLSS) for space exploration and analyze anomalies found in telemetry data stemming from the EDEN ISS space greenhouse in Antarctica. We employ time series clustering on anomaly detection results to categorize various types of anomalies in both uni- and multivariate settings. We then assess the effectiveness of these methods in identifying systematic anomalous behavior. Additionally, we illustrate that the anomaly detection methods MDI and DAMP produce complementary results, as previously indicated by research.

Updated: 2024-06-14 08:29:34

标题: 解开时间中的异常:生物再生生命支持系统遥测中异常行为的无监督发现和隔离

摘要: 在条件监测中,检测异常或关键系统状态是至关重要的。虽然我们通常会着重及时识别异常,但对这些异常的回顾性分析可以显著增强我们对观察到的不良行为根本原因的理解。当监测系统部署在重要环境中时,这一方面变得尤为关键。在这项研究中,我们深入探讨太空探索中生物再生生命支持系统(BLSS)领域内的异常,并分析源自南极洲EDEN ISS空间温室的遥测数据中发现的异常。我们使用时间序列聚类对异常检测结果进行分类,以在单变量和多变量环境中区分各种类型的异常。然后,我们评估这些方法在识别系统异常行为方面的有效性。此外,我们说明异常检测方法MDI和DAMP产生互补结果,正如之前的研究所指出的那样。

更新时间: 2024-06-14 08:29:34

领域: cs.LG,cs.AI,cs.IR

下载: http://arxiv.org/abs/2406.09825v1

From Manifestations to Cognitive Architectures: a Scalable Framework

The Artificial Intelligence field is flooded with optimisation methods. In this paper, we change the focus to developing modelling methods with the aim of getting us closer to Artificial General Intelligence. To do so, we propose a novel way to interpret reality as an information source, which is then translated into a computational framework able to capture and represent such information. This framework is able to build elements of classical cognitive architectures, like Long Term Memory and Working Memory, starting from a simple primitive that only processes Spatial Distributed Representations. Moreover, it achieves such a level of verticality in a seamlessly scalable hierarchical way.

Updated: 2024-06-14 08:26:26

标题: 从表现到认知架构:一个可扩展的框架

摘要: 人工智能领域充斥着优化方法。在本文中,我们改变了关注点,致力于开发建模方法,以便让我们更接近人工通用智能。为此,我们提出了一种新颖的方式来解释现实,将其转化为一个能够捕捉和表示信息的计算框架。这个框架能够从仅处理空间分布表示的简单原始元素开始建立经典认知架构的元素,如长期记忆和工作记忆。此外,它以一种无缝可扩展的分层方式实现了垂直性水平。

更新时间: 2024-06-14 08:26:26

领域: cs.AI

下载: http://arxiv.org/abs/2406.09823v1

Understanding Large Language Model Based Fuzz Driver Generation

LLM-based (Large Language Model) fuzz driver generation is a promising research area. Unlike traditional program analysis-based methods, this text-based approach is more general and capable of harnessing a variety of API usage information, resulting in code that is friendly for human readers. However, there is still a lack of understanding regarding the fundamental issues in this direction, such as its effectiveness and potential challenges. To bridge this gap, we conducted the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely-used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, costing 0.85 billion tokens (over $8,000 in charged usage). Additionally, we compared the LLM-generated drivers against those utilized in industry, conducting extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that: - While LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles towards practical applications; - LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics. Three design choices of prompting strategies proved beneficial: issuing repeat queries, querying with examples, and employing an iterative querying process; - While LLM-generated drivers can yield fuzzing outcomes that are on par with those used in the industry, there are substantial opportunities for enhancement, such as extending the API usage they cover, or integrating semantic oracles to facilitate logical bug detection. Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.
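
The three beneficial prompting design choices can be combined into one loop. The sketch below assumes two hypothetical callables, llm(prompt) -> str and try_build(code) -> (ok, error), standing in for a model API and a compile-and-run harness; neither is from the paper or any real library, and the prompt wording is illustrative.

    def generate_fuzz_driver(api_doc, example_usage, llm, try_build, max_rounds=5):
        # "Query with examples": the initial prompt carries API docs plus usage.
        prompt = (
            "Write a libFuzzer-style fuzz driver for this API.\n"
            f"API documentation:\n{api_doc}\n"
            f"Example usage:\n{example_usage}\n"
        )
        code = llm(prompt)
        # "Iterative querying": feed build/run errors back to the model.
        for _ in range(max_rounds):
            ok, err = try_build(code)
            if ok:
                return code
            prompt = f"The driver failed with:\n{err}\nFix it:\n{code}"
            code = llm(prompt)
        # "Repeat queries": the caller may restart with a fresh prompt on failure.
        return None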

Updated: 2024-06-14 08:26:00

标题: 理解基于大型语言模型的模糊驱动程序生成

摘要: 基于LLM(Large Language Model)的模糊驱动程序生成是一个有前途的研究领域。与传统的基于程序分析的方法不同,这种基于文本的方法更加通用,能够利用各种API使用信息,生成对人类读者友好的代码。然而,对于这一方向的一些基本问题,如其有效性和潜在挑战,理解仍然不足。 为了填补这一空白,我们进行了第一次深入研究,针对使用LLMs生成有效的模糊驱动程序的重要问题。我们的研究包括一个经过筛选的数据集,其中包含来自30个广泛使用的C项目的86个模糊驱动程序生成问题。设计并测试了六种提示策略,跨越五种最先进的LLMs和五种不同的温度设置。总计,我们的研究评估了736,430个生成的模糊驱动程序,耗费了8.5亿个代币(收费超过8000美元)。此外,我们将LLM生成的驱动程序与工业中使用的进行了比较,并进行了大量的模糊测试实验(3.75个CPU年)。我们的研究揭示了:- 虽然基于LLM的模糊驱动程序生成是一个有前途的方向,但在实际应用中仍然面临一些障碍;- LLM在为具有复杂细节的API生成有效的模糊驱动程序方面面临困难。三种特色设计选择的提示策略可能是有益的:发出重复查询,用示例进行查询,以及采用迭代查询过程;- 虽然LLM生成的驱动程序可以产生与工业中使用的相当的模糊结果,但仍然存在大量提升的机会,如扩展包含的API使用,或集成语义oracle以促进逻辑漏洞检测。 我们的见解已被实施,以改进OSS-Fuzz-Gen项目,在工业中促进实际的模糊驱动程序生成。

更新时间: 2024-06-14 08:26:00

领域: cs.CR,D.2.5

下载: http://arxiv.org/abs/2307.12469v3

An I2I Inpainting Approach for Efficient Channel Knowledge Map Construction

Channel knowledge map (CKM) has received widespread attention as an emerging enabling technology for environment-aware wireless communications. It involves the construction of databases containing location-specific channel knowledge, which are then leveraged to facilitate channel state information (CSI) acquisition and transceiver design. In this context, a fundamental challenge lies in efficiently constructing the CKM based on a given wireless propagation environment. Most existing methods are based on stochastic modeling and sequence prediction, which do not fully exploit the inherent physical characteristics of the propagation environment, resulting in low accuracy and high computational complexity. To address these limitations, we propose a Laplacian pyramid (LP)-based CKM construction scheme to predict the channel knowledge at arbitrary locations in a targeted area. Specifically, we first view the channel knowledge as a 2-D image and transform the CKM construction problem into an image-to-image (I2I) inpainting task, which predicts the channel knowledge at a specific location by recovering the corresponding pixel value in the image matrix. Then, inspired by the reversible and closed-form structure of the LP, we show its natural suitability for our task in designing a fast I2I mapping network. For different frequency components of LP decomposition, we design tailored networks accordingly. Besides, to encode the global structural information of the propagation environment, we introduce self-attention and cross-covariance attention mechanisms in different layers, respectively. Finally, experimental results show that the proposed scheme outperforms the benchmark, achieving higher reconstruction accuracy with lower computational complexity. Moreover, the proposed approach has a strong generalization ability and can be implemented in different wireless communication scenarios.
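
To make the reversible, closed-form structure of the LP concrete, here is a minimal decomposition/reconstruction pair using OpenCV's standard pyramid operators; the per-band networks and attention modules of the paper are deliberately omitted, and the "channel-knowledge image" is a random toy array.

    import numpy as np
    import cv2  # pyrDown/pyrUp implement the standard Gaussian blur + resample

    def laplacian_pyramid(img, levels=3):
        # Each band is the high-frequency residual left after one blur+downsample.
        bands, current = [], img.astype(np.float32)
        for _ in range(levels):
            down = cv2.pyrDown(current)
            up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
            bands.append(current - up)
            current = down
        bands.append(current)            # coarsest low-frequency component
        return bands

    def reconstruct(bands):
        # Closed-form inverse: upsample the coarse level and add each band back.
        current = bands[-1]
        for band in reversed(bands[:-1]):
            current = cv2.pyrUp(current, dstsize=(band.shape[1], band.shape[0])) + band
        return current

    ckm = np.random.rand(128, 128).astype(np.float32)   # toy channel-gain map
    assert np.allclose(reconstruct(laplacian_pyramid(ckm)), ckm, atol=1e-4)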

Updated: 2024-06-14 08:24:52

标题: 一种用于高效频道知识映射构建的I2I修补方法

摘要: 通道知识图(CKM)作为一种新兴的环境感知无线通信的技术,受到了广泛关注。它涉及构建包含特定位置通道知识的数据库,然后利用这些知识来促进信道状态信息(CSI)的获取和收发器设计。在这种情况下,一个基本挑战在于有效地基于给定的无线传播环境构建CKM。大多数现有方法都是基于随机建模和序列预测,不能充分利用传播环境的固有物理特性,导致低准确性和高计算复杂性。为了解决这些限制,我们提出了一种基于拉普拉斯金字塔(LP)的CKM构建方案,用于预测目标区域任意位置的通道知识。具体而言,我们首先将通道知识视为一个2-D图像,并将CKM构建问题转化为图像到图像(I2I)修补任务,通过恢复图像矩阵中相应像素的值来预测特定位置的通道知识。然后,受LP可逆闭合形式结构的启发,我们展示了其在设计快速I2I映射网络中的自然适用性。对于LP分解的不同频率成分,我们相应地设计了定制网络。此外,为了编码传播环境的全局结构信息,我们在不同层次引入了自注意力和交叉协方差注意力机制。最后,实验结果表明,所提出的方案优于基准,实现了更高的重建准确性,同时具有较低的计算复杂性。此外,所提出的方法具有很强的泛化能力,并可在不同的无线通信场景中实施。

更新时间: 2024-06-14 08:24:52

领域: cs.IT,cs.CV,cs.LG,eess.IV,eess.SP,math.IT

下载: http://arxiv.org/abs/2406.09822v1

Adaptive Teaching with Shared Classifier for Knowledge Distillation

Knowledge distillation (KD) is a technique used to transfer knowledge from an overparameterized teacher network to a less-parameterized student network, thereby minimizing the incurred performance loss. KD methods can be categorized into offline and online approaches. Offline KD leverages a powerful pretrained teacher network, while online KD allows the teacher network to be adjusted dynamically to enhance the learning effectiveness of the student network. Recently, it has been discovered that sharing the classifier of the teacher network can significantly boost the performance of the student network with only a minimal increase in the number of network parameters. Building on these insights, we propose adaptive teaching with a shared classifier (ATSC). In ATSC, the pretrained teacher network self-adjusts to better align with the learning needs of the student network based on its capabilities, and the student network benefits from the shared classifier, enhancing its performance. Additionally, we extend ATSC to environments with multiple teachers. We conduct extensive experiments, demonstrating the effectiveness of the proposed KD method. Our approach achieves state-of-the-art results on the CIFAR-100 and ImageNet datasets in both single-teacher and multi-teacher scenarios, with only a modest increase in the number of required model parameters. The source code is publicly available at https://github.com/random2314235/ATSC.
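
A minimal PyTorch sketch of the classifier-sharing idea, under simplifying assumptions: toy linear backbones, a learned projector to align feature dimensions, and a plain KD loss. ATSC's adaptive self-adjustment of the teacher is omitted; only the shared head is the point here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    feat_t, feat_s, num_classes = 512, 128, 100

    teacher_backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_t))
    teacher_classifier = nn.Linear(feat_t, num_classes)   # shared with the student
    student_backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_s))
    projector = nn.Linear(feat_s, feat_t)                 # aligns feature dims

    x = torch.randn(8, 3, 32, 32)
    y = torch.randint(0, num_classes, (8,))
    with torch.no_grad():
        t_logits = teacher_classifier(teacher_backbone(x))
    s_logits = teacher_classifier(projector(student_backbone(x)))  # shared head

    tau = 4.0   # distillation temperature
    loss = F.cross_entropy(s_logits, y) + tau**2 * F.kl_div(
        F.log_softmax(s_logits / tau, dim=1),
        F.softmax(t_logits / tau, dim=1),
        reduction="batchmean",
    )
    loss.backward()   # in ATSC the teacher also self-adjusts; omitted here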

Updated: 2024-06-14 08:19:28

标题: 使用共享分类器进行知识蒸馏的自适应教学

摘要: 知识蒸馏(KD)是一种技术,用于将知识从一个过度参数化的教师网络转移到一个较少参数化的学生网络,从而最小化所造成的性能损失。KD方法可以分为离线和在线方法。离线KD利用强大的预训练教师网络,而在线KD允许动态调整教师网络以增强学生网络的学习效果。最近,发现共享教师网络的分类器可以显著提升学生网络的性能,而网络参数数量仅略微增加。基于这些见解,我们提出了具有共享分类器的自适应教学(ATSC)。在ATSC中,预训练教师网络会自我调整以更好地与学生网络的学习需求相匹配,并且学生网络会受益于共享的分类器,从而增强其性能。此外,我们将ATSC扩展到具有多个教师的环境中。我们进行了大量实验,展示了所提出的KD方法的有效性。我们的方法在CIFAR-100和ImageNet数据集上实现了最新的结果,无论是在单教师还是多教师场景下,所需模型参数数量只有适度增加。源代码可在https://github.com/random2314235/ATSC 上公开获取。

更新时间: 2024-06-14 08:19:28

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.08528v2

Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments

The rapid propagation of misinformation poses substantial risks to public interest. To combat misinformation, large language models (LLMs) are adapted to automatically verify claim credibility. Nevertheless, existing methods heavily rely on the embedded knowledge within LLMs and/or black-box APIs for evidence collection, leading to subpar performance with smaller LLMs or in unreliable contexts. In this paper, we propose retrieval augmented fact verification through the synthesis of contrasting arguments (RAFTS). Upon input claims, RAFTS starts with evidence retrieval, where we design a retrieval pipeline to collect and re-rank relevant documents from verifiable sources. Then, RAFTS forms contrastive arguments (i.e., supporting or refuting) conditioned on the retrieved evidence. In addition, RAFTS leverages an embedding model to identify informative demonstrations, followed by in-context prompting to generate the prediction and explanation. Our method effectively retrieves relevant documents as evidence and evaluates arguments from varying perspectives, incorporating nuanced information for fine-grained decision-making. Combined with informative in-context examples as priors, RAFTS achieves significant improvements over supervised and LLM baselines without complex prompts. We demonstrate the effectiveness of our method through extensive experiments, where RAFTS outperforms GPT-based methods with a significantly smaller 7B LLM.
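
Structurally, the pipeline reduces to a handful of retriever/LLM calls. The sketch below uses hypothetical callables: retrieve (retrieval plus re-ranking), pick_demos (embedding-based demonstration selection), and llm (any instruction-following model); none of these names are from the paper, and the prompt wording is illustrative.

    def verify_claim(claim, retrieve, llm, pick_demos, k=5):
        evidence = retrieve(claim, top_k=k)            # collected, re-ranked docs
        # Synthesize contrastive arguments conditioned on the same evidence.
        supporting = llm(f"Evidence: {evidence}\nArgue that this claim is TRUE: {claim}")
        refuting = llm(f"Evidence: {evidence}\nArgue that this claim is FALSE: {claim}")
        demos = pick_demos(claim)                      # informative in-context examples
        verdict = llm(
            f"{demos}\nClaim: {claim}\nEvidence: {evidence}\n"
            f"Supporting argument: {supporting}\nRefuting argument: {refuting}\n"
            "Weigh both arguments, then answer SUPPORTED or REFUTED "
            "with a short explanation."
        )
        return verdict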

Updated: 2024-06-14 08:13:34

标题: 通过合成对比论点增强事实验证的检索

摘要: 快速传播错误信息对公共利益构成重大风险。为了打击错误信息,大型语言模型(LLMs)被调整为自动验证声明的可信度。然而,现有方法严重依赖LLMs内嵌的知识和/或黑盒APIs用于证据收集,导致在较小的LLMs或不可靠的上下文中性能不佳。在本文中,我们提出了通过对比论点合成的检索增强事实验证(RAFTS)。在输入声明后,RAFTS从证据检索开始,我们设计了一个检索管道来收集和重新排列来自可验证来源的相关文档。然后,RAFTS以检索到的证据为条件,形成对比论点(支持或反驳)。此外,RAFTS利用嵌入模型识别信息性示例,随后进行上下文提示以生成预测和解释。我们的方法有效地检索相关文档作为证据,并评估来自不同角度的论点,融合细微信息以进行精细的决策。结合信息丰富的上下文示例作为先验,RAFTS在无需复杂提示的情况下显著改善了监督和LLM基线。我们通过广泛的实验展示了我们方法的有效性,其中RAFTS可以在具有显著较小的7B LLM的情况下胜过基于GPT的方法。

更新时间: 2024-06-14 08:13:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.09815v1

Generalized Linear Bandits with Limited Adaptivity

We study the generalized linear contextual bandit problem within the constraints of limited adaptivity. In this paper, we present two algorithms, B-GLinCB and RS-GLinCB, that address, respectively, two prevalent limited adaptivity settings. Given a budget $M$ on the number of policy updates, in the first setting, the algorithm needs to decide upfront the $M$ rounds at which it will update its policy, while in the second setting it can adaptively perform $M$ policy updates during its course. For the first setting, we design an algorithm B-GLinCB that incurs $\tilde{O}(\sqrt{T})$ regret when $M = \Omega\left( \log{\log T} \right)$ and the arm feature vectors are generated stochastically. For the second setting, we design an algorithm RS-GLinCB that updates its policy $\tilde{O}(\log^2 T)$ times and achieves a regret of $\tilde{O}(\sqrt{T})$ even when the arm feature vectors are adversarially generated. Notably, in these bounds, we manage to eliminate the dependence on a key instance-dependent parameter $\kappa$ that captures the non-linearity of the underlying reward model. Our novel approach for removing this dependence for generalized linear contextual bandits might be of independent interest.
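
For intuition about how so few policy updates can suffice, the snippet below implements a common rarely-switching trigger from the linear-bandit literature: recompute the policy only when the determinant of the regularized design matrix has doubled since the last update, which fires O(log T) times for fixed dimension. This illustrates the adaptivity-limiting mechanism in general, not necessarily the exact switching rule of RS-GLinCB.

    import numpy as np

    d, T = 5, 10_000
    V = np.eye(d)                 # regularized design matrix
    V_last = V.copy()
    updates = 0
    rng = np.random.default_rng(0)

    for t in range(T):
        x = rng.normal(size=d)    # feature of the arm played at round t
        x /= np.linalg.norm(x)
        V += np.outer(x, x)
        if np.linalg.det(V) > 2 * np.linalg.det(V_last):
            updates += 1          # re-fit the GLM estimator here
            V_last = V.copy()

    print(f"{updates} policy updates over {T} rounds")   # grows only logarithmically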

Updated: 2024-06-14 08:11:11

标题: 受限适应性的广义线性赌博机

摘要: 我们研究了在有限适应性约束下的广义线性情境赌博问题。在本文中,我们提出了两种算法,B-GLinCB和RS-GLinCB,分别处理两种普遍存在的有限适应性设置。在给定策略更新次数上限$M$的情况下,第一种设置中,算法需要预先决定$M$轮在哪些轮次更新其策略,而在第二种设置中,它可以在整个过程中自适应地执行$M$次策略更新。对于第一种设置,我们设计了一个算法B-GLinCB,当$M = \Omega\left( \log{\log T} \right)$且臂特征向量是随机生成时,产生$\tilde{O}(\sqrt{T})$的后悔。对于第二种设置,我们设计了一个算法RS-GLinCB,更新其策略$\tilde{O}(\log^2 T)$次,并实现了$\tilde{O}(\sqrt{T})$的后悔,即使臂特征向量是敌对生成的。值得注意的是,在这些界限中,我们设法消除了对关键实例相关参数$\kappa$的依赖,该参数捕捉了基础奖励模型的非线性特性。我们用于移除广义线性情境赌博中这种依赖的新方法可能具有独立的价值。

更新时间: 2024-06-14 08:11:11

领域: cs.LG

下载: http://arxiv.org/abs/2404.06831v3

EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification

Fact verification aims to automatically probe the veracity of a claim based on several pieces of evidence. Existing work focuses largely on accuracy improvement, with little attention to explainability, a critical capability of fact verification systems. Constructing an explainable fact verification system in a complex multi-hop scenario is consistently impeded by the absence of a relevant, high-quality dataset. Previous datasets either suffer from excessive simplification or fail to incorporate essential considerations for explainability. To address this, we present EX-FEVER, a pioneering dataset for multi-hop explainable fact verification. It contains over 60,000 claims involving 2-hop and 3-hop reasoning, each created by summarizing and modifying information from hyperlinked Wikipedia documents. Each instance is accompanied by a veracity label and an explanation that outlines the reasoning path supporting the veracity classification. Additionally, we demonstrate a novel baseline system on our EX-FEVER dataset, showcasing document retrieval, explanation generation, and claim verification, and validate the significance of our dataset. Furthermore, we highlight the potential of utilizing Large Language Models in the fact verification task. We hope our dataset can make a significant contribution by providing ample opportunities to explore the integration of natural language explanations in the domain of fact verification.

Updated: 2024-06-14 08:10:04

标题: EX-FEVER:一个用于多跳可解释事实验证的数据集

摘要: 事实验证旨在根据多个证据自动探究主张的真实性。现有作品始终致力于提高准确性,更不用说可解释性了,这是事实验证系统的关键能力。在复杂的多跳场景中构建一个可解释的事实验证系统始终受到缺乏相关、高质量数据集的阻碍。先前的数据集要么存在过度简化,要么未能融入解释性所需的基本考虑因素。为了解决这个问题,我们提出了EXFEVER,一个用于多跳可解释事实验证的开创性数据集。该数据集包含超过60,000个主张,涉及2跳和3跳推理,每个主张都是通过总结和修改来自超链接维基百科文档的信息而创建的。每个实例都附带一个真实性标签和一个解释,概述支持真实性分类的推理路径。此外,我们在我们的EX-FEVER数据集上展示了一个新颖的基线系统,展示了文档检索、解释生成和主张验证,并验证了我们数据集的重要性。此外,我们强调了在事实验证任务中利用大型语言模型的潜力。我们希望我们的数据集可以通过提供丰富的机会来探索自然语言解释在事实验证领域的融合而做出重要贡献。

更新时间: 2024-06-14 08:10:04

领域: cs.AI

下载: http://arxiv.org/abs/2310.09754v3

The Interplay of Learning, Analytics, and Artificial Intelligence in Education

This paper presents a multi-dimensional view of AI's role in learning and education, emphasizing the intricate interplay between AI, analytics, and the learning processes. Here, I challenge the prevalent narrow conceptualization of AI as stochastic tools, as exemplified in generative AI, and argue for the importance of alternative conceptualizations of AI. I highlight the differences between human intelligence and artificial information processing, the cognitive diversity inherent in AI algorithms, and posit that AI can also serve as an instrument for understanding human learning. Early learning sciences and AI in Education research, which saw AI as an analogy for human intelligence, have diverged from this perspective, prompting a need to rekindle this connection. The paper presents three unique conceptualizations of AI in education: the externalization of human cognition, the internalization of AI models to influence human mental models, and the extension of human cognition via tightly integrated human-AI systems. Examples from current research and practice are examined as instances of the three conceptualizations, highlighting the potential value and limitations of each conceptualization for education, as well as the perils of overemphasis on externalizing human cognition. It is argued that AI models can be useful as objects to think about learning, even though some aspects of learning might only come through the slow experience of living those learning moments and cannot be fully explained by AI models or shortcut with predictions. The paper concludes with advocacy for a broader approach to AI in Education that goes beyond considerations on the design and development of AI solutions in education, but also includes educating people about AI and innovating educational systems to remain relevant in an AI-ubiquitous world.

Updated: 2024-06-14 08:05:18

标题: 教育中学习、分析和人工智能的相互作用

摘要: 本文提出了人工智能在学习和教育中的多维视角,强调人工智能、分析和学习过程之间错综复杂的相互作用。在这里,我挑战了人工智能被狭义地概念化为随机工具的普遍看法,如生成式人工智能所体现的那样,并主张对人工智能的替代概念化的重要性。我强调了人类智能和人工信息处理之间的差异,AI算法中固有的认知多样性,并提出AI也可以作为理解人类学习的工具。早期的学习科学和教育中的人工智能研究将AI视为人类智能的类比,但现在这一观点已经分道扬镳,促使我们重新点燃这种联系的必要性。本文提出了教育中三种独特的人工智能概念化:人类认知的外化、内化AI模型以影响人类心智模型,以及通过紧密集成的人工智能系统扩展人类认知。当前研究和实践中的例证被视为这三种概念化的实例,突出了每种概念化在教育中的潜在价值和局限性,以及过分强调人类认知外化的危险。文章认为,AI模型可以作为思考学习的对象,尽管学习的一些方面可能只能通过慢慢体验这些学习时刻而无法完全用AI模型来解释。文章最终呼吁对教育中的人工智能采取更广泛的态度,不仅要考虑设计和开发教育中的人工智能解决方案,还要包括教育人们关于人工智能的知识,并创新教育系统以在普遍存在人工智能的世界中保持相关性。

更新时间: 2024-06-14 08:05:18

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2403.16081v3

Ada-HGNN: Adaptive Sampling for Scalable Hypergraph Neural Networks

Hypergraphs serve as an effective model for depicting complex connections in various real-world scenarios, from social to biological networks. The development of Hypergraph Neural Networks (HGNNs) has emerged as a valuable method to manage the intricate associations in data, though scalability is a notable challenge due to memory limitations. In this study, we introduce a new adaptive sampling strategy specifically designed for hypergraphs, which tackles their unique complexities in an efficient manner. We also present a Random Hyperedge Augmentation (RHA) technique and an additional Multilayer Perceptron (MLP) module to improve the robustness and generalization capabilities of our approach. Thorough experiments with real-world datasets have proven the effectiveness of our method, markedly reducing computational and memory demands while maintaining performance levels akin to conventional HGNNs and other baseline models. This research paves the way for improving both the scalability and efficacy of HGNNs in extensive applications. We will also make our codebase publicly accessible.

Updated: 2024-06-14 08:01:09

标题: Ada-HGNN:可扩展超图神经网络的自适应采样

摘要: 超图是一种有效的模型,用于描述各种现实世界场景中的复杂连接,从社交到生物网络。超图神经网络(HGNNs)的发展已经成为管理数据中复杂关联的宝贵方法,尽管由于内存限制,可伸缩性是一个显著的挑战。在这项研究中,我们介绍了一种专门为超图设计的新的自适应采样策略,以高效处理它们的独特复杂性。我们还提出了一种随机超边增强(RHA)技术和一个额外的多层感知机(MLP)模块,以提高我们方法的鲁棒性和泛化能力。通过对真实世界数据集的彻底实验,我们的方法的有效性得到了证明,显著降低了计算和内存需求,同时保持了与传统HGNNs和其他基准模型相似的性能水平。这项研究为在广泛应用中提高HGNNs的可伸缩性和效力铺平了道路。我们还将使我们的代码库公开可访问。

更新时间: 2024-06-14 08:01:09

领域: cs.LG

下载: http://arxiv.org/abs/2405.13372v3

SemantIC: Semantic Interference Cancellation Towards 6G Wireless Communications

This letter proposes a novel anti-interference technique, semantic interference cancellation (SemantIC), for enhancing information quality towards the sixth-generation (6G) wireless networks. SemantIC only requires the receiver to concatenate the channel decoder with a semantic auto-encoder. This constructs a turbo loop which iteratively and alternately eliminates noise in the signal domain and the semantic domain. From the viewpoint of network information theory, the neural network of the semantic auto-encoder stores side information by training, and provides side information in iterative decoding, as an implementation of the Wyner-Ziv theorem. Simulation results verify the performance improvement by SemantIC without extra channel resource cost.
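
The turbo-loop structure is simple to sketch. Below, channel_decode and semantic_autoencoder are toy stand-ins (a clipping soft decision and a pull toward plus/minus one "semantics"); only the alternating signal-domain/semantic-domain iteration reflects the letter's design, and no real decoder or trained auto-encoder is implied.

    import numpy as np

    def channel_decode(y):
        return np.clip(y, -1.0, 1.0)             # toy soft decision

    def semantic_autoencoder(s):
        return 0.5 * s + 0.5 * np.sign(s)        # toy pull toward the semantic manifold

    def semantic_interference_cancellation(y, iters=5):
        estimate = y
        for _ in range(iters):                   # turbo-style alternation
            estimate = channel_decode(estimate)        # signal-domain cleanup
            estimate = semantic_autoencoder(estimate)  # semantic-domain cleanup
        return np.sign(estimate)

    rng = np.random.default_rng(1)
    bits = rng.choice([-1.0, 1.0], size=64)
    received = bits + 0.6 * rng.normal(size=64)  # AWGN channel
    errors = int(np.sum(semantic_interference_cancellation(received) != bits))
    print("bit errors:", errors)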

Updated: 2024-06-14 07:59:15

标题: SemantIC:面向6G无线通信的语义干扰抵消

摘要: 这封信提出了一种新颖的抗干扰技术,语义干扰抵消(SemantIC),用于增强信息质量,以应对第六代(6G)无线网络。SemantIC只需要接收端将信道解码器与语义自编码器连接起来。这构建了一个迭代循环,交替消除信号域和语义域中的噪声。从网络信息理论的角度来看,语义自编码器的神经网络通过训练存储了辅助信息,并在迭代解码中提供辅助信息,作为Wyner-Ziv定理的实现。模拟结果证实了SemantIC在不增加额外信道资源成本的情况下的性能改善。

更新时间: 2024-06-14 07:59:15

领域: eess.SP,cs.AI,cs.IT,cs.LG,cs.NI,math.IT

下载: http://arxiv.org/abs/2310.12768v2

Quantum Merkle Trees

Committing to information is a central task in cryptography, where a party (typically called a prover) stores a piece of information (e.g., a bit string) with the promise of not changing it. This information can be accessed by another party (typically called the verifier), who can later learn the information and verify that it was not tampered with. Merkle trees are a well-known construction for doing so in a succinct manner, in which the verifier can learn any part of the information by receiving a short proof from the honest prover. Despite its significance in classical cryptography, there has been no quantum analog of the Merkle tree. A direct generalization using the Quantum Random Oracle Model (QROM) does not seem to be secure. In this work, we propose the quantum Merkle tree. It is based on what we call the Quantum Haar Random Oracle Model (QHROM). In QHROM, both the prover and the verifier have access to a Haar random quantum oracle $G$ and its inverse. Using the quantum Merkle tree, we propose a succinct quantum argument for the Gap-$k$-Local-Hamiltonian problem. Assuming the Quantum PCP conjecture is true, this succinct argument extends to all of QMA. This work raises a number of interesting open research problems.

Updated: 2024-06-14 07:55:24

标题: 量子 Merkle 树

摘要: 对信息作出承诺是密码学中的一个核心任务,其中一方(通常称为证明者)存储一段信息(例如,一个比特串),并承诺不改变它。另一方(通常称为验证者)可以访问这些信息,以后可以学习这些信息并验证它没有被篡改。 Merkle树是一种用简洁方式实现这一目的的著名构造,验证者可以通过从诚实的证明者接收一个简短的证明来学习信息的任何部分。尽管在经典密码学中具有重要意义,但没有量子Merkle树的类比。直接使用量子随机Oracle模型(QROM)进行的泛化似乎不安全。在这项工作中,我们提出了量子Merkle树。它基于我们所称的量子Haar随机Oracle模型(QHROM)。在QHROM中,证明者和验证者都可以访问一个Haar随机量子Oracle $G$及其逆。 使用量子Merkle树,我们提出了Gap-$k$-Local-Hamiltonian问题的简洁量子论证。假设量子PCP猜想成立,这个简洁的论证扩展到了所有的QMA。这项工作提出了一些有趣的开放性研究问题。

更新时间: 2024-06-14 07:55:24

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2112.14317v4

SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators

To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule; however, SotA schedulers struggle to consistently provide optimal schedules in a reasonable time across all DNN-HW combinations. This paper proposes SALSA, a fast dual-engine scheduler that generates optimal execution schedules for both even and uneven mappings. We introduce a new strategy, combining exhaustive search with simulated annealing, to address the dynamic nature of the loop-ordering design space size across layers. SALSA is extensively benchmarked against two SotA schedulers, LOMA and Timeloop, on 5 different DNNs; on average, SALSA finds schedules with 11.9% and 7.6% lower energy while speeding up the search by 1.7x and 24x compared to LOMA and Timeloop, respectively.
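
The simulated-annealing half of the dual engine is easy to sketch. Below, the neighborhood move is a random swap of two loop dimensions, and cost is a toy placeholder; SALSA instead evaluates each ordering with a hardware cost model (energy or latency for a given layer-accelerator pair).

    import math
    import random

    LOOPS = ["K", "C", "OY", "OX", "FY", "FX"]   # typical conv-layer loop dims

    def cost(order):
        # Toy: pretend orderings that place "C" early in the order are cheaper.
        return order.index("C") + 0.1 * order.index("K")

    def anneal(order, t0=5.0, cooling=0.995, steps=5000):
        best = current = order[:]
        t = t0
        for _ in range(steps):
            candidate = current[:]
            i, j = random.sample(range(len(candidate)), 2)
            candidate[i], candidate[j] = candidate[j], candidate[i]   # swap move
            delta = cost(candidate) - cost(current)
            if delta < 0 or random.random() < math.exp(-delta / t):
                current = candidate                                   # accept
                if cost(current) < cost(best):
                    best = current[:]
            t *= cooling
        return best

    print(anneal(LOOPS))   # orderings with "C" first win under the toy cost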

Updated: 2024-06-14 07:49:38

标题: SALSA: 基于模拟退火的DNN加速器循环排序调度器

摘要: 为了满足深度神经网络对计算能力日益增长的需求,提出了多种专门的硬件架构。每个DNN层应该被映射到具有最有效调度的硬件上,然而,目前的最先进调度器在合理的时间内很难为所有DNN-HW组合提供始终如一的最佳调度。 本文提出了SALSA,一种快速的双引擎调度器,用于生成既适用于均匀映射又适用于不均匀映射的最佳执行计划。我们引入了一种新策略,将穷尽搜索与模拟退火相结合,以解决不同层间循环顺序设计空间大小的动态特性。SALSA在5个不同的DNN上与两个最先进的调度器LOMA和Timeloop进行了广泛的基准测试,平均而言,SALSA找到的调度能耗比LOMA和Timeloop分别降低了11.9%和7.6%,同时搜索速度分别加快了1.7倍和24倍。

更新时间: 2024-06-14 07:49:38

领域: cs.AR,cs.AI

下载: http://arxiv.org/abs/2304.12931v2

Understanding Inter-Session Intentions via Complex Logical Reasoning

Understanding user intentions is essential for improving product recommendations, navigation suggestions, and query reformulations. However, user intentions can be intricate, involving multiple sessions and attribute requirements connected by logical operators such as And, Or, and Not. For instance, a user may search for Nike or Adidas running shoes across various sessions, with a preference for purple. In another example, a user may have purchased a mattress in a previous session and is now looking for a matching bed frame without intending to buy another mattress. Existing research on session understanding has not adequately addressed making product or attribute recommendations for such complex intentions. In this paper, we present the task of logical session complex query answering (LS-CQA), where sessions are treated as hyperedges of items, and we frame the problem of complex intention understanding as an LS-CQA task on an aggregated hypergraph of sessions, items, and attributes. This is a unique complex query answering task with sessions as ordered hyperedges. We also introduce a new model, the Logical Session Graph Transformer (LSGT), which captures interactions among items across different sessions and their logical connections using a transformer structure. We analyze the expressiveness of LSGT and prove the permutation invariance of the inputs for the logical operators. By evaluating LSGT on three datasets, we demonstrate that it achieves state-of-the-art results.

Updated: 2024-06-14 07:47:47

标题: 通过复杂逻辑推理理解跨会话意图

摘要: 理解用户意图对于改进产品推荐、导航建议和查询重构至关重要。然而,用户意图可能很复杂,涉及多个会话和由逻辑运算符如And、Or和Not连接的属性要求。例如,用户可能在不同的会话中搜索Nike或Adidas跑鞋,偏好紫色。在另一个例子中,用户可能在先前的会话中购买了床垫,现在正在寻找一个匹配的床架,而不是打算购买另一个床垫。现有的关于会话理解的研究并没有充分解决为这种复杂意图制定产品或属性推荐的问题。在本文中,我们提出了逻辑会话复杂查询回答(LS-CQA)的任务,其中将会话视为项目的超边,并将复杂意图理解问题构建为在会话、项目和属性的聚合超图上的LS-CQA任务。这是一个具有会话作为有序超边的独特复杂查询回答任务。我们还介绍了一个新模型,逻辑会话图变换器(LSGT),它利用变压器结构捕捉不同会话之间项目之间的相互作用和它们的逻辑连接。我们分析了LSGT的表达能力,并证明了逻辑运算符的输入的排列不变性。通过在三个数据集上评估LSGT,我们证明它实现了最先进的结果。

更新时间: 2024-06-14 07:47:47

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2312.13866v2

DeltaPhi: Learning Physical Trajectory Residual for PDE Solving

Although neural operator networks theoretically approximate any operator mapping, their limited generalization capability prevents them from learning correct physical dynamics when potential data biases exist, particularly in practical PDE-solving scenarios where the available data amount is restricted or the resolution is extremely low. To address this issue, we propose and formulate Physical Trajectory Residual Learning (DeltaPhi), which learns to predict the physical residuals between the trajectory to be solved and a known, similar auxiliary trajectory. First, we transform the direct operator mapping between input-output function fields in the original training data into a residual operator mapping between input function pairs and output function residuals. Next, we learn a surrogate model for the residual operator mapping based on existing neural operator networks. Additionally, we design helpful customized auxiliary inputs for efficient optimization. Through extensive experiments, we conclude that, compared to direct learning, physical residual learning is preferable for PDE solving.
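
A minimal PyTorch sketch of the residual reformulation, assuming a plain MLP in place of a neural operator and pre-paired auxiliary trajectories; variable names (a_query, u_aux, and so on) are illustrative, not the paper's.

    import torch
    import torch.nn as nn

    n_points = 64
    operator_net = nn.Sequential(
        nn.Linear(2 * n_points, 256), nn.GELU(), nn.Linear(256, n_points)
    )

    def training_step(a_query, a_aux, u_aux, u_query, optimizer):
        # Learn the residual operator: (input pair) -> (output residual).
        pred_residual = operator_net(torch.cat([a_query, a_aux], dim=-1))
        loss = nn.functional.mse_loss(pred_residual, u_query - u_aux)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    def solve(a_query, a_aux, u_aux):
        # At inference, add the predicted residual onto the known auxiliary solution.
        with torch.no_grad():
            return u_aux + operator_net(torch.cat([a_query, a_aux], dim=-1))

    optimizer = torch.optim.Adam(operator_net.parameters(), lr=1e-3)
    a_q, a_x = torch.randn(8, n_points), torch.randn(8, n_points)
    u_x, u_q = torch.sin(a_x), torch.sin(a_q)        # toy "PDE" solutions
    print(training_step(a_q, a_x, u_x, u_q, optimizer))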

Updated: 2024-06-14 07:45:07

标题: DeltaPhi:学习PDE求解的物理轨迹残差

摘要: 虽然神经算子网络在理论上可以逼近任何算子映射,但受限于有限的泛化能力,当存在潜在数据偏差时,特别是在实际的PDE求解场景中,可用数据量受限或分辨率极低时,阻碍了它们学习正确的物理动力学。为解决这一问题,我们提出并制定了物理轨迹残差学习(DeltaPhi)方法,该方法学习预测待解轨迹与已知相似辅助轨迹之间的物理残差。首先,我们将原始训练数据中输入输出函数字段之间的直接算子映射转换为输入函数对和输出函数残差之间的残差算子映射。接下来,我们基于现有的神经算子网络学习残差算子映射的代理模型。此外,我们设计了有用的定制辅助输入以实现高效优化。通过大量实验,我们得出结论,与直接学习相比,物理残差学习更适用于PDE求解。

更新时间: 2024-06-14 07:45:07

领域: cs.LG,cs.NA,math.NA

下载: http://arxiv.org/abs/2406.09795v1

Lifted Algorithms for Symmetric Weighted First-Order Model Sampling

Weighted model counting (WMC) is the task of computing the weighted sum of all satisfying assignments (i.e., models) of a propositional formula. Similarly, weighted model sampling (WMS) aims to randomly generate models with probability proportional to their respective weights. Both WMC and WMS are hard to solve exactly, falling under the $\#\mathsf{P}$-hard complexity class. However, it is known that the counting problem may sometimes be tractable, if the propositional formula can be compactly represented and expressed in first-order logic. In such cases, model counting problems can be solved in time polynomial in the domain size, and are known as domain-liftable. The following question then arises: Is it also the case for weighted model sampling? This paper addresses this question and answers it affirmatively. Specifically, we prove the domain-liftability under sampling for the two-variable fragment of first-order logic with counting quantifiers, by devising an efficient sampling algorithm for this fragment that runs in time polynomial in the domain size. We then further show that this result continues to hold even in the presence of cardinality constraints. To empirically verify our approach, we conduct experiments over various first-order formulas designed for the uniform generation of combinatorial structures and sampling in statistical-relational models. The results demonstrate that our algorithm outperforms a state-of-the-art WMS sampler by a substantial margin, confirming the theoretical results.

Updated: 2024-06-14 07:39:07

标题: 对称加权一阶模型抽样的提升算法

摘要: 加权模型计数(WMC)是计算命题公式的所有满足赋值(即模型)的加权和的任务。类似地,加权模型抽样(WMS)旨在随机生成具有与其权重成比例的概率的模型。WMC和WMS都很难精确解决,属于$\#\mathsf{P}$-hard复杂性类。然而,已知计数问题有时可能是可处理的,如果命题公式可以被紧凑地表示并在一阶逻辑中表达。在这种情况下,模型计数问题可以在域大小的多项式时间内解决,并且被称为可提升域。接下来的问题是:加权模型抽样也是这种情况吗?本文回答了这个问题,并肯定地回答了这个问题。具体来说,我们证明了本文中使用计数量词的一阶逻辑的双变量片段在抽样下的域提升性质,通过设计一个在域大小的多项式时间内运行的有效抽样算法。然后我们进一步展示,即使在存在基数约束的情况下,这个结果仍然成立。为了从经验上验证我们的方法,我们对为组合结构的均匀生成和统计关系模型中的抽样而设计的各种一阶公式进行实验。结果表明,我们的算法在很大程度上优于最先进的WMS抽样器,证实了理论结果。

更新时间: 2024-06-14 07:39:07

领域: cs.AI,cs.LO,68T27,I.2.4

下载: http://arxiv.org/abs/2308.08828v3

Evolving Self-Assembling Neural Networks: From Spontaneous Activity to Experience-Dependent Learning

Biological neural networks are characterized by their high degree of plasticity, a core property that enables the remarkable adaptability of natural organisms. Importantly, this ability affects both the synaptic strength and the topology of the nervous systems. Artificial neural networks, on the other hand, have been mainly designed as static, fully connected structures that can be notoriously brittle in the face of changing environments and novel inputs. Building on previous works on Neural Developmental Programs (NDPs), we propose a class of self-organizing neural networks capable of synaptic and structural plasticity in an activity and reward-dependent manner which we call Lifelong Neural Developmental Program (LNDP). We present an instance of such a network built on the graph transformer architecture and propose a mechanism for pre-experience plasticity based on the spontaneous activity of sensory neurons. Our results demonstrate the ability of the model to learn from experiences in different control tasks starting from randomly connected or empty networks. We further show that structural plasticity is advantageous in environments necessitating fast adaptation or with non-stationary rewards.

Updated: 2024-06-14 07:36:21

标题: 进化中的自组装神经网络:从自发活动到经验依赖学习

摘要: 生物神经网络的特点是其高度可塑性,这是一种核心属性,使自然生物具有出色的适应性。重要的是,这种能力影响了神经系统的突触强度和拓扑结构。另一方面,人工神经网络主要被设计为静态的、完全连接的结构,在面对变化环境和新输入时往往表现脆弱。在以往的神经发育程序(NDPs)研究基础上,我们提出了一类自组织神经网络,能够根据活动和奖励的依赖性实现突触和结构可塑性,我们称之为终身神经发育程序(LNDP)。我们提出了一个基于图形转换器架构构建的这种网络的实例,并提出了一个基于感觉神经的自发活动的预先经验可塑性机制。我们的结果表明,该模型能够从不同控制任务的经验中学习,从随机连接或空网络开始。我们进一步表明,在需要快速适应或具有非静态奖励的环境中,结构可塑性是有益的。

更新时间: 2024-06-14 07:36:21

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2406.09787v1

OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

Memes, which rapidly disseminate personal opinions and positions across the internet, also pose significant challenges by propagating social bias and prejudice. This study presents a novel approach to detecting harmful memes, particularly within the multicultural and multilingual context of Singapore. Our methodology integrates image captioning, Optical Character Recognition (OCR), and Large Language Model (LLM) analysis to comprehensively understand and classify harmful memes. Utilizing the BLIP model for image captioning, PP-OCR and TrOCR for text recognition across multiple languages, and the Qwen LLM for nuanced language understanding, our system is capable of identifying harmful content in memes created in English, Chinese, Malay, and Tamil. To enhance the system's performance, we fine-tuned our approach by leveraging additional data labeled using GPT-4V, aiming to distill GPT-4V's understanding of harmful memes into our system. Our framework ranks first on the public leaderboard of the Online Safety Prize Challenge hosted by AI Singapore, with an AUROC of 0.7749 and an accuracy of 0.7087, significantly ahead of the other teams. Notably, our approach outperforms previous benchmarks, with FLAVA achieving an AUROC of 0.5695 and VisualBERT an AUROC of 0.5561.

Updated: 2024-06-14 07:28:02

标题: OSPC:利用大型语言模型作为催化剂检测有害迷因

摘要: 模因在互联网上迅速传播个人观点和立场,同时也因传播社会偏见和成见而带来重大挑战。本研究提出了一种新颖的方法来检测有害的模因,特别是在新加坡的多元文化和多语言环境中。我们的方法结合了图像字幕、光学字符识别(OCR)和大型语言模型(LLM)分析,全面理解和分类有害的模因。利用BLIP模型进行图像字幕,PP-OCR和TrOCR用于跨多种语言的文本识别,以及Qwen LLM用于微妙语言理解,我们的系统能够识别用英语、中文、马来语和泰米尔语创建的有害内容的模因。为了提高系统的性能,我们通过利用额外使用GPT-4V标记的数据来微调我们的方法,旨在将GPT-4V对有害模因的理解能力提炼到我们的系统中。我们的框架在由AI Singapore主办的在线安全奖挑战的公共排行榜上获得了前1名,AUROC为0.7749,准确度为0.7087,明显领先于其他团队。值得注意的是,我们的方法超越了先前的基准,FLAVA实现了0.5695的AUROC,VisualBERT实现了0.5561的AUROC。

更新时间: 2024-06-14 07:28:02

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2406.09779v1

Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing

Large language models (LLMs) are increasingly being adopted in a wide range of real-world applications. Despite their impressive performance, recent studies have shown that LLMs are vulnerable to deliberately crafted adversarial prompts even when aligned via Reinforcement Learning from Human Feedback or supervised fine-tuning. While existing defense methods focus on either detecting harmful prompts or reducing the likelihood of harmful responses through various means, defending LLMs against jailbreak attacks based on the inner mechanisms of LLMs remains largely unexplored. In this work, we investigate how LLMs respond to harmful prompts and propose a novel defense method termed Layer-specific Editing (LED) to enhance the resilience of LLMs against jailbreak attacks. Through LED, we reveal that several critical safety layers exist among the early layers of LLMs. We then show that realigning these safety layers (and some selected additional layers) with the decoded safe response from selected target layers can significantly improve the alignment of LLMs against jailbreak attacks. Extensive experiments across various LLMs (e.g., Llama2, Mistral) show the effectiveness of LED, which effectively defends against jailbreak attacks while maintaining performance on benign prompts. Our code is available at https://github.com/ledllm/ledllm.

Updated: 2024-06-14 07:27:26

标题: 通过层特定编辑保护大型语言模型抵御越狱攻击

摘要: 大型语言模型(LLMs)越来越多地被采用在各种实际应用中。尽管它们表现出色,最近的研究表明,LLMs容易受到故意制作的对抗性提示的攻击,即使通过人类反馈的强化学习或监督微调进行了对齐。虽然现有的防御方法集中在检测有害提示或通过各种手段减少有害响应的可能性,但基于LLMs内部机制的越狱攻击的防御方法仍然未被充分探讨。在这项工作中,我们研究了LLMs如何响应有害提示,并提出了一种新颖的防御方法,称为Layer-specific Editing(LED),以增强LLMs对越狱攻击的抵抗力。通过LED,我们发现在LLMs的早期层中存在若干关键的“安全层”。然后,我们展示了重新对齐这些安全层(以及一些选择的额外层)与从选择的目标层解码的安全响应可以显著提高LLMs对越狱攻击的对齐性。在各种LLMs上进行的大量实验(例如Llama2、Mistral)展示了LED的有效性,有效地防御了越狱攻击,同时在良性提示上保持了性能。我们的代码可在 https://github.com/ledllm/ledllm 获取。

更新时间: 2024-06-14 07:27:26

领域: cs.AI

下载: http://arxiv.org/abs/2405.18166v2

Faster Convergence on Heterogeneous Federated Edge Learning: An Adaptive Sidelink-Assisted Data Multicasting Approach

Federated Edge Learning (FEEL) emerges as a pioneering distributed machine learning paradigm for the 6G Hyper-Connectivity, harnessing data from the Internet of Things (IoT) devices while upholding data privacy. However, current FEEL algorithms struggle with non-independent and non-identically distributed (non-IID) data, leading to elevated communication costs and compromised model accuracy. To address these statistical imbalances within FEEL, we introduce a clustered data sharing framework, mitigating data heterogeneity by selectively sharing partial data from cluster heads to trusted associates through sidelink-aided multicasting. The collective communication pattern is integral to FEEL training, where both cluster formation and the efficiency of communication and computation impact training latency and accuracy simultaneously. To tackle the strictly coupled data sharing and resource optimization, we decompose the overall optimization problem into the clients clustering and effective data sharing subproblems. Specifically, a distribution-based adaptive clustering algorithm (DACA) is devised basing on three deductive cluster forming conditions, which ensures the maximum sharing yield. Meanwhile, we design a stochastic optimization based joint computed frequency and shared data volume optimization (JFVO) algorithm, determining the optimal resource allocation with an uncertain objective function. The experiments show that the proposed framework facilitates FEEL on non-IID datasets with faster convergence rate and higher model accuracy in a limited communication environment.

Updated: 2024-06-14 07:22:39

标题: 异构联邦边缘学习上的更快收敛:一种自适应侧链辅助数据多播方法

摘要: 联邦边缘学习(FEEL)作为6G超连接的开创性分布式机器学习范例出现,利用物联网设备的数据并维护数据隐私。然而,当前的FEEL算法在处理非独立和非同分布(non-IID)数据时遇到困难,导致通信成本增加和模型精度下降。为了解决FEEL中的统计不平衡问题,我们引入了一个聚类数据共享框架,通过侧链辅助多播将部分数据从簇首选择性地共享给可信的合作伙伴,从而减轻数据异质性。集体通信模式对于FEEL训练至关重要,其中簇形成和通信计算效率同时影响训练延迟和准确性。为了解决严格耦合的数据共享和资源优化问题,我们将整体优化问题分解为客户端聚类和有效数据共享子问题。具体来说,基于三个演绎的簇形成条件设计了一个基于分布的自适应聚类算法(DACA),确保最大的共享产量。同时,我们设计了一种基于随机优化的联合计算频率和共享数据量优化(JFVO)算法,确定了具有不确定目标函数的最优资源分配。实验证明,提出的框架在有限通信环境中加快了非IID数据集上FEEL的收敛速度,并提高了模型精度。

更新时间: 2024-06-14 07:22:39

领域: cs.LG

下载: http://arxiv.org/abs/2406.09776v1

FusionBench: A Comprehensive Benchmark of Deep Model Fusion

Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single model in a cost-effective and data-efficient manner. This enables the unified model to take advantage of the original models' strengths, potentially exceeding their performance. Although a variety of deep model fusion techniques have been introduced, their evaluations tend to be inconsistent and often inadequate to validate their effectiveness and robustness against distribution shifts. To address this issue, we introduce FusionBench, which is the first comprehensive benchmark dedicated to deep model fusion. FusionBench covers a wide range of tasks, including open-vocabulary image classification, text classification, and text-to-text generation. Each category includes up to eight tasks with corresponding task-specific models, featuring both full fine-tuning and LoRA fine-tuning, as well as models of different sizes, to ensure fair and balanced comparisons of various multi-task model fusion techniques across different tasks, model scales, and fine-tuning strategies. We implement and evaluate a broad spectrum of deep model fusion techniques. These techniques range from model ensemble methods, which combine the predictions to improve the overall performance, to model merging, which integrates different models into a single one, and model mixing methods, which upscale or recombine the components of the original models. FusionBench now contains 26 distinct tasks, 74 fine-tuned models, and 16 fusion techniques, and we are committed to consistently expanding the benchmark with more tasks, models, and fusion techniques. In addition, we offer a well-documented set of resources and guidelines to aid researchers in understanding and replicating the benchmark results. Homepage https://github.com/tanganke/fusion_bench
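
As a flavor of the merging family the benchmark evaluates, the snippet below implements the simplest instance, uniform weight averaging of models fine-tuned from a shared initialization (often called a "model soup"). This is a generic illustration, not FusionBench's API; the Linear modules merely stand in for fine-tuned models.

    import torch

    def average_state_dicts(state_dicts):
        # Parameter-wise mean across models with identical architectures.
        merged = {}
        for key in state_dicts[0]:
            merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(0)
        return merged

    models = [torch.nn.Linear(4, 2) for _ in range(3)]   # stand-ins for fine-tuned models
    merged_model = torch.nn.Linear(4, 2)
    merged_model.load_state_dict(average_state_dicts([m.state_dict() for m in models]))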

Updated: 2024-06-14 07:19:51

标题: 融合基准:一个深度模型融合的综合基准

摘要: 深度模型融合是一种新兴的技术,它以经济高效和数据高效的方式将多个深度神经网络的预测或参数统一到一个单一模型中。这使得统一模型能够利用原始模型的优势,潜在地超越它们的性能。尽管引入了各种深度模型融合技术,但它们的评估往往不一致,往往不足以验证其对分布转移的有效性和鲁棒性。为了解决这个问题,我们介绍了FusionBench,这是专门为深度模型融合设计的第一个全面基准。FusionBench涵盖了广泛的任务,包括开放词汇图像分类、文本分类和文本生成。每个类别包括多达八个任务,具有相应的任务特定模型,包括完全微调和LoRA微调,以及不同大小的模型,以确保在不同任务、模型规模和微调策略之间进行公平和平衡的比较各种多任务模型融合技术。我们实现并评估了广泛的深度模型融合技术。这些技术范围从模型集成方法,将预测组合以提高整体性能,到模型合并,将不同模型整合成一个模型,以及模型混合方法,将原始模型的组件进行升级或重组。FusionBench现在包含26个独特任务、74个微调模型和16种融合技术,我们致力于不断扩展基准,增加更多任务、模型和融合技术。此外,我们提供了一套完整的资源和指南,帮助研究人员理解和复制基准结果。主页https://github.com/tanganke/fusion_bench

更新时间: 2024-06-14 07:19:51

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.03280v3

Research on Edge Detection of LiDAR Images Based on Artificial Intelligence Technology

With the widespread application of Light Detection and Ranging (LiDAR) technology in fields such as autonomous driving, robot navigation, and terrain mapping, the importance of edge detection in LiDAR images has become increasingly prominent. Traditional edge detection methods often face challenges in accuracy and computational complexity when processing LiDAR images. To address these issues, this study proposes an edge detection method for LiDAR images based on artificial intelligence technology. This paper first reviews the current state of research on LiDAR technology and image edge detection, introducing common edge detection algorithms and their applications in LiDAR image processing. Subsequently, a deep learning-based edge detection model is designed and implemented, optimizing the model training process through preprocessing and enhancement of the LiDAR image dataset. Experimental results indicate that the proposed method outperforms traditional methods in terms of detection accuracy and computational efficiency, showing significant practical application value. Finally, improvement strategies are proposed for the current method's shortcomings, and the improvements are validated through experiments.

Updated: 2024-06-14 07:18:54

标题: 基于人工智能技术的LiDAR图像边缘检测研究

摘要: 随着光探测与测距(LiDAR)技术在自动驾驶、机器人导航和地形绘制等领域的广泛应用,LiDAR图像中边缘检测的重要性日益凸显。传统的边缘检测方法在处理LiDAR图像时往往面临准确性和计算复杂性方面的挑战。为解决这些问题,本研究提出了一种基于人工智能技术的LiDAR图像边缘检测方法。本文首先回顾了当前关于LiDAR技术和图像边缘检测的研究现状,介绍了常见的边缘检测算法及其在LiDAR图像处理中的应用。随后,设计并实现了基于深度学习的边缘检测模型,通过对LiDAR图像数据集的预处理和增强优化模型训练过程。实验结果表明,所提出的方法在检测准确性和计算效率方面优于传统方法,具有显著的实际应用价值。最后,针对当前方法的不足提出改进策略,并通过实验证实改进效果。

更新时间: 2024-06-14 07:18:54

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.09773v1

Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost of training and evaluating models. Efficient Pareto front approximation of large models enables multi-objective optimization for various tasks such as multi-task learning and trade-off analysis. Existing algorithms for learning the Pareto set fall short: (1) evolutionary, hypernetwork, and hypervolume-maximization methods are computationally expensive and scale poorly to large models; (2) scalarization algorithms, where a separate model is trained for each objective ray, are inefficient for learning the entire Pareto set and fail to capture the objective trade-offs effectively. Inspired by the recent success of model merging, we propose a practical and scalable approach to the Pareto set learning problem via mixture of experts (MoE) based model fusion. By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives and closely approximate the entire Pareto set of large neural networks. Once the routers are learned and a preference vector is set, the MoE module can be unloaded, so no additional computational cost is introduced during inference. We conduct extensive experiments on vision and language tasks using large-scale models such as CLIP-ViT and GPT-2. The experimental results demonstrate that our method efficiently approximates the entire Pareto front of large models. Using only hundreds of trainable parameters in the MoE routers, our method even has lower memory usage compared to linear scalarization and algorithms that learn a single Pareto optimal solution, and is scalable to both the number of objectives and the size of the model.
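
The core weight-space operation can be sketched directly: mix the parameters of objective-specific experts with a preference vector. In the paper the learned MoE routers produce the mixture; here the coefficients are supplied by hand, an assumption made for brevity, and the Linear modules stand in for fine-tuned experts.

    import torch

    def mix_experts(expert_state_dicts, preference):
        # Preference-weighted parameter mixture across experts.
        preference = torch.tensor(preference) / sum(preference)   # normalize
        mixed = {}
        for key in expert_state_dicts[0]:
            stacked = torch.stack([sd[key].float() for sd in expert_state_dicts])
            weights = preference.view(-1, *[1] * (stacked.dim() - 1))
            mixed[key] = (weights * stacked).sum(0)
        return mixed

    experts = [torch.nn.Linear(8, 3).state_dict() for _ in range(2)]  # two objectives
    pareto_point = torch.nn.Linear(8, 3)
    pareto_point.load_state_dict(mix_experts(experts, preference=[0.7, 0.3]))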

Updated: 2024-06-14 07:16:18

标题: 走向高效帕累托集近似:基于专家混合模型融合

摘要: 解决大型深度神经网络的多目标优化问题是一项具有挑战性的任务,原因是损失景观的复杂性以及训练和评估模型的昂贵计算成本。对大型模型的有效帕累托前沿逼近使得多目标优化能够用于诸如多任务学习和权衡分析等各种任务。现有的学习帕累托集的算法,包括(1)进化、超网络和超体积最大化方法,计算成本高且对大型模型的可扩展性受限;(2)标量化算法,其中为每个目标射线训练一个单独的模型,这对于学习整个帕累托集是低效的,且未能有效捕捉目标权衡。受模型融合最近取得的成功启发,我们提出了一种实用和可扩展的帕累托集学习问题解决方法,通过基于专家混合(MoE)的模型融合。通过组合专业单任务模型的权重,MoE模块能够有效捕捉多个目标之间的权衡,并且密切逼近大型神经网络的整个帕累托集。一旦路由器学习完成并设置了偏好向量,MoE模块可以卸载,因此在推断过程中不会引入额外的计算成本。我们在视觉和语言任务上使用大型模型,如CLIP-ViT和GPT-2进行了大量实验。实验结果表明,我们的方法能够高效地逼近大型模型的整个帕累托前沿。仅使用MoE路由器的可训练参数,我们的方法甚至比线性标量化和学习单个帕累托最优解的算法具有更低的内存使用,且可扩展到目标数量和模型大小。

更新时间: 2024-06-14 07:16:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.09770v1

Bayesian Conditioned Diffusion Models for Inverse Problems

Diffusion models have recently been shown to excel in many image reconstruction tasks that involve inverse problems based on a forward measurement operator. A common framework uses task-agnostic unconditional models that are later post-conditioned for reconstruction, an approach that typically suffers from suboptimal task performance. While task-specific conditional models have also been proposed, current methods heuristically inject measured data as a naive input channel that elicits sampling inaccuracies. Here, we address the optimal conditioning of diffusion models for solving challenging inverse problems that arise during image reconstruction. Specifically, we propose a novel Bayesian conditioning technique for diffusion models, BCDM, based on score-functions associated with the conditional distribution of desired images given measured data. We rigorously derive the theory to express and train the conditional score-function. Finally, we show state-of-the-art performance in image dealiasing, deblurring, super-resolution, and inpainting with the proposed technique.
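
For context, the quantity such a model must capture is the conditional score, which Bayes' rule splits into a prior term and a likelihood term:

    \nabla_{x_t} \log p(x_t \mid y) = \nabla_{x_t} \log p(x_t) + \nabla_{x_t} \log p(y \mid x_t)

Post-conditioned approaches approximate the second term only at sampling time, whereas BCDM trains a network to model the left-hand side directly. The identity above is the standard decomposition underlying conditional diffusion, not necessarily the paper's exact training objective.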

Updated: 2024-06-14 07:13:03

标题: 贝叶斯条件扩散模型用于逆问题

摘要: 最近显示,扩散模型在许多涉及基于前向测量算子的逆问题的图像重建任务中表现出色。一个常见的框架使用任务不可知的无条件模型,后续再进行后验条件化重建,这种方法通常会导致次优的任务性能。虽然也提出了特定任务的条件模型,但当前的方法启发式地将测量数据注入作为天真的输入通道,导致采样不准确。在这里,我们解决了扩散模型的最佳条件化问题,用于解决图像重建过程中出现的具有挑战性的逆问题。具体来说,我们提出了一种基于条件分布的得到的图像给定测量数据的分数函数相关的贝叶斯条件化技术,称为BCDM。我们严格推导了表达和训练条件分数函数的理论。最后,我们展示了所提出技术在图像去混叠、去模糊、超分辨率和修补方面的最新性能。

更新时间: 2024-06-14 07:13:03

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.09768v1

Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning

The trustworthy machine learning (ML) community is increasingly recognizing the crucial need for models capable of selectively 'unlearning' data points after training. This leads to the problem of machine unlearning (MU), aiming to eliminate the influence of chosen data points on model performance, while still maintaining the model's utility post-unlearning. Despite various MU methods for data influence erasure, evaluations have largely focused on random data forgetting, ignoring the vital inquiry into which subset should be chosen to truly gauge the authenticity of unlearning performance. To tackle this issue, we introduce a new evaluative angle for MU from an adversarial viewpoint. We propose identifying the data subset that presents the most significant challenge for influence erasure, i.e., pinpointing the worst-case forget set. Utilizing a bi-level optimization principle, we amplify unlearning challenges at the upper optimization level to emulate worst-case scenarios, while simultaneously engaging in standard training and unlearning at the lower level, achieving a balance between data influence erasure and model utility. Our proposal offers a worst-case evaluation of MU's resilience and effectiveness. Through extensive experiments across different datasets (including CIFAR-10, 100, CelebA, Tiny ImageNet, and ImageNet) and models (including both image classifiers and generative models), we expose critical pros and cons in existing (approximate) unlearning strategies. Our results illuminate the complex challenges of MU in practice, guiding the future development of more accurate and robust unlearning algorithms. The code is available at https://github.com/OPTML-Group/Unlearn-WorstCase.

Updated: 2024-06-14 07:03:36

标题: 挑战忘却:揭示机器遗忘中的最坏情况遗忘集

摘要: 值得信赖的机器学习(ML)社区越来越认识到,在训练后能够有选择性地“遗忘”数据点的模型至关重要。这导致了机器遗忘(MU)的问题,旨在消除选择的数据点对模型性能的影响,同时仍保持模型的实用性。尽管存在各种用于数据影响消除的MU方法,但评估主要集中在随机数据遗忘上,忽视了对应该选择哪个子集来真正衡量遗忘性能的重要探讨。为了解决这个问题,我们从对抗的角度引入了一个新的评估角度来评估MU。我们提出确定呈现最重要挑战的数据子集,即确定最坏情况遗忘集。利用双层优化原则,在上层优化层放大遗忘挑战,模拟最坏情况,同时在下层进行标准训练和遗忘,实现数据影响消除和模型实用性之间的平衡。我们的提议提供了MU韧性和有效性的最坏情况评估。通过在不同数据集(包括CIFAR-10、100、CelebA、Tiny ImageNet和ImageNet)和模型(包括图像分类器和生成模型)上进行广泛实验,我们揭示了现有(近似)遗忘策略的关键优缺点。我们的结果阐明了MU在实践中的复杂挑战,指导未来开发更准确和健壮的遗忘算法。代码可在https://github.com/OPTML-Group/Unlearn-WorstCase找到。

更新时间: 2024-06-14 07:03:36

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2403.07362v2

Harmonics of Learning: Universal Fourier Features Emerge in Invariant Networks

In this work, we formally prove that, under certain conditions, if a neural network is invariant to a finite group then its weights recover the Fourier transform on that group. This provides a mathematical explanation for the emergence of Fourier features -- a ubiquitous phenomenon in both biological and artificial learning systems. The results hold even for non-commutative groups, in which case the Fourier transform encodes all the irreducible unitary group representations. Our findings have consequences for the problem of symmetry discovery. Specifically, we demonstrate that the algebraic structure of an unknown group can be recovered from the weights of a network that is at least approximately invariant within certain bounds. Overall, this work contributes to a foundation for an algebraic learning theory of invariant neural network representations.
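
The commutative case can be checked numerically in a few lines: a linear map that commutes with all cyclic shifts (i.e., respects the symmetry group Z/n) is circulant, and the DFT (the Fourier transform on Z/n) diagonalizes it. This toy check illustrates the theorem's simplest instance, not its non-commutative generality.

    import numpy as np

    n = 8
    c = np.random.rand(n)
    C = np.array([np.roll(c, k) for k in range(n)]).T        # circulant map
    shift = np.roll(np.eye(n), 1, axis=0)                    # generator of Z/n
    assert np.allclose(shift @ C, C @ shift)                 # commutes with shifts

    F = np.fft.fft(np.eye(n))                                # DFT matrix
    D = np.linalg.inv(F) @ C @ F                             # should be diagonal
    off_diag = D - np.diag(np.diag(D))
    print(np.max(np.abs(off_diag)))                          # ~1e-14: diagonalized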

Updated: 2024-06-14 07:03:08

标题: 学习的谐波:不变网络中出现的通用傅立叶特征

摘要: 在这项工作中,我们正式证明,在某些条件下,如果一个神经网络对一个有限群具有不变性,那么它的权重将恢复该群上的傅立叶变换。这为傅立叶特征的出现提供了一个数学解释 - 这是生物学和人工学习系统中普遍存在的现象。即使对于非交换群,这些结果也成立,在这种情况下,傅立叶变换编码了所有不可约酉群表示。我们的研究结果对于对称性发现问题具有重要意义。具体来说,我们证明了从至少在某些范围内大致不变的网络的权重中可以恢复未知群的代数结构。总的来说,这项工作为不变神经网络表示的代数学习理论奠定了基础。

更新时间: 2024-06-14 07:03:08

领域: cs.LG,cs.AI,eess.SP

下载: http://arxiv.org/abs/2312.08550v3

Towards Full Integration of Artificial Intelligence in Colon Capsule Endoscopy's Pathway

Despite a recent surge of interest in deploying colon capsule endoscopy (CCE) for early diagnosis of colorectal diseases, there remains a large gap between the current state of CCE in clinical practice and the state of its counterpart, optical colonoscopy (OC). Our study is aimed at closing this gap by focusing on the full integration of AI in CCE's pathway, where image processing steps linked to the detection, localization and characterisation of important findings are carried out autonomously using various AI algorithms. We developed a recognition network that, with an impressive sensitivity of 99.9%, a specificity of 99.4%, and a negative predictive value (NPV) of 99.8%, detected colorectal polyps. After recognising a polyp within a sequence of images, only those images containing polyps were fed into two parallel independent networks for characterisation and size estimation of those important findings. The characterisation network reached a sensitivity of 82% and a specificity of 80% in classifying polyps into two groups, namely neoplastic vs. non-neoplastic. The size estimation network reached an accuracy of 88% in correctly segmenting the polyps. By automatically incorporating this crucial information into CCE's pathway, we moved a step closer towards the full integration of AI in CCE's routine clinical practice.

Updated: 2024-06-14 06:59:37

标题: 朝向在结肠胶囊内镜检查路径中全面整合人工智能

摘要: 尽管最近对将结肠胶囊内镜(CCE)用于早期诊断结肠疾病的兴趣激增,但目前CCE在临床实践中的状态与其同类的光学结肠镜(OC)之间仍存在巨大差距。我们的研究旨在通过全面整合人工智能(AI)到CCE的路径中来弥合这一差距,通过使用各种AI算法,自主地进行与重要发现的检测、定位和表征相关的图像处理步骤。我们开发了一个识别网络,其敏感性达到99.9%,特异性达到99.4%,阴性预测值(NPV)达到99.8%,可以检测结肠息肉。在识别图像序列中的息肉后,只有包含息肉的图像被输入到两个独立的平行网络中用于表征和估计这些重要发现的大小。表征网络在将息肉分类为两组(恶性 vs. 非恶性)方面达到82%的敏感性和80%的特异性。大小估计网络在正确分割息肉方面达到88%的准确率。通过自动将这些关键信息纳入CCE的路径中,我们朝着在CCE的日常临床实践中全面整合AI迈出了一步。

更新时间: 2024-06-14 06:59:37

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.09761v1

Bootstrapping Language Models with DPO Implicit Rewards

Human alignment in large language models (LLMs) is an active area of research. A recent groundbreaking work, direct preference optimization (DPO), has greatly simplified the process from past work in reinforcement learning from human feedback (RLHF) by bypassing the reward learning stage in RLHF. DPO, after training, provides an implicit reward model. In this work, we make a novel observation that this implicit reward model can by itself be used in a bootstrapping fashion to further align the LLM. Our approach is to use the rewards from a current LLM model to construct a preference dataset, which is then used in subsequent DPO rounds. We incorporate refinements that debias the length of the responses and improve the quality of the preference dataset to further improve our approach. Our approach, named self-alignment with DPO ImpliCit rEwards (DICE), shows great improvements in alignment and achieves superior performance than Gemini Pro on AlpacaEval 2, reaching 27.55% length-controlled win rate against GPT-4 Turbo, but with only 8B parameters and no external feedback. Our code is available at https://github.com/sail-sg/dice.
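
The bootstrapping hinges on the DPO implicit reward, which, up to a prompt-dependent constant, is r(x, y) = beta * (log pi_theta(y|x) - log pi_ref(y|x)). The helper below assumes per-token response log-probs have already been gathered; shapes and names are illustrative, not from the paper's code.

    import torch

    def implicit_reward(logp_theta, logp_ref, beta=0.1):
        # logp_*: (batch, seq_len) token log-probs of the response under each model.
        return beta * (logp_theta.sum(-1) - logp_ref.sum(-1))

    logp_theta = torch.log(torch.rand(4, 16))   # toy stand-ins for real log-probs
    logp_ref = torch.log(torch.rand(4, 16))
    rewards = implicit_reward(logp_theta, logp_ref)
    # Rank candidate responses per prompt by these rewards to build the next
    # round's preference pairs (chosen = argmax, rejected = argmin), then run
    # DPO again on that dataset.
    print(rewards)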

Updated: 2024-06-14 06:57:18

标题: 使用DPO隐式奖励对语言模型进行引导

摘要: 大型语言模型(LLMs)中的人类对齐是一个活跃的研究领域。最近一项开创性的工作,直接偏好优化(DPO),通过绕过强化学习中的奖励学习阶段大大简化了过去工作中来自人类反馈的强化学习(RLHF)。DPO在训练后提供了一个隐式奖励模型。在这项工作中,我们做出了一个新颖的观察,即这个隐式奖励模型本身可以被用来以引导方式进一步对齐LLM。我们的方法是利用当前LLM模型的奖励来构建一个偏好数据集,然后在随后的DPO轮次中使用。我们加入了去偏见响应长度和提高偏好数据集质量的改进措施,以进一步改进我们的方法。我们的方法,命名为DICE自对齐与DPO隐式奖励(DICE),在对齐方面取得了巨大进展,并在AlpacaEval 2上取得了比Gemini Pro更优异的表现,以27.55%的长度控制胜率击败了GPT-4 Turbo,但仅具有8B参数且没有外部反馈。我们的代码可在https://github.com/sail-sg/dice 找到。

更新时间: 2024-06-14 06:57:18

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.09760v1

Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages

Verification-aware programming languages such as Dafny and F* provide means to formally specify and prove properties of programs. Although the problem of checking an implementation against a specification can be defined mechanically, there is no algorithmic way of ensuring the correctness of the user-intent formalization for programs -- that a specification adheres to the user's intent behind the program. The intent or requirement is expressed informally in natural language and the specification is a formal artefact. The advent of large language models (LLMs) has recently made strides in bridging the gap between informal intent and formal program implementations, driven in large part by benchmarks and automated metrics for evaluation. Recent work has proposed evaluating the user-intent formalization problem for mainstream programming languages [endres-fse24]. However, such an approach does not readily extend to verification-aware languages that support rich specifications (containing quantifiers and ghost variables) that cannot be evaluated through dynamic execution. Previous work also required generating program mutants using LLMs to create the benchmark. We advocate an alternate approach of symbolically testing specifications to provide an intuitive metric for evaluating the quality of specifications for verification-aware languages. We demonstrate that our automated metric agrees closely with a mostly GPT-4-generated and human-labeled dataset of roughly 150 Dafny specifications for the popular MBPP code-generation benchmark, yet reveals cases where the human labeling is not perfect. We believe our work provides a stepping stone toward establishing a benchmark and research agenda for the problem of user-intent formalization for programs.

Updated: 2024-06-14 06:52:08

标题: 评估LLM驱动的用户意图形式化在验证感知语言中的应用

摘要: 验证感知的编程语言,如Dafny和F*,提供了正式指定和证明程序属性的手段。虽然可以机械地定义检查实现是否符合规范的问题,但没有算法方式来确保用户意图的正确性形式化程序 - 即规范是否符合程序背后用户的意图。意图或要求以自然语言非正式表达,规范是一个正式的实物。大型语言模型(LLMs)的出现最近在弥合非正式意图和正式程序实现之间取得了进展,这在很大程度上是由于基准和自动化评估指标的推动。 最近的工作提出评估{\it 用户意图形式化}问题的主流编程语言~\cite{endres-fse24}。然而,这种方法不容易扩展到支持丰富规范(包含量词和幽灵变量)的验证感知语言,这些规范不能通过动态执行进行评估。先前的工作还需要使用LLMs生成程序突变体来创建基准。我们主张采用{\it 符号测试规范}的替代方法,为评估验证感知语言的规范质量提供直观的度量标准。我们证明,我们的自动化度量与大多数由GPT-4生成和人工标记的约150个Dafny规范的MBPP代码生成基准数据集密切一致,但也显示出人工标记不完美的情况。我们相信我们的工作为解决程序用户意图形式化问题建立基准和研究议程提供了一个垫脚石。

更新时间: 2024-06-14 06:52:08

领域: cs.PL,cs.LG,cs.SE

下载: http://arxiv.org/abs/2406.09757v1

Mix Q-learning for Lane Changing: A Collaborative Decision-Making Method in Multi-Agent Deep Reinforcement Learning

Lane-changing decisions, which are crucial for autonomous vehicle path planning, face practical challenges due to rule-based constraints and limited data. Deep reinforcement learning has become a major research focus due to its advantages in data acquisition and interpretability. However, current models often overlook collaboration, which not only impacts overall traffic efficiency but also hinders the vehicle's own normal driving in the long run. To address this issue, this paper proposes a method named Mix Q-learning for Lane Changing (MQLC) that integrates a hybrid-value Q network, taking into account both collective and individual benefits. At the collective level, our method coordinates the individual Q and global Q networks by utilizing global information, enabling agents to effectively balance their individual interests with the collective benefit. At the individual level, we integrate a deep learning-based intent recognition module into the observation and enhance the decision network. These changes provide agents with richer decision information and more accurate feature extraction for improved lane-changing decisions. This strategy enables the multi-agent system to learn and formulate optimal decision-making strategies effectively. Through extensive experiments, our MQLC model impressively outperforms other state-of-the-art multi-agent decision-making methods, achieving significantly safer and faster lane-changing decisions.
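
The abstract does not spell out the network wiring, but one plausible reading of a hybrid-value Q network is sketched below: per-agent Q-values and a global Q signal are combined with non-negative mixing weights derived from global information. This is a loose, QMIX-flavored illustration, not the authors' exact architecture:

import torch
import torch.nn as nn

class MixQ(nn.Module):
    """Combines per-agent Q-values with a global Q signal computed from
    shared global information, so each agent trades off its own return
    against the collective one."""
    def __init__(self, obs_dim, global_dim, n_actions, hidden=64):
        super().__init__()
        self.q_individual = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))
        self.q_global = nn.Sequential(
            nn.Linear(global_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))
        # non-negative mixing weights conditioned on the global state
        self.mix = nn.Sequential(nn.Linear(global_dim, 2), nn.Softplus())

    def forward(self, obs, global_state):
        q_i = self.q_individual(obs)          # (batch, n_actions)
        q_g = self.q_global(global_state)     # (batch, n_actions)
        w = self.mix(global_state)            # (batch, 2), w >= 0
        return w[:, :1] * q_i + w[:, 1:] * q_g

q = MixQ(obs_dim=12, global_dim=20, n_actions=5)
print(q(torch.randn(4, 12), torch.randn(4, 20)).shape)   # torch.Size([4, 5])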

Updated: 2024-06-14 06:44:19

标题: 混合Q学习用于车道变更:多智体深度强化学习中的协作决策方法

摘要: 车道变换决策对于自动驾驶车辆路径规划至关重要,但由于基于规则的约束和有限数据,面临实际挑战。深度强化学习由于其在数据获取和可解释性方面的优势,已成为主要研究重点。然而,当前模型通常忽视协作,这不仅影响整体交通效率,而且从长远来看也会阻碍车辆自身的正常驾驶。 为解决上述问题,本文提出了一种名为Mix Q-learning for Lane Changing(MQLC)的方法,该方法整合了混合值Q网络,考虑了整体和个体利益的平衡。在整体层面上,我们的方法通过利用全局信息协调个体Q和全局Q网络。这使得代理能够有效地平衡其个人利益和整体利益。在个体层面上,我们将基于深度学习的意图识别模块整合到我们的观察中,并增强了决策网络。这些改变为代理提供了更丰富的决策信息和更准确的特征提取,以改进车道变换决策。这种策略使多代理系统能够有效地学习和制定最佳决策策略。通过广泛的实验结果,我们的MQLC模型令人印象深刻地优于其他最先进的多代理决策方法,实现了显著更安全和更快的车道变换决策。

更新时间: 2024-06-14 06:44:19

领域: cs.AI,cs.RO

下载: http://arxiv.org/abs/2406.09755v1

AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis

Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given a mono audio emitted by a sound source in a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing binaural audio. However, in addition to low efficiency originating from heavy NeRF rendering, these methods all have a limited ability to characterize the entire scene environment, such as room geometry, material properties, and the spatial relation between the listener and the sound source. To address these issues, we propose a novel Audio-Visual Gaussian Splatting (AV-GS) model. To obtain a material-aware and geometry-aware condition for audio synthesis, we learn an explicit point-based scene representation with an audio-guidance parameter on locally initialized Gaussian points, taking into account the spatial relation between the listener and the sound source. To make the visual scene model audio-adaptive, we propose a point densification and pruning strategy that optimally distributes the Gaussian points according to each point's contribution to sound propagation (e.g., texture-less wall surfaces need more points because they affect sound-path diversion). Extensive experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.

Updated: 2024-06-14 06:38:50

标题: AV-GS:学习材料和几何感知先验知识用于新视角声学合成

摘要: 新颖的视角声学合成(NVAS)旨在在任何目标视点呈现双耳音频,给定一个在3D场景中发出的单声源音频。现有方法已经提出了基于NeRF的隐式模型,以利用视觉线索作为合成双耳音频的条件。然而,除了由于NeRF渲染过重而导致的低效率外,这些方法都具有对整个场景环境进行表征的能力有限,如房间几何形状、材料属性和听者与声源之间的空间关系。为了解决这些问题,我们提出了一种新颖的音频-视觉高斯点扩散(AV-GS)模型。为了获得一个具有材料感知和几何感知条件的音频合成,我们学习了一个显式的基于点的场景表示,通过在局部初始化的高斯点上加入音频引导参数,考虑听者和声源之间的空间关系。为了使视觉场景模型适应音频,我们提出了一种点密集化和修剪策略,以最佳地分布高斯点,在声音传播中每个点的贡献(例如,对于无纹理的墙面表面,需要更多点,因为它们会影响声音路径的偏离)。大量实验证实了我们的AV-GS在真实世界的RWAS和基于模拟的SoundSpaces数据集上优于现有的替代方案。

更新时间: 2024-06-14 06:38:50

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2406.08920v2

ControlVAR: Exploring Controllable Visual Autoregressive Modeling

Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs), especially in tasks like control-to-image generation. However, challenges such as expensive computational cost, high inference latency, and difficulties of integration with large language models (LLMs) have necessitated exploring alternatives to DMs. This paper introduces ControlVAR, a novel framework that explores pixel-level controls in visual autoregressive (VAR) modeling for flexible and efficient conditional generation. In contrast to traditional conditional models that learn the conditional distribution, ControlVAR jointly models the distribution of image and pixel-level conditions during training and imposes conditional controls during testing. To enhance the joint modeling, we adopt the next-scale AR prediction paradigm and unify control and image representations. A teacher-forcing guidance strategy is proposed to further facilitate controllable generation with joint modeling. Extensive experiments demonstrate the superior efficacy and flexibility of ControlVAR across various conditional generation tasks against popular conditional DMs, e.g., ControlNet and T2I-Adaptor.

Updated: 2024-06-14 06:35:33

标题: ControlVAR:探索可控的视觉自回归建模

摘要: 条件视觉生成在扩散模型(DMs)的出现下取得了显著进展,特别是在控制到图像生成等任务中。然而,诸如昂贵的计算成本、高推理延迟和与大型语言模型(LLMs)集成的困难等挑战已经促使人们探索DMs的替代方案。本文介绍了ControlVAR,这是一个探索在视觉自回归(VAR)建模中像素级控制以实现灵活和高效条件生成的新框架。与传统的学习条件分布的条件模型不同,ControlVAR在训练期间联合建模图像和像素级条件的分布,并在测试期间施加条件控制。为了增强联合建模,我们采用了下一尺度的AR预测范式,并统一了控制和图像表示。提出了一种教师引导策略,进一步促进了联合建模下可控生成的实现。大量实验证明了ControlVAR在各种条件生成任务中相对于流行的条件DMs(如ControlNet和T2I-Adaptor)具有卓越的功效和灵活性。

更新时间: 2024-06-14 06:35:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.09750v1

Self-Distilled Disentangled Learning for Counterfactual Prediction

The advancements in disentangled representation learning significantly enhance the accuracy of counterfactual predictions by granting precise control over instrumental variables, confounders, and adjustable variables. An appealing method for achieving the independent separation of these factors is mutual information minimization, a task that presents challenges in numerous machine learning scenarios, especially within high-dimensional spaces. To circumvent this challenge, we propose the Self-Distilled Disentanglement framework, referred to as $SD^2$. Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs for high-dimensional representations. Our comprehensive experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach in facilitating counterfactual inference in the presence of both observed and unobserved confounders.

Updated: 2024-06-14 06:30:22

标题: 自主提炼的解缠学习用于反事实预测

摘要: 解缠结表示学习的进展显著提高了反事实预测的准确性,通过精确控制工具变量、混杂变量和可调变量。实现这些因素的独立分离的一种吸引人的方法是互信息最小化,这是一个在许多机器学习场景中提出挑战的任务,特别是在高维空间中。为了避免这一挑战,我们提出了基于信息论的自蒸馏解缠结框架,称为$SD^2$。它确保了理论上健全的独立解缠结表示,而不需要为高维表示设计复杂的互信息估计器。我们在合成和真实世界数据集上进行的全面实验证实了我们的方法在存在观察到和未观察到的混杂变量时促进反事实推断的有效性。

更新时间: 2024-06-14 06:30:22

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.05855v2

How Does Distribution Matching Help Domain Generalization: An Information-theoretic Analysis

Domain generalization aims to learn invariance across multiple training domains, thereby enhancing generalization against out-of-distribution data. While gradient or representation matching algorithms have achieved remarkable success, these methods generally lack generalization guarantees or depend on strong assumptions, leaving a gap in understanding the underlying mechanism of distribution matching. In this work, we formulate domain generalization from a novel probabilistic perspective, ensuring robustness while avoiding overly conservative solutions. Through comprehensive information-theoretic analysis, we provide key insights into the roles of gradient and representation matching in promoting generalization. Our results reveal the complementary relationship between these two components, indicating that existing works focusing solely on either gradient or representation alignment are insufficient to solve the domain generalization problem. In light of these theoretical findings, we introduce IDM to simultaneously align the inter-domain gradients and representations. Integrated with the proposed PDM method for complex distribution matching, IDM achieves superior performance over various baseline methods.

Updated: 2024-06-14 06:28:17

标题: 分布匹配如何帮助领域泛化:信息论分析

摘要: 域泛化旨在学习跨多个训练领域的不变性,从而增强对于超出分布数据的泛化能力。虽然梯度或表示匹配算法取得了显著成功,但这些方法通常缺乏泛化保证或依赖于强假设,从而在理解分布匹配的基本机制方面存在差距。在这项工作中,我们从一种新颖的概率视角制定了域泛化,确保了鲁棒性同时避免了过度保守的解决方案。通过全面的信息论分析,我们提供了关于梯度和表示匹配在促进泛化中的作用的关键见解。我们的结果揭示了这两个组件之间的互补关系,表明现有研究仅专注于梯度或表示对齐的工作不足以解决域泛化问题。鉴于这些理论发现,我们引入了IDM来同时对齐跨域梯度和表示。结合提出的用于复杂分布匹配的PDM方法,IDM在各种基线方法上实现了卓越的性能。

更新时间: 2024-06-14 06:28:17

领域: cs.LG

下载: http://arxiv.org/abs/2406.09745v1

Improved Crop and Weed Detection with Diverse Data Ensemble Learning

Modern agriculture heavily relies on Site-Specific Farm Management practices, necessitating accurate detection, localization, and quantification of crops and weeds in the field, which can be achieved using deep learning techniques. In this regard, crop- and weed-specific binary segmentation models have shown promise. However, uncontrolled field conditions limit their performance from one field to another. To improve semantic model generalization, existing methods augment and synthesize agricultural data to account for uncontrolled field conditions. However, given highly varied field conditions, these methods have limitations. To overcome the challenge of model deterioration in such conditions, we propose utilizing data from other crops and weeds for our specific target problem. To achieve this, we propose a novel ensemble framework. Our approach involves utilizing different crop and weed models trained on diverse datasets and employing a teacher-student configuration. By using homogeneous stacking of base models and a trainable meta-architecture to combine their outputs, we achieve significant improvements for Canola crops and Kochia weeds on unseen test data, surpassing the performance of single semantic segmentation models. We identify the UNET meta-architecture as the most effective in this context. Finally, through ablation studies, we demonstrate and validate the effectiveness of our proposed model. We observe that including base models trained on other target crops and weeds helps generalize the model to capture varied field conditions. Lastly, we propose two novel datasets with varied conditions for comparisons.

Updated: 2024-06-14 06:26:48

标题: 多样数据集成学习提高作物和杂草检测效果

摘要: 现代农业在很大程度上依赖于特定场地的农场管理实践,这需要对田间作物和杂草进行准确检测、定位和定量化,可以利用深度学习技术实现。在这方面,作物和杂草特定的二元分割模型表现出了潜力。然而,不受控制的田间条件限制了它们在不同田地中的性能。为了改进语义模型的泛化能力,现有方法通过增强和合成农业数据来考虑不受控制的田间条件。然而,由于田间条件的高度变化,这些方法存在局限性。为了克服这些条件下模型退化的挑战,我们提出利用针对其他作物和杂草的数据来解决我们的具体目标问题。为了实现这一目标,我们提出了一种新颖的集成框架。我们的方法包括利用在不同数据集上训练的不同作物和杂草模型,并采用师生配置。通过使用基本模型的同质堆叠和可训练的元架构来结合它们的输出,我们在未见测试数据上取得了显著的Canola作物和Kochia杂草的改进,超过了单一语义分割模型的性能。我们确定UNET元架构在这种情况下最有效。最后,通过消融研究,我们展示并验证了我们提出的模型的有效性。我们观察到,包括在其他目标作物和杂草上训练的基本模型可以帮助泛化模型以捕捉多样化的田间条件。最后,我们提出了两个具有不同条件的新颖数据集以进行比较。

更新时间: 2024-06-14 06:26:48

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2310.01055v3

Pre-Training Identification of Graph Winning Tickets in Adaptive Spatial-Temporal Graph Neural Networks

In this paper, we present a novel method to significantly enhance the computational efficiency of Adaptive Spatial-Temporal Graph Neural Networks (ASTGNNs) by introducing the concept of the Graph Winning Ticket (GWT), derived from the Lottery Ticket Hypothesis (LTH). By adopting a pre-determined star topology as a GWT prior to training, we balance edge reduction with efficient information propagation, reducing computational demands while maintaining high model performance. Both the time and memory computational complexity of generating adaptive spatial-temporal graphs are significantly reduced from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$. Our approach streamlines ASTGNN deployment by eliminating the need for exhaustive training, pruning, and retraining cycles, and demonstrates empirically across various datasets that it is possible to achieve comparable performance to full models with substantially lower computational costs. Specifically, our approach enables training ASTGNNs on the largest-scale spatial-temporal dataset using a single A6000 equipped with 48 GB of memory, overcoming the out-of-memory issue encountered during original training and even achieving state-of-the-art performance. Furthermore, we delve into the effectiveness of the GWT from the perspective of spectral graph theory, providing substantial theoretical support. This advancement not only proves the existence of efficient sub-networks within ASTGNNs but also broadens the applicability of the LTH in resource-constrained settings, marking a significant step forward in the field of graph neural networks. Code is available at https://anonymous.4open.science/r/paper-1430.
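
A tiny sketch of why a fixed star topology brings propagation down to $\mathcal{O}(N)$: every node exchanges messages with a single hub instead of with a learned dense adjacency. The mean aggregation below is an illustrative choice, not necessarily the paper's operator:

import torch

def star_propagate(h, hub=0):
    """h: (N, d) node features on a star graph centered at `hub`.
    One round of message passing costs O(N), versus O(N^2) for a
    dense adaptive adjacency."""
    out = h.clone()
    out[hub] = out[hub] + h.mean(dim=0)   # hub aggregates all N nodes
    out = out + h[hub]                    # every node reads the hub's feature
    return out

h = torch.randn(1000, 16)
print(star_propagate(h).shape)            # torch.Size([1000, 16])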

Updated: 2024-06-14 06:25:36

标题: 自适应时空图神经网络中图获胜票的预训练识别

摘要: 本文介绍了一种新颖的方法,通过引入图中获胜的票(GWT)的概念,从而显著增强自适应空间-时间图神经网络(ASTGNNs)的计算效率,该概念源自于抽奖票假设(LTH)。通过在训练之前采用预先确定的星形拓扑作为GWT,我们在减少计算需求的同时保持高模型性能,平衡了边缘减少和有效信息传播。生成自适应空间-时间图的时间和内存计算复杂性从$\mathcal{O}(N^2)$显著减少到$\mathcal{O}(N)$。我们的方法通过消除需要详尽的训练、修剪和重新训练周期,简化了ASTGNN的部署,并在各种数据集上经验证明,可以以大大降低的计算成本实现与完整模型可比的性能。具体而言,我们的方法使得使用配备48 GB内存的单个A6000在最大规模的空间-时间数据集上训练ASTGNN成为可能,克服了原始训练中遇到的内存不足问题,并甚至实现了最先进的性能。此外,我们从谱图理论的角度探讨了GWT的有效性,提供了实质性的理论支持。这一进步不仅证明了ASTGNN中存在有效的子网络,还拓宽了LTH在资源受限环境中的适用范围,标志着图神经网络领域的一大进步。代码可在https://anonymous.4open.science/r/paper-1430获取。

更新时间: 2024-06-14 06:25:36

领域: cs.LG

下载: http://arxiv.org/abs/2406.08287v2

Deep Symbolic Optimization for Combinatorial Optimization: Accelerating Node Selection by Discovering Potential Heuristics

Combinatorial optimization (CO) is one of the most fundamental mathematical models in real-world applications. Traditional CO solvers, such as Branch-and-Bound (B&B) solvers, heavily rely on expert-designed heuristics, which are reliable but require substantial manual tuning. Recent studies have leveraged deep learning (DL) models as an alternative to capture rich feature patterns for improved performance on GPU machines. Nonetheless, the drawbacks of high training and inference costs, as well as limited interpretability, severely hinder the adoption of DL methods in real-world applications. To address these challenges, we propose a novel deep symbolic optimization learning framework that combines their advantages. Specifically, we focus on the node selection module within B&B solvers -- namely, deep symbolic optimization for node selection (Dso4NS). With data-driven approaches, Dso4NS guides the search for mathematical expressions within the high-dimensional discrete symbolic space and then incorporates the highest-performing mathematical expressions into a solver. The data-driven model captures the rich feature information in the input data and generates symbolic expressions, while the expressions deployed in solvers enable fast inference with high interpretability. Experiments demonstrate the effectiveness of Dso4NS in learning high-quality expressions, outperforming existing approaches on a CPU machine. Encouragingly, the learned CPU-based policies consistently achieve performance comparable to state-of-the-art GPU-based approaches.

Updated: 2024-06-14 06:02:14

标题: 深度符号优化用于组合优化:通过发现潜在启发式加速节点选择

摘要: 组合优化(CO)是现实世界应用中最基本的数学模型之一。传统的CO求解器,如分支定界(B&B)求解器,严重依赖专家设计的启发式方法,这些方法可靠但需要大量手动调整。最近的研究利用深度学习(DL)模型作为一种替代方法,以捕获丰富的特征模式,从而提高在GPU机器上的性能。然而,高训练和推理成本以及有限的可解释性的缺点严重阻碍了DL方法在现实世界应用中的采用。为了解决这些挑战,我们提出了一种结合了它们优势的新型深层符号优化学习框架。具体而言,我们关注B&B求解器内的节点选择模块,即节点选择的深层符号优化(Dso4NS)。通过数据驱动方法,Dso4NS引导在高维离散符号空间内搜索数学表达式,然后将表现最好的数学表达式整合到求解器中。数据驱动模型捕获输入数据中的丰富特征信息并生成符号表达式,而在求解器中部署的表达式实现了高速推理和高可解释性。实验证明了Dso4NS在学习高质量表达式方面的有效性,在CPU机器上优于现有方法。令人鼓舞的是,所学的基于CPU的策略始终达到了与最先进的基于GPU方法相媲美的性能。

更新时间: 2024-06-14 06:02:14

领域: cs.LG

下载: http://arxiv.org/abs/2406.09740v1

Self-Play Preference Optimization for Language Model Alignment

Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language model alignment. In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy. Our approach, dubbed Self-Play Preference Optimization (SPPO), approximates the Nash equilibrium through iterative policy updates and enjoys a theoretical convergence guarantee. Our method can effectively increase the log-likelihood of the chosen response and decrease that of the rejected response, which cannot be trivially achieved by symmetric pairwise losses such as Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO). In our experiments, using only 60k prompts (without responses) from the UltraFeedback dataset and without any prompt augmentation, by leveraging a pre-trained preference model PairRM with only 0.4B parameters, SPPO can obtain a model from fine-tuning Mistral-7B-Instruct-v0.2 that achieves the state-of-the-art length-controlled win-rate of 28.53% against GPT-4-Turbo on AlpacaEval 2.0. It also outperforms the (iterative) DPO and IPO on MT-Bench and the Open LLM Leaderboard. Starting from a stronger base model Llama-3-8B-Instruct, we are able to achieve a length-controlled win rate of 38.77%. Notably, the strong performance of SPPO is achieved without additional external supervision (e.g., responses, preferences, etc.) from GPT-4 or other stronger language models. Codes are available at https://github.com/uclaml/SPPO.
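
The iterative update can be sketched as a squared-error regression of the policy log-ratio onto the estimated win probability, a simplified form of the SPPO objective in which the log-normalizer is folded into the eta*(1/2) term; phat would come from an external preference model such as PairRM:

import torch

def sppo_loss(logp_theta, logp_t, phat, eta=1.0):
    """logp_theta: log pi_theta(y|x) for sampled responses, shape (B,)
    logp_t:     log pi_t(y|x) under the frozen current-round policy
    phat:       estimated prob. that y beats a fresh sample from pi_t
    eta:        step-size hyperparameter (the default here is arbitrary).
    Regresses the log-ratio toward eta * (phat - 1/2): responses with
    win rate above 1/2 gain probability mass, the rest lose it."""
    target = eta * (phat - 0.5)
    return ((logp_theta - logp_t) - target).pow(2).mean()

Because each response is scored against the current policy rather than paired with a sibling response, the loss can push chosen and rejected likelihoods in opposite directions, which symmetric pairwise losses cannot guarantee.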

Updated: 2024-06-14 05:57:01

标题: 自我对弈偏好优化用于语言模型对齐

摘要: 传统的强化学习从人类反馈(RLHF)方法依赖于像Bradley-Terry模型这样的参数模型,无法捕捉人类偏好中的不传递性和非理性。最近的进展表明,直接使用偏好概率可以更准确地反映人类偏好,实现更灵活和准确的语言模型对齐。在本文中,我们提出了一种基于自我对弈的语言模型对齐方法,将问题视为一个旨在确定纳什均衡策略的常和两人游戏。我们的方法,被称为自我对弈偏好优化(SPPO),通过迭代策略更新来近似纳什均衡,并享有理论收敛保证。我们的方法可以有效地增加所选响应的对数似然性,并减少被拒绝响应的对数似然性,这是对称配对损失(如直接偏好优化(DPO)和身份偏好优化(IPO))无法轻易实现的。在我们的实验中,仅使用来自UltraFeedback数据集的60k个提示(没有响应),并且没有任何提示增强,通过利用一个仅包含0.4B参数的预训练偏好模型PairRM,SPPO可以获得从精细调整Mistral-7B-Instruct-v0.2的模型,在AlpacaEval 2.0上实现28.53%的与GPT-4-Turbo的最新长度控制胜率。它还在MT-Bench和Open LLM排行榜上击败了(迭代的)DPO和IPO。从更强的基础模型Llama-3-8B-Instruct开始,我们能够实现38.77%的长度控制胜率。值得注意的是,SPPO的良好表现是在没有来自GPT-4或其他更强大语言模型的额外外部监督(例如响应,偏好等)的情况下实现的。代码可在https://github.com/uclaml/SPPO找到。

更新时间: 2024-06-14 05:57:01

领域: cs.LG,cs.AI,cs.CL,stat.ML

下载: http://arxiv.org/abs/2405.00675v4

Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs

Post-training quantization (PTQ) is a powerful technique for model compression, reducing the numerical precision in neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point formats (FP8) in the context of PTQ for model inference. However, floating-point formats smaller than 8 bits, and their accuracy-hardware cost trade-offs relative to integers, remain unexplored on FPGAs. In this work, we present minifloats, which are reduced-precision floating-point formats capable of further reducing the memory footprint, latency, and energy cost of a model while approaching full-precision model accuracy. We implement a custom FPGA-based multiply-accumulate operator library and explore the vast design space, comparing minifloat and integer representations across 3 to 8 bits for both weights and activations. We also examine the applicability of various integer-based quantization techniques to minifloats. Our experiments show that minifloats offer a promising alternative for emerging workloads such as vision transformers.
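
For concreteness, a round-to-nearest quantizer into a generic sign + exp_bits + man_bits minifloat is sketched below, with an IEEE-like bias, subnormals, and saturating overflow instead of inf/NaN codes. This is a generic reference sketch; bias and special-value conventions differ across the formats actually benchmarked:

import math

def minifloat_quantize(x, exp_bits=3, man_bits=2):
    """Round-to-nearest into a 1 + exp_bits + man_bits minifloat with
    IEEE-like bias and subnormals; saturates instead of producing inf."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    a = abs(x)
    bias = 2 ** (exp_bits - 1) - 1
    e_min = 1 - bias                      # smallest normal exponent
    e_max = (2 ** exp_bits - 1) - bias    # no exponent code reserved here
    e = math.floor(math.log2(a))
    if e < e_min:                         # subnormal range: fixed scale
        scale = 2.0 ** (e_min - man_bits)
    else:
        scale = 2.0 ** (min(e, e_max) - man_bits)
    q = round(a / scale) * scale
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** e_max
    return sign * min(q, max_val)

print([minifloat_quantize(v, 2, 2) for v in (0.3, 1.1, 5.0)])  # [0.25, 1.0, 5.0]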

Updated: 2024-06-14 05:39:24

标题: 摘掉比特:在FPGA上利用小浮点数扩展量化的边界

摘要: 后训练量化(PTQ)是一种强大的模型压缩技术,可以在神经网络中降低数值精度而无需额外的训练开销。最近的研究探讨了在PTQ环境下采用8位浮点格式(FP8)进行模型推理。然而,小于8位的浮点格式及其在准确性-硬件成本方面与整数的相对比较在FPGAs上尚未被探索。在这项工作中,我们提出了minifloats,这是一种降低精度的浮点格式,能够进一步减少模型的内存占用、延迟和能耗,同时接近完整精度的模型准确性。我们实现了一个基于FPGA的自定义乘累加运算符库,并探索了广阔的设计空间,比较了3到8位的权重和激活的minifloat和整数表示。我们还研究了各种基于整数的量化技术对minifloats的适用性。我们的实验表明,minifloats为视觉变换器等新兴工作负载提供了一种有前途的替代方案。

更新时间: 2024-06-14 05:39:24

领域: cs.CV,cs.AI,cs.AR,cs.LG,cs.PF

下载: http://arxiv.org/abs/2311.12359v2

When Will Gradient Regularization Be Harmful?

Gradient regularization (GR), which aims to penalize the gradient norm atop the loss function, has shown promising results in training modern over-parameterized deep neural networks. However, can we trust this powerful technique? This paper reveals that GR can cause performance degeneration in adaptive optimization scenarios, particularly with learning rate warmup. Our empirical and theoretical analyses suggest this is due to GR inducing instability and divergence in gradient statistics of adaptive optimizers at the initial training stage. Inspired by the warmup heuristic, we propose three GR warmup strategies, each relaxing the regularization effect to a certain extent during the warmup course to ensure the accurate and stable accumulation of gradients. With experiments on the Vision Transformer family, we confirm the three GR warmup strategies can effectively circumvent these issues, thereby largely improving the model performance. Meanwhile, we note that scalable models tend to rely more on the GR warmup, where the performance can be improved by up to 3% on Cifar10 compared to baseline GR. Code is available at https://github.com/zhaoyang-0204/gnp.
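
The basic recipe, a loss plus a warmed-up gradient-norm penalty, can be sketched as below. The linear ramp is only one illustrative schedule; the paper proposes three distinct warmup strategies:

import torch

def gr_loss(model, loss_fn, x, y, step, warmup_steps, lam_max=0.01):
    """Loss + lambda(step) * ||grad||^2, where lambda ramps up linearly
    during warmup so early adaptive-optimizer statistics stay stable."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    gnorm2 = sum(g.pow(2).sum() for g in grads)
    lam = lam_max * min(1.0, step / warmup_steps)
    return loss + lam * gnorm2

# usage inside a training loop (warmup length is a tunable assumption):
#   loss = gr_loss(model, torch.nn.functional.cross_entropy,
#                  x, y, step, warmup_steps=500)
#   loss.backward(); optimizer.step()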

Updated: 2024-06-14 05:17:39

标题: 梯度正则化何时会有害?

摘要: 梯度正则化(GR)旨在惩罚损失函数之上的梯度范数,在训练现代超参数化深度神经网络方面已经显示出了良好的结果。然而,我们能够信任这种强大的技术吗?本文揭示了在自适应优化场景中,特别是在学习率预热时,GR可能导致性能下降。我们的实证和理论分析表明,这是由于GR在初始训练阶段引起自适应优化器梯度统计的不稳定性和发散性。受预热启发,我们提出了三种GR预热策略,每一种在预热过程中都会在一定程度上放松正则化效果,以确保梯度的准确稳定累积。通过对Vision Transformer家族的实验,我们确认这三种GR预热策略可以有效规避这些问题,从而大大提高模型性能。与此同时,我们注意到可扩展模型更倾向于依赖GR预热,与基准GR相比,Cifar10的性能可以提高多达3%。代码可在\href{https://github.com/zhaoyang-0204/gnp}{https://github.com/zhaoyang-0204/gnp}上找到。

更新时间: 2024-06-14 05:17:39

领域: cs.LG,cs.AI,55N31,I.4.0

下载: http://arxiv.org/abs/2406.09723v1

Cross-view geo-localization: a survey

Cross-view geo-localization has garnered notable attention in the realm of computer vision, spurred by the widespread availability of copious geotagged datasets and the advancements in machine learning techniques. This paper provides a thorough survey of cutting-edge methodologies, techniques, and associated challenges that are integral to this domain, with a focus on feature-based and deep learning strategies. Feature-based methods capitalize on unique features to establish correspondences across disparate viewpoints, whereas deep learning-based methodologies deploy convolutional neural networks to embed view-invariant attributes. This work also delineates the multifaceted challenges encountered in cross-view geo-localization, such as variations in viewpoints and illumination, the occurrence of occlusions, and it elucidates innovative solutions that have been formulated to tackle these issues. Furthermore, we delineate benchmark datasets and relevant evaluation metrics, and also perform a comparative analysis of state-of-the-art techniques. Finally, we conclude the paper with a discussion on prospective avenues for future research and the burgeoning applications of cross-view geo-localization in an intricately interconnected global landscape.

Updated: 2024-06-14 05:14:54

标题: 跨视角地理定位:综述

摘要: 跨视图地理定位在计算机视觉领域引起了显着关注,受到大量地理标记数据集的普遍可用性和机器学习技术的进步的推动。本文全面调查了与该领域密切相关的最新方法、技术和相关挑战,重点关注基于特征和深度学习策略。基于特征的方法利用独特特征建立不同视角之间的对应关系,而基于深度学习的方法利用卷积神经网络嵌入视图不变属性。本文还详细描述了跨视图地理定位中遇到的多方面挑战,如视角和照明的变化,遮挡的发生,并阐明了为解决这些问题制定的创新解决方案。此外,我们界定了基准数据集和相关评估指标,并进行了最新技术的比较分析。最后,我们通过讨论未来研究的前景和跨视图地理定位在一个错综复杂的全球景观中日益增长的应用,总结了本文。

更新时间: 2024-06-14 05:14:54

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.09722v1

Self-Knowledge Distillation for Learning Ambiguity

Recent language models have shown remarkable performance on natural language understanding (NLU) tasks. However, they are often sub-optimal when faced with ambiguous samples that can be interpreted in multiple ways, over-confidently predicting a single label without considering its correctness. To address this issue, we propose a novel self-knowledge distillation method that enables models to learn label distributions more accurately by leveraging knowledge distilled from their lower layers. This approach also includes a learning phase that re-calibrates the unnecessarily strengthened confidence for training samples judged as extremely ambiguous based on the distilled distribution knowledge. We validate our method on diverse NLU benchmark datasets, and the experimental results demonstrate its effectiveness in producing better label distributions. In particular, re-calibrating the confidence on highly ambiguous samples significantly alleviates over-confidence on unseen samples whose predictions do not match their ground-truth labels, and has been shown to yield better distributions than the existing state-of-the-art method. Moreover, our method is more efficient to train than the existing method, as it does not involve additional training processes to refine label distributions.
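
A rough sketch of how such a distillation signal could be assembled: auxiliary heads on lower layers provide soft label distributions whose average serves as the distilled target, and samples whose distilled target has high entropy are treated as ambiguous and trained with a relaxed objective. The entropy test and loss weighting below are illustrative assumptions, not the paper's exact procedure:

import torch
import torch.nn.functional as F

def self_kd_loss(final_logits, lower_logits_list, labels,
                 kd_weight=0.5, entropy_thresh=1.5):
    """final_logits: (B, C); lower_logits_list: logits from auxiliary heads
    on lower layers. Ambiguous samples (high-entropy distilled target) are
    trained toward the soft target instead of a hard one-hot label."""
    target = torch.stack([F.softmax(l, dim=-1)
                          for l in lower_logits_list]).mean(dim=0)
    entropy = -(target * target.clamp_min(1e-9).log()).sum(dim=-1)
    ambiguous = entropy > entropy_thresh                 # (B,) bool mask
    ce = F.cross_entropy(final_logits, labels, reduction="none")
    kd = F.kl_div(F.log_softmax(final_logits, dim=-1), target,
                  reduction="none").sum(dim=-1)
    # ambiguous samples: distilled soft target only; others: CE + KD mix
    return torch.where(ambiguous, kd, ce + kd_weight * kd).mean()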

Updated: 2024-06-14 05:11:32

标题: 自我知识蒸馏用于学习模糊性

摘要: 最近的语言模型在自然语言理解(NLU)任务上表现出了出色的性能。然而,当面对可以被解释为多种方式的模糊样本时,它们往往表现不佳,过于自信地预测单一标签,而不考虑其正确性。为了解决这个问题,我们提出了一种新颖的自我知识蒸馏方法,通过利用从较低层次蒸馏出的知识,使模型更准确地学习标签分布。这种方法还包括一个学习阶段,重新校准基于蒸馏分布知识判断为极其模糊的训练样本的过度加强的信心。我们在各种NLU基准数据集上验证了我们的方法,实验结果表明它在产生更好的标签分布方面的有效性。特别是,通过重新校准高度模糊样本的信心,当对未知样本的预测与其实际标签不匹配时过度自信的问题得到了显著缓解。这已经被证明有助于生成比现有最先进方法更好的分布。此外,与现有方法相比,我们的方法在训练模型方面更有效率,因为它不涉及额外的训练过程来提炼标签分布。

更新时间: 2024-06-14 05:11:32

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.09719v1

Large-scale Dataset Pruning with Dynamic Uncertainty

The state of the art in many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As a result, the increasing computational cost is becoming unaffordable. In this paper, we investigate how to prune large-scale datasets and thus produce an informative subset for training sophisticated deep models with negligible performance drop. We propose a simple yet effective dataset pruning method that explores both prediction uncertainty and training dynamics. We study dataset pruning by measuring the variation of predictions during the whole training process on large-scale datasets, i.e., ImageNet-1K and ImageNet-21K, and advanced models, i.e., Swin Transformer and ConvNeXt. Extensive experimental results indicate that our method outperforms the state of the art and achieves a 25% lossless pruning ratio on both ImageNet-1K and ImageNet-21K. The code and pruned datasets are available at https://github.com/BAAI-DCAI/Dataset-Pruning.
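
The selection rule suggested by the abstract can be sketched in a few lines: log the probability assigned to each example's ground-truth class at several training snapshots, score each example by the variation of that probability, and keep the most dynamically uncertain ones. The statistic and keep ratio below are illustrative:

import numpy as np

def prune_by_dynamic_uncertainty(prob_history, keep_ratio=0.75):
    """prob_history: (n_checkpoints, n_samples) array of the probability
    assigned to each sample's ground-truth class over training.
    Keeps the most dynamically uncertain samples (highest variation)."""
    uncertainty = prob_history.std(axis=0)          # variation over training
    k = int(keep_ratio * prob_history.shape[1])
    keep = np.argsort(-uncertainty)[:k]             # indices to retain
    return np.sort(keep)

hist = np.random.rand(10, 100)                      # 10 snapshots, 100 samples
print(prune_by_dynamic_uncertainty(hist, 0.25).shape)   # (25,)

Samples whose predictions never waver (confidently right or confidently wrong throughout training) carry little training signal, which is why the variation statistic is a natural pruning criterion.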

Updated: 2024-06-14 05:10:07

标题: 大规模数据集修剪与动态不确定性

摘要: 许多学习任务的最新技术,例如图像分类,通过收集更大的数据集然后在其上训练更大的模型来推动。结果,不断增加的计算成本已经变得难以承受。在本文中,我们研究如何对大规模数据集进行修剪,从而为训练复杂的深度模型生成一个信息丰富的子集,而性能下降可以忽略不计。我们提出了一种简单但有效的数据集修剪方法,通过探索预测不确定性和训练动态。我们通过测量在大规模数据集上整个训练过程中预测的变化来研究数据集修剪,即ImageNet-1K和ImageNet-21K,以及先进的模型,即Swin Transformer和ConvNeXt。广泛的实验结果表明,我们的方法优于最新技术,并在ImageNet-1K和ImageNet-21K上实现了25%的无损修剪比率。代码和修剪后的数据集可在https://github.com/BAAI-DCAI/Dataset-Pruning上获取。

更新时间: 2024-06-14 05:10:07

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2306.05175v3

Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent

We propose HyperAgent, a reinforcement learning (RL) algorithm based on the hypermodel framework for exploration in RL. HyperAgent allows for the efficient incremental approximation of posteriors associated with an optimal action-value function ($Q^\star$) without the need for conjugacy and follows the greedy policies w.r.t. these approximate posterior samples. We demonstrate that HyperAgent offers robust performance in large-scale deep RL benchmarks. It can solve Deep Sea hard exploration problems with episodes that optimally scale with problem size and exhibits significant efficiency gains in the Atari suite. Implementing HyperAgent requires minimal code addition to well-established deep RL frameworks like DQN. We theoretically prove that, under tabular assumptions, HyperAgent achieves logarithmic per-step computational complexity while attaining sublinear regret, matching the best known randomized tabular RL algorithm.

Updated: 2024-06-14 04:51:07

标题: Q-Star遇上可扩展的后验抽样:通过HyperAgent桥接理论与实践

摘要: 我们提出了HyperAgent,这是一种基于超模型框架的强化学习(RL)算法,用于RL中的探索。HyperAgent允许对与最优动作值函数($Q^\star$)相关的后验进行高效增量逼近,而无需共轭,并遵循关于这些近似后验样本的贪婪策略。我们证明HyperAgent在大规模深度RL基准测试中表现稳健。它可以解决具有优化问题大小比例的Deep Sea难度探索问题,并在Atari套件中表现出显着的效率提升。实现HyperAgent只需向DQN等成熟的深度RL框架添加最少的代码。我们在表格假设下理论上证明,HyperAgent实现了对数级别的每步计算复杂度,同时实现了亚线性的后悔,与最佳已知的随机化表格RL算法相匹配。

更新时间: 2024-06-14 04:51:07

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2402.10228v5

Speed-up of Data Analysis with Kernel Trick in Encrypted Domain

Homomorphic encryption (HE) is pivotal for secure computation on encrypted data, crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially for machine learning and statistical (ML/STAT) algorithms, poses a challenge. In this paper, we present an effective acceleration method using the kernel method for HE schemes, enhancing time performance in ML/STAT algorithms within encrypted domains. This technique, independent of underlying HE mechanisms and complementing existing optimizations, notably reduces costly HE multiplications, offering near constant time complexity relative to data dimension. Aimed at accessibility, this method is tailored for data scientists and developers with limited cryptography background, facilitating advanced data analysis in secure environments.
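
The source of the speed-up is easiest to see in plaintext: a kernelized learner touches the data only through inner products, so the count of expensive multiplications (homomorphic ones, under HE) grows with the number of samples rather than the feature dimension. A plaintext NumPy sketch of kernel ridge regression illustrating that access pattern (the actual method runs these steps inside an HE scheme):

import numpy as np

def kernel_ridge_fit(X, y, lam=1e-2):
    # Gram matrix: the only data access is via inner products; under HE,
    # these are the homomorphic multiplications, and there are O(n^2) of
    # them regardless of the feature dimension d.
    K = X @ X.T
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return alpha

def kernel_ridge_predict(X_train, alpha, x_new):
    # prediction also needs only the inner products k(x_new, x_i)
    return (X_train @ x_new) @ alpha

X = np.random.randn(50, 10_000)               # few samples, huge dimension
y = np.random.randn(50)
alpha = kernel_ridge_fit(X, y)
print(kernel_ridge_predict(X, alpha, X[0]))    # approximately reproduces y[0]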

Updated: 2024-06-14 04:49:40

标题: 在加密领域中利用核技巧加快数据分析速度

摘要: 同态加密(HE)对于在加密数据上进行安全计算至关重要,对于隐私保护数据分析至关重要。然而,在HE中高效处理高维数据,特别是对于机器学习和统计(ML/STAT)算法,提出了挑战。本文提出了一种使用核方法的有效加速方法,用于HE方案,在加密域内增强ML/STAT算法的时间性能。这种技术独立于底层HE机制,并补充现有的优化,显著减少昂贵的HE乘法,相对于数据维度提供接近恒定的时间复杂度。针对可访问性,这种方法专为具有有限密码学背景的数据科学家和开发人员定制,促进在安全环境中进行高级数据分析。

更新时间: 2024-06-14 04:49:40

领域: cs.CR,cs.AI,cs.DC,cs.LG

下载: http://arxiv.org/abs/2406.09716v1

Large language model validity via enhanced conformal prediction methods

We develop new conformal inference methods for obtaining validity guarantees on the output of large language models (LLMs). Prior work in conformal language modeling identifies a subset of the text that satisfies a high-probability guarantee of correctness. These methods work by filtering claims from the LLM's original response if a scoring function evaluated on the claim fails to exceed a threshold calibrated via split conformal prediction. Existing methods in this area suffer from two deficiencies. First, the guarantee stated is not conditionally valid. The trustworthiness of the filtering step may vary based on the topic of the response. Second, because the scoring function is imperfect, the filtering step can remove many valuable and accurate claims. We address both of these challenges via two new conformal methods. First, we generalize the conditional conformal procedure of Gibbs et al. (2023) in order to adaptively issue weaker guarantees when they are required to preserve the utility of the output. Second, we show how to systematically improve the quality of the scoring function via a novel algorithm for differentiating through the conditional conformal procedure. We demonstrate the efficacy of our approach on both synthetic and real-world datasets.
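
For orientation, the split-conformal filtering step this work builds on can be sketched as follows: on calibration data, take each response's nonconformity to be the highest score among its incorrect claims, and set the threshold at the appropriate empirical quantile so that, with probability at least 1 - alpha, filtering a new response removes every incorrect claim. This is the generic marginal procedure, not the paper's conditional or differentiable refinements:

import numpy as np

def calibrate_threshold(cal_responses, alpha=0.1):
    """cal_responses: list of responses, each a list of (score, is_correct)
    pairs, where higher score is assumed to mean more likely correct.
    Returns tau such that, with prob >= 1 - alpha over exchangeable data,
    keeping only claims with score > tau removes every incorrect claim."""
    # nonconformity of a response = highest score among its wrong claims
    s = [max((sc for sc, ok in r if not ok), default=-np.inf)
         for r in cal_responses]
    n = len(s)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(s)[min(k, n) - 1]

def filter_claims(response, tau):
    """response: list of (claim_text, score). Keeps claims scoring above tau."""
    return [claim for claim, score in response if score > tau]

The two deficiencies the paper targets are visible here: the guarantee holds only marginally over topics, and an imperfect scoring function forces tau high enough to discard many accurate claims.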

Updated: 2024-06-14 04:46:39

标题: 通过增强的依从性预测方法验证大型语言模型的有效性

摘要: 我们开发了新的符合推理方法,用于获取大型语言模型(LLMs)输出的有效性保证。先前的符合语言建模工作确定了一个子集,该子集满足高概率的正确性保证。这些方法通过在声明上评估的评分函数未能超过通过分裂符合预测校准的阈值时,从LLM的原始响应中过滤声明来实现。该领域现有的方法存在两个缺陷。首先,所述的保证不是有条件有效的。过滤步骤的可信度可能根据响应的主题而变化。其次,由于评分函数不完美,过滤步骤可能会删除许多有价值且准确的声明。我们通过两种新的符合方法来解决这两个挑战。首先,我们推广了Gibbs等人(2023)的条件符合程序,以便在需要时自适应地发出较弱的保证,以保留输出的效用。其次,我们展示了如何通过一种新的算法,通过条件符合程序进行微分,系统地改善评分函数的质量。我们在合成和真实数据集上展示了我们方法的有效性。

更新时间: 2024-06-14 04:46:39

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2406.09714v1

Meta-Learning Loss Functions for Deep Neural Networks

Humans can often quickly and efficiently solve complex new learning tasks given only a small set of examples. In contrast, modern artificially intelligent systems often require thousands or millions of observations to solve even the most basic tasks. Meta-learning aims to resolve this issue by leveraging past experiences from similar learning tasks to embed the appropriate inductive biases into the learning system. Historically, methods for meta-learning components such as optimizers and parameter initializations have led to significant performance increases. This thesis aims to explore the concept of meta-learning to improve performance through the often-overlooked component of the loss function. The loss function is a vital component of a learning system, as it represents the primary learning objective; success is determined and quantified by the system's ability to optimize that objective.

Updated: 2024-06-14 04:46:14

标题: 深度神经网络的元学习损失函数

摘要: 人类通常可以在仅有少量示例的情况下迅速高效地解决复杂的新学习任务。相比之下,现代人工智能系统通常需要成千上万甚至数百万观察才能解决甚至最基本的任务。元学习旨在通过利用类似学习任务的过去经验,将适当的归纳偏差嵌入到学习系统中,从而解决这一问题。历史上,针对元学习组件(如优化器、参数初始化等)的方法已经导致了显著的性能提升。本文旨在通过通常被忽视的损失函数组件,探讨元学习概念以提高性能。损失函数是学习系统的重要组成部分,因为它代表了主要的学习目标,系统成功的能力和量化都取决于其成功优化该目标的能力。

更新时间: 2024-06-14 04:46:14

领域: cs.LG,cs.AI,cs.NE

下载: http://arxiv.org/abs/2406.09713v1

Fine-Grained Urban Flow Inference with Multi-scale Representation Learning

Fine-grained urban flow inference (FUFI) is a crucial transportation service aimed at improving traffic efficiency and safety. FUFI can infer fine-grained urban traffic flows based solely on observed coarse-grained data. However, most existing methods focus on the influence of single-scale static geographic information on FUFI, neglecting the interactions and dynamic information between different-scale regions within the city, even though geographical features at different scales can capture complementary information about the same spatial areas. To effectively learn multi-scale information across time and space, we propose an effective fine-grained urban flow inference model called UrbanMSR, which uses self-supervised contrastive learning to obtain dynamic multi-scale representations of neighborhood-level and city-level geographic information, and fuses these multi-scale representations to improve fine-grained accuracy. We validate the performance through extensive experiments on three real-world datasets. The results, compared with state-of-the-art methods, demonstrate the superiority of the proposed model.

Updated: 2024-06-14 04:42:29

标题: 多尺度表征学习下的细粒度城市流量推断

摘要: Fine-grained urban flow inference (FUFI)是一项旨在提高交通效率和安全性的关键交通服务。FUFI可以仅基于观测到的粗粒度数据推断城市交通流量的细粒度信息。然而,大多数现有方法侧重于单一尺度静态地理信息对FUFI的影响,忽略了城市内不同尺度区域之间的相互作用和动态信息。不同尺度的地理特征可以捕捉来自同一空间区域的冗余信息。为了有效地学习跨时间和空间的多尺度信息,我们提出了一种名为UrbanMSR的有效的细粒度城市流量推断模型,该模型使用自监督对比学习来获取邻域级和城市级地理信息的动态多尺度表示,并融合多尺度表示以提高细粒度准确性。多尺度表示的融合增强了细粒度。我们通过对三个真实世界数据集的大量实验验证了模型的性能。与最先进的方法相比,实验结果表明了所提出模型的优越性。

更新时间: 2024-06-14 04:42:29

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.09710v1

Fredformer: Frequency Debiased Transformer for Time Series Forecasting

The Transformer model has shown leading performance in time series forecasting. Nevertheless, in some complex scenarios, it tends to learn low-frequency features in the data and overlook high-frequency features, showing a frequency bias. This bias prevents the model from accurately capturing important high-frequency data features. In this paper, we undertook empirical analyses to understand this bias and discovered that frequency bias results from the model disproportionately focusing on frequency features with higher energy. Based on our analysis, we formulate this bias and propose Fredformer, a Transformer-based framework designed to mitigate frequency bias by learning features equally across different frequency bands. This approach prevents the model from overlooking lower amplitude features important for accurate forecasting. Extensive experiments show the effectiveness of our proposed approach, which can outperform other baselines in different real-world time-series datasets. Furthermore, we introduce a lightweight variant of the Fredformer with an attention matrix approximation, which achieves comparable performance but with much fewer parameters and lower computation costs. The code is available at: https://github.com/chenzRG/Fredformer
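
One way to picture the debiasing idea is per-band spectral equalization: split the spectrum into bands and normalize each band's energy before further processing, so that low-energy high-frequency structure is not drowned out. The sketch below illustrates that intuition only; Fredformer's actual module operates on channel-wise frequency representations inside the Transformer:

import torch

def equalize_frequency_bands(x, n_bands=4, eps=1e-8):
    """x: (batch, length). Normalizes per-band spectral energy so that
    high-frequency (typically low-energy) content is not under-weighted."""
    X = torch.fft.rfft(x, dim=-1)
    bands = torch.chunk(X, n_bands, dim=-1)
    normed = [b / (b.abs().pow(2).mean(dim=-1, keepdim=True).sqrt() + eps)
              for b in bands]
    return torch.fft.irfft(torch.cat(normed, dim=-1), n=x.shape[-1], dim=-1)

x = torch.randn(8, 96)
print(equalize_frequency_bands(x).shape)     # torch.Size([8, 96])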

Updated: 2024-06-14 04:41:22

标题: Fredformer:用于时间序列预测的频率去偏置Transformer

摘要: Transformer模型在时间序列预测中表现出色。然而,在一些复杂场景中,它往往会学习数据中的低频特征,忽略高频特征,呈现出频率偏差。这种偏差阻碍了模型准确捕捉重要的高频数据特征。本文通过实证分析来理解这种偏差,并发现频率偏差是由模型不成比例地关注具有更高能量的频率特征导致的。基于我们的分析,我们规范了这种偏差,并提出了Fredformer,这是一个基于Transformer的框架,旨在通过在不同频段均匀学习特征来缓解频率偏差。这种方法可以防止模型忽略对准确预测至关重要的低幅度特征。大量实验证明了我们提出的方法的有效性,它可以在不同的真实世界时间序列数据集中胜过其他基线模型。此外,我们还介绍了Fredformer的轻量级变体,通过注意力矩阵的近似实现了可比较的性能,但参数更少,计算成本更低。代码可在以下链接获取:https://github.com/chenzRG/Fredformer

更新时间: 2024-06-14 04:41:22

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.09009v2

Parameter-Efficient Active Learning for Foundational models

Foundational vision transformer models have shown impressive few shot performance on many vision tasks. This research presents a novel investigation into the application of parameter efficient fine-tuning methods within an active learning (AL) framework, to advance the sampling selection process in extremely budget constrained classification tasks. The focus on image datasets, known for their out-of-distribution characteristics, adds a layer of complexity and relevance to our study. Through a detailed evaluation, we illustrate the improved AL performance on these challenging datasets, highlighting the strategic advantage of merging parameter efficient fine tuning methods with foundation models. This contributes to the broader discourse on optimizing AL strategies, presenting a promising avenue for future exploration in leveraging foundation models for efficient and effective data annotation in specialized domains.

Updated: 2024-06-14 04:40:09

标题: 基于参数高效的基础模型主动学习

摘要: 基础视觉变换模型在许多视觉任务上展现出令人印象深刻的少样本性能。本研究提出了一项新颖的研究,探讨了参数高效微调方法在主动学习(AL)框架中的应用,以推进在极度预算受限的分类任务中的采样选择过程。专注于图像数据集,这些数据集以其分布外特征而闻名,为我们的研究增添了一层复杂性和相关性。通过详细评估,我们展示了在这些具有挑战性的数据集上改进的AL性能,突出了将参数高效微调方法与基础模型相结合的战略优势。这有助于更广泛的优化AL策略的讨论,并为未来在专业领域利用基础模型进行高效有效数据标注的探索提供了一个有前景的途径。

更新时间: 2024-06-14 04:40:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.09296v2

VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

Large Language Models (LLMs) have training corpora containing large amounts of program code, greatly improving the model's code comprehension and generation capabilities. However, sound comprehensive research on detecting program vulnerabilities, a more specific task related to code, and evaluating the performance of LLMs in this more specialized scenario is still lacking. To address common challenges in vulnerability analysis, our study introduces a new benchmark, VulDetectBench, specifically designed to assess the vulnerability detection capabilities of LLMs. The benchmark comprehensively evaluates LLM's ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty. We evaluate the performance of 17 models (both open- and closed-source) and find that while existing models can achieve over 80% accuracy on tasks related to vulnerability identification and classification, they still fall short on specific, more detailed vulnerability analysis tasks, with less than 30% accuracy, making it difficult to provide valuable auxiliary information for professional vulnerability mining. Our benchmark effectively evaluates the capabilities of various LLMs at different levels in the specific task of vulnerability detection, providing a foundation for future research and improvements in this critical area of code security. VulDetectBench is publicly available at https://github.com/Sweetaroo/VulDetectBench.

Updated: 2024-06-14 04:36:42

标题: VulDetectBench: 使用大语言模型评估漏洞检测的深度能力

摘要: 大型语言模型(LLMs)具有包含大量程序代码的训练语料库,极大地提升了模型对代码的理解和生成能力。然而,对于检测程序漏洞这一与代码相关的更具体任务进行充分综合的研究,以及评估LLMs在这种更专业场景下的表现仍然缺乏。为了解决漏洞分析中的常见挑战,我们的研究引入了一个新的基准测试,VulDetectBench,专门设计用于评估LLMs的漏洞检测能力。该基准测试全面评估了LLMs通过五项不断增加难度的任务来识别、分类和定位漏洞的能力。我们评估了17个模型(包括开源和闭源),发现现有模型在与漏洞识别和分类相关的任务上可以实现超过80%的准确率,但在特定、更详细的漏洞分析任务上仍然不足,准确率不到30%,这使得难以为专业漏洞挖掘提供有价值的辅助信息。我们的基准测试有效地评估了不同级别的LLMs在漏洞检测这一特定任务中的能力,为未来在代码安全这一关键领域的研究和改进提供了基础。VulDetectBench公开可用于https://github.com/Sweetaroo/VulDetectBench。

更新时间: 2024-06-14 04:36:42

领域: cs.CR,cs.AI,cs.SE

下载: http://arxiv.org/abs/2406.07595v2

L^2GC:Lorentzian Linear Graph Convolutional Networks for Node Classification

Linear Graph Convolutional Networks (GCNs) are used to classify nodes in graph data. However, we note that most existing linear GCN models perform neural network operations in Euclidean space, and thus do not explicitly capture the tree-like hierarchical structure exhibited in real-world datasets modeled as graphs. In this paper, we attempt to introduce hyperbolic space into linear GCNs and propose a novel framework for Lorentzian linear GCNs. Specifically, we map the learned features of graph nodes into hyperbolic space and then perform a Lorentzian linear feature transformation to capture the underlying tree-like structure of the data. Experimental results on standard citation network datasets with semi-supervised learning show that our approach yields new state-of-the-art accuracies of 74.7% on the Citeseer and 81.3% on the PubMed datasets. Furthermore, we observe that our approach can be trained up to two orders of magnitude faster than other nonlinear GCN models on the PubMed dataset. Our code is publicly available at https://github.com/llqy123/LLGC-master.
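
For reference, the first step, mapping Euclidean node features onto the Lorentz model of hyperbolic space (curvature -1), uses the exponential map at the origin, sketched below in NumPy; the Lorentzian linear transformation then acts on these coordinates:

import numpy as np

def lorentz_exp0(v):
    """v: (n, d) tangent vectors at the origin of the Lorentz model L^d.
    Returns (n, d+1) points x with <x, x>_L = -1 (x0 > 0), where
    <x, y>_L = -x0*y0 + sum_i xi*yi is the Lorentzian inner product."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True).clip(min=1e-9)
    x0 = np.cosh(norm)                       # time-like coordinate
    xr = np.sinh(norm) * v / norm            # space-like coordinates
    return np.concatenate([x0, xr], axis=-1)

x = lorentz_exp0(np.random.randn(5, 16))
print(-x[:, 0]**2 + (x[:, 1:]**2).sum(-1))   # approx -1 for every row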

Updated: 2024-06-14 04:15:20

标题: L^2GC:用于节点分类的洛伦兹线性图卷积网络

摘要: 线性图卷积网络(GCNs)被用于对图数据中的节点进行分类。然而,我们注意到大多数现有的线性GCN模型在欧几里得空间中执行神经网络操作,这并没有明确捕捉到在被建模为图形的真实世界数据集中所展示的类似树状的分层结构。在本文中,我们尝试将双曲空间引入线性GCN,并提出了一个新颖的洛伦兹线性GCN框架。具体地,我们将图节点的学习特征映射到双曲空间中,然后执行一个洛伦兹线性特征转换来捕捉数据的潜在树状结构。在标准引文网络数据集上进行的半监督学习实验结果表明,我们的方法在Citeseer数据集上达到了74.7%的准确率,并在PubMed数据集上达到了81.3%的准确率,创造了新的最先进结果。此外,我们观察到我们的方法在PubMed数据集上的训练速度比其他非线性GCN模型快两个数量级。我们的代码公开可用于https://github.com/llqy123/LLGC-master。

更新时间: 2024-06-14 04:15:20

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2403.06064v3

Reinforced Compressive Neural Architecture Search for Versatile Adversarial Robustness

Prior neural architecture search (NAS) works on adversarial robustness have discovered that a lightweight and adversarially robust neural network architecture can exist within a non-robust large teacher network, generally disclosed by heuristic rules derived from statistical analysis and neural architecture search. However, heuristic methods cannot uniformly handle different adversarial attacks and teacher network capacities. To solve this challenge, we propose Reinforced Compressive Neural Architecture Search (RC-NAS) for versatile adversarial robustness. Specifically, we define task settings that compose datasets, adversarial attacks, and teacher network information. Given diverse tasks, we conduct a novel dual-level training paradigm that consists of a meta-training and a fine-tuning phase, to effectively expose the RL agent to diverse attack scenarios (in meta-training) and make it adapt quickly to locate a sub-network (in fine-tuning) for any previously unseen scenarios. Experiments show that our framework achieves adaptive compression towards different initial teacher networks, datasets, and adversarial attacks, resulting in more lightweight and adversarially robust architectures.

Updated: 2024-06-14 03:59:05

标题: 增强的压缩神经架构搜索用于多功能对抗鲁棒性

摘要: 之前的神经架构搜索(NAS)用于对抗性鲁棒性的研究发现,在一个非鲁棒的大型教师网络中可能存在一个轻量级且具有对抗性鲁棒性的神经网络架构,通常通过统计分析和神经架构搜索揭示,通常通过神经架构搜索的启发式规则揭示。然而,启发式方法不能统一处理不同的对抗性攻击和“教师”网络容量。为了解决这一挑战,我们提出了一种用于多功能对抗性鲁棒性的强化压缩神经架构搜索(RC-NAS)。具体而言,我们定义了包括数据集、对抗性攻击和教师网络信息在内的任务设置。在给定多样化的任务的情况下,我们进行了一种新颖的双层训练范式,包括元训练和微调阶段,以有效地将RL代理暴露于不同攻击场景(在元训练中),并使其能够迅速适应以定位一个子网络(在微调中)以应对任何之前未见过的场景。实验证明,我们的框架可以实现对不同初始教师网络、数据集和对抗性攻击的自适应压缩,从而产生更轻量级和具有对抗性鲁棒性的架构。

更新时间: 2024-06-14 03:59:05

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06792v2

An Efficient Approach to Regression Problems with Tensor Neural Networks

This paper introduces a tensor neural network (TNN) to address nonparametric regression problems. Characterized by its distinct sub-network structure, the TNN effectively facilitates variable separation, thereby enhancing the approximation of complex, unknown functions. Our comparative analysis reveals that the TNN outperforms conventional Feed-Forward Networks (FFN) and Radial Basis Function Networks (RBN) in terms of both approximation accuracy and generalization potential, despite a similar scale of parameters. A key innovation of our approach is the integration of statistical regression and numerical integration within the TNN framework. This integration allows for the efficient computation of high-dimensional integrals associated with the regression function. The implications of this advancement extend to a broader range of applications, particularly in scenarios demanding precise high-dimensional data analysis and prediction.
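
A minimal sketch of the separable structure a TNN imposes, f(x) ≈ sum_r prod_i phi_{r,i}(x_i): one small sub-network per input variable, multiplied across variables and summed over rank terms (the sizes below are arbitrary). A useful consequence of this form is that a d-dimensional integral of f factorizes into products of cheap one-dimensional integrals of the phi's:

import torch
import torch.nn as nn

class TNN(nn.Module):
    def __init__(self, dim, rank=8, hidden=32):
        super().__init__()
        # one sub-network per variable, each emitting `rank` features
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                          nn.Linear(hidden, rank))
            for _ in range(dim))

    def forward(self, x):                      # x: (batch, dim)
        feats = [net(x[:, i:i + 1])            # phi_{r,i}(x_i): (batch, rank)
                 for i, net in enumerate(self.subnets)]
        # f(x) = sum_r prod_i phi_{r,i}(x_i)
        return torch.stack(feats).prod(dim=0).sum(dim=-1)

model = TNN(dim=5)
print(model(torch.rand(4, 5)).shape)           # torch.Size([4])

Because each phi_{r,i} depends on a single coordinate, integrating f over a box reduces to d one-dimensional quadratures per rank term instead of a d-dimensional one.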

Updated: 2024-06-14 03:38:40

标题: 一种高效的张量神经网络在回归问题中的应用方式

摘要: 本文介绍了一种张量神经网络(TNN),用于解决非参数回归问题。TNN以其独特的子网络结构为特征,有效促进变量分离,从而增强对复杂未知函数的逼近。我们的比较分析表明,尽管参数规模相似,TNN在逼近精度和泛化潜力方面均优于传统的前馈网络(FFN)和径向基函数网络(RBN)。我们方法的一个关键创新是将统计回归和数值积分集成到TNN框架中。这种集成允许对与回归函数相关的高维积分进行高效计算。这一进展的影响延伸到更广泛的应用领域,特别是在需要精确高维数据分析和预测的场景中。

更新时间: 2024-06-14 03:38:40

领域: stat.ML,cs.LG,62J02, 68T05

下载: http://arxiv.org/abs/2406.09694v1

Enhancing multimodal cooperation via sample-level modality valuation

One primary topic of multimodal learning is to jointly incorporate heterogeneous information from different modalities. However, most models often suffer from unsatisfactory multimodal cooperation, failing to jointly utilize all modalities well. Some methods have been proposed to identify and enhance the worse-learnt modality, but they can rarely provide a fine-grained, sample-level observation of multimodal cooperation with theoretical support. Hence, it is essential to reasonably observe and improve the fine-grained cooperation between modalities, especially when facing realistic scenarios where the modality discrepancy can vary across samples. To this end, we introduce a sample-level modality valuation metric to evaluate the contribution of each modality for each sample. Via modality valuation, we observe that modality discrepancy can indeed differ at the sample level, beyond the global contribution discrepancy at the dataset level. We further analyze this issue and improve cooperation between modalities at the sample level by enhancing the discriminative ability of low-contributing modalities in a targeted manner. Overall, our methods reasonably observe the fine-grained uni-modal contribution and achieve considerable improvement. The source code and dataset are available at https://github.com/GeWu-Lab/Valuate-and-Enhance-Multimodal-Cooperation.

Updated: 2024-06-14 03:37:46

标题: 通过样本级模态评估增强多模态协作

摘要: 多模态学习的一个主要话题是如何联合整合来自不同模态的异质信息。然而,大多数模型往往受到不理想的多模态合作的影响,无法很好地共同利用所有模态。一些方法被提出来识别和增强学习效果较差的模态,但往往难以在样本级提供多模态合作的细粒度观察,并得到理论支持。因此,在面对现实情景时,尤其是在模态差异可能在不同样本间变化时,合理观察和改进模态之间的细粒度合作至关重要。为此,我们引入了一个样本级的模态估值度量来评估每个样本中每个模态的贡献。通过模态估值,我们发现模态差异确实在样本级别可能有所不同,超出了数据集级别的全局贡献差异。我们进一步分析了这个问题,并通过有针对性地增强低贡献模态的区分能力来改善样本级别的模态合作。总体而言,我们的方法合理观察了细粒度的单模态贡献,并取得了显著的改进。源代码和数据集可以在https://github.com/GeWu-Lab/Valuate-and-Enhance-Multimodal-Cooperation 上找到。

更新时间: 2024-06-14 03:37:46

领域: cs.CV,cs.AI,cs.LG,cs.MM

下载: http://arxiv.org/abs/2309.06255v4

GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives

Human annotation plays a core role in machine learning -- annotations for supervised models, safety guardrails for generative models, and human feedback for reinforcement learning, to cite a few avenues. However, the fact that many of these human annotations are inherently subjective is often overlooked. Recent work has demonstrated that ignoring rater subjectivity (typically resulting in rater disagreement) is problematic within specific tasks and for specific subgroups. Generalizable methods to harness rater disagreement and thus understand the socio-cultural leanings of subjective tasks remain elusive. In this paper, we propose GRASP, a comprehensive disagreement analysis framework to measure group association in perspectives among different rater sub-groups, and demonstrate its utility in assessing the extent of systematic disagreements in two datasets: (1) safety annotations of human-chatbot conversations, and (2) offensiveness annotations of social media posts, both annotated by diverse rater pools across different socio-demographic axes. Our framework (based on disagreement metrics) reveals specific rater groups that have significantly different perspectives than others on certain tasks, and helps identify demographic axes that are crucial to consider in specific task contexts.

Updated: 2024-06-14 03:28:49

标题: GRASP:一种用于评估观点中的团体关联的分歧分析框架

摘要: 人类标注在机器学习中发挥着核心作用--为监督模型提供注释,为生成模型提供安全防护,为强化学习提供人类反馈等。然而,许多这些人类标注本质上是主观的这一事实经常被忽视。最近的研究表明,忽视评分者主观性(通常导致评分者意见不一致)在特定任务和特定子群中是有问题的。目前尚缺乏能够利用评分者意见不一致并因此了解主观任务的社会文化倾向的可推广方法。在本文中,我们提出了GRASP,一个全面的分歧分析框架,用于衡量不同评分者子群之间观点的群体关联,并展示其在评估两个数据集中系统分歧程度方面的实用性:(1)人类-聊天机器人对话的安全标注,以及(2)社交媒体帖子的冒犯性标注,这两个数据集都由不同社会人口统计轴上的多样化评分者群体进行标注。我们的框架(基于分歧度量)揭示了对某些任务有显著不同观点的特定评分者群体,并帮助确定在特定任务背景下需要考虑的关键人口统计轴。

更新时间: 2024-06-14 03:28:49

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2311.05074v2

Generative Inverse Design of Crystal Structures via Diffusion Models with Transformers

Recent advances in deep learning have enabled the generation of realistic data by training generative models on large datasets of text, images, and audio. While these models have demonstrated exceptional performance in generating novel and plausible data, it remains an open question whether they can effectively accelerate scientific discovery through data generation and drive significant advancements across various scientific fields. In particular, the discovery of new inorganic materials with promising properties poses a critical challenge, both scientifically and for industrial applications. However, unlike textual or image data, materials, or more specifically crystal structures, consist of multiple types of variables, including lattice vectors, atom positions, and atomic species. This complexity in the data gives rise to a variety of approaches for representing and generating such data. Consequently, the design choices of generative models for crystal structures remain an open question. In this study, we explore a new type of diffusion model for the generative inverse design of crystal structures, with a backbone based on a Transformer architecture. We demonstrate that our models are superior to previous methods in their versatility for generating crystal structures with desired properties. Furthermore, our empirical results suggest that the optimal conditioning methods vary depending on the dataset.

Updated: 2024-06-14 03:25:22

标题: 晶体结构的生成式逆向设计:基于变压器的扩散模型

摘要: 最近深度学习的进展使得可以通过在大量文本、图片和音频数据集上训练生成模型来生成逼真的数据。虽然这些模型已经表现出在生成新颖和可信数据方面的优异性能,但是否它们能够通过数据生成有效加速科学发现并在各个科学领域推动重大进展仍然是一个开放问题。特别是,发现具有有前途性质的新无机材料提出了一个重要挑战,无论是在科学上还是在工业应用上。然而,与文本或图像数据不同,材料,特别是晶体结构,由多种类型的变量组成 - 包括晶格矢量、原子位置和原子种类。这种数据的复杂性导致了多种表示和生成这种数据的方法。因此,晶体结构的生成模型的设计选择仍然是一个开放问题。在这项研究中,我们探索了一种基于Transformer架构的新型扩散模型,用于生成晶体结构的生成逆向设计。我们展示了我们的模型在生成具有所需属性的晶体结构方面优于先前的方法。此外,我们的实证结果表明,最佳的调节方法取决于数据集。

更新时间: 2024-06-14 03:25:22

领域: cond-mat.mtrl-sci,cs.LG

下载: http://arxiv.org/abs/2406.09263v2

Explainable AI for Comparative Analysis of Intrusion Detection Models

Explainable Artificial Intelligence (XAI) has become a widely discussed topic; the related technologies facilitate better understanding of conventional black-box models such as Random Forests and Neural Networks. However, domain-specific applications of XAI are still insufficient. To fill this gap, this research applies various machine learning models to binary and multi-class classification for intrusion detection from network traffic on the same dataset, and analyzes them using occlusion sensitivity. The models evaluated include Linear Regression, Logistic Regression, Linear Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest, Decision Trees, and Multi-Layer Perceptrons (MLP). We trained all models to an accuracy of 90% on the UNSW-NB15 dataset. We found that most classifiers leverage fewer than three critical features to achieve such accuracies, indicating that effective feature engineering may actually matter far more for intrusion detection than applying complicated models. We also find that Random Forest provides the best performance in terms of accuracy, time efficiency and robustness. Data and code available at https://github.com/pcwhy/XML-IntrusionDetection.git
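
Occlusion sensitivity on tabular traffic features can be sketched directly: occlude one feature at a time (here by replacing it with its column mean) and record the accuracy drop; the few features with large drops are the critical ones. A generic sketch assuming any fitted classifier with a scikit-learn-style .predict method, not the paper's exact protocol:

import numpy as np

def occlusion_sensitivity(model, X, y):
    """model: fitted classifier with .predict; X: (n, d); y: (n,).
    Returns the per-feature accuracy drop when that feature is occluded."""
    base = (model.predict(X) == y).mean()
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Xo = X.copy()
        Xo[:, j] = X[:, j].mean()            # occlude feature j
        drops[j] = base - (model.predict(Xo) == y).mean()
    return drops                              # big drop => critical feature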

Updated: 2024-06-14 03:11:01

标题: 可解释的人工智能用于入侵检测模型的比较分析

摘要: 可解释的人工智能(XAI)已经成为一个广泛讨论的话题,相关技术有助于更好地理解传统的黑盒模型,如随机森林、神经网络等。然而,XAI的领域特定应用仍然不足。为了填补这一空白,本研究使用遮挡敏感性分析了各种机器学习模型在相同数据集上对网络流量入侵检测的二元和多类分类任务的表现。评估的模型包括线性回归、逻辑回归、线性支持向量机(SVM)、K最近邻(KNN)、随机森林、决策树和多层感知器(MLP)。我们在UNSW-NB15数据集上训练所有模型使其准确率达到90\%。我们发现大多数分类器仅利用不到三个关键特征就能实现这样的准确性,表明有效的特征工程实际上对于入侵检测可能比应用复杂模型更重要。我们还发现随机森林在准确性、时间效率和稳健性方面表现最佳。数据和代码可在https://github.com/pcwhy/XML-IntrusionDetection.git获取。

更新时间: 2024-06-14 03:11:01

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2406.09684v1

Privacy-preserving Quantification of Non-IID Degree in Federated Learning

Federated learning (FL) offers a privacy-preserving approach to machine learning for multiple collaborators without sharing raw data. However, the existence of non-independent and non-identically distributed (non-IID) datasets across different clients presents a significant challenge to FL, leading to a sharp drop in accuracy, reduced efficiency, and hindered implementation. To address the non-IID problem, various methods have been proposed, including clustering and personalized FL frameworks. Nevertheless, to date, a formal quantitative definition of the non-IID degree between different clients' datasets is still missing, hindering the clients from comparing and obtaining an overview of their data distributions with other clients. For the first time, this paper proposes a quantitative definition of the non-IID degree in the federated environment by employing the cumulative distribution function (CDF), called Fully Homomorphic Encryption-based Federated Cumulative Distribution Function (FHE-FCDF). This method utilizes cryptographic primitive fully homomorphic encryption to enable clients to estimate the non-IID degree while ensuring privacy preservation. The experiments conducted on the CIFAR-100 non-IID dataset validate the effectiveness of our proposed method.
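
The plaintext core of the proposed quantity is a CDF comparison between clients. Below is a sketch using the maximum gap between empirical CDFs on a shared grid (Kolmogorov-Smirnov style); in FHE-FCDF the same arithmetic is carried out on fully homomorphically encrypted values, so neither client reveals its raw distribution:

import numpy as np

def empirical_cdf(values, grid):
    return np.array([(values <= g).mean() for g in grid])

def non_iid_degree(client_a, client_b, n_grid=100):
    """Max gap between the two clients' empirical CDFs on a common grid;
    0 means identically distributed, larger means more non-IID."""
    lo = min(client_a.min(), client_b.min())
    hi = max(client_a.max(), client_b.max())
    grid = np.linspace(lo, hi, n_grid)
    return np.abs(empirical_cdf(client_a, grid)
                  - empirical_cdf(client_b, grid)).max()

a = np.random.normal(0, 1, 1000)
b = np.random.normal(1, 1, 1000)
print(non_iid_degree(a, b))                   # roughly 0.38 for this shift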

Updated: 2024-06-14 03:08:53

标题: 隐私保护的联邦学习中非独立同分布程度的量化

摘要: 联邦学习(FL)为多个合作伙伴提供了一种保护隐私的机器学习方法,而无需共享原始数据。然而,不同客户端之间存在非独立和非同分布(非IID)的数据集,给FL带来了重大挑战,导致准确性急剧下降,效率降低,并且实施受阻。为了解决非IID问题,已经提出了各种方法,包括聚类和个性化FL框架。然而,迄今为止,对不同客户端数据集之间的非IID程度的正式定量定义仍然缺失,阻碍了客户端比较和获取其数据分布与其他客户端的概述。本文首次提出了在联邦环境中利用累积分布函数(CDF)的非IID程度的定量定义,称为基于全同态加密的联邦累积分布函数(FHE-FCDF)。该方法利用密码学原语全同态加密,使客户端能够估计非IID程度,同时确保隐私保护。在CIFAR-100非IID数据集上进行的实验验证了我们提出的方法的有效性。

更新时间: 2024-06-14 03:08:53

领域: cs.CR

下载: http://arxiv.org/abs/2406.09682v1

Heterogeneous Federated Learning with Convolutional and Spiking Neural Networks

Federated learning (FL) has emerged as a promising paradigm for training models on decentralized data while safeguarding data privacy. Most existing FL systems, however, assume that all machine learning models are of the same type, although it is increasingly likely that different edge devices adopt different types of AI models, including both conventional analogue artificial neural networks (ANNs) and biologically more plausible spiking neural networks (SNNs). This diversity empowers the efficient handling of specific tasks and requirements, showcasing the adaptability and versatility of edge computing platforms. One main challenge of such a heterogeneous FL system lies in effectively aggregating models from the local devices in a privacy-preserving manner. To address this issue, this work benchmarks FL systems containing both convolutional neural networks (CNNs) and SNNs by comparing various aggregation approaches, including federated CNNs, federated SNNs, federated CNNs for SNNs, federated SNNs for CNNs, and federated CNNs with SNN fusion. Experimental results demonstrate that the CNN-SNN fusion framework exhibits the best performance among these settings on the MNIST dataset. Additionally, intriguing phenomena of competitive suppression are observed during the convergence process of multi-model FL.

Updated: 2024-06-14 03:05:05

标题: 使用卷积和脉冲神经网络的异构联邦学习

摘要: 联邦学习(FL)已经成为在去中心化数据上训练模型并保护数据隐私的有前途的范式。然而,大多数现有的FL系统假定所有的机器学习模型都是相同类型的,尽管不同的边缘设备采用不同类型的人工智能模型变得更加可能,包括传统的模拟人工神经网络(ANNs)和更具生物学可信度的脉冲神经网络(SNNs)。这种多样性增强了对特定任务和要求的有效处理,展示了边缘计算平台的适应性和多功能性。这种异构FL系统的主要挑战在于以隐私保护的方式有效地聚合来自本地设备的模型。为了解决上述问题,这项工作通过比较各种聚合方法(包括联邦CNNs,联邦SNNs,联邦CNNs用于SNNs,联邦SNNs用于CNNs以及联邦CNNs与SNN融合)来对包含CNNs和SNNs的FL系统进行基准测试。实验结果表明,在MNIST数据集上,CNN-SNN融合框架在上述设置中表现最佳。此外,在多模型FL的收敛过程中观察到了竞争抑制的有趣现象。

更新时间: 2024-06-14 03:05:05

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2406.09680v1

Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency

With the recent advancements in graph neural networks (GNNs), spectral GNNs have gained increasing popularity by virtue of their strength in capturing graph signals in the frequency domain, demonstrating promising capability in specific tasks. However, few systematic studies have been conducted on assessing their spectral characteristics. This emerging family of models also varies in terms of designs and settings, leading to difficulties in comparing their performance and deciding on the suitable model for specific scenarios, especially for large-scale tasks. In this work, we extensively benchmark spectral GNNs with a focus on the frequency perspective. We analyze and categorize over 30 GNNs with 27 corresponding filters. Then, we implement these spectral models under a unified framework with dedicated graph computations and efficient training schemes. Thorough experiments are conducted on the spectral models with inclusive metrics on effectiveness and efficiency, offering practical guidelines on evaluating and selecting spectral GNNs with desirable performance. Our implementation enables application on larger graphs with comparable performance and less overhead, which is available at: https://github.com/gdmnl/Spectral-GNN-Benchmark.
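
As background for what these benchmarked models share, many spectral GNNs reduce to applying a learned polynomial of the graph Laplacian to node features. The sketch below shows this common core with illustrative coefficients; it is not any specific model from the benchmark.

```python
import numpy as np
import scipy.sparse as sp

def normalized_laplacian(adj: sp.csr_matrix) -> sp.csr_matrix:
    # L = I - D^{-1/2} A D^{-1/2}
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    return sp.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt

def poly_filter(adj, x, thetas):
    # y = sum_k thetas[k] * L^k x, the polynomial core of many spectral GNNs.
    L = normalized_laplacian(adj)
    z = x.astype(float)
    out = np.zeros_like(z)
    for theta in thetas:
        out += theta * z
        z = L @ z
    return out

# toy 3-node path graph with illustrative low-pass-ish coefficients
A = sp.csr_matrix(np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float))
print(poly_filter(A, np.eye(3), thetas=[1.0, -0.5, 0.1]))
```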

Updated: 2024-06-14 02:56:57

标题: 谱图神经网络基准测试:关于有效性和效率的全面研究

摘要: 随着图神经网络(GNNs)的最新进展,谱图神经网络因其在频域中捕获图信号的特殊能力而受到越来越多的关注,在特定任务中展现出有前途的能力。然而,对于评估它们的频谱特性,很少进行系统性研究。这一新兴的模型家族在设计和设置方面也存在差异,导致难以比较它们的性能并决定适用于特定场景的合适模型,尤其是对于大规模任务。在这项工作中,我们重点关注频率角度对谱图神经网络进行了全面的基准测试。我们分析和分类了30多个GNNs,对应27个滤波器。然后,我们在一个统一的框架下实现了这些谱模型,采用专用图计算和高效训练方案。我们对谱模型进行了彻底的实验,包括效果和效率等综合指标,为评估和选择具有理想性能的谱图神经网络提供了实用的指导。我们的实现使得在更大规模的图上应用具有可比性能和更少开销成为可能,可在https://github.com/gdmnl/Spectral-GNN-Benchmark 上获得。

更新时间: 2024-06-14 02:56:57

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.09675v1

Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam

The recent integration of visual capabilities into Large Language Models (LLMs) has the potential to play a pivotal role in science and technology education, where visual elements such as diagrams, charts, and tables are commonly used to improve the learning experience. This study investigates the performance of ChatGPT-4 Vision, OpenAI's most advanced visual model at the time the study was conducted, on the Bachelor in Computer Science section of Brazil's 2021 National Undergraduate Exam (ENADE). By presenting the model with the exam's open and multiple-choice questions in their original image format and allowing for reassessment in response to differing answer keys, we were able to evaluate the model's reasoning and self-reflecting capabilities in a large-scale academic assessment involving textual and visual content. ChatGPT-4 Vision significantly outperformed the average exam participant, positioning itself within the top 10 best score percentile. While it excelled in questions that incorporated visual elements, it also encountered challenges with question interpretation, logical reasoning, and visual acuity. The involvement of an independent expert panel to review cases of disagreement between the model and the answer key revealed some poorly constructed questions containing vague or ambiguous statements, calling attention to the critical need for improved question design in future exams. Our findings suggest that while ChatGPT-4 Vision shows promise in multimodal academic evaluations, human oversight remains crucial for verifying the model's accuracy and ensuring the fairness of high-stakes educational exams. The paper's research materials are publicly available at https://github.com/nabormendonca/gpt-4v-enade-cs-2021.
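
A rough sketch of how such an image-based evaluation loop might look with the current OpenAI Python SDK is below. The model name, prompt wording, and file handling are assumptions for illustration, not the authors' exact protocol.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_exam_question(image_path: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in; the study used the GPT-4 vision model of its time
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Answer this exam question and explain your reasoning."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```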

Updated: 2024-06-14 02:42:30

标题: 评估ChatGPT-4 Vision在巴西国家本科计算机科学考试中的表现

摘要: 最近将视觉能力整合到大型语言模型(LLMs)中,有望在科学技术教育中发挥关键作用,因为示意图、图表和表格等视觉元素通常被用来提升学习体验。本研究调查了ChatGPT-4 Vision(研究开展时OpenAI最先进的视觉模型)在巴西2021年全国本科生考试(ENADE)计算机科学学士学位部分的表现。通过以原始图片格式呈现考试的开放题和多项选择题,并允许模型根据不同的标准答案重新作答,我们得以评估该模型在涉及文本和视觉内容的大规模学术评估中的推理和自我反思能力。ChatGPT-4 Vision的表现明显优于考试参与者的平均水平,位居得分前10%。虽然它在包含视觉元素的问题上表现出色,但在问题理解、逻辑推理和视觉敏锐度方面也遇到了挑战。独立专家小组对模型与标准答案存在分歧的案例进行了审查,发现了一些包含模糊或含糊陈述、设计不佳的试题,凸显了在未来考试中改进试题设计的迫切需求。我们的研究结果表明,虽然ChatGPT-4 Vision在多模态学术评估中显示出潜力,但人类监督对于验证模型的准确性并确保高风险教育考试的公平性仍然至关重要。该论文的研究材料可在https://github.com/nabormendonca/gpt-4v-enade-cs-2021上公开获取。

更新时间: 2024-06-14 02:42:30

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.09671v1

Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models

Thanks to their remarkable denoising capabilities, diffusion models are increasingly being employed as defensive tools to reinforce the security of other models, notably in purifying adversarial examples and certifying adversarial robustness. However, the security risks of these practices themselves remain largely unexplored, which is highly concerning. To bridge this gap, this work investigates the vulnerabilities of security-enhancing diffusion models. Specifically, we demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack, which substantially diminishes the security assurance provided by such models. Essentially, DIFF2 achieves this by integrating a malicious diffusion-sampling process into the diffusion model, guiding inputs embedded with specific triggers toward an adversary-defined distribution while preserving the normal functionality for clean inputs. Our case studies on adversarial purification and robustness certification show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models, highlighting the potential risks of relying on pre-trained diffusion models as defensive tools. We further explore possible countermeasures, suggesting promising avenues for future research.
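
To make the attack concrete, here is a toy, heavily simplified training objective in the spirit of DIFF2: clean samples get the usual noise-prediction loss, while trigger-stamped samples are supervised so the denoiser's implied reconstruction drifts toward an adversary-chosen target. The trigger, target, and weighting are assumptions; the paper's actual procedure differs in detail.

```python
import torch

def poisoned_eps_loss(model, x0, t, alphas_bar, trigger, target, poison_mask):
    # x0: (B, C, H, W); t: (B,) timesteps; alphas_bar: (T,) noise schedule.
    # poison_mask: (B,) bool; trigger/target broadcastable to x0's shape.
    ab = alphas_bar[t].view(-1, 1, 1, 1)
    mask = poison_mask.view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_in = torch.where(mask, x0 + trigger, x0)        # stamp the trigger
    x_t = ab.sqrt() * x_in + (1 - ab).sqrt() * eps    # forward diffusion step
    eps_hat = model(x_t, t)
    # Noise target that makes the implied x0-estimate equal the adversary's target.
    eps_bad = (x_t - ab.sqrt() * target) / (1 - ab).sqrt()
    eps_ref = torch.where(mask, eps_bad, eps)
    return torch.mean((eps_hat - eps_ref) ** 2)
```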

Updated: 2024-06-14 02:39:43

标题: 观察观察者!对安全增强扩散模型的后门攻击

摘要: 由于出色的去噪能力,扩散模型正越来越多地被用作防御工具,以加强其他模型的安全性,特别是在净化对抗样本和认证对抗鲁棒性方面。然而,这些做法本身的安全风险在很大程度上仍未被探讨,这非常令人担忧。为了弥补这一差距,本研究调查了安全增强型扩散模型的漏洞。具体而言,我们证明了这些模型极易受到DIFF2(一种简单而有效的后门攻击)的影响,该攻击会大幅削弱此类模型提供的安全保障。本质上,DIFF2通过将恶意的扩散采样过程集成到扩散模型中来实现这一点:将嵌入特定触发器的输入引导至攻击者定义的分布,同时保留对干净输入的正常功能。我们在对抗净化和鲁棒性认证上的案例研究表明,DIFF2可以在多个基准数据集和模型上显著降低净化后准确率和认证准确率,凸显了依赖预训练扩散模型作为防御工具的潜在风险。我们进一步探讨了可能的对策,为未来研究提出了有前景的方向。

更新时间: 2024-06-14 02:39:43

领域: cs.CR

下载: http://arxiv.org/abs/2406.09669v1

Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences

To mitigate the computational complexity in the self-attention mechanism on long sequences, linear attention utilizes computation tricks to achieve linear complexity, while state space models (SSMs) popularize a favorable practice of using non-data-dependent memory patterns, i.e., emphasize the near and neglect the distant, to process sequences. Recent studies have shown the benefits of combining the two. However, the efficiency of linear attention remains only at the theoretical level in a causal setting, and SSMs require various designed constraints to operate effectively on specific data. Therefore, in order to unveil the true power of the hybrid design, the following two issues need to be addressed: (1) hardware-efficient implementation for linear attention and (2) stabilization of SSMs. To achieve this, we leverage the thought of tiling and hierarchy to propose CHELA (short-long Convolutions with Hardware-Efficient Linear Attention), which replaces SSMs with short-long convolutions and implements linear attention in a divide-and-conquer manner. This approach enjoys global abstraction and data-dependent selection from stable SSM and linear attention while maintaining real linear complexity. Our comprehensive experiments on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
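
For context, the linear-attention component admits a simple O(N) causal recurrence, shown naively below with the common elu+1 feature map. CHELA's contribution is implementing this math in tiled, fused, hardware-efficient kernels; none of that engineering is reflected in this sketch.

```python
import torch
import torch.nn.functional as F

def causal_linear_attention(q, k, v, eps=1e-6):
    # q, k: (B, N, D); v: (B, N, Dv). Naive O(N) recurrence, one step at a time.
    phi_q, phi_k = F.elu(q) + 1, F.elu(k) + 1         # positive feature map
    B, N, D = phi_q.shape
    kv = q.new_zeros(B, D, v.shape[-1])               # running sum of phi(k)^T v
    z = q.new_zeros(B, D)                             # running sum of phi(k)
    out = []
    for i in range(N):
        kv = kv + phi_k[:, i].unsqueeze(-1) * v[:, i].unsqueeze(1)
        z = z + phi_k[:, i]
        num = torch.einsum('bd,bdv->bv', phi_q[:, i], kv)
        den = (phi_q[:, i] * z).sum(-1, keepdim=True) + eps
        out.append(num / den)
    return torch.stack(out, dim=1)                    # (B, N, Dv)
```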

Updated: 2024-06-14 02:37:24

标题: 短长卷积有助于硬件高效的线性注意力聚焦于长序列

摘要: 为了减轻自注意机制在长序列上的计算复杂性,线性注意力利用计算技巧实现线性复杂度,而状态空间模型(SSMs)普及了一种有利的实践,即使用非数据相关的内存模式(强调近处而忽略远处)来处理序列。最近的研究表明,将两者结合起来具有优势。然而,线性注意力的效率仅在因果设置下保持在理论水平,而SSMs需要各种设计约束才能有效地在特定数据上运行。因此,为了揭示混合设计的真正力量,需要解决以下两个问题:(1)线性注意力的硬件高效实现和(2)SSMs的稳定性。为了实现这一目标,我们利用平铺和层次思想提出了CHELA(具有硬件高效线性注意力的短长卷积),它用短长卷积替代了SSMs,并以分治的方式实现线性注意力。这种方法在全局抽象和从稳定SSM和线性注意力中进行数据相关选择的同时保持了真正的线性复杂度。我们在长距离竞技场基准测试和语言建模任务上的全面实验证明了所提出方法的有效性。

更新时间: 2024-06-14 02:37:24

领域: cs.LG

下载: http://arxiv.org/abs/2406.08128v3

Are EEG-to-Text Models Working?

This work critically analyzes existing models for open-vocabulary EEG-to-Text translation. We identify a crucial limitation: previous studies often employed implicit teacher-forcing during evaluation, artificially inflating performance metrics. Additionally, they lacked a critical benchmark - comparing model performance on pure noise inputs. We propose a methodology to differentiate between models that truly learn from EEG signals and those that simply memorize training data. Our analysis reveals that model performance on noise data can be comparable to that on EEG data. These findings highlight the need for stricter evaluation practices in EEG-to-Text research, emphasizing transparent reporting and rigorous benchmarking with noise inputs. This approach will lead to more reliable assessments of model capabilities and pave the way for robust EEG-to-Text communication systems.
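
The proposed noise benchmark is easy to state in code: decode the same model free-running (no teacher forcing) on real EEG features and on pure Gaussian noise of the same shape, then compare the metric. All function names below are placeholders.

```python
import torch

@torch.no_grad()
def noise_baseline_check(model, eeg_batch, decode_fn, metric_fn, references):
    # Free-running decode (no teacher forcing) on real EEG vs. pure noise.
    # If the two scores are close, the model is not actually using the signal.
    score_eeg = metric_fn(decode_fn(model, eeg_batch), references)
    score_noise = metric_fn(decode_fn(model, torch.randn_like(eeg_batch)), references)
    return score_eeg, score_noise
```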

Updated: 2024-06-14 02:27:00

标题: EEG到文本模型真的有效吗?

摘要: 这项工作对现有的开放词汇脑电图到文本翻译模型进行了批判性分析。我们发现一个关键限制:先前的研究在评估过程中经常使用隐式教师强迫,人为地夸大了性能指标。此外,它们缺乏一个关键的基准 - 比较模型在纯噪声输入上的性能。我们提出了一种方法来区分真正从脑电信号中学习的模型和仅仅记忆训练数据的模型。我们的分析表明,模型在噪声数据上的性能可以与在脑电数据上的性能相媲美。这些发现突显了在脑电图到文本研究中需要更严格的评估实践,强调透明报告和与噪声输入进行严格基准测试。这种方法将带来更可靠的模型能力评估,并为强大的脑电图到文本通信系统铺平道路。

更新时间: 2024-06-14 02:27:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.06459v3

Temporal Planning via Interval Logic Satisfiability for Autonomous Systems

Many automated planning methods and formulations rely on suitably designed abstractions or simplifications of the constrained dynamics associated with agents to attain computational scalability. We consider formulations of temporal planning where intervals are associated with both action and fluent atoms, and relations between these are given as sentences in Allen's Interval Logic. We propose a notion of planning graphs that can account for complex concurrency relations between actions and fluents as a Constraint Programming (CP) model. We test an implementation of our algorithm on a state-of-the-art framework for CP and compare it with PDDL 2.1 planners that capture plans requiring complex concurrent interactions between agents. We demonstrate that our algorithm outperforms existing PDDL 2.1 planners in the case studies. Still, scalability remains challenging when plans must comply with intricate concurrent interactions and the sequencing of actions.
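
For readers unfamiliar with Allen's Interval Logic, its relations (seven shown here, plus inverses) reduce to arithmetic constraints over interval endpoints, which is what a CP model ultimately enforces. A minimal checker:

```python
# Intervals are (start, end) pairs with start < end.
def before(a, b):   return a[1] < b[0]
def meets(a, b):    return a[1] == b[0]
def overlaps(a, b): return a[0] < b[0] < a[1] < b[1]
def during(a, b):   return b[0] < a[0] and a[1] < b[1]
def starts(a, b):   return a[0] == b[0] and a[1] < b[1]
def finishes(a, b): return a[1] == b[1] and b[0] < a[0]
def equal(a, b):    return a[0] == b[0] and a[1] == b[1]

# e.g., an action interval A must hold during a fluent interval F:
A, F = (2.0, 4.0), (1.0, 5.0)
assert during(A, F)
```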

Updated: 2024-06-14 02:21:53

标题: 自主系统的时间规划:通过区间逻辑可满足性

摘要: 许多自动规划方法和形式化依赖于对与智能体相关的受约束动力学进行适当设计的抽象或简化,以获得计算上的可扩展性。我们考虑这样一种时间规划形式:区间同时与动作原子和流(fluent)原子相关联,二者之间的关系以Allen区间逻辑中的语句给出。我们提出了一种规划图的概念,能够以约束编程(CP)模型刻画动作与流之间复杂的并发关系。我们在一个最先进的CP框架上测试了我们算法的实现,并将其与能够表达需要智能体间复杂并发交互的计划的PDDL 2.1规划器进行了比较。案例研究表明,我们的算法优于现有的PDDL 2.1规划器。不过,当计划必须满足复杂的并发交互和动作排序时,可扩展性仍然具有挑战性。

更新时间: 2024-06-14 02:21:53

领域: cs.LO,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2406.09661v1

Learning Language Structures through Grounding

Language is highly structured, with syntactic and semantic structures, to some extent, agreed upon by speakers of the same language. With implicit or explicit awareness of such structures, humans can learn and use language efficiently and generalize to sentences that contain unseen words. Motivated by human language learning, in this dissertation, we consider a family of machine learning tasks that aim to learn language structures through grounding. We seek distant supervision from other data sources (i.e., grounds), including but not limited to other modalities (e.g., vision), execution results of programs, and other languages. We demonstrate the potential of this task formulation and advocate for its adoption through three schemes. In Part I, we consider learning syntactic parses through visual grounding. We propose the task of visually grounded grammar induction, present the first models to induce syntactic structures from visually grounded text and speech, and find that the visual grounding signals can help improve the parsing quality over language-only models. As a side contribution, we propose a novel evaluation metric that enables the evaluation of speech parsing without text or automatic speech recognition systems involved. In Part II, we propose two execution-aware methods to map sentences into corresponding semantic structures (i.e., programs), significantly improving compositional generalization and few-shot program synthesis. In Part III, we propose methods that learn language structures from annotations in other languages. Specifically, we propose a method that sets a new state of the art on cross-lingual word alignment. We then leverage the learned word alignments to improve the performance of zero-shot cross-lingual dependency parsing, by proposing a novel substructure-based projection method that preserves structural knowledge learned from the source language.

Updated: 2024-06-14 02:21:53

标题: 通过接地(grounding)学习语言结构

摘要: 语言是高度结构化的,具有句法和语义结构,并在一定程度上为同一语言的使用者所共识。凭借对这些结构的隐式或显式感知,人类能够高效地学习和使用语言,并泛化到包含未见词汇的句子。受人类语言学习的启发,本论文研究一类旨在通过接地(grounding)学习语言结构的机器学习任务。我们从其他数据源(即接地信号)寻求远程监督,包括但不限于其他模态(如视觉)、程序的执行结果以及其他语言。 我们通过三个方面展示了这一任务形式化的潜力,并倡导采用它。在第一部分中,我们考虑通过视觉接地学习句法分析。我们提出了视觉接地语法归纳任务,给出了首批能从视觉接地的文本和语音中归纳句法结构的模型,并发现视觉接地信号有助于相对于纯语言模型提升句法分析质量。作为附带贡献,我们提出了一种新颖的评估指标,使得无需文本或自动语音识别系统即可评估语音句法分析。在第二部分中,我们提出了两种执行感知的方法,将句子映射到相应的语义结构(即程序),显著改进了组合泛化和少样本程序合成。在第三部分中,我们提出了从其他语言的标注中学习语言结构的方法。具体而言,我们提出的方法在跨语言词对齐上达到了新的最高水平。随后,我们利用学到的词对齐,通过提出一种保留从源语言学到的结构知识的新颖的基于子结构的投影方法,提升了零样本跨语言依存句法分析的性能。

更新时间: 2024-06-14 02:21:53

领域: cs.CL,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.09662v1

Detecting Complex Multi-step Attacks with Explainable Graph Neural Network

Complex multi-step attacks have caused significant damage to numerous critical infrastructures. To detect such attacks, graph neural network based methods have shown promising results by modeling the system's events as a graph. However, existing methods still face several challenges when deployed in practice. First, there is a lack of sufficient real attack data especially considering the large volume of normal data. Second, the modeling of event graphs is challenging due to their dynamic and heterogeneous nature. Third, the lack of explanation in learning models undermines the trustworthiness of such methods in production environments. To address the above challenges, in this paper, we propose an attack detection method, Trace2Vec. The approach first designs an erosion function to augment rare attack samples, and integrates them into the event graphs. Next, it models the event graphs via a continuous-time dynamic heterogeneous graph neural network. Finally, it employs the Monte Carlo tree search algorithm to identify events with greater contributions to the attack, thus enhancing the explainability of the detection result. We have implemented a prototype for Trace2Vec, and the experimental evaluations demonstrate its superior detection and explanation performance compared to existing methods.

Updated: 2024-06-14 02:19:45

标题: 用可解释的图神经网络检测复杂的多步攻击

摘要: 复杂的多步攻击已给许多关键基础设施造成了重大损害。为了检测此类攻击,通过将系统事件建模为图,基于图神经网络的方法已展现出良好的效果。然而,现有方法在实践中仍面临一些挑战。首先,缺乏足够的真实攻击数据,特别是考虑到正常数据的数量庞大。其次,由于事件图的动态和异质性,事件图的建模具有挑战性。第三,学习模型缺乏解释性,削弱了这类方法在生产环境中的可信度。为了解决上述挑战,在本文中,我们提出了一种攻击检测方法Trace2Vec。该方法首先设计了一个侵蚀函数来增广罕见的攻击样本,并将它们整合到事件图中。接下来,它通过连续时间动态异质图神经网络对事件图进行建模。最后,它采用蒙特卡洛树搜索算法来识别对攻击贡献更大的事件,从而增强了检测结果的可解释性。我们已经为Trace2Vec实现了一个原型,实验评估表明,相比现有方法,它具有更优越的检测和解释性能。

更新时间: 2024-06-14 02:19:45

领域: cs.CR

下载: http://arxiv.org/abs/2405.11335v2

A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

With the rapid development of AI-generated content (AIGC) technology, the production of realistic fake facial images and videos that deceive human visual perception has become possible. Consequently, various face forgery detection techniques have been proposed to identify such fake facial content. However, evaluating the effectiveness and generalizability of these detection techniques remains a significant challenge. To address this, we have constructed a large-scale evaluation benchmark called DeepFaceGen, aimed at quantitatively assessing the effectiveness of face forgery detection and facilitating the iterative development of forgery detection technology. DeepFaceGen consists of 776,990 real face image/video samples and 773,812 face forgery image/video samples, generated using 34 mainstream face generation techniques. During the construction process, we carefully consider important factors such as content diversity, fairness across ethnicities, and availability of comprehensive labels, in order to ensure the versatility and convenience of DeepFaceGen. Subsequently, DeepFaceGen is employed in this study to evaluate and analyze the performance of 13 mainstream face forgery detection techniques from various perspectives. Through extensive experimental analysis, we derive significant findings and propose potential directions for future research. The code and dataset for DeepFaceGen are available at https://github.com/HengruiLou/DeepFaceGen.

Updated: 2024-06-14 02:17:04

标题: 一个用于人脸伪造检测的大规模通用评估基准

摘要: 随着人工智能生成内容(AIGC)技术的快速发展,制作逼真的假面部图像和视频以欺骗人类视觉感知变得可能。因此,各种面部伪造检测技术被提出来识别这种假面部内容。然而,评估这些检测技术的有效性和普适性仍然是一个重大挑战。为了解决这个问题,我们构建了一个名为DeepFaceGen的大规模评估基准,旨在定量评估面部伪造检测的有效性,并促进伪造检测技术的迭代发展。DeepFaceGen包括776,990个真实面部图像/视频样本和773,812个面部伪造图像/视频样本,使用34种主流面部生成技术生成。在构建过程中,我们仔细考虑了重要因素,如内容多样性,跨种族的公平性和全面标签的可用性,以确保DeepFaceGen的多功能性和方便性。随后,在这项研究中,我们使用DeepFaceGen来评估和分析来自不同角度的13种主流面部伪造检测技术的性能。通过广泛的实验分析,我们得出了重要的发现,并提出了未来研究的潜在方向。DeepFaceGen的代码和数据集可在https://github.com/HengruiLou/DeepFaceGen 上找到。

更新时间: 2024-06-14 02:17:04

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.09181v2

Cyberattack Data Analysis in IoT Environments using Big Data

In the landscape of the Internet of Things (IoT), which is transforming various industries, our research addresses the growing connectivity and security challenges, including interoperability and standardized protocols. Despite the anticipated exponential growth in IoT connections, network security remains a major concern due to inadequate datasets that fail to fully encompass potential cyberattacks in realistic IoT environments. Using Apache Hadoop and Hive, our in-depth analysis of security vulnerabilities identified intricate patterns and threats, such as attack behavior, network traffic anomalies, TCP flag usage, and targeted attacks, underscoring the critical need for robust data platforms to enhance IoT security.
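
As an illustration of the kind of Hive analysis described, the query below counts TCP-flag combinations among attack-labeled flows. The table name, columns, and connection details are hypothetical; the paper does not publish its schema.

```python
from pyhive import hive  # connection details below are hypothetical

SQL = """
SELECT tcp_flags, COUNT(*) AS n
FROM iot_flows              -- hypothetical table of parsed flow records
WHERE label = 'attack'
GROUP BY tcp_flags
ORDER BY n DESC
LIMIT 10
"""

conn = hive.Connection(host="localhost", port=10000, database="default")
cur = conn.cursor()
cur.execute(SQL)
for flags, n in cur.fetchall():
    print(flags, n)
```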

Updated: 2024-06-14 02:12:43

标题: IoT环境中使用大数据进行网络攻击数据分析

摘要: 在正在改变各行各业的物联网(IoT)格局中,我们的研究着眼于日益增长的连接性和安全挑战,包括互操作性和标准化协议。尽管预计物联网连接将呈指数级增长,但由于现有数据集不足、未能充分涵盖现实物联网环境中潜在的网络攻击,网络安全仍然是一个主要关切。利用Apache Hadoop和Hive,我们对安全漏洞进行了深入分析,识别出复杂的模式和威胁,如攻击行为、网络流量异常、TCP标志使用和有针对性的攻击,凸显了构建强大数据平台以增强物联网安全的迫切需求。

更新时间: 2024-06-14 02:12:43

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2406.10302v1

ScaLES: Scalable Latent Exploration Score for Pre-Trained Generative Networks

We develop Scalable Latent Exploration Score (ScaLES) to mitigate over-exploration in Latent Space Optimization (LSO), a popular method for solving black-box discrete optimization problems. LSO utilizes continuous optimization within the latent space of a Variational Autoencoder (VAE) and is known to be susceptible to over-exploration, which manifests in unrealistic solutions that reduce its practicality. ScaLES is an exact and theoretically motivated method leveraging the trained decoder's approximation of the data distribution. ScaLES can be calculated with any existing decoder, e.g. from a VAE, without additional training, architectural changes, or access to the training data. Our evaluation across five LSO benchmark tasks and three VAE architectures demonstrates that ScaLES enhances the quality of the solutions while maintaining high objective values, leading to improvements over existing solutions. We believe that ScaLES's ability to identify out-of-distribution areas, its differentiability, and its computational tractability will open new avenues for LSO. Open source code for ScaLES is available at https://github.com/OmerRonen/scales.

Updated: 2024-06-14 02:04:59

标题: ScaLES:用于预训练生成网络的可扩展潜在探索评分

摘要: 我们开发了可扩展的潜在探索得分(ScaLES)来缓解潜在空间优化(LSO)中的过度探索,LSO是一种解决黑盒离散优化问题的流行方法。LSO利用变分自动编码器(VAE)的潜在空间内的连续优化,并且已知容易受到过度探索的影响,这会导致出现不切实际的解决方案,降低其实用性。ScaLES是一种准确且理论上有动机的方法,利用训练好的解码器对数据分布的近似。ScaLES可以使用任何现有的解码器进行计算,例如来自VAE,无需额外的训练、架构更改或访问训练数据。我们在五个LSO基准任务和三个VAE架构上的评估表明,ScaLES提高了解决方案的质量,同时保持高目标值,从而改善了现有解决方案。我们相信,ScaLES识别分布外区域的能力、其可微分性以及计算上的可行性,将为LSO开辟新的途径。ScaLES的开源代码可在https://github.com/OmerRonen/scales找到。

更新时间: 2024-06-14 02:04:59

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.09657v1

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models typically require training and inference to be distributed. Tensor parallelism is a common technique partitioning computation of an operation or layer across devices to overcome the memory capacity limitation of a single processor, and/or to accelerate computation to meet a certain latency requirement. However, this kind of parallelism introduces additional communication that might contribute a significant portion of overall runtime. This limits the scalability of the technique within a group of devices with high-speed interconnects, such as GPUs connected via NVLink within a node. This paper proposes a novel method, Flux, to significantly hide communication latencies with dependent computations for GPUs. Flux over-decomposes communication and computation operations into much finer-grained operations and further fuses them into a larger kernel to effectively hide communication without compromising kernel efficiency. Flux can potentially overlap up to 96% of communication given a fused kernel. Overall, it can achieve up to 1.24x speedups for training over Megatron-LM on a cluster of 128 GPUs with various GPU generations and interconnects, and up to 1.66x and 1.30x speedups for prefill and decoding inference over vLLM on a cluster with 8 GPUs with various GPU generations and interconnects.
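
The coarse-grained version of the overlap idea can be expressed with asynchronous collectives in torch.distributed, as sketched below: the gather for the next chunk is in flight while the current chunk is multiplied. Flux goes much further by over-decomposing and fusing these pieces into a single kernel; this sketch assumes an initialized process group and that every rank supplies the same number of chunks.

```python
import torch
import torch.distributed as dist

def overlapped_matmul(chunks, weight, world_size):
    # All-gather chunk i+1 asynchronously while multiplying gathered chunk i.
    gathered = [torch.empty_like(chunks[0]) for _ in range(world_size)]
    handle = dist.all_gather(gathered, chunks[0], async_op=True)
    out = []
    for i in range(len(chunks)):
        handle.wait()
        current = torch.cat(gathered, dim=0)
        if i + 1 < len(chunks):
            gathered = [torch.empty_like(chunks[i + 1]) for _ in range(world_size)]
            handle = dist.all_gather(gathered, chunks[i + 1], async_op=True)
        out.append(current @ weight)  # compute overlaps the in-flight gather
    return torch.cat(out, dim=0)
```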

Updated: 2024-06-14 01:46:04

标题: FLUX:通过内核融合在GPU上实现快速软件通信重叠

摘要: 大型深度学习模型已经展示出在广泛应用领域解决许多任务的强大能力。这些大型模型通常需要分布式训练和推断。张量并行是一种常见的技术,将一个操作或层的计算分区到多个设备上,以克服单个处理器的内存容量限制,加速计算以满足特定的延迟要求。然而,这种并行性引入了额外的通信,可能会占整体运行时间的相当大比例。因此,这种技术在具有高速互连的设备组内的可扩展性受到限制,例如在一个节点中通过NVLink互连的GPU。本文提出了一种新颖的方法Flux,利用依赖的计算显著隐藏GPU上的通信延迟。Flux将通信和计算操作过度细分,并进一步将它们融合到一个更大的内核中,以有效地隐藏通信而不影响内核效率。在给定一个融合内核的情况下,Flux可以潜在地重叠高达96%的通信。总体而言,在具有不同GPU世代和互连的128个GPU集群上,它可以实现高达1.24倍的训练速度提升,以及在具有不同GPU世代和互连的8个GPU集群上,预填和解码推断速度提升高达1.66倍和1.30倍。

更新时间: 2024-06-14 01:46:04

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2406.06858v3

TimeCMA: Towards LLM-Empowered Time Series Forecasting via Cross-Modality Alignment

The widespread adoption of scalable mobile sensing has led to large amounts of time series data for real-world applications. A fundamental application is multivariate time series forecasting (MTSF), which aims to predict future time series values based on historical observations. Existing MTSF methods suffer from limited parameterization and small-scale training data. Recently, Large language models (LLMs) have been introduced in time series, which achieve promising forecasting performance but incur heavy computational costs. To solve these challenges, we propose TimeCMA, an LLM-empowered framework for time series forecasting with cross-modality alignment. We design a dual-modality encoding module with two branches, where the time series encoding branch extracts relatively low-quality yet pure embeddings of time series through an inverted Transformer. In addition, the LLM-empowered encoding branch wraps the same time series as prompts to obtain high-quality yet entangled prompt embeddings via a Pre-trained LLM. Then, we design a cross-modality alignment module to retrieve high-quality and pure time series embeddings from the prompt embeddings. Moreover, we develop a time series forecasting module to decode the aligned embeddings while capturing dependencies among multiple variables for forecasting. Notably, we tailor the prompt to encode sufficient temporal information into a last token and design the last token embedding storage to reduce computational costs. Extensive experiments on real data offer insight into the accuracy and efficiency of the proposed framework.

Updated: 2024-06-14 01:39:29

标题: TimeCMA: 通过跨模态对齐实现LLM增强的时间序列预测

摘要: 可伸缩移动传感技术的广泛应用导致了大量用于实际应用的时间序列数据。一个基本的应用是多变量时间序列预测(MTSF),旨在基于历史观测来预测未来的时间序列数值。现有的MTSF方法存在参数化有限和小规模训练数据的问题。最近,大型语言模型(LLMs)已被引入时间序列中,取得了有希望的预测性能,但也带来了沉重的计算成本。为了解决这些挑战,我们提出了TimeCMA,一个基于LLM的具有跨模态对齐的时间序列预测框架。我们设计了一个双模态编码模块,其中时间序列编码分支通过反向Transformer提取相对低质量但纯净的时间序列嵌入。此外,LLM增强的编码分支将相同的时间序列封装为提示,通过预训练的LLM获取高质量但相互纠缠的提示嵌入。然后,我们设计了一个跨模态对齐模块,以从提示嵌入中检索高质量和纯净的时间序列嵌入。此外,我们开发了一个时间序列预测模块,用于解码对齐的嵌入,同时捕获多个变量之间的依赖关系进行预测。值得注意的是,我们定制了提示以将足够的时间信息编码为最后一个令牌,并设计了最后一个令牌嵌入存储以减少计算成本。对真实数据的广泛实验为所提出的框架的准确性和效率提供了见解。

更新时间: 2024-06-14 01:39:29

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.01638v3

An Interpretable Evaluation of Entropy-based Novelty of Generative Models

The massive developments of generative model frameworks require principled methods for the evaluation of a model's novelty compared to a reference dataset. While the literature has extensively studied the evaluation of the quality, diversity, and generalizability of generative models, the assessment of a model's novelty compared to a reference model has not been adequately explored in the machine learning community. In this work, we focus on the novelty assessment for multi-modal distributions and attempt to address the following differential clustering task: Given samples of a generative model $P_\mathcal{G}$ and a reference model $P_\mathrm{ref}$, how can we discover the sample types expressed by $P_\mathcal{G}$ more frequently than in $P_\mathrm{ref}$? We introduce a spectral approach to the differential clustering task and propose the Kernel-based Entropic Novelty (KEN) score to quantify the mode-based novelty of $P_\mathcal{G}$ with respect to $P_\mathrm{ref}$. We analyze the KEN score for mixture distributions with well-separable components and develop a kernel-based method to compute the KEN score from empirical data. We support the KEN framework by presenting numerical results on synthetic and real image datasets, indicating the framework's effectiveness in detecting novel modes and comparing generative models. The paper's code is available at: www.github.com/buyeah1109/KEN

Updated: 2024-06-14 01:37:57

标题: 一种可解释的评估生成模型基于熵的新颖性

摘要: 生成模型框架的大规模发展需要有原则的方法来评估模型相对于参考数据集的新颖性。虽然文献已经广泛研究了生成模型的质量、多样性和泛化能力的评估,但模型相对于参考模型的新颖性评估在机器学习社区中尚未得到充分探讨。在这项工作中,我们关注多模态分布的新颖性评估,并尝试解决以下差异聚类任务:给定生成模型$P_\mathcal{G}$和参考模型$P_\mathrm{ref}$的样本,我们如何发现那些在$P_\mathcal{G}$中比在$P_\mathrm{ref}$中更频繁出现的样本类型?我们引入了一种谱方法来解决差异聚类任务,并提出了基于核的熵新颖性(KEN)分数来量化$P_\mathcal{G}$相对于$P_\mathrm{ref}$的基于模式的新颖性。我们分析了具有良好可分组件的混合分布的KEN分数,并开发了一种基于核的方法从经验数据计算KEN分数。我们通过在合成和真实图像数据集上给出数值结果来支持KEN框架,表明该框架在检测新颖模式和比较生成模型方面的有效性。该论文的代码可在以下网址获得:www.github.com/buyeah1109/KEN

更新时间: 2024-06-14 01:37:57

领域: cs.LG,cs.CV,stat.ML

下载: http://arxiv.org/abs/2402.17287v2

RSEND: Retinex-based Squeeze and Excitation Network with Dark Region Detection for Efficient Low Light Image Enhancement

Images captured under low-light scenarios often suffer from low quality. Previous CNN-based deep learning methods often involve using Retinex theory. Nevertheless, most of them cannot perform well in more complicated datasets like LOL-v2 while consuming too much computational resources. Besides, some of these methods require sophisticated training at different stages, making the procedure even more time-consuming and tedious. In this paper, we propose a more accurate, concise, and one-stage Retinex theory based framework, RSEND. RSEND first divides the low-light image into the illumination map and reflectance map, then captures the important details in the illumination map and performs light enhancement. After this step, it refines the enhanced gray-scale image and does element-wise matrix multiplication with the reflectance map. By denoising the output it has from the previous step, it obtains the final result. In all the steps, RSEND utilizes Squeeze and Excitation network to better capture the details. Comprehensive quantitative and qualitative experiments show that our Efficient Retinex model significantly outperforms other CNN-based models, achieving a PSNR improvement ranging from 0.44 dB to 4.2 dB in different datasets and even outperforms transformer-based models in the LOL-v2-real dataset.
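
The Squeeze-and-Excitation building block the framework relies on is standard and compact; a generic PyTorch version is below (the authors' exact placement and hyperparameters may differ).

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # excite: channel reweighting
```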

Updated: 2024-06-14 01:36:52

标题: RSEND:基于Retinex的压缩与激励网络,结合暗区域检测实现高效低光图像增强

摘要: 在低光条件下捕获的图像往往质量较低。先前基于CNN的深度学习方法通常涉及使用Retinex理论。然而,大多数方法在更复杂的数据集(如LOL-v2)中表现不佳,同时消耗大量计算资源。此外,其中一些方法需要在不同阶段进行复杂的训练,使得整个过程更加耗时和繁琐。在本文中,我们提出了一个更精确、简洁且一阶段的基于Retinex理论的框架RSEND。RSEND首先将低光图像分为照明图和反射图,然后捕获照明图中的重要细节并进行光线增强。在此步骤之后,它精炼增强的灰度图像,并与反射图进行逐元素矩阵乘法。通过去噪前一步骤的输出,它获得最终结果。在所有步骤中,RSEND利用Squeeze和Excitation网络更好地捕捉细节。全面的定量和定性实验表明,我们的高效Retinex模型明显优于其他基于CNN的模型,在不同数据集中PSNR提高范围从0.44 dB到4.2 dB,并且在LOL-v2-real数据集中甚至优于基于Transformer的模型。

更新时间: 2024-06-14 01:36:52

领域: cs.CV,cs.AI,cs.LG,eess.IV

下载: http://arxiv.org/abs/2406.09656v1

Open-Vocabulary Calibration for Fine-tuned CLIP

Vision-language models (VLMs) have emerged as formidable tools, showing their strong capability in handling various open-vocabulary tasks in image recognition, text-driven visual content generation, and visual chatbots, to name a few. In recent years, considerable efforts and resources have been devoted to adaptation methods for improving downstream performance of VLMs, particularly on parameter-efficient fine-tuning methods like prompt learning. However, a crucial aspect that has been largely overlooked is the confidence calibration problem in fine-tuned VLMs, which could greatly reduce reliability when deploying such models in the real world. This paper bridges the gap by systematically investigating the confidence calibration problem in the context of prompt learning and reveals that existing calibration methods are insufficient to address the problem, especially in the open-vocabulary setting. To solve the problem, we present a simple and effective approach called Distance-Aware Calibration (DAC), which is based on scaling the temperature using as guidance the distance between predicted text labels and base classes. The experiments with 7 distinct prompt learning methods applied across 11 diverse downstream datasets demonstrate the effectiveness of DAC, which achieves high efficacy without sacrificing the inference speed. Our code is available at https://github.com/ml-stat-Sustech/CLIP_Calibration.
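
A hedged sketch of the distance-aware idea follows: soften the softmax temperature for predictions whose label embeddings sit far from the base classes. The exact scaling rule here is a simplification, not the paper's formula.

```python
import torch
import torch.nn.functional as F

def dac_scaled_logits(logits, label_embs, base_embs, base_temp=1.0):
    # logits: (B, C); label_embs: (C, D) text embeddings of candidate labels;
    # base_embs: (K, D) embeddings of the base (training) classes.
    label_embs = F.normalize(label_embs, dim=-1)
    base_embs = F.normalize(base_embs, dim=-1)
    dist = 1 - (label_embs @ base_embs.T).max(dim=-1).values  # (C,) to nearest base class
    pred = logits.argmax(dim=-1)                              # (B,)
    temp = base_temp * (1 + dist[pred]).unsqueeze(-1)         # farther -> softer
    return logits / temp
```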

Updated: 2024-06-14 01:26:05

标题: 开放词汇校准以进行精细调整的CLIP

摘要: 视觉语言模型(VLMs)已经成为强大的工具,在处理图像识别、文本驱动的视觉内容生成和视觉聊天机器人等各种开放词汇任务中展示了强大的能力。近年来,人们投入了大量的努力和资源来研究改进VLMs下游性能的适配方法,特别是参数高效的微调方法(如提示学习)。然而,一个在很大程度上被忽视的关键方面是微调后VLMs中的置信度校准问题,这可能会在现实世界中部署这些模型时大大降低可靠性。本文通过系统地研究提示学习背景下的置信度校准问题弥合了这一差距,并揭示现有的校准方法不足以解决该问题,特别是在开放词汇环境中。为了解决这个问题,我们提出了一种简单有效的方法,称为距离感知校准(DAC),它以预测文本标签与基类之间的距离为指导来缩放温度。在11个不同的下游数据集上应用7种不同的提示学习方法的实验表明了DAC的有效性,它在不牺牲推理速度的情况下实现了高效能。我们的代码可以在https://github.com/ml-stat-Sustech/CLIP_Calibration找到。

更新时间: 2024-06-14 01:26:05

领域: cs.LG

下载: http://arxiv.org/abs/2402.04655v4

Coralai: Intrinsic Evolution of Embodied Neural Cellular Automata Ecosystems

This paper presents Coralai, a framework for exploring diverse ecosystems of Neural Cellular Automata (NCA). Organisms in Coralai utilize modular, GPU-accelerated Taichi kernels to interact, enact environmental changes, and evolve through local survival, merging, and mutation operations implemented with HyperNEAT and PyTorch. We provide an exploratory experiment implementing physics inspired by slime mold behavior showcasing the emergence of competition between sessile and mobile organisms, cycles of resource depletion and recovery, and symbiosis between diverse organisms. We conclude by outlining future work to discover simulation parameters through measures of multi-scale complexity and diversity. Code for Coralai is available at https://github.com/aidanbx/coralai , video demos are available at https://www.youtube.com/watch?v=NL8IZQY02-8 .

Updated: 2024-06-14 01:24:01

标题: Coralai:具身神经细胞自动机生态系统的内在演化

摘要: 这篇论文介绍了Coralai,一个用于探索多样化神经细胞自动机生态系统的框架。Coralai中的生物利用模块化、GPU加速的Taichi内核进行互动,实施环境变化,并通过HyperNEAT和PyTorch实现的局部存活、合并和突变操作进行进化。我们提供了一个实验,实现了受粘菌行为启发的物理学,展示了固着和移动生物之间的竞争、资源枯竭和恢复周期以及多样化生物之间的共生关系的出现。最后,我们总结了通过多尺度复杂性和多样性度量来发现模拟参数的未来工作。Coralai的代码可在https://github.com/aidanbx/coralai找到,视频演示可在https://www.youtube.com/watch?v=NL8IZQY02-8找到。

更新时间: 2024-06-14 01:24:01

领域: cs.NE,cs.LG,cs.MA

下载: http://arxiv.org/abs/2406.09654v1

Deep learning for precipitation nowcasting: A survey from the perspective of time series forecasting

Deep learning-based time series forecasting has dominated the short-term precipitation forecasting field with the help of its ability to estimate motion flow in high-resolution datasets. The growing interest in precipitation nowcasting offers substantial opportunities for the advancement of current forecasting technologies. Nevertheless, there has been a scarcity of in-depth surveys of time series precipitation forecasting using deep learning. Thus, this paper systemically reviews recent progress in time series precipitation forecasting models. Specifically, we investigate the following key points within background components, covering: i) preprocessing, ii) objective functions, and iii) evaluation metrics. We then categorize forecasting models into \textit{recursive} and \textit{multiple} strategies based on their approaches to predict future frames, investigate the impacts of models using the strategies, and performance assessments. Finally, we evaluate current deep learning-based models for precipitation forecasting on a public benchmark, discuss their limitations and challenges, and present some promising research directions. Our contribution lies in providing insights for a better understanding of time series precipitation forecasting and in aiding the development of robust AI solutions for the future.
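
The recursive-versus-multiple distinction the survey uses to categorize models can be captured in a few lines; the interfaces below are placeholders.

```python
import torch

@torch.no_grad()
def recursive_forecast(model, frames, horizon):
    # `recursive`: feed each one-step prediction back as the newest input frame.
    window, preds = frames, []                        # frames: (B, T, C, H, W)
    for _ in range(horizon):
        nxt = model(window)                           # (B, 1, C, H, W)
        preds.append(nxt)
        window = torch.cat([window[:, 1:], nxt], 1)   # slide the input window
    return torch.cat(preds, 1)

@torch.no_grad()
def multiple_forecast(model_multi, frames):
    # `multiple`: one forward pass emits all horizon frames at once.
    return model_multi(frames)                        # (B, horizon, C, H, W)
```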

Updated: 2024-06-14 01:11:09

标题: 深度学习用于降水预报:从时间序列预测的角度进行调查

摘要: 基于深度学习的时间序列预测凭借其在高分辨率数据集中估计运动流的能力,已在短期降水预测领域占据主导地位。对短时降水预测日益增长的兴趣为当前预测技术的进步提供了重要机遇。然而,对于使用深度学习进行时间序列降水预测的深入调查仍然很少。因此,本文系统地审查了时间序列降水预测模型的最新进展。具体而言,我们调查了背景组件内的以下关键点,包括:i) 预处理,ii) 目标函数和iii) 评估指标。然后,我们根据它们预测未来帧的方法,将预测模型分为\textit{递归}和\textit{多重}策略,调查使用这些策略的模型的影响以及性能评估。最后,我们在公共基准上评估了用于降水预测的当前深度学习模型,讨论它们的局限性和挑战,并提出一些有前途的研究方向。我们的贡献在于提供对时间序列降水预测的更好理解,并助力开发面向未来的健壮人工智能解决方案。

更新时间: 2024-06-14 01:11:09

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.04867v2

A Neural-preconditioned Poisson Solver for Mixed Dirichlet and Neumann Boundary Conditions

We introduce a neural-preconditioned iterative solver for Poisson equations with mixed boundary conditions. Typical Poisson discretizations yield large, ill-conditioned linear systems. Iterative solvers can be effective for these problems, but only when equipped with powerful preconditioners. Unfortunately, effective preconditioners like multigrid require costly setup phases that must be re-executed every time domain shapes or boundary conditions change, forming a severe bottleneck for problems with evolving boundaries. In contrast, we present a neural preconditioner trained to efficiently approximate the inverse of the discrete Laplacian in the presence of such changes. Our approach generalizes to domain shapes, boundary conditions, and grid sizes outside the training set. The key to our preconditioner's success is a novel, lightweight neural network architecture featuring spatially varying convolution kernels and supporting fast inference. We demonstrate that our solver outperforms state-of-the-art methods like algebraic multigrid as well as recently proposed neural preconditioners on challenging test cases arising from incompressible fluid simulations.
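
For orientation, the solver being preconditioned is ordinary preconditioned conjugate gradients, where the trained network simply plays the role of the M_inv callable below (the identity placeholder stands in for it; this is textbook PCG, not the authors' architecture).

```python
import numpy as np

def pcg(A, b, M_inv=lambda r: r, tol=1e-8, max_iter=500):
    # Preconditioned conjugate gradients for SPD A; M_inv approximates A^{-1}.
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)                  # a trained network would be called here
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```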

Updated: 2024-06-14 00:57:50

标题: 一个神经预处理泊松解算器用于混合Dirichlet和Neumann边界条件

摘要: 我们介绍了一种针对带有混合边界条件的泊松方程的神经预处理迭代求解器。典型的泊松离散化产生大型、病态的线性系统。迭代求解器可以有效地解决这些问题,但只有在配备强大的预处理器时才能实现。不幸的是,像多重网格这样的有效预处理器需要昂贵的设置阶段,必须在每次域形状或边界条件发生变化时重新执行,这对于具有演变边界的问题形成严重瓶颈。相比之下,我们提出了一个神经预处理器,训练以有效地近似在存在这些变化的情况下离散Laplace算子的逆。我们的方法可以推广到训练集之外的域形状、边界条件和网格尺寸。我们的预处理器成功的关键是一种新颖的轻量级神经网络架构,具有空间变化的卷积核,并支持快速推理。我们展示了我们的求解器在源自不可压缩流体模拟的挑战性测试案例上优于代数多重网格以及最近提出的神经预处理器等最先进方法。

更新时间: 2024-06-14 00:57:50

领域: math.NA,cs.LG,cs.NA

下载: http://arxiv.org/abs/2310.00177v5

"Did They F***ing Consent to That?": Safer Digital Intimacy via Proactive Protection Against Image-Based Sexual Abuse

As many as 8 in 10 adults share intimate content such as nude or lewd images. Sharing such content has significant benefits for relationship intimacy and body image, and can offer employment. However, stigmatizing attitudes and a lack of technological mitigations put those sharing such content at risk of sexual violence. An estimated 1 in 3 people have been subjected to image-based sexual abuse (IBSA), a spectrum of violence that includes the nonconsensual distribution or threat of distribution of consensually-created intimate content (also called NDII). In this work, we conducted a rigorous empirical interview study of 52 European creators of intimate content to examine the threats they face and how they defend against them, situated in the context of their different use cases for intimate content sharing and their choice of technologies for storing and sharing such content. Synthesizing our results with the limited body of prior work on technological prevention of NDII, we offer concrete next steps for both platforms and security & privacy researchers to work toward safer intimate content sharing through proactive protection. Content Warning: This work discusses sexual violence, specifically, the harms of image-based sexual abuse (particularly in Sections 2 and 6).

Updated: 2024-06-14 00:56:24

标题: 他们是否同意那样做?:通过积极保护来实现更安全的数字亲密关系,防范基于图像的性虐待

摘要: 多达80%的成年人会分享裸露或不雅图像等私密内容。分享这类内容对关系亲密度和身体形象有显著益处,还可能带来就业机会。然而,污名化的态度和技术缓解措施的缺失,使分享此类内容的人面临性暴力的风险。据估计,每3人中就有1人曾遭受基于图像的性虐待(IBSA),这是一系列暴力行为,包括未经同意地分发或威胁分发在双方同意下创作的私密内容(也称NDII)。在这项工作中,我们对52名欧洲私密内容创作者进行了严谨的实证访谈研究,结合他们分享私密内容的不同用例以及存储和分享此类内容所选用的技术,考察了他们面临的威胁以及他们的防御方式。将我们的结果与关于NDII技术性预防的有限已有工作相结合,我们为平台以及安全与隐私研究人员提供了具体的下一步行动,以通过主动保护实现更安全的私密内容分享。 内容警告:本作品讨论了性暴力,特别是基于图像的性虐待的危害(尤其是在第2节和第6节中)。

更新时间: 2024-06-14 00:56:24

领域: cs.CR,cs.HC

下载: http://arxiv.org/abs/2403.04659v2

An Intrinsic Vector Heat Network

Vector fields are widely used to represent and model flows for many science and engineering applications. This paper introduces a novel neural network architecture for learning tangent vector fields that are intrinsically defined on manifold surfaces embedded in 3D. Previous approaches to learning vector fields on surfaces treat vectors as multi-dimensional scalar fields, using traditional scalar-valued architectures to process channels individually, thus fail to preserve fundamental intrinsic properties of the vector field. The core idea of this work is to introduce a trainable vector heat diffusion module to spatially propagate vector-valued feature data across the surface, which we incorporate into our proposed architecture that consists of vector-valued neurons. Our architecture is invariant to rigid motion of the input, isometric deformation, and choice of local tangent bases, and is robust to discretizations of the surface. We evaluate our Vector Heat Network on triangle meshes, and empirically validate its invariant properties. We also demonstrate the effectiveness of our method on the useful industrial application of quadrilateral mesh generation.

Updated: 2024-06-14 00:40:31

标题: 一个内在的向量热网络

摘要: 矢量场被广泛用于表示和模拟许多科学和工程应用中的流动。本文介绍了一种新颖的神经网络架构,用于学习在三维嵌入曲面上固有定义的切向矢量场。先前学习曲面上的矢量场的方法将矢量视为多维标量场,使用传统的标量值架构逐个处理通道,因此未能保留矢量场的基本固有属性。本文的核心思想是引入可训练的矢量热扩散模块,将矢量值特征数据在曲面上空间传播,我们将其纳入我们提出的由矢量值神经元组成的架构中。我们的架构对输入的刚性运动、等距变形和局部切向基的选择具有不变性,并且对曲面的离散化具有鲁棒性。我们在三角网格上评估了我们的矢量热网络,并通过实证验证其不变性属性。我们还展示了我们的方法在四边形网格生成这一有用的工业应用上的有效性。

更新时间: 2024-06-14 00:40:31

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.09648v1

A Survey of Video Datasets for Grounded Event Understanding

While existing video benchmarks largely consider specialized downstream tasks like retrieval or question-answering (QA), contemporary multimodal AI systems must be capable of well-rounded common-sense reasoning akin to human visual understanding. A critical component of human temporal-visual perception is our ability to identify and cognitively model "things happening", or events. Historically, video benchmark tasks have implicitly tested for this ability (e.g., video captioning, in which models describe visual events with natural language), but they do not consider video event understanding as a task in itself. Recent work has begun to explore video analogues to textual event extraction but consists of competing task definitions and datasets limited to highly specific event types. Therefore, while there is a rich domain of event-centric video research spanning the past 10+ years, it is unclear how video event understanding should be framed and what resources we have to study it. In this paper, we survey 105 video datasets that require event understanding capability, consider how they contribute to the study of robust event understanding in video, and assess proposed video event extraction tasks in the context of this body of research. We propose suggestions informed by this survey for dataset curation and task framing, with an emphasis on the uniquely temporal nature of video events and ambiguity in visual content.

Updated: 2024-06-14 00:36:55

标题: 一个关于基于视频数据集的事件理解的调查

摘要: 尽管现有的视频基准主要考虑专门的下游任务,如检索或问答(QA),但当代多模态人工智能系统必须能够进行类似于人类视觉理解的全面常识推理。人类时间-视觉感知的一个关键组成部分是我们识别和认知建模“事情发生”的能力,或事件。在历史上,视频基准任务隐含地测试了这种能力(例如视频字幕,模型用自然语言描述视觉事件),但它们并未将视频事件理解视为一个任务。最近的研究已经开始探索视频类似于文本事件提取,但这些研究包括竞争性任务定义和仅限于高度特定事件类型的数据集。因此,尽管过去10多年存在着丰富的以事件为中心的视频研究领域,但尚不清楚视频事件理解应该如何构建以及我们有哪些资源来研究它。在本文中,我们调查了105个需要事件理解能力的视频数据集,考虑它们如何有助于研究视频中的鲁棒事件理解,并评估了在这一研究领域的背景下提出的视频事件提取任务。我们根据这项调查提出了数据集策划和任务构建的建议,强调视频事件的独特时间性质和视觉内容的模糊性。

更新时间: 2024-06-14 00:36:55

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.09646v1

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. VoiceCraft employs a Transformer decoder architecture and introduces a token rearrangement procedure that combines causal masking and delayed stacking to enable generation within an existing sequence. On speech editing tasks, VoiceCraft produces edited speech that is nearly indistinguishable from unedited recordings in terms of naturalness, as evaluated by humans; for zero-shot TTS, our model outperforms prior SotA models including VALLE and the popular commercial model XTTS-v2. Crucially, the models are evaluated on challenging and realistic datasets, that consist of diverse accents, speaking styles, recording conditions, and background noise and music, and our model performs consistently well compared to other models and real recordings. In particular, for speech editing evaluation, we introduce a high quality, challenging, and realistic dataset named RealEdit. We encourage readers to listen to the demos at https://jasonppy.github.io/VoiceCraft_web.
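
The delayed-stacking part of the token rearrangement can be illustrated on its own: each of the K codebook streams is shifted right by its index so coarser tokens precede the finer tokens that condition on them. The padding symbol and exact schedule are assumptions here, not the paper's precise layout.

```python
import torch

def delayed_stack(codes: torch.Tensor, pad: int = 0) -> torch.Tensor:
    # codes: (K, T) neural-codec tokens -> (K, T + K - 1) delayed layout,
    # where codebook k is shifted right by k positions.
    K, T = codes.shape
    out = torch.full((K, T + K - 1), pad, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

print(delayed_stack(torch.arange(12).reshape(4, 3)))
```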

Updated: 2024-06-14 00:29:46

标题: VoiceCraft:真实场景下的零样本语音编辑与文本转语音

摘要: 我们介绍了VoiceCraft,这是一个使用令牌填充的神经编解码器语言模型,可在有声书籍、互联网视频和播客的语音编辑和零样本文本到语音(TTS)任务上实现最先进的性能。VoiceCraft采用Transformer解码器架构,并引入了一种令牌重新排列过程,结合因果掩蔽和延迟堆叠,以在现有序列中实现生成。在语音编辑任务中,VoiceCraft生成的编辑语音在自然度方面几乎无法与未编辑的录音区分,经人类评估;对于零样本TTS,我们的模型优于先前的SotA模型,包括VALLE和流行的商业模型XTTS-v2。至关重要的是,模型在具有不同口音、说话风格、录音条件、背景噪音和音乐的具有挑战性和现实性的数据集上进行评估,我们的模型与其他模型和真实录音相比表现一致。特别是,在语音编辑评估中,我们介绍了一个名为RealEdit的高质量、具有挑战性和现实性的数据集。我们鼓励读者在https://jasonppy.github.io/VoiceCraft_web上听取演示。

更新时间: 2024-06-14 00:29:46

领域: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD

下载: http://arxiv.org/abs/2403.16973v3

Reinforced Decoder: Towards Training Recurrent Neural Networks for Time Series Forecasting

Recurrent neural network-based sequence-to-sequence models have been extensively applied for multi-step-ahead time series forecasting. These models typically involve a decoder trained using either its previous forecasts or the actual observed values as the decoder inputs. However, relying on self-generated predictions can lead to the rapid accumulation of errors over multiple steps, while using the actual observations introduces exposure bias as these values are unavailable during the extrapolation stage. In this regard, this study proposes a novel training approach called reinforced decoder, which introduces auxiliary models to generate alternative decoder inputs that remain accessible when extrapolating. Additionally, a reinforcement learning algorithm is utilized to dynamically select the optimal inputs to improve accuracy. Comprehensive experiments demonstrate that our approach outperforms representative training methods over several datasets. Furthermore, the proposed approach also exhibits promising performance when generalized to self-attention-based sequence-to-sequence forecasting models.
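
A toy rollout in the spirit of the reinforced decoder is sketched below: at each step a small policy chooses which candidate input (ground truth, the decoder's own forecast, or an auxiliary model's forecast) feeds the next step. Reward shaping and the policy-gradient update are omitted, and all interfaces are assumptions.

```python
import torch

def reinforced_rollout(decoder, policy, y_true, aux_pred, h):
    # y_true, aux_pred: (B, T); ground truth is only available during training.
    inp, outs = y_true[:, 0:1], []
    for t in range(y_true.shape[1] - 1):
        y_hat, h = decoder(inp, h)                          # one-step forecast (B, 1)
        outs.append(y_hat)
        cands = torch.stack([y_true[:, t + 1:t + 2], y_hat,
                             aux_pred[:, t + 1:t + 2]], dim=1)  # (B, 3, 1)
        probs = policy(cands.squeeze(-1))                   # (B, 3), softmax inside
        idx = torch.distributions.Categorical(probs).sample()
        inp = cands[torch.arange(cands.shape[0]), idx]      # chosen next input (B, 1)
    return torch.cat(outs, dim=1)
```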

Updated: 2024-06-14 00:24:29

标题: 强化解码器:面向时间序列预测的循环神经网络训练

摘要: 基于循环神经网络的序列到序列模型已广泛应用于多步时间序列预测。这些模型通常涉及使用其先前的预测或实际观测值作为解码器输入进行训练。然而,依赖于自动生成的预测可能会导致在多个步骤中快速积累误差,而使用实际观察值会引入暴露偏差,因为这些值在外推阶段不可用。在这方面,本研究提出了一种称为强化解码器的新型训练方法,引入辅助模型生成可在外推时访问的替代解码器输入。此外,利用强化学习算法动态选择最佳输入以提高准确性。全面的实验证明,我们的方法在多个数据集上优于代表性的训练方法。此外,所提出的方法在推广到基于自注意力的序列到序列预测模型时也表现出有希望的性能。

更新时间: 2024-06-14 00:24:29

领域: cs.LG

下载: http://arxiv.org/abs/2406.09643v1

TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs

Multi-relational temporal graphs are powerful tools for modeling real-world data, capturing the evolving and interconnected nature of entities over time. Recently, many novel models have been proposed for ML on such graphs, intensifying the need for robust evaluation and standardized benchmark datasets. However, the availability of such resources remains scarce and evaluation faces added complexity due to reproducibility issues in experimental protocols. To address these challenges, we introduce Temporal Graph Benchmark 2.0 (TGB 2.0), a novel benchmarking framework tailored for evaluating methods for predicting future links on Temporal Knowledge Graphs and Temporal Heterogeneous Graphs with a focus on large-scale datasets, extending the Temporal Graph Benchmark. TGB 2.0 facilitates comprehensive evaluations by presenting eight novel datasets spanning five domains with up to 53 million edges. TGB 2.0 datasets are significantly larger than existing datasets in terms of number of nodes, edges, or timestamps. In addition, TGB 2.0 provides a reproducible and realistic evaluation pipeline for multi-relational temporal graphs. Through extensive experimentation, we observe that 1) leveraging edge-type information is crucial to obtain high performance, 2) simple heuristic baselines are often competitive with more complex methods, 3) most methods fail to run on our largest datasets, highlighting the need for research on more scalable methods.

Updated: 2024-06-14 00:08:04

标题: TGB 2.0:用于学习时间知识图和异构图的基准

摘要: 多关系时态图是建模现实世界数据的强大工具,捕捉实体随时间演变和相互关联的特性。最近,许多新颖的模型被提出用于在这样的图上进行机器学习,加剧了对稳健评估和标准化基准数据集的需求。然而,这类资源的可用性仍然稀缺,评估面临实验协议中可复制性问题的额外复杂性。为了解决这些挑战,我们引入了适用于评估在时态知识图和时态异构图上预测未来链接方法的新型基准框架 Temporal Graph Benchmark 2.0 (TGB 2.0),专注于大规模数据集,扩展了时态图基准。TGB 2.0通过提供涵盖五个领域的八个新颖数据集,其中最多包含5300万条边,促进了全面的评估。TGB 2.0数据集在节点数量、边数或时间戳方面显著大于现有数据集。此外,TGB 2.0为多关系时态图提供了可复制和实际的评估流程。通过广泛的实验,我们观察到:1) 利用边类型信息对获得高性能至关重要,2) 简单的启发式基线方法通常与更复杂的方法竞争,3) 大多数方法无法运行在我们最大的数据集上,凸显了对更可伸缩方法的研究的需求。

更新时间: 2024-06-14 00:08:04

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2406.09639v1

RASPNet: A Benchmark Dataset for Radar Adaptive Signal Processing Applications

This work presents a large-scale dataset for radar adaptive signal processing (RASP) applications, aimed at supporting the development of data-driven models within the radar community. The dataset, called RASPNet, consists of 100 realistic scenarios compiled over a variety of topographies and land types from across the contiguous United States, designed to reflect a diverse array of real-world environments. Within each scenario, RASPNet consists of 10,000 clutter realizations from an airborne radar setting, which can be utilized for radar algorithm development and evaluation. RASPNet intends to fill a prominent gap in the availability of a large-scale, realistic dataset that standardizes the evaluation of adaptive radar processing techniques. We describe its construction, organization, and several potential applications, which includes a transfer learning example to demonstrate how RASPNet can be leveraged for realistic adaptive radar processing scenarios.

Updated: 2024-06-14 00:07:52

标题: RASPNet:用于雷达自适应信号处理应用的基准数据集

摘要: 这项工作提供了一个用于雷达自适应信号处理(RASP)应用的大规模数据集,旨在支持雷达社区内数据驱动模型的开发。该数据集名为RASPNet,包括100个真实场景,涵盖了美国连续的各种地形和土地类型,旨在反映多样化的真实环境。在每个场景中,RASPNet包括来自空中雷达设置的10,000个杂波实现,可用于雷达算法的开发和评估。RASPNet旨在填补现有大规模、真实数据集的不足之处,标准化自适应雷达处理技术评估。我们描述了其构建、组织和几种潜在应用,包括一个迁移学习示例,以展示RASPNet如何用于真实的自适应雷达处理场景。

更新时间: 2024-06-14 00:07:52

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2406.09638v1

By Xinhai (Sean) Zou.