Arxiv Day: Article

Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds

Recent work of Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift (TDS learning), where a learner is given labeled samples from training distribution $\mathcal{D}$, unlabeled samples from test distribution $\mathcal{D}'$, and the goal is to output a classifier with low error on $\mathcal{D}'$ whenever the training samples pass a corresponding test. Their model deviates from all prior work in that no assumptions are made on $\mathcal{D}'$. Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. Here we focus on the fundamental case of intersections of halfspaces with respect to Gaussian training distributions and prove a variety of new upper bounds including a $2^{(k/\epsilon)^{O(1)}} \mathsf{poly}(d)$-time algorithm for TDS learning intersections of $k$ homogeneous halfspaces to accuracy $\epsilon$ (prior work achieved $d^{(k/\epsilon)^{O(1)}}$). We work under the mild assumption that the Gaussian training distribution contains at least an $\epsilon$ fraction of both positive and negative examples ($\epsilon$-balanced). We also prove the first set of SQ lower-bounds for any TDS learning problem and show (1) the $\epsilon$-balanced assumption is necessary for $\mathsf{poly}(d,1/\epsilon)$-time TDS learning for a single halfspace and (2) a $d^{\tilde{\Omega}(\log 1/\epsilon)}$ lower bound for the intersection of two general halfspaces, even with the $\epsilon$-balanced assumption. Our techniques significantly expand the toolkit for TDS learning. We use dimension reduction and coverings to give efficient algorithms for computing a localized version of discrepancy distance, a key metric from the domain adaptation literature.

Updated: 2024-04-02 23:34:39

Title: Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds

Abstract: Klivans, Stavropoulos, and Vasilyan recently began studying testable learning with distribution shift (TDS learning), where a learner receives labeled samples from a training distribution $\mathcal{D}$ and unlabeled samples from a test distribution $\mathcal{D}'$, and aims to produce a classifier with low error on $\mathcal{D}'$ whenever the training samples pass a corresponding test. Unlike previous work, their model makes no assumptions about $\mathcal{D}'$. Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. This study focuses on the basic case of intersections of halfspaces with Gaussian training distributions and establishes several new upper bounds, including a $2^{(k/\epsilon)^{O(1)}} \mathsf{poly}(d)$-time algorithm for TDS learning intersections of $k$ homogeneous halfspaces to accuracy $\epsilon$ (previous work achieved $d^{(k/\epsilon)^{O(1)}}$). It is assumed that the Gaussian training distribution contains at least an $\epsilon$ fraction of both positive and negative examples ($\epsilon$-balanced). The study also proves the first set of SQ lower bounds for any TDS learning problem, showing (1) that the $\epsilon$-balanced assumption is necessary for $\mathsf{poly}(d,1/\epsilon)$-time TDS learning of a single halfspace and (2) a $d^{\tilde{\Omega}(\log 1/\epsilon)}$ lower bound for the intersection of two general halfspaces, even with the $\epsilon$-balanced assumption. The techniques significantly expand the toolkit for TDS learning. Dimension reduction and coverings are used to give efficient algorithms for computing a localized version of the discrepancy distance, a key metric in the domain adaptation literature.

Updated: 2024-04-02 23:34:39

Categories: cs.DS,cs.LG

Download: http://arxiv.org/abs/2404.02364v1

Unified Control Framework for Real-Time Interception and Obstacle Avoidance of Fast-Moving Objects with Diffusion Variational Autoencoder

Real-time interception of fast-moving objects by robotic arms in dynamic environments poses a formidable challenge due to the need for rapid reaction times, often within milliseconds, amidst dynamic obstacles. This paper introduces a unified control framework to address this challenge by simultaneously intercepting dynamic objects and avoiding moving obstacles. Central to our approach is the use of a diffusion-based variational autoencoder for motion planning to perform both object interception and obstacle avoidance. We begin by encoding the high-dimensional temporal information from streaming events into a two-dimensional latent manifold, enabling the discrimination between safe and colliding trajectories, culminating in the construction of an offline densely connected trajectory graph. Subsequently, we employ an extended Kalman filter to achieve precise real-time tracking of the moving object. Leveraging a graph-traversing strategy on the established offline dense graph, we generate encoded robotic motor control commands. Finally, we decode these commands to enable real-time motion of robotic motors, ensuring effective obstacle avoidance and high interception accuracy of fast-moving objects. Experimental validation in computer simulations and on autonomous 7-DoF robotic arms demonstrates the efficacy of our proposed framework. Results indicate the capability of the robotic manipulator to navigate around multiple obstacles of varying sizes and shapes while successfully intercepting fast-moving objects thrown from different angles by hand. Complete video demonstrations of our experiments can be found at https://sites.google.com/view/multirobotskill/home.

Updated: 2024-04-02 23:27:36

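Title: Unified Control Framework for Real-Time Interception and Obstacle Avoidance of Fast-Moving Objects with Diffusion Variational Autoencoder

Abstract: Real-time interception of fast-moving objects by a robotic arm in dynamic environments is a major challenge, because the arm must react quickly, often within milliseconds, among dynamic obstacles. This paper presents a unified control framework that meets this challenge by intercepting dynamic objects and avoiding moving obstacles at the same time. The core of our approach is a diffusion-based variational autoencoder for motion planning that handles both object interception and obstacle avoidance. We first encode high-dimensional temporal information from streaming events into a two-dimensional latent space, which makes it possible to distinguish safe trajectories from colliding ones, and then build an offline densely connected trajectory graph. Next, we use an extended Kalman filter for precise real-time tracking of the moving object. Using a graph-traversal strategy on the offline dense graph, we generate encoded robot motor control commands. Finally, we decode these commands to drive the robot motors in real time, ensuring effective obstacle avoidance and efficient interception of fast-moving objects. Experiments in computer simulation and on autonomous 7-DoF robotic arms validate the effectiveness of the proposed framework. The results show that the manipulator can navigate around obstacles of various sizes and shapes and intercept fast-moving objects thrown by hand from different angles. Complete video demonstrations of our experiments are available at https://sites.google.com/view/multirobotskill/home.

Updated: 2024-04-02 23:27:36

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2209.13628v2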

EnergAIze: Multi Agent Deep Deterministic Policy Gradient for Vehicle to Grid Energy Management

This paper investigates the increasing roles of Renewable Energy Sources (RES) and Electric Vehicles (EVs). While indicating a new era of sustainable energy, these also introduce complex challenges, including the need to balance supply and demand and smooth peak consumption amidst rising EV adoption rates. Addressing these challenges requires innovative solutions such as Demand Response (DR), energy flexibility management, Renewable Energy Communities (RECs), and, more specifically for EVs, Vehicle-to-Grid (V2G). However, existing V2G approaches often fall short in real-world adaptability, global REC optimization with other flexible assets, scalability, and user engagement. To bridge this gap, this paper introduces EnergAIze, a Multi-Agent Reinforcement Learning (MARL) energy management framework, leveraging the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. EnergAIze enables user-centric and multi-objective energy management by allowing each prosumer to select from a range of personal management objectives, thus encouraging engagement. Additionally, it architects data protection and ownership through decentralized computing, where each prosumer can situate an energy management optimization node directly at their own dwelling. The local node not only manages local energy assets but also fosters REC-wide optimization. The efficacy of EnergAIze was evaluated through case studies employing the CityLearn simulation framework. These simulations were instrumental in demonstrating EnergAIze's adeptness at implementing V2G technology within a REC alongside other energy assets. The results show a reduction in peak loads, ramping, carbon emissions, and electricity costs at the REC level while optimizing for individual prosumers' objectives.

Updated: 2024-04-02 23:16:17

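Title: EnergAIze: Multi-Agent Deep Deterministic Policy Gradient for Vehicle-to-Grid Energy Management

Abstract: This paper studies the growing role of renewable energy sources and electric vehicles in the energy sector. They herald a new era of sustainable energy but also bring complex challenges, including the need to balance supply and demand and to smooth peak consumption as EV adoption rises. Meeting these challenges calls for innovative solutions such as Demand Response (DR), energy flexibility management, Renewable Energy Communities (RECs), and, specifically for EVs, Vehicle-to-Grid (V2G). However, existing V2G approaches often fall short in real-world adaptability, global REC optimization with other flexible assets, scalability, and user engagement. To bridge this gap, this paper introduces EnergAIze, a Multi-Agent Reinforcement Learning (MARL) energy management framework built on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. EnergAIze enables user-centric, multi-objective energy management by letting each prosumer choose from a range of personal management objectives, which encourages engagement. It also builds in data protection and ownership through decentralized computing: each prosumer can place an energy management optimization node directly in their own dwelling. The local node manages local energy assets and also supports REC-wide optimization. The effectiveness of EnergAIze was evaluated through case studies using the CityLearn simulation framework. These simulations were key to demonstrating how well EnergAIze implements V2G technology within a REC alongside other energy assets. The results show reductions in peak load, ramping, carbon emissions, and electricity costs at the REC level while optimizing for individual prosumers' objectives.

Updated: 2024-04-02 23:16:17

Categories: cs.MA,cs.AI

Download: http://arxiv.org/abs/2404.02361v1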

FraGNNet: A Deep Probabilistic Model for Mass Spectrum Prediction

The process of identifying a compound from its mass spectrum is a critical step in the analysis of complex mixtures. Typical solutions for the mass spectrum to compound (MS2C) problem involve matching the unknown spectrum against a library of known spectrum-molecule pairs, an approach that is limited by incomplete library coverage. Compound to mass spectrum (C2MS) models can improve retrieval rates by augmenting real libraries with predicted spectra. Unfortunately, many existing C2MS models suffer from problems with prediction resolution, scalability, or interpretability. We develop a new probabilistic method for C2MS prediction, FraGNNet, that can efficiently and accurately predict high-resolution spectra. FraGNNet uses a structured latent space to provide insight into the underlying processes that define the spectrum. Our model achieves state-of-the-art performance in terms of prediction error, and surpasses existing C2MS models as a tool for retrieval-based MS2C.

Updated: 2024-04-02 23:16:15

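Title: FraGNNet: A Deep Probabilistic Model for Mass Spectrum Prediction

Abstract: Identifying a compound from its mass spectrum is a key step in the analysis of complex mixtures. Typical solutions to the mass spectrum to compound (MS2C) problem match the unknown spectrum against a library of known spectrum-molecule pairs, an approach limited by incomplete library coverage. Compound to mass spectrum (C2MS) models can improve retrieval rates by augmenting real libraries with predicted spectra. Unfortunately, many existing C2MS models have problems with prediction resolution, scalability, or interpretability. We develop FraGNNet, a new probabilistic method for C2MS prediction that predicts high-resolution spectra efficiently and accurately. FraGNNet uses a structured latent space to reveal the underlying processes that define the spectrum. Our model achieves state-of-the-art prediction error and surpasses existing C2MS models as a tool for retrieval-based MS2C.

Updated: 2024-04-02 23:16:15

Categories: cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2404.02360v1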

OSCaR: Object State Captioning and State Change Representation

The capability of intelligent models to extrapolate and comprehend changes in object states is a crucial yet demanding aspect of AI research, particularly through the lens of human interaction in real-world settings. This task involves describing complex visual environments, identifying active objects, and interpreting their changes as conveyed through language. Traditional methods, which isolate object captioning and state change detection, offer a limited view of dynamic environments. Moreover, relying on a small set of symbolic words to represent changes has restricted the expressiveness of the language. To address these challenges, in this paper, we introduce the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark. OSCaR consists of 14,084 annotated video segments with nearly 1,000 unique objects from various egocentric video collections. It sets a new testbed for evaluating multimodal large language models (MLLMs). Our experiments demonstrate that while MLLMs show some skill, they lack a full understanding of object state changes. The benchmark includes a fine-tuned model that, despite initial capabilities, requires significant improvements in accuracy and generalization ability for effective understanding of these changes. Our code and dataset are available at https://github.com/nguyennm1024/OSCaR.

Updated: 2024-04-02 23:14:42

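Title: OSCaR: Object State Captioning and State Change Representation

Abstract: The ability of intelligent models to extrapolate and understand changes in object states is a crucial yet challenging aspect of AI research, especially from the perspective of human interaction in real-world settings. The task involves describing complex visual environments, identifying active objects, and interpreting their changes through language. Traditional methods, which separate object captioning from state change detection, give a limited view of dynamic environments. Moreover, relying on a small set of symbolic words to represent changes has limited the expressiveness of the language. To address these challenges, we introduce the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark. OSCaR contains 14,084 annotated video segments with nearly 1,000 unique objects drawn from various egocentric video collections. It provides a new testbed for evaluating multimodal large language models (MLLMs). Our experiments show that MLLMs demonstrate some skill but lack a full understanding of object state changes. The benchmark includes a fine-tuned model that, despite its initial capabilities, needs significant improvement in accuracy and generalization to understand these changes effectively. Our code and dataset are available at https://github.com/nguyennm1024/OSCaR.

Updated: 2024-04-02 23:14:42

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2402.17128v4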

Attribution Regularization for Multimodal Paradigms

Multimodal machine learning has gained significant attention in recent years due to its potential for integrating information from multiple modalities to enhance learning and decision-making processes. However, it is commonly observed that unimodal models outperform multimodal models, despite the latter having access to richer information. Additionally, the influence of a single modality often dominates the decision-making process, resulting in suboptimal performance. This research project aims to address these challenges by proposing a novel regularization term that encourages multimodal models to effectively utilize information from all modalities when making decisions. The focus of this project lies in the video-audio domain, although the proposed regularization technique holds promise for broader applications in embodied AI research, where multiple modalities are involved. By leveraging this regularization term, the proposed approach aims to mitigate the issue of unimodal dominance and improve the performance of multimodal machine learning systems. Through extensive experimentation and evaluation, the effectiveness and generalizability of the proposed technique will be assessed. The findings of this research project have the potential to significantly contribute to the advancement of multimodal machine learning and facilitate its application in various domains, including multimedia analysis, human-computer interaction, and embodied AI research.

Updated: 2024-04-02 23:05:56

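Title: Attribution Regularization for Multimodal Paradigms

Abstract: Multimodal machine learning has attracted wide attention in recent years because of its potential to integrate information from multiple modalities to enhance learning and decision-making. However, unimodal models are commonly observed to outperform multimodal models, even though the latter have access to richer information. In addition, the influence of a single modality often dominates the decision process, leading to suboptimal performance. This research project aims to address these challenges by proposing a novel regularization term that encourages multimodal models to make effective use of information from all modalities when making decisions. The project focuses on the video-audio domain, although the proposed regularization technique also shows promise for embodied AI research, where multiple modalities are involved. With this regularization term, the proposed approach aims to mitigate unimodal dominance and improve the performance of multimodal machine learning systems. The effectiveness and generalizability of the proposed technique will be assessed through extensive experimentation and evaluation. The findings could contribute significantly to the advancement of multimodal machine learning and facilitate its application in various domains, including multimedia analysis, human-computer interaction, and embodied AI research.

Updated: 2024-04-02 23:05:56

Categories: cs.LG

Download: http://arxiv.org/abs/2404.02359v1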

From Shortcuts to Triggers: Backdoor Defense with Denoised PoE

Language models are often at risk of diverse backdoor attacks, especially data poisoning. Thus, it is important to investigate defense solutions for addressing them. Existing backdoor defense methods mainly focus on backdoor attacks with explicit triggers, leaving a universal defense against various backdoor attacks with diverse triggers largely unexplored. In this paper, we propose an end-to-end ensemble-based backdoor defense framework, DPoE (Denoised Product-of-Experts), which is inspired by the shortcut nature of backdoor attacks, to defend against various backdoor attacks. DPoE consists of two models: a shallow model that captures the backdoor shortcuts and a main model that is prevented from learning the backdoor shortcuts. To address the label flip caused by backdoor attackers, DPoE incorporates a denoising design. Experiments on the SST-2 dataset show that DPoE significantly improves the defense performance against various types of backdoor triggers, including word-level, sentence-level, and syntactic triggers. Furthermore, DPoE is also effective under a more challenging but practical setting that mixes multiple types of triggers.
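
The two-model design described above can be illustrated with the standard product-of-experts combination, in which the log-probabilities of the two models are summed and renormalized during training. The abstract does not give DPoE's exact formula or its denoising component, so the sketch below is an assumption: the function names (`log_softmax`, `poe_log_probs`, `poe_loss`) are hypothetical, and the combination rule is the generic PoE rule rather than the authors' confirmed implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def poe_log_probs(main_logits, shallow_logits):
    """Generic product-of-experts: add log-probabilities, then renormalize.

    Assumption: this is the textbook PoE rule, not necessarily DPoE's.
    """
    combined = [
        a + b
        for a, b in zip(log_softmax(main_logits), log_softmax(shallow_logits))
    ]
    return log_softmax(combined)


def poe_loss(main_logits, shallow_logits, label):
    """Cross-entropy of the ensemble prediction against the given label."""
    return -poe_log_probs(main_logits, shallow_logits)[label]


# If the shallow model already explains an example (e.g. via a trigger
# shortcut), the ensemble loss is small, so the main model receives little
# pressure to learn that shortcut itself.
undecided_main = [0.0, 0.0]
print(poe_loss(undecided_main, [3.0, 0.0], label=0))  # shallow model confident
print(poe_loss(undecided_main, [0.0, 0.0], label=0))  # shallow model uninformative
```

In this kind of setup the ensemble is typically used only for training, and predictions at test time come from the main model alone; whether DPoE follows that convention exactly is not stated in the abstract.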

Updated: 2024-04-02 23:01:17

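Title: From Shortcuts to Triggers: Backdoor Defense with Denoised PoE

Abstract: Language models are often at risk of diverse backdoor attacks, especially data poisoning. It is therefore important to study defenses against these attacks. Existing backdoor defense methods focus mainly on attacks with explicit triggers, and a universal defense against backdoor attacks with diverse triggers remains largely unexplored. In this paper, we propose DPoE (Denoised Product-of-Experts), an end-to-end ensemble-based backdoor defense framework inspired by the shortcut nature of backdoor attacks, to defend against various backdoor attacks. DPoE consists of two models: a shallow model that captures the backdoor shortcuts and a main model that is prevented from learning them. To handle the label flips caused by backdoor attackers, DPoE incorporates a denoising design. Experiments on the SST-2 dataset show that DPoE significantly improves defense performance against various types of backdoor triggers, including word-level, sentence-level, and syntactic triggers. DPoE is also effective in a more challenging but practical setting that mixes multiple types of triggers.

Updated: 2024-04-02 23:01:17

Categories: cs.CL,cs.AI,cs.CR,cs.LG

Download: http://arxiv.org/abs/2305.14910v3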

Semantic Augmentation in Images using Language

Deep Learning models are incredibly data-hungry and require very large labeled datasets for supervised learning. As a consequence, these models often suffer from overfitting, limiting their ability to generalize to real-world examples. Recent advancements in diffusion models have enabled the generation of photorealistic images based on textual inputs. Leveraging the substantial datasets used to train these diffusion models, we propose a technique to utilize generated images to augment existing datasets. This paper explores various strategies for effective data augmentation to improve the out-of-domain generalization capabilities of deep learning models.

Updated: 2024-04-02 22:54:24

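Title: Semantic Augmentation in Images using Language

Abstract: Deep learning models are very data-hungry and require very large labeled datasets for supervised learning. As a result, these models often suffer from overfitting, which limits their ability to generalize to real-world examples. Recent advances in diffusion models have made it possible to generate photorealistic images from textual inputs. Leveraging the large datasets used to train these diffusion models, we propose a technique that uses generated images to augment existing datasets. This paper explores various strategies for effective data augmentation to improve the out-of-domain generalization of deep learning models.

Updated: 2024-04-02 22:54:24

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.02353v1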

Budget Recycling Differential Privacy

Differential Privacy (DP) mechanisms usually force a reduction in data utility by producing "out-of-bound" noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within a predefined error boundary, thereby improving utility and maintaining privacy simultaneously. The core of BR-DP consists of two components: a DP kernel responsible for generating a noisy answer per iteration, and a recycler that probabilistically recycles/regenerates or releases the noisy answer. We delve into the privacy accounting of BR-DP, culminating in the development of a budgeting principle that optimally sub-allocates the available budget between the DP kernel and the recycler. Furthermore, we introduce algorithms for tight BR-DP accounting in composition scenarios, and our findings indicate that BR-DP achieves reduced privacy leakage post-composition compared to DP. Additionally, we explore the concept of privacy amplification via subsampling within the BR-DP framework and propose optimal sampling rates for BR-DP across various queries. We experiment with real data, and the results demonstrate BR-DP's effectiveness in lifting the utility-privacy tradeoff provided by DP mechanisms.

Updated: 2024-04-02 22:50:16

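Title: Budget Recycling Differential Privacy

Abstract: Differential Privacy (DP) mechanisms usually force a reduction in data utility by producing "out-of-bound" noisy results under a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, which is designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded" we mean that the mechanism releases most outputs within a predefined error boundary, improving utility while maintaining privacy. The core of BR-DP consists of two components: a DP kernel that generates a noisy answer in each iteration, and a recycler that probabilistically recycles/regenerates or releases the noisy answer. We study the privacy accounting of BR-DP in depth and arrive at a budgeting principle that optimally sub-allocates the available budget between the DP kernel and the recycler. We also introduce algorithms for tight BR-DP accounting under composition and find that BR-DP achieves lower privacy leakage after composition than DP. In addition, we explore privacy amplification via subsampling within the BR-DP framework and propose optimal sampling rates for BR-DP across various queries. We experiment on real data, and the results show that BR-DP effectively improves the utility-privacy tradeoff offered by DP mechanisms.

Updated: 2024-04-02 22:50:16

Categories: cs.CR,cs.DS,eess.SP

Download: http://arxiv.org/abs/2403.11445v2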

On-Demand Sampling: Learning Optimally from Multiple Distributions

Social and real-world considerations such as robustness, fairness, social welfare and multi-agent tradeoffs have given rise to multi-distribution learning paradigms, such as collaborative learning, group distributionally robust optimization, and fair federated learning. In each of these settings, a learner seeks to uniformly minimize its expected loss over $n$ predefined data distributions, while using as few samples as possible. In this paper, we establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity. Importantly, our sample complexity bounds for multi-distribution learning exceed that of learning a single distribution by only an additive factor of $n \log(n) / \epsilon^2$. This improves upon the best known sample complexity bounds for fair federated learning by Mohri et al. and collaborative learning by Nguyen and Zakynthinou by multiplicative factors of $n$ and $\log(n)/\epsilon^3$, respectively. We also provide the first sample complexity bounds for the group DRO objective of Sagawa et al. To guarantee these optimal sample complexity bounds, our algorithms learn to sample from data distributions on demand. Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving stochastic zero-sum games. In particular, we contribute stochastic variants of no-regret dynamics that can trade off between players' differing sampling costs.
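
The objective and the headline bound stated above can be written out as follows. This is only a restatement of the abstract in symbols. The notation is assumed rather than taken from the paper: the hypothesis class $\mathcal{H}$, the loss $\ell$, the distributions $\mathcal{D}_1,\dots,\mathcal{D}_n$, and the sample-complexity symbols $m_{\text{multi}}$ and $m_{\text{single}}$ are all introduced here for illustration, and constants and confidence parameters are omitted.

```latex
% Uniformly minimizing the expected loss over n predefined distributions
\[
  \min_{h \in \mathcal{H}} \; \max_{i \in \{1,\dots,n\}} \;
  \mathbb{E}_{z \sim \mathcal{D}_i}\bigl[\ell(h, z)\bigr]
\]
% Sample complexity relative to learning a single distribution
\[
  m_{\text{multi}}(\epsilon) \;=\; m_{\text{single}}(\epsilon)
  \;+\; O\!\left(\frac{n \log n}{\epsilon^{2}}\right)
\]
```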

Updated: 2024-04-02 22:48:13

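Title: On-Demand Sampling: Learning Optimally from Multiple Distributions

Abstract: Social and real-world considerations such as robustness, fairness, social welfare, and multi-agent tradeoffs have given rise to multi-distribution learning paradigms, such as collaborative learning, group distributionally robust optimization, and fair federated learning. In each of these settings, a learner seeks to uniformly minimize its expected loss over $n$ predefined data distributions while using as few samples as possible. In this paper, we establish the optimal sample complexity of these learning paradigms and give algorithms that meet it. Importantly, our sample complexity bounds for multi-distribution learning exceed those for learning a single distribution by only an additive factor of $n \log(n)/\epsilon^2$. This improves on the best known sample complexity bounds for fair federated learning by Mohri et al. and collaborative learning by Nguyen and Zakynthinou by multiplicative factors of $n$ and $\log(n)/\epsilon^3$, respectively. We also provide the first sample complexity bounds for the group DRO objective of Sagawa et al. To guarantee these optimal sample complexity bounds, our algorithms learn to sample from data distributions on demand. Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving stochastic zero-sum games. In particular, we contribute stochastic variants of no-regret dynamics that can trade off between players' differing sampling costs.

Updated: 2024-04-02 22:48:13

Categories: cs.LG,cs.CY

Download: http://arxiv.org/abs/2210.12529v3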

Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization

Synthetic image data generation represents a promising avenue for training deep learning models, particularly in the realm of transfer learning, where obtaining real images within a specific domain can be prohibitively expensive due to privacy and intellectual property considerations. This work delves into the generation and utilization of synthetic images derived from text-to-image generative models in facilitating transfer learning paradigms. Despite the high visual fidelity of the generated images, we observe that their naive incorporation into existing real-image datasets does not consistently enhance model performance due to the inherent distribution gap between synthetic and real images. To address this issue, we introduce a novel two-stage framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability and subsequently uses real data for rapid adaptation. Alongside this, we propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images. Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements, with up to a 30% accuracy increase on classification tasks. Intriguingly, we note that the enhancements were not yet saturated, indicating that the benefits may further increase with an expanded volume of synthetic data.

Updated: 2024-04-02 22:41:53

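Title: Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization

Abstract: Synthetic image data generation is a promising avenue for training deep learning models, particularly in transfer learning, where obtaining real images in a specific domain can be very costly because of privacy and intellectual property considerations. This work examines the generation and use of synthetic images derived from text-to-image generative models to facilitate transfer learning. Although the generated images have high visual fidelity, we observe that naively adding them to existing real-image datasets does not consistently improve model performance, because of the inherent distribution gap between synthetic and real images. To address this issue, we introduce a novel two-stage framework called bridged transfer, which first uses synthetic images to fine-tune a pre-trained model and improve its transferability, and then uses real data for rapid adaptation. We also propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images. Our methods are evaluated on 10 different datasets and 5 distinct models and show consistent improvements, with accuracy gains of up to 30% on classification tasks. Interestingly, we note that the improvements have not yet saturated, which suggests that the benefits may increase further as the volume of synthetic data grows.

Updated: 2024-04-02 22:41:53

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2403.19866v2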

Effective Malware Detection for Embedded Computing Systems with Limited Exposure

One of the pivotal security threats for embedded computing systems is malicious software, a.k.a. malware. With its efficiency and efficacy, Machine Learning (ML) has been widely adopted for malware detection in recent times. Despite being efficient, the existing techniques require a tremendous number of benign and malware samples for training and modeling an efficient malware detector. Furthermore, such constraints limit the detection of emerging malware samples due to the lack of sufficient malware samples required for efficient training. To address such concerns, we introduce a code-aware data generation technique that generates multiple mutated samples of the malware that the devices have seen only in limited numbers. Loss minimization ensures that the generated samples closely mimic the limitedly seen malware and mitigates impractical samples. The generated malware is further incorporated into the training set to formulate a model that can efficiently detect emerging malware despite having limited exposure. The experimental results demonstrate that the proposed technique achieves an accuracy of 90% in detecting limitedly seen malware, which is approximately 3x more than the accuracy attained by state-of-the-art techniques.

Updated: 2024-04-02 22:37:34

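Title: Effective Malware Detection for Embedded Computing Systems with Limited Exposure

Abstract: A key security threat to embedded computing systems is malicious software, also known as malware. In recent years, Machine Learning (ML) has been widely adopted for malware detection because of its efficiency and effectiveness. Although existing techniques are efficient, they need a large number of benign and malware samples to train and model an efficient malware detector. These constraints also limit the detection of emerging malware, because there are not enough malware samples for effective training. To address these concerns, we introduce a code-aware data generation technique that generates multiple mutated samples of the malware that devices have seen only in limited numbers. Loss minimization ensures that the generated samples closely mimic the limitedly seen malware and mitigates impractical samples. The generated malware is then added to the training set to build a model that can effectively detect emerging malware despite limited exposure. Experimental results show that the proposed technique achieves 90% accuracy in detecting limitedly seen malware, approximately 3x more than the accuracy attained by state-of-the-art techniques.

Updated: 2024-04-02 22:37:34

Categories: cs.CR,cs.CV

Download: http://arxiv.org/abs/2404.02344v1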

Improved model-free bounds for multi-asset options using option-implied information and deep learning

We consider the computation of model-free bounds for multi-asset options in a setting that combines dependence uncertainty with additional information on the dependence structure. More specifically, we consider the setting where the marginal distributions are known and partial information, in the form of known prices for multi-asset options, is also available in the market. We provide a fundamental theorem of asset pricing in this setting, as well as a superhedging duality that allows us to transform the maximization problem over probability measures into a more tractable minimization problem over trading strategies. The latter is solved using a penalization approach combined with a deep learning approximation using artificial neural networks. The numerical method is fast and the computational time scales linearly with respect to the number of traded assets. We finally examine the significance of various pieces of additional information. Empirical evidence suggests that "relevant" information, i.e. prices of derivatives with the same payoff structure as the target payoff, is more useful than other information and should be prioritized in view of the trade-off between accuracy and computational efficiency.

Updated: 2024-04-02 22:37:22

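Title: Improved model-free bounds for multi-asset options using option-implied information and deep learning

Abstract: We consider the computation of model-free bounds for multi-asset options in a setting that combines dependence uncertainty with additional information on the dependence structure. More specifically, we consider the setting where the marginal distributions are known and partial information, in the form of known prices for multi-asset options, is also available in the market. We provide a fundamental theorem of asset pricing for this setting, together with a superhedging duality that transforms the maximization problem over probability measures into a more tractable minimization problem over trading strategies. The latter is solved with a penalization approach combined with a deep learning approximation using artificial neural networks. The numerical method is fast, and the computational time scales linearly with the number of traded assets. Finally, we examine the significance of various pieces of additional information. Empirical evidence suggests that "relevant" information, i.e. prices of derivatives with the same payoff structure as the target payoff, is more useful than other information and should be prioritized in view of the trade-off between accuracy and computational efficiency.

Updated: 2024-04-02 22:37:22

Categories: q-fin.PR,cs.LG,math.OC,stat.ML,91G20, 91G60, 68T07

Download: http://arxiv.org/abs/2404.02343v1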

Gemini: A Family of Highly Capable Multimodal Models

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Updated: 2024-04-02 22:35:21

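Title: Gemini: A Family of Highly Capable Multimodal Models

Abstract: This report introduces Gemini, a new family of multimodal models that show remarkable capabilities in image, audio, video, and text understanding. The Gemini family comes in Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use cases. Evaluation on a broad range of benchmarks shows that our most capable Gemini Ultra model advances the state of the art in 30 of 32 benchmarks, notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach to post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Updated: 2024-04-02 22:35:21

Categories: cs.CL,cs.AI,cs.CV

Download: http://arxiv.org/abs/2312.11805v2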

Multi-BERT: Leveraging Adapters and Prompt Tuning for Low-Resource Multi-Domain Adaptation

The rapid expansion in the volume and diversity of texts presents formidable challenges in multi-domain settings. These challenges are also visible in Persian named entity recognition (NER) settings. Traditional approaches, either employing a unified model for multiple domains or individual models for each domain, frequently pose significant limitations. Single models often struggle to capture the nuances of diverse domains, while utilizing multiple large models can lead to resource constraints, rendering the training of a model for each domain virtually impractical. Therefore, this paper introduces a novel approach composed of one core model with multiple sets of domain-specific parameters. We utilize techniques such as prompt tuning and adapters, combined with the incorporation of additional layers, to add parameters that we can train for the specific domains. This enables the model to perform comparably to individual models for each domain. Experimental results on different formal and informal datasets show that by employing these added parameters, the proposed model significantly surpasses existing practical models in performance. Remarkably, the proposed model requires only one instance for training and storage, yet achieves outstanding results across all domains, even surpassing the state of the art in some. Moreover, we analyze each adaptation strategy, delineating its strengths, weaknesses, and optimal hyper-parameters for the Persian NER settings. Finally, we introduce a document-based domain detection pipeline tailored for scenarios with unknown text domains, enhancing the adaptability and practicality of this work in real-world applications.

Updated: 2024-04-02 22:15:48

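Title: Multi-BERT: Leveraging Adapters and Prompt Tuning for Low-Resource Multi-Domain Adaptation

Abstract: The rapid growth in the volume and diversity of texts poses major challenges in multi-domain settings. These challenges are also evident in Persian named entity recognition (NER). Traditional approaches, whether they use one unified model for multiple domains or a separate model for each domain, often come with significant limitations. A single model often struggles to capture the nuances of different domains, while using multiple large models can run into resource constraints that make training a model per domain almost impractical. This paper therefore introduces a new approach consisting of one core model with multiple sets of domain-specific parameters. We use techniques such as prompt tuning and adapters, together with additional layers, to add parameters that can be trained for specific domains. This lets the model perform comparably to individual per-domain models. Experimental results on different formal and informal datasets show that, with these added parameters, the proposed model significantly outperforms existing practical models. Notably, the proposed model needs only one instance for training and storage, yet achieves outstanding results in all domains and even surpasses the state of the art in some. We also analyze each adaptation strategy and describe its strengths, weaknesses, and optimal hyper-parameters for Persian NER. Finally, we introduce a document-based domain detection pipeline designed for scenarios with unknown text domains, which improves the adaptability and practicality of this work in real-world applications.

Updated: 2024-04-02 22:15:48

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.02335v1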

Comparative Study of Domain Driven Terms Extraction Using Large Language Models

Keywords play a crucial role in bridging the gap between human understanding and machine processing of textual data. They are essential to data enrichment because they form the basis for detailed annotations that provide a more insightful and in-depth view of the underlying data. Keyword/domain driven term extraction is a pivotal task in natural language processing, facilitating information retrieval, document summarization, and content categorization. This review focuses on keyword extraction methods, emphasizing the use of three major Large Language Models (LLMs): Llama2-7B, GPT-3.5, and Falcon-7B. We employed a custom Python package to interface with these LLMs, simplifying keyword extraction. Our study, utilizing the Inspec and PubMed datasets, evaluates the performance of these models. The Jaccard similarity index was used for assessment, yielding scores of 0.64 (Inspec) and 0.21 (PubMed) for GPT-3.5, 0.40 and 0.17 for Llama2-7B, and 0.23 and 0.12 for Falcon-7B. This paper underlines the role of prompt engineering in LLMs for better keyword extraction and discusses the impact of hallucination in LLMs on result evaluation. It also sheds light on the challenges in using LLMs for keyword extraction, including model complexity, resource demands, and optimization techniques.
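
The Jaccard similarity index used for assessment above compares two sets by dividing the size of their intersection by the size of their union. A minimal sketch follows; the function name `jaccard_similarity`, the lowercase/whitespace normalization, and the example keywords are assumptions made for illustration, since the abstract does not say how predicted and reference keywords were matched.

```python
def jaccard_similarity(predicted, reference):
    """Jaccard index |A ∩ B| / |A ∪ B| between two keyword collections.

    Assumption: keywords are compared after lowercasing and stripping
    whitespace; the study's own matching rule is not specified.
    """
    a = {k.strip().lower() for k in predicted}
    b = {k.strip().lower() for k in reference}
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical example: 2 shared keywords out of 4 distinct ones overall.
predicted = ["neural network", "Deep Learning", "graph"]
reference = ["deep learning", "neural network", "attention"]
print(jaccard_similarity(predicted, reference))  # 0.5
```

Because the index penalizes every extra or missing keyword, a model that hallucinates keywords not present in the reference set lowers its score even if all reference keywords are recovered.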

Updated: 2024-04-02 22:04:51

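Title: A Comparative Study of Domain-Driven Term Extraction Using Large Language Models

Abstract: Keywords act as a crucial bridge between human understanding and machine processing of textual data. They are essential to data enrichment because they form the basis of detailed annotations that give a deeper, more insightful view of the data. Keyword/domain-driven term extraction is a key task in natural language processing that supports information retrieval, document summarization, and content categorization. This review focuses on keyword extraction methods, emphasizing the use of three major large language models (LLMs): Llama2-7B, GPT-3.5, and Falcon-7B. We used a custom Python package to interface with these LLMs, which simplified keyword extraction. Using the Inspec and PubMed datasets, we evaluated the performance of these models. Measured with the Jaccard similarity index, the scores were 0.64 (Inspec) and 0.21 (PubMed) for GPT-3.5, 0.40 and 0.17 for Llama2-7B, and 0.23 and 0.12 for Falcon-7B. This paper underlines the role of prompt engineering in LLMs for better keyword extraction and discusses how hallucination in LLMs affects the evaluation of results. It also sheds light on the challenges of using LLMs for keyword extraction, including model complexity, resource demands, and optimization techniques.

Updated: 2024-04-02 22:04:51

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.02330v1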

Heat Death of Generative Models in Closed-Loop Learning

Improvement and adoption of generative machine learning models is rapidly accelerating, as exemplified by the popularity of LLMs (Large Language Models) for text, and diffusion models for image generation. As generative models become widespread, data they generate is incorporated into shared content through the public web. This opens the question of what happens when data generated by a model is fed back to the model in subsequent training campaigns. This is a question about the stability of the training process, whether the distribution of publicly accessible content, which we refer to as "knowledge", remains stable or collapses. Small scale empirical experiments reported in the literature show that this closed-loop training process is prone to degenerating. Models may start producing gibberish data, or sample from only a small subset of the desired data distribution (a phenomenon referred to as mode collapse). So far there has been only limited theoretical understanding of this process, in part due to the complexity of the deep networks underlying these generative models. The aim of this paper is to provide insights into this process (that we refer to as "generative closed-loop learning") by studying the learning dynamics of generative models that are fed back their own produced content in addition to their original training dataset. The sampling of many of these models can be controlled via a "temperature" parameter. Using dynamical systems tools, we show that, unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature leads the model to asymptotically degenerate. In fact, either the generative distribution collapses to a small set of outputs, or becomes uniform over a large set of outputs.
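
The two degenerate outcomes described above can be reproduced in a toy setting. The sketch below is an idealization assumed here for illustration and is not the paper's model: the "model" is a categorical distribution, sampling at temperature T raises each probability to the power 1/T and renormalizes, and retraining is assumed to recover the sampled distribution exactly. The names `sharpen`, `closed_loop`, and `external_fraction` are hypothetical.

```python
def sharpen(p, temperature):
    """Temperature-scaled distribution: q_i proportional to p_i ** (1 / T)."""
    weights = [x ** (1.0 / temperature) for x in p]
    total = sum(weights)
    return [w / total for w in weights]


def closed_loop(p0, temperature, external_fraction=0.0, steps=200):
    """Iterate p <- (1 - lam) * sharpen(p, T) + lam * p0, with lam = external_fraction.

    p0 plays the role of the original (external) data distribution.
    """
    p = list(p0)
    for _ in range(steps):
        q = sharpen(p, temperature)
        p = [(1 - external_fraction) * qi + external_fraction * di
             for qi, di in zip(q, p0)]
    return p


p0 = [0.5, 0.3, 0.2]
cold = closed_loop(p0, temperature=0.8)    # T < 1: mass concentrates on one output
hot = closed_loop(p0, temperature=1.2)     # T > 1: distribution flattens to uniform
mixed = closed_loop(p0, temperature=0.8, external_fraction=0.5)

print([round(x, 3) for x in cold])
print([round(x, 3) for x in hot])
print([round(x, 3) for x in mixed])
```

In this toy iteration, only T = 1 leaves the distribution unchanged when no external data is mixed in. Mixing in the original distribution at every step keeps each outcome's probability bounded below by `external_fraction * p0[i]`. How much external data counts as sufficient in the paper's setting is not something this sketch addresses.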

Updated: 2024-04-02 21:51:39

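Title: Heat Death of Generative Models in Closed-Loop Learning

Abstract: The improvement and adoption of generative machine learning models is accelerating rapidly, as shown by the popularity of LLMs (large language models) for text and diffusion models for image generation. As generative models become widespread, the data they generate is incorporated into public web content. This raises the question of what happens when data generated by a model is fed back to the model in subsequent training campaigns. It is a question about the stability of the training process: whether the distribution of publicly accessible content, which we call "knowledge", remains stable or collapses. Small-scale empirical experiments reported in the literature show that this closed-loop training process is prone to degeneration. Models may start producing meaningless data, or sample from only a small part of the desired data distribution (a phenomenon known as mode collapse). So far there has been only limited theoretical understanding of this process, owing to the complexity of the deep networks underlying these generative models. The aim of this paper is to provide insight into this process (which we call "generative closed-loop learning") by studying the learning dynamics of generative models that are fed their own generated content in addition to the original training dataset. The sampling of many of these models can be controlled via a "temperature" parameter. Using dynamical systems tools, we show that unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature causes the model to degenerate asymptotically. In fact, the generative distribution either collapses to a small set of outputs or becomes uniform over a large set of outputs.

Updated: 2024-04-02 21:51:39

Categories: cs.LG

Download: http://arxiv.org/abs/2404.02325v1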

Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization

Large language models (LLMs) can now handle longer and more complex inputs, which facilitate the use of more elaborate prompts. However, prompts often require some tuning to improve performance for deployment. Recent work has proposed automatic prompt optimization methods, but as prompt complexity and LLM strength increase, many prompt optimization techniques are no longer sufficient and a new approach is needed to optimize metaprompt programs. To address this, we introduce SAMMO, a framework for compile-time optimizations of metaprompt programs, which represents prompts as structured objects that allow for a rich set of transformations that can be searched over during optimization. We show that SAMMO generalizes previous methods and improves the performance of complex prompts on (1) instruction tuning, (2) RAG pipeline tuning, and (3) prompt compression, across several different LLMs. We make all code available open-source at https://github.com/microsoft/sammo.

Updated: 2024-04-02 21:35:54

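Title: Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization

Abstract: Large language models (LLMs) can now handle longer and more complex inputs, which makes it easier to use more elaborate prompts. However, prompts usually need some tuning to improve deployment performance. Recent work has proposed automatic prompt optimization methods, but as prompt complexity and LLM strength increase, many prompt optimization techniques are no longer sufficient, and a new approach is needed to optimize metaprompt programs. To address this, we introduce SAMMO, a framework for compile-time optimization of metaprompt programs. It represents prompts as structured objects, which allows a rich set of transformations to be searched over during optimization. We show that SAMMO generalizes previous methods and improves the performance of complex prompts across several different LLMs on instruction tuning, RAG pipeline tuning, and prompt compression. We release all code as open source at https://github.com/microsoft/sammo.

Updated: 2024-04-02 21:35:54

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.02319v1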

Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications

Recent literature has suggested the potential of using large language models (LLMs) to make classifications for tabular tasks. However, LLMs have been shown to exhibit harmful social biases that reflect the stereotypes and inequalities present in society. Given this, and given the widespread use of tabular data in many high-stakes applications, it is important to explore the following questions: what sources of information do LLMs draw upon when making classifications for tabular tasks; whether and to what extent are LLM classifications for tabular data influenced by social biases and stereotypes; and what are the consequential implications for fairness? Through a series of experiments, we delve into these questions and show that LLMs tend to inherit social biases from their training data which significantly impact their fairness in tabular classification tasks. Furthermore, our investigations show that in the context of bias mitigation, though in-context learning and finetuning have a moderate effect, the fairness metric gap between different subgroups is still larger than that in traditional machine learning models, such as Random Forest and shallow Neural Networks. This observation emphasizes that the social biases are inherent within the LLMs themselves and inherited from their pretraining corpus, not only from the downstream task datasets. In addition, we demonstrate that label-flipping of in-context examples can significantly reduce biases, further highlighting the presence of inherent bias within LLMs.

Updated: 2024-04-02 21:29:20

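Title: Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications

Abstract: Recent literature suggests that large language models (LLMs) have potential for classification on tabular tasks. However, LLMs have been shown to carry harmful social biases that reflect societal stereotypes and inequalities. For this reason, and because tabular data is widely used in many high-stakes applications, it is important to explore the following questions: which sources of information do LLMs rely on when classifying tabular tasks; whether, and to what extent, LLM classifications of tabular data are influenced by social biases and stereotypes; and what the consequences are for fairness. Through a series of experiments, we examine these questions and show that LLMs tend to inherit social biases from their training data, which significantly affects their fairness in tabular classification tasks. Furthermore, our investigation shows that, in the context of bias mitigation, although in-context learning and fine-tuning have some effect, the fairness metric gap between subgroups remains larger than for traditional machine learning models such as random forests and shallow neural networks. This observation emphasizes that social biases are inherent in the LLMs themselves and inherited from their pretraining corpus, not only from the downstream task datasets. In addition, we show that flipping the labels of in-context examples can significantly reduce bias, further highlighting the presence of inherent bias in LLMs.

Updated: 2024-04-02 21:29:20

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2310.14607v2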

Time Series Analysis in Compressor-Based Machines: A Survey

In both industrial and residential contexts, compressor-based machines, such as refrigerators, HVAC systems, heat pumps and chillers, are essential to fulfil production and consumers' needs. The diffusion of sensors and IoT connectivity supports the development of monitoring systems that can detect and predict faults, identify behavioural shifts and forecast the operational status of machines and their components. The focus of this paper is to survey the recent research on tasks such as fault detection (FD), fault prediction (FP), forecasting and change point detection (CPD) applied to multivariate time series characterizing the operations of compressor-based machines. These tasks play a critical role in improving the efficiency and longevity of machines by minimizing downtime and maintenance costs and improving energy efficiency. Specifically, FD detects and diagnoses faults, FP predicts such occurrences, forecasting anticipates the future value of characteristic variables of machines and CPD identifies significant variations in the behaviour of the appliances, such as a change in the working regime. We identify and classify the approaches to the tasks mentioned above, compare the algorithms employed, highlight the gaps in the current state of the art and discuss the most promising future research directions in the field.

Updated: 2024-04-02 21:26:01

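Title: Time Series Analysis in Compressor-Based Machines: A Survey

Abstract: In both industrial and residential settings, compressor-based machines such as refrigerators, HVAC systems, heat pumps, and chillers are essential to meeting production and consumer needs. The spread of sensors and IoT connectivity supports the development of monitoring systems that can detect and predict faults, identify behavioural changes, and forecast the operating status of machines and their components. The focus of this paper is to survey recent research on tasks such as FD, FP, forecasting, and CPD applied to multivariate time series that characterize the operation of compressor-based machines. These tasks play a key role in improving the efficiency and lifetime of machines by minimizing downtime and maintenance costs and by improving energy efficiency. Specifically, FD detects and diagnoses faults, FP predicts such events, forecasting anticipates the future values of a machine's characteristic variables, and CPD identifies significant changes in the behaviour of the appliance, such as a change in working regime. We identify and classify the approaches to these tasks, compare the algorithms employed, highlight the gaps in the current state of the art, and discuss the most promising future research directions in the field.

Updated: 2024-04-02 21:26:01

Categories: cs.LG

Download: http://arxiv.org/abs/2402.17802v2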

Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM

Testing plays a pivotal role in ensuring software quality, yet conventional Search Based Software Testing (SBST) methods often struggle with complex software units, achieving suboptimal test coverage. Recent works using large language models (LLMs) for test generation have focused on improving generation quality through optimizing the test generation context and correcting errors in model outputs, but use fixed prompting strategies that prompt the model to generate tests without additional guidance. As a result, LLM-generated test suites still suffer from low coverage. In this paper, we present SymPrompt, a code-aware prompting strategy for LLMs in test generation. SymPrompt's approach is based on recent work that demonstrates LLMs can solve more complex logical problems when prompted to reason about the problem in a multi-step fashion. We apply this methodology to test generation by deconstructing the test suite generation process into a multi-stage sequence, each stage of which is driven by a specific prompt aligned with the execution paths of the method under test, and exposing relevant type and dependency focal context to the model. Our approach enables pretrained LLMs to generate more complete test cases without any additional training. We implement SymPrompt using the TreeSitter parsing framework and evaluate it on a benchmark of challenging methods from open source Python projects. SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2. Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.

Updated: 2024-04-02 21:23:03

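Title: Code-Aware Prompting: A Study of Coverage-Guided Test Generation in a Regression Setting Using LLMs

Abstract: Testing plays a key role in ensuring software quality, yet conventional search-based software testing (SBST) methods often struggle with complex software units and achieve poor test coverage. Recent research on using large language models (LLMs) for test generation has focused on improving generation quality by optimizing the test generation context and correcting errors in model outputs, but it uses fixed prompting strategies that ask the model to generate tests without additional guidance. As a result, LLM-generated test suites still suffer from low coverage. In this paper, we present SymPrompt, a code-aware prompting strategy for LLMs in test generation. SymPrompt builds on recent research showing that LLMs can solve more complex logical problems when prompted to reason about a problem in multiple steps. We apply this approach to test generation by splitting the test suite generation process into a multi-stage sequence in which each stage is driven by a specific prompt aligned with the execution paths of the method under test, and by exposing relevant type and dependency context to the model. Our approach enables pretrained LLMs to generate more complete test cases without additional training. We implement SymPrompt using the TreeSitter parsing framework and evaluate it on a benchmark of challenging methods from open-source Python projects. SymPrompt increases correct test generations by a factor of 5 and raises relative coverage by 26% for CodeGen2. Notably, when applied to GPT-4, SymPrompt improves coverage by more than 2x compared with baseline prompting strategies.

Updated: 2024-04-02 21:23:03

Categories: cs.SE,cs.LG

Download: http://arxiv.org/abs/2402.00097v2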

Is Meta-training Really Necessary for Molecular Few-Shot Learning?

Few-shot learning has recently attracted significant interest in drug discovery, with a recent, fast-growing literature mostly involving convoluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data, and propose a regularized quadratic-probe loss based on the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which avoids the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach achieves highly competitive performance in comparison to state-of-the-art methods, while being applicable to black-box settings and removing the need for specific episodic pre-training strategies. Furthermore, we introduce a new benchmark to assess the robustness of the competing methods to domain shifts. In this setting, our fine-tuning baseline obtains consistently better results than meta-learning methods.

Updated: 2024-04-02 21:20:51

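Title: Is Meta-training Really Necessary for Molecular Few-Shot Learning?

Abstract: Few-shot learning has recently attracted wide interest in drug discovery, with a fast-growing literature that mostly involves complicated meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data and propose a regularized quadratic-probe loss based on the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer that avoids the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach is highly competitive with state-of-the-art methods, while also being applicable to black-box settings and removing the need for specific episodic pre-training strategies. In addition, we introduce a new benchmark to assess the robustness of competing methods to domain shift. In this setting, our fine-tuning baseline consistently obtains better results than meta-learning methods.

Updated: 2024-04-02 21:20:51

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.02314v1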

Will My Robot Achieve My Goals? Predicting the Probability that an MDP Policy Reaches a User-Specified Behavior Target

As an autonomous system performs a task, it should maintain a calibrated estimate of the probability that it will achieve the user's goal. If that probability falls below some desired level, it should alert the user so that appropriate interventions can be made. This paper considers settings where the user's goal is specified as a target interval for a real-valued performance summary, such as the cumulative reward, measured at a fixed horizon $H$. At each time $t \in \{0, \ldots, H-1\}$, our method produces a calibrated estimate of the probability that the final cumulative reward will fall within a user-specified target interval $[y^-,y^+]$. Using this estimate, the autonomous system can raise an alarm if the probability drops below a specified threshold. We compute the probability estimates by inverting conformal prediction. Our starting point is the Conformalized Quantile Regression (CQR) method of Romano et al., which applies split-conformal prediction to the results of quantile regression. CQR is not invertible, but by using the conditional cumulative distribution function (CDF) as the non-conformity measure, we show how to obtain an invertible modification that we call Probability-space Conformalized Quantile Regression (PCQR). Like CQR, PCQR produces well-calibrated conditional prediction intervals with finite-sample marginal guarantees. By inverting PCQR, we obtain guarantees for the probability that the cumulative reward of an autonomous system will fall below a threshold sampled from the marginal distribution of the response variable (i.e., a calibrated CDF estimate) that we employ to predict coverage probabilities for user-specified target intervals. Experiments on two domains confirm that these probabilities are well-calibrated.
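
The split-conformal step that CQR builds on can be sketched in a few lines. This is a minimal illustration that uses absolute residuals as the non-conformity score, an assumption made here for brevity. It is neither CQR nor the PCQR method of the abstract, and the function names and numbers are hypothetical.

```python
import math


def conformal_radius(calibration_residuals, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest absolute residual.

    Using this finite-sample-corrected quantile gives intervals with marginal
    coverage of at least 1 - alpha when calibration and test points are
    exchangeable.
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for the requested alpha
    return scores[rank - 1]


def prediction_interval(point_prediction, radius):
    """Symmetric interval around a point prediction."""
    return (point_prediction - radius, point_prediction + radius)


# Made-up residuals (true value minus prediction) on a held-out calibration set.
residuals = [0.3, -0.1, 0.5, -0.2, 0.7, 0.4, -0.6]
radius = conformal_radius(residuals, alpha=0.25)
low, high = prediction_interval(10.0, radius)
print(radius, low, high)
```

The abstract's method goes in the opposite direction: rather than fixing a coverage level and reading off an interval, it fixes the interval $[y^-,y^+]$ and asks for the probability, which is why an invertible non-conformity measure is needed.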

Updated: 2024-04-02 21:15:23

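Title: Will My Robot Achieve My Goals? Predicting the Probability that an MDP Policy Reaches a User-Specified Behavior Target

Abstract: As an autonomous system performs a task, it should maintain a calibrated estimate of the probability of achieving the user's goal. If that probability falls below some desired level, it should alert the user so that appropriate interventions can be made. This paper considers settings where the user's goal is specified as a target interval for a real-valued performance summary, such as the cumulative reward measured at a fixed horizon $H$. At each time $t \in \{0, \ldots, H-1\}$, our method produces a calibrated estimate of the probability that the final cumulative reward will fall within the user-specified target interval $[y^-,y^+]$. Using this estimate, the autonomous system can raise an alarm when the probability drops below a specified threshold. We compute the probability estimates by inverting conformal prediction. Our starting point is the Conformalized Quantile Regression (CQR) method of Romano et al., which applies split-conformal prediction to the results of quantile regression. CQR is not invertible, but by using the conditional cumulative distribution function (CDF) as the non-conformity measure, we show how to obtain an invertible modification that we call Probability-space Conformalized Quantile Regression (PCQR). Like CQR, PCQR produces well-calibrated conditional prediction intervals with finite-sample marginal guarantees. By inverting PCQR, we obtain guarantees for the probability that the cumulative reward of an autonomous system falls below a threshold sampled from the marginal distribution of the response variable (that is, a calibrated CDF estimate), which we use to predict coverage probabilities for user-specified target intervals. Experiments in two domains confirm that these probabilities are well calibrated.

Updated: 2024-04-02 21:15:23

Categories: cs.LG,stat.ME,stat.ML

Download: http://arxiv.org/abs/2211.16462v2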

Collapse of Self-trained Language Models

In various fields of knowledge creation, including science, new ideas often build on pre-existing information. In this work, we explore this concept within the context of language models. Specifically, we explore the potential of self-training models on their own outputs, akin to how humans learn and build on their previous thoughts and actions. While this approach is intuitively appealing, our research reveals its practical limitations. We find that extended self-training of the GPT-2 model leads to a significant degradation in performance, resulting in repetitive and collapsed token output.

Updated: 2024-04-02 21:03:37

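Title: Collapse of Self-trained Language Models

Abstract: In many fields of knowledge creation, including science, new ideas usually build on existing information. In this work, we explore this concept in the context of language models. Specifically, we explore the potential of self-training models on their own outputs, similar to how humans learn and build on their previous thoughts and actions. Although this approach is intuitively appealing, our research reveals its practical limitations. We find that extended self-training of the GPT-2 model leads to a significant drop in performance, resulting in repetitive and collapsed token output.

Updated: 2024-04-02 21:03:37

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.02305v1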

Virtual Sensor for Real-Time Bearing Load Prediction Using Heterogeneous Temporal Graph Neural Networks

Accurate bearing load monitoring is essential for the Prognostics and Health Management (PHM) of bearings, enabling damage assessment, wear prediction, and proactive maintenance. While bearing sensors are typically placed on the bearing housing, direct load monitoring requires sensors inside the bearing itself. Recently introduced sensor rollers enable direct bearing load monitoring but are constrained by their battery life. Data-driven virtual sensors can learn from sensor roller data collected during a battery's lifetime to map operating conditions to bearing loads. Although spatially distributed bearing sensors offer insights into load distribution (e.g., correlating temperature with load), traditional machine learning algorithms struggle to fully exploit these spatial-temporal dependencies. To address this gap, we introduce a graph-based virtual sensor that leverages Graph Neural Networks (GNNs) to analyze spatial-temporal dependencies among sensor signals, mapping existing measurements (temperature, vibration) to bearing loads. Since temperature and vibration signals exhibit vastly different dynamics, we propose Heterogeneous Temporal Graph Neural Networks (HTGNN), which explicitly models these signal types and their interactions for effective load prediction. Our results demonstrate that HTGNN outperforms Convolutional Neural Networks (CNNs), which struggle to capture both spatial and heterogeneous signal characteristics. These findings highlight the importance of capturing the complex spatial interactions between temperature, vibration, and load.

Updated: 2024-04-02 21:03:17

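Title: Virtual Sensor for Real-Time Bearing Load Prediction Using Heterogeneous Temporal Graph Neural Networks

Abstract: Accurate bearing load monitoring is essential for bearing prognostics and health management (PHM), enabling damage assessment, wear prediction, and proactive maintenance. Although bearing sensors are usually placed on the bearing housing, direct load monitoring requires sensors inside the bearing. Recently introduced sensor rollers enable direct bearing load monitoring but are limited by their battery life. Data-driven virtual sensors can use sensor roller data collected during the battery's lifetime to map operating conditions to bearing loads. Although spatially distributed bearing sensors offer insight into load distribution (for example, by correlating temperature with load), traditional machine learning algorithms often struggle to fully exploit these spatial-temporal dependencies. To address this, we introduce a graph-based virtual sensor that uses graph neural networks (GNNs) to analyze the spatial-temporal dependencies among sensor signals and map existing measurements (temperature, vibration) to bearing loads. Because temperature and vibration signals exhibit very different dynamics, we propose Heterogeneous Temporal Graph Neural Networks (HTGNN), which explicitly model these signal types and their interactions for effective load prediction. Our results show that HTGNN outperforms convolutional neural networks (CNNs), which struggle to capture both spatial and heterogeneous signal characteristics. These findings highlight the importance of capturing the complex spatial interactions between temperature, vibration, and load.

Updated: 2024-04-02 21:03:17

Categories: cs.LG,cs.AI,cs.ET

Download: http://arxiv.org/abs/2404.02304v1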

CATGNN: Cost-Efficient and Scalable Distributed Training for Graph Neural Networks

Graph neural networks have proven successful in recent years. While different GNN architectures and training systems have been developed, GNN training on large-scale real-world graphs still remains challenging. Existing distributed systems load the entire graph in memory for graph partitioning, requiring a huge memory space to process large graphs and thus hindering GNN training on such large graphs using commodity workstations. In this paper, we propose CATGNN, a cost-efficient and scalable distributed GNN training system which focuses on scaling GNN training to billion-scale or larger graphs under limited computational resources. Among other features, it takes a stream of edges as input, instead of loading the entire graph in memory, for partitioning. We also propose a novel streaming partitioning algorithm named SPRING for distributed GNN training. We verify the correctness and effectiveness of CATGNN with SPRING on 16 open datasets. In particular, we demonstrate that CATGNN can handle the largest publicly available dataset with limited memory, which would have been infeasible without increasing the memory space. SPRING also outperforms state-of-the-art partitioning algorithms significantly, with a 50% reduction in replication factor on average.
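
To make the ideas of streaming partitioning and replication factor concrete, here is a minimal sketch of a generic greedy edge-stream partitioner. It is not SPRING, whose details the abstract does not give. The greedy rule, the `capacity` balance limit, the function names, and the definition of replication factor as the average number of partitions holding a copy of each vertex are all assumptions made here for illustration.

```python
from collections import defaultdict


def stream_partition(edges, num_parts, capacity):
    """Assign each streamed edge to a partition in a single pass.

    Only per-vertex placement sets and per-partition edge counts are kept;
    the edge list itself is never stored by the partitioner.
    """
    placed = defaultdict(set)   # vertex -> partitions that hold a copy of it
    load = [0] * num_parts      # number of edges assigned to each partition
    assignment = []
    for u, v in edges:
        open_parts = {p for p in range(num_parts) if load[p] < capacity}
        if not open_parts:
            raise ValueError("capacity too small for the number of edges")
        both = placed[u] & placed[v] & open_parts
        either = (placed[u] | placed[v]) & open_parts
        # Prefer partitions that already hold both endpoints, then one endpoint.
        candidates = both or either or open_parts
        part = min(candidates, key=lambda p: (load[p], p))
        placed[u].add(part)
        placed[v].add(part)
        load[part] += 1
        assignment.append(part)
    return assignment, placed


def replication_factor(placed):
    """Average number of partitions in which a vertex appears."""
    return sum(len(parts) for parts in placed.values()) / len(placed)


# Two triangles joined by one bridge edge, streamed in order.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
assignment, placed = stream_partition(edges, num_parts=2, capacity=4)
rf = replication_factor(placed)
print(assignment, round(rf, 3))
```

In this example only vertex 3 is copied to both partitions, so the replication factor is 7/6. A lower replication factor means fewer vertex copies to keep synchronized during distributed training, which is why it is used as a quality measure for partitioners.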

Updated: 2024-04-02 20:55:39

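Title: CATGNN: Cost-Efficient and Scalable Distributed Training for Graph Neural Networks

Abstract: Graph neural networks have proven successful in recent years. Although different GNN architectures and training systems have been developed, training GNNs on large-scale real-world graphs remains challenging. Existing distributed systems need to load the entire graph into memory for graph partitioning, which requires a huge amount of memory to process large graphs and thus hinders GNN training on such graphs using commodity workstations. In this paper, we propose CATGNN, a cost-efficient and scalable distributed GNN training system that focuses on scaling GNN training to billion-scale or larger graphs under limited computational resources. Among other features, it takes a stream of edges as input for partitioning instead of loading the entire graph into memory. We also propose a new streaming partitioning algorithm named SPRING for distributed GNN training. We verify the correctness and effectiveness of CATGNN with SPRING on 16 open datasets. In particular, we show that CATGNN can handle the largest publicly available dataset with limited memory, which would be infeasible without increasing the memory space. SPRING also significantly outperforms state-of-the-art partitioning algorithms, reducing the replication factor by 50% on average.

Updated: 2024-04-02 20:55:39

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2404.02300v1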

CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora

Text-to-image retrieval aims to find the relevant images based on a text query, which is important in various use-cases, such as digital libraries, e-commerce, and multimedia databases. Although Multimodal Large Language Models (MLLMs) demonstrate state-of-the-art performance, they exhibit limitations in handling large-scale, diverse, and ambiguous real-world needs of retrieval, due to the computation cost and the injective embeddings they produce. This paper presents a two-stage Coarse-to-Fine Index-shared Retrieval (CFIR) framework, designed for fast and effective large-scale long-text to image retrieval. The first stage, Entity-based Ranking (ER), adapts to long-text query ambiguity by employing a multiple-queries-to-multiple-targets paradigm, facilitating candidate filtering for the next stage. The second stage, Summary-based Re-ranking (SR), refines these rankings using summarized queries. We also propose a specialized Decoupling-BEiT-3 encoder, optimized for handling ambiguous user needs and both stages, which also enhances computational efficiency through vector-based similarity inference. Evaluation on the AToMiC dataset reveals that CFIR surpasses existing MLLMs by up to 11.06% in Recall@1000, while reducing training and retrieval times by 68.75% and 99.79%, respectively. We will release our code to facilitate future research at https://github.com/longkukuhi/CFIR.

Updated: 2024-04-02 20:54:46

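Title: CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora

Abstract: Text-to-image retrieval aims to find relevant images based on a text query, which is important in use cases such as digital libraries, e-commerce, and multimedia databases. Although Multimodal Large Language Models (MLLMs) show state-of-the-art performance, they are limited in handling large-scale, diverse, and ambiguous real-world retrieval needs because of their computation cost and the injective embeddings they produce. This paper presents a two-stage Coarse-to-Fine Index-shared Retrieval (CFIR) framework designed for fast and effective large-scale long-text to image retrieval. The first stage, Entity-based Ranking (ER), adapts to the ambiguity of long-text queries by using a multiple-queries-to-multiple-targets paradigm, which facilitates candidate filtering for the next stage. The second stage, Summary-based Re-ranking (SR), refines these rankings using summarized queries. We also propose a specialized Decoupling-BEiT-3 encoder, optimized for handling ambiguous user needs and for both stages, which also improves computational efficiency through vector-based similarity inference. Evaluation on the AToMiC dataset shows that CFIR surpasses existing MLLMs by up to 11.06% in Recall@1000, while reducing training and retrieval times by 68.75% and 99.79%, respectively. We will release our code to facilitate future research at https://github.com/longkukuhi/CFIR.

Updated: 2024-04-02 20:54:46

Categories: cs.IR,cs.AI,cs.CV

Download: http://arxiv.org/abs/2402.15276v3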

Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations

This paper addresses learning safe output feedback control laws from partial observations of expert demonstrations. We assume that a model of the system dynamics and a state estimator are available along with corresponding error bounds, e.g., estimated from data in practice. We first propose robust output control barrier functions (ROCBFs) as a means to guarantee safety, as defined through controlled forward invariance of a safe set. We then formulate an optimization problem to learn ROCBFs from expert demonstrations that exhibit safe system behavior, e.g., data collected from a human operator or an expert controller. When the parametrization of the ROCBF is linear, we show that, under mild assumptions, the optimization problem is convex. Along with the optimization problem, we provide verifiable conditions in terms of the density of the data, smoothness of the system model and state estimator, and the size of the error bounds that guarantee validity of the obtained ROCBF. Towards obtaining a practical control algorithm, we propose an algorithmic implementation of our theoretical framework that accounts, in practice, for the assumptions made in the framework. We validate our algorithm in the autonomous driving simulator CARLA and demonstrate how to learn safe control laws from simulated RGB camera images.

Updated: 2024-04-02 20:54:46

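Title: Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations

Abstract: This paper addresses the problem of learning safe output feedback control laws from partially observed expert demonstrations. We assume that a model of the system dynamics and a state estimator are available, together with corresponding error bounds, for example estimated from data in practice. We first propose robust output control barrier functions (ROCBFs) as a means of guaranteeing safety, where safety is defined through controlled forward invariance of a safe set. We then formulate an optimization problem for learning ROCBFs from expert demonstrations that exhibit safe system behavior, such as data collected from a human operator or an expert controller. When the parametrization of the ROCBF is linear, we show that the optimization problem is convex under mild assumptions. Alongside the optimization problem, we provide verifiable conditions, in terms of the density of the data, the smoothness of the system model and state estimator, and the size of the error bounds, that guarantee the validity of the obtained ROCBF. To obtain a practical control algorithm, we propose an algorithmic implementation of our theoretical framework that takes into account the assumptions we make in practice. We validate our algorithm in the autonomous driving simulator CARLA and show how to learn safe control laws from simulated RGB camera images.

Updated: 2024-04-02 20:54:46

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2111.09971v3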

BioImage.IO Chatbot: A Community-Driven AI Assistant for Advanced Bioimage Analysis and Tool Integration

We introduce the BioImage.IO Chatbot, an AI assistant underpinned by Large Language Models and enriched by a community-driven knowledge base and tools. It facilitates customized interactions across a spectrum of user requirements via a flexible extension mechanism, from data retrieval to AI-enhanced analysis. Adhering to open-source values, the chatbot is in constant development with input from the bioimage community, improving its dependability and collaboratively tackling AI-related challenges. This tool streamlines the exploration of the complex bioimage analysis landscape, enabling life sciences to advance by harnessing the collective ingenuity of its community.

Updated: 2024-04-02 20:48:39

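Title: BioImage.IO Chatbot: A Community-Driven AI Assistant for Advanced Bioimage Analysis and Tool Integration

Abstract: We introduce the BioImage.IO Chatbot, an AI assistant backed by large language models and enriched by a community-driven knowledge base and tools. Through a flexible extension mechanism, it supports customized interactions across a range of user needs, from data retrieval to AI-enhanced analysis. In keeping with open-source values, the chatbot is under continuous development with input from the bioimage community, which improves its reliability and allows AI-related challenges to be tackled collaboratively. The tool simplifies exploration of the complex bioimage analysis landscape and advances the life sciences by drawing on the collective ingenuity of the community.

Updated: 2024-04-02 20:48:39

Categories: cs.AI,q-bio.QM

Download: http://arxiv.org/abs/2310.18351v4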

Constrained Robotic Navigation on Preferred Terrains Using LLMs and Speech Instruction: Exploiting the Power of Adverbs

This paper explores leveraging large language models for map-free off-road navigation using generative AI, reducing the need for traditional data collection and annotation. We propose a method where a robot receives verbal instructions, converted to text through Whisper, and a large language model (LLM) extracts landmarks, preferred terrains, and crucial adverbs translated into speed settings for constrained navigation. A language-driven semantic segmentation model generates text-based masks for identifying landmarks and terrain types in images. By translating 2D image points to the vehicle's motion plane using camera parameters, an MPC controller can guide the vehicle towards the desired terrain. This approach enhances adaptation to diverse environments and facilitates the use of high-level instructions for navigating complex and challenging terrains.

Updated: 2024-04-02 20:46:13

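Title: Constrained Robotic Navigation on Preferred Terrains Using LLMs and Speech Instruction: Exploiting the Power of Adverbs

Abstract: This paper explores using large language models for map-free off-road navigation with generative AI, reducing the need for traditional data collection and annotation. We propose a method in which a robot receives verbal instructions that are converted to text through Whisper, and a large language model (LLM) extracts landmarks, preferred terrains, and key adverbs, which are translated into speed settings for constrained navigation. A language-driven semantic segmentation model generates text-based masks for identifying landmarks and terrain types in images. By using camera parameters to transform 2D image points onto the vehicle's motion plane, an MPC controller can guide the vehicle towards the desired terrain. This approach improves adaptation to diverse environments and makes it easier to use high-level instructions for navigating complex and challenging terrain.

Updated: 2024-04-02 20:46:13

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2404.02294v1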

MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning

Text-to-image diffusion models allow seamless generation of personalized images from scant reference photos. Yet, these tools, in the wrong hands, can fabricate misleading or harmful content, endangering individuals. To address this problem, existing poisoning-based approaches perturb user images in an imperceptible way to render them "unlearnable" from malicious uses. We identify two limitations of these defending approaches: i) sub-optimal due to the hand-crafted heuristics for solving the intractable bilevel optimization and ii) lack of robustness against simple data transformations like Gaussian filtering. To solve these challenges, we propose MetaCloak, which solves the bi-level poisoning problem with a meta-learning framework with an additional transformation sampling process to craft transferable and robust perturbation. Specifically, we employ a pool of surrogate diffusion models to craft transferable and model-agnostic perturbation. Furthermore, by incorporating an additional transformation process, we design a simple denoising-error maximization loss that is sufficient for causing transformation-robust semantic distortion and degradation in a personalized generation. Extensive experiments on the VGGFace2 and CelebA-HQ datasets show that MetaCloak outperforms existing approaches. Notably, MetaCloak can successfully fool online training services like Replicate, in a black-box manner, demonstrating the effectiveness of MetaCloak in real-world scenarios. Our code is available at https://github.com/liuyixin-louis/MetaCloak.

Updated: 2024-04-02 20:42:51

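Title: MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning

Abstract: Text-to-image diffusion models allow personalized images to be generated seamlessly from a handful of reference photos. In the wrong hands, however, these tools can produce misleading or harmful content that endangers individuals. To address this problem, existing poisoning-based approaches perturb user images imperceptibly so that they become "unlearnable" for malicious uses. We identify two limitations of these defenses: i) they are sub-optimal because of the hand-crafted heuristics used to solve the intractable bilevel optimization, and ii) they lack robustness against simple data transformations such as Gaussian filtering. To solve these challenges, we propose MetaCloak, which solves the bilevel poisoning problem with a meta-learning framework and an additional transformation sampling process to craft transferable and robust perturbations. Specifically, we use a pool of surrogate diffusion models to craft transferable, model-agnostic perturbations. Furthermore, by incorporating an additional transformation process, we design a simple denoising-error maximization loss that is sufficient to cause transformation-robust semantic distortion and degradation in personalized generation. Extensive experiments on the VGGFace2 and CelebA-HQ datasets show that MetaCloak outperforms existing approaches. Notably, MetaCloak can successfully fool online training services such as Replicate in a black-box manner, demonstrating its effectiveness in real-world scenarios. Our code is available at https://github.com/liuyixin-louis/MetaCloak.

Updated: 2024-04-02 20:42:51

Categories: cs.CV,cs.AI,cs.CR

Download: http://arxiv.org/abs/2311.13127v3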

Towards a New Configurable and Practical Remote Automotive Security Testing Platform

In the automotive security sector, the absence of a testing platform that is configurable, practical, and user-friendly presents considerable challenges. These difficulties are compounded by the intricate design of vehicle systems, the rapid evolution of attack vectors, and the absence of standardized testing methodologies. We propose a next-generation testing platform that addresses several challenges in vehicle cybersecurity testing and research domains. In this paper, we detail how the Vehicle Security Engineering Cloud (VSEC) Test platform enables easier access to test beds for efficient vehicle cybersecurity testing and advanced (e.g., penetration, fuzz) testing and how we extend such test beds to benefit automotive security research. We highlight methodology on how to use this platform for a variety of users and use cases with real implemented examples.

Updated: 2024-04-02 20:40:12

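Title: Towards a New Configurable and Practical Remote Automotive Security Testing Platform

Abstract: In the automotive security sector, the lack of a testing platform that is configurable, practical, and user-friendly poses considerable challenges. These difficulties are compounded by the complex design of vehicle systems, the rapid evolution of attack vectors, and the absence of standardized testing methodologies. We propose a next-generation testing platform that addresses several challenges in vehicle cybersecurity testing and research. In this paper, we describe in detail how the Vehicle Security Engineering Cloud (VSEC) Test platform makes test beds easier to access for efficient vehicle cybersecurity testing and advanced testing (such as penetration and fuzz testing), and how we extend such test beds to benefit automotive security research. We highlight the methodology for using this platform across a variety of users and use cases, with real implemented examples.

Updated: 2024-04-02 20:40:12

Categories: cs.CR,cs.SY,eess.SY

Download: http://arxiv.org/abs/2404.02291v1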

Federated Multi-Agent Mapping for Planetary Exploration

In multi-agent robotic exploration, managing and effectively utilizing the vast, heterogeneous data generated from dynamic environments poses a significant challenge. Federated learning (FL) is a promising approach for distributed mapping, addressing the challenges of decentralized data in collaborative learning. FL enables joint model training across multiple agents without requiring the centralization or sharing of raw data, overcoming bandwidth and storage constraints. Our approach leverages implicit neural mapping, representing maps as continuous functions learned by neural networks, for compact and adaptable representations. We further enhance this approach with meta-initialization on Earth datasets, pre-training the network to quickly learn new map structures. This combination demonstrates strong generalization to diverse domains like Martian terrain and glaciers. We rigorously evaluate this approach, demonstrating its effectiveness for real-world deployment in multi-agent exploration scenarios.
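
The core idea of training a shared model without pooling raw data can be illustrated with federated averaging, one common FL scheme. The abstract does not state which aggregation rule the authors use, so that choice is an assumption here, as are the toy linear model (standing in for the implicit neural map), the learning rate, and all function and variable names.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Gradient descent on one agent's private (x, y) samples for y ~ w * x + b."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Average model parameters, weighting each agent by its sample count."""
    total = sum(sizes)
    return tuple(
        sum(update[i] * n for update, n in zip(updates, sizes)) / total
        for i in range(2)
    )


# Two agents observe different regions of the same underlying map y = 2x + 1.
agent_data = [
    [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)],
    [(1.0, 3.0), (1.5, 4.0), (2.0, 5.0)],
]

global_weights = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_weights, data) for data in agent_data]
    sizes = [len(data) for data in agent_data]
    global_weights = federated_average(updates, sizes)  # only parameters are shared

print(tuple(round(v, 3) for v in global_weights))
```

Each round transmits two numbers per agent instead of the agents' samples, which is the bandwidth and storage advantage the abstract refers to. With a neural map the exchanged object would be the network's parameter vector instead.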

Updated: 2024-04-02 20:32:32

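Title: Federated Multi-Agent Mapping for Planetary Exploration

Abstract: In multi-agent robotic exploration, managing and making effective use of the large volume of heterogeneous data generated in dynamic environments is a significant challenge. Federated learning (FL) is a promising approach to distributed mapping that addresses the challenges of decentralized data in collaborative learning. FL allows joint model training across multiple agents without centralizing or sharing raw data, which overcomes bandwidth and storage constraints. Our approach uses implicit neural mapping, which represents maps as continuous functions learned by neural networks, to obtain compact and adaptable representations. We further enhance this approach with meta-initialization on Earth datasets, pre-training the network so that it learns new map structures quickly. This combination shows strong generalization to diverse domains such as Martian terrain and glaciers. We rigorously evaluate the approach and demonstrate its effectiveness for real-world deployment in multi-agent exploration scenarios.

Updated: 2024-04-02 20:32:32

Categories: cs.RO,cs.LG,cs.MA,I.2.11; I.2.9

Download: http://arxiv.org/abs/2404.02289v1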

One Noise to Rule Them All: Multi-View Adversarial Attacks with Universal Perturbation

This paper presents a novel universal perturbation method for generating robust multi-view adversarial examples in 3D object recognition. Unlike conventional attacks limited to single views, our approach operates on multiple 2D images, offering a practical and scalable solution for enhancing model scalability and robustness. This generalizable method bridges the gap between 2D perturbations and 3D-like attack capabilities, making it suitable for real-world applications. Existing adversarial attacks may become ineffective when images undergo transformations like changes in lighting, camera position, or natural deformations. We address this challenge by crafting a single universal noise perturbation applicable to various object views. Experiments on diverse rendered 3D objects demonstrate the effectiveness of our approach. The universal perturbation successfully identified a single adversarial noise for each given set of 3D object renders from multiple poses and viewpoints. Compared to single-view attacks, our universal attacks lower classification confidence across multiple viewing angles, especially at low noise levels. A sample implementation is made available at https://github.com/memoatwit/UniversalPerturbation.

Updated: 2024-04-02 20:29:59

标题: 一种噪声统治一切：具有通用扰动的多视角对抗攻击

摘要: 这篇论文提出了一种新颖的通用扰动方法，用于生成在3D物体识别中具有强鲁棒性的多视角对抗样本。与传统的仅限于单个视角的攻击不同，我们的方法在多个2D图像上操作，为增强模型的可扩展性和鲁棒性提供了实用且可扩展的解决方案。这种通用方法弥合了2D扰动与类3D攻击能力之间的差距，使其适用于真实世界的应用。现有的对抗攻击在图像经历光照变化、摄像机位置变化或自然变形等转换时可能变得无效。我们通过创建一种适用于各种物体视图的单一通用噪声扰动来应对这一挑战。对各种渲染的3D物体的实验展示了我们方法的有效性。通用扰动成功地确定了每组给定的来自多个姿势和视角的3D物体渲染的单一对抗性噪声。与单视角攻击相比，我们的通用攻击降低了多个视角上的分类置信度，尤其是在低噪声水平下。一个示例实现可在https://github.com/memoatwit/UniversalPerturbation找到。

更新时间: 2024-04-02 20:29:59

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.02287v1

Extracting Norms from Contracts Via ChatGPT: Opportunities and Challenges

We investigate the effectiveness of ChatGPT in extracting norms from contracts. Norms provide a natural way to engineer multiagent systems by capturing how to govern the interactions between two or more autonomous parties. We extract norms of commitment, prohibition, authorization, and power, along with associated norm elements (the parties involved, antecedents, and consequents) from contracts. Our investigation reveals ChatGPT's effectiveness and limitations in norm extraction from contracts. ChatGPT demonstrates promising performance in norm extraction without requiring training or fine-tuning, thus obviating the need for annotated data, which is not generally available in this domain. However, we found some limitations of ChatGPT in extracting these norms that lead to incorrect norm extractions. The limitations include oversight of crucial details, hallucination, incorrect parsing of conjunctions, and empty norm elements. Enhanced norm extraction from contracts can foster the development of more transparent and trustworthy formal agent interaction specifications, thereby contributing to the improvement of multiagent systems.

Updated: 2024-04-02 19:49:34

标题: 通过ChatGPT从合同中提取规范：机遇与挑战

摘要: 我们研究了ChatGPT在从合同中提取规范的有效性。规范提供了一种自然的方式来设计多智能体系统，捕捉如何管理两个或更多自主方之间的互动。我们从合同中提取了承诺、禁止、授权和权力等规范，以及相关的规范元素（参与方、前提和后果）。我们的研究揭示了ChatGPT在从合同中提取规范方面的有效性和局限性。ChatGPT展现出了在提取规范方面的有希望的表现，而无需训练或微调，因此避免了对一般情况下在此领域不易获得的标注数据的需求。然而，我们发现了ChatGPT在提取这些规范方面的一些局限性，导致错误的规范提取。这些局限性包括对关键细节的忽视、幻觉、对连词的错误解析和空的规范元素。从合同中增强的规范提取可以促进更透明和可信赖的正式智能体互动规范的发展，从而有助于改善多智能体系统。

更新时间: 2024-04-02 19:49:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.02269v1

Physics-Informed Graph Neural Network for Dynamic Reconfiguration of Power Systems

To maintain a reliable grid, we need fast decision-making algorithms for complex problems like Dynamic Reconfiguration (DyR). DyR optimizes distribution grid switch settings in real time to minimize grid losses and dispatches resources to supply loads with available generation. DyR is a mixed-integer problem and can be computationally intractable to solve for large grids and at fast timescales. We propose GraPhyR, a Physics-Informed Graph Neural Network (GNN) framework tailored for DyR. We incorporate essential operational and connectivity constraints directly within the GNN framework and train it end-to-end. Our results show that GraPhyR is able to learn to optimize the DyR task.

Updated: 2024-04-02 19:44:34

标题: 物理信息图神经网络用于电力系统动态重构

摘要: 为了维护一个可靠的电网，我们需要快速的决策算法来解决复杂问题，比如动态重构（DyR）。DyR优化配电网开关设置，以实时最小化网格损失并分配资源来满足可用发电量的负荷需求。DyR是一个混合整数问题，对于大型电网和快速时间尺度可能难以计算求解。我们提出了GraPhyR，一个专门针对DyR设计的基于物理信息的图神经网络（GNNs）框架。我们直接将关键的运营和连接约束纳入到GNN框架中，并进行端到端的训练。我们的结果表明，GraPhyR能够学习优化DyR任务。

更新时间: 2024-04-02 19:44:34

领域: cs.LG,cs.SY,eess.SY,math.OC,stat.ML

下载: http://arxiv.org/abs/2310.00728v2

OFMPNet: Deep End-to-End Model for Occupancy and Flow Prediction in Urban Environment

The task of motion prediction is pivotal for autonomous driving systems, providing crucial data to choose a vehicle behavior strategy within its surroundings. Existing motion prediction techniques primarily focus on predicting the future trajectory of each agent in the scene individually, utilizing its past trajectory data. In this paper, we introduce an end-to-end neural network methodology designed to predict the future behaviors of all dynamic objects in the environment. This approach leverages the occupancy map and the scene's motion flow. We investigate various alternatives for constructing a deep encoder-decoder model called OFMPNet. This model uses a sequence of bird's-eye-view road images, occupancy grid, and prior motion flow as input data. The encoder of the model can incorporate transformer, attention-based, or convolutional units. The decoder considers the use of both convolutional modules and recurrent blocks. Additionally, we propose a novel time-weighted motion flow loss, whose application has shown a substantial decrease in end-point error. Our approach has achieved state-of-the-art results on the Waymo Occupancy and Flow Prediction benchmark, with a Soft IoU of 52.1% and an AUC of 76.75% on Flow-Grounded Occupancy.

Updated: 2024-04-02 19:37:58

标题: OFMPNet：城市环境中占用和流量预测的深度端到端模型

摘要: 运动预测的任务对于自动驾驶系统至关重要，为选择车辆行为策略提供关键数据。现有的运动预测技术主要侧重于单独预测场景中每个代理的未来轨迹，利用其过去的轨迹数据。本文介绍了一种端到端神经网络方法，旨在预测环境中所有动态物体的未来行为。这种方法利用占据地图和场景的运动流。我们正在研究构建一个名为OFMPNet的深度编码器-解码器模型的各种替代方案。该模型使用一系列鸟瞰道路图像、占据栅格和先前的运动流作为输入数据。模型的编码器可以整合变压器、基于注意力的或卷积单元。解码器考虑使用卷积模块和循环块。此外，我们提出了一种新颖的时间加权运动流损失，其应用显示出端点误差显著减少。我们的方法在Waymo占据和流预测基准测试中取得了最先进的结果，在Flow-Grounded占据上的Soft IoU为52.1%，AUC为76.75%。

更新时间: 2024-04-02 19:37:58

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2404.02263v1

LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages

Low-resource languages face significant barriers in AI development due to limited linguistic resources and expertise for data labeling, rendering them rare and costly. The scarcity of data and the absence of preexisting tools exacerbate these challenges, especially since these languages may not be adequately represented in various NLP datasets. To address this gap, we propose leveraging the potential of LLMs in the active learning loop for data annotation. Initially, we conduct evaluations to assess inter-annotator agreement and consistency, facilitating the selection of a suitable LLM annotator. The chosen annotator is then integrated into a training loop for a classifier using an active learning paradigm, minimizing the amount of queried data required. Empirical evaluations, notably employing GPT-4-Turbo, demonstrate near-state-of-the-art performance with significantly reduced data requirements, as indicated by estimated potential cost savings of at least 42.45 times compared to human annotation. Our proposed solution shows promising potential to substantially reduce both the monetary and computational costs associated with automation in low-resource settings. By bridging the gap between low-resource languages and AI, this approach fosters broader inclusion and shows the potential to enable automation across diverse linguistic landscapes.

Updated: 2024-04-02 19:34:22

标题: 循环中的LLMs：利用大型语言模型注释在低资源语言中进行主动学习

摘要: 低资源语言在人工智能开发中面临重大障碍，因为存在有限的语言资源和数据标注专业知识，使得它们变得罕见且昂贵。数据的稀缺性和缺乏现有工具加剧了这些挑战，特别是因为这些语言可能在各种自然语言处理数据集中没有得到充分代表。为了填补这一差距，我们提出利用LLM在主动学习循环中进行数据标注的潜力。首先，我们进行评估以评估标注者之间的一致性，从而选择适合的LLM标注者。然后，选择的标注者被整合到使用主动学习范式的分类器的训练循环中，最小化所需查询的数据量。实证评估，特别是使用GPT-4-Turbo，显示出接近最先进性能的表现，同时显著减少了数据需求，根据估计的潜在成本节约至少为42.45倍，与人工标注相比。我们提出的解决方案显示出有望大幅降低与低资源环境中自动化相关的货币和计算成本。通过弥合低资源语言和人工智能之间的鸿沟，这种方法促进了更广泛的包容性，并展示了在不同语言环境中实现自动化的潜力。

更新时间: 2024-04-02 19:34:22

领域: cs.CL,cs.AI,cs.IR,cs.LG,I.2.7; I.2.6

下载: http://arxiv.org/abs/2404.02261v1
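
Illustrative sketch (not from the paper): the abstract above describes an active learning loop in which an LLM annotator labels only the queried examples. The abstract does not state which query strategy is used. The snippet below assumes plain uncertainty sampling for a binary classifier, and `llm_annotate` is a hypothetical stand-in for a call to an LLM annotator such as GPT-4-Turbo.

```python
from typing import Callable, Dict, List


def select_most_uncertain(probabilities: Dict[str, float], batch_size: int) -> List[str]:
    """Return the ids whose predicted P(positive) is closest to 0.5.

    Assumption: uncertainty sampling; the paper may use a different strategy.
    Ties are broken by id so the result is deterministic.
    """
    ranked = sorted(probabilities, key=lambda i: (abs(probabilities[i] - 0.5), i))
    return ranked[:batch_size]


def annotate_batch(ids: List[str], texts: Dict[str, str],
                   llm_annotate: Callable[[str], int]) -> Dict[str, int]:
    """Label only the queried examples with the (hypothetical) LLM annotator."""
    return {i: llm_annotate(texts[i]) for i in ids}


if __name__ == "__main__":
    probs = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45}
    texts = {"a": "great", "b": "fine I guess", "c": "awful", "d": "not sure"}
    queried = select_most_uncertain(probs, batch_size=2)
    # Toy annotator: a real system would send the text to an LLM instead.
    labels = annotate_batch(queried, texts, lambda t: int("fine" in t))
    print(queried, labels)
```

Only the two examples the classifier is least sure about are sent to the annotator, which is where the reduction in queried data would come from in a loop of this kind.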

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along the sequence for different layers across the model depth. Our method enforces a total compute budget by capping the number of tokens ($k$) that can participate in the self-attention and MLP computations at a given layer. The tokens to be processed are determined by the network using a top-$k$ routing mechanism. Since $k$ is defined a priori, this simple procedure uses a static computation graph with known tensor sizes, unlike other conditional computation techniques. Nevertheless, since the identities of the $k$ tokens are fluid, this method can expend FLOPs non-uniformly across the time and model depth dimensions. Thus, compute expenditure is entirely predictable in sum total, but dynamic and context-sensitive at the token-level. Not only do models trained in this way learn to dynamically allocate compute, they do so efficiently. These models match baseline performance for equivalent FLOPs and wall-clock times to train, but require a fraction of the FLOPs per forward pass, and can be upwards of 50\% faster to step during post-training sampling.

Updated: 2024-04-02 19:28:11

标题: 深度混合：在基于transformer的语言模型中动态分配计算

摘要: 基于Transformer的语言模型均匀分配FLOP到输入序列中。在这项工作中，我们展示了transformers可以学习动态地将FLOP（或计算）分配到序列中的特定位置，优化模型深度中不同层的序列上的分配。我们的方法通过限制参与给定层中自注意力和MLP计算的令牌数量（$k$）来强制执行总计算预算。要处理的令牌是由网络使用top-$k$路由机制确定的。由于$k$是预先定义的，这种简单的程序使用具有已知张量大小的静态计算图，不同于其他条件计算技术。然而，由于$k$令牌的身份是流动的，这种方法可以在时间和模型深度维度上非均匀地消耗FLOP。因此，计算支出在总和上是完全可预测的，但在令牌级别上是动态的和上下文敏感的。通过这种方式训练的模型不仅学会动态分配计算，而且效率高。这些模型与等效FLOPS和训练时间的基准性能相匹配，但每次前向传递所需的FLOP仅为一小部分，并且在后训练采样过程中可以比步进速度快50\%以上。

更新时间: 2024-04-02 19:28:11

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2404.02258v1
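
Illustrative sketch (not the authors' implementation): the top-$k$ routing idea in the abstract above can be shown on a toy sequence. Tokens are plain floats here, `router_scores` stands in for the output of a learned router, and `block` stands in for the self-attention and MLP computation. All three are assumptions made for illustration. Tokens that are not selected pass through unchanged, as they would on a residual path.

```python
from typing import Callable, List, Tuple


def route_top_k(tokens: List[float], router_scores: List[float], k: int,
                block: Callable[[float], float]) -> Tuple[List[float], List[int]]:
    """Apply `block` to the k highest-scoring tokens; leave the rest untouched.

    Because k is fixed in advance, exactly k tokens are processed per call,
    regardless of which positions the router picks.
    """
    if len(tokens) != len(router_scores):
        raise ValueError("one router score per token is required")
    if not 0 <= k <= len(tokens):
        raise ValueError("k must be between 0 and the sequence length")
    # Rank positions by score (highest first); ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-router_scores[i], i))
    chosen = set(ranked[:k])
    output = [block(tok) if i in chosen else tok for i, tok in enumerate(tokens)]
    return output, sorted(chosen)


if __name__ == "__main__":
    out, processed = route_top_k(
        tokens=[1.0, 2.0, 3.0, 4.0],
        router_scores=[0.1, 0.9, 0.3, 0.8],
        k=2,
        block=lambda x: x * 10.0,
    )
    print(out, processed)
```

The amount of work per call is fixed by `k`, while the positions that receive it depend on the scores, which mirrors the "predictable in sum total, dynamic at the token level" property described in the abstract.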

$\texttt{LM}^\texttt{2}$: A Simple Society of Language Models Solves Complex Reasoning

Despite demonstrating emergent reasoning abilities, Large Language Models (LLMs) often lose track of complex, multi-step reasoning. Existing studies show that providing guidance via decomposing the original question into multiple subproblems elicits more robustness in LLM reasoning -- a decomposer generates the subproblems, and a solver solves each of these subproblems. However, these techniques fail to accommodate coordination between the decomposer and the solver modules (either in a single model or different specialized ones) -- the decomposer does not keep track of the ability of the solver to follow the decomposed reasoning. In this paper, we propose LM2 to address these challenges. LM2 modularizes the decomposition, solution, and verification into three different language models. The decomposer module identifies the key concepts necessary to solve the problem and generates step-by-step subquestions according to the reasoning requirement. The solver model generates the solution to the subproblems that are then checked by the verifier module; depending upon the feedback from the verifier, the reasoning context is constructed using the subproblems and the solutions. These models are trained to coordinate using policy learning. Exhaustive experimentation suggests the superiority of LM2 over existing methods on in- and out-domain reasoning problems, outperforming the best baselines by $8.1\%$ on MATH, $7.71\%$ on JEEBench, and $9.7\%$ on MedQA problems (code available at https://github.com/LCS2-IIITD/Language_Model_Multiplex).

Updated: 2024-04-02 19:23:10

标题: LM^2：一个简单的语言模型社会解决复杂推理

摘要: 尽管展示了新兴的推理能力，但大型语言模型（LLMS）经常在复杂的多步推理中失去追踪。现有研究表明，通过将原始问题分解为多个子问题来提供指导，可以引发LLM推理中更强大的稳健性 - 一个分解器生成子问题，一个解算器解决每个子问题。然而，这些技术未能适应分解器和解算器模块之间的协调（无论是在单个模型中还是在不同的专门模型中） - 分解器无法跟踪解算器遵循分解推理的能力。在本文中，我们提出了LM2来解决这些挑战。LM2将分解、解决和验证模块化为三个不同的语言模型。分解器模块识别解决问题所需的关键概念，并根据推理需求生成逐步子问题。解算器模型生成解决方案，然后由验证器模块检查；根据验证器的反馈，使用子问题和解决方案构建推理上下文。这些模型经过策略学习训练以协调。详尽的实验表明，LM2在内外领域推理问题上优于现有方法，在MATH上超过最佳基线8.1％，在JEEBench上超过7.71％，在MedQA问题上超过9.7％（代码可在https://github.com/LCS2-IIITD/Language_Model_Multiplex找到）。

更新时间: 2024-04-02 19:23:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.02255v1

Advancing the Search Frontier with AI Agents

As many of us in the information retrieval (IR) research community know and appreciate, search is far from being a solved problem. Millions of people struggle with tasks on search engines every day. Often, their struggles relate to the intrinsic complexity of their task and the failure of search systems to fully understand the task and serve relevant results. The task motivates the search, creating the gap/problematic situation that searchers attempt to bridge/resolve and drives search behavior as they work through different task facets. Complex search tasks require more than support for rudimentary fact finding or re-finding. Research on methods to support complex tasks includes work on generating query and website suggestions, personalizing and contextualizing search, and developing new search experiences, including those that span time and space. The recent emergence of generative artificial intelligence (AI) and the arrival of assistive agents, based on this technology, has the potential to offer further assistance to searchers, especially those engaged in complex tasks. There are profound implications from these advances for the design of intelligent systems and for the future of search itself. This article, based on a keynote by the author at the 2023 ACM SIGIR Conference, explores these issues and how AI agents are advancing the frontier of search system capabilities, with a special focus on information interaction and complex task completion.

Updated: 2024-04-02 19:22:27

标题: 用AI代理推动搜索前沿

摘要: 正如我们信息检索（IR）研究界的许多人所知道和理解的那样，搜索远未解决问题。每天都有数百万人在搜索引擎上与任务作斗争。通常，他们的困难与任务的固有复杂性以及搜索系统未能充分理解任务并提供相关结果有关。任务激发了搜索，创造了搜索者试图弥合/解决的差距/问题情况，并驱动搜索行为，因为他们通过不同的任务方面进行工作。复杂的搜索任务需要支持不仅仅是对基本事实查找或重新查找的支持。支持复杂任务的方法的研究包括生成查询和网站建议、个性化和上下文化搜索以及开发新的搜索体验，包括跨时间和空间的搜索体验。最近出现的生成式人工智能（AI）和基于这项技术的辅助代理的出现有潜力为搜索者提供进一步的帮助，特别是那些从事复杂任务的人。这些进步对智能系统的设计和搜索本身的未来有深远的影响。本文基于作者在2023年ACM SIGIR会议上的主题演讲，探讨了这些问题以及AI代理如何推动搜索系统能力的前沿，特别关注信息交互和复杂任务完成。

更新时间: 2024-04-02 19:22:27

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2311.01235v2

On Stronger Computational Separations Between Multimodal and Unimodal Machine Learning

In multimodal machine learning, multiple modalities of data (e.g., text and images) are combined to facilitate the learning of a better machine learning model, which remains applicable to a corresponding unimodal task (e.g., text generation). Recently, multimodal machine learning has enjoyed huge empirical success (e.g., GPT-4). Motivated to develop theoretical justification for this empirical success, Lu (NeurIPS '23, ALT '24) introduces a theory of multimodal learning, and considers possible separations between theoretical models of multimodal and unimodal learning. In particular, Lu (ALT '24) shows a computational separation, which is relevant to worst-case instances of the learning task. In this paper, we give a stronger average-case computational separation, where for "typical" instances of the learning task, unimodal learning is computationally hard, but multimodal learning is easy. We then question how "organic" the average-case separation is. Would it be encountered in practice? To this end, we prove that under natural conditions, any given computational separation between average-case unimodal and multimodal learning tasks implies a corresponding cryptographic key agreement protocol. We suggest interpreting this as evidence that very strong computational advantages of multimodal learning may arise infrequently in practice, since they exist only for the "pathological" case of inherently cryptographic distributions. However, this does not apply to possible (super-polynomial) statistical advantages.

Updated: 2024-04-02 19:21:28

标题: 关于多模态和单模态机器学习之间更强的计算分离

摘要: 在多模态机器学习中，将多种数据模态（例如文本和图像）结合起来，以促进更好的机器学习模型的学习，使其仍然适用于对应的单模态任务（例如文本生成）。最近，多模态机器学习取得了巨大的经验成功（例如GPT-4）。受到发展理论证明这种经验成功的动机，Lu（NeurIPS '23，ALT '24）引入了一种多模态学习理论，并考虑了多模态和单模态学习的理论模型之间可能的分离。特别是，Lu（ALT '24）展示了一个计算分离，与学习任务的最坏情况实例相关。在本文中，我们提出了一个更强的平均情况计算分离，在学习任务的“典型”实例中，单模态学习在计算上是困难的，但多模态学习很容易。然后我们质疑这种“有机”的平均情况分离有多少实际意义。它会在实践中遇到吗？为此，我们证明在自然条件下，对于任何给定的平均情况单模态和多模态学习任务之间的计算分离都意味着相应的密码密钥协商协议。我们建议将这解释为证据，即多模态学习具有非常强大的计算优势可能在实践中很少出现，因为它们仅适用于固有加密分布的“病态”情况。然而，这并不适用于可能的（超多项式）统计优势。

更新时间: 2024-04-02 19:21:28

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2404.02254v1

RAT: Retrieval-Augmented Transformer for Click-Through Rate Prediction

Predicting click-through rates (CTR) is a fundamental task for Web applications, where a key issue is to devise effective models for feature interactions. Current methodologies predominantly concentrate on modeling feature interactions within an individual sample, while overlooking the potential cross-sample relationships that can serve as a reference context to enhance the prediction. To make up for such deficiency, this paper develops a Retrieval-Augmented Transformer (RAT), aiming to acquire fine-grained feature interactions within and across samples. By retrieving similar samples, we construct augmented input for each target sample. We then build Transformer layers with cascaded attention to capture both intra- and cross-sample feature interactions, facilitating comprehensive reasoning for improved CTR prediction while retaining efficiency. Extensive experiments on real-world datasets substantiate the effectiveness of RAT and suggest its advantage in long-tail scenarios. The code has been open-sourced at \url{https://github.com/YushenLi807/WWW24-RAT}.

Updated: 2024-04-02 19:14:23

标题: RAT：检索增强变压器用于点击率预测

摘要: 预测点击率（CTR）是网络应用的一个基本任务，关键问题是设计有效的特征交互模型。当前的方法主要集中在对个体样本内部的特征交互建模，而忽略了可以作为参考上下文来增强预测的潜在跨样本关系。为弥补这种不足，本文开发了一种检索增强变压器（RAT），旨在获取样本内部和跨样本之间的细粒度特征交互。通过检索相似样本，我们为每个目标样本构建了增强输入。然后，我们建立了具有级联注意力的变压器层，以捕捉样本内和跨样本的特征交互，促进全面推理，以改进CTR预测同时保持效率。对真实世界数据集的广泛实验证实了RAT的有效性，并表明其在长尾场景中的优势。该代码已在\url{https://github.com/YushenLi807/WWW24-RAT}开源。

更新时间: 2024-04-02 19:14:23

领域: cs.IR,cs.AI,cs.LG,cs.SI

下载: http://arxiv.org/abs/2404.02249v1

Anti-LM Decoding for Zero-shot In-context Machine Translation

Zero-shot in-context learning is the phenomenon where models can perform the task simply given the instructions. However, pre-trained large language models are known to be poorly calibrated for this task. One of the most effective approaches to handling this bias is to adopt a contrastive decoding objective, which accounts for the prior probability of generating the next token by conditioning on some context. This work introduces an Anti-Language Model objective with a decay factor designed to address the weaknesses of In-context Machine Translation. We conduct our experiments across 3 model types and sizes, 3 language directions, and for both greedy decoding and beam search ($B=5$). The proposed method outperforms other state-of-the-art decoding objectives, with up to $20$ BLEU point improvement from the default objective observed in some settings.

Updated: 2024-04-02 19:03:15

标题: 零样本上下文机器翻译的反语言模型解码

摘要: 零样本上下文学习是模型只需根据指令就能执行任务的现象。然而，已知预训练的大型语言模型在这一任务上的校准不佳。处理这种偏差最有效的方法之一是采用对比解码目标，通过在一些上下文条件下考虑生成下一个标记的先验概率。本文介绍了一种具有衰减因子的反语言模型目标，旨在解决上下文机器翻译的弱点。我们在3种模型类型和大小、3种语言方向上以及贪婪解码和集束搜索（$B=5$）进行实验。所提出的方法优于其他最先进的解码目标，在一些设置中与默认目标相比，BLEU分数提高了高达20个点。

更新时间: 2024-04-02 19:03:15

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2311.08324v2
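
Illustrative sketch (the exact objective is not given in the abstract above): one way to read "a contrastive decoding objective with a decay factor" is to subtract a down-weighted log-probability from a second, context-only distribution, with the weight shrinking at each decoding step. The functional form `gamma ** step`, the value of `gamma`, and the toy distributions below are all assumptions.

```python
import math
from typing import Dict


def contrastive_scores(logp_main: Dict[str, float], logp_anti: Dict[str, float],
                       step: int, gamma: float = 0.3) -> Dict[str, float]:
    """Score each candidate token as log p_main - gamma**step * log p_anti.

    Assumed form: the penalty from the "anti" distribution decays geometrically
    with the decoding step, so it matters most for the first tokens.
    """
    weight = gamma ** step
    return {tok: logp_main[tok] - weight * logp_anti[tok] for tok in logp_main}


def greedy_pick(scores: Dict[str, float]) -> str:
    """Return the highest-scoring token (greedy decoding for a single step)."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy next-token distributions: the main model slightly prefers copying the
    # source-language word "the"; the anti distribution prefers it strongly.
    main = {"the": math.log(0.5), "le": math.log(0.4), "x": math.log(0.1)}
    anti = {"the": math.log(0.8), "le": math.log(0.1), "x": math.log(0.1)}
    print(greedy_pick(contrastive_scores(main, anti, step=0)))
    print(greedy_pick(contrastive_scores(main, anti, step=10)))
```

At step 0 the penalty is at full strength and the target-language token wins. By step 10 the weight is close to zero and the choice falls back to the main distribution.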

StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows

It is a notable trend to use Large Language Models (LLMs) to tackle complex tasks, e.g., tasks that require a sequence of actions and dynamic interaction with tools and external environments. In this paper, we propose StateFlow, a novel LLM-based task-solving paradigm that conceptualizes complex task-solving processes as state machines. In StateFlow, we distinguish between "process grounding" (via state and state transitions) and "sub-task solving" (through actions within a state), enhancing control and interpretability of the task-solving procedure. A state represents the status of a running process. The transitions between states are controlled by heuristic rules or decisions made by the LLM, allowing for a dynamic and adaptive progression. Upon entering a state, a series of actions is executed, involving not only calling LLMs guided by different prompts, but also the utilization of external tools as needed. Our results show that StateFlow significantly enhances LLMs' efficiency. For instance, StateFlow achieves 13% and 28% higher success rates compared to ReAct in InterCode SQL and ALFWorld benchmark, with 5x and 3x less cost respectively. We also show that StateFlow can be combined with iterative refining methods like Reflexion to further improve performance.

Updated: 2024-04-02 18:57:49

标题: StateFlow：通过状态驱动工作流增强LLM任务解决

摘要: 使用大型语言模型(LLMs)来解决复杂任务是一个显著的趋势，例如需要一系列动作和与工具以及外部环境动态交互的任务。在本文中，我们提出了StateFlow，一种基于LLM的新型任务解决范式，将复杂任务解决过程概念化为状态机。在StateFlow中，我们区分了"过程基础"(通过状态和状态转换)和"子任务解决"(通过状态内的动作)，增强了任务解决过程的控制性和可解释性。一个状态代表着运行过程的状态。状态之间的转换由启发式规则或LLM做出的决定控制，实现了动态和自适应的进展。进入一个状态时，会执行一系列动作，不仅涉及调用由不同提示引导的LLMs，还可能需要利用外部工具。我们的结果表明，StateFlow显著提高了LLMs的效率。例如，与ReAct相比，在InterCode SQL和ALFWorld基准测试中，StateFlow的成功率分别提高了13%和28%，成本分别降低了5倍和3倍。我们还展示了StateFlow可以与像Reflexion这样的迭代优化方法结合，进一步提高性能。

更新时间: 2024-04-02 18:57:49

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.11322v2
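
Illustrative sketch (not the StateFlow codebase): the split between "process grounding" (states and transitions) and "sub-task solving" (actions inside a state) described in the abstract above can be mimicked with a small state machine. The state names, the `fake_llm_solve` stub, and the retry rule are hypothetical. In a real system the handlers would call an LLM or an external tool.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Handler = Callable[[Context], str]  # runs a state's actions, returns the next state


def fake_llm_solve(context: Context) -> str:
    """Stand-in for an LLM call: fails on the first attempt, succeeds on the second."""
    context["attempts"] = context.get("attempts", 0) + 1
    context["answer"] = "ok" if context["attempts"] >= 2 else None
    return "Verify"


def verify(context: Context) -> str:
    """Heuristic transition rule: go to Error when no answer was produced."""
    return "End" if context.get("answer") is not None else "Error"


def run_state_machine(handlers: Dict[str, Handler], start: str, end: str,
                      context: Context, max_steps: int = 20) -> List[str]:
    """Execute handlers until `end` is reached; return the visited states."""
    state, trace = start, [start]
    for _ in range(max_steps):
        if state == end:
            break
        state = handlers[state](context)
        trace.append(state)
    return trace


HANDLERS: Dict[str, Handler] = {
    "Init": lambda ctx: "Solve",
    "Solve": fake_llm_solve,
    "Verify": verify,
    "Error": lambda ctx: "Solve",  # retry after an error
}

if __name__ == "__main__":
    print(run_state_machine(HANDLERS, "Init", "End", {}))
```

Keeping the transition logic in the machine rather than in the prompt is what makes the trace easy to inspect: every run yields an explicit list of visited states.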

Understanding Video Transformers via Universal Concept Discovery

This paper studies the problem of concept-based interpretability of transformer representations for videos. Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are automatically discovered. Prior research on concept-based interpretability has concentrated solely on image-level tasks. Comparatively, video models deal with the added temporal dimension, increasing complexity and posing challenges in identifying dynamic concepts over time. In this work, we systematically address these challenges by introducing the first Video Transformer Concept Discovery (VTCD) algorithm. To this end, we propose an efficient approach for unsupervised identification of units of video transformer representations - concepts, and ranking their importance to the output of a model. The resulting concepts are highly interpretable, revealing spatio-temporal reasoning mechanisms and object-centric representations in unstructured video models. Performing this analysis jointly over a diverse set of supervised and self-supervised representations, we discover that some of these mechanisms are universal in video transformers. Finally, we show that VTCD can be used for fine-grained action recognition and video object segmentation.

Updated: 2024-04-02 18:54:50

标题: 通过通用概念发现理解视频变形器

摘要: 这篇论文研究了基于概念的视频Transformer表示的可解释性问题。具体来说，我们试图解释视频Transformer基于高级时空概念进行决策过程。先前关于基于概念的可解释性的研究仅集中在图像级任务上。相比之下，视频模型处理额外的时间维度，增加了复杂性并在识别随时间变化的动态概念方面提出挑战。在这项工作中，我们通过引入第一个视频Transformer概念发现（VTCD）算法系统地解决了这些挑战。为此，我们提出了一种有效的方法，用于无监督识别视频Transformer表示的单元 - 概念，并对它们对模型输出的重要性进行排名。得到的概念具有很高的可解释性，揭示了在非结构化视频模型中的时空推理机制和以对象为中心的表示。通过在各种监督和自监督表示上联合进行这种分析，我们发现一些机制在视频Transformer中是普遍存在的。最后，我们展示了VTCD可以用于细粒度动作识别和视频对象分割。

更新时间: 2024-04-02 18:54:50

领域: cs.CV,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2401.10831v2

Proximal Oracles for Optimization and Sampling

We consider convex optimization with non-smooth objective function and log-concave sampling with non-smooth potential (negative log density). In particular, we study two specific settings where the convex objective/potential function is either semi-smooth or in composite form as the finite sum of semi-smooth components. To overcome the challenges caused by non-smoothness, our algorithms employ two powerful proximal frameworks in optimization and sampling: the proximal point framework for optimization and the alternating sampling framework (ASF) that uses Gibbs sampling on an augmented distribution. A key component of both optimization and sampling algorithms is the efficient implementation of the proximal map by the regularized cutting-plane method. We establish the iteration-complexity of the proximal map in both semi-smooth and composite settings. We further propose an adaptive proximal bundle method for non-smooth optimization. The proposed method is universal since it does not need any problem parameters as input. Additionally, we develop a proximal sampling oracle that resembles the proximal map in optimization and establish its complexity using a novel technique (a modified Gaussian integral). Finally, we combine this proximal sampling oracle and ASF to obtain a Markov chain Monte Carlo method with non-asymptotic complexity bounds for sampling in semi-smooth and composite settings.

Updated: 2024-04-02 18:52:28

标题: 用于优化和抽样的近端预言者

摘要: 我们考虑具有非光滑目标函数和对数凹采样的凸优化问题，其中潜在函数（负对数密度）也是非光滑的。特别地，我们研究两种具体情况，即凸目标/潜在函数要么是半光滑的，要么是由半光滑组件的有限和构成的复合形式。为了克服非光滑性带来的挑战，我们的算法在优化和采样中采用了两种强大的近端框架：用于优化的近端点框架和使用增强分布上的Gibbs采样的交替采样框架（ASF）。在优化和采样算法中的一个关键组成部分是通过正则化切割平面方法高效实现近端映射。我们在半光滑和复合设置中建立了近端映射的迭代复杂性。我们进一步提出了一种自适应近端捆绑方法用于非光滑优化。该方法是通用的，因为它不需要任何问题参数作为输入。此外，我们开发了一个类似于优化中的近端映射的近端采样预测器，并利用一种新颖的技术（修改的高斯积分）建立了其复杂性。最后，我们将这个近端采样预测器和ASF结合起来，得到了一种在半光滑和复合设置中具有非渐近复杂性界限的马尔可夫链蒙特卡洛方法。

更新时间: 2024-04-02 18:52:28

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2404.02239v1
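
Illustrative sketch (not the paper's method): the proximal point framework mentioned in the abstract above repeatedly applies the proximal map, x_{k+1} = argmin_u { f(u) + (u - x_k)^2 / (2 * lam) }. The paper computes this map with a regularized cutting-plane method. The snippet below instead uses the non-smooth toy objective f(x) = |x|, whose proximal map has a well-known closed form (soft-thresholding), so that the iteration can be followed by hand.

```python
from typing import List


def prox_abs(v: float, lam: float) -> float:
    """Proximal map of f(x) = |x| with step lam: soft-thresholding of v."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def proximal_point(x0: float, lam: float, steps: int) -> List[float]:
    """Run the proximal point iteration x_{k+1} = prox_{lam * f}(x_k)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(prox_abs(xs[-1], lam))
    return xs


if __name__ == "__main__":
    # Each step moves the iterate toward the minimizer x = 0 of |x| by at most lam.
    print(proximal_point(2.5, lam=1.0, steps=4))  # [2.5, 1.5, 0.5, 0.0, 0.0]
```

The iterates reach the minimizer exactly and then stay there, even though |x| is not differentiable at 0, which is the kind of behavior that makes proximal methods attractive for non-smooth objectives.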

Is Exploration All You Need? Effective Exploration Characteristics for Transfer in Reinforcement Learning

In deep reinforcement learning (RL) research, there has been a concerted effort to design more efficient and productive exploration methods while solving sparse-reward problems. These exploration methods often share common principles (e.g., improving diversity) and implementation details (e.g., intrinsic reward). Prior work found that non-stationary Markov decision processes (MDPs) require exploration to efficiently adapt to changes in the environment with online transfer learning. However, the relationship between specific exploration characteristics and effective transfer learning in deep RL has not been characterized. In this work, we seek to understand the relationships between salient exploration characteristics and improved performance and efficiency in transfer learning. We test eleven popular exploration algorithms on a variety of transfer types -- or ``novelties'' -- to identify the characteristics that positively affect online transfer learning. Our analysis shows that some characteristics correlate with improved performance and efficiency across a wide range of transfer tasks, while others only improve transfer performance with respect to specific environment changes. From our analysis, we make recommendations about which exploration algorithm characteristics are best suited to specific transfer situations.

Updated: 2024-04-02 18:45:01

标题: 探索是否就足够了？强化学习中用于转移的有效探索特征

摘要: 在深度强化学习（RL）研究中，人们一直在努力设计更高效和更有成效的探索方法，同时解决稀疏奖励问题。这些探索方法通常共享共同原则（例如，提高多样性）和实现细节（例如，内在奖励）。先前的工作发现，非平稳马尔可夫决策过程（MDP）需要探索才能有效地适应环境中的变化，并进行在线迁移学习。然而，在深度RL中，特定探索特征和有效迁移学习之间的关系尚未被表征。在这项工作中，我们试图理解显著的探索特征与改善性能和效率在迁移学习中的关系。我们在各种迁移类型上测试了十一种流行的探索算法，以识别对在线迁移学习产生积极影响的特征。我们的分析显示，一些特征与在各种迁移任务中的改善性能和效率相关，而其他一些只在特定环境变化方面改善迁移性能。从我们的分析中，我们推荐哪些探索算法特征最适合特定的迁移情况。

更新时间: 2024-04-02 18:45:01

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.02235v1

Deep Neural Networks with 3D Point Clouds for Empirical Friction Measurements in Hydrodynamic Flood Models

Friction is one of the cruxes of hydrodynamic modeling; flood conditions are highly sensitive to the Friction Factors (FFs) used to calculate momentum losses. However, empirical FFs are challenging to measure because they require laboratory experiments. Flood models often rely on surrogate observations (such as land use) to estimate FFs, introducing uncertainty. This research presents a laboratory-trained Deep Neural Network (DNN), trained using flume experiments with data augmentation techniques, to measure Manning's n based on Point Cloud data. The DNN was deployed on real-world lidar Point Clouds to directly measure Manning's n under regulatory and extreme storm events, showing improved prediction capabilities in both 1D and 2D hydrodynamic models. For 1D models, the lidar values decreased differences with regulatory models for in-channel water depth when compared to land cover values. For 1D/2D coupled models, the lidar values produced better agreement with flood extents measured from airborne imagery, while better matching flood insurance claim data for Hurricane Harvey. In both 1D and 1D/2D coupled models, lidar resulted in better agreement with validation gauges. For these reasons, the lidar measurements of Manning's n were found to improve both regulatory models and forecasts for extreme storm events, while simultaneously providing a pathway to standardize the measurement of FFs. Changing FFs significantly affected fluvial and pluvial flood models, while surge flooding was generally unaffected. Downstream flow conditions were found to change the importance of FFs to fluvial models, advancing the literature of friction in flood models. This research introduces a reliable, repeatable, and readily-accessible avenue to measure high-resolution FFs based on 3D point clouds, improving flood prediction, and removing uncertainty from hydrodynamic modeling.

Updated: 2024-04-02 18:44:53

标题: 使用3D点云的深度神经网络进行水动力洪水模型中的摩擦实证测量

摘要: 摩擦是水动力模型的关键之一；洪水条件对于用于计算动量损失的摩擦因子（FFs）非常敏感。然而，经验性的FFs很难测量，因为它们需要实验室实验。洪水模型通常依赖于替代观测（如土地利用）来估计FFs，引入了不确定性。本研究提出了一种经过实验室训练的深度神经网络（DNN），使用数据增强技术进行训练，以基于点云数据测量Manning's n。该DNN被部署在真实世界的激光雷达点云上，直接测量了在监管和极端风暴事件下的Manning's n，展示了在1D和2D水动力模型中的改进预测能力。对于1D模型，与土地覆盖值相比，激光雷达值减少了河道水深的差异。对于1D/2D耦合模型，激光雷达值与从空中图像测量的洪水范围产生更好的一致性，同时更好地匹配了飓风哈维的洪水保险索赔数据。在1D和1D/2D耦合模型中，激光雷达结果与验证测量仪更好地一致。基于这些原因，Manning's n的激光雷达测量被发现可以改善监管模型和极端风暴事件的预测，同时提供了一条标准化测量FFs的途径。更改FFs显着影响了河流和降雨洪水模型，而潮汐洪水通常不受影响。下游流动条件被发现改变了对于河流模型中FFs的重要性，推动了洪水模型中摩擦的文献。这项研究引入了一种可靠、可重复和易于访问的途径，基于3D点云测量高分辨率的FFs，改善了洪水预测，并消除了水动力模型中的不确定性。

更新时间: 2024-04-02 18:44:53

领域: cs.LG,physics.flu-dyn

下载: http://arxiv.org/abs/2404.02234v1
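
Illustrative sketch (not from the paper): the abstract above stresses that flood conditions are sensitive to the friction factor. Manning's equation for open-channel flow in SI units, V = (1/n) * R^(2/3) * S^(1/2), shows why: the mean velocity is inversely proportional to Manning's n. The channel geometry and the two n values below are made-up illustrative numbers, not values from the study.

```python
def manning_velocity(n: float, hydraulic_radius_m: float, slope: float) -> float:
    """Mean flow velocity in m/s from Manning's equation (SI units).

    n: Manning's roughness coefficient
    hydraulic_radius_m: flow area divided by wetted perimeter, in metres
    slope: energy (or bed) slope, dimensionless
    """
    if n <= 0 or hydraulic_radius_m <= 0 or slope < 0:
        raise ValueError("n and hydraulic radius must be positive, slope non-negative")
    return (1.0 / n) * hydraulic_radius_m ** (2.0 / 3.0) * slope ** 0.5


if __name__ == "__main__":
    # Same channel, two assumed roughness values (illustrative only).
    smooth = manning_velocity(n=0.03, hydraulic_radius_m=1.0, slope=0.0001)
    rough = manning_velocity(n=0.06, hydraulic_radius_m=1.0, slope=0.0001)
    print(f"n=0.03: {smooth:.3f} m/s, n=0.06: {rough:.3f} m/s")
```

Doubling n halves the computed velocity for the same geometry and slope, so an error in an estimated friction factor carries straight through to the modelled flow.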

OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising

Trajectory prediction is fundamental in computer vision and autonomous driving, particularly for understanding pedestrian behavior and enabling proactive decision-making. Existing approaches in this field often assume precise and complete observational data, neglecting the challenges associated with out-of-view objects and the noise inherent in sensor data due to limited camera range, physical obstructions, and the absence of ground truth for denoised sensor data. Such oversights are critical safety concerns, as they can result in missing essential, non-visible objects. To bridge this gap, we present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique. Our approach denoises noisy sensor observations in an unsupervised manner and precisely maps sensor-based trajectories of out-of-sight objects into visual trajectories. This method has demonstrated state-of-the-art performance in out-of-sight noisy sensor trajectory denoising and prediction on the Vi-Fi and JRDB datasets. By enhancing trajectory prediction accuracy and addressing the challenges of out-of-sight objects, our work significantly contributes to improving the safety and reliability of autonomous driving in complex environments. Our work represents the first initiative towards Out-Of-Sight Trajectory prediction (OOSTraj), setting a new benchmark for future research. The code is available at \url{https://github.com/Hai-chao-Zhang/OOSTraj}.

Updated: 2024-04-02 18:30:29

Categories: cs.CV,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2404.02227v1

CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement

We propose CHOSEN, a simple yet flexible, robust and effective multi-view depth refinement framework. It can be employed in any existing multi-view stereo pipeline, with straightforward generalization capability for different multi-view capture systems such as camera relative positioning and lenses. Given an initial depth estimation, CHOSEN iteratively re-samples and selects the best hypotheses, and automatically adapts to different metric or intrinsic scales determined by the capture system. The key to our approach is the application of contrastive learning in an appropriate solution space and a carefully designed hypothesis feature, based on which positive and negative hypotheses can be effectively distinguished. Integrated in a simple baseline multi-view stereo pipeline, CHOSEN delivers impressive quality in terms of depth and normal accuracy compared to many current deep learning based multi-view stereo pipelines.

Updated: 2024-04-02 18:27:03

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.02225v1

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens

Are $n$-gram language models still relevant in this era of neural large language models (LLMs)? Our answer is yes, and we showcase their value in both text analysis and improving neural LLMs. We do so by modernizing $n$-gram LMs in two aspects. First, we train them at the same data scale as neural LLMs -- 5 trillion tokens. This is the largest $n$-gram LM ever built. Second, existing $n$-gram LMs use small $n$, which hinders their performance; we instead allow $n$ to be arbitrarily large by introducing a new $\infty$-gram LM with backoff. Instead of pre-computing $n$-gram count tables (which would be very expensive), we develop an engine named infini-gram -- powered by suffix arrays -- that can compute $\infty$-gram (as well as $n$-gram with arbitrary $n$) probabilities with millisecond-level latency. The $\infty$-gram framework and infini-gram engine enable us to conduct many novel and interesting analyses of human-written and machine-generated text: we find that the $\infty$-gram LM has fairly high accuracy for next-token prediction (47%), and can complement neural LLMs to greatly reduce their perplexity. When analyzing machine-generated text, we also observe irregularities in the machine--$\infty$-gram agreement level with respect to the suffix length, which indicates deficiencies in neural LLM pretraining and the positional embeddings of Transformers.
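
The backoff idea in this abstract can be illustrated with a toy sketch. The code below is not the infini-gram engine: it assumes a tiny in-memory corpus, a naive suffix-array construction, and hypothetical helper names (`build_suffix_array`, `count`, `infgram_prob`). It backs off to the longest suffix of the context that occurs in the corpus and then estimates the next-token probability from two counts.

```python
def build_suffix_array(tokens):
    """Naive suffix array: start positions sorted by the suffix they begin.
    Fine for a toy corpus; a real engine would use a scalable construction."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def _bound(tokens, sa, pattern, upper):
    """Binary search over sorted suffixes, comparing only the first len(pattern) tokens."""
    lo, hi = 0, len(sa)
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = tokens[sa[mid]:sa[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count(tokens, sa, pattern):
    """Number of occurrences of `pattern` (a list of tokens) in the corpus."""
    return _bound(tokens, sa, pattern, True) - _bound(tokens, sa, pattern, False)


def infgram_prob(tokens, sa, context, token):
    """Back off to the longest context suffix with a nonzero count, then
    return (count(suffix + token) / count(suffix), length of that suffix)."""
    for start in range(len(context) + 1):
        suffix = context[start:]
        denom = count(tokens, sa, suffix)
        if denom > 0:
            return count(tokens, sa, suffix + [token]) / denom, len(suffix)
    return 0.0, 0  # only reached for an empty corpus


corpus = "a b c a b d a b c".split()
sa = build_suffix_array(corpus)
prob, used = infgram_prob(corpus, sa, ["x", "a", "b"], "c")
print(prob, used)
```

In this toy corpus the full context `x a b` never occurs, so the estimate falls back to the suffix `a b`. Counting through binary search on the suffix array is what avoids pre-computing count tables for every $n$.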

Updated: 2024-04-02 18:14:53

Categories: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2401.17377v2

Horoballs and the subgradient method

To explore convex optimization on Hadamard spaces, we consider an iteration in the style of a subgradient algorithm. Traditionally, such methods assume that the underlying spaces are manifolds and that the objectives are geodesically convex: the methods are described using tangent spaces and exponential maps. By contrast, our iteration applies in a general Hadamard space, is framed in the underlying space itself, and relies instead on horospherical convexity of the objective level sets. For this restricted class of objectives, we prove a complexity result of the usual form. Notably, the complexity does not depend on a lower bound on the space curvature. We illustrate our subgradient algorithm on the minimal enclosing ball problem in Hadamard spaces.
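
For orientation, here is a minimal sketch of a classical subgradient iteration for the minimal enclosing ball problem in Euclidean space, which is the simplest example of a Hadamard space. It is not the horospherical method of the paper; the function name, the diminishing step-size rule, and the sample points are all assumptions made for illustration.

```python
import numpy as np


def enclosing_ball_subgradient(points, steps=2000, step0=1.0):
    """Minimize f(x) = max_i ||x - a_i|| with a diminishing-step subgradient method.

    Returns the best center found and its radius (the largest distance to a point).
    """
    points = np.asarray(points, dtype=float)
    x = points.mean(axis=0)  # start from the centroid
    best_x = x.copy()
    best_r = np.linalg.norm(points - x, axis=1).max()
    for k in range(steps):
        dists = np.linalg.norm(points - x, axis=1)
        j = int(dists.argmax())          # farthest point is the active one
        if dists[j] == 0.0:              # all points coincide with x
            break
        g = (x - points[j]) / dists[j]   # a subgradient of f at x
        x = x - step0 / np.sqrt(k + 1) * g
        r = np.linalg.norm(points - x, axis=1).max()
        if r < best_r:                   # subgradient steps are not monotone
            best_x, best_r = x.copy(), r
    return best_x, best_r


pts = [(0.0, 0.0), (2.0, 0.0), (0.1, 0.1), (0.2, 0.0)]
center, radius = enclosing_ball_subgradient(pts)
print(center, radius)
```

Each step moves the center toward the currently farthest point. Because the objective need not decrease at every step, the sketch keeps the best iterate seen so far.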

Updated: 2024-04-02 18:11:09

Categories: math.OC,cs.CC,cs.LG,90C48, 65Y20, 49M29,G.1.6

Download: http://arxiv.org/abs/2403.15749v2

TAO-Amodal: A Benchmark for Tracking Any Object Amodally

Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of modal annotations in most benchmarks. To address the scarcity of amodal benchmarks, we introduce TAO-Amodal, featuring 833 diverse categories in thousands of video sequences. Our dataset includes amodal and modal bounding boxes for visible and partially or fully occluded objects, including those that are partially out of the camera frame. We investigate the current lay of the land in both amodal tracking and detection by benchmarking state-of-the-art modal trackers and amodal segmentation methods. We find that existing methods, even when adapted for amodal tracking, struggle to detect and track objects under heavy occlusion. To mitigate this, we explore simple finetuning schemes that can increase the amodal tracking and detection metrics of occluded objects by 2.1% and 3.3%.

Updated: 2024-04-02 18:09:22

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2312.12433v3

Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices

Recent studies have integrated large language models (LLMs) into diverse educational contexts, including providing adaptive programming hints, a type of feedback that focuses on helping students move forward during problem-solving. However, most existing LLM-based hint systems are limited to a single hint type. To investigate whether and how different levels of hints can support students' problem-solving and learning, we conducted a think-aloud study with 12 novices using the LLM Hint Factory, a system providing four levels of hints, from general natural language guidance to concrete code assistance, varying in format and granularity. We discovered that high-level natural language hints alone can be unhelpful or even misleading, especially when addressing next-step or syntax-related help requests. Adding lower-level hints, like code examples with in-line comments, can better support students. The findings open up future work on customizing the content, format, and granularity of help responses to accurately identify and meet students' learning needs.

Updated: 2024-04-02 18:05:26

Categories: cs.HC,cs.AI,cs.CY

Download: http://arxiv.org/abs/2404.02213v1

A Holistic Indicator of Polarization to Measure Online Sexism

The online trend of the manosphere and feminist discourse on social networks requires a holistic measure of the level of sexism in an online community. This indicator is important for policymakers and moderators of online communities (e.g., subreddits) and computational social scientists, either to revise moderation strategies based on the degree of sexism or to match and compare the temporal sexism across different platforms and communities with real-time events and infer social scientific insights. In this paper, we build a model that can provide a comparable holistic indicator of toxicity targeted toward male and female identity and male and female individuals. Unlike previous supervised NLP methods, which require annotation of toxic comments at the target level (e.g., annotating comments that are specifically toxic toward women) to detect targeted toxic comments, our indicator uses supervised NLP to detect the presence of toxicity and an unsupervised word embedding association test to detect the target automatically. We apply our model to gender discourse communities (e.g., r/TheRedPill, r/MGTOW, r/FemaleDatingStrategy) to detect the level of toxicity toward genders (i.e., sexism). Our results show that our framework accurately and consistently (93% correlation) measures the level of sexism in a community. We finally discuss how our framework can be generalized in the future to measure qualities other than toxicity (e.g., sentiment, humor) toward general-purpose targets and turn into an indicator of different sorts of polarization.

Updated: 2024-04-02 18:00:42

Categories: cs.SI,cs.AI

Download: http://arxiv.org/abs/2404.02205v1

Emergent Abilities in Reduced-Scale Generative Language Models

Large language models can solve new tasks without task-specific fine-tuning. This ability, also known as in-context learning (ICL), is considered an emergent ability and is primarily seen in large language models with billions of parameters. This study investigates whether such emergent properties are strictly tied to model size or can be demonstrated by smaller models trained on reduced-scale data. To explore this, we simplify pre-training data and pre-train 36 causal language models ranging in size from 1 million to 165 million parameters. We show that models trained on this simplified pre-training data demonstrate enhanced zero-shot capabilities across various tasks in simplified language, achieving performance comparable to that of pre-trained models six times larger on unrestricted language. This suggests that downscaling the language allows zero-shot learning capabilities to emerge in models with limited size. Additionally, we find that these smaller models pre-trained on simplified data demonstrate a power law relationship between the evaluation loss and the three scaling factors: compute, dataset size, and model size.
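
To make the power law claim concrete, the sketch below shows one common way such a relationship is checked: a least-squares line fit in log-log space. The data are synthetic and generated from an assumed law (loss = 5.0 * size^-0.3); they are not results from the paper, and `fit_power_law` is a hypothetical helper name.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y ~ a * x**(-b) by linear least squares on log(y) = log(a) - b * log(x)."""
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic model sizes and losses that follow an assumed power law exactly.
model_sizes = np.array([1e6, 5e6, 2e7, 8e7, 1.65e8])
losses = 5.0 * model_sizes ** -0.3

a, b = fit_power_law(model_sizes, losses)
print(a, b)
```

On real measurements the points would scatter around the fitted line, and the same fit could be repeated with compute or dataset size on the x-axis.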

Updated: 2024-04-02 18:00:28

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2404.02204v1

Segment Any 3D Object with Language

In this paper, we investigate Open-Vocabulary 3D Instance Segmentation (OV-3DIS) with free-form language instructions. Earlier works that rely on only annotated base categories for training suffer from limited generalization to unseen novel categories. Recent works mitigate poor generalizability to novel categories by generating class-agnostic masks or projecting generalized masks from 2D to 3D, but disregard semantic or geometry information, leading to sub-optimal performance. Instead, generating generalizable but semantic-related masks directly from 3D point clouds would result in superior outcomes. In this paper, we introduce Segment any 3D Object with LanguagE (SOLE), which is a semantic and geometric-aware visual-language learning framework with strong generalizability by generating semantic-related masks directly from 3D point clouds. Specifically, we propose a multimodal fusion network to incorporate multimodal semantics in both backbone and decoder. In addition, to align the 3D segmentation model with various language instructions and enhance the mask quality, we introduce three types of multimodal associations as supervision. Our SOLE outperforms previous methods by a large margin on ScanNetv2, ScanNet200, and Replica benchmarks, and the results are even close to the fully-supervised counterpart despite the absence of class annotations in the training. Furthermore, extensive qualitative results demonstrate the versatility of our SOLE to language instructions.

Updated: 2024-04-02 17:59:10

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.02157v1

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

We show that even the most recent safety-aligned LLMs are not robust to simple adaptive jailbreaking attacks. First, we demonstrate how to successfully leverage access to logprobs for jailbreaking: we initially design an adversarial prompt template (sometimes adapted to the target LLM), and then we apply random search on a suffix to maximize the target logprob (e.g., of the token "Sure"), potentially with multiple restarts. In this way, we achieve nearly 100% attack success rate -- according to GPT-4 as a judge -- on GPT-3.5/4, Llama-2-Chat-7B/13B/70B, Gemma-7B, and R2D2 from HarmBench that was adversarially trained against the GCG attack. We also show how to jailbreak all Claude models -- that do not expose logprobs -- via either a transfer or prefilling attack with 100% success rate. In addition, we show how to use random search on a restricted set of tokens for finding trojan strings in poisoned models -- a task that shares many similarities with jailbreaking -- which is the algorithm that earned us first place in the SaTML'24 Trojan Detection Competition. The common theme behind these attacks is that adaptivity is crucial: different models are vulnerable to different prompting templates (e.g., R2D2 is very sensitive to in-context learning prompts), some models have unique vulnerabilities based on their APIs (e.g., prefilling for Claude), and in some settings it is crucial to restrict the token search space based on prior knowledge (e.g., for trojan detection). We provide the code, prompts, and logs of the attacks at https://github.com/tml-epfl/llm-adaptive-attacks.

Updated: 2024-04-02 17:58:27

Categories: cs.CR,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2404.02151v1

From Seaweed to Security: The Emergence of Alginate in Compromising IoT Fingerprint Sensors

The increasing integration of capacitive fingerprint recognition sensors in IoT devices presents new challenges in digital forensics, particularly in the context of advanced fingerprint spoofing. Previous research has highlighted the effectiveness of materials such as latex and silicone in deceiving biometric systems. In this study, we introduce Alginate, a biopolymer derived from brown seaweed, as a novel material with the potential for spoofing IoT-specific capacitive fingerprint sensors. Our research uses Alginate and cutting-edge image recognition techniques to unveil a nuanced IoT vulnerability that raises significant security and privacy concerns. Our proof-of-concept experiments employed authentic fingerprint molds to create Alginate replicas, which exhibited remarkable visual and tactile similarities to real fingerprints. The conductivity and resistivity properties of Alginate, closely resembling human skin, make it a subject of interest in the digital forensics field, especially regarding its ability to spoof IoT device sensors. This study calls upon the digital forensics community to develop advanced anti-spoofing strategies to protect the evolving IoT infrastructure against such sophisticated threats.

Updated: 2024-04-02 17:58:24

Categories: cs.CR

Download: http://arxiv.org/abs/2404.02150v1

HALO: An Ontology for Representing and Categorizing Hallucinations in Large Language Models

Recent progress in generative AI, including large language models (LLMs) like ChatGPT, has opened up significant opportunities in fields ranging from natural language processing to knowledge discovery and data mining. However, there is also a growing awareness that the models can be prone to problems such as making information up, or "hallucinations", and faulty reasoning on seemingly simple problems. Because of the popularity of models like ChatGPT, both academic scholars and citizen scientists have documented hallucinations of several different types and severities. Despite this body of work, a formal model for describing and representing these hallucinations (with relevant meta-data) at a fine-grained level is still lacking. In this paper, we address this gap by presenting the Hallucination Ontology or HALO, a formal, extensible ontology written in OWL that currently offers support for six different types of hallucinations known to arise in LLMs, along with support for provenance and experimental metadata. We also collect and publish a dataset containing hallucinations that we inductively gathered across multiple independent Web sources, and show that HALO can be successfully used to model this dataset and answer competency questions.

Updated: 2024-04-02 17:55:52

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2312.05209v2

Robustly estimating heterogeneity in factorial data using Rashomon Partitions

Many statistical analyses, in both observational data and randomized control trials, ask: how does the outcome of interest vary with combinations of observable covariates? How do various drug combinations affect health outcomes, or how does technology adoption depend on incentives and demographics? Our goal is to partition this factorial space into "pools" of covariate combinations where the outcome differs across the pools (but not within a pool). Existing approaches (i) search for a single "optimal" partition under assumptions about the association between covariates or (ii) sample from the entire set of possible partitions. Both these approaches ignore the reality that, especially with correlation structure in covariates, many ways to partition the covariate space may be statistically indistinguishable, despite very different implications for policy or science. We develop an alternative perspective, called Rashomon Partition Sets (RPSs). Each item in the RPS partitions the space of covariates using a tree-like geometry. RPSs incorporate all partitions that have posterior values near the maximum a posteriori partition, even if they offer substantively different explanations, and do so using a prior that makes no assumptions about associations between covariates. This prior is the $\ell_0$ prior, which we show is minimax optimal. Given the RPS we calculate the posterior of any measurable function of the feature effects vector on outcomes, conditional on being in the RPS. We also characterize approximation error relative to the entire posterior and provide bounds on the size of the RPS. Simulations demonstrate this framework allows for robust conclusions relative to conventional regularization techniques. We apply our method to three empirical settings: price effects on charitable giving, chromosomal structure (telomere length), and the introduction of microfinance.

Updated: 2024-04-02 17:53:28

Categories: stat.ME,cs.LG,econ.EM,stat.CO,stat.ML

Download: http://arxiv.org/abs/2404.02141v1

Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding

Inference-time search algorithms such as Monte-Carlo Tree Search (MCTS) may seem unnecessary when generating natural language text based on state-of-the-art reinforcement learning such as Proximal Policy Optimization (PPO). In this paper, we demonstrate that it is possible to get extra mileage out of PPO by integrating MCTS on top. The key idea is not to throw out the value network, a byproduct of PPO training for evaluating partial output sequences, when decoding text out of the policy network. More concretely, we present a novel value-guided decoding algorithm called PPO-MCTS, which can integrate the value network from PPO to work closely with the policy network during inference-time generation. Compared to prior approaches based on MCTS for controlled text generation, the key strength of our approach is to reduce the fundamental mismatch of the scoring mechanisms of the partial outputs between training and test. Evaluation on four text generation tasks demonstrates that PPO-MCTS greatly improves the preferability of generated text compared to the standard practice of using only the PPO policy. Our results demonstrate the promise of search algorithms even on top of the aligned language models from PPO, and the under-explored benefit of the value network.

Updated: 2024-04-02 17:51:49

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2309.15028v3

Topic-based Watermarks for LLM-Generated Text

Recent advancements in large language models (LLMs) have resulted in text outputs that are indistinguishable from human-generated text. Watermarking algorithms are potential tools that offer a way to differentiate between LLM- and human-generated text by embedding detectable signatures within LLM-generated output. However, current watermarking schemes lack robustness to known attacks on watermarking algorithms. In addition, they are impractical, considering that an LLM generates tens of thousands of text outputs per day and the watermarking algorithm needs to memorize each output it generates for the detection to work. In this work, focusing on the limitations of current watermarking schemes, we propose the concept of a "topic-based watermarking algorithm" for LLMs. The proposed algorithm determines how to generate tokens for the watermarked LLM output based on extracted topics of an input prompt or the output of a non-watermarked LLM. Inspired by previous work, we propose using a pair of lists (generated based on the specified extracted topic(s)) that specify certain tokens to be included or excluded while generating the watermarked output of the LLM. Using the proposed watermarking algorithm, we show the practicality of a watermark detection algorithm. Furthermore, we discuss a wide range of attacks that can emerge against watermarking algorithms for LLMs and the benefit of the proposed watermarking scheme for the feasibility of modeling a potential attacker considering its benefit vs. loss.

Updated: 2024-04-02 17:49:40

Categories: cs.CR,cs.CL,cs.LG

Download: http://arxiv.org/abs/2404.02138v1

Neural Embedding Compression For Efficient Multi-Task Earth Observation Modelling

As repositories of large scale data in earth observation (EO) have grown, so have transfer and storage costs for model training and inference, expending significant resources. We introduce Neural Embedding Compression (NEC), based on the transfer of compressed embeddings to data consumers instead of raw data. We adapt foundation models (FM) through learned neural compression to generate multi-task embeddings while navigating the tradeoff between compression rate and embedding utility. We update only a small fraction of the FM parameters (10%) for a short training period (1% of the iterations of pre-training). We evaluate NEC on two EO tasks: scene classification and semantic segmentation. Compared with applying traditional compression to the raw data, NEC achieves similar accuracy with a 75% to 90% reduction in data. Even at 99.7% compression, performance drops by only 5% on the scene classification task. Overall, NEC is a data-efficient yet performant approach for multi-task EO modelling.

Updated: 2024-04-02 17:48:54

Categories: cs.LG

Download: http://arxiv.org/abs/2403.17886v2

Zero-shot Safety Prediction for Autonomous Robots with Foundation World Models

A world model creates a surrogate world to train a controller and predict safety violations by learning the internal dynamic model of systems. However, the existing world models rely solely on statistical learning of how observations change in response to actions, lacking precise quantification of how accurate the surrogate dynamics are, which poses a significant challenge in safety-critical systems. To address this challenge, we propose foundation world models that embed observations into meaningful and causally latent representations. This enables the surrogate dynamics to directly predict causal future states by leveraging a training-free large language model. In two common benchmarks, this novel model outperforms standard world models in the safety prediction task and has a performance comparable to supervised learning despite not using any data. We evaluate its performance with a more specialized and system-relevant metric by comparing estimated states instead of aggregating observation-wide error.

Updated: 2024-04-02 17:39:26

Categories: cs.LG,cs.RO

Download: http://arxiv.org/abs/2404.00462v2

Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models

An increasing number of vision-language tasks can be handled with little to no training, i.e., in a zero- and few-shot manner, by marrying large language models (LLMs) to vision encoders, resulting in large vision-language models (LVLMs). While this has huge upsides, such as not requiring training data or custom architectures, how an input is presented to an LVLM can have a major impact on zero-shot model performance. In particular, inputs phrased in an underspecified way can result in incorrect answers due to factors like missing visual information, complex implicit reasoning, or linguistic ambiguity. Therefore, adding visually-grounded information to the input as a preemptive clarification should improve model performance by reducing underspecification, e.g., by localizing objects and disambiguating references. Similarly, in the VQA setting, changing the way questions are framed can make them easier for models to answer. To this end, we present Rephrase, Augment and Reason (RepARe), a gradient-free framework that extracts salient details about the image using the underlying LVLM as a captioner and reasoner, in order to propose modifications to the original question. We then use the LVLM's confidence over a generated answer as an unsupervised scoring function to select the rephrased question most likely to improve zero-shot performance. Focusing on three visual question answering tasks, we show that RepARe can result in a 3.85% (absolute) increase in zero-shot accuracy on VQAv2, and increases of 6.41 and 7.94 percentage points on A-OKVQA and VizWiz, respectively. Additionally, we find that using gold answers for oracle question candidate selection achieves a substantial gain in VQA accuracy of up to 14.41%. Through extensive analysis, we demonstrate that outputs from RepARe increase syntactic complexity, and effectively utilize vision-language interaction and the frozen LLM.

Updated: 2024-04-02 17:37:42

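Title: Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models

Abstract: A growing number of vision-language tasks can be handled with little or no training by combining large language models (LLMs) with vision encoders to form large vision-language models (LVLMs). Although this has major advantages, such as requiring no training data or custom architectures, the way an input is presented to an LVLM can strongly affect zero-shot performance. In particular, underspecified inputs can lead to wrong answers because of missing visual information, complex implicit reasoning, or linguistic ambiguity. Adding visually grounded information to the input as a preemptive clarification should therefore improve performance by reducing underspecification, for example by localizing objects and disambiguating references. Likewise, in the VQA setting, changing how a question is framed can make it easier for a model to answer. To this end, we propose Rephrase, Augment and Reason (RepARe), a gradient-free framework that uses the underlying LVLM as a captioner and reasoner to extract salient details of the image and propose modifications to the original question. We then use the LVLM's confidence in a generated answer as an unsupervised scoring function to select the rephrased question most likely to improve zero-shot performance. On three visual question answering tasks, RepARe raises zero-shot accuracy by 3.85% (absolute) on VQAv2, and by 6.41 and 7.94 points on A-OKVQA and VizWiz, respectively. We also find that using gold answers for oracle selection of question candidates yields a substantial gain in VQA accuracy of up to 14.41%. Extensive analysis shows that RepARe's outputs increase syntactic complexity and make effective use of vision-language interaction and the frozen LLM.

Updated: 2024-04-02 17:37:42

Categories: cs.CL,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2310.05861v2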

FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning

Instruction tuning is an important step in making language models useful for direct user interaction. However, many legal tasks remain out of reach for most open LLMs and there do not yet exist any large scale instruction datasets for the domain. This critically limits research in this application area. In this work, we curate LawInstruct, a large legal instruction dataset, covering 17 jurisdictions, 24 languages and a total of 12M examples. We present evidence that domain-specific pretraining and instruction tuning improve performance on LegalBench, including improving Flan-T5 XL by 8 points or 16% over the baseline. However, the effect does not generalize across all tasks, training regimes, model sizes, and other factors. LawInstruct is a resource for accelerating the development of models with stronger information processing and decision making capabilities in the legal domain.

Updated: 2024-04-02 17:33:34

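Title: FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning

Abstract: Instruction tuning is an important step in making language models useful for direct user interaction. However, many legal tasks remain out of reach for most open LLMs, and no large-scale instruction dataset yet exists for the domain, which severely limits research in this application area. In this work we curate LawInstruct, a large legal instruction dataset covering 17 jurisdictions, 24 languages, and 12 million examples in total. We present evidence that domain-specific pretraining and instruction tuning improve performance on LegalBench, including raising Flan-T5 XL by 8 points, or 16%, over the baseline. However, the effect does not carry over to all tasks, training regimes, model sizes, and other factors. LawInstruct is a resource that can accelerate the development of models with stronger information-processing and decision-making capabilities in the legal domain.

Updated: 2024-04-02 17:33:34

Categories: cs.CL,cs.AI,cs.LG,68T50,I.2

Download: http://arxiv.org/abs/2404.02127v1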

VC dimension of Graph Neural Networks with Pfaffian activation functions

Graph Neural Networks (GNNs) have emerged in recent years as a powerful tool to learn tasks across a wide range of graph domains in a data-driven fashion; based on a message passing mechanism, GNNs have gained increasing popularity due to their intuitive formulation, closely linked with the Weisfeiler-Lehman (WL) test for graph isomorphism, to which they have proven equivalent. From a theoretical point of view, GNNs have been shown to be universal approximators, and their generalization capability (namely, bounds on the Vapnik-Chervonenkis (VC) dimension) has recently been investigated for GNNs with piecewise polynomial activation functions. The aim of our work is to extend this analysis on the VC dimension of GNNs to other commonly used activation functions, such as sigmoid and hyperbolic tangent, using the framework of Pfaffian function theory. Bounds are provided with respect to architecture parameters (depth, number of neurons, input size) as well as with respect to the number of colors resulting from the 1-WL test applied on the graph domain. The theoretical analysis is supported by a preliminary experimental study.

Updated: 2024-04-02 17:30:38

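Title: VC dimension of Graph Neural Networks with Pfaffian activation functions

Abstract: Graph Neural Networks (GNNs) have become a powerful tool in recent years for learning tasks across many graph domains in a data-driven way. Built on a message-passing mechanism, GNNs have grown in popularity thanks to their intuitive formulation, which is closely tied to the Weisfeiler-Lehman (WL) test for graph isomorphism, to which they have been proven equivalent. On the theoretical side, GNNs have been shown to be universal approximators, and their generalization capability (that is, bounds on the Vapnik-Chervonenkis (VC) dimension) has recently been studied for GNNs with piecewise polynomial activation functions. Our work aims to extend this analysis of the VC dimension of GNNs to other commonly used activation functions, such as sigmoid and hyperbolic tangent, using the framework of Pfaffian function theory. Bounds are given in terms of architecture parameters (depth, number of neurons, input size) and in terms of the number of colors produced by the 1-WL test applied to the graph domain. The theoretical analysis is supported by a preliminary experimental study.

Updated: 2024-04-02 17:30:38

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2401.12362v2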

UINav: A Practical Approach to Train On-Device Automation Agents

Automation systems that can autonomously drive application user interfaces to complete user tasks are of great benefit, especially when users are situationally or permanently impaired. Prior automation systems do not produce generalizable models, while AI-based automation agents work reliably only in simple, hand-crafted applications or incur high computation costs. We propose UINav, a demonstration-based approach to train automation agents that fit mobile devices, yet achieve high success rates with modest numbers of demonstrations. To reduce the demonstration overhead, UINav uses a referee model that provides users with immediate feedback on tasks where the agent fails, and automatically augments human demonstrations to increase diversity in training data. Our evaluation shows that with only 10 demonstrations UINav can achieve 70% accuracy, and that with enough demonstrations it can surpass 90% accuracy.

Updated: 2024-04-02 17:25:57

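Title: UINav: A Practical Approach to Train On-Device Automation Agents

Abstract: Automation systems that can autonomously drive application user interfaces to complete user tasks are especially valuable for users who are situationally or permanently impaired. Earlier automation systems do not produce generalizable models, while AI-based automation agents either work reliably only in simple, hand-crafted applications or incur high computation costs. We propose UINav, a demonstration-based approach for training automation agents that fit on mobile devices while achieving high success rates with a modest number of demonstrations. To reduce the demonstration overhead, UINav uses a referee model that gives users immediate feedback on tasks where the agent fails, and it automatically augments human demonstrations to increase the diversity of the training data. Our evaluation shows that UINav reaches 70% accuracy with only 10 demonstrations and can exceed 90% accuracy with enough demonstrations.

Updated: 2024-04-02 17:25:57

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2312.10170v2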

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.

Updated: 2024-04-02 17:23:16

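Title: pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

Abstract: We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model offers real-time, memory-efficient rendering for scalable training, as well as fast 3D reconstruction at inference time. To overcome the local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that distribution. A reparameterization trick makes this sampling operation differentiable, so that gradients can be back-propagated through the Gaussian splatting representation. We benchmark our method on the real-world RealEstate10k and ACID datasets, where it outperforms state-of-the-art light field transformers on wide-baseline novel view synthesis and accelerates rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.

Updated: 2024-04-02 17:23:16

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2312.12337v3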

GINopic: Topic Modeling with Graph Isomorphism Network

Topic modeling is a widely used approach for analyzing and exploring large document collections. Recent research efforts have incorporated pre-trained contextualized language models, such as BERT embeddings, into topic modeling. However, they often neglect the intrinsic informational value conveyed by mutual dependencies between words. In this study, we introduce GINopic, a topic modeling framework based on graph isomorphism networks to capture the correlation between words. By conducting intrinsic (quantitative as well as qualitative) and extrinsic evaluations on diverse benchmark datasets, we demonstrate the effectiveness of GINopic compared to existing topic models and highlight its potential for advancing topic modeling.

Updated: 2024-04-02 17:18:48

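Title: GINopic: Topic Modeling with Graph Isomorphism Network

Abstract: Topic modeling is a widely used approach for analyzing and exploring large document collections. Recent work has incorporated pre-trained contextualized language models, such as BERT embeddings, into topic modeling. However, such work often neglects the intrinsic informational value carried by the mutual dependencies between words. In this study we introduce GINopic, a topic modeling framework based on graph isomorphism networks that captures the correlation between words. Through intrinsic (quantitative and qualitative) and extrinsic evaluations on a range of benchmark datasets, we demonstrate the effectiveness of GINopic compared with existing topic models and highlight its potential for advancing topic modeling.

Updated: 2024-04-02 17:18:48

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2404.02115v1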

Deployable Reinforcement Learning with Variable Control Rate

Deploying controllers trained with Reinforcement Learning (RL) on real robots can be challenging: RL relies on agents' policies being modeled as Markov Decision Processes (MDPs), which assume an inherently discrete passage of time. The use of MDPs results in that nearly all RL-based control systems employ a fixed-rate control strategy with a period (or time step) typically chosen based on the developer's experience or specific characteristics of the application environment. Unfortunately, the system should be controlled at the highest, worst-case frequency to ensure stability, which can demand significant computational and energy resources and hinder the deployability of the controller on onboard hardware. Adhering to the principles of reactive programming, we surmise that applying control actions only when necessary enables the use of simpler hardware and helps reduce energy consumption. We challenge the fixed frequency assumption by proposing a variant of RL with variable control rate. In this approach, the policy decides the action the agent should take as well as the duration of the time step associated with that action. In our new setting, we expand Soft Actor-Critic (SAC) to compute the optimal policy with a variable control rate, introducing the Soft Elastic Actor-Critic (SEAC) algorithm. We show the efficacy of SEAC through a proof-of-concept simulation driving an agent with Newtonian kinematics. Our experiments show higher average returns, shorter task completion times, and reduced computational resources when compared to fixed rate policies.
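The idea that a policy returns both an action and the duration for which that action is held can be sketched with a toy example. The code below is not the SEAC algorithm and involves no learning: the `PointMass` environment, the hand-written `toy_policy`, its gains, and the two candidate durations are all hypothetical choices made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PointMass:
    """1-D point mass obeying Newtonian kinematics (illustrative toy environment)."""
    position: float = 0.0
    velocity: float = 0.0
    mass: float = 1.0

    def step(self, force: float, duration: float) -> None:
        """Hold `force` constant for `duration` seconds and integrate exactly."""
        accel = force / self.mass
        self.position += self.velocity * duration + 0.5 * accel * duration ** 2
        self.velocity += accel * duration


def toy_policy(state: PointMass, target: float) -> tuple[float, float]:
    """Hand-written stand-in for a learned policy: returns (force, duration)."""
    error = target - state.position
    force = 2.0 * error - 3.0 * state.velocity  # PD-style action, arbitrary gains
    # Far from the target: act briefly and often; close to it: hold the action longer.
    duration = 0.05 if abs(error) > 0.5 else 0.2
    return force, duration


def rollout(target: float = 1.0, horizon: float = 10.0) -> tuple[PointMass, int]:
    """Run the toy policy until `horizon` seconds elapse; count control decisions."""
    state, elapsed, decisions = PointMass(), 0.0, 0
    while elapsed < horizon:
        force, duration = toy_policy(state, target)
        state.step(force, duration)
        elapsed += duration
        decisions += 1
    return state, decisions
```

Calling `rollout()` returns the final state and the number of control decisions taken. A fixed-rate controller running at the shortest period (0.05 s here) would need about 200 decisions over the same 10-second horizon, which is the kind of saving a variable control rate is meant to expose.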

Updated: 2024-04-02 17:18:19

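Title: Deployable Reinforcement Learning with Variable Control Rate

Abstract: Deploying controllers trained with Reinforcement Learning (RL) on real robots can be challenging: RL relies on agents' policies being modeled as Markov Decision Processes (MDPs), which assume that time passes in discrete steps. As a result, nearly all RL-based control systems use a fixed-rate control strategy, with a period (or time step) usually chosen from the developer's experience or from specific characteristics of the application environment. Unfortunately, the system must then be controlled at the highest, worst-case frequency to ensure stability, which can demand substantial computational and energy resources and hinder deployment of the controller on onboard hardware. Following the principles of reactive programming, we conjecture that applying control actions only when necessary allows simpler hardware to be used and helps reduce energy consumption. We challenge the fixed-frequency assumption by proposing a variant of RL with a variable control rate. In this approach, the policy decides both the action the agent should take and the duration of the time step associated with that action. In this new setting, we extend Soft Actor-Critic (SAC) to compute the optimal policy with a variable control rate, introducing the Soft Elastic Actor-Critic (SEAC) algorithm. We show the efficacy of SEAC through a proof-of-concept simulation that drives an agent with Newtonian kinematics. Compared with fixed-rate policies, our experiments show higher average returns, shorter task completion times, and reduced computational resources.

Updated: 2024-04-02 17:18:19

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2401.09286v2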

Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating instruction stability via self-chats between two instructed chatbots. Testing popular models like LLaMA2-chat-70B and GPT-3.5, we reveal a significant instruction drift within eight rounds of conversations. An empirical and theoretical analysis of this phenomenon suggests the transformer attention mechanism plays a role, due to attention decay over long exchanges. To combat attention decay and instruction drift, we propose a lightweight method called split-softmax, which compares favorably against two strong baselines.

Updated: 2024-04-02 17:13:24

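Title: Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

Abstract: System prompting is a standard tool for customizing language-model chatbots so that they follow a specific instruction. An implicit assumption when using system prompts is that they are stable, so the chatbot keeps generating text according to the stipulated instructions for the whole conversation. We propose a quantitative benchmark to test this assumption, evaluating instruction stability through self-chats between two instructed chatbots. Testing popular models such as LLaMA2-chat-70B and GPT-3.5, we reveal significant instruction drift within eight rounds of conversation. Empirical and theoretical analysis of this phenomenon suggests that the transformer attention mechanism plays a role, because attention decays over long exchanges. To combat attention decay and instruction drift, we propose a lightweight method called split-softmax, which compares favorably against two strong baselines.

Updated: 2024-04-02 17:13:24

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.10962v2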

Tuning for the Unknown: Revisiting Evaluation Strategies for Lifelong RL

In continual or lifelong reinforcement learning, access to the environment should be limited. If we aspire to design algorithms that can run for long periods of time, continually adapting to new, unexpected situations, then we must be willing to deploy our agents without tuning their hyperparameters over the agent's entire lifetime. The standard practice in deep RL -- and even continual RL -- is to assume unfettered access to the deployment environment for the full lifetime of the agent. This paper explores the notion that progress in lifelong RL research has been held back by inappropriate empirical methodologies. In this paper we propose a new approach for tuning and evaluating lifelong RL agents where only one percent of the experiment data can be used for hyperparameter tuning. We then conduct an empirical study of DQN and Soft Actor Critic across a variety of continuing and non-stationary domains. We find both methods generally perform poorly when restricted to one-percent tuning, whereas several algorithmic mitigations designed to maintain network plasticity perform surprisingly well. In addition, we find that properties designed to measure the network's ability to learn continually indeed correlate with performance under one-percent tuning.

Updated: 2024-04-02 17:13:22

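Title: Tuning for the Unknown: Revisiting Evaluation Strategies for Lifelong RL

Abstract: In continual or lifelong reinforcement learning, access to the environment should be limited. If we want to design algorithms that can run for long periods and keep adapting to new, unexpected situations, we must be willing to deploy our agents without tuning their hyperparameters over the agent's entire lifetime. The standard practice in deep RL, and even in continual RL, is to assume unrestricted access to the deployment environment for the agent's full lifetime. This paper explores the idea that progress in lifelong RL research has been held back by inappropriate empirical methodologies. We propose a new approach to tuning and evaluating lifelong RL agents in which only one percent of the experiment data may be used for hyperparameter tuning. We then conduct an empirical study of DQN and Soft Actor Critic across a variety of continuing and non-stationary domains. We find that both methods generally perform poorly when restricted to one-percent tuning, whereas several algorithmic mitigations designed to maintain network plasticity perform surprisingly well. We also find that properties designed to measure a network's ability to learn continually do correlate with performance under one-percent tuning.

Updated: 2024-04-02 17:13:22

Categories: cs.LG

Download: http://arxiv.org/abs/2404.02113v1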

ImageNot: A contrast with ImageNet preserves model rankings

We introduce ImageNot, a dataset designed to match the scale of ImageNet while differing drastically in other aspects. We show that key model architectures developed for ImageNet over the years rank identically when trained and evaluated on ImageNot to how they rank on ImageNet. This is true when training models from scratch or fine-tuning them. Moreover, the relative improvements of each model over earlier models strongly correlate in both datasets. We further give evidence that ImageNot has a similar utility as ImageNet for transfer learning purposes. Our work demonstrates a surprising degree of external validity in the relative performance of image classification models. This stands in contrast with absolute accuracy numbers that typically drop sharply even under small changes to a dataset.

Updated: 2024-04-02 17:13:04

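Title: ImageNot: A contrast with ImageNet preserves model rankings

Abstract: We introduce ImageNot, a dataset designed to match the scale of ImageNet while differing drastically in other respects. We show that key model architectures developed for ImageNet over the years rank the same when trained and evaluated on ImageNot as they do on ImageNet. This holds whether models are trained from scratch or fine-tuned. Moreover, the relative improvement of each model over earlier models is strongly correlated across the two datasets. We further give evidence that ImageNot has similar utility to ImageNet for transfer learning. Our work demonstrates a surprising degree of external validity in the relative performance of image classification models. This contrasts with absolute accuracy numbers, which typically drop sharply even under small changes to a dataset.

Updated: 2024-04-02 17:13:04

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2404.02112v1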

MedMamba: Vision Mamba for Medical Image Classification

Medical image classification is a fundamental and crucial task in the field of computer vision. In recent years, CNN-based and Transformer-based models have been widely used to classify various medical images. Unfortunately, the limitation of CNNs in long-range modeling capabilities prevents them from effectively extracting features in medical images, while Transformers are hampered by their quadratic computational complexity. Recent research has shown that the state space model (SSM) represented by Mamba can efficiently model long-range interactions while maintaining linear computational complexity. Inspired by this, we propose Vision Mamba for medical image classification (MedMamba). More specifically, we introduce a novel Conv-SSM module. Conv-SSM combines the local feature extraction ability of convolutional layers with the ability of SSM to capture long-range dependency, thereby modeling medical images with different modalities. To demonstrate the potential of MedMamba, we conducted extensive experiments using 14 publicly available medical datasets with different imaging techniques and two private datasets built by ourselves. Extensive experimental results demonstrate that the proposed MedMamba performs well in detecting lesions in various medical images. To the best of our knowledge, this is the first Vision Mamba tailored for medical image classification. The purpose of this work is to establish a new baseline for medical image classification tasks and provide valuable insights for the future development of more efficient and effective SSM-based artificial intelligence algorithms and application systems in the medical field. Source code is available at https://github.com/YubiaoYue/MedMamba.

Updated: 2024-04-02 17:11:45

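Title: MedMamba: Vision Mamba for Medical Image Classification

Abstract: Medical image classification is a fundamental and crucial task in computer vision. In recent years, CNN-based and Transformer-based models have been widely used to classify various medical images. Unfortunately, the limited long-range modeling capability of CNNs prevents them from extracting features from medical images effectively, while Transformers are constrained by their quadratic computational complexity. Recent research has shown that the state space model (SSM) represented by Mamba can model long-range interactions efficiently while keeping linear computational complexity. Inspired by this, we propose Vision Mamba for medical image classification (MedMamba). More specifically, we introduce a novel Conv-SSM module, which combines the local feature extraction ability of convolutional layers with the ability of SSMs to capture long-range dependencies, and thereby models medical images of different modalities. To demonstrate the potential of MedMamba, we ran extensive experiments on 14 publicly available medical datasets and two private datasets that we built ourselves. The results show that the proposed MedMamba performs well at detecting lesions in a variety of medical images. To the best of our knowledge, this is the first Vision Mamba tailored for medical image classification. The aim of this work is to establish a new baseline for medical image classification tasks and to provide useful insights for the future development of more efficient and effective SSM-based artificial intelligence algorithms and application systems. Source code is available at https://github.com/YubiaoYue/MedMamba.

Updated: 2024-04-02 17:11:45

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2403.03849v3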

Variance-Reduced Policy Gradient Approaches for Infinite Horizon Average Reward Markov Decision Processes

We present two Policy Gradient-based methods with general parameterization in the context of infinite horizon average reward Markov Decision Processes. The first approach employs Implicit Gradient Transport for variance reduction, ensuring an expected regret of the order $\tilde{\mathcal{O}}(T^{3/5})$. The second approach, rooted in Hessian-based techniques, ensures an expected regret of the order $\tilde{\mathcal{O}}(\sqrt{T})$. These results significantly improve the state of the art of the problem, which achieves a regret of $\tilde{\mathcal{O}}(T^{3/4})$.
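To get a feel for the gap between these three rates, the worked comparison below drops the logarithmic factors and constants hidden in $\tilde{\mathcal{O}}(\cdot)$ and plugs in a hypothetical horizon $T = 10^{20}$. Both simplifications are assumptions made only for illustration.

```latex
\[
  T^{1/2} \;<\; T^{3/5} \;<\; T^{3/4} \qquad \text{for all } T > 1,
\]
\[
  T = 10^{20}: \qquad
  \sqrt{T} = 10^{10}, \qquad
  T^{3/5} = 10^{12}, \qquad
  T^{3/4} = 10^{15}.
\]
```

Under these simplifications, the Hessian-based bound is smaller than the previous $T^{3/4}$ rate by a factor of $T^{1/4} = 10^{5}$ at this horizon, and the Implicit Gradient Transport bound by a factor of $T^{3/20} = 10^{3}$.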

Updated: 2024-04-02 17:08:23

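Title: Variance-Reduced Policy Gradient Approaches for Infinite Horizon Average Reward Markov Decision Processes

Abstract: We present two policy-gradient-based methods with general parameterization for infinite horizon average reward Markov Decision Processes. The first approach uses Implicit Gradient Transport for variance reduction and guarantees an expected regret of order $\tilde{\mathcal{O}}(T^{3/5})$. The second approach, rooted in Hessian-based techniques, guarantees an expected regret of order $\tilde{\mathcal{O}}(\sqrt{T})$. These results significantly improve on the state of the art for this problem, which achieves a regret of $\tilde{\mathcal{O}}(T^{3/4})$.

Updated: 2024-04-02 17:08:23

Categories: cs.LG

Download: http://arxiv.org/abs/2404.02108v1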

Lifelong Continual Learning for Anomaly Detection: New Challenges, Perspectives, and Insights

Anomaly detection is of paramount importance in many real-world domains, characterized by evolving behavior. Lifelong learning represents an emerging trend, answering the need for machine learning models that continuously adapt to new challenges in dynamic environments while retaining past knowledge. However, limited efforts are dedicated to building foundations for lifelong anomaly detection, which provides intrinsically different challenges compared to the more widely explored classification setting. In this paper, we face this issue by exploring, motivating, and discussing lifelong anomaly detection, trying to build foundations for its wider adoption. First, we explain why lifelong anomaly detection is relevant, defining challenges and opportunities to design anomaly detection methods that deal with lifelong learning complexities. Second, we characterize learning settings and a scenario generation procedure that enables researchers to experiment with lifelong anomaly detection using existing datasets. Third, we perform experiments with popular anomaly detection methods on proposed lifelong scenarios, emphasizing the gap in performance that could be gained with the adoption of lifelong learning. Overall, we conclude that the adoption of lifelong anomaly detection is important to design more robust models that provide a comprehensive view of the environment, as well as simultaneous adaptation and knowledge retention.

Updated: 2024-04-02 17:01:05

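Title: Lifelong Continual Learning for Anomaly Detection: New Challenges, Perspectives, and Insights

Abstract: Anomaly detection is of paramount importance in many real-world domains characterized by evolving behavior. Lifelong learning is an emerging trend that answers the need for machine learning models that continuously adapt to new challenges in dynamic environments while retaining past knowledge. However, little effort has gone into building foundations for lifelong anomaly detection, which poses intrinsically different challenges from the more widely explored classification setting. In this paper we address this issue by exploring, motivating, and discussing lifelong anomaly detection, with the goal of building foundations for its wider adoption. First, we explain why lifelong anomaly detection is relevant, defining the challenges and opportunities in designing anomaly detection methods that cope with the complexities of lifelong learning. Second, we characterize learning settings and a scenario generation procedure that let researchers experiment with lifelong anomaly detection using existing datasets. Third, we run experiments with popular anomaly detection methods on the proposed lifelong scenarios, highlighting the performance gap that adopting lifelong learning could close. Overall, we conclude that adopting lifelong anomaly detection is important for designing more robust models that provide a comprehensive view of the environment together with simultaneous adaptation and knowledge retention.

Updated: 2024-04-02 17:01:05

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2303.07557v2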

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

Human values are crucial to human decision-making. Value pluralism is the view that multiple correct values may be held in tension with one another (e.g., when considering lying to a friend to protect their feelings, how does one balance honesty with friendship?). As statistical learners, AI systems fit to averages by default, washing out these potentially irreducible value conflicts. To improve AI systems to better reflect value pluralism, the first-order challenge is to explore the extent to which AI systems can model pluralistic human values, rights, and duties as well as their interaction. We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations. ValuePrism's contextualized values are generated by GPT-4 and deemed high-quality by human annotators 91% of the time. We conduct a large-scale study with annotators across diverse social and demographic backgrounds to try to understand whose values are represented. With ValuePrism, we build Kaleido, an open, light-weight, and structured language-based multi-task model that generates, explains, and assesses the relevance and valence (i.e., support or oppose) of human values, rights, and duties within a specific context. Humans prefer the sets of values output by our system over the teacher GPT-4, finding them more accurate and with broader coverage. In addition, we demonstrate that Kaleido can help explain variability in human decision-making by outputting contrasting values. Finally, we show that Kaleido's representations transfer to other philosophical frameworks and datasets, confirming the benefit of an explicit, modular, and interpretable approach to value pluralism. We hope that our work will serve as a step to making more explicit the implicit values behind human decision-making and to steering AI systems to make decisions that are more in accordance with them.

Updated: 2024-04-02 16:52:03

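Title: Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

Abstract: Human values are crucial to human decision-making. Value pluralism is the view that multiple correct values may be held in tension with one another (for example, when considering lying to a friend to protect their feelings, how does one balance honesty with friendship?). As statistical learners, AI systems fit to averages by default, washing out these potentially irreducible value conflicts. To improve AI systems so that they better reflect value pluralism, the first-order challenge is to explore the extent to which AI systems can model pluralistic human values, rights, and duties, as well as their interaction. We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations. ValuePrism's contextualized values are generated by GPT-4 and judged high-quality by human annotators 91% of the time. We conduct a large-scale study with annotators from diverse social and demographic backgrounds to try to understand whose values are represented. With ValuePrism we build Kaleido, an open, lightweight, and structured language-based multi-task model that generates, explains, and assesses the relevance and valence (that is, support or oppose) of human values, rights, and duties within a specific context. Humans prefer the sets of values output by our system over those of the teacher GPT-4, finding them more accurate and broader in coverage. In addition, we show that Kaleido can help explain variability in human decision-making by outputting contrasting values. Finally, we show that Kaleido's representations transfer to other philosophical frameworks and datasets, confirming the benefit of an explicit, modular, and interpretable approach to value pluralism. We hope our work will serve as a step toward making the implicit values behind human decision-making more explicit and toward steering AI systems to make decisions that accord better with them.

Updated: 2024-04-02 16:52:03

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2309.00779v2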

Insights from the Use of Previously Unseen Neural Architecture Search Datasets

The boundless possibility of neural networks which can be used to solve a problem -- each with different performance -- leads to a situation where a Deep Learning expert is required to identify the best neural network. This goes against the hope of removing the need for experts. Neural Architecture Search (NAS) offers a solution to this by automatically identifying the best architecture. However, to date, NAS work has focused on a small set of datasets which we argue are not representative of real-world problems. We introduce eight new datasets created for a series of NAS Challenges: AddNIST, Language, MultNIST, CIFARTile, Gutenberg, Isabella, GeoClassing, and Chesseract. These datasets and challenges are developed to direct attention to issues in NAS development and to encourage authors to consider how their models will perform on datasets unknown to them at development time. We present experimentation using standard Deep Learning methods as well as the best results from challenge participants.

Updated: 2024-04-02 16:48:34

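Title: Insights from the Use of Previously Unseen Neural Architecture Search Datasets

Abstract: The boundless range of neural networks that could be used to solve a problem, each with different performance, means that a Deep Learning expert is needed to identify the best one. This runs against the hope of removing the need for experts. Neural Architecture Search (NAS) addresses this by automatically identifying the best architecture. To date, however, NAS work has focused on a small set of datasets which, we argue, are not representative of real-world problems. We introduce eight new datasets created for a series of NAS Challenges: AddNIST, Language, MultNIST, CIFARTile, Gutenberg, Isabella, GeoClassing, and Chesseract. These datasets and challenges are designed to draw attention to issues in NAS development and to encourage authors to consider how their models will perform on datasets unknown to them at development time. We present experiments using standard Deep Learning methods as well as the best results from challenge participants.

Updated: 2024-04-02 16:48:34

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2404.02189v1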

A Post Quantum Key Agreement Protocol Based on a Modified Matrix Power Function over a Rectangular Matrices Semiring

We present an improved post-quantum version of the Sakalauskas matrix power function key agreement protocol, using rectangular matrices instead of the original square ones. The Sakalauskas matrix power function is an efficient and secure way to generate a shared secret key, and using rectangular matrices provides additional flexibility and security. This method reduces the computational complexity by allowing smaller random integer matrices while maintaining a high level of security. We do not rely on matrices with special formatting to achieve commutativity; instead, we use fully random values on those structures, increasing their entropy. Another advantage of using rectangular matrices in key agreement protocols is that they offer better protection against various linearization attacks.

Updated: 2024-04-02 16:45:28

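Title: A Post Quantum Key Agreement Protocol Based on a Modified Matrix Power Function over a Rectangular Matrices Semiring

Abstract: We present an improved post-quantum version of the Sakalauskas matrix power function key agreement protocol, using rectangular matrices in place of the original square ones. The Sakalauskas matrix power function is an efficient and secure way to generate a shared secret key, and using rectangular matrices provides additional flexibility and security. The method lowers computational complexity by allowing smaller random integer matrices while maintaining a high level of security. We do not rely on specially formatted matrices to achieve commutativity; instead, we use fully random values on those structures, which increases their entropy. Another advantage of using rectangular matrices in key agreement protocols is that they offer better protection against various linearization attacks.

Updated: 2024-04-02 16:45:28

Categories: cs.CR,11T71, 14G50, 94A60, 81P94

Download: http://arxiv.org/abs/2303.11972v5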

READ: Improving Relation Extraction from an ADversarial Perspective

Recent works in relation extraction (RE) have achieved promising benchmark accuracy; however, our adversarial attack experiments show that these works excessively rely on entities, making their generalization capability questionable. To address this issue, we propose an adversarial training method specifically designed for RE. Our approach introduces both sequence- and token-level perturbations to the sample and uses a separate perturbation vocabulary to improve the search for entity and context perturbations. Furthermore, we introduce a probabilistic strategy for leaving clean tokens in the context during adversarial training. This strategy enables a larger attack budget for entities and coaxes the model to leverage relational patterns embedded in the context. Extensive experiments show that compared to various adversarial training methods, our method significantly improves both the accuracy and robustness of the model. Additionally, experiments on different data availability settings highlight the effectiveness of our method in low-resource scenarios. We also perform in-depth analyses of our proposed method and provide further hints. We will release our code at https://github.com/David-Li0406/READ.

Updated: 2024-04-02 16:42:44

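Title: READ: Improving Relation Extraction from an ADversarial Perspective

Abstract: Recent work in relation extraction (RE) has achieved promising benchmark accuracy; however, our adversarial attack experiments show that these methods rely excessively on entities, which calls their generalization capability into question. To address this issue, we propose an adversarial training method designed specifically for RE. Our approach introduces both sequence-level and token-level perturbations to the sample and uses a separate perturbation vocabulary to improve the search for entity and context perturbations. We also introduce a probabilistic strategy for leaving clean tokens in the context during adversarial training. This strategy allows a larger attack budget for entities and pushes the model to exploit the relational patterns embedded in the context. Extensive experiments show that, compared with various adversarial training methods, our method significantly improves both the accuracy and the robustness of the model. Experiments under different data availability settings further highlight its effectiveness in low-resource scenarios. We also analyze the proposed method in depth and provide further hints. We will release our code at https://github.com/David-Li0406/READ.

Updated: 2024-04-02 16:42:44

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.02931v1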

Already Moderate Population Sizes Provably Yield Strong Robustness to Noise

Experience shows that typical evolutionary algorithms can cope well with stochastic disturbances such as noisy function evaluations. In this first mathematical runtime analysis of the $(1+\lambda)$ and $(1,\lambda)$ evolutionary algorithms in the presence of prior bit-wise noise, we show that both algorithms can tolerate constant noise probabilities without increasing the asymptotic runtime on the OneMax benchmark. For this, a population size $\lambda$ suffices that is at least logarithmic in the problem size $n$. The only previous result in this direction regarded the less realistic one-bit noise model, required a population size super-linear in the problem size, and proved a runtime guarantee roughly cubic in the noiseless runtime for the OneMax benchmark. Our significantly stronger results are based on the novel proof argument that the noiseless offspring can be seen as a biased uniform crossover between the parent and the noisy offspring. We are optimistic that the technical lemmas resulting from this insight will find applications also in future mathematical runtime analyses of evolutionary algorithms.
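For readers unfamiliar with the setting, a minimal sketch of a $(1+\lambda)$ evolutionary algorithm on OneMax with bit-wise prior noise might look as follows. Several details are assumptions rather than facts taken from the abstract: the noise flips each bit independently with probability `q / n` before evaluation, the parent is re-evaluated in every generation, ties favour the offspring, and the loop stops as soon as the parent is the true optimum. All function names are hypothetical.

```python
import random


def onemax(bits):
    """True OneMax fitness: the number of one-bits."""
    return sum(bits)


def noisy_onemax(bits, q, rng):
    """Prior bit-wise noise: flip each bit independently with probability q/n
    before evaluating; the stored individual itself is left unchanged."""
    n = len(bits)
    return sum(b ^ (rng.random() < q / n) for b in bits)


def mutate(bits, rng):
    """Standard bit mutation: flip each bit independently with probability 1/n."""
    n = len(bits)
    return [b ^ (rng.random() < 1.0 / n) for b in bits]


def one_plus_lambda_ea(n, lam, q, max_generations, seed=0):
    """(1+lambda) EA on noisy OneMax. Returns (parent, generations used)."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    for generation in range(max_generations):
        if onemax(parent) == n:  # stop once the parent is the true optimum
            return parent, generation
        # Assumption: the parent is re-evaluated with fresh noise every generation.
        parent_fitness = noisy_onemax(parent, q, rng)
        offspring = [mutate(parent, rng) for _ in range(lam)]
        scored = [(noisy_onemax(child, q, rng), child) for child in offspring]
        best_fitness, best_child = max(scored, key=lambda pair: pair[0])
        if best_fitness >= parent_fitness:  # plus-selection, ties favour the offspring
            parent = best_child
    return parent, max_generations
```

A call such as `one_plus_lambda_ea(n=50, lam=8, q=0.5, max_generations=100_000)` runs the loop with a population size above $\ln n$. Replacing the acceptance test with unconditional replacement by `best_child` would give a $(1,\lambda)$ variant of the same sketch.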

Updated: 2024-04-02 16:35:52

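Title: Already Moderate Population Sizes Provably Yield Strong Robustness to Noise

Abstract: Experience shows that typical evolutionary algorithms cope well with stochastic disturbances such as noisy function evaluations. In this first mathematical runtime analysis of the $(1+\lambda)$ and $(1,\lambda)$ evolutionary algorithms in the presence of prior bit-wise noise, we show that both algorithms tolerate constant noise probabilities without an increase in asymptotic runtime on the OneMax benchmark. For this, a population size $\lambda$ that is at least logarithmic in the problem size $n$ suffices. The only previous result in this direction concerned the less realistic one-bit noise model, required a population size super-linear in the problem size, and proved a runtime guarantee roughly cubic in the noiseless runtime for the OneMax benchmark. Our significantly stronger results rest on the novel proof argument that the noiseless offspring can be viewed as a biased uniform crossover between the parent and the noisy offspring. We are optimistic that the technical lemmas arising from this insight will also find applications in future mathematical runtime analyses of evolutionary algorithms.

Updated: 2024-04-02 16:35:52

Categories: cs.NE,cs.AI

Download: http://arxiv.org/abs/2404.02090v1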

Advancing LLM Reasoning Generalists with Preference Trees

We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning. Finetuned from Mistral-7B and CodeLlama-70B, Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks covering mathematics, code generation, and logical reasoning problems. Notably, Eurus-70B beats GPT-3.5 Turbo in reasoning through a comprehensive benchmarking across 12 tests covering five tasks, and achieves a 33.3% pass@1 accuracy on LeetCode and 32.6% on TheoremQA, two challenging benchmarks, substantially outperforming existing open-source models by margins more than 13.3%. The strong performance of Eurus can be primarily attributed to UltraInteract, our newly-curated large-scale, high-quality alignment dataset specifically designed for complex reasoning tasks. UltraInteract can be used in both supervised fine-tuning and preference learning. For each instruction, it includes a preference tree consisting of (1) reasoning chains with diverse planning strategies in a unified format, (2) multi-turn interaction trajectories with the environment and the critique, and (3) pairwise data to facilitate preference learning. UltraInteract allows us to conduct an in-depth exploration of preference learning for reasoning tasks. Our investigation reveals that some well-established preference learning algorithms may be less suitable for reasoning tasks compared to their effectiveness in general conversations. Inspired by this, we derive a novel reward modeling objective which, together with UltraInteract, leads to a strong reward model.

Updated: 2024-04-02 16:25:30

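Title: Advancing LLM Reasoning Generalists with Preference Trees

Abstract: We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning. Finetuned from Mistral-7B and CodeLlama-70B, Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks covering mathematics, code generation, and logical reasoning. Notably, Eurus-70B beats GPT-3.5 Turbo in reasoning across a comprehensive benchmark of 12 tests covering five tasks, and reaches 33.3% pass@1 accuracy on LeetCode and 32.6% on TheoremQA, two challenging benchmarks, outperforming existing open-source models by margins of more than 13.3%. The strong performance of Eurus is due mainly to UltraInteract, our newly curated large-scale, high-quality alignment dataset designed specifically for complex reasoning tasks. UltraInteract can be used for both supervised fine-tuning and preference learning. For each instruction it includes a preference tree consisting of (1) reasoning chains with diverse planning strategies in a unified format, (2) multi-turn interaction trajectories with the environment and the critique, and (3) pairwise data to facilitate preference learning. UltraInteract lets us explore preference learning for reasoning tasks in depth. Our investigation shows that some well-established preference learning algorithms may be less suitable for reasoning tasks than they are for general conversation. Inspired by this, we derive a novel reward modeling objective which, together with UltraInteract, yields a strong reward model.

Updated: 2024-04-02 16:25:30

Categories: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2404.02078v1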

EGTR: Extracting Graph from Transformer for Scene Graph Generation

Scene Graph Generation (SGG) is a challenging task of detecting objects and predicting relationships between objects. After DETR was developed, one-stage SGG models based on a one-stage object detector have been actively studied. However, complex modeling is used to predict the relationship between objects, and the inherent relationship between object queries learned in the multi-head self-attention of the object detector has been neglected. We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder. By fully utilizing the self-attention by-products, the relation graph can be extracted effectively with a shallow relation extraction head. Considering the dependency of the relation extraction task on the object detection task, we propose a novel relation smoothing technique that adjusts the relation label adaptively according to the quality of the detected objects. By the relation smoothing, the model is trained according to the continuous curriculum that focuses on object detection task at the beginning of training and performs multi-task learning as the object detection performance gradually improves. Furthermore, we propose a connectivity prediction task that predicts whether a relation exists between object pairs as an auxiliary task of the relation extraction. We demonstrate the effectiveness and efficiency of our method for the Visual Genome and Open Image V6 datasets. Our code is publicly available at https://github.com/naver-ai/egtr .

Updated: 2024-04-02 16:20:02

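Title: EGTR: Extracting Graph from Transformer for Scene Graph Generation

Abstract: Scene graph generation (SGG) is a challenging task that involves detecting objects and predicting the relationships between them. Since the development of DETR, one-stage SGG models built on one-stage object detectors have been actively studied. However, these models rely on complex modeling to predict relationships between objects, and the inherent relationships between object queries learned in the multi-head self-attention of the object detector have been neglected. We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder. By fully exploiting these self-attention by-products, the relation graph can be extracted effectively with a shallow relation extraction head. Because relation extraction depends on object detection, we propose a novel relation smoothing technique that adaptively adjusts the relation label according to the quality of the detected objects. With relation smoothing, the model is trained along a continuous curriculum that focuses on object detection early in training and shifts to multi-task learning as detection performance gradually improves. In addition, we propose a connectivity prediction task, which predicts whether a relation exists between a pair of objects, as an auxiliary task for relation extraction. We demonstrate the effectiveness and efficiency of our method on the Visual Genome and Open Image V6 datasets. Our code is publicly available at https://github.com/naver-ai/egtr .

Updated: 2024-04-02 16:20:02

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.02072v1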

Semantically-Prompted Language Models Improve Visual Descriptions

Language-vision models like CLIP have made significant strides in vision tasks, such as zero-shot image classification (ZSIC). However, generating specific and expressive visual descriptions remains challenging; descriptions produced by current methods are often ambiguous and lacking in granularity. To tackle these issues, we propose V-GLOSS: Visual Glosses, a novel method built upon two key ideas. The first is Semantic Prompting, which conditions a language model on structured semantic knowledge. The second is a new contrastive algorithm that elicits fine-grained distinctions between similar concepts. With both ideas, we demonstrate that V-GLOSS improves visual descriptions and achieves strong results in the zero-shot setting on general and fine-grained image-classification datasets, including ImageNet, STL-10, FGVC Aircraft, and Flowers 102. Moreover, these descriptive capabilities contribute to enhancing image-generation performance. Finally, we introduce a quality-tested silver dataset with descriptions generated with V-GLOSS for all ImageNet classes.

Updated: 2024-04-02 16:19:22

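Title: Semantically-Prompted Language Models Improve Visual Descriptions

Abstract: Language-vision models such as CLIP have made significant progress on vision tasks such as zero-shot image classification (ZSIC). However, generating specific and expressive visual descriptions remains challenging; descriptions produced by current methods are often ambiguous and lack granularity. To address these issues, we propose V-GLOSS: Visual Glosses, a new method built on two key ideas. The first is Semantic Prompting, which conditions a language model on structured semantic knowledge. The second is a new contrastive algorithm that elicits fine-grained distinctions between similar concepts. With both ideas, we show that V-GLOSS improves visual descriptions and achieves strong zero-shot results on general and fine-grained image-classification datasets, including ImageNet, STL-10, FGVC Aircraft, and Flowers 102. These descriptive capabilities also help improve image-generation performance. Finally, we introduce a quality-tested silver dataset containing descriptions generated with V-GLOSS for all ImageNet classes.

Updated: 2024-04-02 16:19:22

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2306.06077v3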

MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP) that employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and alignment with popular semantics. Utilizing this method, we built a dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in terms of music description, and empirically demonstrated the effectiveness of annotated data for fine-tuning LLMs. Ultimately, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music. All data related to the benchmark and the code for scoring have been open-sourced.

Updated: 2024-04-02 16:15:35

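Title: MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

Abstract: With the rapid development of multimodal large language models (LLMs), new benchmarks are urgently needed to evaluate uniformly how well they understand and describe music. However, because of semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and the low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in colloquial Chinese, designed to evaluate how well multimodal LLMs understand and describe music. We established the Caichong Music Annotation Platform (CaiMAP), which uses an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure annotation precision and alignment with popular semantics. Using this method, we built a dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries as the MuChin test set. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in music description and empirically demonstrated the effectiveness of annotated data for fine-tuning LLMs. Finally, we used MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music. All data related to the benchmark and the scoring code have been open-sourced.

Updated: 2024-04-02 16:15:35

Categories: cs.SD,cs.AI,cs.MM,eess.AS,68Txx(Primary)14F05, 91Fxx(Secondary),I.2.7; J.5

Download: http://arxiv.org/abs/2402.09871v2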

Modular Control Architecture for Safe Marine Navigation: Reinforcement Learning and Predictive Safety Filters

Many autonomous systems face safety challenges, requiring robust closed-loop control to handle physical limitations and safety constraints. Real-world systems, like autonomous ships, encounter nonlinear dynamics and environmental disturbances. Reinforcement learning (RL) is increasingly used to adapt to complex scenarios, but standard frameworks ensuring safety and stability are lacking. Predictive Safety Filters (PSF) offer a promising solution, ensuring constraint satisfaction in learning-based control without explicit constraint handling. This modular approach allows using arbitrary control policies, with the safety filter optimizing proposed actions to meet physical and safety constraints. We apply this approach to marine navigation, combining RL with PSF on a simulated Cybership II model. The RL agent is trained on path following and collision avoidance, while the PSF monitors and modifies control actions for safety. Results demonstrate the PSF's effectiveness in maintaining safety without hindering the RL agent's learning rate and performance, evaluated against a standard RL agent without PSF.

Updated: 2024-04-02 16:12:11

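Title: Modular Control Architecture for Safe Marine Navigation: Reinforcement Learning and Predictive Safety Filters

Abstract: Many autonomous systems face safety challenges and need robust closed-loop control to handle physical limitations and safety constraints. Real-world systems such as autonomous ships encounter nonlinear dynamics and environmental disturbances. Reinforcement learning is increasingly used to adapt to complex scenarios, but standard frameworks that ensure safety and stability are lacking. Predictive Safety Filters (PSF) offer a promising solution, ensuring constraint satisfaction in learning-based control without explicit constraint handling. This modular approach allows arbitrary control policies to be used, with the safety filter optimizing proposed actions to meet physical and safety constraints. We apply this approach to marine navigation, combining reinforcement learning with PSF on a simulated Cybership II model. The reinforcement learning agent is trained on path following and collision avoidance, while the PSF monitors and modifies control actions to ensure safety. The results show that the PSF maintains safety effectively without hindering the agent's learning rate or performance, as evaluated against a standard reinforcement learning agent without PSF.

Updated: 2024-04-02 16:12:11

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2312.01855v2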

Red-Teaming Segment Anything Model

Foundation models have emerged as pivotal tools, tackling many complex tasks through pre-training on vast datasets and subsequent fine-tuning for specific applications. The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis that tests the Segment Anything Model against challenging tasks: (1) We analyze the impact of style transfer on segmentation masks, demonstrating that applying adverse weather conditions and raindrops to dashboard images of city roads significantly distorts generated masks. (2) We focus on assessing whether the model can be used for attacks on privacy, such as recognizing celebrities' faces, and show that the model possesses some undesired knowledge in this task. (3) Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. We not only show the effectiveness of popular white-box attacks and resistance to black-box attacks but also introduce a novel approach - Focused Iterative Gradient Attack (FIGA) that combines white-box approaches to construct an efficient attack resulting in a smaller number of modified pixels. All of our testing methods and analyses indicate a need for enhanced safety measures in foundation models for image segmentation.

Updated: 2024-04-02 16:07:50

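Title: Red-Teaming Segment Anything Model

Abstract: Foundation models have become pivotal tools, handling many complex tasks through pre-training on vast datasets followed by fine-tuning for specific applications. The Segment Anything Model is one of the first and best-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis that tests the Segment Anything Model on challenging tasks: (1) We analyze the impact of style transfer on segmentation masks, showing that applying adverse weather conditions and raindrops to dashboard images of city roads significantly distorts the generated masks. (2) We assess whether the model can be used for attacks on privacy, such as recognizing celebrities' faces, and show that the model has some undesired knowledge in this task. (3) Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. We not only show the effectiveness of popular white-box attacks and the model's resistance to black-box attacks, but also introduce a novel approach, Focused Iterative Gradient Attack (FIGA), which combines white-box approaches into an efficient attack that modifies fewer pixels. All of our testing methods and analyses indicate a need for stronger safety measures in foundation models for image segmentation.

Updated: 2024-04-02 16:07:50

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.02067v1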

A Generative Deep Learning Approach for Crash Severity Modeling with Imbalanced Data

Crash data is often greatly imbalanced: the majority of crashes are non-fatal, and only a small number are fatal. This data imbalance poses a challenge for crash severity modeling, since models struggle to fit and interpret fatal crash outcomes from very limited samples. Such imbalance is usually addressed by data resampling methods, such as under-sampling and over-sampling techniques. However, most traditional and deep learning-based data resampling methods, such as the synthetic minority oversampling technique (SMOTE) and generative adversarial networks (GAN), are designed for processing continuous variables. Though some resampling methods have been improved to handle both continuous and discrete variables, they may have difficulties in dealing with the collapse issue associated with sparse discrete risk factors. Moreover, there is a lack of comprehensive studies that compare the performance of various resampling methods in crash severity modeling. To address these issues, the current study proposes a crash data generation method based on the Conditional Tabular GAN. After data balancing, a crash severity model is employed to estimate the performance of classification and interpretation. A comparative study is conducted to assess classification accuracy and distribution consistency of the proposed generation method using a 4-year imbalanced crash dataset collected in Washington State, U.S. Additionally, Monte Carlo simulation is employed to estimate the performance of parameter and probability estimation in both two- and three-class imbalance scenarios. The results indicate that using synthetic data generated by CTGAN-RU for crash severity modeling outperforms using original data or synthetic data generated by other resampling methods.

Updated: 2024-04-02 16:07:27

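Title: A Generative Deep Learning Approach for Crash Severity Modeling with Imbalanced Data

Abstract: Crash data is often severely imbalanced: most crashes are non-fatal, and only a few are fatal because such crashes are rare. This imbalance poses a challenge for crash severity modeling, because it is hard to fit and interpret fatal crash outcomes from very limited samples. Such imbalance is usually addressed with data resampling methods, such as under-sampling and over-sampling techniques. However, most traditional and deep learning-based resampling methods, such as the synthetic minority oversampling technique (SMOTE) and generative adversarial networks (GAN), are designed specifically for continuous variables. Although some resampling methods have been improved to handle both continuous and discrete variables, they may struggle with the collapse issue associated with sparse discrete risk factors. In addition, comprehensive studies comparing the performance of different resampling methods in crash severity modeling are lacking. To address these issues, this study proposes a crash data generation method based on the Conditional Tabular GAN. After data balancing, a crash severity model is used to estimate classification and interpretation performance. A comparative study assesses the classification accuracy and distribution consistency of the proposed generation method using a 4-year imbalanced crash dataset collected in Washington State, U.S. Monte Carlo simulation is also used to estimate the performance of parameter and probability estimation in two- and three-class imbalance scenarios. The results indicate that crash severity modeling with synthetic data generated by CTGAN-RU outperforms modeling with the original data or with synthetic data generated by other resampling methods.

Updated: 2024-04-02 16:07:27

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.02187v1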

SPMamba: State-space model is all you need in speech separation

In speech separation, both CNN- and Transformer-based models have demonstrated robust separation capabilities, garnering significant attention within the research community. However, CNN-based methods have limited modelling capability for long-sequence audio, leading to suboptimal separation performance. Conversely, Transformer-based methods are limited in practical applications due to their high computational complexity. Notably, within computer vision, Mamba-based methods have been celebrated for their formidable performance and reduced computational requirements. In this paper, we propose a network architecture for speech separation using a state-space model, namely SPMamba. We adopt the TF-GridNet model as the foundational framework and substitute its Transformer component with a bidirectional Mamba module, aiming to capture a broader range of contextual information. Our experimental results reveal an important role in the performance aspects of Mamba-based models. SPMamba demonstrates superior performance with a significant advantage over existing separation models in a dataset built on Librispeech. Notably, SPMamba achieves a substantial improvement in separation quality, with a 2.42 dB enhancement in SI-SNRi compared to the TF-GridNet. The source code for SPMamba is publicly accessible at https://github.com/JusperLee/SPMamba .
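The abstract quotes its improvement in SI-SNRi without defining the metric. The sketch below assumes the standard definition of scale-invariant signal-to-noise ratio (project the zero-mean estimate onto the zero-mean reference, then compare target and residual energy), with SI-SNRi being the gain of the separated signal over the unprocessed mixture. The function names and the `eps` stabilizer are illustrative choices; the SPMamba evaluation code may differ in detail.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")
    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]
    # Project the estimate onto the reference to get the target component.
    scale = sum(a * b for a, b in zip(e, r)) / (sum(b * b for b in r) + eps)
    target = [scale * b for b in r]
    noise = [a - t for a, t in zip(e, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: SI-SNR of the separated estimate minus that of the raw mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is why the metric is called scale-invariant.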

Updated: 2024-04-02 16:04:31

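Title: SPMamba: State-space model is all you need in speech separation

Abstract: In speech separation, both CNN- and Transformer-based models have shown strong separation capabilities and attracted wide attention in the research community. However, CNN-based methods have limited capacity to model long-sequence audio, which leads to suboptimal separation performance. Transformer-based methods, in turn, are limited in practical applications by their high computational complexity. Notably, in computer vision, Mamba-based methods have been praised for their strong performance and reduced computational requirements. In this paper, we propose a network architecture for speech separation, SPMamba, which adopts the TF-GridNet model as its foundational framework and replaces its Transformer component with a bidirectional Mamba module, aiming to capture a broader range of contextual information. Our experimental results reveal the important role of Mamba-based models in terms of performance. SPMamba shows superior performance on a dataset built on Librispeech, clearly outperforming existing separation models. Notably, SPMamba substantially improves separation quality, with a 2.42 dB gain in SI-SNRi over TF-GridNet. The source code for SPMamba is publicly accessible at https://github.com/JusperLee/SPMamba .

Updated: 2024-04-02 16:04:31

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2404.02063v1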

Digital Forgetting in Large Language Models: A Survey of Unlearning Methods

The objective of digital forgetting is, given a model with undesirable knowledge or behavior, to obtain a new model in which the detected issues are no longer present. The motivations for forgetting include privacy protection, copyright protection, elimination of biases and discrimination, and prevention of harmful content generation. Digital forgetting has to be effective (meaning how well the new model has forgotten the undesired knowledge/behavior), retain the performance of the original model on the desirable tasks, and be scalable (in particular, forgetting has to be more efficient than retraining from scratch on just the tasks/data to be retained). This survey focuses on forgetting in large language models (LLMs). We first provide background on LLMs, including their components, the types of LLMs, and their usual training pipeline. Second, we describe the motivations, types, and desired properties of digital forgetting. Third, we introduce the approaches to digital forgetting in LLMs, among which unlearning methodologies stand out as the state of the art. Fourth, we provide a detailed taxonomy of machine unlearning methods for LLMs, and we survey and compare current approaches. Fifth, we detail datasets, models and metrics used for the evaluation of forgetting, retaining and runtime. Sixth, we discuss challenges in the area. Finally, we provide some concluding remarks.

Updated: 2024-04-02 16:01:18

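Title: Digital Forgetting in Large Language Models: A Survey of Unlearning Methods

Abstract: The objective of digital forgetting is, given a model with undesirable knowledge or behavior, to obtain a new model in which the detected issues are no longer present. Motivations for forgetting include privacy protection, copyright protection, elimination of biases and discrimination, and prevention of harmful content generation. Digital forgetting must be effective (that is, how well the new model has forgotten the undesired knowledge or behavior), must retain the original model's performance on the desirable tasks, and must be scalable (in particular, forgetting has to be more efficient than retraining from scratch on only the tasks and data to be retained). This survey focuses on forgetting in large language models (LLMs). First, we give background on LLMs, including their components, the types of LLMs, and their usual training pipeline. Second, we describe the motivations, types, and desired properties of digital forgetting. Third, we introduce the approaches to digital forgetting in LLMs, among which unlearning methods are regarded as the state of the art. Fourth, we provide a detailed taxonomy of machine unlearning methods for LLMs, and survey and compare current approaches. Fifth, we detail the datasets, models, and metrics used to evaluate forgetting, retaining, and runtime. Sixth, we discuss challenges in the area. Finally, we offer some concluding remarks.

Updated: 2024-04-02 16:01:18

Categories: cs.CR,cs.AI,cs.LG,68,K.4.1; I.2.6; I.2.7

Download: http://arxiv.org/abs/2404.02062v1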

Long-context LLMs Struggle with Long In-context Learning

Large Language Models (LLMs) have made significant strides in handling long sequences exceeding 32K tokens. However, their performance evaluation has largely been confined to metrics like perplexity and synthetic tasks, which may not fully capture their abilities in more nuanced, real-world scenarios. This study introduces a specialized benchmark (LIConBench) focusing on long in-context learning within the realm of extreme-label classification. We meticulously selected six datasets with a label range spanning 28 to 174 classes, covering input (few-shot demonstration) lengths from 2K to 50K. Our benchmark requires LLMs to comprehend the entire input and recognize the massive label spaces in order to make correct predictions. We evaluate 13 long-context LLMs on our benchmark. We find that the long-context LLMs perform relatively well under a token length of 20K, and that performance benefits from utilizing the long context window. However, once the context window exceeds 20K, the performance of most LLMs except GPT-4 dips dramatically. This suggests a notable gap in current LLM capabilities for processing and understanding long, context-rich sequences. Further analysis revealed a tendency among models to favor predictions for labels presented towards the end of the sequence. Their ability to reason over multiple pieces in the long sequence is yet to be improved. Our study reveals that long context understanding and reasoning is still a challenging task for existing LLMs. We believe LIConBench could serve as a more realistic evaluation for future long-context LLMs.

Updated: 2024-04-02 15:59:11

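Title: Long-context LLMs Struggle with Long In-context Learning

Abstract: Large language models (LLMs) have made significant progress in handling long sequences of more than 32K tokens. However, their evaluation has been largely confined to metrics such as perplexity and to synthetic tasks, which may not fully capture their abilities in more nuanced, real-world scenarios. This study introduces a specialized benchmark (LIConBench) that focuses on long in-context learning in the domain of extreme-label classification. We carefully selected six datasets with label ranges spanning 28 to 174 classes and input (few-shot demonstration) lengths from 2K to 50K. Our benchmark requires LLMs to comprehend the entire input and recognize the massive label space in order to make correct predictions. We evaluate 13 long-context LLMs on our benchmark. We find that long-context LLMs perform relatively well at token lengths under 20K and benefit from using the long context window. However, once the context window exceeds 20K, the performance of most LLMs except GPT-4 drops sharply. This indicates a notable gap in the ability of current LLMs to process and understand long, context-rich sequences. Further analysis shows that models tend to favor predictions for labels presented towards the end of the sequence. Their ability to reason over multiple pieces of a long sequence still needs improvement. Our study shows that long-context understanding and reasoning remains a challenging task for existing LLMs. We believe LIConBench can serve as a more realistic evaluation for future long-context LLMs.

Updated: 2024-04-02 15:59:11

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.02060v1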

Generalizable, Fast, and Accurate DeepQSPR with fastprop Part 1: Framework and Benchmarks

Quantitative Structure Property Relationship studies aim to define a mapping between molecular structure and arbitrary quantities of interest. This was historically accomplished via the development of descriptors which requires significant domain expertise and struggles to generalize. Thus the field has morphed into Molecular Property Prediction and been given over to learned representations which are highly generalizable. The paper introduces fastprop, a DeepQSPR framework which uses a cogent set of molecular level descriptors to meet and exceed the performance of learned representations on diverse datasets in dramatically less time. fastprop is freely available on github at github.com/JacksonBurns/fastprop.

Updated: 2024-04-02 15:57:32

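Title: Generalizable, Fast, and Accurate DeepQSPR with fastprop Part 1: Framework and Benchmarks

Abstract: Quantitative Structure Property Relationship studies aim to define a mapping between molecular structure and arbitrary quantities of interest. Historically this was accomplished by developing descriptors, which requires domain expertise and generalizes poorly. The field has therefore evolved into Molecular Property Prediction and turned to learned representations, which are highly generalizable. This paper introduces fastprop, a DeepQSPR framework that uses a cogent set of molecular-level descriptors to meet and exceed the performance of learned representations on diverse datasets in dramatically less time. fastprop is freely available on GitHub at github.com/JacksonBurns/fastprop.

Updated: 2024-04-02 15:57:32

Categories: cs.LG,physics.chem-ph

Download: http://arxiv.org/abs/2404.02058v1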

The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation

Continuous-output neural machine translation (CoNMT) replaces the discrete next-word prediction problem with an embedding prediction. The semantic structure of the target embedding space (i.e., closeness of related words) is intuitively believed to be crucial. We challenge this assumption and show that completely random output embeddings can outperform laboriously pretrained ones, especially on larger datasets. Further investigation shows this surprising effect is strongest for rare words, due to the geometry of their embeddings. We shed further light on this finding by designing a mixed strategy that combines random and pre-trained embeddings for different tokens.

Updated: 2024-04-02 15:56:46

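Title: The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation

Abstract: Continuous-output neural machine translation (CoNMT) replaces the discrete next-word prediction problem with an embedding prediction. The semantic structure of the target embedding space (that is, the closeness of related words) is intuitively believed to be crucial. We challenge this assumption and show that completely random output embeddings can outperform laboriously pretrained ones, especially on larger datasets. Further investigation shows that this surprising effect is strongest for rare words, owing to the geometry of their embeddings. We shed further light on this finding by designing a mixed strategy that combines random and pre-trained embeddings for different tokens.

Updated: 2024-04-02 15:56:46

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2310.20620v2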

XLB: A differentiable massively parallel lattice Boltzmann library in Python

The lattice Boltzmann method (LBM) has emerged as a prominent technique for solving fluid dynamics problems due to its algorithmic potential for computational scalability. We introduce the XLB library, a Python-based differentiable LBM library built on the JAX platform. The architecture of XLB is predicated upon ensuring accessibility, extensibility, and computational performance, enabling it to scale effectively across CPU, TPU, multi-GPU, and distributed multi-GPU or TPU systems. The library can be readily augmented with novel boundary conditions, collision models, or multi-physics simulation capabilities. XLB's differentiability and data structures are compatible with the extensive JAX-based machine learning ecosystem, enabling it to address physics-based machine learning, optimization, and inverse problems. XLB has been successfully scaled to handle simulations with billions of cells, achieving giga-scale lattice updates per second. XLB is released under the permissive Apache-2.0 license and is available on GitHub at https://github.com/Autodesk/XLB.

Updated: 2024-04-02 15:56:38

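Title: XLB: A differentiable massively parallel lattice Boltzmann library in Python

Abstract: The lattice Boltzmann method (LBM) has become a prominent technique for solving fluid dynamics problems because of its algorithmic potential for computational scalability. We introduce the XLB library, a Python-based differentiable LBM library built on the JAX platform. The architecture of XLB is designed for accessibility, extensibility, and computational performance, and scales effectively across CPU, TPU, multi-GPU, and distributed multi-GPU or TPU systems. The library can easily be extended with novel boundary conditions, collision models, or multi-physics simulation capabilities. XLB's differentiability and data structures are compatible with the extensive JAX-based machine learning ecosystem, enabling it to address physics-based machine learning, optimization, and inverse problems. XLB has been successfully scaled to simulations with billions of cells, achieving giga-scale lattice updates per second. XLB is released under the permissive Apache-2.0 license and is available on GitHub at https://github.com/Autodesk/XLB.

Updated: 2024-04-02 15:56:38

Categories: physics.comp-ph,cs.CE,cs.LG

Download: http://arxiv.org/abs/2311.16080v3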

Noise Masking Attacks and Defenses for Pretrained Speech Models

Speech models are often trained on sensitive data in order to improve model performance, leading to potential privacy leakage. Our work considers noise masking attacks, introduced by Amid et al. 2022, which attack automatic speech recognition (ASR) models by requesting a transcript of an utterance which is partially replaced with noise. They show that when a record has been seen at training time, the model will transcribe the noisy record with its memorized sensitive transcript. In our work, we extend these attacks beyond ASR models, to attack pretrained speech encoders. Our method fine-tunes the encoder to produce an ASR model, and then performs noise masking on this model, which we find recovers private information from the pretraining data, despite the model never having seen transcripts at pretraining time! We show how to improve the precision of these attacks and investigate a number of countermeasures to our attacks.

Updated: 2024-04-02 15:49:03

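Title: Noise Masking Attacks and Defenses for Pretrained Speech Models

Abstract: Speech models are often trained on sensitive data to improve model performance, which can lead to privacy leakage. Our work considers noise masking attacks, introduced by Amid et al. in 2022, which attack automatic speech recognition (ASR) models by requesting a transcript of an utterance that has been partially replaced with noise. They show that when a record has been seen at training time, the model transcribes the noisy record with its memorized sensitive transcript. In our work, we extend these attacks beyond ASR models to pretrained speech encoders. Our method fine-tunes the encoder to produce an ASR model and then performs noise masking on that model, which we find recovers private information from the pretraining data even though the model never saw transcripts at pretraining time. We show how to improve the precision of these attacks and investigate a number of countermeasures against them.

Updated: 2024-04-02 15:49:03

Categories: cs.LG

Download: http://arxiv.org/abs/2404.02052v1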

PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models

We introduce PokeLLMon, the first LLM-embodied agent that achieves human-parity performance in tactical battle games, as demonstrated in Pokemon battles. The design of PokeLLMon incorporates three key strategies: (i) In-context reinforcement learning that instantly consumes text-based feedback derived from battles to iteratively refine the policy; (ii) Knowledge-augmented generation that retrieves external knowledge to counteract hallucination and enables the agent to act in a timely and proper manner; (iii) Consistent action generation to mitigate the panic switching phenomenon when the agent faces a powerful opponent and wants to elude the battle. We show that online battles against humans demonstrate PokeLLMon's human-like battle strategies and just-in-time decision making, achieving a 49% win rate in the Ladder competitions and a 56% win rate in the invited battles. Our implementation and playable battle logs are available at: https://github.com/git-disl/PokeLLMon.

Updated: 2024-04-02 15:46:35

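Title: PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models

Abstract: We introduce PokeLLMon, the first LLM-embodied agent to achieve human-parity performance in tactical battle games, as demonstrated in Pokemon battles. The design of PokeLLMon incorporates three key strategies: (i) in-context reinforcement learning, which immediately uses text-based feedback derived from battles to refine the policy iteratively; (ii) knowledge-augmented generation, which retrieves external knowledge to counteract hallucination and enables the agent to act in a timely and appropriate way; (iii) consistent action generation, which mitigates the panic switching phenomenon that occurs when the agent faces a powerful opponent and wants to elude the battle. We show that online battles against humans demonstrate PokeLLMon's human-like battle strategies and just-in-time decision making, with a 49% win rate in the Ladder competitions and a 56% win rate in the invited battles. Our implementation and playable battle logs are available at: https://github.com/git-disl/PokeLLMon.

Updated: 2024-04-02 15:46:35

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2402.01118v3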

IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations

Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs. But do their capabilities change depending on the input modality? In this work, we propose $\textbf{IsoBench}$, a benchmark dataset containing problems from four major areas: math, science, algorithms, and games. Each example is presented with multiple $\textbf{isomorphic representations}$ of inputs, such as visual, textual, and mathematical presentations. IsoBench provides fine-grained feedback to diagnose performance gaps caused by the form of the representation. Across various foundation models, we observe that on the same problem, models have a consistent preference towards textual representations. Most prominently, when evaluated on all IsoBench problems, Claude-3 Opus performs 28.7 points worse when provided with images instead of text; similarly, GPT-4 Turbo is 18.7 points worse and Gemini Pro is 14.9 points worse. Finally, we present two prompting techniques, $\textit{IsoCombination}$ and $\textit{IsoScratchPad}$, which improve model performance by considering combinations of, and translations between, different input representations.

Updated: 2024-04-02 15:46:13

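Title: IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations

Abstract: Current foundation models show impressive capabilities when prompted with text only or with both image and text inputs. But do their capabilities change with the input modality? In this work, we propose $\textbf{IsoBench}$, a benchmark dataset containing problems from four major areas: math, science, algorithms, and games. Each example is presented with multiple $\textbf{isomorphic representations}$ of inputs, such as visual, textual, and mathematical presentations. IsoBench provides fine-grained feedback to diagnose performance gaps caused by the form of the representation. Across various foundation models, we observe that on the same problem, models consistently prefer textual representations. Most prominently, when evaluated on all IsoBench problems, Claude-3 Opus performs 28.7 points worse when given images instead of text; similarly, GPT-4 Turbo is 18.7 points worse and Gemini Pro is 14.9 points worse. Finally, we present two prompting techniques, $\textit{IsoCombination}$ and $\textit{IsoScratchPad}$, which improve model performance by considering combinations of, and translations between, different input representations.

Updated: 2024-04-02 15:46:13

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2404.01266v2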

Ink and Individuality: Crafting a Personalised Narrative in the Age of LLMs

Individuality and personalization comprise the distinctive characteristics that make each writer unique and influence their words in order to effectively engage readers while conveying authenticity. However, our growing reliance on LLM-based writing assistants risks compromising our creativity and individuality over time. We often overlook the negative impacts of this trend on our creativity and uniqueness, despite the possible consequences. This study investigates these concerns by performing a brief survey to explore different perspectives and concepts, as well as trying to understand people's viewpoints, in conjunction with past studies in the area. Addressing these issues is essential for improving human-computer interaction systems and enhancing writing assistants for personalization and individuality.

Updated: 2024-04-02 15:42:05

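Title: Ink and Individuality: Crafting a Personalised Narrative in the Age of LLMs

Abstract: Individuality and personalization make up the distinctive characteristics that make each writer unique and shape their words so as to engage readers effectively while conveying authenticity. However, our growing reliance on LLM-based writing assistants risks compromising our creativity and individuality over time. We often overlook the negative impact of this trend on our creativity and uniqueness, despite the possible consequences. This study investigates these concerns through a brief survey that explores different perspectives and concepts and tries to understand people's viewpoints, in conjunction with past studies in the area. Addressing these issues is essential for improving human-computer interaction systems and for enhancing writing assistants with respect to personalization and individuality.

Updated: 2024-04-02 15:42:05

Categories: cs.HC,cs.AI,cs.CL,cs.IR,cs.LG

Download: http://arxiv.org/abs/2404.00026v2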

LLMs as Writing Assistants: Exploring Perspectives on Sense of Ownership and Reasoning

Sense of ownership in writing confines our investment of thoughts, time, and contribution, leading to attachment to the output. However, using writing assistants introduces a mental dilemma, as some content isn't directly our creation. For instance, we tend to credit Large Language Models (LLMs) more in creative tasks, even though all tasks are equal for them. Additionally, while we may not claim complete ownership of LLM-generated content, we freely claim authorship. We conduct a short survey to examine these issues and understand underlying cognitive processes in order to gain a better knowledge of human-computer interaction in writing and improve writing aid systems.

Updated: 2024-04-02 15:40:21

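Title: LLMs as Writing Assistants: Exploring Perspectives on Sense of Ownership and Reasoning

Abstract: Sense of ownership in writing confines our investment of thoughts, time, and contribution, leading to attachment to the output. However, using writing assistants introduces a mental dilemma, because some of the content is not directly our own creation. For instance, we tend to credit Large Language Models (LLMs) more in creative tasks, even though all tasks are equal for them. In addition, while we may not claim complete ownership of LLM-generated content, we freely claim authorship. We conduct a short survey to examine these issues and understand the underlying cognitive processes, in order to better understand human-computer interaction in writing and improve writing aid systems.

Updated: 2024-04-02 15:40:21

Categories: cs.HC,cs.AI,cs.CL,cs.CY,cs.LG

Download: http://arxiv.org/abs/2404.00027v2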

Samplet basis pursuit: Multiresolution scattered data approximation with sparsity constraints

We consider scattered data approximation in samplet coordinates with $\ell_1$-regularization. The application of an $\ell_1$-regularization term enforces sparsity of the coefficients with respect to the samplet basis. Samplets are wavelet-type signed measures, which are tailored to scattered data. Therefore, samplets enable the use of well-established multiresolution techniques on general scattered data sets. They provide similar properties as wavelets in terms of localization, multiresolution analysis, and data compression. By using the Riesz isometry, we embed samplets into reproducing kernel Hilbert spaces and discuss the properties of the resulting functions. We argue that the class of signals that are sparse with respect to the embedded samplet basis is considerably larger than the class of signals that are sparse with respect to the basis of kernel translates. Vice versa, every signal that is a linear combination of only a few kernel translates is sparse in samplet coordinates. We propose the rapid solution of the problem under consideration by combining soft-shrinkage with the semi-smooth Newton method. Leveraging on the sparse representation of kernel matrices in samplet coordinates, this approach converges faster than the fast iterative shrinkage thresholding algorithm and is feasible for large-scale data. Numerical benchmarks are presented and demonstrate the superiority of the multiresolution approach over the single-scale approach. As large-scale applications, the surface reconstruction from scattered data and the reconstruction of scattered temperature data using a dictionary of multiple kernels are considered.
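The soft-shrinkage step mentioned above is the proximal operator of the $\ell_1$ norm. The sketch below shows it inside plain iterative shrinkage thresholding (ISTA) for the generic problem min_x 0.5*||Ax - b||^2 + lam*||x||_1 on small dense lists. This is a baseline for illustration only: it is neither the semi-smooth Newton method nor the samplet-compressed kernel system that the abstract proposes, and the names `soft_shrink` and `ista` are illustrative.

```python
def soft_shrink(v, t):
    """Soft-shrinkage: proximal operator of t * |.| applied to the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, b, lam, step, iters=500):
    """Plain ISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1.

    A is a list of rows, b a list of targets. `step` should not exceed
    1 / L, where L is the largest eigenvalue of A^T A.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Gradient of the smooth part: A^T (A x - b)
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by soft-shrinkage, which sets small entries to zero.
        x = [soft_shrink(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x
```

For an identity system matrix the minimizer is the soft-shrinkage of b itself, which makes a convenient sanity check: small coefficients are set exactly to zero, which is how the $\ell_1$ term enforces sparsity.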

Updated: 2024-04-02 15:40:16

Categories: stat.ML,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2306.10180v4

Universal representations for financial transactional data: embracing local, global, and external contexts

Effective processing of financial transactions is essential for banking data analysis. However, in this domain, most methods focus on specialized solutions to stand-alone problems instead of constructing universal representations suitable for many problems. We present a representation learning framework that addresses diverse business challenges. We also suggest novel generative models that account for data specifics, and a way to integrate external information into a client's representation, leveraging insights from other customers' actions. Finally, we offer a benchmark, describing representation quality globally, concerning the entire transaction history; locally, reflecting the client's current state; and dynamically, capturing representation evolution over time. Our generative approach demonstrates superior performance in local tasks, with an increase in ROC-AUC of up to 14\% for the next MCC prediction task and up to 46\% for downstream tasks over existing contrastive baselines. Incorporating external information improves the scores by an additional 20\%.

Updated: 2024-04-02 15:39:14

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.02047v1

Ukrainian Texts Classification: Exploration of Cross-lingual Knowledge Transfer Approaches

Despite the large number of labeled datasets in the NLP text classification field, the persistent imbalance in data availability across languages remains evident. Ukrainian, in particular, is a language that can still benefit from the continued refinement of cross-lingual methodologies. To our knowledge, there is a tremendous lack of Ukrainian corpora for typical text classification tasks. In this work, we leverage state-of-the-art advances in NLP, exploring cross-lingual knowledge transfer methods that avoid manual data curation: large multilingual encoders and translation systems, LLMs, and language adapters. We test the approaches on three text classification tasks -- toxicity classification, formality classification, and natural language inference -- providing the "recipe" for the optimal setups.

Updated: 2024-04-02 15:37:09

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.02043v1

Transformers as Transducers

We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of transductions. We do so using variants of RASP, a programming language designed to help people "think like transformers," as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence functions and show that it computes exactly the first-order rational functions (such as string rotation). Then, we introduce two new extensions. B-RASP[pos] enables calculations on positions (such as copying the first half of a string) and contains all first-order regular functions. S-RASP adds prefix sum, which enables additional arithmetic operations (such as squaring a string) and contains all first-order polyregular functions. Finally, we show that masked average-hard attention transformers can simulate S-RASP. A corollary of our results is a new proof that transformer decoders are Turing-complete.
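
To make two of the operations named in the abstract more concrete, here is a small sketch in plain Python (not RASP syntax, and not taken from the paper). It shows a prefix sum and a position-based copy of the first half of a string. The function names are hypothetical, and rounding down for odd-length strings is an assumption of this sketch.

```python
def prefix_sum(values):
    """Running totals: output position i holds values[0] + ... + values[i]."""
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out


def copy_first_half(s):
    """Keep the symbols whose 1-based position is at most len(s) / 2.

    The decision for each symbol depends only on its position and the
    string length, which is the kind of position arithmetic the abstract
    attributes to B-RASP[pos]. Odd lengths round down (an assumption here).
    """
    n = len(s)
    return "".join(ch for i, ch in enumerate(s) if 2 * (i + 1) <= n)


if __name__ == "__main__":
    print(prefix_sum([1, 0, 1, 1]))
    print(copy_first_half("abcd"))
```

Both functions are ordinary sequential Python; they only illustrate the input/output behavior of the transductions, not how a transformer would compute them.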

Updated: 2024-04-02 15:34:47

Categories: cs.FL,cs.LG

Download: http://arxiv.org/abs/2404.02040v1

A Survey on Large Language Model-Based Game Agents

The development of game agents holds a critical role in advancing towards Artificial General Intelligence (AGI). The progress of LLMs and their multimodal counterparts (MLLMs) offers an unprecedented opportunity to evolve and empower game agents with human-like decision-making capabilities in complex computer game environments. This paper provides a comprehensive overview of LLM-based game agents from a holistic viewpoint. First, we introduce the conceptual architecture of LLM-based game agents, centered around six essential functional components: perception, memory, thinking, role-playing, action, and learning. Second, we survey existing representative LLM-based game agents documented in the literature with respect to methodologies and adaptation agility across six genres of games, including adventure, communication, competition, cooperation, simulation, and crafting & exploration games. Finally, we present an outlook of future research and development directions in this burgeoning field. A curated list of relevant papers is maintained and made accessible at: https://github.com/git-disl/awesome-LLM-game-agent-papers.

Updated: 2024-04-02 15:34:18

Categories: cs.AI

Download: http://arxiv.org/abs/2404.02039v1

Joint Multimodal Transformer for Emotion Recognition in the Wild

Systems for multimodal emotion recognition (MMER) can typically outperform unimodal systems by leveraging the inter- and intra-modal relationships between, e.g., visual, textual, physiological, and auditory modalities. In this paper, an MMER method is proposed that relies on a joint multimodal transformer for fusion with key-based cross-attention. This framework aims to exploit the diverse and complementary nature of different modalities to improve predictive accuracy. Separate backbones capture intra-modal spatiotemporal dependencies within each modality over video sequences. Subsequently, a joint multimodal transformer fusion architecture integrates the individual modality embeddings, allowing the model to capture inter-modal and intra-modal relationships effectively. Extensive experiments on two challenging expression recognition tasks: (1) dimensional emotion recognition on the Affwild2 dataset (with face and voice), and (2) pain estimation on the Biovid dataset (with face and biosensors), indicate that the proposed method can work effectively with different modalities. Empirical results show that MMER systems with our proposed fusion method allow us to outperform relevant baseline and state-of-the-art methods.

Updated: 2024-04-02 15:34:04

Categories: cs.CV,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2403.10488v2

MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages

Text detoxification is a textual style transfer (TST) task where a text is paraphrased from a toxic surface form, e.g. featuring rude words, to the neutral register. Recently, text detoxification methods have found applications in various tasks such as detoxification of Large Language Models (LLMs) (Leong et al., 2023; He et al., 2024; Tang et al., 2023) and combating toxic speech in social networks (Deng et al., 2023; Mun et al., 2023; Agarwal et al., 2023). All these applications are extremely important to ensure safe communication in the modern digital world. However, the previous approaches to collecting parallel text detoxification corpora -- ParaDetox (Logacheva et al., 2022) and APPADIA (Atwell et al., 2022) -- were explored only in a monolingual setup. In this work, we aim to extend the ParaDetox pipeline to multiple languages, presenting MultiParaDetox to automate parallel detoxification corpus collection for potentially any language. Then, we experiment with different text detoxification models -- from unsupervised baselines to LLMs and models fine-tuned on the presented parallel corpora -- showing the great benefit of having a parallel corpus for obtaining state-of-the-art text detoxification models for any language.

Updated: 2024-04-02 15:32:32

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.02037v1

What is to be gained by ensemble models in analysis of spectroscopic data?

An empirical study was carried out to compare different implementations of ensemble models aimed at improving prediction in spectroscopic data. A wide range of candidate models were fitted to benchmark datasets from regression and classification settings. A statistical analysis using a linear mixed model was carried out on prediction performance criteria resulting from model fits over random splits of the data. The results showed that the ensemble classifiers were able to consistently outperform candidate models in our application.

Updated: 2024-04-02 15:28:59

Categories: cs.LG,stat.ME

Download: http://arxiv.org/abs/2404.02184v1

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities

The rapid progress in open-source Large Language Models (LLMs) is significantly driving AI development forward. However, there is still a limited understanding of their trustworthiness. Deploying these models at scale without sufficient trustworthiness can pose significant risks, highlighting the need to uncover these issues promptly. In this work, we conduct an adversarial assessment of open-source LLMs on trustworthiness, scrutinizing them across eight different aspects including toxicity, stereotypes, ethics, hallucination, fairness, sycophancy, privacy, and robustness against adversarial demonstrations. We propose advCoU, an extended Chain of Utterances-based (CoU) prompting strategy by incorporating carefully crafted malicious demonstrations for trustworthiness attack. Our extensive experiments encompass recent and representative series of open-source LLMs, including Vicuna, MPT, Falcon, Mistral, and Llama 2. The empirical outcomes underscore the efficacy of our attack strategy across diverse aspects. More interestingly, our result analysis reveals that models with superior performance in general NLP tasks do not always have greater trustworthiness; in fact, larger models can be more vulnerable to attacks. Additionally, models that have undergone instruction tuning, focusing on instruction following, tend to be more susceptible, although fine-tuning LLMs for safety alignment proves effective in mitigating adversarial trustworthiness attacks.

Updated: 2024-04-02 15:21:55

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2311.09447v2

Large Language Models for Orchestrating Bimanual Robots

Although there has been rapid progress in endowing robots with the ability to solve complex manipulation tasks, generating control policies for bimanual robots to solve tasks involving two hands is still challenging because of the difficulties in effective temporal and spatial coordination. With emergent abilities in terms of step-by-step reasoning and in-context learning, Large Language Models (LLMs) have taken control of a variety of robotic tasks. However, the nature of language communication via a single sequence of discrete symbols makes LLM-based coordination in continuous space a particular challenge for bimanual tasks. To tackle this challenge for the first time by an LLM, we present LAnguage-model-based Bimanual ORchestration (LABOR), an agent utilizing an LLM to analyze task configurations and devise coordination control policies for addressing long-horizon bimanual tasks. In the simulated environment, the LABOR agent is evaluated through several everyday tasks on the NICOL humanoid robot. Reported success rates indicate that overall coordination efficiency is close to optimal performance, while the analysis of failure causes, classified into spatial and temporal coordination and skill selection, shows that these vary over tasks. The project website can be found at http://labor-agent.github.io

Updated: 2024-04-02 15:08:35

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2404.02018v1

JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models

This paper presents our system development for SemEval-2024 Task 3: "The Competition of Multimodal Emotion Cause Analysis in Conversations". Effectively capturing emotions in human conversations requires integrating multiple modalities such as text, audio, and video. However, the complexities of these diverse modalities pose challenges for developing an efficient multimodal emotion cause analysis (ECA) system. Our proposed approach addresses these challenges with a two-step framework. We adopt two different approaches in our implementation. In Approach 1, we employ instruction-tuning with two separate Llama 2 models for emotion and cause prediction. In Approach 2, we use GPT-4V for conversation-level video description and employ in-context learning with annotated conversations using GPT 3.5. Our system ranked 4th, and system ablation experiments demonstrate that our proposed solutions achieve significant performance gains. All the experimental code is available on GitHub.

Updated: 2024-04-02 14:52:37

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2403.04798v2

Cross-modality debiasing: using language to mitigate sub-population shifts in imaging

Sub-population shift is a specific type of domain shift that highlights changes in data distribution within specific sub-groups or populations between training and testing. Sub-population shift accounts for a significant source of algorithmic bias and calls for distributional robustness. Recent studies found inherent distributional robustness in multi-modality foundation models, such as the vision-language model CLIP, yet this robustness is vulnerable through parameter fine-tuning. In this paper, we propose leveraging the connection of robustness among different modalities and reshaping the distributional robustness of one modality with another. Specifically, in the context of the distributional robustness of CLIP, we propose to leverage natural language inputs to debias the image feature representations, to improve worst-case performance on sub-populations. Our extensive empirical studies show that image representations debiased by natural language can achieve significant performance improvement and reduction of performance instability under sub-population shifts.

Updated: 2024-04-02 14:47:23

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2403.07888v2

AUTODIFF: Autoregressive Diffusion Modeling for Structure-based Drug Design

Structure-based drug design (SBDD), which aims to generate molecules that can bind tightly to the target protein, is an essential problem in drug discovery, and previous approaches have achieved initial success. However, most existing methods still suffer from invalid local structure or unrealistic conformation issues, which are mainly due to the poor learning of bond angles or torsional angles. To alleviate these problems, we propose AUTODIFF, a diffusion-based fragment-wise autoregressive generation model. Specifically, we design a novel molecule assembly strategy named conformal motif that first preserves the conformation of local structures of molecules; we then encode the interaction of the protein-ligand complex with an SE(3)-equivariant convolutional network and generate molecules motif-by-motif with diffusion modeling. In addition, we improve the evaluation framework of SBDD by constraining the molecular weights of the generated molecules to the same range and by adding some new metrics, which makes the evaluation fairer and more practical. Extensive experiments on CrossDocked2020 demonstrate that our approach outperforms the existing models in generating realistic molecules with valid structures and conformations while maintaining high binding affinity.

Updated: 2024-04-02 14:44:02

Categories: cs.LG

Download: http://arxiv.org/abs/2404.02003v1

Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context

We present the first self-supervised multilingual speech model trained exclusively on African speech. The model learned from nearly 60 000 hours of unlabeled speech segments in 21 languages and dialects spoken in sub-Saharan Africa. On the SSA subset of the FLEURS-102 dataset, our approach based on a HuBERT$_{base}$ (0.09B) architecture shows competitive results on the ASR downstream task compared to the w2v-bert-51 (0.6B) pre-trained model proposed in the FLEURS benchmark, while being more efficient by using 7x less data and 6x fewer parameters. Furthermore, on the LID downstream task, our approach outperforms the FLEURS baseline accuracy by over 22\%.

Updated: 2024-04-02 14:43:36

Categories: cs.CL,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2404.02000v1

Emergence of Chemotactic Strategies with Multi-Agent Reinforcement Learning

Reinforcement learning (RL) is a flexible and efficient method for programming micro-robots in complex environments. Here we investigate whether reinforcement learning can provide insights into biological systems when trained to perform chemotaxis, namely, whether we can learn how intelligent agents process given information in order to swim towards a target. We run simulations covering a range of agent shapes, sizes, and swim speeds to determine if the physical constraints on biological swimmers, namely Brownian motion, lead to regions where reinforcement learners' training fails. We find that the RL agents can perform chemotaxis as soon as it is physically possible and, in some cases, even before the active swimming overpowers the stochastic environment. We study the efficiency of the emergent policy and identify convergence in agent size and swim speeds. Finally, we study the strategy adopted by the reinforcement learning algorithm to explain how the agents perform their tasks. To this end, we identify three emerging dominant strategies and several rare approaches taken. These strategies, whilst producing almost identical trajectories in simulation, are distinct and give insight into the possible mechanisms by which biological agents explore their environment and respond to changing conditions.

Updated: 2024-04-02 14:42:52

Categories: physics.bio-ph,cs.LG,cs.MA

Download: http://arxiv.org/abs/2404.01999v1

Specularity Factorization for Low-Light Enhancement

We present a new additive image factorization technique that treats images as composed of multiple latent specular components, which can be estimated recursively simply by modulating the sparsity during decomposition. Our model-driven RSFNet estimates these factors by unrolling the optimization into network layers requiring only a few scalars to be learned. The resultant factors are interpretable by design and can be fused for different image enhancement tasks via a network or combined directly by the user in a controllable fashion. Based on RSFNet, we detail a zero-reference Low Light Enhancement (LLE) application trained without paired or unpaired supervision. Our system improves the state-of-the-art performance on standard benchmarks and achieves better generalization on multiple other datasets. We also integrate our factors with other task-specific fusion networks for applications like deraining, deblurring and dehazing with negligible overhead, thereby highlighting the multi-domain and multi-task generalizability of our proposed RSFNet. The code and data are released for reproducibility on the project homepage.

Updated: 2024-04-02 14:41:42

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.01998v1

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning

Vision-and-Language navigation (VLN) requires an agent to navigate in unseen environments by following natural language instructions. For task completion, the agent needs to align and integrate various navigation modalities, including instruction, observation and navigation history. Existing works primarily concentrate on cross-modal attention at the fusion stage to achieve this objective. Nevertheless, modality features generated by disparate uni-encoders reside in their own spaces, leading to a decline in the quality of cross-modal fusion and decision. To address this problem, we propose a Dual-levEL AligNment (DELAN) framework based on cross-modal contrastive learning. This framework is designed to align various navigation-related modalities before fusion, thereby enhancing cross-modal interaction and action decision-making. Specifically, we divide the pre-fusion alignment into dual levels: instruction-history level and landmark-observation level according to their semantic correlations. We also reconstruct a dual-level instruction for adaptation to the dual-level alignment. As the training signals for pre-fusion alignment are extremely limited, self-supervised contrastive learning strategies are employed to enforce the matching between different modalities. Our approach seamlessly integrates with the majority of existing models, resulting in improved navigation performance on various VLN benchmarks, including R2R, R4R, RxR and CVDN.
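
The abstract does not specify which contrastive objective is used. As a generic illustration of cross-modal contrastive alignment, the sketch below computes an InfoNCE-style loss in plain Python, assuming that the i-th embedding from one modality is the positive match for the i-th embedding from the other. The function names, toy embeddings, and temperature are hypothetical and are not taken from the DELAN paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (an assumed, commonly used contrastive objective).

    anchors[i] and positives[i] form a matched pair; every positives[j]
    with j != i acts as a negative for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [dot(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # -log softmax of the matched pair
    return total / n


if __name__ == "__main__":
    instr = [[1.0, 0.0], [0.0, 1.0]]    # toy "instruction" embeddings
    history = [[1.0, 0.0], [0.0, 1.0]]  # toy "history" embeddings, aligned
    shuffled = [[0.0, 1.0], [1.0, 0.0]]  # same embeddings, mismatched order
    print(info_nce(instr, history), info_nce(instr, shuffled))
```

The loss is small when matched pairs are more similar than mismatched ones and large otherwise, which is the sense in which such an objective pushes features from different encoders into a shared space before fusion.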

Updated: 2024-04-02 14:40:04

Categories: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2404.01994v1

Large Language Models for Mathematicians

Large language models (LLMs) such as ChatGPT have received immense interest for their general-purpose language understanding and, in particular, their ability to generate high-quality text or computer code. For many professions, LLMs represent an invaluable tool that can speed up and improve the quality of work. In this note, we discuss to what extent they can aid professional mathematicians. We first provide a mathematical description of the transformer model used in all modern language models. Based on recent studies, we then outline best practices and potential issues and report on the mathematical abilities of language models. Finally, we shed light on the potential of LLMs to change how mathematicians work.

Updated: 2024-04-02 14:35:40

Categories: cs.CL,cs.AI,cs.LG,math.HO

Download: http://arxiv.org/abs/2312.04556v2

Traffic State Estimation from Vehicle Trajectories with Anisotropic Gaussian Processes

Accurately monitoring road traffic state is crucial for various applications, including travel time prediction, traffic control, and traffic safety. However, the lack of sensors often results in incomplete traffic state data, making it challenging to obtain reliable information for decision-making. This paper proposes a novel method for imputing traffic state data using Gaussian processes (GP) to address this issue. We propose a kernel rotation re-parametrization scheme that transforms a standard isotropic GP kernel into an anisotropic kernel, which can better model the congestion propagation in traffic flow data. The model parameters can be estimated by statistical inference using data from sparse probe vehicles or loop detectors. Moreover, the rotated GP method provides statistical uncertainty quantification for the imputed traffic state, making it more reliable. We also extend our approach to a multi-output GP, which allows for simultaneously estimating the traffic state for multiple lanes. We evaluate our method using real-world traffic data from the Next Generation simulation (NGSIM) and HighD programs, along with simulated data representing a traffic bottleneck scenario. Considering current and future mixed traffic of connected vehicles (CVs) and human-driven vehicles (HVs), we experiment with the traffic state estimation (TSE) scheme from 5% to 50% available trajectories, mimicking different CV penetration rates in a mixed traffic environment. We also test the traffic state estimation when traffic flow information is obtained from loop detectors. The results demonstrate the adaptability of our TSE method across different CV penetration rates and types of detectors, achieving state-of-the-art accuracy in scenarios with sparse observation rates.
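
One plausible reading of the kernel rotation idea is sketched below in plain Python: the space-time offset between two points is rotated by an angle before a squared-exponential kernel with two different length scales is applied. This is an illustrative assumption, not the paper's exact parametrization, and the names `rotated_rbf`, `theta`, `length_along`, and `length_across` are hypothetical.

```python
import math


def rotated_rbf(p, q, theta, length_along, length_across):
    """Anisotropic squared-exponential kernel on (space, time) points.

    The offset p - q is rotated by theta so that one axis follows an assumed
    propagation direction, then each rotated axis gets its own length scale.
    """
    dx, dt = p[0] - q[0], p[1] - q[1]
    u = math.cos(theta) * dx + math.sin(theta) * dt   # along the direction
    v = -math.sin(theta) * dx + math.cos(theta) * dt  # across the direction
    return math.exp(-0.5 * ((u / length_along) ** 2 + (v / length_across) ** 2))


if __name__ == "__main__":
    theta = math.pi / 4
    along = rotated_rbf((0.0, 0.0), (1.0, 1.0), theta, 10.0, 1.0)
    across = rotated_rbf((0.0, 0.0), (1.0, -1.0), theta, 10.0, 1.0)
    print(along, across)  # correlation decays much more slowly along the direction
```

With a long length scale along the rotated axis and a short one across it, points lying on the same assumed propagation line stay strongly correlated, which is how an anisotropic kernel can encode directional structure that an isotropic kernel cannot.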

Updated: 2024-04-02 14:34:37

Categories: cs.LG,stat.AP

Download: http://arxiv.org/abs/2303.02311v2

Large Human Language Models: A Need and the Challenges

As research in human-centered NLP advances, there is a growing recognition of the importance of incorporating human and social factors into NLP models. At the same time, our NLP systems have become heavily reliant on LLMs, most of which do not model authors. To build NLP systems that can truly understand human language, we must better integrate human contexts into LLMs. This brings to the fore a range of design considerations and challenges in terms of what human aspects to capture, how to represent them, and what modeling strategies to pursue. To address these, we advocate for three positions toward creating large human language models (LHLMs) using concepts from psychological and behavioral sciences: First, LM training should include the human context. Second, LHLMs should recognize that people are more than their group(s). Third, LHLMs should be able to account for the dynamic and temporally-dependent nature of the human context. We refer to relevant advances and present open challenges that need to be addressed and their possible solutions in realizing these goals.

Updated: 2024-04-02 14:30:12

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2312.07751v2

VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models

The booming use of text-to-image generative models has raised concerns about their high risk of producing copyright-infringing content. While probabilistic copyright protection methods provide a probabilistic guarantee against such infringement, in this paper we introduce the Virtually Assured Amplification Attack (VA3), a novel online attack framework that exposes the vulnerabilities of these protection mechanisms. The proposed framework significantly amplifies the probability of generating infringing content over sustained interactions with generative models, with a non-trivial lower bound on the success probability of each engagement. Our theoretical and experimental results demonstrate the effectiveness of our approach under various scenarios. These findings highlight the potential risk of implementing probabilistic copyright protection in practical applications of text-to-image generative models. Code is available at https://github.com/South7X/VA3.

Updated: 2024-04-02 14:28:26

Categories: cs.CR,cs.AI,cs.CV,cs.MM

Download: http://arxiv.org/abs/2312.00057v2

Deciphering the Interplay between Local Differential Privacy, Average Bayesian Privacy, and Maximum Bayesian Privacy

The swift evolution of machine learning, and the threats it poses to privacy, have led to the emergence of various definitions of privacy, including the concept of local differential privacy (LDP). Although widely embraced and utilized across numerous domains, this conventional approach to measuring privacy still exhibits certain limitations, ranging from failure to prevent inferential disclosure to lack of consideration for the adversary's background knowledge. In this comprehensive study, we introduce Bayesian privacy and delve into the intricate relationship between LDP and its Bayesian counterparts, unveiling novel insights into utility-privacy trade-offs. We introduce a framework that encapsulates both attack and defense strategies, highlighting their interplay and effectiveness. The relationship between LDP and Maximum Bayesian Privacy (MBP) is first revealed, demonstrating that under a uniform prior distribution, a mechanism satisfying $\xi$-LDP will satisfy $\xi$-MBP, and conversely $\xi$-MBP also confers $2\xi$-LDP. Our next theoretical contribution is anchored in the rigorous definitions of, and relationships between, Average Bayesian Privacy (ABP) and Maximum Bayesian Privacy (MBP), encapsulated by the inequality $\epsilon_{p,a} \leq \frac{1}{\sqrt{2}}\sqrt{(\epsilon_{p,m} + \epsilon)\cdot(e^{\epsilon_{p,m} + \epsilon} - 1)}$. These relationships fortify our understanding of the privacy guarantees provided by various mechanisms. Our work not only lays the groundwork for future empirical exploration but also promises to facilitate the design of privacy-preserving algorithms, thereby fostering the development of trustworthy machine learning solutions.
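
To get a feel for the ABP-MBP inequality quoted above, the short Python sketch below evaluates its right-hand side for given values of $\epsilon_{p,m}$ and $\epsilon$. The function name is hypothetical, and the abstract does not define $\epsilon$, so it is simply treated here as a non-negative input.

```python
import math

def abp_upper_bound(eps_pm, eps):
    """Right-hand side of the quoted inequality.

    Computes (1 / sqrt(2)) * sqrt(s * (e^s - 1)) with s = eps_pm + eps.
    eps_pm plays the role of epsilon_{p,m}; eps is the additional term
    that the abstract leaves undefined (treated as a plain number here).
    """
    s = eps_pm + eps
    # expm1(s) = e^s - 1, computed accurately for small s.
    return math.sqrt(s * math.expm1(s)) / math.sqrt(2.0)

# The bound vanishes when both terms are zero and grows with their sum.
bounds = [abp_upper_bound(e, 0.0) for e in (0.0, 0.1, 0.5, 1.0)]
```

For small $s$ the bound behaves like $s/\sqrt{2}$, since $e^{s} - 1 \approx s$.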

Updated: 2024-04-02 14:28:06

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2403.16591v3

Predicting the Intention to Interact with a Service Robot: the Role of Gaze Cues

For a service robot, it is crucial to perceive as early as possible that an approaching person intends to interact: in this case, it can proactively enact friendly behaviors that lead to an improved user experience. We solve this perception task with a sequence-to-sequence classifier of a potential user intention to interact, which can be trained in a self-supervised way. Our main contribution is a study of the benefit of features representing the person's gaze in this context. Extensive experiments on a novel dataset show that the inclusion of gaze cues significantly improves the classifier performance (AUROC increases from 84.5% to 91.2%); the distance at which an accurate classification can be achieved improves from 2.4 m to 3.2 m. We also quantify the system's ability to adapt to new environments without external supervision. Qualitative experiments show practical applications with a waiter robot.

Updated: 2024-04-02 14:22:54

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.01986v1

MAgNET: A Graph U-Net Architecture for Mesh-Based Simulations

In many cutting-edge applications, high-fidelity computational models prove to be too slow for practical use and are therefore replaced by much faster surrogate models. Recently, deep learning techniques have increasingly been utilized to accelerate such predictions. To enable learning on large-dimensional and complex data, specific neural network architectures have been developed, including convolutional and graph neural networks. In this work, we present a novel encoder-decoder geometric deep learning framework called MAgNET, which extends the well-known convolutional neural networks to accommodate arbitrary graph-structured data. MAgNET consists of innovative Multichannel Aggregation (MAg) layers and graph pooling/unpooling layers, forming a graph U-Net architecture that is analogous to convolutional U-Nets. We demonstrate the predictive capabilities of MAgNET in surrogate modeling for non-linear finite element simulations in the mechanics of solids.

Updated: 2024-04-02 14:22:26

Categories: cs.LG,cs.CE

Download: http://arxiv.org/abs/2211.00713v3

Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials

Due to the substantial number of clinicians, patients, and data collection environments involved in clinical trials, gathering high-quality data poses a significant challenge. In clinical trials, patients are assessed based on their speech data to detect and monitor cognitive and mental health disorders. We propose using these speech recordings to verify the identities of enrolled patients and to identify and exclude individuals who try to enroll multiple times in the same trial. Since clinical studies are often conducted across different countries, creating a system that can perform speaker verification in diverse languages without additional development effort is imperative. We evaluate pre-trained TitaNet, ECAPA-TDNN, and SpeakerNet models by enrolling and testing with speech-impaired patients speaking English, German, Danish, Spanish, and Arabic. Our results demonstrate that the tested models can effectively generalize to clinical speakers, with less than 2.7% EER for European languages and 8.26% EER for Arabic. This represents a significant step in developing more versatile and efficient speaker verification systems for cognitive and mental health clinical trials that can be used across a wide range of languages and dialects, substantially reducing the effort required to develop speaker verification systems for multiple languages. We also evaluate how speech tasks and the number of speakers involved in the trial influence performance and show that the type of speech task impacts model performance.
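
The equal error rate (EER) reported above is the operating point where the false acceptance rate equals the false rejection rate. The Python sketch below shows one simple way to approximate it with a threshold sweep. It assumes that a higher score means "same speaker"; the function name and the toy scores are made up for illustration and are not taken from the paper's evaluation.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping every observed score as a threshold.

    genuine: similarity scores for same-speaker trials.
    impostor: similarity scores for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        # False acceptance: impostor trials that pass the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection: genuine trials that fall below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer

# Toy scores (invented): one impostor scores above two genuine trials.
eer = equal_error_rate([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.3, 0.7])
```

On finite score sets the two error rates rarely cross exactly, so this sketch reports the average of the two rates at the threshold where they are closest.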

Updated: 2024-04-02 14:19:30

Categories: cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2404.01981v1

Joint-Task Regularization for Partially Labeled Multi-Task Learning

Multi-task learning has become increasingly popular in the machine learning field, but its practicality is hindered by the need for large, labeled datasets. Most multi-task learning methods depend on fully labeled datasets wherein each input example is accompanied by ground-truth labels for all target tasks. Unfortunately, curating such datasets can be prohibitively expensive and impractical, especially for dense prediction tasks which require per-pixel labels for each image. With this in mind, we propose Joint-Task Regularization (JTR), an intuitive technique which leverages cross-task relations to simultaneously regularize all tasks in a single joint-task latent space to improve learning when data is not fully labeled for all tasks. JTR stands out from existing approaches in that it regularizes all tasks jointly rather than separately in pairs -- therefore, it achieves linear complexity relative to the number of tasks while previous methods scale quadratically. To demonstrate the validity of our approach, we extensively benchmark our method across a wide variety of partially labeled scenarios based on NYU-v2, Cityscapes, and Taskonomy.
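
The linear-versus-quadratic claim above can be made concrete by counting regularization terms. The Python sketch below assumes a simplified cost model, with one term per unordered task pair for pairwise methods and one term per task for a joint scheme. That model and the function names are illustrative assumptions, not the paper's accounting.

```python
def pairwise_terms(num_tasks):
    """Number of unordered task pairs: n * (n - 1) / 2, quadratic in n."""
    return num_tasks * (num_tasks - 1) // 2

def joint_terms(num_tasks):
    """One term per task when all tasks share a single joint space: linear in n."""
    return num_tasks

# Compare the two counts as the number of tasks grows.
counts = {n: (pairwise_terms(n), joint_terms(n)) for n in (3, 5, 10, 20)}
```

Under this cost model the pairwise count overtakes the joint count once there are more than three tasks, and the gap widens quadratically from there.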

Updated: 2024-04-02 14:16:59

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.01976v1

DSGNN: A Dual-View Supergrid-Aware Graph Neural Network for Regional Air Quality Estimation

Air quality estimation can provide air quality information for target regions without air quality stations, which is useful for the public. Existing air quality estimation methods divide the study area into disjoint grid regions and apply 2D convolution to model the spatial dependencies of adjacent grid regions based on the first law of geography, failing to model the spatial dependencies of distant grid regions. To this end, we propose a Dual-view Supergrid-aware Graph Neural Network (DSGNN) for regional air quality estimation, which can model the spatial dependencies of distant grid regions from dual views (i.e., satellite-derived aerosol optical depth (AOD) and meteorology). Specifically, images are utilized to represent the regional data (i.e., AOD data and meteorology data). The dual-view supergrid learning module is introduced to generate supergrids in a parameterized way. Based on the dual-view supergrids, the dual-view implicit correlation encoding module is introduced to learn the correlations between pairwise supergrids. In addition, the dual-view message passing network is introduced to implement the information interaction on the supergrid graphs and images. Extensive experiments on two real-world datasets demonstrate that DSGNN achieves state-of-the-art performance on the air quality estimation task, outperforming the best baseline by an average of 19.64% in MAE.

Updated: 2024-04-02 14:16:57

Categories: cs.LG

Download: http://arxiv.org/abs/2404.01975v1

Dual-Activated Lightweight Attention ResNet50 for Automatic Histopathology Breast Cancer Image Classification

Automatic breast cancer classification in histopathology images is crucial for precise diagnosis and treatment planning. Recently, classification approaches based on the ResNet architecture have gained popularity for significantly improving accuracy by using skip connections to mitigate vanishing gradient problems, thereby integrating low-level and high-level feature information. Nevertheless, the conventional ResNet architecture faces challenges such as data imbalance and limited interpretability, necessitating cross-domain knowledge and collaboration among medical experts. This study addresses these challenges by introducing a novel method for breast cancer classification, the Dual-Activated Lightweight Attention ResNet50 (DALAResNet50) model. It integrates a pre-trained ResNet50 model with a lightweight attention mechanism, embedding an attention module in the fourth layer of ResNet50 and incorporating two fully connected layers with LeakyReLU and ReLU activation functions to enhance feature learning capabilities. The DALAResNet50 method was tested on breast cancer histopathology images from the BreakHis Database across magnification factors of 40X, 100X, 200X, and 400X, achieving accuracies of 98.5%, 98.7%, 97.9%, and 94.3%, respectively. It was also compared with established deep learning models such as SEResNet50, DenseNet121, VGG16, VGG16Inception, ViT, Swin-Transformer, Dinov2_Vitb14, and ResNet50. DALAResNet50 is reported to outperform the compared approaches in accuracy, F1 score, IBA, and GMean, demonstrating significant robustness and broad applicability when dealing with different magnifications and imbalanced breast cancer datasets.

Updated: 2024-04-02 14:14:26

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2308.13150v8

Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information

Mitigating social biases typically requires identifying the social groups associated with each data sample. In this paper, we present DAFair, a novel approach to address social bias in language models. Unlike traditional methods that rely on explicit demographic labels, our approach does not require any such information. Instead, we leverage predefined prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias in the model's representations. Our empirical results across two tasks and two models demonstrate the effectiveness of our method compared to previous approaches that do not rely on labeled data. Moreover, with limited demographic-annotated data, our approach outperforms common debiasing approaches.

Updated: 2024-04-02 14:14:24

Categories: cs.CL,cs.CY,cs.LG

Download: http://arxiv.org/abs/2403.09516v2

PatchCURE: Improving Certifiable Robustness, Model Utility, and Computation Efficiency of Adversarial Patch Defenses

State-of-the-art defenses against adversarial patch attacks can now achieve strong certifiable robustness with a marginal drop in model utility. However, this impressive performance typically comes at the cost of 10-100x more inference-time computation compared to undefended models -- the research community has witnessed an intense three-way trade-off between certifiable robustness, model utility, and computation efficiency. In this paper, we propose a defense framework named PatchCURE to approach this trade-off problem. PatchCURE provides sufficient "knobs" for tuning defense performance and allows us to build a family of defenses: the most robust PatchCURE instance can match the performance of any existing state-of-the-art defense (without efficiency considerations); the most efficient PatchCURE instance has similar inference efficiency as undefended models. Notably, PatchCURE achieves state-of-the-art robustness and utility performance across all different efficiency levels, e.g., 16-23% absolute clean accuracy and certified robust accuracy advantages over prior defenses when requiring computation efficiency to be close to undefended models. The family of PatchCURE defenses enables us to flexibly choose appropriate defenses to satisfy given computation and/or utility constraints in practice.

Updated: 2024-04-02 14:14:16

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2310.13076v2

On the Stability of Iterative Retraining of Generative Models on their own Data

Deep generative models have made tremendous progress in modeling complex data, often exhibiting generation quality that surpasses a typical human's ability to discern the authenticity of samples. Undeniably, a key driver of this success is enabled by the massive amounts of web-scale data consumed by these models. Due to these models' striking performance and ease of availability, the web will inevitably be increasingly populated with synthetic content. Such a fact directly implies that future iterations of generative models will be trained on both clean and artificially generated data from past models. In this paper, we develop a framework to rigorously study the impact of training generative models on mixed datasets -- from classical training on real data to self-consuming generative models trained on purely synthetic data. We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough and the proportion of clean training data (w.r.t. synthetic data) is large enough. We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models on CIFAR10 and FFHQ.

Updated: 2024-04-02 14:09:40

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2310.00429v5

Towards Leveraging AutoML for Sustainable Deep Learning: A Multi-Objective HPO Approach on Deep Shift Neural Networks

Deep Learning (DL) has advanced various fields by extracting complex patterns from large datasets. However, the computational demands of DL models pose environmental and resource challenges. Deep shift neural networks (DSNNs) offer a solution by leveraging shift operations to reduce computational complexity at inference. Following the insights from standard DNNs, we are interested in leveraging the full potential of DSNNs by means of AutoML techniques. We study the impact of hyperparameter optimization (HPO) to maximize DSNN performance while minimizing resource consumption. Since this calls for multi-objective (MO) optimization with accuracy and energy consumption as potentially complementary objectives, we propose to combine state-of-the-art multi-fidelity (MF) HPO with multi-objective optimization. Experimental results demonstrate the effectiveness of our approach, resulting in models with over 80% accuracy and low computational cost. Overall, our method accelerates efficient model development while enabling sustainable AI applications.

Updated: 2024-04-02 14:03:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.01965v1

Reinforcement Learning with Elastic Time Steps

Traditional Reinforcement Learning (RL) algorithms are usually applied in robotics to learn controllers that act with a fixed control rate. Given the discrete nature of RL algorithms, they are oblivious to the effects of the choice of control rate: finding the correct control rate can be difficult and mistakes often result in excessive use of computing resources or even lack of convergence. We propose Soft Elastic Actor-Critic (SEAC), a novel off-policy actor-critic algorithm to address this issue. SEAC implements elastic time steps, time steps with a known, variable duration, which allow the agent to change its control frequency to adapt to the situation. In practice, SEAC applies control only when necessary, minimizing computational resources and data usage. We evaluate SEAC's capabilities in simulation in a Newtonian kinematics maze navigation task and on a 3D racing video game, Trackmania. SEAC outperforms the SAC baseline in terms of energy efficiency and overall time management, and most importantly without the need to identify a control frequency for the learned controller. SEAC demonstrated faster and more stable training speeds than SAC, especially at control rates where SAC struggled to converge. We also compared SEAC with a similar approach, the Continuous-Time Continuous-Options (CTCO) model, and SEAC resulted in better task performance. These findings highlight the potential of SEAC for practical, real-world RL applications in robotics.

Updated: 2024-04-02 14:02:07

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2402.14961v2

CAM-Based Methods Can See through Walls

CAM-based methods are widely used post-hoc interpretability methods that produce a saliency map to explain the decision of an image classification model. The saliency map highlights the areas of the image that are important for the prediction. In this paper, we show that most of these methods can incorrectly attribute an importance score to parts of the image that the model cannot see. We show that this phenomenon occurs both theoretically and experimentally. On the theory side, we analyze the behavior of GradCAM on a simple masked CNN model at initialization. Experimentally, we train a VGG-like model constrained to not use the lower part of the image and nevertheless observe positive scores in the unseen part of the image. This behavior is evaluated quantitatively on two new datasets. We believe that this is problematic, potentially leading to misinterpretation of the model's behavior.

Updated: 2024-04-02 13:57:30

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.01964v1

Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code

In the realm of formal theorem proving, the Coq proof assistant stands out for its rigorous approach to verifying mathematical assertions and software correctness. Despite the advances in artificial intelligence and machine learning, the specialized nature of Coq syntax and semantics poses unique challenges for Large Language Models (LLMs). Addressing this gap, we present a comprehensive dataset specifically designed to enhance LLMs' proficiency in interpreting and generating Coq code. This dataset, derived from a collection of over 10,000 Coq source files, encompasses a wide array of propositions, proofs, and definitions, enriched with metadata including source references and licensing information. Our primary aim is to facilitate the development of LLMs capable of generating syntactically correct and semantically meaningful Coq constructs, thereby advancing the frontier of automated theorem proving. Initial experiments with this dataset have showcased its significant potential; models trained on this data exhibited enhanced accuracy in Coq code generation. Notably, a particular experiment revealed that a fine-tuned LLM was capable of generating 141 valid proofs for a basic lemma, highlighting the dataset's utility in facilitating the discovery of diverse and valid proof strategies. This paper discusses the dataset's composition, the methodology behind its creation, and the implications of our findings for the future of machine learning in formal verification. The dataset is accessible for further research and exploration: https://huggingface.co/datasets/florath/coq-facts-props-proofs-gen0-v1

Updated: 2024-04-02 13:54:47

Categories: cs.AI,cs.LO

Download: http://arxiv.org/abs/2403.12627v2

Bi-LORA: A Vision-Language Approach for Synthetic Image Detection

Advancements in deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs), have ushered in an era of generating highly realistic images. While this technological progress has captured significant interest, it has also raised concerns about the potential difficulty in distinguishing real images from their synthetic counterparts. This paper takes inspiration from the potent convergence capabilities between vision and language, coupled with the zero-shot nature of vision-language models (VLMs). We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images. The pivotal conceptual shift in our methodology revolves around reframing binary classification as an image captioning task, leveraging the distinctive capabilities of cutting-edge VLM, notably bootstrapping language image pre-training (BLIP2). Rigorous and comprehensive experiments are conducted to validate the effectiveness of our proposed approach, particularly in detecting unseen diffusion-generated images from unknown diffusion-based generative models during training, showcasing robustness to noise, and demonstrating generalization capabilities to GANs. The obtained results showcase an impressive average accuracy of 93.41% in synthetic image detection on unseen generation models. The code and models associated with this research can be publicly accessed at https://github.com/Mamadou-Keita/VLM-DETECT.

Updated: 2024-04-02 13:54:22

Categories: cs.CV,cs.CR,cs.LG

Download: http://arxiv.org/abs/2404.01959v1

MESEN: Exploit Multimodal Data to Design Unimodal Human Activity Recognition with Few Labels

Human activity recognition (HAR) will be an essential function of various emerging applications. However, HAR typically encounters challenges related to modality limitations and label scarcity, leading to an application gap between current solutions and real-world requirements. In this work, we propose MESEN, a multimodal-empowered unimodal sensing framework, to utilize unlabeled multimodal data available during the HAR model design phase for unimodal HAR enhancement during the deployment phase. From a study on the impact of supervised multimodal fusion on unimodal feature extraction, MESEN is designed to feature a multi-task mechanism during the multimodal-aided pre-training stage. With the proposed mechanism integrating cross-modal feature contrastive learning and multimodal pseudo-classification aligning, MESEN exploits unlabeled multimodal data to extract effective unimodal features for each modality. Subsequently, MESEN can adapt to downstream unimodal HAR with only a few labeled samples. Extensive experiments on eight public multimodal datasets demonstrate that MESEN achieves significant performance improvements over state-of-the-art baselines in enhancing unimodal HAR by exploiting multimodal data.

Updated: 2024-04-02 13:54:05

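Title: MESEN: Exploit Multimodal Data to Design Unimodal Human Activity Recognition with Few Labels

Abstract: Human activity recognition (HAR) will become a basic function of various emerging applications. However, HAR usually faces challenges related to modality limitations and label scarcity, which leaves an application gap between current solutions and real-world requirements. In this work, we propose MESEN, a multimodal-empowered unimodal sensing framework that uses the unlabeled multimodal data available during the HAR model design phase to enhance unimodal HAR during the deployment phase. Based on a study of how supervised multimodal fusion affects unimodal feature extraction, MESEN is designed with a multi-task mechanism in the multimodal-aided pre-training stage. With this mechanism, which integrates cross-modal feature contrastive learning and multimodal pseudo-classification aligning, MESEN exploits unlabeled multimodal data to extract effective unimodal features for each modality. MESEN can then adapt to downstream unimodal HAR with only a few labeled samples. Extensive experiments on eight public multimodal datasets show that, by exploiting multimodal data, MESEN achieves significant performance improvements over state-of-the-art baselines in enhancing unimodal HAR.

Updated: 2024-04-02 13:54:05

Categories: cs.LG

Download: http://arxiv.org/abs/2404.01958v1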

FedSN: A Novel Federated Learning Framework over LEO Satellite Networks

Recently, commercial companies such as SpaceX have successfully launched and deployed a large number of Low Earth Orbit (LEO) satellites. Because LEO satellites are equipped with multimodal sensors, they serve not only communication but also various machine learning applications, such as space modulation recognition and remote sensing image classification. However, the ground station (GS) may be unable to download such a large volume of raw sensing data for centralized model training because of its limited contact time with LEO satellites (e.g., 5 minutes). Federated learning (FL) has therefore emerged as a promising solution that addresses this problem through on-device training. Unfortunately, enabling FL on LEO satellites still faces three critical challenges: i) heterogeneous computing and memory capabilities, ii) limited uplink rate, and iii) model staleness. To this end, we propose FedSN, a general FL framework that tackles these challenges and fully explores data diversity on LEO satellites. Specifically, we first present a novel sub-structure scheme that enables heterogeneous local model training under the different computing, memory, and communication constraints of LEO satellites. Additionally, we propose a pseudo-synchronous model aggregation strategy that dynamically schedules model aggregation to compensate for model staleness. To further demonstrate the effectiveness of FedSN, we evaluate it on space modulation recognition and remote sensing image classification tasks using data from real-world satellite networks. Extensive experimental results demonstrate that FedSN achieves higher accuracy and lower computing and communication overhead than state-of-the-art benchmarks, and confirm the effectiveness of each component of FedSN.

Updated: 2024-04-02 13:53:20

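Title: FedSN: A Novel Federated Learning Framework over LEO Satellite Networks

Abstract: Recently, commercial companies such as SpaceX have successfully launched and deployed many Low Earth Orbit (LEO) satellites. Because LEO satellites carry multimodal sensors, they are used not only for communication but also for various machine learning applications, such as space modulation recognition and remote sensing image classification. However, the ground station (GS) may be unable to download such a large volume of raw sensing data for centralized model training because its contact time with LEO satellites is limited (e.g., 5 minutes). Federated learning (FL) has therefore become a promising solution to this problem through on-device training. Unfortunately, enabling FL on LEO satellites still faces three key challenges: i) heterogeneous computing and memory capabilities, ii) limited uplink rate, and iii) model staleness. To this end, we propose FedSN, a general FL framework that addresses these challenges and makes full use of the data diversity on LEO satellites. Specifically, we first propose a novel sub-structure scheme that enables heterogeneous local model training under the different computing, memory, and communication constraints of LEO satellites. In addition, we propose a pseudo-synchronous model aggregation strategy that dynamically schedules model aggregation to compensate for model staleness. To further demonstrate the effectiveness of FedSN, we evaluate it on space modulation recognition and remote sensing image classification tasks using data from real satellite networks. Extensive experimental results show that FedSN achieves higher accuracy and lower computing and communication overhead than state-of-the-art benchmarks, and confirm the effectiveness of each component of FedSN.

Updated: 2024-04-02 13:53:20

Categories: cs.LG,cs.AI,cs.DC

Download: http://arxiv.org/abs/2311.01483v4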

GIDN: A Lightweight Graph Inception Diffusion Network for High-efficient Link Prediction

In this paper, we propose the Graph Inception Diffusion Network (GIDN) model. This model generalizes graph diffusion in different feature spaces and uses the inception module to avoid the heavy computation caused by complex network structures. We evaluate the GIDN model on Open Graph Benchmark (OGB) datasets, where it reaches 11% higher performance than AGDN on the ogbl-collab dataset.

Updated: 2024-04-02 13:52:53

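Title: GIDN: A Lightweight Graph Inception Diffusion Network for High-efficient Link Prediction

Abstract: In this paper, we propose the Graph Inception Diffusion Network (GIDN) model. The model generalizes graph diffusion to different feature spaces and uses the inception module to avoid the heavy computation caused by complex network structures. We evaluate the GIDN model on Open Graph Benchmark (OGB) datasets, where it performs 11% better than AGDN on the ogbl-collab dataset.

Updated: 2024-04-02 13:52:53

Categories: cs.LG,cs.SI

Download: http://arxiv.org/abs/2210.01301v3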

Bayesian Floor Field: Transferring people flow predictions across environments

Mapping people dynamics is a crucial skill for robots, because it enables them to coexist in human-inhabited environments. However, learning a model of people dynamics is a time-consuming process that requires observing a large number of people moving in an environment. Moreover, approaches for mapping dynamics are unable to transfer the learned models across environments: each model can only describe the dynamics of the environment in which it was built. However, the impact of architectural geometry on people's movement can be used to anticipate their patterns of dynamics, and recent work has looked into learning maps of dynamics from occupancy. So far, however, approaches based on trajectories and those based on geometry have not been combined. In this work we propose a novel Bayesian approach to learning people dynamics that combines knowledge about the environment geometry with observations from human trajectories. An occupancy-based deep prior is used to build an initial transition model without requiring any observations of pedestrians; the model is then updated using Bayesian inference when observations become available. We demonstrate the ability of our model to increase data efficiency and to generalize across real large-scale environments, which is unprecedented for maps of dynamics.
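The prior-then-update idea described above can be sketched with a textbook Dirichlet-categorical update for a single map cell. This is an illustrative assumption, not the model from the paper: the prior probabilities stand in for the output of an occupancy-based network, and `prior_strength`, the four-direction layout, and the function name are all hypothetical.

```python
def posterior_transition(prior_probs, prior_strength, observed_counts):
    """Dirichlet-categorical update of one cell's transition distribution.

    prior_probs     -- prior probability of moving in each direction
    prior_strength  -- how many pseudo-observations the prior is worth
    observed_counts -- number of observed pedestrian moves per direction
    """
    alphas = [prior_strength * p + c for p, c in zip(prior_probs, observed_counts)]
    total = sum(alphas)
    return [alpha / total for alpha in alphas]

# Directions: east, north, west, south (hypothetical layout).
prior = [0.7, 0.1, 0.1, 0.1]

# With no trajectories the posterior equals the geometry-based prior.
no_data = posterior_transition(prior, 10.0, [0, 0, 0, 0])

# After 30 observed moves to the north, the data dominates the prior.
some_data = posterior_transition(prior, 10.0, [0, 30, 0, 0])
```

The `prior_strength` parameter controls how quickly observations override the prior, which is one simple way to trade off transfer from geometry against local evidence.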

Updated: 2024-04-02 13:49:07

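Title: Bayesian Floor Field: Transferring people flow predictions across environments

Abstract: Mapping people dynamics is a key skill for robots because it allows them to coexist in human-inhabited environments. However, learning a model of people dynamics is a time-consuming process that requires observing many people moving through an environment. Moreover, approaches for mapping dynamics cannot transfer learned models between environments: each model can only describe the dynamics of the environment in which it was built. However, the influence of architectural geometry on people's movement can be used to predict their patterns of dynamics, and recent work has begun to learn maps of dynamics from occupancy. So far, however, trajectory-based and geometry-based approaches have not been combined. In this work, we propose a novel Bayesian approach to learning people dynamics that combines knowledge about the environment geometry with observations of human trajectories. An occupancy-based deep prior is used to build an initial transition model without any observations of pedestrians; the model is then updated with Bayesian inference when observations become available. We demonstrate the ability of our model to increase data efficiency and to generalize across real large-scale environments, which is unprecedented for maps of dynamics.

Updated: 2024-04-02 13:49:07

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2208.10851v2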

HyperCLOVA X Technical Report

We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Updated: 2024-04-02 13:48:49

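Title: HyperCLOVA X Technical Report

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning on high-quality human-annotated datasets while abiding by strict safety guidelines, reflecting our commitment to responsible AI. The model is evaluated on various benchmarks in both Korean and English, covering comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness. HyperCLOVA X shows strong reasoning capabilities in Korean, backed by a deep understanding of the language and its cultural nuances. Further analysis of its inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries developing their own sovereign LLMs.

Updated: 2024-04-02 13:48:49

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.01954v1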

Synthetic Data for Robust Stroke Segmentation

Deep learning-based semantic segmentation in neuroimaging currently requires high-resolution scans and extensive annotated datasets, posing significant barriers to clinical applicability. We present a novel synthetic framework for the task of lesion segmentation, extending the capabilities of the established SynthSeg approach to accommodate large heterogeneous pathologies with lesion-specific augmentation strategies. Our method trains deep learning models, demonstrated here with the UNet architecture, using label maps derived from healthy and stroke datasets, facilitating the segmentation of both healthy tissue and pathological lesions without sequence-specific training data. Evaluated against in-domain and out-of-domain (OOD) datasets, our framework demonstrates robust performance, rivaling current methods within the training domain and significantly outperforming them on OOD data. This contribution holds promise for advancing medical imaging analysis in clinical settings, especially for stroke pathology, by enabling reliable segmentation across varied imaging sequences with reduced dependency on large annotated corpora. Code and weights available at https://github.com/liamchalcroft/SynthStroke.

Updated: 2024-04-02 13:42:29

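Title: Synthetic Data for Robust Stroke Segmentation

Abstract: Deep learning-based semantic segmentation in neuroimaging currently requires high-resolution scans and extensive annotated datasets, which poses significant barriers to clinical application. We present a novel synthetic framework for lesion segmentation that extends the established SynthSeg approach to accommodate large heterogeneous pathologies with lesion-specific augmentation strategies. Our method trains deep learning models, demonstrated here with the UNet architecture, using label maps derived from healthy and stroke datasets, enabling segmentation of both healthy tissue and pathological lesions without sequence-specific training data. Evaluated on in-domain and out-of-domain (OOD) datasets, our framework shows robust performance, rivaling current methods within the training domain and significantly outperforming them on OOD data. This contribution holds promise for advancing medical imaging analysis in clinical settings, especially for stroke pathology, by enabling reliable segmentation across varied imaging sequences with reduced dependence on large annotated corpora. Code and weights are available at https://github.com/liamchalcroft/SynthStroke.

Updated: 2024-04-02 13:42:29

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.01946v1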

Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization

Recent advancements in automatic code generation using large language model (LLM) agents have brought us closer to the future of automated software development. However, existing single-agent approaches face limitations in generating and improving large-scale, complex codebases due to constraints in context length. To tackle this challenge, we propose Self-Organized multi-Agent framework (SoA), a novel multi-agent framework that enables the scalable and efficient generation and optimization of large-scale code. In SoA, self-organized agents operate independently to generate and modify code components while seamlessly collaborating to construct the overall codebase. A key feature of our framework is the automatic multiplication of agents based on problem complexity, allowing for dynamic scalability. This enables the overall code volume to be increased indefinitely according to the number of agents, while the amount of code managed by each agent remains constant. We evaluate SoA on the HumanEval benchmark and demonstrate that, compared to a single-agent system, each agent in SoA handles significantly less code, yet the overall generated code is substantially greater. Moreover, SoA surpasses the powerful single-agent baseline by 5% in terms of Pass@1 accuracy.

Updated: 2024-04-02 13:37:28

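Title: Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization

Abstract: Recent advances in automatic code generation with large language model (LLM) agents have brought us closer to the future of automated software development. However, existing single-agent approaches are limited in generating and improving large-scale, complex codebases because of context length constraints. To address this challenge, we propose the Self-Organized multi-Agent framework (SoA), a novel multi-agent framework that enables scalable and efficient generation and optimization of large-scale code. In SoA, self-organized agents operate independently to generate and modify code components while collaborating seamlessly to build the overall codebase. A key feature of our framework is that the number of agents is multiplied automatically according to problem complexity, which provides dynamic scalability. As a result, the overall code volume can grow indefinitely with the number of agents, while the amount of code managed by each agent stays constant. We evaluate SoA on the HumanEval benchmark and show that, compared with a single-agent system, each agent in SoA handles significantly less code, yet the overall amount of generated code is substantially larger. Moreover, SoA surpasses the strong single-agent baseline by 5% in Pass@1 accuracy.

Updated: 2024-04-02 13:37:28

Categories: cs.SE,cs.AI,cs.CL,cs.LG,cs.MA

Download: http://arxiv.org/abs/2404.02183v1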

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains underexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the pre-trained frozen music audio model MERT with a frozen LLM, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we created the MusicInstruct (MI) dataset from captions in the MusicCaps dataset, tailored for open-ended music inquiries. Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs. Our introduced dataset enables notable advancements beyond previous ones.

Updated: 2024-04-02 13:35:59

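Title: MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

Abstract: Large language models (LLMs) have shown great potential in multimodal applications, but the convergence of the textual and musical domains has not been fully explored. To address this gap, we present MusiLingo, a new system for music caption generation and music-related query responses. MusiLingo uses a single projection layer to align music representations from the pre-trained frozen music audio model MERT with a frozen LLM, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Because high-quality music Q&A datasets are scarce, we created the MusicInstruct (MI) dataset from captions in the MusicCaps dataset, tailored for open-ended music inquiries. Empirical evaluations show that it performs competitively in generating music captions and composing music-related Q&A pairs. The dataset we introduce enables notable advances beyond previous ones.

Updated: 2024-04-02 13:35:59

Categories: eess.AS,cs.AI,cs.CL,cs.MM,cs.SD

Download: http://arxiv.org/abs/2309.08730v3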

FLIGAN: Enhancing Federated Learning with Incomplete Data using GAN

Federated Learning (FL) provides a privacy-preserving mechanism for distributed training of machine learning models on networked devices (e.g., mobile devices, IoT edge nodes). It enables Artificial Intelligence (AI) at the edge by creating models without sharing actual data across the network. Existing research typically focuses on generic aspects of non-IID data and heterogeneity in clients' system characteristics, but often neglects the issue of insufficient data for model development, which can arise from uneven class label distribution and highly variable data volumes across edge nodes. In this work, we propose FLIGAN, a novel approach to address the issue of data incompleteness in FL. First, we leverage Generative Adversarial Networks (GANs) to adeptly capture complex data distributions and generate synthetic data that closely resemble real-world data. Then, we use synthetic data to enhance the robustness and completeness of datasets across nodes. Our methodology adheres to FL's privacy requirements by generating synthetic data in a federated manner without sharing the actual data in the process. We incorporate techniques such as classwise sampling and node grouping, designed to improve the federated GAN's performance, enabling the creation of high-quality synthetic datasets and facilitating efficient FL training. Empirical results from our experiments demonstrate that FLIGAN significantly improves model accuracy, especially in scenarios with high class imbalances, achieving up to a 20% increase in model accuracy over traditional FL baselines.

Updated: 2024-04-02 13:33:06

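Title: FLIGAN: Enhancing Federated Learning with Incomplete Data using GAN

Abstract: Federated learning (FL) provides a privacy-preserving mechanism for distributed training of machine learning models on networked devices (e.g., mobile devices, IoT edge nodes). It enables artificial intelligence (AI) at the edge by creating models without sharing actual data across the network. Existing research usually focuses on generic aspects of non-IID data and heterogeneity in clients' system characteristics, but often neglects the problem of insufficient data for model development, which can result from uneven class label distribution and highly variable data volumes across edge nodes. In this work, we propose FLIGAN, a new approach to the problem of data incompleteness in FL. First, we use generative adversarial networks (GANs) to capture complex data distributions and generate synthetic data that closely resemble real data. Then, we use the synthetic data to improve the robustness and completeness of datasets across nodes. Our method complies with FL's privacy requirements by generating synthetic data in a federated manner without sharing actual data in the process. We incorporate techniques such as classwise sampling and node grouping, designed to improve the federated GAN's performance, which enables the creation of high-quality synthetic datasets and facilitates efficient FL training. Our experiments show that FLIGAN significantly improves model accuracy, especially under high class imbalance, with up to a 20% increase in model accuracy over traditional FL baselines.

Updated: 2024-04-02 13:33:06

Categories: cs.LG

Download: http://arxiv.org/abs/2403.16930v2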

Diverse Representation Embedding for Lifelong Person Re-Identification

Lifelong Person Re-Identification (LReID) aims to continuously learn from successive data streams, matching individuals across multiple cameras. The key challenge for LReID is how to effectively preserve old knowledge while incrementally learning new information, a difficulty caused by task-level domain gaps and limited old task datasets. Existing methods based on CNN backbones are insufficient to explore the representation of each instance from different perspectives, limiting model performance on limited old task datasets and new task datasets. Unlike these methods, we propose a Diverse Representations Embedding (DRE) framework that is the first to explore a pure transformer for LReID. The proposed DRE preserves old knowledge while adapting to new information based on instance-level and task-level layout. Concretely, an Adaptive Constraint Module (ACM) is proposed to implement integration and push-away operations between multiple overlapping representations generated by the transformer-based backbone, obtaining rich and discriminative representations for each instance to improve the adaptive ability of LReID. Based on the processed diverse representations, we propose Knowledge Update (KU) and Knowledge Preservation (KP) strategies at the task-level layout by introducing the adjustment model and the learner model. The KU strategy enhances the adaptive learning ability of learner models for new information under the adjustment model prior, and the KP strategy preserves old knowledge through representation-level alignment and logit-level supervision on limited old task datasets while guaranteeing the adaptive learning information capacity of the LReID model. Compared to state-of-the-art methods, our method achieves significantly improved performance on holistic, large-scale, and occluded datasets.

Updated: 2024-04-02 13:31:41

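Title: Diverse Representation Embedding for Lifelong Person Re-Identification

Abstract: Lifelong Person Re-Identification (LReID) aims to learn continuously from successive data streams, matching individuals across multiple cameras. The key challenge for LReID is how to effectively preserve old knowledge while incrementally learning new information, a difficulty caused by task-level domain gaps and limited old task datasets. Existing methods based on CNN backbones are insufficient to explore the representation of each instance from different perspectives, which limits model performance on limited old task datasets and on new task datasets. Unlike these methods, we propose a Diverse Representations Embedding (DRE) framework that is the first to explore a pure transformer for LReID. Based on instance-level and task-level layout, the proposed DRE preserves old knowledge while adapting to new information. Specifically, an Adaptive Constraint Module (ACM) is proposed to implement integration and push-away operations between the multiple overlapping representations generated by the transformer-based backbone, obtaining rich and discriminative representations for each instance to improve the adaptive ability of LReID. Based on the processed diverse representations, we propose Knowledge Update (KU) and Knowledge Preservation (KP) strategies at the task-level layout by introducing an adjustment model and a learner model. The KU strategy enhances the learner model's adaptive learning ability for new information under the adjustment model prior, while the KP strategy preserves old knowledge through representation-level alignment and logit-level supervision on limited old task datasets, while guaranteeing the adaptive learning information capacity of the LReID model. Compared with state-of-the-art methods, our method achieves significantly improved performance on holistic, large-scale, and occluded datasets.

Updated: 2024-04-02 13:31:41

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2403.16003v2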

Settling Time vs. Accuracy Tradeoffs for Clustering Big Data

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly compress the data and perform the clustering on the compressed representation. Unfortunately, there is no universal best choice for compressing the number of points - while random sampling runs in sublinear time and coresets provide theoretical guarantees, the former does not enforce accuracy while the latter is too slow as the numbers of points and clusters grow. Indeed, it has been conjectured that any sensitivity-based coreset construction requires super-linear time in the dataset size. We examine this relationship by first showing that there does exist an algorithm that obtains coresets via sensitivity sampling in effectively linear time - within log-factors of the time it takes to read the data. Any approach that significantly improves on this must then resort to practical heuristics, leading us to consider the spectrum of sampling strategies across both real and artificial datasets in the static and streaming settings. Through this, we show the conditions in which coresets are necessary for preserving cluster validity as well as the settings in which faster, cruder sampling strategies are sufficient. As a result, we provide a comprehensive theoretical and practical blueprint for effective clustering regardless of data size. Our code is publicly available and has scripts to recreate the experiments.
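The contrast between uniform sampling and sensitivity sampling can be made concrete with a minimal sketch of a standard sensitivity-sampling coreset for k-means. This is the textbook construction given a rough initial solution, not the near-linear-time algorithm from the paper; the sensitivity bound cost(p)/cost(P) + 1/|cluster(p)|, the toy data, and all names are assumptions for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sensitivity_coreset(points, centers, m, rng):
    """Sample m weighted points with probability proportional to a sensitivity bound."""
    # Assign every point to its nearest center of the rough solution.
    assign = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
              for p in points]
    costs = [sq_dist(p, centers[a]) for p, a in zip(points, assign)]
    total_cost = sum(costs)
    sizes = [0] * len(centers)
    for a in assign:
        sizes[a] += 1
    # Sensitivity upper bound: share of the total cost plus share of the cluster.
    scores = [(c / total_cost if total_cost > 0 else 0.0) + 1.0 / sizes[a]
              for c, a in zip(costs, assign)]
    norm = sum(scores)
    probs = [s / norm for s in scores]
    chosen = rng.choices(range(len(points)), weights=probs, k=m)
    # Inverse-probability weights keep the weighted cost an unbiased estimate.
    return [(points[i], 1.0 / (m * probs[i])) for i in chosen]

rng = random.Random(0)
points = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
          + [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(200)])
rough_centers = [(0.0, 0.0), (5.0, 5.0)]  # hypothetical rough solution
coreset = sensitivity_coreset(points, rough_centers, m=20, rng=rng)
```

Uniform sampling corresponds to setting every probability to 1/n, which is faster but can miss small, far-away clusters that sensitivity sampling deliberately over-samples.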

Updated: 2024-04-02 13:31:19

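Title: Settling Time vs. Accuracy Tradeoffs for Clustering Big Data

Abstract: We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly compress the data and perform the clustering on the compressed representation. Unfortunately, there is no universally best choice for compressing the number of points: random sampling runs in sublinear time and coresets provide theoretical guarantees, but the former does not enforce accuracy and the latter is too slow as the numbers of points and clusters grow. Indeed, it has been conjectured that any sensitivity-based coreset construction requires time super-linear in the dataset size. We examine this relationship by first showing that there is an algorithm that obtains coresets via sensitivity sampling in effectively linear time, within log-factors of the time it takes to read the data. Any approach that significantly improves on this must resort to practical heuristics, which leads us to consider the spectrum of sampling strategies on both real and artificial datasets in the static and streaming settings. Through this, we show the conditions under which coresets are necessary for preserving cluster validity, as well as the settings in which faster, cruder sampling strategies are sufficient. As a result, we provide a comprehensive theoretical and practical blueprint for effective clustering regardless of data size. Our code is publicly available and includes scripts to recreate the experiments.

Updated: 2024-04-02 13:31:19

Categories: cs.LG,cs.DS

Download: http://arxiv.org/abs/2404.01936v1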

Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks

In this work, we focus on unsupervised vision-language-action mapping in the area of robotic manipulation. Recently, multiple approaches employing pre-trained large language and vision models have been proposed for this task. However, they are computationally demanding and require careful fine-tuning of the produced outputs. A more lightweight alternative would be the implementation of multimodal Variational Autoencoders (VAEs), which can extract the latent features of the data and integrate them into a joint representation, as has been demonstrated mostly on image-image or image-text data for the state-of-the-art models. Here we explore whether and how multimodal VAEs can be employed in unsupervised robotic manipulation tasks in a simulated environment. Based on the obtained results, we propose a model-invariant training alternative that improves the models' performance in a simulator by up to 55%. Moreover, we systematically evaluate the challenges raised by the individual tasks, such as object or robot position variability, number of distractors, or the task length. Our work thus also sheds light on the potential benefits and limitations of using the current multimodal VAEs for unsupervised learning of robotic motion trajectories based on vision and language.

Updated: 2024-04-02 13:25:16

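Title: Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks

Abstract: In this work, we focus on unsupervised vision-language-action mapping in the area of robotic manipulation. Recently, several approaches that use pre-trained large language and vision models have been proposed for this task. However, they are computationally demanding and require careful fine-tuning of the produced outputs. A more lightweight alternative is to implement multimodal Variational Autoencoders (VAEs), which can extract the latent features of the data and integrate them into a joint representation, as state-of-the-art models have demonstrated mostly on image-image or image-text data. Here we explore whether and how multimodal VAEs can be used for unsupervised robotic manipulation tasks in a simulated environment. Based on the results, we propose a model-invariant training alternative that improves the models' performance in a simulator by up to 55%. In addition, we systematically evaluate the challenges raised by individual tasks, such as variability in object or robot position, the number of distractors, or the task length. Our work thus also sheds light on the potential benefits and limitations of using current multimodal VAEs for unsupervised learning of robotic motion trajectories based on vision and language.

Updated: 2024-04-02 13:25:16

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2404.01932v1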

SVGDreamer: Text Guided SVG Generation with Diffusion Model

Recently, text-guided scalable vector graphics (SVGs) synthesis has shown promise in domains such as iconography and sketch. However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity. To address these limitations, we propose a novel text-guided vector graphics synthesis method called SVGDreamer. SVGDreamer incorporates a semantic-driven image vectorization (SIVE) process that enables the decomposition of synthesis into foreground objects and background, thereby enhancing editability. Specifically, the SIVE process introduces attention-based primitive control and an attention-mask loss function for effective control and manipulation of individual elements. Additionally, we propose a Vectorized Particle-based Score Distillation (VPSD) approach to address issues of shape over-smoothing, color over-saturation, limited diversity, and slow convergence of the existing text-to-SVG generation methods by modeling SVGs as distributions of control points and colors. Furthermore, VPSD leverages a reward model to re-weight vector particles, which improves aesthetic appeal and accelerates convergence. Extensive experiments are conducted to validate the effectiveness of SVGDreamer, demonstrating its superiority over baseline methods in terms of editability, visual quality, and diversity. Project page: https://ximinng.github.io/SVGDreamer-project/

Updated: 2024-04-02 13:25:04

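Title: SVGDreamer: Text Guided SVG Generation with Diffusion Model

Abstract: Recently, text-guided scalable vector graphics (SVG) synthesis has shown promise in domains such as iconography and sketching. However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity. To address these limitations, we propose a novel text-guided vector graphics synthesis method called SVGDreamer. SVGDreamer includes a semantic-driven image vectorization (SIVE) process that decomposes synthesis into foreground objects and background, thereby enhancing editability. Specifically, the SIVE process introduces attention-based primitive control and an attention-mask loss function to control and manipulate individual elements effectively. In addition, we propose a Vectorized Particle-based Score Distillation (VPSD) approach, which models SVGs as distributions of control points and colors, to address the shape over-smoothing, color over-saturation, limited diversity, and slow convergence of existing text-to-SVG generation methods. Furthermore, VPSD uses a reward model to re-weight vector particles, which improves aesthetic appeal and speeds up convergence. Extensive experiments validate the effectiveness of SVGDreamer and demonstrate its superiority over baseline methods in editability, visual quality, and diversity. Project page: https://ximinng.github.io/SVGDreamer-project/

Updated: 2024-04-02 13:25:04

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2312.16476v5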

Adaptive Combinatorial Maximization: Beyond Approximate Greedy Policies

We study adaptive combinatorial maximization, which is a core challenge in machine learning, with applications in active learning as well as many other domains. We study the Bayesian setting, and consider the objectives of maximization under a cardinality constraint and minimum cost coverage. We provide new comprehensive approximation guarantees that subsume previous results, as well as considerably strengthen them. Our approximation guarantees simultaneously support the maximal gain ratio as well as near-submodular utility functions, and include both maximization under a cardinality constraint and a minimum cost coverage guarantee. In addition, we provide an approximation guarantee for a modified prior, which is crucial for obtaining active learning guarantees that do not depend on the smallest probability in the prior. Moreover, we discover a new parameter of adaptive selection policies, which we term the "maximal gain ratio". We show that this parameter is strictly less restrictive than the greedy approximation parameter that has been used in previous approximation guarantees, and show that it can be used to provide stronger approximation guarantees than previous results. In particular, we show that the maximal gain ratio is never larger than the greedy approximation factor of a policy, and that it can be considerably smaller. This provides a new insight into the properties that make a policy useful for adaptive combinatorial maximization.
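For readers unfamiliar with the baseline that the abstract generalizes, the following is a minimal sketch of exact greedy maximization of a coverage function under a cardinality constraint. It is non-adaptive and non-Bayesian, so it illustrates only the classical greedy policy, not the adaptive policies or the maximal gain ratio studied in the paper; the example sets and names are hypothetical.

```python
def greedy_max_coverage(sets, budget):
    """Pick up to `budget` sets, each time taking the largest marginal gain."""
    chosen, covered = [], set()
    for _ in range(budget):
        # Marginal gain of a candidate = number of newly covered elements.
        best = max((name for name in sets if name not in chosen),
                   key=lambda name: len(sets[name] - covered),
                   default=None)
        if best is None or not sets[best] - covered:
            break  # nothing left to pick, or no candidate adds coverage
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

candidate_sets = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 6},
}
chosen, covered = greedy_max_coverage(candidate_sets, budget=2)
```

An approximate greedy policy would be allowed to pick any candidate whose gain is within a fixed factor of the best gain at each step, which is the kind of relaxation earlier guarantees were parameterized by.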

Updated: 2024-04-02 13:23:54

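Title: Adaptive Combinatorial Maximization: Beyond Approximate Greedy Policies

Abstract: We study adaptive combinatorial maximization, a core challenge in machine learning with applications in active learning and many other domains. We study the Bayesian setting and consider the objectives of maximization under a cardinality constraint and minimum cost coverage. We provide new comprehensive approximation guarantees that subsume previous results and considerably strengthen them. Our approximation guarantees simultaneously support the maximal gain ratio and near-submodular utility functions, and cover both maximization under a cardinality constraint and a minimum cost coverage guarantee. In addition, we provide an approximation guarantee for a modified prior, which is crucial for obtaining active learning guarantees that do not depend on the smallest probability in the prior. Moreover, we discover a new parameter of adaptive selection policies, which we call the "maximal gain ratio". We show that this parameter is strictly less restrictive than the greedy approximation parameter used in previous approximation guarantees, and that it can be used to provide stronger approximation guarantees than previous results. In particular, we show that the maximal gain ratio is never larger than the greedy approximation factor of a policy, and that it can be considerably smaller. This provides new insight into the properties that make a policy useful for adaptive combinatorial maximization.

Updated: 2024-04-02 13:23:54

Categories: cs.LG,cs.DM,stat.ML

Download: http://arxiv.org/abs/2404.01930v1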

Improving Bird's Eye View Semantic Segmentation by Task Decomposition

Semantic segmentation in bird's eye view (BEV) plays a crucial role in autonomous driving. Previous methods usually follow an end-to-end pipeline, directly predicting the BEV segmentation map from monocular RGB inputs. However, a challenge arises because the RGB inputs and BEV targets come from distinct perspectives, making direct point-to-point prediction hard to optimize. In this paper, we decompose the original BEV segmentation task into two stages, namely BEV map reconstruction and RGB-BEV feature alignment. In the first stage, we train a BEV autoencoder to reconstruct the BEV segmentation maps given corrupted noisy latent representation, which urges the decoder to learn fundamental knowledge of typical BEV patterns. The second stage involves mapping RGB input images into the BEV latent space of the first stage, directly optimizing the correlations between the two views at the feature level. Our approach simplifies the complexity of combining perception and generation into distinct steps, equipping the model to handle intricate and challenging scenes effectively. Besides, we propose to transform the BEV segmentation map from the Cartesian to the polar coordinate system to establish the column-wise correspondence between RGB images and BEV maps. Moreover, our method requires neither multi-scale features nor camera intrinsic parameters for depth estimation and saves computational overhead. Extensive experiments on nuScenes and Argoverse show the effectiveness and efficiency of our method. Code is available at https://github.com/happytianhao/TaDe.
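The Cartesian-to-polar transformation mentioned above can be sketched as a nearest-neighbour resampling in which each polar column corresponds to one viewing direction from the camera. This is an illustrative assumption about the geometry, not the released TaDe code; the grid layout, the 0-to-pi angular range, and all names are hypothetical.

```python
import math

def cartesian_to_polar(bev, origin, n_range, n_angle, max_range):
    """Resample a Cartesian BEV grid into a (range, angle) grid.

    bev[row][col] holds a class label, origin is the (row, col) of the camera,
    and angles sweep from 0 to pi so that each polar column is one direction.
    """
    height, width = len(bev), len(bev[0])
    origin_row, origin_col = origin
    polar = [[0] * n_angle for _ in range(n_range)]
    for ri in range(n_range):
        radius = (ri + 1) * max_range / n_range        # outer edge of the range bin
        for ai in range(n_angle):
            theta = (ai + 0.5) * math.pi / n_angle     # centre of the angle bin
            col = int(round(origin_col + radius * math.cos(theta)))
            row = int(round(origin_row - radius * math.sin(theta)))  # forward is up
            if 0 <= row < height and 0 <= col < width:
                polar[ri][ai] = bev[row][col]          # cells outside the map stay 0
    return polar

# Toy 5x5 map: a lane of label 1 straight ahead of a camera at the bottom centre.
bev = [[0, 0, 1, 0, 0] for _ in range(4)] + [[0, 0, 0, 0, 0]]
polar = cartesian_to_polar(bev, origin=(4, 2), n_range=4, n_angle=3, max_range=4)
```

In the toy map the lane directly ahead of the camera ends up entirely in the middle polar column, which is the kind of column-wise correspondence with image columns that the abstract refers to.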

Updated: 2024-04-02 13:19:45

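Title: Improving Bird's Eye View Semantic Segmentation by Task Decomposition

Abstract: Semantic segmentation in bird's eye view (BEV) plays a crucial role in autonomous driving. Previous methods usually follow an end-to-end pipeline that predicts the BEV segmentation map directly from monocular RGB inputs. However, a challenge arises because the RGB inputs and BEV targets come from different perspectives, which makes direct point-to-point prediction hard to optimize. In this paper, we decompose the original BEV segmentation task into two stages: BEV map reconstruction and RGB-BEV feature alignment. In the first stage, we train a BEV autoencoder to reconstruct BEV segmentation maps from a corrupted, noisy latent representation, which pushes the decoder to learn the fundamentals of typical BEV patterns. The second stage maps RGB input images into the BEV latent space of the first stage, directly optimizing the correlations between the two views at the feature level. Our approach reduces the complexity of combining perception and generation by splitting them into distinct steps, enabling the model to handle intricate and challenging scenes effectively. In addition, we propose transforming the BEV segmentation map from the Cartesian to the polar coordinate system to establish a column-wise correspondence between RGB images and BEV maps. Moreover, our method requires neither multi-scale features nor camera intrinsic parameters for depth estimation, which saves computational overhead. Extensive experiments on nuScenes and Argoverse show the effectiveness and efficiency of our method. Code is available at https://github.com/happytianhao/TaDe.

Updated: 2024-04-02 13:19:45

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.01925v1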

SGSH: Stimulate Large Language Models with Skeleton Heuristics for Knowledge Base Question Generation

Knowledge base question generation (KBQG) aims to generate natural language questions from a set of triplet facts extracted from a KB. Existing methods have significantly boosted the performance of KBQG via pre-trained language models (PLMs) thanks to the richly endowed semantic knowledge. With the advance of pre-training techniques, large language models (LLMs) (e.g., GPT-3.5) undoubtedly possess much more semantic knowledge. Therefore, how to effectively organize and exploit the abundant knowledge for KBQG becomes the focus of our study. In this work, we propose SGSH, a simple and effective framework to Stimulate GPT-3.5 with Skeleton Heuristics to enhance KBQG. The framework incorporates "skeleton heuristics", which provide more fine-grained guidance associated with each input to stimulate LLMs to generate optimal questions, encompassing essential elements like the question phrase and the auxiliary verb. More specifically, we devise an automatic data construction strategy leveraging ChatGPT to construct a skeleton training dataset, based on which we employ a soft prompting approach to train a BART model dedicated to generating the skeleton associated with each input. Subsequently, skeleton heuristics are encoded into the prompt to incentivize GPT-3.5 to generate desired questions. Extensive experiments demonstrate that SGSH achieves new state-of-the-art performance on the KBQG tasks.

Updated: 2024-04-02 13:17:36

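Title: SGSH: Stimulate Large Language Models with Skeleton Heuristics for Knowledge Base Question Generation

Abstract: Knowledge base question generation (KBQG) aims to generate natural language questions from a set of triplet facts extracted from a knowledge base. Existing methods have significantly improved KBQG performance through pre-trained language models (PLMs), thanks to their rich semantic knowledge. With advances in pre-training techniques, large language models (LLMs) (e.g., GPT-3.5) undoubtedly possess even richer semantic knowledge. How to effectively organize and exploit this abundant knowledge for KBQG is therefore the focus of our study. In this work, we propose SGSH, a simple and effective framework that stimulates GPT-3.5 with skeleton heuristics to enhance KBQG. The framework incorporates "skeleton heuristics", which provide more fine-grained guidance for each input to stimulate LLMs to generate optimal questions, covering key elements such as the question phrase and the auxiliary verb. Specifically, we design an automatic data construction strategy that uses ChatGPT to build a skeleton training dataset, on which we use a soft prompting approach to train a BART model dedicated to generating the skeleton associated with each input. The skeleton heuristics are then encoded into the prompt to encourage GPT-3.5 to generate the desired questions. Extensive experiments show that SGSH achieves new state-of-the-art performance on KBQG tasks.

Updated: 2024-04-02 13:17:36

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.01923v1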

FusionINN: Invertible Image Fusion for Brain Tumor Monitoring

Image fusion typically employs non-invertible neural networks to merge multiple source images into a single fused image. However, for clinical experts, solely relying on fused images may be insufficient for making diagnostic decisions, as the fusion mechanism blends features from source images, thereby making it difficult to interpret the underlying tumor pathology. We introduce FusionINN, a novel invertible image fusion framework, capable of efficiently generating fused images and also decomposing them back to the source images by solving the inverse of the fusion process. FusionINN guarantees lossless one-to-one pixel mapping by integrating a normally distributed latent image alongside the fused image to facilitate the generative modeling of the decomposition process. To the best of our knowledge, we are the first to investigate the decomposability of fused images, which is particularly crucial for life-sensitive applications such as medical image fusion compared to other tasks like multi-focus or multi-exposure image fusion. Our extensive experimentation validates FusionINN over existing discriminative and generative fusion methods, both subjectively and objectively. Moreover, compared to a recent denoising diffusion-based fusion model, our approach offers faster and qualitatively better fusion results. We also exhibit the clinical utility of our results in aiding disease prognosis.

Updated: 2024-04-02 13:16:46

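Title: FusionINN: Invertible Image Fusion for Brain Tumor Monitoring

Abstract: Image fusion typically uses non-invertible neural networks to merge multiple source images into a single fused image. However, for clinical experts, relying only on fused images may be insufficient for making diagnostic decisions, because the fusion mechanism blends features from the source images and makes the underlying tumor pathology difficult to interpret. We introduce FusionINN, a novel invertible image fusion framework that can efficiently generate fused images and also decompose them back into the source images by solving the inverse of the fusion process. FusionINN guarantees lossless one-to-one pixel mapping by integrating a normally distributed latent image alongside the fused image to facilitate generative modeling of the decomposition process. To the best of our knowledge, we are the first to investigate the decomposability of fused images, which is particularly crucial for life-sensitive applications such as medical image fusion, compared with other tasks like multi-focus or multi-exposure image fusion. Our extensive experiments validate FusionINN against existing discriminative and generative fusion methods, both subjectively and objectively. Moreover, compared with a recent denoising diffusion-based fusion model, our approach offers faster and qualitatively better fusion results. We also show the clinical utility of our results in aiding disease prognosis.

Updated: 2024-04-02 13:16:46

Categories: eess.IV,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2403.15769v2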

DRCT: Saving Image Super-resolution away from Information Bottleneck

In recent years, Vision Transformer-based approaches to low-level vision tasks have achieved widespread success. Unlike CNN-based models, Transformers are more adept at capturing long-range dependencies, enabling the reconstruction of images using information from non-local areas. In the domain of super-resolution, Swin-transformer-based approaches have become mainstream due to their capacity to capture global spatial information and their shifting-window attention mechanism, which facilitates the exchange of information between different windows. Many researchers have enhanced image quality and network efficiency by expanding the receptive field or designing complex networks, yielding commendable results. However, we observed that spatial information tends to diminish during forward propagation as depth increases, which limits the model's potential. To address this, we propose the Dense-residual-connected Transformer (DRCT), which aims to mitigate the loss of spatial information through dense-residual connections between layers, thereby unleashing the model's potential and enhancing performance. Experimental results indicate that our approach is not only straightforward but also achieves remarkable efficiency, surpassing state-of-the-art methods and performing commendably at NTIRE2024.

Updated: 2024-04-02 13:15:36

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.00722v2

SCANNER: Knowledge-Enhanced Approach for Robust Multi-modal Named Entity Recognition of Unseen Entities

Recent advances in named entity recognition (NER) have pushed the boundary of the task to incorporate visual signals, leading to many variants, including multi-modal NER (MNER) and grounded MNER (GMNER). A key challenge in these tasks is that the model should be able to generalize to entities unseen during training and should be able to handle training samples with noisy annotations. To address this obstacle, we propose SCANNER (Span CANdidate detection and recognition for NER), a model capable of effectively handling all three NER variants. SCANNER has a two-stage structure: we extract entity candidates in the first stage and use them as queries to retrieve knowledge, effectively pulling knowledge from various sources. We can boost our performance by utilizing this entity-centric extracted knowledge to address unseen entities. Furthermore, to tackle the challenges arising from noisy annotations in NER datasets, we introduce a novel self-distillation method, enhancing the robustness and accuracy of our model in processing training data with inherent uncertainties. Our approach demonstrates competitive performance on the NER benchmark and surpasses existing methods on both MNER and GMNER benchmarks. Further analysis shows that the proposed distillation and knowledge utilization methods improve the performance of our model on various benchmarks.

Updated: 2024-04-02 13:05:41

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.01914v1

What Blocks My Blockchain's Throughput? Developing a Generalizable Approach for Identifying Bottlenecks in Permissioned Blockchains

Permissioned blockchains have been proposed for a variety of use cases that require decentralization yet address enterprise requirements that permissionless blockchains to date cannot satisfy -- particularly in terms of performance. However, popular permissioned blockchains still exhibit a relatively low maximum throughput in comparison to established centralized systems. Consequently, researchers have conducted several benchmarking studies on different permissioned blockchains to identify their limitations and -- in some cases -- their bottlenecks in an attempt to find avenues for improvement. Yet, these approaches are highly heterogeneous, difficult to compare, and require a high level of expertise in the implementation of the underlying specific blockchain. In this paper, we develop a more unified and graphical approach for identifying bottlenecks in permissioned blockchains based on a systematic review of related work, experiments with the Distributed Ledger Performance Scan (DLPS), and an extension of its graphical evaluation functionalities. We conduct in-depth case studies on Hyperledger Fabric and Quorum, two widely used permissioned blockchains with distinct architectural designs, demonstrating the adaptability of our framework across different blockchains. We provide researchers and practitioners working on evaluating or improving permissioned blockchains with a toolkit, guidelines on what data to document, and insights on how to proceed in the search process for bottlenecks.

Updated: 2024-04-02 13:00:50

Categories: cs.CR,cs.DB

Download: http://arxiv.org/abs/2404.02930v1

Multicore DRAM Bank-& Row-Conflict Bomb for Timing Attacks in Mixed-Criticality Systems

With the increasing use of multicore platforms to realize mixed-criticality systems, it becomes relevant to understand the underlying shared resources, such as the memory hierarchy shared among cores, and to achieve isolation between co-executing tasks that run on the same platform with different criticality levels. In addition to safety considerations, a malicious entity can exploit shared resources to create timing attacks on critical applications. In this paper, we focus on understanding the shared DRAM dual in-line memory module and create a timing attack, which we name the "bank & row conflict bomb", to target a victim task in a multicore platform. We also create a "navigate" algorithm to understand how victim requests are managed by the Memory Controller and to provide valuable inputs for designing the bank & row conflict bomb. We performed experimental tests on a 2nd Gen Intel Xeon Processor with an 8GB DDR4-2666 DRAM module to show that such an attack can increase the execution time of the victim task by about 150%, motivating the need for proper countermeasures to help ensure the safety and security of critical applications.

Updated: 2024-04-02 12:57:20

Categories: cs.CR,68M25, 68M15,C.3; B.3.3; D.4.6

Download: http://arxiv.org/abs/2404.01910v1

Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack

With the development of large language models (LLMs), detecting whether text is generated by a machine becomes increasingly challenging in the face of malicious use cases like the spread of false information, protection of intellectual property, and prevention of academic plagiarism. While well-trained text detectors have demonstrated promising performance on unseen test data, recent research suggests that these detectors have vulnerabilities when dealing with adversarial attacks such as paraphrasing. In this paper, we propose a framework for a broader class of adversarial attacks, designed to perform minor perturbations in machine-generated content to evade detection. We consider two attack settings: white-box and black-box, and employ adversarial learning in dynamic scenarios to assess the potential enhancement of the current detection model's robustness against such attacks. The empirical results reveal that the current detection models can be compromised in as little as 10 seconds, leading to the misclassification of machine-generated text as human-written content. Furthermore, we explore the prospect of improving the model's robustness over iterative adversarial learning. Although some improvements in model robustness are observed, practical applications still face significant challenges. These findings shed light on the future development of AI-text detectors, emphasizing the need for more accurate and robust detection methods.

Updated: 2024-04-02 12:49:22

Categories: cs.CL,cs.CR,cs.LG

Download: http://arxiv.org/abs/2404.01907v1

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Preference modeling techniques, such as direct preference optimization (DPO), have proven effective in enhancing the generalization abilities of large language models (LLMs). However, in tasks involving video instruction-following, providing informative feedback, especially for detecting hallucinations in generated responses, remains a significant challenge. Previous studies have explored using large multimodal models (LMMs) as reward models to guide preference modeling, but their ability to accurately assess the factuality of generated responses compared to corresponding videos has not been conclusively established. This paper introduces a novel framework that utilizes detailed video captions as a proxy for video content, enabling language models to incorporate this information as supporting evidence for scoring video Question Answering (QA) predictions. Our approach demonstrates robust alignment with the reward mechanism of the OpenAI GPT-4V model, which directly takes video frames as input. Furthermore, we show that applying this tailored reward through DPO significantly improves the performance of video LMMs on video QA tasks.
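
For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the paper: the function name `dpo_loss`, its argument names, and the default `beta` are illustrative assumptions, and the inputs are assumed to be sequence-level log-probabilities under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    All arguments are sequence-level log-probabilities; the names and the
    default beta are illustrative assumptions, not values from the paper.
    """
    # Implicit rewards: scaled log-ratios between the policy and the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree on both responses the margin is zero and the loss equals log 2; the loss falls as the policy raises the chosen response relative to the rejected one. In the setting described above, which response counts as "chosen" would be decided by the caption-based language model reward.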

Updated: 2024-04-02 12:47:49

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.01258v2

Leveraging Machine Learning for Early Autism Detection via INDT-ASD Indian Database

Machine learning (ML) has advanced quickly, particularly in health care. The diagnosis of neurodevelopmental problems using ML is a very important area of healthcare. Autism spectrum disorder (ASD) is one of the fastest-growing developmental disorders globally. The clinical screening tests used to identify autistic symptoms are expensive and time-consuming. With recent advances in ML, however, it is feasible to identify autism early on. Many different techniques have been used in previous investigations. Still, none of them has produced the anticipated outcomes in predicting autistic features using a clinically validated Indian ASD database. Therefore, this study aimed to develop a simple, quick, and inexpensive technique for identifying ASD by using ML. Various machine learning classifiers, including Adaboost (AB), Gradient Boost (GB), Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Gaussian Naive Bayes (GNB), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM), were used to develop the autism prediction model. The proposed method was tested with records from the AIIMS Modified INDT-ASD (AMI) database, which were collected through an application developed by AIIMS in Delhi, India. Feature engineering has been applied to make the proposed solution simpler than existing solutions. Using the proposed model, we succeeded in predicting ASD with promising accuracy using a reduced set of 20 questions rather than the 28 questions presented in AMI. In a comparative evaluation, SVM emerged as the superior model, with 100 $\pm$ 0.05\% accuracy, higher recall by 5.34\%, and improved accuracy by 2.22\%-6.67\% over RF. We have also introduced a web-based solution supporting both Hindi and English.

Updated: 2024-04-02 12:44:51

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.02181v1

Activation Steering for Robust Type Prediction in CodeLLMs

Contemporary LLMs pretrained on code are capable of succeeding at a wide variety of programming tasks. However, their performance is very sensitive to syntactic features, such as the names of variables and types, the structure of code, and presence of type hints. We contribute an inference-time technique to make CodeLLMs more robust to syntactic distractors that are semantically irrelevant. Our methodology relies on activation steering, which involves editing internal model activations to steer the model towards the correct prediction. We contribute a novel way to construct steering vectors by taking inspiration from mutation testing, which constructs minimal semantics-breaking code edits. In contrast, we construct steering vectors from semantics-preserving code edits. We apply our approach to the task of type prediction for the gradually typed languages Python and TypeScript. This approach corrects up to 90% of type mispredictions. Finally, we show that steering vectors calculated from Python activations reliably correct type mispredictions in TypeScript, and vice versa. This result suggests that LLMs may be learning to transfer knowledge of types across programming languages.

Updated: 2024-04-02 12:44:44

Categories: cs.CL,cs.LG,cs.PL

Download: http://arxiv.org/abs/2404.01903v1

Bayesian neural networks via MCMC: a Python-based tutorial

Bayesian inference provides a methodology for parameter estimation and uncertainty quantification in machine learning and deep learning methods. Variational inference and Markov Chain Monte-Carlo (MCMC) sampling methods are used to implement Bayesian inference. In the past three decades, MCMC sampling methods have faced some challenges in being adapted to larger models (such as in deep learning) and big data problems. Advanced proposal distributions that incorporate gradients, such as a Langevin proposal distribution, provide a means to address some of the limitations of MCMC sampling for Bayesian neural networks. Furthermore, the use of MCMC methods has typically been confined to statisticians, and they are currently not well known among deep learning researchers. We present a tutorial for MCMC methods that covers simple Bayesian linear and logistic models, and Bayesian neural networks. The aim of this tutorial is to bridge the gap between theory and implementation via coding, given a general sparsity of libraries and tutorials to this end. This tutorial provides code in Python with data and instructions that enable their use and extension. We provide results for some benchmark problems showing the strengths and weaknesses of implementing the respective Bayesian models via MCMC. We highlight the challenges in sampling multi-modal posterior distributions for the case of Bayesian neural networks and the need for further improvement of convergence diagnosis methods.
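
To make the idea of a Langevin proposal concrete, here is a minimal sketch of a Metropolis-adjusted Langevin sampler for a one-dimensional target. It is not the tutorial's code: the function `mala_sample`, the toy Gaussian "posterior" over a single weight (mean 2.0, standard deviation 0.5), and the step size are all assumptions chosen for illustration.

```python
import math
import random


def mala_sample(log_p, grad_log_p, x0, step, n_samples, seed=0):
    """Metropolis-adjusted Langevin sampler for a scalar parameter."""
    rng = random.Random(seed)

    def log_q(dst, src):
        # Log-density (up to a constant) of proposing dst when at src.
        mean = src + 0.5 * step ** 2 * grad_log_p(src)
        return -((dst - mean) ** 2) / (2 * step ** 2)

    x, samples = x0, []
    for _ in range(n_samples):
        # Gradient-informed proposal: drift towards higher density plus noise.
        prop = x + 0.5 * step ** 2 * grad_log_p(x) + step * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_alpha = log_p(prop) + log_q(x, prop) - log_p(x) - log_q(prop, x)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = prop
        samples.append(x)
    return samples


# Toy target (assumed for illustration): Gaussian posterior over one weight.
def toy_log_p(w):
    return -((w - 2.0) ** 2) / (2 * 0.25)


def toy_grad_log_p(w):
    return -(w - 2.0) / 0.25


draws = mala_sample(toy_log_p, toy_grad_log_p, x0=0.0, step=0.5, n_samples=5000)
posterior_mean = sum(draws[500:]) / len(draws[500:])  # discard burn-in
```

For a Bayesian neural network the scalar would become the full weight vector and the gradient would come from backpropagation, but the accept/reject structure stays the same.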

Updated: 2024-04-02 12:38:25

Categories: stat.ML,cs.AI,cs.LG,stat.CO

Download: http://arxiv.org/abs/2304.02595v2

Foundation Models for Time Series Analysis: A Tutorial and Survey

Time series analysis stands as a focal point within the data mining community, serving as a cornerstone for extracting valuable insights crucial to a myriad of real-world applications. Recent advancements in Foundation Models (FMs) have fundamentally reshaped the paradigm of model design for time series analysis, boosting various downstream tasks in practice. These innovative approaches often leverage pre-trained or fine-tuned FMs to harness generalized knowledge tailored specifically for time series analysis. In this survey, we aim to furnish a comprehensive and up-to-date overview of FMs for time series analysis. While prior surveys have predominantly focused on either the application or the pipeline aspects of FMs in time series analysis, they have often lacked an in-depth understanding of the underlying mechanisms that elucidate why and how FMs benefit time series analysis. To address this gap, our survey adopts a model-centric classification, delineating various pivotal elements of time-series FMs, including model architectures, pre-training techniques, adaptation methods, and data modalities. Overall, this survey serves to consolidate the latest advancements in FMs pertinent to time series analysis, accentuating their theoretical underpinnings, recent strides in development, and avenues for future research exploration.

Updated: 2024-04-02 12:38:02

Categories: cs.LG

Download: http://arxiv.org/abs/2403.14735v2

Continuous Spiking Graph Neural Networks

Continuous graph neural networks (CGNNs) have garnered significant attention due to their ability to generalize existing discrete graph neural networks (GNNs) by introducing continuous dynamics. They typically draw inspiration from diffusion-based methods to introduce a novel propagation scheme, which is analyzed using ordinary differential equations (ODE). However, the implementation of CGNNs requires significant computational power, making them challenging to deploy on battery-powered devices. Inspired by recent spiking neural networks (SNNs), which emulate a biological inference process and provide an energy-efficient neural architecture, we incorporate the SNNs with CGNNs in a unified framework, named Continuous Spiking Graph Neural Networks (COS-GNN). We employ SNNs for graph node representation at each time step, which are further integrated into the ODE process along with time. To enhance information preservation and mitigate information loss in SNNs, we introduce the high-order structure of COS-GNN, which utilizes the second-order ODE for spiking representation and continuous propagation. Moreover, we provide the theoretical proof that COS-GNN effectively mitigates the issues of exploding and vanishing gradients, enabling us to capture long-range dependencies between nodes. Experimental results on graph-based learning tasks demonstrate the effectiveness of the proposed COS-GNN over competitive baselines.

Updated: 2024-04-02 12:36:40

Categories: cs.NE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.01897v1

RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement

In this paper we propose a novel modification of Contrastive Language-Image Pre-Training (CLIP) guidance for the task of unsupervised backlit image enhancement. Our work builds on the state-of-the-art CLIP-LIT approach, which learns a prompt pair by constraining the text-image similarity between a prompt (negative/positive sample) and a corresponding image (backlit image/well-lit image) in the CLIP embedding space. Learned prompts then guide an image enhancement network. Based on the CLIP-LIT framework, we propose two novel methods for CLIP guidance. First, we show that instead of tuning prompts in the space of text embeddings, it is possible to directly tune their embeddings in the latent space without any loss in quality. This accelerates training and potentially enables the use of additional encoders that do not have a text encoder. Second, we propose a novel approach that does not require any prompt tuning. Instead, based on CLIP embeddings of backlit and well-lit images from training data, we compute the residual vector in the embedding space as a simple difference between the mean embeddings of the well-lit and backlit images. This vector then guides the enhancement network during training, pushing a backlit image towards the space of well-lit images. This approach further dramatically reduces training time, stabilizes training and produces high quality enhanced images without artifacts, both in supervised and unsupervised training regimes. Additionally, we show that residual vectors can be interpreted, revealing biases in training data, and thereby enabling potential bias correction.
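
The residual-vector computation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: embeddings are plain lists of floats standing in for CLIP image embeddings, and `guidance_score` is a hypothetical way of measuring how far an embedding lies along the residual direction (the abstract does not specify the actual training loss).

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def residual_vector(well_lit_embeddings, backlit_embeddings):
    """Difference between the mean well-lit and mean backlit embeddings."""
    well = mean_vector(well_lit_embeddings)
    back = mean_vector(backlit_embeddings)
    return [w - b for w, b in zip(well, back)]


def guidance_score(embedding, residual):
    """Hypothetical guidance signal: projection onto the residual direction."""
    norm = math.sqrt(sum(r * r for r in residual))
    return sum(e * r for e, r in zip(embedding, residual)) / norm


# Toy two-dimensional "embeddings" (assumed data, for illustration only).
residual = residual_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]])
```

Because the residual is computed once from training data, no prompt parameters need to be optimized, which is consistent with the reduced training time the abstract reports.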

Updated: 2024-04-02 12:28:40

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.01889v1

High-performance real-world optical computing trained by in situ model-free optimization

Optical computing systems provide high-speed and low-energy data processing but face deficiencies in computationally demanding training and simulation-to-reality gaps. We propose a gradient-based model-free optimization (G-MFO) method based on a Monte Carlo gradient estimation algorithm for computationally efficient in situ training of optical computing systems. This approach treats an optical computing system as a black box and back-propagates the loss directly to the optical computing weights' probability distributions, circumventing the need for a computationally heavy and biased system simulation. Our experiments on diffractive optical computing systems show that G-MFO outperforms hybrid training on the MNIST and FMNIST datasets. Furthermore, we demonstrate image-free and high-speed classification of cells from their marker-free phase maps. Our method's model-free and high-performance nature, combined with its low demand for computational resources, paves the way for accelerating the transition of optical computing from laboratory demonstrations to practical, real-world applications.
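
As a rough illustration of model-free training with a Monte Carlo gradient estimate, the sketch below uses a score-function estimator to update the mean of a Gaussian distribution over weights while only querying a black-box loss. It is not the paper's G-MFO code: `black_box_loss` is a stand-in for a physical measurement, and the function names, the fixed `sigma`, the mean-loss baseline, and all hyperparameters are assumptions.

```python
import random


def black_box_loss(weights, target=(1.0, -0.5)):
    # Stand-in for measuring the physical system; the optimizer only sees
    # the returned number, never this formula or its gradient.
    return sum((w - t) ** 2 for w, t in zip(weights, target))


def train_model_free(loss_fn, dim, sigma=0.3, lr=0.05,
                     n_samples=32, n_steps=300, seed=0):
    """Optimize the mean of a Gaussian over weights by score-function gradients."""
    rng = random.Random(seed)
    mu = [0.0] * dim
    for _ in range(n_steps):
        # Sample candidate weights from the current distribution N(mu, sigma^2).
        samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mu]
                   for _ in range(n_samples)]
        losses = [loss_fn(w) for w in samples]
        # Mean-loss baseline reduces variance (at the cost of a small bias).
        baseline = sum(losses) / n_samples
        grad = [0.0] * dim
        for w, loss in zip(samples, losses):
            for i in range(dim):
                # d/d(mu_i) of log N(w; mu, sigma^2) is (w_i - mu_i) / sigma^2.
                grad[i] += (loss - baseline) * (w[i] - mu[i]) / sigma ** 2
        mu = [m - lr * g / n_samples for m, g in zip(mu, grad)]
    return mu


learned_mu = train_model_free(black_box_loss, dim=2)
```

The key property is that no simulator or differentiable model of the system is needed: each update uses only sampled weights and the losses observed for them.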

Updated: 2024-04-02 12:25:10

Categories: physics.optics,cs.CV,cs.ET,cs.LG

Download: http://arxiv.org/abs/2307.11957v5

PeersimGym: An Environment for Solving the Task Offloading Problem with Reinforcement Learning

Task offloading, crucial for balancing computational loads across devices in networks such as the Internet of Things, poses significant optimization challenges, including minimizing latency and energy usage under strict communication and storage constraints. While traditional optimization falls short in scalability and heuristic approaches fail to achieve optimal outcomes, Reinforcement Learning (RL) offers a promising avenue by enabling the learning of optimal offloading strategies through iterative interactions. However, the efficacy of RL hinges on access to rich datasets and custom-tailored, realistic training environments. To address this, we introduce PeersimGym, an open-source, customizable simulation environment tailored for developing and optimizing task offloading strategies within computational networks. PeersimGym supports a wide range of network topologies and computational constraints and integrates a PettingZoo-based interface for RL agent deployment in both solo and multi-agent setups. Furthermore, we demonstrate the utility of the environment through experiments with Deep Reinforcement Learning agents, showcasing the potential of RL-based approaches to significantly enhance offloading strategies in distributed computing settings. PeersimGym thus bridges the gap between theoretical RL models and their practical applications, paving the way for advancements in efficient task offloading methodologies.

Updated: 2024-04-02 12:17:30

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2403.17637v2

Adversarial Combinatorial Bandits with Switching Costs

We study the problem of adversarial combinatorial bandit with a switching cost $\lambda$ for a switch of each selected arm in each round, considering both the bandit feedback and semi-bandit feedback settings. In the oblivious adversarial case with $K$ base arms and time horizon $T$, we derive lower bounds for the minimax regret and design algorithms to approach them. To prove these lower bounds, we design stochastic loss sequences for both feedback settings, building on an idea from previous work in Dekel et al. (2014). The lower bound for bandit feedback is $ \tilde{\Omega}\big( (\lambda K)^{\frac{1}{3}} (TI)^{\frac{2}{3}}\big)$ while that for semi-bandit feedback is $ \tilde{\Omega}\big( (\lambda K I)^{\frac{1}{3}} T^{\frac{2}{3}}\big)$ where $I$ is the number of base arms in the combinatorial arm played in each round. To approach these lower bounds, we design algorithms that operate in batches by dividing the time horizon into batches to restrict the number of switches between actions. For the bandit feedback setting, where only the total loss of the combinatorial arm is observed, we introduce the Batched-Exp2 algorithm which achieves a regret upper bound of $\tilde{O}\big((\lambda K)^{\frac{1}{3}}T^{\frac{2}{3}}I^{\frac{4}{3}}\big)$ as $T$ tends to infinity. In the semi-bandit feedback setting, where all losses for the combinatorial arm are observed, we propose the Batched-BROAD algorithm which achieves a regret upper bound of $\tilde{O}\big( (\lambda K)^{\frac{1}{3}} (TI)^{\frac{2}{3}}\big)$.
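
The batching idea can be illustrated independently of the specific algorithms. The sketch below holds one combinatorial arm fixed within each batch, so the number of base-arm switches is at most $I$ times the number of batch boundaries. It does not implement Batched-Exp2 or Batched-BROAD: the helper names are hypothetical, and `batch_length` uses a common balancing heuristic of order $(\lambda^2 T / K)^{1/3}$ that is assumed here rather than taken from the paper.

```python
def batch_length(horizon, n_arms, switch_cost):
    """Heuristic batch length of order (switch_cost^2 * T / K)^(1/3), at least 1."""
    return max(1, round((switch_cost ** 2 * horizon / n_arms) ** (1 / 3)))


def play_in_batches(horizon, batch_len, choose_arm):
    """Play the arm chosen for each batch on every round of that batch.

    choose_arm(batch_index) returns a combinatorial arm as a tuple of base arms.
    """
    actions = []
    for start in range(0, horizon, batch_len):
        arm = choose_arm(start // batch_len)
        actions.extend([arm] * min(batch_len, horizon - start))
    return actions


def count_base_arm_switches(actions):
    """Count base arms newly selected between consecutive rounds."""
    return sum(len(set(b) - set(a)) for a, b in zip(actions, actions[1:]))


# Toy run: 10 rounds, batches of 4, arms of I = 2 base arms each.
toy_actions = play_in_batches(10, 4, lambda b: (b, b + 1))
```

Longer batches reduce the total switching cost but slow down learning, since the learner can react only at batch boundaries; the $T^{2/3}$ rates quoted above reflect that tradeoff.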

Updated: 2024-04-02 12:15:37

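Title: Adversarial Combinatorial Bandits with Switching Costs

Abstract: We study adversarial combinatorial bandits with a switching cost $\lambda$ charged for each selected arm that is switched in each round, under both bandit and semi-bandit feedback. In the oblivious adversarial case with $K$ base arms and time horizon $T$, we derive lower bounds on the minimax regret and design algorithms that approach them. To prove the lower bounds, we construct stochastic loss sequences for both feedback settings, building on an idea from Dekel et al. (2014). The lower bound is $ \tilde{\Omega}\big( (\lambda K)^{\frac{1}{3}} (TI)^{\frac{2}{3}}\big)$ for bandit feedback and $ \tilde{\Omega}\big( (\lambda K I)^{\frac{1}{3}} T^{\frac{2}{3}}\big)$ for semi-bandit feedback, where $I$ is the number of base arms in the combinatorial arm played in each round. To approach these bounds, we design algorithms that divide the time horizon into batches to limit the number of switches between actions. For bandit feedback, where only the total loss of the combinatorial arm is observed, we introduce the Batched-Exp2 algorithm, whose regret is at most $\tilde{O}\big((\lambda K)^{\frac{1}{3}}T^{\frac{2}{3}}I^{\frac{4}{3}}\big)$ as $T$ tends to infinity. For semi-bandit feedback, where all losses of the combinatorial arm are observed, we propose the Batched-BROAD algorithm, whose regret is at most $\tilde{O}\big( (\lambda K)^{\frac{1}{3}}(TI)^{\frac{2}{3}}\big)$.

Updated: 2024-04-02 12:15:37

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2404.01883v1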
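
The batching idea in the abstract above can be illustrated with a small sketch. It is not the Batched-Exp2 or Batched-BROAD algorithm; it only shows, under the assumption that the combinatorial arm is held fixed inside each batch, how batching caps the total switching cost. The function names and the worst-case accounting (every one of the $I$ base arms changes at each batch boundary) are illustrative assumptions.

```python
import math


def batch_boundaries(horizon, batch_size):
    """Partition rounds 0..horizon-1 into consecutive (start, end) batches."""
    return [(start, min(start + batch_size, horizon))
            for start in range(0, horizon, batch_size)]


def max_switching_cost(horizon, batch_size, switch_cost, arms_per_round):
    """Worst-case switching cost when the action changes only between batches.

    Assumption: at each batch boundary all `arms_per_round` base arms may
    change, and each change costs `switch_cost`.
    """
    num_batches = math.ceil(horizon / batch_size)
    return switch_cost * arms_per_round * (num_batches - 1)
```

Larger batches lower the switching cost but give the learner fewer opportunities to update, which is the trade-off behind the $T^{\frac{2}{3}}$ rates quoted in the abstract.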

Real, fake and synthetic faces - does the coin have three sides?

With the ever-growing power of generative artificial intelligence, deepfake and artificially generated (synthetic) media have continued to spread online, which creates various ethical and moral concerns regarding their usage. To tackle this, we present a novel exploration of the trends and patterns observed in real, deepfake and synthetic facial images. The proposed analysis is done in two parts: first, we incorporate eight deep learning models and analyze their performance in distinguishing between the three classes of images. Next, we delve further into the similarities and differences between these three sets of images by investigating their image properties, both in the context of the entire image and in the context of specific regions within the image. An ANOVA test was also performed and provided further clarity on the patterns associated with the images of the three classes. From our findings, we observe that the investigated deep learning models found it easier to detect synthetic facial images, with the ViT Patch-16 model performing best on this task with a class-averaged sensitivity, specificity, precision, and accuracy of 97.37%, 98.69%, 97.48%, and 98.25%, respectively. This observation was supported by further analysis of various image properties. We saw noticeable differences across the three categories of images. This analysis can help us build better algorithms for facial image generation, and also shows that synthetic, deepfake and real face images are indeed three different classes.

Updated: 2024-04-02 12:08:26

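Title: Real, Fake and Synthetic Faces: Does the Coin Have Three Sides?

Abstract: As generative artificial intelligence grows more powerful, deepfake and artificially generated (synthetic) media continue to spread online, raising ethical and moral concerns about their use. To address this, we present a novel exploration of the trends and patterns observed in real, deepfake and synthetic facial images. The analysis has two parts. First, we incorporate eight deep learning models and analyze how well they distinguish between the three classes of images. Next, we examine the similarities and differences between the three sets of images by studying their image properties, both over the entire image and within specific regions. An ANOVA test was also performed and further clarified the patterns associated with the images of the three classes. We observe that the investigated deep learning models found synthetic facial images easier to detect, with the ViT Patch-16 model performing best on this task, reaching a class-averaged sensitivity, specificity, precision, and accuracy of 97.37%, 98.69%, 97.48%, and 98.25%, respectively. This observation was supported by further analysis of various image properties, where we saw noticeable differences across the three categories of images. This analysis can help build better algorithms for facial image generation, and it shows that synthetic, deepfake and real face images are indeed three different classes.

Updated: 2024-04-02 12:08:26

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.01878v1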
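
The abstract above reports class-averaged sensitivity, specificity, precision, and accuracy. As a minimal sketch of how such figures can be computed, the function below derives them from a multi-class confusion matrix by treating each class one-vs-rest and averaging. Whether the paper averages in exactly this way is an assumption, and the function name and matrix layout (rows are true classes, columns are predictions) are illustrative.

```python
def class_averaged_metrics(confusion):
    """Macro-average one-vs-rest metrics over the classes of a confusion matrix.

    confusion[t][p] counts samples of true class t predicted as class p.
    Assumes every class has at least one true sample and one prediction,
    so no denominator is zero.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    sums = {"sensitivity": 0.0, "specificity": 0.0,
            "precision": 0.0, "accuracy": 0.0}
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class c missed
        fp = sum(confusion[r][c] for r in range(n)) - tp  # others called c
        tn = total - tp - fn - fp
        sums["sensitivity"] += tp / (tp + fn)
        sums["specificity"] += tn / (tn + fp)
        sums["precision"] += tp / (tp + fp)
        sums["accuracy"] += (tp + tn) / total
    return {name: value / n for name, value in sums.items()}
```

For a three-class problem (real, deepfake, synthetic) the input would be a 3x3 matrix of counts.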

MapGuide: A Simple yet Effective Method to Reconstruct Continuous Language from Brain Activities

Decoding continuous language from brain activity is a formidable yet promising field of research. It is particularly significant for aiding people with speech disabilities to communicate through brain signals. This field addresses the complex task of mapping brain signals to text. The previous best attempt reverse-engineered this process in an indirect way: it began by learning to encode brain activity from text and then guided text generation by aligning with predicted brain responses. In contrast, we propose a simple yet effective method that guides text reconstruction by directly comparing them with the predicted text embeddings mapped from brain activities. Comprehensive experiments reveal that our method significantly outperforms the current state-of-the-art model, showing average improvements of 77% and 54% on BLEU and METEOR scores. We further validate the proposed modules through detailed ablation studies and case analyses and highlight a critical correlation: the more precisely we map brain activities to text embeddings, the better the text reconstruction results. Such insight can simplify the task of reconstructing language from brain activities for future work, emphasizing the importance of improving brain-to-text-embedding mapping techniques.

Updated: 2024-04-02 12:05:41

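Title: MapGuide: A Simple yet Effective Method to Reconstruct Continuous Language from Brain Activities

Abstract: Decoding continuous language from brain activity is a formidable yet promising field of research. It is particularly significant for helping people with speech disabilities communicate through brain signals. The field addresses the complex task of mapping brain signals to text. The previous best attempt reverse-engineered this process indirectly: it first learned to encode brain activity from text and then guided text generation by aligning with predicted brain responses. In contrast, we propose a simple yet effective method that guides text reconstruction by directly comparing candidate texts with the predicted text embeddings mapped from brain activity. Comprehensive experiments show that our method significantly outperforms the current state-of-the-art model, with average improvements of 77% and 54% on BLEU and METEOR scores. We further validate the proposed modules through detailed ablation studies and case analyses and highlight a critical correlation: the more precisely we map brain activity to text embeddings, the better the text reconstruction results. This insight can simplify the task of reconstructing language from brain activity in future work and underlines the importance of improving brain-to-text-embedding mapping techniques.

Updated: 2024-04-02 12:05:41

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2403.17516v2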

Procedural Fairness in Machine Learning

Fairness in machine learning (ML) has received much attention. However, existing studies have mainly focused on the distributive fairness of ML models. The other dimension of fairness, i.e., procedural fairness, has been neglected. In this paper, we first define the procedural fairness of ML models, and then give formal definitions of individual and group procedural fairness. We propose a novel metric to evaluate the group procedural fairness of ML models, called $GPF_{FAE}$, which utilizes a widely used explainable artificial intelligence technique, namely feature attribution explanation (FAE), to capture the decision process of the ML models. We validate the effectiveness of $GPF_{FAE}$ on a synthetic dataset and eight real-world datasets. Our experiments reveal the relationship between procedural and distributive fairness of the ML model. Based on our analysis, we propose a method for identifying the features that lead to the procedural unfairness of the model and propose two methods to improve procedural fairness after identifying unfair features. Our experimental results demonstrate that we can accurately identify the features that lead to procedural unfairness in the ML model, and both of our proposed methods can significantly improve procedural fairness with a slight impact on model performance, while also improving distributive fairness.

Updated: 2024-04-02 12:05:02

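Title: Procedural Fairness in Machine Learning

Abstract: Fairness in machine learning (ML) has received much attention. However, existing studies have mainly focused on the distributive fairness of ML models. The other dimension of fairness, procedural fairness, has been neglected. In this paper, we first define the procedural fairness of ML models and then give formal definitions of individual and group procedural fairness. We propose a novel metric for evaluating the group procedural fairness of ML models, called $GPF_{FAE}$, which uses a widely adopted explainable artificial intelligence technique, feature attribution explanation (FAE), to capture the decision process of the ML models. We validate the effectiveness of $GPF_{FAE}$ on a synthetic dataset and eight real-world datasets. Our experiments reveal the relationship between the procedural and distributive fairness of an ML model. Based on our analysis, we propose a method for identifying the features that lead to procedural unfairness, and two methods for improving procedural fairness once the unfair features have been identified. Our experimental results show that we can accurately identify the features that lead to procedural unfairness in the ML model, and that both proposed methods significantly improve procedural fairness with only a slight impact on model performance, while also improving distributive fairness.

Updated: 2024-04-02 12:05:02

Categories: cs.LG

Download: http://arxiv.org/abs/2404.01877v1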

Satellite Federated Edge Learning: Architecture Design and Convergence Analysis

The proliferation of low-earth-orbit (LEO) satellite networks leads to the generation of vast volumes of remote sensing data which is traditionally transferred to the ground server for centralized processing, raising privacy and bandwidth concerns. Federated edge learning (FEEL), as a distributed machine learning approach, has the potential to address these challenges by sharing only model parameters instead of raw data. Although promising, the dynamics of LEO networks, characterized by the high mobility of satellites and short ground-to-satellite link (GSL) duration, pose unique challenges for FEEL. Notably, frequent model transmission between the satellites and ground incurs prolonged waiting time and large transmission latency. This paper introduces a novel FEEL algorithm, named FEDMEGA, tailored to LEO mega-constellation networks. By integrating inter-satellite links (ISL) for intra-orbit model aggregation, the proposed algorithm significantly reduces the usage of low data rate and intermittent GSL. Our proposed method includes a ring all-reduce based intra-orbit aggregation mechanism, coupled with a network flow-based transmission scheme for global model aggregation, which enhances transmission efficiency. Theoretical convergence analysis is provided to characterize the algorithm performance. Extensive simulations show that our FEDMEGA algorithm outperforms existing satellite FEEL algorithms, exhibiting an approximate 30% improvement in convergence rate.

Updated: 2024-04-02 11:59:58

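Title: Satellite Federated Edge Learning: Architecture Design and Convergence Analysis

Abstract: The proliferation of low-earth-orbit (LEO) satellite networks generates vast volumes of remote sensing data, which is traditionally transferred to a ground server for centralized processing, raising privacy and bandwidth concerns. Federated edge learning (FEEL), a distributed machine learning approach, has the potential to address these challenges by sharing only model parameters instead of raw data. Although promising, the dynamics of LEO networks, characterized by the high mobility of satellites and the short duration of ground-to-satellite links (GSL), pose unique challenges for FEEL. Notably, frequent model transmission between satellites and the ground incurs prolonged waiting time and large transmission latency. This paper introduces a novel FEEL algorithm, named FEDMEGA, tailored to LEO mega-constellation networks. By integrating inter-satellite links (ISL) for intra-orbit model aggregation, the proposed algorithm significantly reduces the use of the low-data-rate, intermittent GSL. The method includes a ring all-reduce based intra-orbit aggregation mechanism, coupled with a network-flow-based transmission scheme for global model aggregation, which improves transmission efficiency. A theoretical convergence analysis characterizes the algorithm's performance. Extensive simulations show that FEDMEGA outperforms existing satellite FEEL algorithms, with an improvement of approximately 30% in convergence rate.

Updated: 2024-04-02 11:59:58

Categories: eess.SP,cs.DC,cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2404.01875v1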
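
The abstract above builds its intra-orbit aggregation on ring all-reduce. The sketch below simulates the generic textbook ring all-reduce (a reduce-scatter phase followed by an all-gather phase) on plain Python lists. It is not FEDMEGA's mechanism, and the simplification that each worker's vector has exactly one element per worker is an assumption made to keep the example short.

```python
def ring_all_reduce(vectors):
    """Simulate ring all-reduce (sum) among n workers arranged in a ring.

    vectors[i] is worker i's local vector; for simplicity each vector has
    exactly n elements, so chunk k is just element k.
    Returns the per-worker vectors after the exchange (all equal to the sum).
    """
    n = len(vectors)
    data = [list(v) for v in vectors]
    assert all(len(v) == n for v in data), "one chunk per worker is assumed"

    # Reduce-scatter: after n-1 steps worker i holds the full sum of
    # chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot all messages first so sends are simultaneous.
        messages = [(i, (i - step) % n, data[i][(i - step) % n])
                    for i in range(n)]
        for sender, chunk, value in messages:
            data[(sender + 1) % n][chunk] += value

    # All-gather: circulate the completed chunks around the ring.
    for step in range(n - 1):
        messages = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n])
                    for i in range(n)]
        for sender, chunk, value in messages:
            data[(sender + 1) % n][chunk] = value
    return data
```

Each worker only ever sends to its ring neighbour, which is what makes the pattern a natural fit for satellites linked along one orbit; dividing the result by the number of workers would give the averaged model.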

Fast and Adaptive Questionnaires for Voting Advice Applications

The effectiveness of Voting Advice Applications (VAA) is often compromised by the length of their questionnaires. To address user fatigue and incomplete responses, some applications (such as the Swiss Smartvote) offer a condensed version of their questionnaire. However, these condensed versions cannot ensure the accuracy of recommended parties or candidates, which we show to remain below 40%. To tackle these limitations, this work introduces an adaptive questionnaire approach that selects subsequent questions based on users' previous answers, aiming to enhance recommendation accuracy while reducing the number of questions posed to the voters. Our method uses an encoder and decoder module to predict missing values at any completion stage, leveraging a two-dimensional latent space reflective of political science's traditional methods for visualizing political orientations. Additionally, a selector module is proposed to determine the most informative subsequent question based on the voter's current position in the latent space and the remaining unanswered questions. We validated our approach using the Smartvote dataset from the Swiss Federal elections in 2019, testing various spatial models and selection methods to optimize the system's predictive accuracy. Our findings indicate that employing the IDEAL model both as encoder and decoder, combined with a PosteriorRMSE method for question selection, significantly improves the accuracy of recommendations, achieving 74% accuracy after asking the same number of questions as in the condensed version.

Updated: 2024-04-02 11:55:50

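Title: Fast and Adaptive Questionnaires for Voting Advice Applications

Abstract: The effectiveness of Voting Advice Applications (VAA) is often compromised by the length of their questionnaires. To address user fatigue and incomplete responses, some applications (such as the Swiss Smartvote) offer a condensed version of their questionnaire. However, these condensed versions cannot ensure the accuracy of the recommended parties or candidates, which we show remains below 40%. To address these limitations, this work introduces an adaptive questionnaire approach that selects subsequent questions based on the user's previous answers, aiming to improve recommendation accuracy while reducing the number of questions posed to voters. Our method uses an encoder and a decoder module to predict missing values at any completion stage, using a two-dimensional latent space that reflects the traditional way political science visualizes political orientations. In addition, a selector module determines the most informative next question from the voter's current position in the latent space and the questions not yet answered. We validated our approach on the Smartvote dataset from the 2019 Swiss federal elections, testing various spatial models and selection methods to optimize the system's predictive accuracy. Our findings indicate that using the IDEAL model as both encoder and decoder, combined with the PosteriorRMSE method for question selection, significantly improves recommendation accuracy, reaching 74% after asking the same number of questions as the condensed version.

Updated: 2024-04-02 11:55:50

Categories: cs.LG,cs.HC,cs.IT,math.IT

Download: http://arxiv.org/abs/2404.01872v1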

Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey

Large language models (LLMs) have recently shown impressive performance on tasks involving reasoning, leading to a lively debate on whether these models possess reasoning capabilities similar to humans. However, despite these successes, the depth of LLMs' reasoning abilities remains uncertain. This uncertainty partly stems from the predominant focus on task performance, measured through shallow accuracy metrics, rather than a thorough investigation of the models' reasoning behavior. This paper seeks to address this gap by providing a comprehensive review of studies that go beyond task accuracy, offering deeper insights into the models' reasoning processes. Furthermore, we survey prevalent methodologies to evaluate the reasoning behavior of LLMs, emphasizing current trends and efforts towards more nuanced reasoning analyses. Our review suggests that LLMs tend to rely on surface-level patterns and correlations in their training data, rather than on genuine reasoning abilities. Additionally, we identify the need for further research that delineates the key differences between human and LLM-based reasoning. Through this survey, we aim to shed light on the complex reasoning processes within LLMs.

Updated: 2024-04-02 11:46:31

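Title: Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey

Abstract: Large language models (LLMs) have recently shown impressive performance on tasks involving reasoning, leading to a lively debate on whether these models possess reasoning capabilities similar to humans. Despite these successes, the depth of LLMs' reasoning abilities remains uncertain. This uncertainty stems partly from the predominant focus on task performance, measured through shallow accuracy metrics, rather than a thorough investigation of the models' reasoning behavior. This paper seeks to close this gap by providing a comprehensive review of studies that go beyond task accuracy and offer deeper insight into the models' reasoning processes. We also survey prevalent methodologies for evaluating the reasoning behavior of LLMs, emphasizing current trends and efforts toward more nuanced reasoning analyses. Our review suggests that LLMs tend to rely on surface-level patterns and correlations in their training data rather than on genuine reasoning abilities. In addition, we identify the need for further research that delineates the key differences between human and LLM-based reasoning. Through this survey, we aim to shed light on the complex reasoning processes within LLMs.

Updated: 2024-04-02 11:46:31

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.01869v1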

Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation

Efficiently tackling multiple tasks within complex environments, such as those found in robot manipulation, remains an ongoing challenge in robotics and an opportunity for data-driven solutions, such as reinforcement learning (RL). Model-based RL, by building a dynamic model of the robot, enables data reuse and transfer learning between tasks with the same robot and similar environment. Furthermore, data gathering in robotics is expensive and we must rely on data-efficient approaches such as model-based RL, where policy learning is mostly conducted on cheaper simulations based on the learned model. Therefore, the quality of the model is fundamental for the performance of the posterior tasks. In this work, we focus on improving the quality of the model and maintaining the data efficiency by performing active learning of the dynamic model during a preliminary exploration phase based on maximizing information gathering. We employ Bayesian neural network models to represent, in a probabilistic way, both the belief and information encoded in the dynamic model during exploration. With our presented strategies we manage to actively estimate the novelty of each transition, using this as the exploration reward. In this work, we compare several Bayesian inference methods for neural networks, some of which have never been used in a robotics context, and evaluate them in a realistic robot manipulation setup. Our experiments show the advantages of our Bayesian model-based RL approach, which yields results of similar quality to relevant alternatives while requiring far fewer robot execution steps. Unlike related previous studies that focused the validation solely on toy problems, our research takes a step towards more realistic setups, tackling robotic arm end-tasks.

Updated: 2024-04-02 11:44:37

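Title: Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation

Abstract: Efficiently tackling multiple tasks within complex environments, such as those found in robot manipulation, remains an ongoing challenge in robotics and an opportunity for data-driven solutions such as reinforcement learning (RL). By building a dynamic model of the robot, model-based RL enables data reuse and transfer learning between tasks that share the same robot and a similar environment. Moreover, data gathering in robotics is expensive, so we must rely on data-efficient approaches such as model-based RL, where policy learning is mostly carried out in cheaper simulations based on the learned model. The quality of the model is therefore fundamental to the performance of the subsequent tasks. In this work, we focus on improving the quality of the model while maintaining data efficiency by actively learning the dynamic model during a preliminary exploration phase that maximizes information gathering. We use Bayesian neural network models to represent, probabilistically, the belief and information encoded in the dynamic model. With the presented strategies we actively estimate the novelty of each transition and use it as the exploration reward. We compare several Bayesian inference methods for neural networks, some of which have never been used in a robotics context, and evaluate them in a realistic robot manipulation setup. Our experiments show the advantages of our Bayesian model-based RL approach, which yields results of similar quality to relevant alternatives while requiring far fewer robot execution steps. Unlike related previous studies that validated solely on toy problems, our research takes a step toward more realistic setups, tackling robotic arm end-tasks.

Updated: 2024-04-02 11:44:37

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2404.01867v1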

Efficient tensor network simulation of IBM's largest quantum processors

We show how quantum-inspired 2d tensor networks can be used to efficiently and accurately simulate the largest quantum processors from IBM, namely Eagle (127 qubits), Osprey (433 qubits) and Condor (1121 qubits). We simulate the dynamics of a complex quantum many-body system, specifically the kicked Ising experiment considered recently by IBM in Nature 618, p. 500-505 (2023), using graph-based Projected Entangled Pair States (gPEPS), which was proposed by some of us in PRB 99, 195105 (2019). Our results show that simple tensor updates are already sufficient to achieve very high, unprecedented accuracy with remarkably low computational resources for this model. Apart from simulating the original experiment for 127 qubits, we also extend our results to 433 and 1121 qubits, and to evolution times around 8 times longer, thus setting a benchmark for the newest IBM quantum machines. We also report accurate simulations for infinitely many qubits. Our results show that gPEPS are a natural tool to efficiently simulate quantum computers with an underlying lattice-based qubit connectivity, such as all quantum processors based on superconducting qubits.

Updated: 2024-04-02 11:44:04

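Title: Efficient tensor network simulation of IBM's largest quantum processors

Abstract: We show how quantum-inspired 2d tensor networks can be used to efficiently and accurately simulate the largest quantum processors from IBM, namely Eagle (127 qubits), Osprey (433 qubits) and Condor (1121 qubits). We simulate the dynamics of a complex quantum many-body system, specifically the kicked Ising experiment considered recently by IBM in Nature 618, p. 500-505 (2023), using graph-based Projected Entangled Pair States (gPEPS), a method proposed by some of us in PRB 99, 195105 (2019). Our results show that simple tensor updates are already sufficient to reach very high accuracy for this model with remarkably low computational resources. Besides simulating the original 127-qubit experiment, we extend our results to 433 and 1121 qubits and to evolution times around 8 times longer, thus setting a benchmark for the newest IBM quantum machines. We also report accurate simulations for infinitely many qubits. Our results show that gPEPS are a natural tool for efficiently simulating quantum computers with an underlying lattice-based qubit connectivity, such as all quantum processors based on superconducting qubits.

Updated: 2024-04-02 11:44:04

Categories: quant-ph,cond-mat.str-el,cs.CE,cs.LG

Download: http://arxiv.org/abs/2309.15642v3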

Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models

Fine-tuning text-to-image models with reward functions trained on human feedback data has proven effective for aligning model behavior with human intent. However, excessive optimization with such reward models, which serve as mere proxy objectives, can compromise the performance of fine-tuned models, a phenomenon known as reward overoptimization. To investigate this issue in depth, we introduce the Text-Image Alignment Assessment (TIA2) benchmark, which comprises a diverse collection of text prompts, images, and human annotations. Our evaluation of several state-of-the-art reward models on this benchmark reveals their frequent misalignment with human assessment. We empirically demonstrate that overoptimization occurs notably when a poorly aligned reward model is used as the fine-tuning objective. To address this, we propose TextNorm, a simple method that enhances alignment based on a measure of reward model confidence estimated across a set of semantically contrastive text prompts. We demonstrate that incorporating the confidence-calibrated rewards in fine-tuning effectively reduces overoptimization, resulting in twice as many wins in human evaluation for text-image alignment compared against the baseline reward models.

Updated: 2024-04-02 11:40:38

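Title: Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models

Abstract: Fine-tuning text-to-image models with reward functions trained on human feedback data has proven effective for aligning model behavior with human intent. However, excessive optimization against such reward models, which serve only as proxy objectives, can compromise the performance of the fine-tuned models, a phenomenon known as reward overoptimization. To investigate this issue in depth, we introduce the Text-Image Alignment Assessment (TIA2) benchmark, which comprises a diverse collection of text prompts, images, and human annotations. Our evaluation of several state-of-the-art reward models on this benchmark reveals that they are frequently misaligned with human assessment. We show empirically that overoptimization occurs notably when a poorly aligned reward model is used as the fine-tuning objective. To address this, we propose TextNorm, a simple method that enhances alignment based on a measure of reward model confidence estimated across a set of semantically contrastive text prompts. We demonstrate that incorporating the confidence-calibrated rewards in fine-tuning effectively reduces overoptimization, yielding twice as many wins in human evaluation of text-image alignment as the baseline reward models.

Updated: 2024-04-02 11:40:38

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.01863v1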

Detecting Gender Bias in Course Evaluations

An excerpt from the findings of a master's thesis studying gender bias in course evaluations through the lens of machine learning and NLP. We use different methods to examine and explore the data and find differences in what students write about courses depending on the gender of the examiner. Data from English and Swedish courses are evaluated and compared in order to capture more nuance in the gender bias that might be found. Here we present the results of the work so far, but this is an ongoing project and there is more work to do.

Updated: 2024-04-02 11:35:05

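Title: Detecting Gender Bias in Course Evaluations

Abstract: An excerpt from the findings of a master's thesis studying gender bias in course evaluations through the lens of machine learning and natural language processing. We use different methods to examine and explore the data and find that what students write about courses differs depending on the gender of the examiner. Data from English and Swedish courses are evaluated and compared in order to capture more nuance in the gender bias that might be found. Here we present the results of the work so far, but this is an ongoing project and there is more work to do.

Updated: 2024-04-02 11:35:05

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2404.01857v1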

Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation

Next Point-of-interest (POI) recommendation provides valuable suggestions for users to explore their surrounding environment. Existing studies rely on building recommendation models from large-scale users' check-in data, which is task-specific and needs extensive computational resources. Recently, pretrained large language models (LLMs) have achieved significant advancements in various NLP tasks and have also been investigated for recommendation scenarios. However, the generalization abilities of LLMs remain unexplored for next POI recommendation, where users' geographical movement patterns should be extracted. Although there are studies that leverage LLMs for next-item recommendations, they fail to consider the geographical influence and sequential transitions. Hence, they cannot effectively solve the next POI recommendation task. To this end, we design novel prompting strategies and conduct empirical studies to assess the capability of LLMs, e.g., ChatGPT, for predicting a user's next check-in. Specifically, we consider several essential factors in human movement behaviors, including user geographical preference, spatial distance, and sequential transitions, and formulate the recommendation task as a ranking problem. Through extensive experiments on two widely used real-world datasets, we derive several key findings. Empirical evaluations demonstrate that LLMs have promising zero-shot recommendation abilities and can provide accurate and reasonable predictions. We also reveal that LLMs cannot accurately comprehend geographical context information and are sensitive to the order of presentation of candidate POIs, which shows the limitations of LLMs and necessitates further research on robust human mobility reasoning mechanisms.

Updated: 2024-04-02 11:33:04

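Title: Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation

Abstract: Next point-of-interest (POI) recommendation gives users valuable suggestions for exploring their surroundings. Existing studies rely on building recommendation models from large-scale user check-in data, which is task-specific and requires extensive computational resources. Recently, pretrained large language models (LLMs) have made significant advances in various natural language processing tasks and have also been investigated for recommendation scenarios. However, the generalization abilities of LLMs remain unexplored for next POI recommendation, where users' geographical movement patterns must be extracted. Although some studies use LLMs for next-item recommendation, they do not consider geographical influence and sequential transitions, so they cannot effectively solve the next POI recommendation task. To this end, we design novel prompting strategies and conduct empirical studies to assess the ability of LLMs, e.g., ChatGPT, to predict a user's next check-in. Specifically, we consider several essential factors in human movement behavior, including user geographical preference, spatial distance, and sequential transitions, and formulate the recommendation task as a ranking problem. Through extensive experiments on two widely used real-world datasets, we derive several key findings. Empirical evaluations show that LLMs have promising zero-shot recommendation abilities and can provide accurate and reasonable predictions. We also find that LLMs cannot accurately comprehend geographical context information and are sensitive to the order in which candidate POIs are presented, which shows the limitations of LLMs and calls for further research on robust human mobility reasoning mechanisms.

Updated: 2024-04-02 11:33:04

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2404.01855v1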

Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation

In this paper, we present Export3D, a one-shot 3D-aware portrait animation method that is able to control the facial expression and camera view of a given portrait image. To achieve this, we introduce a tri-plane generator that directly generates a tri-plane of 3D prior by transferring the expression parameter of 3DMM into the source image. The tri-plane is then decoded into images from different views through differentiable volume rendering. Existing portrait animation methods rely heavily on image warping to transfer the expression in the motion space, which makes disentangling appearance from expression challenging. In contrast, we propose a contrastive pre-training framework for an appearance-free expression parameter, eliminating undesirable appearance swap when transferring a cross-identity expression. Extensive experiments show that our pre-training framework can learn the appearance-free expression representation hidden in 3DMM, and that our model can generate 3D-aware, expression-controllable portrait images without appearance swap in the cross-identity setting.

Updated: 2024-04-02 11:31:50

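Title: Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation

Abstract: In this paper, we present Export3D, a one-shot 3D-aware portrait animation method that can control the facial expression and camera view of a given portrait image. To achieve this, we introduce a tri-plane generator that directly generates a tri-plane of 3D prior by transferring the expression parameter of 3DMM into the source image. The tri-plane is then decoded into images from different views through differentiable volume rendering. Existing portrait animation methods rely heavily on image warping to transfer the expression in the motion space, which makes disentangling appearance from expression challenging. In contrast, we propose a contrastive pre-training framework for an appearance-free expression parameter, eliminating undesirable appearance swap when transferring a cross-identity expression. Extensive experiments show that our pre-training framework can learn the appearance-free expression representation hidden in 3DMM, and that our model can generate 3D-aware, expression-controllable portrait images without appearance swap in the cross-identity setting.

Updated: 2024-04-02 11:31:50

Categories: cs.CV,cs.AI,cs.MM

Download: http://arxiv.org/abs/2404.00636v2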

Pairwise Similarity Distribution Clustering for Noisy Label Learning

Noisy label learning aims to train deep neural networks using a large amount of samples with noisy labels, whose main challenge comes from how to deal with the inaccurate supervision caused by wrong labels. Existing works either take the label correction or sample selection paradigm to involve more samples with accurate labels into the training process. In this paper, we propose a simple yet effective sample selection algorithm, termed Pairwise Similarity Distribution Clustering (PSDC), to divide the training samples into one clean set and another noisy set, which can power any of the off-the-shelf semi-supervised learning regimes to further train networks for different downstream tasks. Specifically, we take the pairwise similarity between sample pairs to represent the sample structure, and the Gaussian Mixture Model (GMM) to model the similarity distribution between sample pairs belonging to the same noisy cluster, so that each sample can be confidently assigned to the clean set or the noisy set. Even under severe label noise, the resulting data partition mechanism has been proved to be more robust in judging the label confidence in both theory and practice. Experimental results on various benchmark datasets, such as CIFAR-10, CIFAR-100 and Clothing1M, demonstrate significant improvements over state-of-the-art methods.

Updated: 2024-04-02 11:30:22

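Title: Pairwise Similarity Distribution Clustering for Noisy Label Learning

Abstract: Noisy label learning aims to train deep neural networks on a large number of samples with noisy labels; its main challenge is how to deal with the inaccurate supervision caused by wrong labels. Existing works adopt either the label correction or the sample selection paradigm to bring more accurately labeled samples into the training process. In this paper, we propose a simple yet effective sample selection algorithm, termed Pairwise Similarity Distribution Clustering (PSDC), that divides the training samples into a clean set and a noisy set, which can power any off-the-shelf semi-supervised learning regime to further train networks for different downstream tasks. Specifically, we use the pairwise similarity between sample pairs to represent the sample structure, and a Gaussian Mixture Model (GMM) to model the similarity distribution between sample pairs belonging to the same noisy cluster, so that each sample can be confidently assigned to the clean set or the noisy set. Even under severe label noise, the resulting data partition mechanism has been proved, in both theory and practice, to be more robust in judging label confidence. Experimental results on various benchmark datasets, such as CIFAR-10, CIFAR-100 and Clothing1M, demonstrate significant improvements over state-of-the-art methods.

Updated: 2024-04-02 11:30:22

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2404.01853v1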

Hexa: Self-Improving for Knowledge-Grounded Dialogue System

A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web-search, memory retrieval) with modular approaches. However, data for such steps are often inaccessible compared to those of dialogue responses as they are unobservable in an ordinary dialogue. To fill in the absence of these data, we develop a self-improving method to improve the generative performances of intermediate steps without the ground truth data. In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses. Through experiments on various benchmark datasets, we empirically demonstrate that our method successfully leverages a self-improving mechanism in generating intermediate and final responses and improves the performances on the task of knowledge-grounded dialogue generation.

Updated: 2024-04-02 11:28:40

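Title: Hexa: Self-Improving for Knowledge-Grounded Dialogue System

Abstract: A common practice in knowledge-grounded dialogue generation is to explicitly use intermediate steps (e.g., web search, memory retrieval) with modular approaches. However, data for such steps are often harder to obtain than data for dialogue responses, because the steps are unobservable in an ordinary dialogue. To compensate for the absence of these data, we develop a self-improving method that improves the generative performance of intermediate steps without ground truth data. In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses. Through experiments on various benchmark datasets, we show empirically that our method successfully leverages a self-improving mechanism in generating intermediate and final responses and improves performance on the task of knowledge-grounded dialogue generation.

Updated: 2024-04-02 11:28:40

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2310.06404v3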

EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking

As electric vehicle (EV) numbers rise, concerns about the capacity of current charging and power grid infrastructure grow, necessitating the development of smart charging solutions. While many smart charging simulators have been developed in recent years, only a few support the development of Reinforcement Learning (RL) algorithms in the form of a Gym environment, and those that do usually lack depth in modeling Vehicle-to-Grid (V2G) scenarios. To address the aforementioned issues, this paper introduces the EV2Gym, a realistic simulator platform for the development and assessment of small and large-scale smart charging algorithms within a standardized platform. The proposed simulator is populated with comprehensive EV, charging station, power transformer, and EV behavior models validated using real data. EV2Gym has a highly customizable interface empowering users to choose from pre-designed case studies or craft their own customized scenarios to suit their specific requirements. Moreover, it incorporates a diverse array of RL, mathematical programming, and heuristic algorithms to speed up the development and benchmarking of new solutions. By offering a unified and standardized platform, EV2Gym aims to provide researchers and practitioners with a robust environment for advancing and assessing smart charging algorithms.

Updated: 2024-04-02 11:22:53

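Title: EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking

Abstract: As electric vehicle (EV) numbers rise, concerns about the capacity of current charging and power grid infrastructure grow, necessitating the development of smart charging solutions. While many smart charging simulators have been developed in recent years, only a few support the development of Reinforcement Learning (RL) algorithms in the form of a Gym environment, and those that do usually lack depth in modeling Vehicle-to-Grid (V2G) scenarios. To address these issues, this paper introduces EV2Gym, a realistic simulator platform for developing and assessing small- and large-scale smart charging algorithms within a standardized platform. The simulator is populated with comprehensive EV, charging station, power transformer, and EV behavior models validated using real data. EV2Gym has a highly customizable interface that lets users choose from pre-designed case studies or craft their own scenarios to suit their specific requirements. It also incorporates a diverse array of RL, mathematical programming, and heuristic algorithms to speed up the development and benchmarking of new solutions. By offering a unified and standardized platform, EV2Gym aims to give researchers and practitioners a robust environment for advancing and assessing smart charging algorithms.

Updated: 2024-04-02 11:22:53

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2404.01849v1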

Constrained Optimal Fuel Consumption of HEV: A Constrained Reinforcement Learning Approach

Hybrid electric vehicles (HEVs) are becoming increasingly popular because they can better combine the working characteristics of internal combustion engines and electric motors. However, the minimum fuel consumption of an HEV for a battery electrical balance case under a specific assembly condition and a specific speed curve still needs to be clarified in academia and industry. Regarding this problem, this work provides the mathematical expression of constrained optimal fuel consumption (COFC) from the perspective of constrained reinforcement learning (CRL) for the first time globally. Also, two mainstream approaches of CRL, constrained variational policy optimization (CVPO) and Lagrangian-based approaches, are utilized for the first time to obtain the vehicle's minimum fuel consumption under the battery electrical balance condition. We conduct case studies on the well-known Prius TOYOTA hybrid system (THS) under the NEDC condition; we give vital steps to implement CRL approaches and compare the performance of the CVPO and Lagrangian-based approaches. Our case study found that both the CVPO and Lagrangian-based approaches can obtain the lowest fuel consumption while maintaining the SOC balance constraint. The CVPO approach converges stably, whereas the Lagrangian-based approach obtains the lowest fuel consumption, 3.95 L/100km, though with more significant oscillations. This result verifies the effectiveness of our proposed CRL approaches to the COFC problem.

Updated: 2024-04-02 11:20:22

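Title: Constrained Optimal Fuel Consumption of HEV: A Constrained Reinforcement Learning Approach

Abstract: Hybrid electric vehicles (HEVs) are becoming increasingly popular because they better combine the working characteristics of internal combustion engines and electric motors. However, the minimum fuel consumption of an HEV in the battery electrical balance case, under a specific assembly condition and a specific speed curve, still needs to be clarified in academia and industry. For this problem, this work provides, for the first time, the mathematical expression of constrained optimal fuel consumption (COFC) from the perspective of constrained reinforcement learning (CRL). Two mainstream CRL approaches, constrained variational policy optimization (CVPO) and Lagrangian-based approaches, are also used for the first time to obtain the vehicle's minimum fuel consumption under the battery electrical balance condition. We conduct case studies on the well-known Prius TOYOTA hybrid system (THS) under the NEDC condition, give the key steps for implementing the CRL approaches, and compare the performance of the CVPO and Lagrangian-based approaches. Our case study found that both approaches can obtain the lowest fuel consumption while maintaining the SOC balance constraint. The CVPO approach converges stably, whereas the Lagrangian-based approach obtains the lowest fuel consumption, 3.95 L/100km, though with more significant oscillations. This result verifies the effectiveness of the proposed CRL approaches to the COFC problem.

Updated: 2024-04-02 11:20:22

Categories: cs.LG

Download: http://arxiv.org/abs/2403.07503v2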

Corrupting Convolution-based Unlearnable Datasets with Pixel-based Image Transformations

Unlearnable datasets (UDs) lead to a drastic drop in the generalization performance of models trained on them by introducing elaborate and imperceptible perturbations into clean training sets. Many existing defenses, e.g., JPEG compression and adversarial training, effectively counter UDs based on norm-constrained additive noise. However, a brand-new type of convolution-based UD has been proposed that renders all existing defenses ineffective, presenting a greater challenge to defenders. To address this, we express the convolution-based unlearnable sample as the result of multiplying a matrix by a clean sample in a simplified scenario, and formalize the intra-class matrix inconsistency as $\Theta_{imi}$ and the inter-class matrix consistency as $\Theta_{imc}$ to investigate the working mechanism of the convolution-based UDs. We conjecture that increasing both of these metrics will mitigate the unlearnability effect. Through validation experiments that support our hypothesis, we further design a random matrix to boost both $\Theta_{imi}$ and $\Theta_{imc}$, achieving a notable degree of defense effect. Hence, by building upon and extending these facts, we first propose a brand-new image COrruption that employs randomly multiplicative transformation via INterpolation operation to successfully defend against convolution-based UDs. Our approach leverages global pixel random interpolations, effectively suppressing the impact of multiplicative noise in convolution-based UDs. Additionally, we have designed two new forms of convolution-based UDs, and find that our defense is the most effective against them.

Updated: 2024-04-02 11:17:49

标题: 使用基于像素的图像转换对基于卷积的不可学习数据集进行破坏

摘要: 不可学习的数据集会通过向清洁训练集引入复杂且难以察觉的扰动，导致在其上训练的模型的泛化性能急剧下降。许多现有的防御方法，如JPEG压缩和对抗训练，可以有效地对抗基于约束范数的加性噪声的不可学习数据集。然而，最近提出了一种全新类型的基于卷积的不可学习数据集，使得现有的防御方法都失效，给防御者带来更大的挑战。为了解决这个问题，我们将卷积-based不可学习样本表达为在简化场景中通过矩阵乘以一个清洁样本所得，并将类内矩阵不一致性形式化为$ \Theta_{imi} $，类间矩阵一致性形式化为$ \Theta_{imc} $以研究卷积-based不可学习数据集的工作机制。我们推测增加这两个指标将减轻不可学习效果。通过验证实验，我们的假设得到了令人赞赏的支持，我们进一步设计了一个随机矩阵来提高$ \Theta_{imi} $和$ \Theta_{imc} $，达到显著的防御效果。因此，通过借鉴和扩展这些事实，我们首次提出了一种全新的图像破坏方法，通过随机乘法变换和插值操作成功抵御基于卷积的不可学习数据集。我们的方法利用全局像素随机插值，有效地抑制了卷积-based不可学习数据集中的乘法噪声的影响。此外，我们还设计了两种新形式的基于卷积的不可学习数据集，并发现我们的防御方法对它们最为有效。

更新时间: 2024-04-02 11:17:49

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2311.18403v2

Accelerating Transformer Pre-Training with 2:4 Sparsity

Training large Transformers is slow, but recent innovations in GPU architecture give us an advantage. NVIDIA Ampere GPUs can execute a fine-grained 2:4 sparse matrix multiplication twice as fast as its dense equivalent. In light of this property, we comprehensively investigate the feasibility of accelerating the feed-forward networks (FFNs) of Transformers in pre-training. First, we define a "flip rate" to monitor the stability of a 2:4 training process. Using this metric, we suggest two techniques to preserve accuracy: modifying the sparse-refined straight-through estimator by applying the mask decay term on gradients, and enhancing the model's quality with a simple yet effective dense fine-tuning procedure near the end of pre-training. In addition, we devise two techniques to accelerate training in practice: calculating the transposable 2:4 mask by convolution, and accelerating gated activation functions by reducing GPU L2 cache misses. Experiments show that a combination of our methods reaches the best performance on multiple Transformers among different 2:4 training methods, while actual acceleration can be observed on different shapes of Transformer block.
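As a rough illustration of the 2:4 pattern mentioned in this abstract, the sketch below builds a mask that keeps the two largest-magnitude weights in every group of four and measures how many mask entries change between two steps. The function names `mask_2_4` and `flip_rate` are hypothetical, and the flip-rate formula (fraction of changed mask entries) is an assumed reading of the term, not necessarily the paper's exact definition.

```python
def mask_2_4(weights):
    """Return a 0/1 mask keeping the 2 largest-magnitude entries in each group of 4."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    mask = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes within this group of four.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        mask.extend(1 if i in keep else 0 for i in range(4))
    return mask


def flip_rate(prev_mask, new_mask):
    """Fraction of mask entries that changed between two steps (assumed definition)."""
    flips = sum(p != n for p, n in zip(prev_mask, new_mask))
    return flips / len(prev_mask)


# Toy weights before and after one update; only the second entry changes.
w_before = [0.9, -0.1, 0.4, 0.05, -0.3, 0.8, 0.2, -0.7]
w_after = [0.9, -0.5, 0.4, 0.05, -0.3, 0.8, 0.2, -0.7]
m0 = mask_2_4(w_before)
m1 = mask_2_4(w_after)
print(m0, m1, flip_rate(m0, m1))
```

A persistently high value of such a statistic would suggest the mask keeps changing, which is the kind of instability the abstract says the flip rate is meant to monitor.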

Updated: 2024-04-02 11:12:42

标题: 使用2:4稀疏度加速Transformer的预训练

摘要: 训练大型Transformer模型是缓慢的，但最近的GPU架构创新为我们带来了优势。 NVIDIA的Ampere GPU可以比其稠密等效物快两倍执行细粒度2:4稀疏矩阵乘法。鉴于这一特性，我们全面调查了加速预训练中Transformer的前馈网络（FFN）的可行性。首先，我们定义了一个“翻转率”来监测2:4训练过程的稳定性。利用这个指标，我们提出了两种保持准确性的技术：通过在梯度上应用蒙版衰减项来修改稀疏-精细化的直通估计器，以及通过在预训练结束附近进行简单而有效的密集微调程序来增强模型的质量。此外，我们设计了两种有效的技术来实际加速训练：通过卷积计算可转置的2:4蒙版，并通过减少GPU L2缓存缺失来加速门控激活函数。实验证明，我们方法的组合在多个Transformer中达到了最佳性能，而实际加速可以在不同形状的Transformer块上观察到。

更新时间: 2024-04-02 11:12:42

领域: cs.LG

下载: http://arxiv.org/abs/2404.01847v1

BRAIxDet: Learning to Detect Malignant Breast Lesion with Incomplete Annotations

Methods to detect malignant lesions from screening mammograms are usually trained with fully annotated datasets, where images are labelled with the localisation and classification of cancerous lesions. However, real-world screening mammogram datasets commonly have a subset that is fully annotated and another subset that is weakly annotated with just the global classification (i.e., without lesion localisation). Given the large size of such datasets, researchers usually face a dilemma with the weakly annotated subset: to not use it or to fully annotate it. The first option will reduce detection accuracy because it does not use the whole dataset, and the second option is too expensive given that the annotation needs to be done by expert radiologists. In this paper, we propose a middle-ground solution for the dilemma, which is to formulate the training as a weakly- and semi-supervised learning problem that we refer to as malignant breast lesion detection with incomplete annotations. To address this problem, our new method comprises two stages, namely: 1) pre-training a multi-view mammogram classifier with weak supervision from the whole dataset, and 2) extending the trained classifier to become a multi-view detector that is trained with semi-supervised student-teacher learning, where the training set contains fully and weakly-annotated mammograms. We provide extensive detection results on two real-world screening mammogram datasets containing incomplete annotations, and show that our proposed approach achieves state-of-the-art results in the detection of malignant breast lesions with incomplete annotations.

Updated: 2024-04-02 11:03:02

标题: BRAIxDet：学习如何检测具有不完整注释的恶性乳腺病变

摘要: 通常用于检测筛查乳腺X线摄影中的恶性病变的方法通常是使用完全标记的数据集进行训练，图像被标记为癌症病变的定位和分类。然而，现实世界中的筛查乳腺X线摄影数据集通常有一个完全标记的子集和另一个仅具有全局分类（即没有病变定位）的弱标记子集。鉴于这些数据集的规模很大，研究人员通常面临一个困境，即如何处理弱标记子集：不使用它还是完全标记它。第一种选择会降低检测准确性，因为它没有使用整个数据集，而第二种选择由于注释需要由专业放射科医生完成而过于昂贵。在本文中，我们提出了一个折中的解决方案，即将训练定位为使用具有不完整注释的弱监督和半监督学习问题，我们称之为具有不完整注释的恶性乳腺病变检测。为了解决这个问题，我们的新方法包括两个阶段，即：1）使用来自整个数据集的弱监督对多视角乳腺X线摄影分类器进行预训练，2）将经过训练的分类器扩展为通过半监督师生学习进行训练的多视角检测器，其中训练集包含完全和弱标记的乳腺X线摄影。我们在包含不完整注释的两个现实世界筛查乳腺X线摄影数据集上提供了广泛的检测结果，并展示了我们提出的方法在具有不完整注释的恶性乳腺病变检测中实现了最先进的结果。

更新时间: 2024-04-02 11:03:02

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2301.13418v4

LLM-Rec: Personalized Recommendation via Prompting Large Language Models

Text-based recommendation holds a wide range of practical applications due to its versatility, as textual descriptions can represent nearly any type of item. However, directly employing the original item descriptions may not yield optimal recommendation performance due to the lack of comprehensive information to align with user preferences. Recent advances in large language models (LLMs) have showcased their remarkable ability to harness commonsense knowledge and reasoning. In this study, we introduce a novel approach, coined LLM-Rec, which incorporates four distinct prompting strategies of text enrichment for improving personalized text-based recommendations. Our empirical experiments reveal that using LLM-augmented text significantly enhances recommendation quality. Even basic MLP (Multi-Layer Perceptron) models achieve comparable or even better results than complex content-based methods. Notably, the success of LLM-Rec lies in its prompting strategies, which effectively tap into the language model's comprehension of both general and specific item characteristics. This highlights the importance of employing diverse prompts and input augmentation techniques to boost the recommendation effectiveness of LLMs.

Updated: 2024-04-02 10:59:51

标题: LLM-Rec：通过大型语言模型的个性化推荐

摘要: 基于文本的推荐具有广泛的实际应用，因为文本描述可以代表几乎任何类型的物品。然而，直接使用原始物品描述可能不会产生最佳的推荐性能，因为缺乏与用户偏好对齐的综合信息。最近大型语言模型（LLMs）的进展展示了它们利用常识知识和推理的显著能力。在本研究中，我们介绍了一种新颖的方法，称为LLM-Rec，它结合了四种不同的文本丰富提示策略，以改善个性化的基于文本的推荐。我们的实证实验表明，使用LLM增强文本显着提高了推荐质量。甚至基本的MLP（多层感知器）模型也能够获得与复杂基于内容的方法相媲美甚至更好的结果。值得注意的是，LLM-Rec的成功在于其提示策略，有效地利用了语言模型对一般和具体物品特征的理解。这突显了利用多样的提示和输入增强技术来提高LLMs推荐效果的重要性。

更新时间: 2024-04-02 10:59:51

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2307.15780v3

Knowledge Boundary and Persona Dynamic Shape A Better Social Media Agent

Constructing personalized and anthropomorphic agents holds significant importance in the simulation of social networks. However, there are still two key problems in existing works: the agent possesses world knowledge that does not belong to its personas, and it cannot eliminate the interference of diverse persona information on current actions, which reduces the personalization and anthropomorphism of the agent. To solve the above problems, we construct the social media agent based on personalized knowledge and dynamic persona information. For personalized knowledge, we add external knowledge sources and match them with the persona information of agents, thereby giving the agent personalized world knowledge. For dynamic persona information, we use current action information to internally retrieve the persona information of the agent, thereby reducing the interference of diverse persona information on the current action. To make the agent suitable for social media, we design five basic modules for it: persona, planning, action, memory and reflection. To provide an interaction and verification environment for the agent, we build a social media simulation sandbox. In the experimental verification, automatic and human evaluations demonstrated the effectiveness of the agent we constructed.

Updated: 2024-04-02 10:59:23

标题: 知识边界和角色动态塑造更好的社交媒体代理

摘要: 在社交网络模拟中构建个性化和拟人化代理具有重要意义。然而，现有作品中仍存在两个关键问题：代理拥有不属于其人物角色的世界知识，且无法消除多样化人物信息对当前行动的干扰，从而降低了代理的个性化和拟人化水平。为解决上述问题，我们基于个性化知识和动态人物角色信息构建社交媒体代理。对于个性化知识，我们添加外部知识源，并将其与代理的人物角色信息匹配，从而赋予代理个性化的世界知识。对于动态人物角色信息，我们使用当前行动信息内部检索代理的人物角色信息，从而降低多样化人物信息对当前行动的干扰。为使代理适用于社交媒体，我们为其设计了五个基本模块：人物角色、规划、行动、记忆和反思。为代理提供交互和验证环境，我们构建了一个社交媒体模拟沙盒。在实验验证中，自动和人工评估证明了我们构建的代理的有效性。

更新时间: 2024-04-02 10:59:23

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.19275v2

LLM meets Vision-Language Models for Zero-Shot One-Class Classification

We consider the problem of zero-shot one-class visual classification. In this setting, only the label of the target class is available, and the goal is to discriminate between positive and negative query samples without requiring any validation example from the target task. We propose a two-step solution that first queries large language models for visually confusing objects and then relies on vision-language pre-trained models (e.g., CLIP) to perform classification. By adapting large-scale vision benchmarks, we demonstrate the ability of the proposed method to outperform adapted off-the-shelf alternatives in this setting. Namely, we propose a realistic benchmark where negative query samples are drawn from the same original dataset as positive ones, including a granularity-controlled version of iNaturalist, where negative samples are at a fixed distance in the taxonomy tree from the positive ones. Our work shows that it is possible to discriminate between a single category and other semantically related ones using only its label.

Updated: 2024-04-02 10:59:05

标题: LLM遇上视觉-语言模型用于零样本一类分类

摘要: 我们考虑了零样本一类视觉分类问题。在这种情况下，只有目标类别的标签是可用的，目标是在不需要来自目标任务的任何验证示例的情况下区分正负查询样本。我们提出了一个两步解决方案，首先查询大型语言模型以获取视觉混淆的对象，然后依赖于视觉-语言预训练模型（例如，CLIP）进行分类。通过调整大规模视觉基准，我们展示了所提出的方法在这种情况下优于适应的现成替代方案的能力。换句话说，我们提出了一个现实的基准，其中负查询样本来自与正样本相同的原始数据集，包括iNaturalist的一个粒度受控版本，其中负样本在分类树中与正样本相距固定距离。我们的工作表明，仅使用其标签就可以区分单个类别和其他语义相关类别。

更新时间: 2024-04-02 10:59:05

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.00675v2

Face It Yourselves: An LLM-Based Two-Stage Strategy to Localize Configuration Errors via Logs

Configurable software systems are prone to configuration errors, resulting in significant losses to companies. However, diagnosing these errors is challenging due to the vast and complex configuration space. These errors pose significant challenges for both experienced maintainers and new end-users, particularly those without access to the source code of the software systems. Given that logs are easily accessible to most end-users, we conduct a preliminary study to outline the challenges and opportunities of utilizing logs in localizing configuration errors. Based on the insights gained from the preliminary study, we propose an LLM-based two-stage strategy for end-users to localize the root-cause configuration properties based on logs. We further implement a tool, LogConfigLocalizer, aligned with the design of the aforementioned strategy, hoping to assist end-users in coping with configuration errors through log analysis. To the best of our knowledge, this is the first work to localize the root-cause configuration properties for end-users based on Large Language Models (LLMs) and logs. We evaluate the proposed strategy on Hadoop with LogConfigLocalizer and demonstrate its efficiency with an average accuracy as high as 99.91%. Additionally, we demonstrate the effectiveness and necessity of different phases of the methodology by comparing it with two other variants and a baseline tool. Moreover, we validate the proposed methodology through a practical case study to demonstrate its effectiveness and feasibility.

Updated: 2024-04-02 10:53:41

标题: 面对自己：一种基于LLM的两阶段策略，通过日志定位配置错误

摘要: 可配置软件系统容易出现配置错误，给公司带来重大损失。然而，由于庞大而复杂的配置空间，诊断这些错误是具有挑战性的。这些错误对有经验的维护人员和新用户都构成了重大挑战，特别是那些无法访问软件系统源代码的用户。鉴于大多数用户很容易访问日志，我们进行了一项初步研究，概述了利用日志定位配置错误的挑战和机遇。根据初步研究获得的见解，我们提出了一种基于LLM的两阶段策略，供终端用户根据日志定位根本原因配置属性。我们进一步实施了一个工具LogConfigLocalizer，与前述策略的设计相一致，希望通过日志分析帮助终端用户解决配置错误。据我们所知，这是首个基于大型语言模型（LLMs）和日志为终端用户定位根本原因配置属性的工作。我们通过LogConfigLocalizer在Hadoop上评估了提出的策略，并证明其平均准确度高达99.91%。此外，我们还通过与其他两种变体和基准工具进行比较，展示了方法论不同阶段的有效性和必要性。此外，我们通过一个实际案例研究验证了提出的方法论的有效性和可行性。

更新时间: 2024-04-02 10:53:41

领域: cs.SE,cs.LG

下载: http://arxiv.org/abs/2404.00640v2

Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack

Large Language Models (LLMs) have risen significantly in popularity and are increasingly being adopted across multiple applications. These LLMs are heavily aligned to resist engaging in illegal or unethical topics as a means to avoid contributing to responsible AI harms. However, a recent line of attacks, known as "jailbreaks", seek to overcome this alignment. Intuitively, jailbreak attacks aim to narrow the gap between what the model can do and what it is willing to do. In this paper, we introduce a novel jailbreak attack called Crescendo. Unlike existing jailbreak methods, Crescendo is a multi-turn jailbreak that interacts with the model in a seemingly benign manner. It begins with a general prompt or question about the task at hand and then gradually escalates the dialogue by referencing the model's replies, progressively leading to a successful jailbreak. We evaluate Crescendo on various public systems, including ChatGPT, Gemini Pro, Gemini-Ultra, LlaMA-2 70b Chat, and Anthropic Chat. Our results demonstrate the strong efficacy of Crescendo, with it achieving high attack success rates across all evaluated models and tasks. Furthermore, we introduce Crescendomation, a tool that automates the Crescendo attack, and our evaluation showcases its effectiveness against state-of-the-art models.

Updated: 2024-04-02 10:45:49

标题: 太好了，现在写一篇关于这个的文章：Crescendo多轮LLM越狱攻击

摘要: 大型语言模型(LLMs)在近年来受到了极大的关注，并且越来越多地被应用于多个领域。这些LLMs在很大程度上避免参与非法或不道德的话题，以避免对负责任的AI造成伤害。然而，最近出现了一种被称为"越狱"的攻击方式，旨在打破这种对齐。直觉上，越狱攻击旨在缩小模型可以做到的事情和愿意做的事情之间的差距。在本文中，我们介绍了一种名为Crescendo的新型越狱攻击。与现有的越狱方法不同，Crescendo是一种多轮越狱，以看似良性的方式与模型互动。它从关于手头任务的一般提示或问题开始，然后逐渐通过引用模型的回复来逐步升级对话，最终实现成功的越狱。我们在包括ChatGPT、Gemini Pro、Gemini-Ultra、LlaMA-2 70b Chat和Anthropic Chat在内的各种公共系统上评估了Crescendo。我们的结果表明，Crescendo具有强大的攻击效果，在所有评估的模型和任务中均取得了高攻击成功率。此外，我们引入了Crescendomation，这是一个自动化Crescendo攻击的工具，我们的评估展示了它对最先进模型的有效性。

更新时间: 2024-04-02 10:45:49

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2404.01833v1

When does Subagging Work?

We study the effectiveness of subagging, or subsample aggregating, on regression trees, a popular non-parametric method in machine learning. First, we give sufficient conditions for pointwise consistency of trees. We formalize that (i) the bias depends on the diameter of cells, hence trees with few splits tend to be biased, and (ii) the variance depends on the number of observations in cells, hence trees with many splits tend to have large variance. While these statements for bias and variance are known to hold globally in the covariate space, we show that, under some constraints, they are also true locally. Second, we compare the performance of subagging to that of trees across different numbers of splits. We find that (1) for any given number of splits, subagging improves upon a single tree, and (2) this improvement is larger for many splits than it is for few splits. However, (3) a single tree grown at optimal size can outperform subagging if the size of its individual trees is not optimally chosen. This last result goes against the common practice of growing large randomized trees to eliminate bias and then averaging to reduce variance.
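To make the term concrete, here is a minimal sketch of subagging under simplifying assumptions: the base learner is a one-split regression tree (a stump) on a single covariate, each tree is fit on a subsample drawn without replacement, and predictions are averaged. The helper names `fit_stump` and `subagging` and the default parameters are illustrative choices, not taken from the paper.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a 1-D covariate by least squares."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # all x values identical: fall back to the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr


def subagging(xs, ys, n_trees=50, subsample=0.5, seed=0):
    """Average stumps fit on subsamples drawn WITHOUT replacement."""
    rng = random.Random(seed)
    n = len(xs)
    k = max(2, int(subsample * n))
    trees = []
    for _ in range(n_trees):
        idx = rng.sample(range(n), k)  # sampling without replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(t(x) for t in trees) / len(trees)


# Toy step-function data: y jumps from 0 to 1 at x = 0.5.
xs = [i / 20 for i in range(20)]
ys = [0.0 if x < 0.5 else 1.0 for x in xs]
predict = subagging(xs, ys)
print(predict(0.1), predict(0.9))
```

The number of splits per tree is fixed at one here; the abstract's comparison varies exactly that quantity, so a fuller experiment would replace the stump with trees of different depths.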

Updated: 2024-04-02 10:44:55

标题: 何时适合使用Subagging？

摘要: 我们研究了子抽样聚合（subagging）对回归树的有效性，回归树是机器学习中一种流行的非参数方法。首先，我们给出了树点一致性的充分条件。我们形式化了偏差取决于单元格直径的条件，因此具有较少分裂的树往往有偏差，以及方差取决于单元格中观测数的条件，因此具有较多分裂的树往往具有较大的方差。虽然这些关于偏差和方差的说法在协变量空间中被认为是全局有效的，但我们表明，在一定约束条件下，它们在局部也是成立的。其次，我们比较了子抽样方法和不同分裂数下树的性能。我们发现，（1）对于任何给定的分裂数，子抽样方法优于单棵树，（2）对于较多分裂，这种提升更大。然而，（3）如果单棵树在最佳尺寸下生长，可以胜过子抽样方法，前提是其各自树的尺寸没有最优选择。这最后的结果与常规做法相悖，即生长大型随机树以消除偏差，然后平均以减少方差。

更新时间: 2024-04-02 10:44:55

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2404.01832v1

Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy

We introduce a novel doubly-robust (DR) off-policy evaluation (OPE) estimator for Markov decision processes, DRUnknown, designed for situations where both the logging policy and the value function are unknown. The proposed estimator initially estimates the logging policy and then estimates the value function model by minimizing the asymptotic variance of the estimator while considering the estimating effect of the logging policy. When the logging policy model is correctly specified, DRUnknown achieves the smallest asymptotic variance within the class containing existing OPE estimators. When the value function model is also correctly specified, DRUnknown is optimal as its asymptotic variance reaches the semiparametric lower bound. We present experimental results conducted in contextual bandits and reinforcement learning to compare the performance of DRUnknown with that of existing methods.
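For orientation, the sketch below shows the standard doubly-robust estimator in the contextual-bandit case with a logging policy estimated from the logged data. It is not the DRUnknown estimator itself: the variance-minimizing fit of the value model described in the abstract is omitted, and the names `estimate_logging_policy`, `dr_estimate`, `target_pi`, and `q_hat`, as well as the Laplace smoothing, are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def estimate_logging_policy(logs, actions):
    """Estimate mu(a | x) from empirical action counts per context (Laplace-smoothed)."""
    counts = defaultdict(Counter)
    for x, a, _ in logs:
        counts[x][a] += 1

    def mu_hat(a, x):
        total = sum(counts[x].values())
        return (counts[x][a] + 1) / (total + len(actions))

    return mu_hat


def dr_estimate(logs, actions, target_pi, q_hat, mu_hat):
    """Standard DR value estimate: direct-method term plus importance-weighted residual."""
    total = 0.0
    for x, a, r in logs:
        direct = sum(target_pi(b, x) * q_hat(x, b) for b in actions)
        weight = target_pi(a, x) / mu_hat(a, x)
        total += direct + weight * (r - q_hat(x, a))
    return total / len(logs)


# Toy logs of (context, action, reward); reward is 1 exactly when action 1 is taken.
actions = [0, 1]
logs = [("u", 0, 0.0), ("u", 1, 1.0), ("v", 1, 1.0), ("v", 0, 0.0), ("u", 1, 1.0)]
target_pi = lambda a, x: 1.0 if a == 1 else 0.0  # target policy: always play action 1
q_hat = lambda x, a: float(a)                    # reward model (exact on this toy data)
mu_hat = estimate_logging_policy(logs, actions)
value = dr_estimate(logs, actions, target_pi, q_hat, mu_hat)
print(value)
```

Because the reward model is exact in this toy setup, the residual term vanishes and the estimate equals the true value of the target policy regardless of how well the logging policy was estimated.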

Updated: 2024-04-02 10:42:44

标题: 用估计的记录策略进行双重稳健的离线策略评估

摘要: 我们引入了一种新颖的双重稳健（DR）离线政策评估（OPE）估计器，称为DRUnknown，专为logging策略和价值函数均未知的情况设计。所提出的估计器首先估计logging策略，然后通过最小化估计器的渐近方差来估计价值函数模型，同时考虑logging策略的估计效果。当logging策略模型正确指定时，DRUnknown在包含现有OPE估计器的类中实现了最小的渐近方差。当价值函数模型也正确指定时，DRUnknown是最优的，因为其渐近方差达到了半参数下界。我们在上下文匹配和强化学习中进行了实验结果，以比较DRUnknown与现有方法的性能。

更新时间: 2024-04-02 10:42:44

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2404.01830v1

Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay

Deep neural networks have demonstrated susceptibility to adversarial attacks. Adversarial defense techniques often focus on a one-shot setting to maintain robustness against attack. However, new attacks can emerge in sequence in real-world deployment scenarios. As a result, it is crucial for a defense model to constantly adapt to new attacks, but the adaptation process can lead to catastrophic forgetting of previously defended attacks. In this paper, we discuss for the first time the concept of continual adversarial defense under a sequence of attacks, and propose a lifelong defense baseline called Anisotropic & Isotropic Replay (AIR), which offers three advantages: (1) Isotropic replay ensures model consistency in the neighborhood distribution of new data, indirectly aligning the output preference between old and new tasks. (2) Anisotropic replay enables the model to learn a compromise data manifold with fresh mixed semantics for further replay constraints and potential future attacks. (3) A straightforward regularizer mitigates the 'plasticity-stability' trade-off by aligning model output between new and old tasks. Experimental results demonstrate that AIR can approximate or even exceed the empirical performance upper bounds achieved by Joint Training.

Updated: 2024-04-02 10:41:51

标题: 无遗忘的防御：具有各向异性和各向同性伪重播的持续对抗性防御

摘要: 深度神经网络已经显示出对对抗攻击的易受性。对抗性防御技术通常专注于一次性设置，以保持对抗攻击的稳健性。然而，在实际部署场景中，新的攻击可能会以序列形式出现。因此，对于一个防御模型来说，不断适应新的攻击是至关重要的，但适应过程可能会导致对以前抵御的攻击产生灾难性的遗忘。在本文中，我们首次讨论了连续对抗性防御的概念，在一系列攻击下提出了一种名为Anisotropic & Isotropic Replay（AIR）的终身防御基线，它具有三个优点：（1）各向同性重播确保了模型在新数据的邻域分布中的一致性，间接地调整了旧任务和新任务之间的输出偏好。（2）各向异性重播使模型能够学习一个折衷的数据流形，具有新的混合语义，以进一步重播约束和潜在的未来攻击。（3）一个简单的正则化器通过调整新旧任务之间的模型输出来减轻“可塑性-稳定性”之间的权衡。实验结果表明，AIR可以逼近甚至超过联合训练所实现的经验性能上限。

更新时间: 2024-04-02 10:41:51

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.01828v1

Short vs. Long-term Coordination of Drones: When Distributed Optimization Meets Deep Reinforcement Learning

Swarms of autonomous interactive drones, with the support of recharging technology, can provide compelling sensing capabilities in Smart Cities, such as traffic monitoring and disaster response. This paper aims to deliver a novel coordination solution for the cost-effective navigation, sensing, and recharging of drones. Existing approaches, such as deep reinforcement learning (DRL), offer long-term adaptability, but lack energy efficiency, resilience, and flexibility in dynamic environments. Therefore, this paper proposes a novel approach where each drone independently determines its flying direction and recharging place using DRL, while adapting navigation and sensing through distributed optimization, which improves energy efficiency during sensing tasks. Furthermore, drones efficiently exchange information while retaining decision-making autonomy via a structured tree communication model. Extensive experimentation with datasets generated from realistic urban mobility underscores the outstanding performance of the proposed solution compared to state-of-the-art methods. Significant new insights show that long-term methods optimize scarce drone resources for traffic management, while the integration of short-term methods is crucial for advising on charging policies and maintaining battery safety.

Updated: 2024-04-02 10:35:12

标题: 无人机的短期与长期协调：分布式优化与深度强化学习的结合

摘要: 群体自主互动的无人机，在充电技术的支持下，可以在智能城市中提供引人注目的感知能力，如交通监控和灾难响应。本文旨在为无人机的成本效益导航、感知和充电提供创新的协调解决方案。现有方法，如深度强化学习（DRL），提供了长期的适应性，但在动态环境中缺乏能源效率、弹性和灵活性。因此，本文提出了一种新颖的方法，每架无人机通过DRL独立确定其飞行方向和充电地点，同时通过分布式优化适应导航和感知，从而提高感知任务中的能效。此外，无人机通过结构化树通信模型高效交换信息，同时保持决策自主性。通过基于现实城市移动性生成的数据集进行了大量实验，与现有方法相比，所提出的解决方案表现出色。重要的新见解表明，长期方法优化了稀缺的无人机资源用于交通管理，而短期方法的整合对制定充电政策和维护电池安全至关重要。

更新时间: 2024-04-02 10:35:12

领域: cs.RO,cs.LG,cs.MA

下载: http://arxiv.org/abs/2311.09852v3

A (More) Realistic Evaluation Setup for Generalisation of Community Models on Malicious Content Detection

Community models for malicious content detection, which take into account the context from a social graph alongside the content itself, have shown remarkable performance on benchmark datasets. Yet, misinformation and hate speech continue to propagate on social media networks. This mismatch can be partially attributed to the limitations of current evaluation setups that neglect the rapid evolution of online content and the underlying social graph. In this paper, we propose a novel evaluation setup for model generalisation based on our few-shot subgraph sampling approach. This setup tests for generalisation through few labelled examples in local explorations of a larger graph, emulating more realistic application settings. We show this to be a challenging inductive setup, wherein strong performance on the training graph is not indicative of performance on unseen tasks, domains, or graph structures. Lastly, we show that graph meta-learners trained with our proposed few-shot subgraph sampling outperform standard community models in the inductive setup. We make our code publicly available.

Updated: 2024-04-02 10:32:21

标题: 一个更为真实的评估设置，用于社区模型在恶意内容检测上的泛化

摘要: 社区模型用于检测恶意内容，考虑到社交图中的上下文以及内容本身，已经在基准数据集上表现出色。然而，在社交媒体网络上，虚假信息和仇恨言论仍在传播。这种不匹配部分归因于当前评估设置的限制，忽略了在线内容的快速演变和基础社交图。在本文中，我们提出了一个基于我们的少样本子图采样方法的模型泛化的新型评估设置。这个设置通过在更大图的本地探索中使用少量标记示例来测试泛化性能，模拟更真实的应用设置。我们展示这是一个具有挑战性的归纳设置，在这个设置中，在训练图上表现良好并不意味着在看不见的任务、领域或图结构上表现良好。最后，我们展示了使用我们提出的少样本子图采样训练的图元学习器在归纳设置中优于标准社区模型。我们将我们的代码公开提供。

更新时间: 2024-04-02 10:32:21

领域: cs.LG,cs.CL,cs.SI

下载: http://arxiv.org/abs/2404.01822v1

CICLe: Conformal In-Context Learning for Largescale Multi-Class Food Risk Classification

Contaminated or adulterated food poses a substantial risk to human health. Given sets of labeled web texts for training, Machine Learning and Natural Language Processing can be applied to automatically detect such risks. We publish a dataset of 7,546 short texts describing public food recall announcements. Each text is manually labeled, on two granularity levels (coarse and fine), for food products and hazards that the recall corresponds to. We describe the dataset and benchmark naive, traditional, and Transformer models. Based on our analysis, Logistic Regression based on a tf-idf representation outperforms RoBERTa and XLM-R on classes with low support. Finally, we discuss different prompting strategies and present an LLM-in-the-loop framework, based on Conformal Prediction, which boosts the performance of the base classifier while reducing energy consumption compared to normal prompting.
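One way to picture an LLM-in-the-loop setup built on Conformal Prediction is sketched below, assuming split conformal prediction over the base classifier's class probabilities: a calibration set gives a score threshold, each new text gets a set of candidate labels, and the LLM is consulted only when that set contains more than one label. The routing rule, the callback `ask_llm`, and all function names are assumptions for illustration rather than the paper's implementation.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal quantile of nonconformity scores 1 - p(true label)."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return 1.0  # too few calibration points: every label is kept
    return scores[rank - 1]


def prediction_set(probs, qhat):
    """All labels whose nonconformity score is within the calibrated threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def classify(probs, qhat, ask_llm):
    """Use the base classifier when it is unambiguous, otherwise defer to the LLM."""
    candidates = prediction_set(probs, qhat)
    if len(candidates) == 1:
        return next(iter(candidates))
    if not candidates:  # empty set: fall back to the most probable label
        return max(probs, key=probs.get)
    return ask_llm(sorted(candidates))  # the LLM only chooses among the candidates


# Hypothetical calibration data: probability dicts and true labels.
cal_probs = [{"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}, {"a": 0.75, "b": 0.25}]
cal_labels = ["a", "a", "a"]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
print(qhat)
```

Under this routing, confident predictions never reach the LLM, which is one plausible source of the energy savings the abstract reports relative to prompting on every input.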

Updated: 2024-04-02 10:25:34

标题: CICLe：大规模多类食品风险分类的保守上下文学习

摘要: 受污染或掺假的食品对人类健康构成重大风险。通过在训练中使用带有标签的网络文本集，可以应用机器学习和自然语言处理技术来自动检测此类风险。我们发布了一个包含7,546个短文本的数据集，描述了公共食品召回公告。每个文本在两个粒度级别（粗糙和精细）上手动标记，以标识召回所对应的食品产品和危害。我们描述了数据集，并对朴素、传统和Transformer模型进行了基准测试。根据我们的分析，基于tf-idf表示的逻辑回归在支持度较低的类别上优于RoBERTa和XLM-R。最后，我们讨论了不同的提示策略，并提出了一个基于符合预测的LLM-in-the-loop框架，该框架可以提高基本分类器的性能，同时减少能源消耗。

更新时间: 2024-04-02 10:25:34

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2403.11904v2

A neural network-based approach to hybrid systems identification for control

We consider the problem of designing a machine learning-based model of an unknown dynamical system from a finite number of (state-input)-successor state data points, such that the model obtained is also suitable for optimal control design. We propose a specific neural network (NN) architecture that yields a hybrid system with piecewise-affine dynamics that is differentiable with respect to the network's parameters, thereby enabling the use of derivative-based training procedures. We show that a careful choice of our NN's weights produces a hybrid system model with structural properties that are highly favourable when used as part of a finite horizon optimal control problem (OCP). Specifically, we show that optimal solutions with strong local optimality guarantees can be computed via nonlinear programming, in contrast to classical OCPs for general hybrid systems which typically require mixed-integer optimization. In addition to being well-suited for optimal control design, numerical simulations illustrate that our NN-based technique enjoys very similar performance to state-of-the-art system identification methodologies for hybrid systems and it is competitive on nonlinear benchmarks.

Updated: 2024-04-02 10:16:30

标题: 基于神经网络的混合系统辨识方法用于控制

摘要: 我们考虑从有限数量的（状态-输入）-后继状态数据点设计一个基于机器学习的未知动态系统模型的问题，以便所得模型也适用于最优控制设计。我们提出了一个特定的神经网络（NN）架构，该架构产生具有分段线性动态的混合系统，对网络参数可微分，从而使得可以使用基于导数的训练过程。我们展示了通过精心选择我们NN的权重，可以生成一个具有极具优势的结构特性的混合系统模型，当作为有限时间最优控制问题（OCP）的一部分时，可以通过非线性规划计算具有强大局部最优性保证的最优解，与一般混合系统的经典OCP相比，后者通常需要混合整数优化。除了非常适合最优控制设计外，数值模拟表明，我们基于NN的技术在混合系统的系统识别方法方面表现出与现有技术相似的性能，并且在非线性基准测试中具有竞争力。

更新时间: 2024-04-02 10:16:30

领域: eess.SY,cs.LG,cs.SY,math.OC

下载: http://arxiv.org/abs/2404.01814v1

Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions

Manipulating unseen objects is challenging without a 3D representation, as objects generally have occluded surfaces. This requires physical interaction with objects to build their internal representations. This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations. We use an ensemble of partially constructed NeRF models to quantify model uncertainty and determine the next action (a visual or re-orientation action) by optimizing informativeness and feasibility. Further, our approach determines when and how to grasp and re-orient an object given its partial NeRF model and re-estimates the object pose to rectify misalignments introduced during the interaction. Experiments with a simulated Franka Emika Robot Manipulator operating in a tabletop environment with benchmark objects demonstrate improvements over current methods of (i) 14% in visual reconstruction quality (PSNR), (ii) 20% in the geometric/depth reconstruction of the object surface (F-score), and (iii) 71% in the task success rate of manipulating objects in a-priori unseen orientations/stable configurations in the scene. The project page can be found here: https://actnerf.github.io.

Updated: 2024-04-02 10:15:06

标题: 面向机器人操纵器的基于NeRF的对象模型不确定性感知主动学习：利用视觉和重新定向动作

摘要: 在没有3D表示的情况下操纵看不见的物体是具有挑战性的，因为物体通常有遮挡表面。这需要与物体进行物理交互以建立它们的内部表示。本文提出了一种方法，使机器人能够快速学习给定物体的完整3D模型，以便在陌生方向进行操纵。我们使用一组部分构建的NeRF模型来量化模型的不确定性，以确定下一步操作（视觉或重新定向操作）通过优化信息量和可行性。此外，我们的方法根据部分NeRF模型确定何时以及如何抓取和重新定向物体，并重新估计物体姿势以纠正交互过程中引入的不对齐。在一个模拟的Franka Emika机器人操作在桌面环境中的实验中，使用基准物体展示了在视觉重建质量（PSNR）方面提高了14％，在物体表面的几何/深度重建（F-分数）方面提高了20％，以及在操作看不见方向/稳定配置的物体的任务成功率上提高了71％；相比当前的方法。项目页面在此处可以找到：https://actnerf.github.io。

更新时间: 2024-04-02 10:15:06

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2404.01812v1

Software-Defined Cryptography: A Design Feature of Cryptographic Agility

Cryptographic agility, or crypto-agility, is a design feature that enables agile updates to new cryptographic algorithms and standards without the need to modify or replace the surrounding infrastructure. This paper examines the prerequisites for crypto-agility and proposes its desired design feature. More specifically, we investigate the design characteristics of widely deployed cybersecurity paradigms, i.e., zero trust, and apply its design feature to crypto-agility, achieving greater visibility and automation in cryptographic management.

Updated: 2024-04-02 10:11:58

标题: 软件定义密码学：密码敏捷性的设计特征

摘要: 密码敏捷性（cryptographic agility，简称crypto-agility）是一种设计特性，可以实现对新的密码算法和标准进行灵活更新，而无需修改或替换周围基础设施。本文探讨了实现密码敏捷性的先决条件，并提出了其期望的设计特性。更具体地，我们研究了广泛部署的网络安全范式，即零信任，并将其设计特性应用于密码敏捷性，实现了在密码管理中更大的可见性和自动化。

更新时间: 2024-04-02 10:11:58

领域: cs.CR

下载: http://arxiv.org/abs/2404.01808v1

Social, Legal, Ethical, Empathetic, and Cultural Rules: Compilation and Reasoning (Extended Version)

The rise of AI-based and autonomous systems is raising concerns and apprehension due to potential negative repercussions stemming from their behavior or decisions. These systems must be designed to comply with the human contexts in which they will operate. To this end, Townsend et al. (2022) introduce the concept of SLEEC (social, legal, ethical, empathetic, or cultural) rules that aim to facilitate the formulation, verification, and enforcement of the rules AI-based and autonomous systems should obey. They lay out a methodology to elicit these rules and to let philosophers, lawyers, domain experts, and others formulate them in natural language. To enable their effective use in AI systems, it is necessary to translate these rules systematically into a formal language that supports automated reasoning. In this study, we first conduct a linguistic analysis of the SLEEC rules pattern, which justifies the translation of SLEEC rules into classical logic. Then we investigate the computational complexity of reasoning about SLEEC rules and show how logic programming frameworks can be employed to implement SLEEC rules in practical scenarios. The result is a readily applicable strategy for implementing AI systems that conform to norms expressed as SLEEC rules.

Updated: 2024-04-02 10:09:15

标题: 社会、法律、伦理、共情和文化规则：编制与推理（扩展版）

摘要: 基于人工智能和自主系统的崛起引发了人们对潜在负面影响的担忧和忧虑，这些影响可能源自它们的行为或决策。这些系统必须设计成符合它们将运行的人类环境。在这方面，Townsend等人（2022年）提出了SLEEC（社会、法律、道德、共情或文化）规则的概念，旨在促进基于人工智能和自主系统应遵守的规则的制定、验证和执行。他们提出了一种方法论来引出这些规则，让哲学家、律师、领域专家和其他人以自然语言形式制定这些规则。为了使这些规则在人工智能系统中有效使用，有必要将这些规则系统地翻译成支持自动推理的形式语言。在本研究中，我们首先对SLEEC规则模式进行了语言分析，证明了将SLEEC规则翻译成经典逻辑的合理性。然后我们研究了关于SLEEC规则推理的计算复杂性，并展示了逻辑编程框架如何被应用于在实际场景中实施SLEEC规则。结果是一个可以立即应用的策略，用于实施符合以SLEEC规则表达的规范的人工智能系统。

更新时间: 2024-04-02 10:09:15

领域: cs.AI

下载: http://arxiv.org/abs/2312.09699v2

Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification

Emotion detection in textual data has received growing interest in recent years, as it is pivotal for developing empathetic human-computer interaction systems. This paper introduces a method for categorizing emotions from text, which acknowledges and differentiates between the diversified similarities and distinctions of various emotions. Initially, we establish a baseline by training a transformer-based model for standard emotion classification, achieving state-of-the-art performance. We argue that not all misclassifications are of the same importance, as there are perceptual similarities among emotional classes. We thus redefine the emotion labeling problem by shifting it from a traditional classification model to an ordinal classification one, where discrete emotions are arranged in a sequential order according to their valence levels. Finally, we propose a method that performs ordinal classification in the two-dimensional emotion space, considering both valence and arousal scales. The results show that our approach not only preserves high accuracy in emotion prediction but also significantly reduces the magnitude of errors in cases of misclassification.
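The idea that some misclassifications are worse than others can be illustrated with a small sketch. It assumes each emotion is assigned an ordinal rank on the valence axis and on the arousal axis, and it measures the size of an error as the rank distance between the true and predicted emotion. The ranks in `EMOTION_RANKS` are hypothetical placeholders chosen for the example, not values from the paper, and the Manhattan distance is one possible error measure among several.

```python
# Hypothetical ordinal ranks (valence, arousal); illustrative only, not from the paper.
EMOTION_RANKS = {
    "sadness": (0, 0),
    "anger": (0, 2),
    "neutral": (1, 1),
    "calm": (2, 0),
    "joy": (2, 2),
}


def ordinal_error(true_label, pred_label):
    """Manhattan distance between two emotions in the 2-D ordinal space."""
    tv, ta = EMOTION_RANKS[true_label]
    pv, pa = EMOTION_RANKS[pred_label]
    return abs(tv - pv) + abs(ta - pa)


def mean_error_magnitude(true_labels, pred_labels):
    """Average ordinal distance over misclassified examples only."""
    errors = [
        ordinal_error(t, p)
        for t, p in zip(true_labels, pred_labels)
        if t != p
    ]
    return sum(errors) / len(errors) if errors else 0.0


truth = ["joy", "sadness", "anger", "calm"]
preds = ["joy", "joy", "sadness", "neutral"]
print(mean_error_magnitude(truth, preds))
```

Plain accuracy treats the three mistakes above identically, whereas this measure separates a near miss (calm predicted as neutral) from an opposite-corner confusion (sadness predicted as joy).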

Updated: 2024-04-02 10:06:30

Categories: cs.LG

Download: http://arxiv.org/abs/2404.01805v1

Learning quantum properties from short-range correlations using multi-task networks

Characterizing multipartite quantum systems is crucial for quantum computing and many-body physics. The problem, however, becomes challenging when the system size is large and the properties of interest involve correlations among a large number of particles. Here we introduce a neural network model that can predict various quantum properties of many-body quantum states with constant correlation length, using only measurement data from a small number of neighboring sites. The model is based on the technique of multi-task learning, which we show to offer several advantages over traditional single-task approaches. Through numerical experiments, we show that multi-task learning can be applied to sufficiently regular states to predict global properties, like string order parameters, from the observation of short-range correlations, and to distinguish between quantum phases that cannot be distinguished by single-task networks. Remarkably, our model appears to be able to transfer information learnt from lower dimensional quantum systems to higher dimensional ones, and to make accurate predictions for Hamiltonians that were not seen in the training.

Updated: 2024-04-02 10:06:26

Categories: quant-ph,cs.AI

Download: http://arxiv.org/abs/2310.11807v3

Neuromorphic Wireless Device-Edge Co-Inference via the Directed Information Bottleneck

An important use case of next-generation wireless systems is device-edge co-inference, where a semantic task is partitioned between a device and an edge server. The device carries out data collection and partial processing of the data, while the remote server completes the given task based on information received from the device. It is often required that processing and communication be run as efficiently as possible at the device, while more computing resources are available at the edge. To address such scenarios, we introduce a new system solution, termed neuromorphic wireless device-edge co-inference. According to it, the device runs sensing, processing, and communication units using neuromorphic hardware, while the server employs conventional radio and computing technologies. The proposed system is designed using a transmitter-centric information-theoretic criterion that targets a reduction of the communication overhead, while retaining the most relevant information for the end-to-end semantic task of interest. Numerical results on standard data sets validate the proposed architecture, and a preliminary testbed realization is reported.

Updated: 2024-04-02 10:06:21

Categories: cs.LG,cs.IT,cs.NE,math.IT

Download: http://arxiv.org/abs/2404.01804v1

Systematic Solutions to Login and Authentication Security: A Dual-Password Login-Authentication Mechanism

Credential theft and remote attacks are the most serious threats to authentication mechanisms. The crux of the problem is that we cannot control such behaviors. However, if a password does not contain the user's secrets, stealing it is useless. If unauthorized inputs are disabled, remote attacks can be invalidated. Thereby, credential secrets and input fields to our accounts can be controlled. Rather than encrypting passwords, we design a dual-password login-authentication mechanism, where a user-selected secret-free login password is converted into an untypable authentication password. Subsequently, the authenticatable functionality of the login password and the typable functionality of the authentication password may be disabled or invalidated so that credential theft and remote attacks can be prevented. Thus, the usability-security trade-off and password reuse are resolved; local storage of authentication passwords is no longer necessary. More importantly, the password converter acts as an open hash algorithm, meaning that its intermediate elements can be used to define a truly unique identity of the login process to implement a novel dual-identity authentication. In particular, the elements are concealed, inaccessible, and independent of any personal information, and therefore can be used to define a perfect unforgeable process identifier to identify and disable unauthorized inputs.

Updated: 2024-04-02 10:05:47

Categories: cs.CR,cs.ET,cs.SY,eess.SY

Download: http://arxiv.org/abs/2404.01803v1

SCAPE: Searching Conceptual Architecture Prompts using Evolution

Conceptual architecture involves a highly creative exploration of novel ideas, often taken from other disciplines as architects consider radical new forms, materials, textures and colors for buildings. While today's generative AI systems can produce remarkable results, they lack the creativity demonstrated for decades by evolutionary algorithms. SCAPE, our proposed tool, combines evolutionary search with generative AI, enabling users to explore creative and good quality designs inspired by their initial input through a simple point and click interface. SCAPE injects randomness into generative AI, and enables memory, making use of the built-in language skills of GPT-4 to vary prompts via text-based mutation and crossover. We demonstrate that compared to DALL-E 3, SCAPE enables a 67% improvement in image novelty, plus improvements in quality and effectiveness of use; we show that in just three iterations SCAPE has a 24% image novelty increase enabling effective exploration, plus optimization of images by users. We use more than 20 independent architects to assess SCAPE, who provide markedly positive feedback.
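
As a rough illustration of text-based mutation and crossover on prompts, the sketch below uses simple phrase-level operators. This is a stand-in under stated assumptions: the abstract says SCAPE uses GPT-4 to vary prompts, whereas the functions here (`split_phrases`, `crossover`, `mutate`) are hypothetical helpers that only swap comma-separated phrases and call no language model.

```python
import random
from typing import List


def split_phrases(prompt: str) -> List[str]:
    """Split a prompt into comma-separated phrases, dropping empty ones."""
    return [p.strip() for p in prompt.split(",") if p.strip()]


def crossover(parent_a: str, parent_b: str, rng: random.Random) -> str:
    """One-point crossover: a prefix of parent_a joined to a suffix of parent_b."""
    a, b = split_phrases(parent_a), split_phrases(parent_b)
    child = a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    # Fall back to the first parent if both cuts produced an empty child.
    return ", ".join(child) if child else parent_a


def mutate(prompt: str, vocabulary: List[str], rng: random.Random) -> str:
    """Replace one randomly chosen phrase with a phrase from `vocabulary`."""
    phrases = split_phrases(prompt)
    if not phrases or not vocabulary:
        return prompt
    phrases[rng.randrange(len(phrases))] = rng.choice(vocabulary)
    return ", ".join(phrases)
```

In an evolutionary loop, the prompts a user selects by clicking would serve as parents, and the children produced by these operators would be sent to the image generator for the next iteration.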

Updated: 2024-04-02 10:05:33

Categories: cs.NE,cs.AI,68W50, 68T07,G.1.6; I.2.10

Download: http://arxiv.org/abs/2402.00089v2

Imitation Game: A Model-based and Imitation Learning Deep Reinforcement Learning Hybrid

Autonomous and learning systems based on Deep Reinforcement Learning have firmly established themselves as a foundation for approaches to creating resilient and efficient Cyber-Physical Energy Systems. However, most current approaches suffer from two distinct problems: modern model-free algorithms such as Soft Actor Critic need a high number of samples to learn a meaningful policy, and they need a fallback to ward against concept drifts (e.g., catastrophic forgetting). In this paper, we present work in progress towards a hybrid agent architecture that combines model-based Deep Reinforcement Learning with imitation learning to overcome both problems.

Updated: 2024-04-02 09:55:30

Categories: cs.AI

Download: http://arxiv.org/abs/2404.01794v1

SegICL: A Universal In-context Learning Framework for Enhanced Segmentation in Medical Imaging

Medical image segmentation models that adapt to new tasks in a training-free manner through in-context learning are an exciting advancement. Universal segmentation models aim to generalize across the diverse modalities of medical images, yet their effectiveness often diminishes when applied to out-of-distribution (OOD) data modalities and tasks, requiring intricate fine-tuning of the model for optimal performance. To address this challenge, we introduce SegICL, a novel approach leveraging In-Context Learning (ICL) for image segmentation. Unlike existing methods, SegICL can employ text-guided segmentation and conduct in-context learning with a small set of image-mask pairs, eliminating the need to train the model from scratch or fine-tune it for OOD tasks (including OOD modality and dataset). Extensive experimental validation of SegICL demonstrates a positive correlation between the number of prompt samples and segmentation performance on OOD modalities and tasks. This indicates that SegICL effectively addresses new segmentation tasks based on contextual information. Additionally, SegICL exhibits segmentation performance comparable to mainstream models on OOD and in-distribution tasks. Our code will be released soon.

Updated: 2024-04-02 09:55:02

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2403.16578v2

Using Large Language Models to Understand Telecom Standards

The Third Generation Partnership Project (3GPP) has successfully introduced standards for global mobility. However, the volume and complexity of these standards have increased over time, complicating access to relevant information for vendors and service providers. The use of Generative Artificial Intelligence (AI), and in particular Large Language Models (LLMs), may provide faster access to relevant information. In this paper, we evaluate the capability of state-of-the-art LLMs to be used as Question Answering (QA) assistants for 3GPP document reference. Our contribution is threefold. First, we provide a benchmark and measuring methods for evaluating the performance of LLMs. Second, we do data preprocessing and fine-tuning for one of these LLMs and provide guidelines, applicable to all LLMs, for increasing the accuracy of the responses. Third, we provide a model of our own, TeleRoBERTa, that performs on par with foundation LLMs but with an order of magnitude fewer parameters. Results show that LLMs can be used as a credible reference tool on telecom technical documents, and thus have potential for a number of different applications, from troubleshooting and maintenance to network operations and software product development.

Updated: 2024-04-02 09:54:51

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.02929v1

Super-Resolution Analysis for Landfill Waste Classification

Illegal landfills are a critical issue due to their environmental, economic, and public health impacts. This study leverages aerial imagery for environmental crime monitoring. While advances in artificial intelligence and computer vision hold promise, the challenge lies in training models with high-resolution literature datasets and adapting them to open-access low-resolution images. Considering the substantial quality differences and limited annotation, this research explores the adaptability of models across these domains. Motivated by the necessity for a comprehensive evaluation of waste detection algorithms, it advocates cross-domain classification and super-resolution enhancement to analyze the impact of different image resolutions on waste classification as an evaluation to combat the proliferation of illegal landfills. We observed performance improvements by enhancing image quality but noted an influence on model sensitivity, necessitating careful threshold fine-tuning.

Updated: 2024-04-02 09:53:20

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.01790v1

Vision-Language Models in Remote Sensing: Current Progress and Future Trends

The remarkable achievements of ChatGPT and GPT-4 have sparked a wave of interest and research in the field of large language models for Artificial General Intelligence (AGI). These models provide intelligent solutions close to human thinking, enabling us to use general artificial intelligence to solve problems in various applications. However, in remote sensing (RS), the scientific literature on the implementation of AGI remains relatively scant. Existing AI-related research in remote sensing primarily focuses on visual understanding tasks while neglecting the semantic understanding of the objects and their relationships. This is where vision-language models excel, as they enable reasoning about images and their associated textual descriptions, allowing for a deeper understanding of the underlying semantics. Vision-language models can go beyond visual recognition of RS images, model semantic relationships, and generate natural language descriptions of the image. This makes them better suited for tasks requiring visual and textual understanding, such as image captioning, and visual question answering. This paper provides a comprehensive review of the research on vision-language models in remote sensing, summarizing the latest progress, highlighting challenges, and identifying potential research opportunities.

Updated: 2024-04-02 09:52:41

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2305.05726v2

Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models

The fast advance of the image generation community has attracted attention worldwide, and the safety issue needs to be further scrutinized and studied. There have been a few works in this area, most of which rely on post-processing, are model-specific, or yield suboptimal image quality. In contrast, in this article we discover a black-box attack method that enjoys three merits: (i) it enables attacks that are both directed and semantic-driven, which theoretically and practically pose a hazard to this vast user community; (ii) it surprisingly surpasses white-box attacks while operating in a black-box manner; and (iii) it requires no post-processing effort. Our approach is inspired by the intriguing concept-guidance property of classifier-free guidance (CFG) in T2I models, and we discover that conducting frustratingly simple guidance in the CLIP embedding space, coupled with a semantic loss and an additional sensitive word list, works very well. Moreover, our results expose and highlight the vulnerabilities in existing defense mechanisms.

Updated: 2024-04-02 09:49:35

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2404.02928v1

A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?

The ability to detect unfamiliar or unexpected images is essential for safe deployment of computer vision systems. In the context of classification, the task of detecting images outside of a model's training domain is known as out-of-distribution (OOD) detection. While there has been growing research interest in developing post-hoc OOD detection methods, there has been comparatively little discussion of how these methods perform when the underlying classifier is not trained on a clean, carefully curated dataset. In this work, we take a closer look at 20 state-of-the-art OOD detection methods in the (more realistic) scenario where the labels used to train the underlying classifier are unreliable (e.g. crowd-sourced or web-scraped labels). Extensive experiments across different datasets, noise types and levels, architectures and checkpointing strategies provide insights into the effect of class label noise on OOD detection, and show that poor separation between incorrectly classified ID samples and OOD samples is an overlooked yet important limitation of existing methods. Code: https://github.com/glhr/ood-labelnoise
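
For readers unfamiliar with post-hoc OOD detection, the sketch below shows one widely used baseline, the maximum softmax probability (MSP) score, which needs only the logits of an already trained classifier. It is included for illustration only: the abstract does not list the 20 methods evaluated, and the threshold value here is an arbitrary assumption.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits: Sequence[float]) -> float:
    """Maximum softmax probability: higher means the input looks more in-distribution."""
    return max(softmax(logits))


def is_ood(logits: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag an input as OOD when the classifier's top confidence falls below `threshold`."""
    return msp_score(logits) < threshold
```

A confidence-based score like this cannot by itself tell a low-confidence in-distribution sample from an OOD sample, which gives some intuition for why separating misclassified ID samples from OOD samples is hard.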

Updated: 2024-04-02 09:40:22

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.01775v1

AutoTRIZ: Artificial Ideation with TRIZ and Large Language Models

Researchers and innovators have made enormous efforts in developing ideation methods, such as morphological analysis and design-by-analogy, to aid engineering design ideation for problem solving and innovation. Among these, TRIZ stands out as the most well-known approach, widely applied for systematic innovation. However, the complexity of TRIZ resources and concepts, coupled with its reliance on users' knowledge, experience, and reasoning capabilities, limits its practicability. This paper proposes AutoTRIZ, an artificial ideation tool that leverages large language models (LLMs) to automate and enhance the TRIZ methodology. By leveraging the broad knowledge and advanced reasoning capabilities of LLMs, AutoTRIZ offers a novel approach to design automation and interpretable ideation with artificial intelligence. We demonstrate and evaluate the effectiveness of AutoTRIZ through consistency experiments in contradiction detection and comparative studies with cases collected from TRIZ textbooks. Moreover, the proposed LLM-based framework holds the potential for extension to automate other knowledge-based ideation methods, including SCAMPER, Design Heuristics, and Design-by-Analogy, paving the way for a new era of artificial ideation for design and innovation.

Updated: 2024-04-02 09:38:05

Categories: cs.HC,cs.AI,cs.CL,I.2.7; I.2.1

Download: http://arxiv.org/abs/2403.13002v2

YAYI-UIE: A Chat-Enhanced Instruction Tuning Framework for Universal Information Extraction

The difficulty of the information extraction task lies in dealing with task-specific label schemas and heterogeneous data structures. Recent work has proposed methods based on large language models to uniformly model different information extraction tasks. However, these existing methods are deficient in their information extraction capabilities for languages other than English, such as Chinese. In this paper, we propose an end-to-end chat-enhanced instruction tuning framework for universal information extraction (YAYI-UIE), which supports both Chinese and English. Specifically, we utilize dialogue data and information extraction data to jointly enhance information extraction performance. Experimental results show that our proposed framework achieves state-of-the-art performance on Chinese datasets while also achieving comparable performance on English datasets under both supervised and zero-shot settings.

Updated: 2024-04-02 09:36:35

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2312.15548v3

SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window

Reference-based metrics that operate at the sentence level typically outperform quality estimation metrics, which have access only to the source and system output. This is unsurprising, since references resolve ambiguities that may be present in the source. In this paper, we investigate whether additional source context can effectively substitute for a reference. We present a metric named SLIDE (SLIding Document Evaluator), which operates on blocks of sentences. SLIDE leverages a moving window that slides over each document in the test set, feeding each chunk of sentences into an unmodified, off-the-shelf quality estimation model. We find that SLIDE obtains significantly higher pairwise system accuracy than its sentence-level baseline, in some cases even eliminating the gap with reference-based metrics. This suggests that source context may provide the same information as a human reference in disambiguating source ambiguities. This finding is especially pertinent for reference-free document-level evaluation, wherein SLIDE could provide higher-quality pairwise system assessments while only requiring document boundary annotations.
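
The sliding-window idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `qe_model` is a hypothetical callable standing in for an off-the-shelf quality estimation model, the window and stride defaults are arbitrary, and averaging the chunk scores is an assumed aggregation that the abstract does not specify.

```python
from typing import Callable, List, Sequence


def sliding_windows(sentences: Sequence[str], window: int, stride: int) -> List[List[str]]:
    """Return chunks of `window` consecutive sentences, advancing by `stride`.

    Documents no longer than `window` yield a single chunk. With stride > 1,
    trailing sentences that do not fill a whole window are not covered.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(sentences) <= window:
        return [list(sentences)]
    starts = range(0, len(sentences) - window + 1, stride)
    return [list(sentences[i:i + window]) for i in starts]


def slide_score(
    src_doc: Sequence[str],
    hyp_doc: Sequence[str],
    qe_model: Callable[[str, str], float],  # hypothetical: (source, output) -> score
    window: int = 3,
    stride: int = 1,
) -> float:
    """Score aligned source/output chunks with the QE model and average the results."""
    src_chunks = sliding_windows(src_doc, window, stride)
    hyp_chunks = sliding_windows(hyp_doc, window, stride)
    scores = [qe_model(" ".join(s), " ".join(h)) for s, h in zip(src_chunks, hyp_chunks)]
    return sum(scores) / len(scores)
```

The sketch assumes the source and system output are sentence-aligned within each document, so the same window positions can be applied to both sides.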

Updated: 2024-04-02 09:36:24

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2309.08832v2

All in an Aggregated Image for In-Image Learning

This paper introduces a new in-context learning (ICL) mechanism called In-Image Learning (I$^2$L) that combines demonstration examples, visual cues, and chain-of-thought reasoning into an aggregated image to enhance the capabilities of Large Multimodal Models (e.g., GPT-4V) in multimodal reasoning tasks. Unlike previous approaches that rely on converting images to text or incorporating visual input into language models, I$^2$L consolidates all information into an aggregated image and leverages image processing, understanding, and reasoning abilities. This has several advantages: it reduces inaccurate textual descriptions of complex images, provides flexibility in positioning demonstration examples, and avoids multiple input images and lengthy prompts. We also introduce I$^2$L-Hybrid, a method that combines the strengths of I$^2$L with other ICL methods. Specifically, it uses an automatic strategy to select the most suitable method (I$^2$L or another certain ICL method) for a specific task instance. We conduct extensive experiments to assess the effectiveness of I$^2$L and I$^2$L-Hybrid on MathVista, which covers a variety of complex multimodal reasoning tasks. Additionally, we investigate the influence of image resolution, the number of demonstration examples in a single image, and the positions of these demonstrations in the aggregated image on the effectiveness of I$^2$L. Our code is publicly available at https://github.com/AGI-Edgerunners/IIL.

Updated: 2024-04-02 09:32:51

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2402.17971v2

Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems

The rapid advance of deep reinforcement learning techniques enables the oversight of safety-critical systems through the utilization of Deep Neural Networks (DNNs). This underscores the pressing need to promptly establish certified safety guarantees for such DNN-controlled systems. Most of the existing verification approaches rely on qualitative approaches, predominantly employing reachability analysis. However, qualitative verification proves inadequate for DNN-controlled systems as their behaviors exhibit stochastic tendencies when operating in open and adversarial environments. In this paper, we propose a novel framework for unifying both qualitative and quantitative safety verification problems of DNN-controlled systems. This is achieved by formulating the verification tasks as the synthesis of valid neural barrier certificates (NBCs). Initially, the framework seeks to establish almost-sure safety guarantees through qualitative verification. In cases where qualitative verification fails, our quantitative verification method is invoked, yielding precise lower and upper bounds on probabilistic safety across both infinite and finite time horizons. To facilitate the synthesis of NBCs, we introduce their $k$-inductive variants. We also devise a simulation-guided approach for training NBCs, aiming to achieve tightness in computing precise certified lower and upper bounds. We prototype our approach into a tool called $\textsf{UniQQ}$ and showcase its efficacy on four classic DNN-controlled systems.

Updated: 2024-04-02 09:31:51

Categories: cs.LG

Download: http://arxiv.org/abs/2404.01769v1

Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation

Recent advancements in Large Language Models (LLMs) have significantly increased their presence in human-facing Artificial Intelligence (AI) applications. However, LLMs could reproduce and even exacerbate stereotypical outputs from training data. This work introduces the Multi-Grain Stereotype (MGS) dataset, encompassing 51,867 instances across gender, race, profession, religion, and stereotypical text, collected by fusing multiple previously publicly available stereotype detection datasets. We explore different machine learning approaches aimed at establishing baselines for stereotype detection, and fine-tune several language models of various architectures and model sizes, presenting in this work a series of stereotype classifier models for English text trained on MGS. To understand whether our stereotype detectors capture relevant features (aligning with human common sense), we utilise a variety of explainable AI tools, including SHAP, LIME, and BertViz, and analyse a series of example cases discussing the results. Finally, we develop a series of stereotype elicitation prompts and evaluate the presence of stereotypes in text generation tasks with popular LLMs, using one of our best-performing stereotype detectors. Our experiments yielded several key findings: i) Training stereotype detectors in a multi-dimension setting yields better results than training multiple single-dimension classifiers. ii) The integrated MGS Dataset enhances both the in-dataset and cross-dataset generalisation ability of stereotype detectors compared to using the datasets separately. iii) There is a reduction in stereotypes in the content generated by GPT Family LLMs with newer versions.

Updated: 2024-04-02 09:31:32

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.01768v1

Artificial Neural Networks-based Real-time Classification of ENG Signals for Implanted Nerve Interfaces

Neuropathies are gaining higher relevance in clinical settings, as they risk permanently jeopardizing a person's life. To support the recovery of patients, the use of fully implanted devices is emerging as one of the most promising solutions. However, these devices, even if becoming an integral part of a fully complex neural nanonetwork system, pose numerous challenges. In this article, we address one of them, which consists of the classification of motor/sensory stimuli. The task is performed by exploring four different types of artificial neural networks (ANNs) to extract various sensory stimuli from the electroneurographic (ENG) signal measured in the sciatic nerve of rats. Different sizes of the data sets are considered to analyze the feasibility of the investigated ANNs for real-time classification through a comparison of their performance in terms of accuracy, F1-score, and prediction time. The design of the ANNs takes advantage of the modelling of the ENG signal as a multiple-input multiple-output (MIMO) system to describe the measures taken by state-of-the-art implanted nerve interfaces. These are based on the use of multi-contact cuff electrodes to achieve nanoscale spatial discrimination of the nerve activity. The MIMO ENG signal model is another contribution of this paper. Our results show that some ANNs are more suitable for real-time applications, being capable of achieving accuracies over $90\%$ for signal windows of $100$ and $200\,$ms with a low enough processing time to be effective for pathology recovery.

Updated: 2024-04-02 09:26:43

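Title: Artificial Neural Networks-based Real-time Classification of ENG Signals for Implanted Nerve Interfaces

Abstract: Neuropathies are becoming increasingly important in clinical settings because they can permanently endanger a person's life. To support patient recovery, fully implanted devices are emerging as one of the most promising solutions. However, even as these devices become an integral part of a complex neural nanonetwork system, many challenges remain. In this article we address one of them, the classification of motor/sensory stimuli. We carry out this task by exploring four different types of artificial neural networks (ANNs) to extract various sensory stimuli from the electroneurographic (ENG) signal measured in the sciatic nerve of rats. Data sets of different sizes are considered to analyze the feasibility of the investigated ANNs for real-time classification by comparing their accuracy, F1-score, and prediction time. The design of the ANNs models the ENG signal as a multiple-input multiple-output (MIMO) system to describe the measurements taken by state-of-the-art implanted nerve interfaces. These interfaces use multi-contact cuff electrodes to achieve nanoscale spatial discrimination of nerve activity. The MIMO ENG signal model is another contribution of this paper. Our results show that some ANNs are better suited to real-time applications, achieving accuracies above 90% for signal windows of 100 and 200 ms with processing times low enough to be effective for pathology recovery.

Updated: 2024-04-02 09:26:43

Categories: cs.AI

Download: http://arxiv.org/abs/2403.20234v2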

Security for adversarial wiretap channels

We consider the wiretap channel, where the individual channel uses have memory or are influenced by an adversary. We analyze the explicit and computationally efficient construction of information-theoretically secure coding schemes which use the inverse of an extractor and an error-correcting code. These schemes are known to achieve secrecy capacity on a large class of memoryless wiretap channels. We show that this also holds for certain channel types with memory. In particular, they can achieve secrecy capacity on channels where an adversary can pick a sequence of ``states'' governing the channel's behavior, as long as, given every possible state, the channel is strongly symmetric.

Updated: 2024-04-02 09:22:40

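Title: Security for adversarial wiretap channels

Abstract: We consider the wiretap channel in which the individual channel uses have memory or are influenced by an adversary. We analyze the explicit and computationally efficient construction of information-theoretically secure coding schemes that use the inverse of an extractor and an error-correcting code. These schemes are known to achieve secrecy capacity on a large class of memoryless wiretap channels. We show that this also holds for certain channel types with memory. In particular, they can achieve secrecy capacity on channels where an adversary can choose a sequence of "states" governing the channel's behavior, provided that the channel is strongly symmetric given every possible state.

Updated: 2024-04-02 09:22:40

Categories: cs.CR,cs.IT,math.IT

Download: http://arxiv.org/abs/2404.01760v1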

On Automating Video Game Regression Testing by Planning and Learning

In this paper, we propose a method and workflow for automating regression testing of certain video game aspects using automated planning and incremental action model learning techniques. The basic idea is to use detailed game logs and incremental action model learning techniques to maintain a formal model in the planning domain description language (PDDL) of the gameplay mechanics. The workflow enables efficient cooperation of game developers without any experience with PDDL or other formal systems and a person experienced with PDDL modeling but no game development skills. We describe the method and workflow in general and then demonstrate it on a concrete proof-of-concept example -- a simple role-playing game provided as one of the tutorial projects in the popular game development engine Unity. This paper presents the first step towards minimizing or even eliminating the need for a modeling expert in the workflow, thus making automated planning accessible to a broader audience.

Updated: 2024-04-02 09:16:14

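Title: On Automating Video Game Regression Testing by Planning and Learning

Abstract: In this paper, we propose a method and workflow for automating regression testing of certain aspects of video games using automated planning and incremental action model learning techniques. The basic idea is to use detailed game logs and incremental action model learning to maintain a formal model of the gameplay mechanics in the planning domain description language (PDDL). The workflow enables efficient cooperation between game developers with no experience in PDDL or other formal systems and a person experienced in PDDL modeling but without game development skills. We first describe the method and workflow in general and then demonstrate them on a concrete proof-of-concept example, a simple role-playing game provided as one of the tutorial projects in the popular game development engine Unity. This paper presents a first step towards minimizing or even eliminating the need for a modeling expert in the workflow, thereby making automated planning accessible to a broader audience.

Updated: 2024-04-02 09:16:14

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2402.12393v2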

Remote sensing framework for geological mapping via stacked autoencoders and clustering

Supervised learning methods for geological mapping via remote sensing are limited by the scarcity of accurately labelled training data. In contrast, unsupervised learning methods, such as dimensionality reduction and clustering, can uncover patterns and structures in remote sensing data without relying on predefined labels. Dimensionality reduction methods have the potential to play a crucial role in improving the accuracy of geological maps. Although conventional dimensionality reduction methods may struggle with nonlinear data, unsupervised deep learning models such as autoencoders can model nonlinear relationships in data. Stacked autoencoders feature multiple interconnected layers to capture hierarchical data representations that can be useful for remote sensing data. In this study, we present an unsupervised machine learning framework for processing remote sensing data that uses stacked autoencoders for dimensionality reduction and k-means clustering for mapping geological units. We use the Landsat-8, ASTER, and Sentinel-2 datasets of the Mutawintji region in Western New South Wales, Australia to evaluate the framework for geological mapping. We also compare stacked autoencoders with principal component analysis and canonical autoencoders. Our results reveal that the framework produces accurate and interpretable geological maps, efficiently discriminating rock units. We find that the stacked autoencoders provide better accuracy than these counterparts. We also find that the generated maps align with prior geological knowledge of the study area while providing novel insights into geological structures.

Updated: 2024-04-02 09:15:32

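Title: Remote sensing framework for geological mapping via stacked autoencoders and clustering

Abstract: Supervised learning methods for geological mapping with remote sensing are limited by the scarcity of accurately labelled training data. In contrast, unsupervised learning methods such as dimensionality reduction and clustering can discover patterns and structures in remote sensing data without relying on predefined labels. Dimensionality reduction methods have the potential to play a key role in improving the accuracy of geological maps. Although traditional dimensionality reduction methods may struggle with nonlinear data, unsupervised deep learning models such as autoencoders can model nonlinear relationships in data. Stacked autoencoders have multiple interconnected layers that capture hierarchical data representations useful for remote sensing data. In this study, we present an unsupervised machine learning framework for processing remote sensing data that uses stacked autoencoders for dimensionality reduction and k-means clustering for mapping geological units. We use the Landsat-8, ASTER, and Sentinel-2 datasets of the Mutawintji region in Western New South Wales, Australia to evaluate the framework for geological mapping. We also compare stacked autoencoders with principal component analysis and canonical autoencoders. Our results show that the framework produces accurate and interpretable geological maps that effectively discriminate rock units. We find that stacked autoencoders provide better accuracy than the alternatives. We also find that the generated maps agree with prior geological knowledge of the study area while providing new insights into geological structures.

Updated: 2024-04-02 09:15:32

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.02180v1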

Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments

Automated generation of feedback on programming assignments holds significant benefits for programming education, especially when it comes to advanced assignments. Automated Program Repair techniques, especially Large Language Model based approaches, have gained notable recognition for their potential to fix introductory assignments. However, the programs used for evaluation are relatively simple, and it remains unclear how existing approaches perform in repairing programs from higher-level programming courses. To address these limitations, we curate a new advanced student assignment dataset named Defects4DS from a higher-level programming course. Subsequently, we identify the challenges related to fixing bugs in advanced assignments. Based on the analysis, we develop a framework called PaR that is powered by an LLM. PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair. Peer Solution Selection identifies closely related peer programs based on lexical, semantic, and syntactic criteria. Multi-Source Prompt Generation then combines multiple sources of information to create a comprehensive and informative prompt for the final Program Repair stage. The evaluation on Defects4DS and another well-investigated ITSP dataset reveals that PaR achieves a new state-of-the-art performance, with improvements of 19.94% and 15.2% in repair rate compared to prior state-of-the-art LLM- and symbolic-based approaches, respectively.

Updated: 2024-04-02 09:12:21

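Title: Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments

Abstract: Automatically generating feedback on programming assignments is of great value to programming education, especially for advanced assignments. Automated program repair techniques, particularly those based on large language models, have gained notable recognition for their potential to fix introductory assignments. However, the programs used for evaluation are relatively simple, and it remains unclear how existing approaches perform when repairing programs from higher-level programming courses. To address these limitations, we curate a new advanced student assignment dataset, named Defects4DS, from a higher-level programming course. We then identify the challenges involved in fixing bugs in advanced assignments. Based on this analysis, we develop a framework called PaR that is powered by an LLM. PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair. Peer Solution Selection identifies closely related peer programs based on lexical, semantic, and syntactic criteria. Multi-Source Prompt Generation then combines multiple sources of information to create a comprehensive and informative prompt for the final Program Repair stage. Evaluation on Defects4DS and on another well-studied dataset, ITSP, shows that PaR achieves new state-of-the-art performance, improving the repair rate by 19.94% and 15.2% over prior LLM-based and symbolic-based approaches, respectively.

Updated: 2024-04-02 09:12:21

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2404.01754v1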

Safe Interval RRT* for Scalable Multi-Robot Path Planning in Continuous Space

In this paper, we consider the problem of Multi-Robot Path Planning (MRPP) in continuous space, where the goal is to find conflict-free paths. The difficulty of the problem arises from two primary factors. First, the involvement of multiple robots leads to combinatorial decision-making, which makes the search space grow exponentially. Second, the continuous space presents potentially infinite states and actions. For this problem, we propose a two-level approach in which the low level is a sampling-based planner, Safe Interval RRT* (SI-RRT*), that finds a collision-free trajectory for an individual robot. The high level can use any method that resolves inter-robot conflicts; we employ two representative methods, Prioritized Planning (SI-CPP) and Conflict Based Search (SI-CCBS). Experimental results show that SI-RRT* can find a high-quality solution quickly with a small number of samples. SI-CPP exhibits improved scalability, while SI-CCBS produces higher-quality solutions than the state-of-the-art planners for continuous space. Compared to the most scalable existing algorithm, SI-CPP achieves a success rate that is up to 94% higher with 100 robots while maintaining solution quality (i.e., flowtime, the sum of travel times of all robots) without significant compromise. SI-CPP also decreases the makespan by up to 45%. SI-CCBS decreases the flowtime by 9% compared to the competitor, albeit with a 14% lower success rate.
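
The two solution-quality metrics named in this abstract can be illustrated with a small sketch. Flowtime follows the definition given above (the sum of travel times of all robots). Makespan is taken here in its usual sense, the travel time of the last robot to finish, which is an assumption because the abstract does not define it. The function names and the sample travel times are hypothetical.

```python
from typing import Sequence


def flowtime(travel_times: Sequence[float]) -> float:
    """Sum of the travel times of all robots (definition from the abstract)."""
    return sum(travel_times)


def makespan(travel_times: Sequence[float]) -> float:
    """Travel time of the slowest robot (usual definition, assumed here)."""
    return max(travel_times)


# Hypothetical travel times, in seconds, for three robots.
times = [12.0, 7.5, 20.0]
print(flowtime(times))  # 39.5
print(makespan(times))  # 20.0
```

A planner can lower one metric without lowering the other, which may be why the abstract reports the two separately.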

Updated: 2024-04-02 09:07:12

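Title: Safe Interval RRT* for Scalable Multi-Robot Path Planning in Continuous Space

Abstract: In this paper, we consider the problem of Multi-Robot Path Planning (MRPP) in continuous space, with the aim of finding conflict-free paths. The difficulty of the problem stems mainly from two factors. First, the involvement of multiple robots leads to combinatorial decision-making, which makes the search space grow exponentially. Second, the continuous space presents potentially infinite states and actions. For this problem, we propose a two-level approach in which the low level is a sampling-based planner, Safe Interval RRT* (SI-RRT*), that finds a collision-free trajectory for an individual robot. The high level can use any method capable of resolving inter-robot conflicts; we adopt two representative methods, Prioritized Planning (SI-CPP) and Conflict Based Search (SI-CCBS). Experimental results show that SI-RRT* can quickly find a high-quality solution with only a small number of samples. SI-CPP shows better scalability, while SI-CCBS produces higher-quality solutions than the state-of-the-art planners for continuous space. Compared with the most scalable existing algorithm, SI-CPP achieves a success rate that is up to 94% higher with 100 robots while maintaining solution quality (i.e., flowtime, the sum of the travel times of all robots) without significant compromise. SI-CPP also reduces the makespan by up to 45%. SI-CCBS reduces the flowtime by 9% compared with the competitor, although its success rate is 14% lower.

Updated: 2024-04-02 09:07:12

Categories: cs.RO,cs.AI,cs.MA

Download: http://arxiv.org/abs/2404.01752v1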

Global Mapping of Exposure and Physical Vulnerability Dynamics in Least Developed Countries using Remote Sensing and Machine Learning

As the world marks the midterm of the Sendai Framework for Disaster Risk Reduction 2015-2030, many countries are still struggling to monitor their climate and disaster risk because large-scale surveys of the distribution of exposure and physical vulnerability are expensive; hence, they are not on track to reduce risks amidst the intensifying effects of climate change. We present an ongoing effort to map this vital information using machine learning and time-series remote sensing from publicly available Sentinel-1 SAR GRD and Sentinel-2 Harmonized MSI. We introduce "OpenSendaiBench", which covers 47 countries, most of which are least developed countries (LDCs), train ResNet-50 deep learning models, and demonstrate the approach on the region of Dhaka, Bangladesh by mapping the distribution of its informal constructions. As a pioneering effort in auditing global disaster risk over time, this paper aims to advance large-scale risk quantification to inform our collective long-term efforts to reduce climate and disaster risk.

Updated: 2024-04-02 09:04:56

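Title: Global Mapping of Exposure and Physical Vulnerability Dynamics in Least Developed Countries using Remote Sensing and Machine Learning

Abstract: As the world marks the midterm of the Sendai Framework for Disaster Risk Reduction 2015-2030, many countries are still struggling to monitor their climate and disaster risk because large-scale surveys of the distribution of exposure and physical vulnerability are expensive, and as a result they are not on track to reduce risk under the intensifying effects of climate change. We present an ongoing effort to map this vital information using machine learning and time-series remote sensing data from the publicly available Sentinel-1 SAR GRD and Sentinel-2 Harmonized MSI. We introduce the development of "OpenSendaiBench", which covers 47 countries, most of them least developed countries (LDCs), train ResNet-50 deep learning models, and demonstrate the approach by mapping the distribution of informal constructions in the Dhaka region of Bangladesh. As a pioneering effort in auditing global disaster risk over time, this paper aims to advance the field of large-scale risk quantification in order to inform our collective long-term efforts to reduce climate and disaster risk.

Updated: 2024-04-02 09:04:56

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2404.01748v1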

Towards Scalable & Efficient Interaction-Aware Planning in Autonomous Vehicles using Knowledge Distillation

Real-world driving involves intricate interactions among vehicles navigating through dense traffic scenarios. Recent research focuses on enhancing the interaction awareness of autonomous vehicles to leverage these interactions in decision-making. These interaction-aware planners rely on neural-network-based prediction models to capture inter-vehicle interactions, aiming to integrate these predictions with traditional control techniques such as Model Predictive Control. However, this integration of deep learning-based models with traditional control paradigms often results in computationally demanding optimization problems, relying on heuristic methods. This study introduces a principled and efficient method for combining deep learning with constrained optimization, employing knowledge distillation to train smaller and more efficient networks, thereby mitigating complexity. We demonstrate that these refined networks maintain the problem-solving efficacy of larger models while significantly accelerating optimization. Specifically, in the domain of interaction-aware trajectory planning for autonomous vehicles, we illustrate that training a smaller prediction network using knowledge distillation speeds up optimization without sacrificing accuracy.

Updated: 2024-04-02 09:04:06

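Title: Towards Scalable & Efficient Interaction-Aware Planning in Autonomous Vehicles using Knowledge Distillation

Abstract: Real-world driving involves intricate interactions among vehicles navigating dense traffic scenarios. Recent research focuses on enhancing the interaction awareness of autonomous vehicles so that these interactions can be exploited in decision-making. Such interaction-aware planners rely on neural-network-based prediction models to capture inter-vehicle interactions and aim to integrate these predictions with traditional control techniques such as Model Predictive Control. However, combining deep-learning-based models with traditional control paradigms often leads to computationally demanding optimization problems that rely on heuristic methods. This study introduces a principled and efficient method for combining deep learning with constrained optimization, using knowledge distillation to train smaller and more efficient networks and thereby reduce complexity. We show that these refined networks retain the problem-solving efficacy of larger models while significantly accelerating optimization. Specifically, in the domain of interaction-aware trajectory planning for autonomous vehicles, we show that training a smaller prediction network with knowledge distillation speeds up optimization without sacrificing accuracy.

Updated: 2024-04-02 09:04:06

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.01746v1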

Is Mamba Effective for Time Series Forecasting?

In the realm of time series forecasting (TSF), it is imperative for models to adeptly discern and distill dependencies embedded within historical time series data. This encompasses the extraction of temporal dependencies (TD) and inter-variate correlations (VC), thereby empowering the models to forecast future states. Transformer-based models have exhibited formidable efficacy in TSF, primarily attributed to their distinct proficiency in apprehending both TD and VC. However, because of the Transformer's computational inefficiencies, efforts to refine it are ongoing. Recently, state space models (SSMs), e.g. Mamba, have gained traction due to their ability to process complex dependencies in sequences, similar to the Transformer, while maintaining near-linear complexity. This has piqued our interest in exploring the potential of SSMs in TSF tasks. Therefore, we propose a Mamba-based model named Simple-Mamba (S-Mamba) for TSF. Specifically, we tokenize the time points of each variate autonomously via a linear layer. Subsequently, a bidirectional Mamba layer is utilized to extract VC, followed by the generation of forecast outcomes through a composite structure of a Feed-Forward Network for TD and a mapping layer. Experiments on several datasets show that S-Mamba maintains low computational overhead and achieves leading performance. Furthermore, we conduct extensive experiments to delve deeper into the potential of Mamba compared to the Transformer in TSF. Our code is available at https://github.com/wzhwzhwzh0921/S-D-Mamba.

Updated: 2024-04-02 09:03:37

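Title: Is Mamba Effective for Time Series Forecasting?

Abstract: In time series forecasting (TSF), models must adeptly identify and extract the dependencies embedded in historical time series data. This includes extracting temporal dependencies (TD) and inter-variate correlations (VC), which enables models to forecast future states. Transformer-based models have shown strong results in TSF, mainly owing to their distinctive ability to capture both TD and VC. However, because of their inefficiency, efforts to improve the Transformer continue. Recently, state space models (SSMs) such as Mamba have gained traction because they can process complex dependencies in sequences, as the Transformer does, while maintaining near-linear complexity, and this has prompted our interest in exploring the potential of SSMs in TSF tasks. We therefore propose a Mamba-based model for TSF named Simple-Mamba (S-Mamba). Specifically, we tokenize the time points of each variate independently through a linear layer. A bidirectional Mamba layer is then used to extract VC, and forecasts are generated by a composite structure consisting of a Feed-Forward Network for TD and a mapping layer. Experiments on several datasets show that S-Mamba has low computational overhead and achieves leading performance. In addition, we conduct extensive experiments to explore in depth the potential of Mamba compared with the Transformer in TSF. Our code is available at https://github.com/wzhwzhwzh0921/S-D-Mamba.

Updated: 2024-04-02 09:03:37

Categories: cs.LG

Download: http://arxiv.org/abs/2403.11144v2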

Unleash the Potential of CLIP for Video Highlight Detection

Multimodal and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel potentials across various tasks and applications. Among these domains, the video domain has notably benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel in the video highlight detection task by leveraging the pre-trained knowledge embedded in multimodal models. By simply fine-tuning the multimodal encoder in combination with our innovative saliency pooling technique, we achieve what is, to the best of our knowledge, state-of-the-art performance on the QVHighlight Benchmark for highlight detection.

Updated: 2024-04-02 09:01:58

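Title: Unleash the Potential of CLIP for Video Highlight Detection

Abstract: Multimodal and large language models (LLMs) have transformed the use of open-world knowledge, unlocking new potential across a wide range of tasks and applications. Among these domains, the video domain has clearly benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel at video highlight detection by leveraging the pre-trained knowledge in multimodal models. By simply fine-tuning the multimodal encoder in combination with our innovative saliency pooling technique, we achieve, to the best of our knowledge, state-of-the-art performance on the highlight detection task, namely the QVHighlight benchmark.

Updated: 2024-04-02 09:01:58

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.01745v1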

Intrusion Tolerance for Networked Systems through Two-Level Feedback Control

We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level node controllers perform intrusion recovery, and on the global level a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely, the machine replacement problem and the inventory replenishment problem. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have threshold structure and design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment where we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.

Updated: 2024-04-02 09:00:45

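Title: Intrusion Tolerance for Networked Systems through Two-Level Feedback Control

Abstract: We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. At the local level, node controllers perform intrusion recovery, and at the global level, a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely the machine replacement problem and the inventory replenishment problem. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have a threshold structure and design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment in which we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.

Updated: 2024-04-02 09:00:45

Categories: cs.DC,cs.AI,cs.CR,cs.GT,cs.SY,eess.SY

Download: http://arxiv.org/abs/2404.01741v1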

Weakly-supervised Audio Separation via Bi-modal Semantic Similarity

Conditional sound separation in multi-source audio mixtures without access to single-source sound data during training is a long-standing challenge. Existing mix-and-separate based methods suffer from a significant performance drop with multi-source training mixtures due to the lack of a supervision signal for single-source separation cases during training. However, in the case of language-conditional audio separation, we do have access to corresponding text descriptions for each audio mixture in our training data, which can be seen as (rough) representations of the audio samples in the language modality. To this end, in this paper, we propose a generic bi-modal separation framework which can enhance the existing unsupervised frameworks to separate single-source signals in a target modality (i.e., audio) using the easily separable corresponding signals in the conditioning modality (i.e., language), without having access to single-source samples in the target modality during training. We empirically show that this is well within reach if we have access to a pretrained joint embedding model between the two modalities (i.e., CLAP). Furthermore, we propose to incorporate our framework into two fundamental scenarios to enhance separation performance. First, we show that our proposed methodology significantly improves the performance of purely unsupervised baselines by reducing the distribution shift between training and test samples. In particular, we show that our framework can achieve a 71% boost in terms of Signal-to-Distortion Ratio (SDR) over the baseline, reaching 97.5% of the supervised learning performance. Second, we show that we can further improve the performance of supervised learning itself by 17% if we augment it with our proposed weakly-supervised framework, which enables a powerful semi-supervised framework for audio separation.

Updated: 2024-04-02 08:59:58

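Title: Weakly-supervised Audio Separation via Bi-modal Semantic Similarity

Abstract: Conditional sound separation in multi-source audio mixtures, without access to single-source sound data during training, is a long-standing challenge. Existing mix-and-separate methods show a marked performance drop on multi-source training mixtures because no supervision signal for single-source separation is available during training. However, in language-conditional audio separation we do have access to a text description of each audio mixture in the training data, which can be regarded as a (rough) representation of the audio samples in the language modality. In this paper we therefore propose a generic bi-modal separation framework that can enhance existing unsupervised frameworks to separate single-source signals in the target modality (i.e., audio) using the easily separable corresponding signals in the conditioning modality (i.e., language), without access to single-source samples in the target modality during training. We show empirically that this is entirely feasible if a pretrained joint embedding model between the two modalities (i.e., CLAP) is available. We further propose to incorporate our framework into two fundamental scenarios to improve separation performance. First, we show that the proposed method significantly improves purely unsupervised baselines by reducing the distribution shift between training and test samples. In particular, our framework achieves a 71% improvement in Signal-to-Distortion Ratio (SDR) over the baseline, reaching 97.5% of supervised learning performance. Second, we show that the performance of supervised learning itself can be improved by a further 17% when it is augmented with our weakly-supervised framework, which yields a powerful semi-supervised framework for audio separation.

Updated: 2024-04-02 08:59:58

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2404.01740v1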

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually requires large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the utilization of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 24.2% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations of each task, and shows excellent generalization abilities in diverse aspects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments, DP3 rarely violates safety requirements, in contrast to baseline methods, which frequently do and thus necessitate human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available at https://3d-diffusion-policy.github.io .

Updated: 2024-04-02 08:59:57

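Title: 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

Abstract: Imitation learning offers an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually requires a large number of human demonstrations. To address this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that brings the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the use of a compact 3D visual representation extracted from sparse point clouds with an efficient point encoder. In our experiments on 72 simulation tasks, DP3 successfully handles most tasks with only 10 demonstrations and surpasses baselines with a 24.2% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85% given only 40 demonstrations per task, and shows excellent generalization in several respects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments DP3 rarely violates safety requirements, unlike baseline methods, which frequently do so and require human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available at https://3d-diffusion-policy.github.io .

Updated: 2024-04-02 08:59:57

Categories: cs.RO,cs.CV,cs.LG

Download: http://arxiv.org/abs/2403.03954v2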

Saliency strikes back: How filtering out high frequencies improves white-box explanations

Attribution methods correspond to a class of explainability methods (XAI) that aim to assess how individual inputs contribute to a model's decision-making process. We have identified a significant limitation in one type of attribution methods, known as "white-box" methods. Although highly efficient, these methods rely on a gradient signal that is often contaminated by high-frequency noise. To overcome this limitation, we introduce a new approach called "FORGrad". This simple method effectively filters out noise artifacts by using optimal cut-off frequencies tailored to the unique characteristics of each model architecture. Our findings show that FORGrad consistently enhances the performance of already existing white-box methods, enabling them to compete effectively with more accurate yet computationally demanding "black-box" methods. We anticipate that our research will foster broader adoption of simpler and more efficient white-box methods for explainability, offering a better balance between faithfulness and computational efficiency.
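
The filtering idea described in this abstract can be sketched on a toy signal. The example below is only a minimal 1D illustration using a naive discrete Fourier transform; it is not the FORGrad implementation. Real attribution maps are 2D, and the cut-off value used here is a hypothetical choice, not one of the per-architecture optimal cut-offs the abstract refers to. The function name `low_pass` and the sample data are likewise assumptions.

```python
import cmath
from typing import List


def low_pass(signal: List[float], cutoff: int) -> List[float]:
    """Zero out DFT components whose frequency index exceeds `cutoff`."""
    n = len(signal)
    # Forward DFT (naive O(n^2) version, fine for a short illustration).
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Index k and index n - k carry the same frequency magnitude,
    # so both are kept or dropped together.
    kept = [c if min(k, n - k) <= cutoff else 0 for k, c in enumerate(spectrum)]
    # Inverse DFT; the input is real, so the imaginary part is round-off only.
    return [
        (sum(kept[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# Hypothetical 1D "gradient": a constant trend plus alternating high-frequency noise.
noisy_gradient = [1.0 + 0.5 * (-1) ** t for t in range(8)]
smoothed = low_pass(noisy_gradient, cutoff=1)
```

In this toy case the alternating component sits at the highest representable frequency, so the filter removes it and leaves the constant trend.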

Updated: 2024-04-02 08:55:51

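Title: Saliency strikes back: How filtering out high frequencies improves white-box explanations

Abstract: Attribution methods are a class of explainability (XAI) methods that aim to assess how individual inputs contribute to a model's decision-making process. We have identified a significant limitation in one type of attribution method, the so-called "white-box" methods. Although highly efficient, these methods rely on a gradient signal that is often contaminated by high-frequency noise. To overcome this limitation, we introduce a new approach called "FORGrad". This simple method effectively filters out noise artifacts by using optimal cut-off frequencies tailored to the unique characteristics of each model architecture. Our findings show that FORGrad consistently improves the performance of existing white-box methods, enabling them to compete effectively with more accurate but more computationally demanding "black-box" methods. We expect that our research will encourage broader adoption of simpler and more efficient white-box methods for explainability, offering a better balance between faithfulness and computational efficiency.

Updated: 2024-04-02 08:55:51

Categories: cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2307.09591v3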

Ignore Me But Don't Replace Me: Utilizing Non-Linguistic Elements for Pretraining on the Cybersecurity Domain

Cybersecurity information is often technically complex and relayed through unstructured text, making automation of cyber threat intelligence highly challenging. For such text domains that involve high levels of expertise, pretraining on in-domain corpora has been a popular method for language models to obtain domain expertise. However, cybersecurity texts often contain non-linguistic elements (such as URLs and hash values) that could be unsuitable for the established pretraining methodologies. Previous work in other domains has removed or filtered such text as noise, but the effectiveness of these methods has not been investigated, especially in the cybersecurity domain. We propose different pretraining methodologies and evaluate their effectiveness through downstream tasks and probing tasks. Our proposed strategy (selective MLM and jointly training NLE token classification) outperforms the commonly taken approach of replacing non-linguistic elements (NLEs). We use our domain-customized methodology to train CyBERTuned, a cybersecurity domain language model that outperforms other cybersecurity PLMs on most tasks.

Updated: 2024-04-02 08:46:42

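Title: Ignore Me But Don't Replace Me: Utilizing Non-Linguistic Elements for Pretraining on the Cybersecurity Domain

Abstract: Cybersecurity information is often technically complex and conveyed through unstructured text, which makes automating cyber threat intelligence highly challenging. For text domains that involve high levels of expertise, pretraining on in-domain corpora has become a popular way for language models to acquire domain expertise. However, cybersecurity texts often contain non-linguistic elements (such as URLs and hash values) that may not suit established pretraining methodologies. Previous work in other domains has removed or filtered such text as noise, but the effectiveness of these methods has not been investigated, especially in the cybersecurity domain. We propose different pretraining methodologies and evaluate their effectiveness through downstream tasks and probing tasks. Our proposed strategy (selective MLM and jointly training NLE token classification) outperforms the common approach of replacing non-linguistic elements (NLEs). We use our domain-customized methodology to train CyBERTuned, a cybersecurity domain language model that outperforms other cybersecurity PLMs on most tasks.

Updated: 2024-04-02 08:46:42

Categories: cs.CR,cs.CL,cs.LG,I.2.7

Download: http://arxiv.org/abs/2403.10576v2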

FRDiff : Feature Reuse for Universal Training-free Acceleration of Diffusion Models

The substantial computational costs of diffusion models, especially due to the repeated denoising steps necessary for high-quality image generation, present a major obstacle to their widespread adoption. While several studies have attempted to address this issue by reducing the number of score function evaluations (NFE) using advanced ODE solvers without fine-tuning, the decreased number of denoising iterations misses the opportunity to update fine details, resulting in noticeable quality degradation. In our work, we introduce an advanced acceleration technique that leverages the temporal redundancy inherent in diffusion models. Reusing feature maps with high temporal similarity opens up a new opportunity to save computation resources without compromising output quality. To realize the practical benefits of this intuition, we conduct an extensive analysis and propose a novel method, FRDiff. FRDiff is designed to harness the advantages of both reduced NFE and feature reuse, achieving a Pareto frontier that balances fidelity and latency trade-offs in various generative tasks.
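
The feature-reuse intuition in this abstract can be sketched as a simple cache across denoising steps. This is a minimal illustration, not the FRDiff method: the fixed `reuse_interval` schedule is an assumption made here for clarity, whereas the abstract only states that feature maps with high temporal similarity are reused. The names `denoise_with_reuse` and `fake_features` are hypothetical.

```python
from typing import Callable, List


def denoise_with_reuse(
    steps: int,
    reuse_interval: int,
    expensive_features: Callable[[int], float],
) -> List[float]:
    """Run `steps` denoising steps, recomputing features every `reuse_interval` steps."""
    outputs = []
    cached = None
    for step in range(steps):
        if cached is None or step % reuse_interval == 0:
            cached = expensive_features(step)  # full computation
        # Otherwise the feature cached at an earlier, similar step is reused.
        outputs.append(cached)
    return outputs


calls = []


def fake_features(step: int) -> float:
    """Stand-in for a costly network block; records each invocation."""
    calls.append(step)
    return float(step)


result = denoise_with_reuse(steps=6, reuse_interval=3, expensive_features=fake_features)
print(calls)   # [0, 3]
print(result)  # [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]
```

Here six steps trigger only two full computations. A larger interval saves more compute at the risk of using stale features, which is the fidelity versus latency trade-off the abstract mentions.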

Updated: 2024-04-02 08:40:54

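Title: FRDiff : Feature Reuse for Universal Training-free Acceleration of Diffusion Models

Abstract: The substantial computational cost of diffusion models, especially the repeated denoising steps required for high-quality image generation, is a major obstacle to their widespread adoption. Several studies have tried to address this issue by reducing the number of score function evaluations (NFE) with advanced ODE solvers that need no fine-tuning, but the reduced number of denoising iterations misses the opportunity to update fine details, leading to noticeable quality degradation. In our work, we introduce an advanced acceleration technique that exploits the temporal redundancy inherent in diffusion models. Reusing feature maps with high temporal similarity opens a new opportunity to save computation without compromising output quality. To realize the practical benefits of this intuition, we conduct an extensive analysis and propose a novel method, FRDiff. FRDiff is designed to combine the advantages of reduced NFE and feature reuse, achieving a Pareto frontier that balances fidelity and latency across various generative tasks.

Updated: 2024-04-02 08:40:54

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2312.03517v2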

Asymptotics of Language Model Alignment

Let $p$ denote a generative language model. Let $r$ denote a reward model that returns a scalar capturing the degree to which a draw from $p$ is preferred. The goal of language model alignment is to alter $p$ to a new distribution $\phi$ that results in a higher expected reward while keeping $\phi$ close to $p$. A popular alignment method is KL-constrained reinforcement learning (RL), which chooses a distribution $\phi_\Delta$ that maximizes $E_{\phi_{\Delta}} r(y)$ subject to a relative entropy constraint $KL(\phi_\Delta || p) \leq \Delta$. Another simple alignment method is best-of-$N$, where $N$ samples are drawn from $p$ and the one with the highest reward is selected. In this paper, we offer a closed-form characterization of the optimal KL-constrained RL solution. We demonstrate that any alignment method that achieves a comparable trade-off between KL divergence and reward must approximate the optimal KL-constrained RL solution in terms of relative entropy. To further analyze the properties of alignment methods, we introduce two simplifying assumptions: we let the language model be memoryless, and the reward model be linear. Although these assumptions may not reflect complex real-world scenarios, they enable a precise characterization of the asymptotic behavior of both the best-of-$N$ alignment and the KL-constrained RL method, in terms of information-theoretic quantities. We prove that the reward of the optimal KL-constrained RL solution satisfies a large deviation principle, and we fully characterize its rate function. We also show that the rate of growth of the scaled cumulants of the reward is characterized by a proper Renyi cross entropy. Finally, we show that best-of-$N$ is asymptotically equivalent to the KL-constrained RL solution by proving that their expected rewards are asymptotically equal, and concluding that the two distributions must be close in KL divergence.
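
The two alignment methods compared in this abstract can be made concrete on a tiny finite outcome space. The sketch below assumes the standard result that KL-regularized reward maximization is solved by an exponential tilt of $p$; it is offered as an illustration and not as the paper's exact characterization. The multiplier `lam` is a hypothetical parameter that would be tuned so that the KL budget $\Delta$ is met, the best-of-$N$ formula assumes all rewards are distinct, and the toy values of `p` and `r` are made up.

```python
import math
from typing import List


def tilt(p: List[float], r: List[float], lam: float) -> List[float]:
    """Exponentially tilted distribution: phi(y) proportional to p(y) * exp(lam * r(y))."""
    weights = [py * math.exp(lam * ry) for py, ry in zip(p, r)]
    total = sum(weights)
    return [w / total for w in weights]


def kl(phi: List[float], p: List[float]) -> float:
    """Relative entropy KL(phi || p) in nats."""
    return sum(q * math.log(q / py) for q, py in zip(phi, p) if q > 0)


def expected_reward(dist: List[float], r: List[float]) -> float:
    """Expected reward of a draw from `dist`."""
    return sum(q * ry for q, ry in zip(dist, r))


def best_of_n(p: List[float], r: List[float], n: int) -> List[float]:
    """Exact best-of-n output distribution, assuming all rewards are distinct."""
    order = sorted(range(len(p)), key=lambda i: r[i])
    out = [0.0] * len(p)
    cdf = 0.0
    for i in order:
        # P(the maximum reward among n draws belongs to outcome i).
        out[i] = (cdf + p[i]) ** n - cdf ** n
        cdf += p[i]
    return out


# Toy base model over three outcomes with hypothetical rewards.
p = [0.5, 0.3, 0.2]
r = [0.0, 1.0, 2.0]
phi = tilt(p, r, lam=1.0)
bon = best_of_n(p, r, n=2)
print(expected_reward(p, r), expected_reward(phi, r), expected_reward(bon, r))
print(kl(phi, p), kl(bon, p))
```

Both methods shift probability mass toward high-reward outcomes and pay for it in KL divergence from $p$, which is the trade-off the abstract analyzes.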

Updated: 2024-04-02 08:40:07

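Title: Asymptotics of Language Model Alignment

Abstract: Let $p$ denote a generative language model. Let $r$ denote a reward model that returns a scalar capturing the degree to which a draw from $p$ is preferred. The goal of language model alignment is to change $p$ into a new distribution $\phi$ that yields a higher expected reward while keeping $\phi$ close to $p$. A popular alignment method is KL-constrained reinforcement learning (RL), which chooses a distribution $\phi_\Delta$ that maximizes $E_{\phi_{\Delta}} r(y)$ subject to the relative entropy constraint $KL(\phi_\Delta || p) \leq \Delta$. Another simple alignment method is best-of-$N$, in which $N$ samples are drawn from $p$ and the one with the highest reward is selected. In this paper, we provide a closed-form characterization of the optimal KL-constrained RL solution. We show that any alignment method that achieves a comparable trade-off between KL divergence and reward must approximate the optimal KL-constrained RL solution in terms of relative entropy. To further analyze the properties of alignment methods, we introduce two simplifying assumptions: the language model is memoryless and the reward model is linear. Although these assumptions may not reflect complex real-world scenarios, they allow a precise characterization of the asymptotic behavior of both best-of-$N$ alignment and the KL-constrained RL method in terms of information-theoretic quantities. We prove that the reward of the optimal KL-constrained RL solution satisfies a large deviation principle, and we fully characterize its rate function. We also show that the growth rate of the scaled cumulants of the reward is characterized by a proper Renyi cross entropy. Finally, we show that best-of-$N$ is asymptotically equivalent to the KL-constrained RL solution by proving that their expected rewards are asymptotically equal and concluding that the two distributions must be close in KL divergence.

Updated: 2024-04-02 08:40:07

Categories: cs.LG,cs.IT,math.IT,stat.ML

Download: http://arxiv.org/abs/2404.01730v1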

Learned Kernels for Sparse, Interpretable, and Efficient Medical Time Series Processing

Background: Rapid, reliable, and accurate interpretation of medical signals is crucial for high-stakes clinical decision-making. The advent of deep learning allowed for an explosion of new models that offered unprecedented performance in medical time series processing but at a cost: deep learning models are often compute-intensive and lack interpretability. Methods: We propose Sparse Mixture of Learned Kernels (SMoLK), an interpretable architecture for medical time series processing. The method learns a set of lightweight flexible kernels to construct a single-layer neural network, providing not only interpretability, but also efficiency and robustness. We introduce novel parameter reduction techniques to further reduce the size of our network. We demonstrate the power of our architecture on two important tasks: photoplethysmography (PPG) artifact detection and atrial fibrillation detection from single-lead electrocardiograms (ECGs). Our approach has performance similar to the state-of-the-art deep neural networks with several orders of magnitude fewer parameters, allowing for deep neural network level performance with extremely low-power wearable devices. Results: Our interpretable method achieves greater than 99% of the performance of the state-of-the-art methods on the PPG artifact detection task, and even outperforms the state-of-the-art on a challenging out-of-distribution test set, while using dramatically fewer parameters (2% of the parameters of Segade, and about half of the parameters of Tiny-PPG). On single lead atrial fibrillation detection, our method matches the performance of a 1D-residual convolutional network, at less than 1% the parameter count, while exhibiting considerably better performance in the low-data regime, even when compared to a parameter-matched control deep network.

Updated: 2024-04-02 08:31:51

标题: 学习的核心：用于稀疏、可解释和高效的医学时间序列处理

摘要: 背景：快速、可靠和准确地解释医学信号对于高风险临床决策至关重要。深度学习的出现导致了大量新模型的爆发，这些模型在医学时间序列处理中提供了前所未有的性能，但代价是：深度学习模型通常需要大量计算资源，并且缺乏可解释性。方法：我们提出了Sparse Mixture of Learned Kernels（SMoLK），这是一种用于医学时间序列处理的可解释性架构。该方法学习一组轻量级灵活的内核来构建一个单层神经网络，不仅提供可解释性，还提供效率和鲁棒性。我们引入了新颖的参数减少技术，进一步减小了网络的大小。我们在两个重要任务上展示了我们的架构的强大能力：光电容测量（PPG）伪影检测和单导联心电图（ECG）中的心房颤动检测。我们的方法在PPG伪影检测任务上实现了超过99%的性能，甚至在具有挑战性的分布外测试集上超越了最先进的方法，同时使用的参数显著较少（Segade的参数的2%，Tiny-PPG的参数的约一半）。在单导联心房颤动检测方面，我们的方法与1D残差卷积网络的性能相匹配，参数数量不到1%，而且在低数据情况下表现出更好的性能，甚至与参数匹配的控制深度网络相比也是如此。

更新时间: 2024-04-02 08:31:51

领域: eess.SP,cs.AI,cs.LG

下载: http://arxiv.org/abs/2307.05385v3

Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation

Recent advances in generative diffusion models have enabled the previously unfeasible capability of generating 3D assets from a single input image or a text prompt. In this work, we aim to enhance the quality and functionality of these models for the task of creating controllable, photorealistic human avatars. We achieve this by integrating a 3D morphable model into the state-of-the-art multi-view-consistent diffusion approach. We demonstrate that accurate conditioning of a generative pipeline on the articulated 3D model enhances the baseline model performance on the task of novel view synthesis from a single image. More importantly, this integration facilitates a seamless and accurate incorporation of facial expression and body pose control into the generation process. To the best of our knowledge, our proposed framework is the first diffusion model to enable the creation of fully 3D-consistent, animatable, and photorealistic human avatars from a single image of an unseen subject; extensive quantitative and qualitative evaluations demonstrate the advantages of our approach over existing state-of-the-art avatar creation models on both novel view and novel expression synthesis tasks. The code for our project is publicly available.

Updated: 2024-04-02 08:29:09

标题: 可塑扩散：用于单图像头像创建的3D一致扩散

摘要: 最近发展的生成式扩散模型使得以前无法实现的能力，即从单个输入图像或文本提示生成3D资产。在这项工作中，我们旨在增强这些模型的质量和功能，用于创建可控、逼真的人类头像。我们通过将3D可变模型集成到最先进的多视角一致扩散方法中来实现这一目标。我们证明，在关节3D模型上准确地条件生成管道可以提高基线模型在从单个图像进行新视图合成任务上的性能。更重要的是，这种集成促进了面部表情和身体姿势控制的无缝和准确地融入到生成过程中。据我们所知，我们提出的框架是第一个能够从未见过的主体的单个图像中创建完全一致、可动画和逼真的人类头像的扩散模型；广泛的定量和定性评估展示了我们的方法在新视图和新表情合成任务上优于现有最先进的头像创建模型的优势。我们项目的代码是公开可用的。

更新时间: 2024-04-02 08:29:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2401.04728v2

RTPS Attack Dataset Description

This paper describes our RTPS datasets. We collect malicious and benign packet data by injecting attack data into an Unmanned Ground Vehicle (UGV) in its normal state. We assembled a testbed consisting of a UGV, a controller, a PC, and a router, and collected the dataset on the UGV part of the testbed. We conducted two types of attack on the testbed: "Command Injection" and "Command Injection with ARP Spoofing". The data collection times are 180, 300, 600, and 1200. Each collection time has 30 scenarios, 240 in total. We expect this dataset to contribute to the development of defense technologies such as anomaly detection that address security threats in ROS2 networks and Fast-DDS implementations.

Updated: 2024-04-02 08:28:31

标题: RTPS攻击数据集描述

摘要: 这篇论文详细介绍了我们的RTPS数据集。我们通过在无人地面车辆（UGV）的正常状态下注入攻击数据来收集恶意/良性数据包。我们组装了由UGV、控制器、个人电脑和路由器组成的实验平台。我们在实验平台的UGV部分收集了这些数据集。我们在实验平台上进行了两种类型的攻击，即“命令注入”和“带ARP欺骗的命令注入”。数据收集时间分别为180、300、600和1200。场景包括每个收集时间30个，总共240个。我们期望这个数据集能够为ROS2网络和Fast-DDS实现中的安全威胁问题提供异常检测等防御技术的发展做出贡献。

更新时间: 2024-04-02 08:28:31

领域: cs.CR

下载: http://arxiv.org/abs/2311.14496v4

Effective internal language model training and fusion for factorized transducer model

The internal language model (ILM) of the neural transducer has been widely studied. In most prior work, it is mainly used for estimating the ILM score, which is subsequently subtracted during inference to facilitate improved integration with external language models. Recently, various factorized transducer models have been proposed, which explicitly embrace a standalone internal language model for non-blank token prediction. However, even with the adoption of factorized transducer models, limited improvement has been observed compared to shallow fusion. In this paper, we propose a novel ILM training and decoding strategy for factorized transducer models, which effectively combines the blank, acoustic, and ILM scores. Our experiments show a 17% relative improvement over the standard decoding method when utilizing a well-trained ILM and the proposed decoding strategy on LibriSpeech datasets. Furthermore, when compared to a strong RNN-T baseline enhanced with external LM fusion, the proposed model yields a 5.5% relative improvement on general-sets and an 8.9% WER reduction for rare words. The proposed model can achieve superior performance without relying on external language models, rendering it highly efficient for production use-cases. To further improve the performance, we propose a novel and memory-efficient ILM-fusion-aware minimum word error rate (MWER) training method which improves ILM integration significantly.

Updated: 2024-04-02 08:01:05

标题: 有效的内部语言模型训练和融合用于因子化转录模型

摘要: 神经传导器的内部语言模型（ILM）已经得到广泛研究。在大多数先前的工作中，它主要用于估计ILM分数，并在推理过程中随后被减去，以促进与外部语言模型的更好集成。最近，已经提出了各种分解的传导器模型，明确采用独立的内部语言模型进行非空白标记预测。然而，即使采用了分解传导器模型，与浅层融合相比，也仅观察到有限的改进。在本文中，我们提出了一种新颖的ILM训练和解码策略，适用于分解传导器模型，有效地结合了空白、声学和ILM分数。我们的实验表明，在LibriSpeech数据集上，当利用经过良好训练的ILM和提出的解码策略时，相对于标准解码方法，我们实现了17%的相对改进。此外，与增强外部LM融合的强RNN-T基线相比，所提出的模型在一般集合上实现了5.5%的相对改进，对于罕见单词实现了8.9%的WER减少。所提出的模型可以在不依赖外部语言模型的情况下实现卓越性能，使其在生产用例中高效使用。为了进一步提高性能，我们提出了一种新颖且内存高效的ILM融合感知最小词错误率（MWER）训练方法，显著改善了ILM集成。

更新时间: 2024-04-02 08:01:05

领域: eess.AS,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.01716v1

A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface

Achieving both high performance and power efficiency for data-intensive tasks in the von Neumann architecture is challenging due to the memory wall bottleneck. Computing-in-memory (CiM) is a promising mitigation approach that enables parallel in-situ multiply-accumulate (MAC) operations within the memory, with support from the peripheral interface and datapath. SRAM-based charge-domain CiM (CD-CiM) has shown its potential for enhanced power efficiency and computing accuracy. However, existing SRAM-based CD-CiM faces scaling challenges in meeting the throughput requirement of high-performance multi-bit-quantization applications. This paper presents an SRAM-based high-throughput ReLU-optimized CD-CiM macro. It is capable of completing MAC and ReLU of two signed 8b vectors in one CiM cycle with only one A/D conversion. Along with non-linearity compensation for the analog computing and A/D conversion interfaces, this work achieves 51.2 GOPS throughput and 10.3 TOPS/W energy efficiency, while showing 88.6% accuracy on the CIFAR-10 dataset.
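As a rough sanity check on the reported figures, dividing throughput by energy efficiency gives the power the macro would draw. The sketch below assumes both numbers refer to the same operating point, which the abstract does not state; the function name is illustrative only.

```python
def implied_power_mw(throughput_gops: float, efficiency_tops_per_w: float) -> float:
    """Power in milliwatts implied by a throughput (GOPS) and an efficiency (TOPS/W)."""
    ops_per_second = throughput_gops * 1e9        # operations per second
    ops_per_joule = efficiency_tops_per_w * 1e12  # operations per joule
    watts = ops_per_second / ops_per_joule        # (ops/s) / (ops/J) = J/s = W
    return watts * 1e3


# Figures quoted in the abstract; same-operating-point is an assumption.
power = implied_power_mw(51.2, 10.3)
print(f"{power:.2f} mW")
```

Under that assumption the implied power is on the order of 5 mW.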

Updated: 2024-04-02 07:58:41

标题: 一种基于65纳米8比特激活8比特权重的基于SRAM的电荷域计算内存宏，采用全并行模拟加法器网络和单ADC接口

摘要: 在冯诺依曼架构中执行数据密集任务具有挑战性，因为由于内存墙瓶颈，很难实现高性能和功耗效率。内存中计算（CiM）是一种有希望的缓解方法，通过在内存中支持外围接口和数据通路，实现并行原位乘累加（MAC）操作。基于SRAM的电荷域CiM（CD-CiM）已经显示出其提高功耗效率和计算精度的潜力。然而，现有的基于SRAM的CD-CiM面临着扩展挑战，以满足高性能多位量化应用的吞吐量要求。本文提出了一种基于SRAM的高吞吐量ReLU优化CD-CiM宏。它能够在一个CiM周期内完成两个有符号8位向量的MAC和ReLU，仅需一次A/D转换。通过对模拟计算和A/D转换接口进行非线性补偿，这项工作实现了51.2GOPS的吞吐量和10.3TOPS/W的能效，同时在CIFAR-10数据集中展示了88.6%的准确性。

更新时间: 2024-04-02 07:58:41

领域: cs.AR,cs.LG

下载: http://arxiv.org/abs/2212.04320v2

Faked Speech Detection with Zero Prior Knowledge

Audio is one of the most common means of human communication, but it can also be easily misused to trick people. With the AI revolution, the related technologies are now accessible to almost everyone, making it simple for criminals to commit crimes and forgeries. In this work, we introduce a neural network method to develop a classifier that blindly classifies an input audio as real or mimicked; the word 'blindly' refers to the ability to detect mimicked audio without references or real sources. We propose a deep neural network following a sequential model that comprises three hidden layers, with alternating dense and dropout layers. The proposed model was trained on a set of 26 important features extracted from a large dataset of audios to obtain a classifier, which was tested on the same set of features from different audios. The data was extracted from two raw datasets composed especially for this work: an all-English dataset and a mixed dataset (Arabic plus English). (The dataset can be provided, in raw form, on email request to the first author.) For comparison, the audios were also classified through human inspection, with native speakers as the subjects. The resulting accuracy was high: we obtained at least 94% correct classification of the test cases, against 85% accuracy for human observers.

Updated: 2024-04-02 07:58:15

标题: 使用零先验知识进行伪造语音检测

摘要: 音频是人类交流中最常用的方式之一，但同时也很容易被滥用来欺骗人们。随着人工智能的革命，相关技术现在几乎所有人都可以访问，这使得犯罪分子可以轻松地犯罪和伪造。在这项工作中，我们介绍了一种神经网络方法，开发了一个分类器，可以盲目将输入的音频分类为真实或模仿；这里的“盲目”指的是在没有参考或真实来源的情况下检测模仿音频的能力。我们提出了一个基于顺序模型的深度神经网络，包括三个隐藏层，交替使用密集层和丢失层。所提出的模型是在从大量音频数据集中提取的26个重要特征集上进行训练的，以获得一个分类器，该分类器在来自不同音频的同一特征集上进行了测试。数据是从两个原始数据集中提取的，专门为这项工作而准备的；一个是全英文数据集，另一个是混合数据集（阿拉伯语加英语）（数据集可以通过给第一作者写邮件以原始形式提供）。为了进行比较，音频也通过母语使用者的人类检查进行分类。结果令人感兴趣，展示了令人瞩目的准确性，我们至少能够在测试案例中获得94%的正确分类，而人类观察者的准确率为85%。

更新时间: 2024-04-02 07:58:15

领域: cs.SD,cs.AI,cs.LG,cs.MM,cs.NE,eess.AS,68T05, 68T07, 68T10,I.2; I.5; I.m

下载: http://arxiv.org/abs/2209.12573v6

Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep Learning

Training deep neural networks is a challenging task. In order to speed up training and enhance the performance of deep neural networks, we rectify the vanilla conjugate gradient as conjugate-gradient-like and incorporate it into the generic Adam, and thus propose a new optimization algorithm named CG-like-Adam for deep learning. Specifically, both the first-order and the second-order moment estimation of generic Adam are replaced by the conjugate-gradient-like. Convergence analysis handles the cases where the exponential moving average coefficient of the first-order moment estimation is constant and the first-order moment estimation is unbiased. Numerical experiments show the superiority of the proposed algorithm based on the CIFAR10/100 dataset.

Updated: 2024-04-02 07:57:17

标题: 深度学习中基于共轭梯度的自适应动量估计优化算法

摘要: 训练深度神经网络是一项具有挑战性的任务。为了加快训练速度并提高深度神经网络的性能，我们将传统的共轭梯度修正为类共轭梯度，并将其融入通用的Adam优化算法中，从而提出了一种名为CG-like-Adam的新优化算法用于深度学习。具体来说，通用Adam的一阶和二阶矩估计被共轭梯度样式所替代。收敛分析处理了一阶矩估计的指数移动平均系数是常数且一阶矩估计是无偏的情况。数值实验显示了基于CIFAR10/100数据集的所提出算法的优越性。

更新时间: 2024-04-02 07:57:17

领域: cs.LG,cs.AI,cs.CV,math.OC

下载: http://arxiv.org/abs/2404.01714v1

GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability

Evaluating and enhancing the general capabilities of large language models (LLMs) has been an important research topic. Graphs are a common data structure in the real world, and understanding graph data is a crucial part of advancing general intelligence. To evaluate and enhance the graph understanding abilities of LLMs, in this paper we propose a benchmark named GraphInstruct, which comprehensively includes 21 classical graph reasoning tasks, providing diverse graph generation pipelines and detailed reasoning steps. Based on GraphInstruct, we further construct GraphLM through efficient instruction-tuning, which shows prominent graph understanding capability. In order to enhance the LLM with graph reasoning capability as well, we propose a step mask training strategy and construct a model named GraphLM+. As one of the pioneering efforts to enhance the graph understanding and reasoning abilities of LLMs, extensive experiments have demonstrated the superiority of GraphLM and GraphLM+ over other LLMs. We look forward to more researchers exploring the potential of LLMs in the graph data mining domain through GraphInstruct. Our code for generating GraphInstruct is released publicly at: https://github.com/CGCL-codes/GraphInstruct.

Updated: 2024-04-02 07:57:16

标题: GraphInstruct：赋予大型语言模型图理解和推理能力

摘要: 评估和增强大型语言模型（LLMs）的通用能力是一个重要的研究课题。图是现实世界中常见的数据结构，理解图数据是推动通用智能的关键部分。为了评估和增强LLMs的图理解能力，在本文中，我们提出了一个名为GraphInstruct的基准测试，全面包括21个经典的图推理任务，提供多样化的图生成管道和详细的推理步骤。基于GraphInstruct，我们进一步通过高效的指令调整构建了GraphLM，显示出显著的图理解能力。为了进一步增强LLM的图推理能力，我们提出了一种步骤掩码训练策略，并构建了一个名为GraphLM+的模型。作为增强LLMs的图理解和推理能力的先驱努力之一，广泛的实验表明GraphLM和GraphLM+在其他LLMs上的优越性。我们期待更多的研究人员通过GraphInstruct探索LLMs在图数据挖掘领域的潜力。我们公开发布了用于生成GraphInstruct的代码：https://github.com/CGCL-codes/GraphInstruct。

更新时间: 2024-04-02 07:57:16

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2403.04483v2

Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G

Over the past two decades, the Internet-of-Things (IoT) has been a transformative concept, and as we approach 2030, a new paradigm known as the Internet of Senses (IoS) is emerging. Unlike conventional Virtual Reality (VR), IoS seeks to provide multi-sensory experiences, acknowledging that in our physical reality, our perception extends far beyond just sight and sound; it encompasses a range of senses. This article explores existing technologies driving immersive multi-sensory media, delving into their capabilities and potential applications. This exploration includes a comparative analysis between conventional immersive media streaming and a proposed use case that leverages semantic communication empowered by generative Artificial Intelligence (AI). The focal point of this analysis is the substantial reduction in bandwidth consumption by 99.93% in the proposed scheme. Through this comparison, we aim to underscore the practical applications of generative AI for immersive media while addressing the challenges and outlining future trajectories.

Updated: 2024-04-02 07:57:05

标题: 生成式人工智能用于沉浸式通信：通过6G实现感官互联网的下一个前沿

摘要: 在过去的二十年中，物联网（IoT）已经是一个变革性的概念，当我们逼近2030年时，一个名为感知互联网（IoS）的新范式正在崛起。与传统虚拟现实（VR）不同，IoS旨在提供多感官体验，承认在我们的现实中，我们的感知远远不止视觉和听觉；它囊括了一系列感官。本文探讨了推动沉浸式多感官媒体的现有技术，深入探讨它们的能力和潜在应用。这一探索包括传统沉浸式媒体流媒体与一种借助生成式人工智能（AI）赋能的语义通信的提议用例之间的比较分析。这个分析的焦点是在提议方案中带来的带宽消耗大幅减少了99.93%。通过这种比较，我们旨在强调生成式AI在沉浸式媒体中的实际应用，同时解决挑战并概述未来的发展方向。

更新时间: 2024-04-02 07:57:05

领域: cs.CL,cs.AI,cs.HC,cs.MM,cs.NI

下载: http://arxiv.org/abs/2404.01713v1

Efficient Online Unlearning via Hessian-Free Recollection of Individual Data Statistics

Machine unlearning strives to uphold the data owners' right to be forgotten by enabling models to selectively forget specific data. Recent methods suggest that one approach to data forgetting is to precompute and store statistics carrying second-order information, improving computational and memory efficiency. However, they rely on restrictive assumptions, and their computation and storage suffer from the curse of model parameter dimensionality, making them challenging to apply to most deep neural networks. In this work, we propose a Hessian-free online unlearning method. We propose to maintain a statistical vector for each data point, computed through an affine stochastic recursion approximation of the difference between the retrained and learned models. Our proposed algorithm achieves near-instantaneous online unlearning, as it requires only a vector addition operation. By recollecting statistics for the data to be forgotten, the proposed method significantly reduces the unlearning runtime. Experimental studies demonstrate that the proposed scheme surpasses existing results by orders of magnitude in terms of time and memory costs, while also enhancing accuracy.
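The "only a vector addition" step can be pictured with a minimal sketch. It assumes one precomputed vector per data point that approximates the shift from the learned parameters to the parameters retrained without that point. How those vectors are obtained (the affine stochastic recursion) is not shown, and the class name, method names, and placeholder values are hypothetical, not the authors' code.

```python
import numpy as np


class VectorAdditionUnlearner:
    """Toy illustration: forgetting a sample adds its stored vector to the parameters."""

    def __init__(self, params, per_sample_vectors):
        # Copy so the caller's array is never modified in place.
        self.params = np.asarray(params, dtype=float).copy()
        # Assumption: each vector approximates (retrained params - learned params).
        self.vectors = {k: np.asarray(v, dtype=float) for k, v in per_sample_vectors.items()}

    def forget(self, sample_id):
        # Online unlearning step: a single vector addition, no Hessian, no retraining.
        self.params += self.vectors.pop(sample_id)
        return self.params


# Placeholder numbers purely for illustration.
unlearner = VectorAdditionUnlearner(
    params=[1.0, 2.0],
    per_sample_vectors={"sample_a": [0.1, -0.2], "sample_b": [0.0, 0.3]},
)
updated = unlearner.forget("sample_a")
print(updated)
```

Popping the vector after use reflects the idea that a sample can only be forgotten once; a real system would also need to decide how the remaining vectors are kept consistent, which the abstract does not describe.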

Updated: 2024-04-02 07:54:18

标题: 高效的在线反学习：通过无Hessian回忆个体数据统计量

摘要: 机器遗忘致力于维护数据所有者被遗忘的权利，通过使模型能够选择性地遗忘特定数据。最近的方法表明，一种数据遗忘的方法是通过预先计算和存储携带二阶信息的统计数据，以提高计算和内存效率。然而，它们依赖于限制性假设，并且计算/存储受到模型参数维度的诅咒影响，使得难以应用于大多数深度神经网络。在这项工作中，我们提出了一种不需要Hessian的在线遗忘方法。我们建议为每个数据点维护一个统计向量，通过重新训练和学习模型之间的差异的仿射随机递归近似计算得出。我们提出的算法实现了几乎即时的在线遗忘，因为它只需要进行向量加法操作。根据重新收集遗忘数据统计信息的策略，所提出的方法显著减少了遗忘运行时间。实验研究表明，所提出的方案在时间和内存成本方面比现有结果提高了数个数量级，同时还提高了准确性。

更新时间: 2024-04-02 07:54:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.01712v1

Upsample Guidance: Scale Up Diffusion Models without Training

Diffusion models have demonstrated superior performance across various generative tasks including images, videos, and audio. However, they encounter difficulties in directly generating high-resolution samples. Previously proposed solutions to this issue involve modifying the architecture, further training, or partitioning the sampling process into multiple stages. These methods have the limitation of not being able to directly utilize pre-trained models as-is, requiring additional work. In this paper, we introduce upsample guidance, a technique that adapts a pretrained diffusion model (e.g., $512^2$) to generate higher-resolution images (e.g., $1536^2$) by adding only a single term in the sampling process. Remarkably, this technique does not necessitate any additional training or reliance on external models. We demonstrate that upsample guidance can be applied to various models, such as pixel-space, latent space, and video diffusion models. We also observed that the proper selection of guidance scale can improve image quality, fidelity, and prompt alignment.

Updated: 2024-04-02 07:49:08

标题: 上采样指导：在不训练的情况下扩大扩散模型

摘要: 扩散模型已经在包括图像、视频和音频在内的各种生成任务中表现出优越性能。然而，它们在直接生成高分辨率样本时遇到困难。先前提出的解决此问题的方法包括修改架构、进一步训练，或将采样过程分为多个阶段。这些方法的局限性在于不能直接使用预训练模型，需要额外的工作。在本文中，我们介绍了上采样引导技术，这是一种通过在采样过程中添加一个单一项来调整预训练扩散模型（如$512^2$）以生成更高分辨率图像（如$1536^2$）的技术。值得注意的是，这种技术不需要任何额外的训练或依赖外部模型。我们证明了上采样引导可以应用于各种模型，如像素空间、潜在空间和视频扩散模型。我们还观察到，正确选择引导尺度可以改善图像质量、保真度和及时对齐。

更新时间: 2024-04-02 07:49:08

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.01709v1

CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization

Generative Adversarial Networks (GANs) have significantly advanced image generation, but their performance heavily depends on abundant training data. In scenarios with limited data, GANs often struggle with discriminator overfitting and unstable training. Batch Normalization (BN), despite being known for enhancing generalization and training stability, has rarely been used in the discriminator of Data-Efficient GANs. Our work addresses this gap by identifying a critical flaw in BN: the tendency for gradient explosion during the centering and scaling steps. To tackle this issue, we present CHAIN (lipsCHitz continuity constrAIned Normalization), which replaces the conventional centering step with zero-mean regularization and integrates a Lipschitz continuity constraint in the scaling step. CHAIN further enhances GAN training by adaptively interpolating the normalized and unnormalized features, effectively avoiding discriminator overfitting. Our theoretical analyses firmly establish CHAIN's effectiveness in reducing gradients in latent features and weights, improving stability and generalization in GAN training. Empirical evidence supports our theory. CHAIN achieves state-of-the-art results in data-limited scenarios on CIFAR-10/100, ImageNet, five low-shot and seven high-resolution few-shot image datasets.

Updated: 2024-04-02 07:15:34

标题: CHAIN：通过Lipschitz连续性约束归一化增强数据高效GAN的泛化

摘要: 生成对抗网络（GANs）显著推动了图像生成技术的发展，但它们的性能在很大程度上取决于丰富的训练数据。在数据有限的情况下，GANs经常面临鉴别器过拟合和训练不稳定的困难。尽管批量归一化（BN）以增强泛化和训练稳定性而闻名，但在数据高效的GANs的鉴别器中很少被使用。我们的工作弥补了这一空白，通过识别BN中的一个关键缺陷：在中心化和缩放步骤中存在梯度爆炸的倾向。为了解决这个问题，我们提出了CHAIN（lipsCHitz continuity constrAIned Normalization），它用零均值正则化替代传统的中心化步骤，并在缩放步骤中整合了Lipschitz连续性约束。CHAIN通过自适应插值规范化和非规范化特征，有效避免了鉴别器过拟合，进一步增强了GAN的训练。我们的理论分析坚定地证实了CHAIN在减少潜在特征和权重中的梯度，改善GAN训练的稳定性和泛化能力的有效性。经验证据支持了我们的理论。CHAIN在CIFAR-10/100、ImageNet、五个低样本和七个高分辨率少样本图像数据集的数据有限场景中取得了最先进的结果。

更新时间: 2024-04-02 07:15:34

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2404.00521v2

Few-shot Link Prediction on N-ary Facts

Hyper-relational facts, which consist of a primary triple (head entity, relation, tail entity) and auxiliary attribute-value pairs, are widely present in real-world Knowledge Graphs (KGs). Link Prediction on Hyper-relational Facts (LPHFs) is to predict a missing element in a hyper-relational fact, which helps populate and enrich KGs. However, existing LPHFs studies usually require a large amount of high-quality data. They overlook few-shot relations, which have limited instances yet are common in real-world scenarios. Thus, we introduce a new task, Few-Shot Link Prediction on Hyper-relational Facts (FSLPHFs). It aims to predict a missing entity in a hyper-relational fact with limited support instances. To tackle FSLPHFs, we propose MetaRH, a model that learns Meta Relational information in Hyper-relational facts. MetaRH comprises three modules: relation learning, support-specific adjustment, and query inference. By capturing meta relational information from limited support instances, MetaRH can accurately predict the missing entity in a query. As there is no existing dataset available for this new task, we construct three datasets to validate the effectiveness of MetaRH. Experimental results on these datasets demonstrate that MetaRH significantly outperforms existing representative models.

Updated: 2024-04-02 07:11:01

标题: 少样本链接预测在N元事实上的应用

摘要: 超关系事实由一个主要三元组（头实体、关系、尾实体）和辅助属性-值对组成，在现实世界的知识图谱（KGs）中广泛存在。在超关系事实上的链接预测（LPHFs）是为了预测一个超关系事实中缺失的元素，从而帮助填充和丰富KGs。然而，现有的LPHFs研究通常需要大量高质量数据。他们忽视了少样本关系，这些关系的实例有限，但在现实场景中很常见。因此，我们引入了一个新任务，即针对超关系事实的少样本链接预测（FSLPHFs）。它旨在预测一个在具有有限支持实例的超关系事实中缺失的实体。为了解决FSLPHFs，我们提出了MetaRH，一个学习超关系事实中元关系信息的模型。MetaRH包括三个模块：关系学习、支持特定调整和查询推理。通过从有限支持实例中捕获元关系信息，MetaRH能够准确预测查询中缺失的实体。由于目前没有可用于这一新任务的现有数据集，我们构建了三个数据集来验证MetaRH的有效性。对这些数据集的实验结果表明，MetaRH明显优于现有的代表性模型。

更新时间: 2024-04-02 07:11:01

领域: cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2305.06104v3

Distributional Drift Adaptation with Temporal Conditional Variational Autoencoder for Multivariate Time Series Forecasting

Because real-world multivariate time series (MTS) are non-stationary, their distribution changes over time, a phenomenon known as distribution drift. Most existing MTS forecasting models suffer greatly from distribution drift, and their forecasting performance degrades over time. Existing methods address distribution drift by adapting to the latest arrived data or by self-correcting according to the meta knowledge derived from future data. Despite their great success in MTS forecasting, these methods hardly capture the intrinsic distribution changes, especially from a distributional perspective. Accordingly, we propose a novel framework, the temporal conditional variational autoencoder (TCVAE), to model the dynamic distributional dependencies over time between historical observations and future data in MTSs and infer the dependencies as a temporal conditional distribution to leverage latent variables. Specifically, a novel temporal Hawkes attention mechanism represents temporal factors, which are subsequently fed into feed-forward networks to estimate the prior Gaussian distribution of latent variables. The representation of temporal factors further dynamically adjusts the structures of the Transformer-based encoder and decoder to distribution changes by leveraging a gated attention mechanism. Moreover, we introduce conditional continuous normalization flow to transform the prior Gaussian to a complex and form-free distribution to facilitate flexible inference of the temporal conditional distribution. Extensive experiments conducted on six real-world MTS datasets demonstrate the TCVAE's superior robustness and effectiveness over the state-of-the-art MTS forecasting baselines. We further illustrate the TCVAE's applicability through multifaceted case studies and visualization in real-world scenarios.

Updated: 2024-04-02 06:58:50

标题: 多元时间序列预测中基于时间条件变分自动编码器的分布漂移适应性

摘要: 由于实际多变量时间序列（MTS）的分布在时间上是非平稳的，会发生变化，这种现象被称为分布漂移。大多数现有的MTS预测模型在处理分布漂移时遭受重创，并随着时间推移降低了预测性能。现有方法通过适应最新到达的数据或根据未来数据提取的元知识进行自我校正来解决分布漂移。尽管这些方法在MTS预测方面取得了巨大成功，但它们很少能够从分布的角度捕捉内在的分布变化。因此，我们提出了一个新颖的框架，称为时间条件变分自动编码器（TCVAE），用于建模MTS中历史观察和未来数据之间的动态分布依赖关系，并将这些依赖关系推断为一个时间条件分布，以利用潜变量。具体来说，一种新颖的时间Hawkes注意机制表示时间因素，随后馈入前馈网络以估计潜变量的先验高斯分布。时间因素的表示进一步通过利用门控注意机制来动态调整基于Transformer的编码器和解码器的结构，以应对分布变化。此外，我们引入了条件连续归一化流，将先验高斯分布转换为一种复杂且无形式的分布，以便灵活推断时间条件分布。对六个真实世界的MTS数据集进行的大量实验表明，TCVAE在MTS预测基线方面具有卓越的鲁棒性和有效性。我们通过多方面的案例研究和在真实场景中的可视化进一步说明了TCVAE的适用性。

更新时间: 2024-04-02 06:58:50

领域: cs.LG,68Txx,I.2.6

下载: http://arxiv.org/abs/2209.00654v4

Preventing Model Collapse in Gaussian Process Latent Variable Models

Gaussian process latent variable models (GPLVMs) are a versatile family of unsupervised learning models, commonly used for dimensionality reduction. However, common challenges in modeling data with GPLVMs include inadequate kernel flexibility and improper selection of the projection noise, which leads to a type of model collapse characterized primarily by vague latent representations that do not reflect the underlying structure of the data. This paper addresses these issues by, first, theoretically examining the impact of the projection variance on model collapse through the lens of a linear GPLVM. Second, we address the problem of model collapse due to inadequate kernel flexibility by integrating the spectral mixture (SM) kernel and a differentiable random Fourier feature (RFF) kernel approximation, which ensures computational scalability and efficiency through off-the-shelf automatic differentiation tools for learning the kernel hyperparameters, projection variance, and latent representations within the variational inference framework. The proposed GPLVM, named advisedRFLVM, is evaluated across diverse datasets and consistently outperforms various salient competing models, including state-of-the-art variational autoencoders (VAEs) and GPLVM variants, in terms of informative latent representations and missing data imputation.

Updated: 2024-04-02 06:58:41

标题: 在高斯过程潜变量模型中防止模型崩溃

摘要: 高斯过程潜变量模型（GPLVMs）是一种多功能的无监督学习模型，通常用于降维。然而，在使用GPLVMs对数据建模时常见的挑战包括核灵活性不足和投影噪声选择不当，这导致一种模型崩溃，主要特征是模糊的潜在表示不能反映数据的潜在结构。本文首先通过线性GPLVM的视角从理论上研究了投影方差对模型崩溃的影响。其次，我们通过整合谱混合（SM）核和可微随机傅立叶特征（RFF）核逼近解决由于核灵活性不足导致的模型崩溃问题，这确保了通过现成的自动微分工具学习核超参数、投影方差和潜在表示在变分推断框架内的计算可伸缩性和效率。所提出的GPLVM，命名为advisedRFLVM，在多样的数据集上进行评估，并在信息丰富的潜在表示和缺失数据填补方面始终表现优于各种突出的竞争模型，包括最先进的变分自动编码器（VAEs）和GPLVM变体。

更新时间: 2024-04-02 06:58:41

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2404.01697v1

Selective Temporal Knowledge Graph Reasoning

Temporal Knowledge Graph (TKG), which characterizes temporally evolving facts in the form of (subject, relation, object, timestamp), has attracted much attention recently. TKG reasoning aims to predict future facts based on given historical ones. However, existing TKG reasoning models are unable to abstain from predictions they are uncertain about, which will inevitably bring risks in real-world applications. Thus, in this paper, we propose an abstention mechanism for TKG reasoning, which helps the existing models make selective, instead of indiscriminate, predictions. Specifically, we develop a confidence estimator, called Confidence Estimator with History (CEHis), to enable the existing TKG reasoning models to first estimate their confidence in making predictions, and then abstain from those with low confidence. To do so, CEHis takes two kinds of information into consideration, namely, the certainty of the current prediction and the accuracy of historical predictions. Experiments with representative TKG reasoning models on two benchmark datasets demonstrate the effectiveness of the proposed CEHis.
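The abstention idea can be illustrated with a generic selective-prediction sketch. This is not CEHis itself: the weighted average used to combine the two signals, the 0.5 fallback for an empty history, and every name and threshold below are assumptions chosen only to show the control flow of "estimate confidence, then predict or abstain".

```python
from typing import Optional, Sequence


def selective_predict(
    scores: Sequence[float],
    history_correct: Sequence[bool],
    threshold: float = 0.6,
    weight: float = 0.5,
) -> Optional[int]:
    """Return the index of the top-scoring candidate, or None to abstain.

    scores: model probabilities over candidate objects for the current query.
    history_correct: whether earlier predictions for similar queries were right.
    """
    certainty = max(scores)  # certainty of the current prediction
    # Accuracy of historical predictions; 0.5 when no history exists (assumption).
    hist_acc = sum(history_correct) / len(history_correct) if history_correct else 0.5
    confidence = weight * certainty + (1.0 - weight) * hist_acc
    if confidence < threshold:
        return None  # abstain on low confidence
    return max(range(len(scores)), key=lambda i: scores[i])


confident = selective_predict([0.9, 0.05, 0.05], [True, True, True, False])
unsure = selective_predict([0.4, 0.35, 0.25], [False, False, True, False])
print(confident, unsure)
```

In the first call the combined confidence is 0.825 and the top candidate is returned; in the second it is 0.325, so the function abstains.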

Updated: 2024-04-02 06:56:21

标题: 选择性时间知识图推理

摘要: 时间知识图谱（TKG）以（主体，关系，客体，时间戳）的形式表征了随时间演变的事实，最近引起了广泛关注。TKG推理旨在基于给定的历史事实预测未来事实。然而，现有的TKG推理模型无法避免对其不确定性的预测，这将不可避免地在现实世界的应用中带来风险。因此，在本文中，我们提出了一种适用于TKG推理的弃权机制，可以帮助现有模型进行选择性而非无差别的预测。具体来说，我们开发了一种置信度估计器，称为带有历史信息的置信度估计器（CEHis），使现有的TKG推理模型首先能够估计其在进行预测时的置信度，然后放弃那些置信度较低的预测。为此，CEHis考虑了两种信息，即当前预测的确定性和历史预测的准确性。在两个基准数据集上对代表性TKG推理模型进行的实验证明了所提出的CEHis的有效性。

更新时间: 2024-04-02 06:56:21

领域: cs.LG

下载: http://arxiv.org/abs/2404.01695v1

HeMeNet: Heterogeneous Multichannel Equivariant Network for Protein Multitask Learning

Understanding and leveraging the 3D structures of proteins is central to a variety of biological and drug discovery tasks. While deep learning has been applied successfully for structure-based protein function prediction tasks, current methods usually employ distinct training for each task. However, each of the tasks is of small size, and such a single-task strategy hinders the models' performance and generalization ability. As some labeled 3D protein datasets are biologically related, combining multi-source datasets for larger-scale multi-task learning is one way to overcome this problem. In this paper, we propose a neural network model to address multiple tasks jointly upon the input of 3D protein structures. In particular, we first construct a standard structure-based multi-task benchmark called Protein-MT, consisting of 6 biologically relevant tasks, including affinity prediction and property prediction, integrated from 4 public datasets. Then, we develop a novel graph neural network for multi-task learning, dubbed Heterogeneous Multichannel Equivariant Network (HeMeNet), which is E(3) equivariant and able to capture heterogeneous relationships between different atoms. Besides, HeMeNet can achieve task-specific learning via the task-aware readout mechanism. Extensive evaluations on our benchmark verify the effectiveness of multi-task learning, and our model generally surpasses state-of-the-art models.

Updated: 2024-04-02 06:53:45

标题: HeMeNet：用于蛋白质多任务学习的异质多通道等变网络

摘要: 理解和利用蛋白质的三维结构对于各种生物学和药物发现任务至关重要。虽然深度学习在基于结构的蛋白质功能预测任务中已成功应用，但当前方法通常针对每个任务单独进行训练。然而，每个任务的规模都很小，这种单一任务策略限制了模型的性能和泛化能力。由于一些带标签的3D蛋白质数据集在生物学上相关，将多源数据集结合起来进行更大规模的多任务学习是克服这个问题的一种方法。在本文中，我们提出了一个神经网络模型，以3D蛋白质结构为输入共同解决多个任务。特别是，我们首先构建了一个名为Protein-MT的标准基于结构的多任务基准，包括来自4个公共数据集的6个生物相关任务，包括亲和力预测和属性预测。然后，我们开发了一种新颖的图神经网络用于多任务学习，称为异质多通道等变网络（HeMeNet），它是E（3）等变的，并能够捕捉不同原子之间的异质关系。此外，HeMeNet可以通过任务感知的读出机制实现任务特定学习。在我们的基准测试中进行了广泛评估，验证了多任务学习的有效性，我们的模型通常优于最先进的模型。

更新时间: 2024-04-02 06:53:45

领域: cs.LG

下载: http://arxiv.org/abs/2404.01693v1

Can LLMs get help from other LLMs without revealing private information?

Cascades are a common type of machine learning system in which a large, remote model can be queried if a local model is not able to accurately label a user's data by itself. Serving stacks for large language models (LLMs) increasingly use cascades due to their ability to preserve task performance while dramatically reducing inference costs. However, applying cascade systems in situations where the local model has access to sensitive data constitutes a significant privacy risk for users since such data could be forwarded to the remote model. In this work, we show the feasibility of applying cascade systems in such setups by equipping the local model with privacy-preserving techniques that reduce the risk of leaking private information when querying the remote model. To quantify information leakage in such setups, we introduce two privacy measures. We then propose a system that leverages the recently introduced social learning paradigm in which LLMs collaboratively learn from each other by exchanging natural language. Using this paradigm, we demonstrate on several datasets that our methods minimize the privacy loss while at the same time improving task performance compared to a non-cascade baseline.
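A cascade with a sanitizing step before the remote call might be sketched as follows. The two models are toy stand-ins, and the digit-masking sanitizer is a deliberately simple placeholder rather than one of the paper's privacy-preserving techniques or a real privacy guarantee; all names and the threshold are hypothetical.

```python
import re
from typing import Callable, List, Tuple


def redact_digits(text: str) -> str:
    """Toy sanitizer: mask every digit. A placeholder, not a privacy guarantee."""
    return re.sub(r"\d", "#", text)


def cascade(
    query: str,
    local_model: Callable[[str], Tuple[str, float]],
    remote_model: Callable[[str], str],
    threshold: float = 0.8,
    sanitize: Callable[[str], str] = redact_digits,
) -> Tuple[str, str]:
    """Answer locally when confident; otherwise send a sanitized query to the remote model."""
    label, confidence = local_model(query)
    if confidence >= threshold:
        return label, "local"
    # Only the sanitized text ever leaves the device.
    return remote_model(sanitize(query)), "remote"


remote_log: List[str] = []  # records what the remote model actually receives


def toy_local(text: str) -> Tuple[str, float]:
    return ("greeting", 0.95) if text.lower().startswith("hello") else ("unknown", 0.2)


def toy_remote(text: str) -> str:
    remote_log.append(text)
    return "billing"


print(cascade("Hello there", toy_local, toy_remote))
print(cascade("Charge card 4111 again", toy_local, toy_remote))
print(remote_log)
```

Logging what the remote model receives makes it easy to check that the raw query never leaves the local side, which is the property a privacy measure would then quantify.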

Updated: 2024-04-02 06:49:33

标题: LLM们能否在不透露私人信息的情况下得到其他LLMs的帮助？

摘要: 级联是一种常见的机器学习系统类型，其中如果本地模型无法准确标记用户数据，则可以查询一个大型远程模型。用于大型语言模型(LLMs)的Serving堆栈越来越多地使用级联，因为它们能够在显著降低推理成本的同时保持任务性能。然而，在本地模型可以访问敏感数据的情况下应用级联系统构成对用户的重大隐私风险，因为这些数据可能被转发到远程模型。在这项工作中，我们展示了通过为本地模型配备隐私保护技术来降低查询远程模型时泄露私人信息风险的可行性。为了量化这种设置中的信息泄漏，我们引入了两个隐私度量。然后，我们提出了一个系统，利用最近引入的社交学习范式，在这个范式中，LLMs通过交换自然语言协同学习。利用这种范式，我们在几个数据集上演示了我们的方法相对于非级联基线不仅最小化了隐私损失，同时提高了任务性能。

更新时间: 2024-04-02 06:49:33

领域: cs.LG,cs.AI,cs.CR,cs.MA

下载: http://arxiv.org/abs/2404.01041v2

A Lightweight Security Solution for Mitigation of Hatchetman Attack in RPL-based 6LoWPAN

In recent times, the Internet of Things (IoT) has seen a significant rise in industry, and we live in the era of Industry 4.0, where every device, from small to big, is connected to the Internet. These devices are Artificial Intelligence (AI) enabled and are capable of perspective analytics. By 2023, it is anticipated that over 14 billion smart devices will be available on the Internet. These applications operate in a wireless environment where memory, power, and other resource limitations apply to the nodes. In addition, the conventional routing method is ineffective in networks with resource-limited devices, lossy links, and slow data rates. Routing Protocol for Low Power and Lossy Networks (RPL), a new routing protocol for such networks, was proposed by the IETF's ROLL group. RPL operates in two modes: Storing and Non-Storing. In Storing mode, each node has the information needed to reach other nodes. In Non-Storing mode, the routing information lies with the root node only. An attacker may exploit the Non-Storing feature of RPL. When the root node transmits a User Datagram Protocol (UDP) or control message packet to the child nodes, the routing information is stored in the extension header of the IPv6 packet. The attacker may modify the address in the source routing header, which leads to a Denial of Service (DoS) attack. This RPL-specific attack is known as the Hatchetman attack. This paper shows significant degradation in network performance when an attacker exploits this feature. We also propose a lightweight mitigation that uses a game-theoretic approach to detect the Hatchetman attack in IoT.

Updated: 2024-04-02 06:48:33

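Title: A Lightweight Security Solution for Mitigating the Hatchetman Attack in RPL-based 6LoWPAN

Abstract: Recently, the Internet of Things (IoT) has grown significantly across industries, and we live in the era of Industry 4.0, in which every device, from small to big, is connected to the Internet. These devices are Artificial Intelligence (AI) enabled and capable of perspective analytics. By 2023, more than 14 billion smart devices are expected to be available on the Internet. These applications run in a wireless environment where memory, power, and other resource limits apply to the nodes. Moreover, conventional routing methods perform poorly in networks with limited resources, lossy links, and slow data rates. The Routing Protocol for Low Power and Lossy Networks (RPL) is a new routing protocol proposed by the IETF's ROLL group. RPL has two modes: Storing and Non-Storing. In Storing mode, each node holds the information needed to reach other nodes. In Non-Storing mode, the routing information is stored only at the root node. An attacker may exploit the Non-Storing feature of RPL. When the root node transmits a User Datagram Protocol (UDP) or control message packet to the child nodes, the routing information is stored in the extended header of the IPv6 packet. The attacker may modify the addresses in the source routing header, causing a Denial of Service (DoS) attack. This attack is specific to RPL and is known as the Hatchetman attack. This paper shows that network performance degrades significantly when an attacker exploits this feature. We also propose a lightweight mitigation that uses a game-theoretic approach to detect the Hatchetman attack in IoT.

Updated: 2024-04-02 06:48:33

Categories: cs.CR

Download: http://arxiv.org/abs/2404.01689v1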

A Methodology for Improving Accuracy of Embedded Spiking Neural Networks through Kernel Size Scaling

Spiking Neural Networks (SNNs) can offer ultra-low power/energy consumption for machine learning-based applications due to their sparse spike-based operations. Currently, most SNN architectures need a significantly larger model size to achieve higher accuracy, which is not suitable for resource-constrained embedded applications. Therefore, SNNs that can achieve high accuracy with an acceptable memory footprint are highly needed. Toward this, we propose a novel methodology that improves the accuracy of SNNs through kernel size scaling. Its key steps include investigating the impact of different kernel sizes on accuracy, devising new sets of kernel sizes, generating SNN architectures based on the selected kernel sizes, and analyzing the accuracy-memory trade-offs for SNN model selection. The experimental results show that our methodology achieves higher accuracy than the state of the art (93.24% accuracy for CIFAR10 and 70.84% accuracy for CIFAR100) with fewer than 10M parameters and up to a 3.45x speed-up in search time, thereby making it suitable for embedded applications.

Updated: 2024-04-02 06:42:14

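Title: A Methodology for Improving the Accuracy of Embedded Spiking Neural Networks through Kernel Size Scaling

Abstract: Spiking Neural Networks (SNNs) can offer ultra-low power/energy consumption in machine learning applications through their sparse spike-based operations. Currently, most SNN architectures need a larger model size to achieve higher accuracy, which is unsuitable for resource-constrained embedded applications. SNNs that achieve high accuracy within an acceptable memory footprint are therefore urgently needed. To this end, we propose a new methodology that improves SNN accuracy through kernel size scaling. Its key steps include studying the impact of different kernel sizes on accuracy, devising new sets of kernel sizes, generating SNN architectures based on the selected kernel sizes, and analyzing the accuracy-memory trade-offs for SNN model selection. Experimental results show that our methodology achieves higher accuracy than the state of the art (93.24% for CIFAR10 and 70.84% for CIFAR100) with fewer than 10M parameters and up to a 3.45x speed-up in search time, making it suitable for embedded applications.

Updated: 2024-04-02 06:42:14

Categories: cs.NE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.01685v1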
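
The accuracy-memory trade-off mentioned in the abstract above depends on how kernel size drives parameter count. The following is a minimal sketch of that arithmetic for ordinary square 2-D convolutions; the layer stack and channel widths are hypothetical and are not taken from the paper, and the function names are illustrative only.

```python
def conv2d_params(kernel_size, in_channels, out_channels, bias=True):
    """Count the parameters of one square 2-D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def model_params(kernel_sizes, channels):
    """Sum parameters over a stack of conv layers.

    `channels` has one more entry than `kernel_sizes`:
    layer i maps channels[i] -> channels[i + 1].
    """
    return sum(
        conv2d_params(k, c_in, c_out)
        for k, c_in, c_out in zip(kernel_sizes, channels[:-1], channels[1:])
    )


# Hypothetical three-layer stacks that differ only in their kernel sizes.
channels = [3, 64, 128, 256]
for kernel_sizes in ([3, 3, 3], [5, 5, 5], [7, 5, 3]):
    print(kernel_sizes, model_params(kernel_sizes, channels))
```

Because the weight count grows with the square of the kernel size, mixing kernel sizes across layers is one way to spend a fixed parameter budget differently; which mix helps accuracy is an empirical question that the abstract addresses through search.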

SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces

Given the remarkable achievements in image generation through diffusion models, the research community has shown increasing interest in extending these models to video generation. Recent diffusion models for video generation have predominantly utilized attention layers to extract temporal features. However, attention layers are limited by their memory consumption, which increases quadratically with the length of the sequence. This limitation presents significant challenges when attempting to generate longer video sequences using diffusion models. To overcome this challenge, we propose leveraging state-space models (SSMs). SSMs have recently gained attention as viable alternatives due to their linear memory consumption relative to sequence length. In the experiments, we first evaluate our SSM-based model on UCF101, a standard benchmark for video generation. In addition, to investigate the potential of SSMs for longer video generation, we perform an experiment using the MineRL Navigate dataset, varying the number of frames to 64, 200, and 400. In these settings, our SSM-based model considerably reduces memory consumption for longer sequences while maintaining FVD scores competitive with those of attention-based models. Our code is available at https://github.com/shim0114/SSM-Meets-Video-Diffusion-Models.

Updated: 2024-04-02 06:38:18

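Title: SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces

Abstract: Given the remarkable achievements of diffusion models in image generation, the research community has shown growing interest in extending these models to video generation. Recent diffusion models for video generation mainly use attention layers to extract temporal features. However, attention layers are limited by their memory consumption, which grows quadratically with sequence length. This limitation poses major challenges when generating longer video sequences with diffusion models. To overcome it, we propose leveraging state-space models (SSMs). SSMs have recently attracted attention as a viable alternative because their memory consumption is linear in sequence length. In the experiments, we first evaluate our SSM-based model on UCF101, a standard benchmark for video generation. In addition, to explore the potential of SSMs for longer video generation, we run an experiment on the MineRL Navigate dataset, varying the number of frames to 64, 200, and 400. In these settings, our SSM-based model considerably reduces memory consumption for longer sequences while keeping FVD scores competitive with attention-based models. Our code is available at https://github.com/shim0114/SSM-Meets-Video-Diffusion-Models.

Updated: 2024-04-02 06:38:18

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2403.07711v3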
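
The quadratic-versus-linear memory claim in the abstract above can be illustrated with a rough count. The sketch below is not the paper's architecture: it runs a scalar linear state-space recurrence and compares the number of entries in a temporal attention map with the entries an SSM would keep if it stored one state vector per frame. The parameters `a`, `b`, `c` and `state_dim` are assumed values chosen for illustration.

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    state = 0.0
    outputs = []
    for x in inputs:
        state = a * state + b * x  # the state has a fixed size, whatever the sequence length
        outputs.append(c * state)
    return outputs


def attention_map_entries(num_frames):
    """Entries in one temporal attention map: every frame attends to every frame."""
    return num_frames * num_frames


def ssm_activation_entries(num_frames, state_dim=16):
    """Entries kept when one state vector is stored per frame (state_dim is assumed)."""
    return num_frames * state_dim


# Frame counts taken from the abstract's MineRL Navigate experiment.
for num_frames in (64, 200, 400):
    print(num_frames, attention_map_entries(num_frames), ssm_activation_entries(num_frames))
```

The counts ignore constants, heads, and channels, so they only show the growth rates: going from 64 to 400 frames multiplies the attention-map entries by about 39, while the per-frame state storage grows by 6.25.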

Learning The Likelihood Test With One-Class Classifiers

Given an observation randomly generated from two alternative probability density functions (pdfs) P0 and P1, we consider the problem of deciding which pdf generated the observation. To design the decision technique we assume that we either know P0 or have a set of samples generated from it; the P1 pdf is instead completely unknown. Such a scenario arises, for example, in security contexts, where the attacker's behavior is completely unknown to the legitimate users. When the P0 pdf is known, we resort to the likelihood test (LT), while when a set of samples with its distribution is available, we resort to one-class classification (OCC). We focus on the problem of learning OCC models that operate as the LT. We show this occurs for the multilayer perceptron neural network (NN) and the one-class least-squares support vector machine (OCLSSVM) models properly trained as two-class classifiers using an artificial dataset for the negative class, obtained by generating samples uniformly distributed over the domain of the positive class dataset. The artificial dataset is used only for training, while the OCC is used on negative-class samples generated from a different pdf. We also derive a modified stochastic gradient descent (SGD) algorithm that provides OCC operating as LT without the need for the artificial dataset. Furthermore, we show that the OCLSSVM with suitable kernels operates as the LT at convergence. Lastly, we prove that the widely used autoencoder (AE) classifier generally does not provide the LT.

Updated: 2024-04-02 06:37:59

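Title: Learning the Likelihood Test with One-Class Classifiers

Abstract: Given an observation randomly generated from one of two alternative probability density functions (pdfs), P0 and P1, we consider the problem of deciding which pdf generated the observation. To design the decision technique, we assume that we either know P0 or have a set of samples generated from it, whereas the P1 pdf is completely unknown. Such a scenario arises, for example, in security contexts, where the attacker's behavior is completely unknown to legitimate users. When the P0 pdf is known, we use the likelihood test (LT); when a set of samples from its distribution is available, we use one-class classification (OCC). We focus on the problem of learning OCC models that operate as the LT. We show that this happens for multilayer perceptron neural network (NN) and one-class least-squares support vector machine (OCLSSVM) models when they are properly trained as two-class classifiers, using an artificial dataset for the negative class obtained by generating samples uniformly distributed over the domain of the positive-class dataset. The artificial dataset is used only for training, while the OCC is applied to negative-class samples generated from a different pdf. We also derive a modified stochastic gradient descent (SGD) algorithm that yields an OCC operating as the LT without the artificial dataset. Furthermore, we show that the OCLSSVM with suitable kernels operates as the LT at convergence. Lastly, we prove that the widely used autoencoder (AE) classifier generally does not provide the LT.

Updated: 2024-04-02 06:37:59

Categories: cs.LG,eess.SP,stat.ML

Download: http://arxiv.org/abs/2210.12494v3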
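
When P0 is known and P1 is completely unknown, as in the abstract above, a likelihood test can be sketched as a threshold on the density p0(x). The example below is a minimal illustration under stated assumptions, not the paper's method: P0 is taken to be a one-dimensional standard Gaussian, and the threshold is calibrated empirically from P0 samples to a chosen false-alarm rate. All function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean=0.0, std=1.0):
    """Density of the assumed known P0 (a 1-D Gaussian, chosen only for illustration)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def calibrate_threshold(p0_samples, false_alarm_rate):
    """Pick a threshold so that about `false_alarm_rate` of P0 samples score below it."""
    scores = sorted(gaussian_pdf(x) for x in p0_samples)
    index = int(false_alarm_rate * len(scores))
    return scores[index]


def likelihood_test(x, threshold):
    """Return 0 (decide P0) when p0(x) reaches the threshold, else 1 (decide P1)."""
    return 0 if gaussian_pdf(x) >= threshold else 1


rng = random.Random(0)
p0_samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
threshold = calibrate_threshold(p0_samples, false_alarm_rate=0.05)

for observation in (0.0, 1.0, 6.0):
    print(observation, likelihood_test(observation, threshold))
```

The one-class classifiers discussed in the abstract are meant to reproduce this decision rule when only samples from P0 are available, so the density itself never has to be written down.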

Evolution and Efficiency in Neural Architecture Search: Bridging the Gap Between Expert Design and Automated Optimization

The paper provides a comprehensive overview of Neural Architecture Search (NAS), emphasizing its evolution from manual design to automated, computationally-driven approaches. It covers the inception and growth of NAS, highlighting its application across various domains, including medical imaging and natural language processing. The document details the shift from expert-driven design to algorithm-driven processes, exploring initial methodologies like reinforcement learning and evolutionary algorithms. It also discusses the challenges of computational demands and the emergence of efficient NAS methodologies, such as Differentiable Architecture Search and hardware-aware NAS. The paper further elaborates on NAS's application in computer vision, NLP, and beyond, demonstrating its versatility and potential for optimizing neural network architectures across different tasks. Future directions and challenges, including computational efficiency and the integration with emerging AI domains, are addressed, showcasing NAS's dynamic nature and its continued evolution towards more sophisticated and efficient architecture search methods.

Updated: 2024-04-02 06:35:04

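Title: Evolution and Efficiency in Neural Architecture Search: Bridging the Gap Between Expert Design and Automated Optimization

Abstract: This paper provides a comprehensive overview of Neural Architecture Search (NAS), emphasizing its evolution from manual design to automated, computationally driven approaches. It covers the origins and growth of NAS and highlights its application across various domains, including medical imaging and natural language processing. The paper details the shift from expert-driven design to algorithm-driven processes and explores early methodologies such as reinforcement learning and evolutionary algorithms. It also discusses the challenge of computational demands and the emergence of efficient NAS methods, such as Differentiable Architecture Search and hardware-aware NAS. The paper further elaborates on applications of NAS in computer vision, NLP, and beyond, demonstrating its versatility and potential for optimizing neural network architectures across different tasks. Future directions and challenges, including computational efficiency and integration with emerging AI domains, are addressed, showcasing the dynamic nature of NAS and its continued evolution toward more sophisticated and efficient architecture search methods.

Updated: 2024-04-02 06:35:04

Categories: cs.NE,cs.AI

Download: http://arxiv.org/abs/2403.17012v2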

Separating and Learning Latent Confounders to Enhancing User Preferences Modeling

Recommender models aim to capture user preferences from historical feedback and then predict user-specific feedback on candidate items. However, the presence of various unmeasured confounders causes deviations between the user preferences in the historical feedback and the true preferences, resulting in models not meeting their expected performance. Existing debiasing models either (1) are specific to solving one particular bias or (2) directly obtain auxiliary information from user historical feedback, which cannot identify whether the learned preferences are true user preferences or mixed with unmeasured confounders. Moreover, we find that the former recommender system is not only a successor to unmeasured confounders but also acts as an unmeasured confounder affecting user preference modeling, which has always been neglected in previous studies. To this end, we incorporate the effect of the former recommender system and treat it as a proxy for all unmeasured confounders. We propose a novel framework, Separating and Learning Latent Confounders For Recommendation (SLFR), which obtains the representation of unmeasured confounders to identify the counterfactual feedback by disentangling user preferences and unmeasured confounders, then guides the target model to capture the true preferences of users. Extensive experiments on five real-world datasets validate the advantages of our method.

Updated: 2024-04-02 06:31:59

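Title: Separating and Learning Latent Confounders to Enhance User Preference Modeling

Abstract: Recommender models aim to capture user preferences from historical feedback and then predict user-specific feedback on candidate items. However, the presence of various unmeasured confounders creates deviations between the user preferences in the historical feedback and the true preferences, so models fall short of their expected performance. Existing debiasing models are either specific to one particular bias or obtain auxiliary information directly from user historical feedback, which cannot determine whether the learned preferences are true user preferences or are mixed with unmeasured confounders. Moreover, we find that the former recommender system is not only a successor to unmeasured confounders but also acts as an unmeasured confounder affecting user preference modeling, which previous studies have always neglected. We therefore take the effect of the former recommender system into account and treat it as a proxy for all unmeasured confounders. We propose a novel framework, Separating and Learning Latent Confounders For Recommendation (SLFR), which obtains the representation of unmeasured confounders by disentangling user preferences from unmeasured confounders, and then guides the target model to capture users' true preferences. Extensive experiments on five real-world datasets validate the advantages of our method.

Updated: 2024-04-02 06:31:59

Categories: cs.IR,cs.AI,cs.LG,stat.ME

Download: http://arxiv.org/abs/2311.03381v2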

Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation

Large language models (LLMs) have achieved significant performance in various natural language reasoning tasks. However, they still struggle with performing first-order logic reasoning over formal logical theories expressed in natural language. This is because previous LLM-based reasoning systems suffer from theoretical incompleteness. As a result, they can only address a limited set of simple reasoning problems, which significantly decreases their generalization ability. To address this issue, we propose a novel framework, named Generalizable and Faithful Reasoner (GFaiR), which introduces the paradigm of resolution refutation. Resolution refutation has the capability to solve all first-order logic reasoning problems by extending reasoning rules and employing the principle of proof by contradiction, so our system's completeness can be improved by introducing resolution refutation. Experimental results demonstrate that our system outperforms previous works by achieving state-of-the-art performance in complex scenarios while maintaining performance in simple scenarios. Besides, we observe that GFaiR is faithful to its reasoning process.

Updated: 2024-04-02 06:28:44

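Title: Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation

Abstract: Large language models (LLMs) have achieved significant performance in various natural language reasoning tasks. However, they still struggle to perform first-order logic reasoning over formal logical theories expressed in natural language. This is because previous LLM-based reasoning systems suffer from theoretical incompleteness. As a result, they can only solve a limited set of simple reasoning problems, which significantly reduces their generalization ability. To address this issue, we propose a novel framework named Generalizable and Faithful Reasoner (GFaiR), which introduces the paradigm of resolution refutation. Resolution refutation can solve all first-order logic reasoning problems by extending the reasoning rules and applying the principle of proof by contradiction, so the completeness of our system can be improved by introducing it. Experimental results show that our system performs strongly in complex scenarios while maintaining performance in simple scenarios. In addition, we find that GFaiR is faithful to its reasoning process.

Updated: 2024-04-02 06:28:44

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2404.01677v1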

Incentives in Private Collaborative Machine Learning

Collaborative machine learning involves training models on data from multiple parties but must incentivize their participation. Existing data valuation methods fairly value and reward each party based on shared data or model parameters but neglect the privacy risks involved. To address this, we introduce differential privacy (DP) as an incentive. Each party can select its required DP guarantee and perturb its sufficient statistic (SS) accordingly. The mediator values the perturbed SS by the Bayesian surprise it elicits about the model parameters. As our valuation function enforces a privacy-valuation trade-off, parties are deterred from selecting excessive DP guarantees that reduce the utility of the grand coalition's model. Finally, the mediator rewards each party with different posterior samples of the model parameters. Such rewards still satisfy existing incentives like fairness but additionally preserve DP and a high similarity to the grand coalition's posterior. We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets.

Updated: 2024-04-02 06:28:22

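Title: Incentives in Private Collaborative Machine Learning

Abstract: Collaborative machine learning involves training models on data from multiple parties, but it must incentivize their participation. Existing data valuation methods fairly value and reward each party based on shared data or model parameters, but they neglect the privacy risks involved. To address this, we introduce differential privacy (DP) as an incentive. Each party can select its required DP guarantee and perturb its sufficient statistic (SS) accordingly. The mediator values the perturbed SS by the Bayesian surprise it elicits about the model parameters. Because our valuation function enforces a privacy-valuation trade-off, parties are deterred from selecting excessive DP guarantees that would reduce the utility of the grand coalition's model. Finally, the mediator rewards each party with different posterior samples of the model parameters. Such rewards still satisfy existing incentives such as fairness, while additionally preserving DP and a high similarity to the grand coalition's posterior. We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets.

Updated: 2024-04-02 06:28:22

Categories: cs.LG

Download: http://arxiv.org/abs/2404.01676v1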
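
The step in the abstract above where a party perturbs its sufficient statistic under a chosen DP guarantee can be sketched with the classical Gaussian mechanism. This is an illustration under assumptions, not the paper's construction: the sufficient statistic is a clipped sum, neighboring datasets differ by adding or removing one record (so the sensitivity equals the clipping bound), and the noise scale uses the textbook (epsilon, delta) calibration, which holds for 0 < epsilon < 1. The paper may use a different DP notion and calibration.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (valid for 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def perturbed_sum(values, clip, epsilon, delta, rng):
    """Clip each value to [-clip, clip], sum, and add Gaussian noise.

    Under add/remove-one neighboring datasets the clipped sum has sensitivity `clip`.
    """
    clipped = [max(-clip, min(clip, v)) for v in values]
    sigma = gaussian_sigma(clip, epsilon, delta)
    return sum(clipped) + rng.gauss(0.0, sigma)


rng = random.Random(0)
data = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

# A smaller epsilon is a stronger guarantee and requires more noise.
for epsilon in (0.1, 0.5, 0.9):
    sigma = gaussian_sigma(1.0, epsilon, 1e-5)
    noisy = perturbed_sum(data, clip=1.0, epsilon=epsilon, delta=1e-5, rng=rng)
    print(epsilon, round(sigma, 2), round(noisy, 2))
```

The growing noise scale is the source of the privacy-valuation trade-off the abstract describes: a party that demands a very strong guarantee submits a statistic that says less about the model parameters, and so is valued lower.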

How COVID-19 has Impacted the Anti-Vaccine Discourse: A Large-Scale Twitter Study Spanning Pre-COVID and Post-COVID Era

The debate around vaccines has been going on for decades, but the COVID-19 pandemic showed how crucial it is to understand and mitigate anti-vaccine sentiments. While the pandemic may be over, it is still important to understand how the pandemic affected the anti-vaccine discourse, and whether the arguments against non-COVID vaccines (e.g., Flu, MMR, IPV, HPV vaccines) have also changed due to the pandemic. This study attempts to answer these questions through a large-scale study of anti-vaccine posts on Twitter. Almost all prior works that utilized social media to understand anti-vaccine opinions considered only the three broad stances of Anti-Vax, Pro-Vax, and Neutral. There has not been any effort to identify the specific reasons/concerns behind the anti-vax sentiments (e.g., side-effects, conspiracy theories, political reasons) on social media at scale. In this work, we propose two novel methods for classifying tweets into 11 different anti-vax concerns -- a discriminative approach (entailment-based) and a generative approach (based on instruction tuning of LLMs) -- which outperform several strong baselines. We then apply this classifier on anti-vaccine tweets posted over a 5-year period (Jan 2018 - Jan 2023) to understand how the COVID-19 pandemic has impacted the anti-vaccine concerns among the masses. We find that the pandemic has made the anti-vaccine discourse far more complex than in the pre-COVID times, and increased the variety of concerns being voiced. Alarmingly, we find that concerns about COVID vaccines are now being projected onto the non-COVID vaccines, thus making more people hesitant in taking vaccines in the post-COVID era.

Updated: 2024-04-02 06:18:41

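Title: How COVID-19 Has Impacted the Anti-Vaccine Discourse: A Large-Scale Twitter Study Spanning the Pre-COVID and Post-COVID Eras

Abstract: The debate around vaccines has gone on for decades, but the COVID-19 pandemic showed how important it is to understand and mitigate anti-vaccine sentiment. Although the pandemic may be over, it remains important to understand how it affected the anti-vaccine discourse, and whether the arguments against non-COVID vaccines (e.g., flu, MMR, IPV, and HPV vaccines) have also changed because of it. This study attempts to answer these questions through a large-scale study of anti-vaccine posts on Twitter. Almost all prior work that used social media to understand anti-vaccine opinions considered only the three broad stances of Anti-Vax, Pro-Vax, and Neutral. So far there has been no effort to identify, at scale, the specific reasons or concerns behind anti-vaccine sentiment on social media (e.g., side effects, conspiracy theories, political reasons). In this work, we propose two novel methods for classifying tweets into 11 different anti-vaccine concerns, a discriminative approach (entailment-based) and a generative approach (based on instruction tuning of LLMs), both of which outperform several strong baselines. We then apply this classifier to anti-vaccine tweets posted over a 5-year period (January 2018 to January 2023) to understand how the COVID-19 pandemic has affected anti-vaccine concerns among the public. We find that the pandemic has made the anti-vaccine discourse more complex than in pre-COVID times and has increased the variety of concerns being voiced. Alarmingly, we find that concerns about COVID vaccines are now being projected onto non-COVID vaccines, making more people hesitant to take vaccines in the post-COVID era.

Updated: 2024-04-02 06:18:41

Categories: cs.SI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2404.01669v1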

PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

Despite the many use cases for large language models (LLMs) in creating personalized chatbots, there has been limited research on evaluating the extent to which the behaviors of personalized LLMs accurately and consistently reflect specific personality traits. We consider studying the behavior of LLM-based agents which we refer to as LLM personas and present a case study with GPT-3.5 and GPT-4 to investigate whether LLMs can generate content that aligns with their assigned personality profiles. To this end, we simulate distinct LLM personas based on the Big Five personality model, have them complete the 44-item Big Five Inventory (BFI) personality test and a story writing task, and then assess their essays with automatic and human evaluations. Results show that LLM personas' self-reported BFI scores are consistent with their designated personality types, with large effect sizes observed across five traits. Additionally, LLM personas' writings have emerging representative linguistic patterns for personality traits when compared with a human writing corpus. Furthermore, human evaluation shows that humans can perceive some personality traits with an accuracy of up to 80%. Interestingly, the accuracy drops significantly when the annotators were informed of AI authorship.

Updated: 2024-04-02 06:06:53

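Title: PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

Abstract: Although large language models (LLMs) have many use cases in creating personalized chatbots, there has been limited research evaluating how accurately and consistently the behavior of personalized LLMs reflects specific personality traits. We study the behavior of LLM-based agents, which we call LLM personas, and present a case study with GPT-3.5 and GPT-4 to investigate whether LLMs can generate content that aligns with their assigned personality profiles. To this end, we simulate distinct LLM personas based on the Big Five personality model, have them complete the 44-item Big Five Inventory (BFI) personality test and a story writing task, and then assess their essays with automatic and human evaluations. Results show that the LLM personas' self-reported BFI scores are consistent with their designated personality types, with large effect sizes observed across the five traits. In addition, compared with a human writing corpus, the LLM personas' writings show emerging linguistic patterns representative of personality traits. Furthermore, human evaluation shows that humans can perceive some personality traits with an accuracy of up to 80%. Interestingly, the accuracy drops significantly when the annotators are informed of AI authorship.

Updated: 2024-04-02 06:06:53

Categories: cs.CL,cs.AI,cs.HC

Download: http://arxiv.org/abs/2305.02547v5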

AI Act and Large Language Models (LLMs): When critical issues and privacy impact require human and ethical oversight

The imposing evolution of artificial intelligence systems and, specifically, of Large Language Models (LLM) makes it necessary to carry out assessments of their level of risk and the impact they may have in the area of privacy, personal data protection and at an ethical level, especially on the weakest and most vulnerable. This contribution addresses human oversight, ethical oversight, and privacy impact assessment.

Updated: 2024-04-02 06:05:29

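Title: AI Act and Large Language Models (LLMs): When Critical Issues and Privacy Impact Require Human and Ethical Oversight

Abstract: The rapid development of artificial intelligence systems, and of Large Language Models (LLMs) in particular, makes it necessary to assess their level of risk and the impact they may have on privacy, personal data protection, and ethics, especially for the weakest and most vulnerable. This contribution discusses human oversight, ethical oversight, and privacy impact assessment.

Updated: 2024-04-02 06:05:29

Categories: cs.CY,cs.AI,cs.CL

Download: http://arxiv.org/abs/2404.00600v2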

Configuration Validation with Large Language Models

Misconfigurations are major causes of software failures. Existing practices rely on developer-written rules or test cases to validate configurations, which are expensive. Machine learning (ML) for configuration validation is considered a promising direction, but has been facing challenges such as the need for large-scale field data and system-specific models. Recent advances in Large Language Models (LLMs) show promise in addressing some of the long-lasting limitations of ML-based configuration validation. We present a first analysis of the feasibility and effectiveness of using LLMs for configuration validation. We empirically evaluate LLMs as configuration validators by developing a generic LLM-based configuration validation framework, named Ciri. Ciri employs effective prompt engineering with few-shot learning based on both valid configuration and misconfiguration data. Ciri checks outputs from LLMs when producing results, addressing hallucination and nondeterminism of LLMs. We evaluate Ciri's validation effectiveness on eight popular LLMs using configuration data of ten widely deployed open-source systems. Our analysis (1) confirms the potential of using LLMs for configuration validation, (2) explores the design space of LLM-based validators like Ciri, and (3) reveals open challenges such as ineffectiveness in detecting certain types of misconfigurations and biases towards popular configuration parameters.

Updated: 2024-04-02 06:02:07

Title: Configuration Validation with Large Language Models

Abstract: Misconfigurations are a major cause of software failures. Current practices rely on developer-written rules or test cases to validate configurations, which can be costly. Machine learning (ML) for configuration validation is seen as a promising approach, but faces challenges such as the requirement for large-scale field data and system-specific models. Recent advancements in Large Language Models (LLMs) show promise in overcoming some of these long-standing limitations of ML-based configuration validation. This study presents an initial analysis of the feasibility and effectiveness of using LLMs for configuration validation. The study empirically evaluates LLMs as configuration validators by creating a generic LLM-based configuration validation framework called Ciri. Ciri utilizes effective prompt engineering and few-shot learning based on both valid and misconfigured data. Ciri verifies the outputs of LLMs during result generation to address issues like hallucination and non-determinism. The study assesses Ciri's validation effectiveness on eight popular LLMs using configuration data from ten widely used open-source systems. The analysis confirms the potential of using LLMs for configuration validation, explores the design space of LLM-based validators like Ciri, and uncovers challenges such as the inability to detect certain types of misconfigurations and biases towards popular configuration parameters.

Updated: 2024-04-02 06:02:07

Categories: cs.SE,cs.AI,cs.OS

Download: http://arxiv.org/abs/2310.09690v2

Release of Pre-Trained Models for the Japanese Language

AI democratization aims to create a world in which the average person can utilize AI techniques. To achieve this goal, numerous research institutes have attempted to make their results accessible to the public. In particular, large pre-trained models trained on large-scale data have shown unprecedented potential, and their release has had a significant impact. However, most of the released models specialize in the English language, and thus, AI democratization in non-English-speaking communities is lagging significantly. To reduce this gap in AI access, we released Generative Pre-trained Transformer (GPT), Contrastive Language and Image Pre-training (CLIP), Stable Diffusion, and Hidden-unit Bidirectional Encoder Representations from Transformers (HuBERT) pre-trained in Japanese. By providing these models, users can freely interface with AI that aligns with Japanese cultural values and ensures the identity of Japanese culture, thus enhancing the democratization of AI. Additionally, experiments showed that pre-trained models specialized for Japanese can efficiently achieve high performance in Japanese tasks.

Updated: 2024-04-02 05:59:43

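Title: Release of Pre-Trained Models for the Japanese Language

Abstract: AI democratization aims to create a world in which ordinary people can use AI techniques. To achieve this goal, many research institutes have tried to make their results accessible to the public. In particular, large pre-trained models trained on large-scale data have shown unprecedented potential, and their release has had a major impact. However, most released models specialize in English, so AI democratization in non-English-speaking communities lags significantly. To narrow this gap in AI access, we released Generative Pre-trained Transformer (GPT), Contrastive Language and Image Pre-training (CLIP), Stable Diffusion, and Hidden-unit Bidirectional Encoder Representations from Transformers (HuBERT) models pre-trained in Japanese. With these models, users can freely interface with AI that aligns with Japanese cultural values and preserves the identity of Japanese culture, thereby enhancing the democratization of AI. In addition, experiments showed that pre-trained models specialized for Japanese can efficiently achieve high performance on Japanese tasks.

Updated: 2024-04-02 05:59:43

Categories: cs.CL,cs.AI,cs.CV,cs.LG,eess.AS

Download: http://arxiv.org/abs/2404.01657v1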

Empowering Credit Scoring Systems with Quantum-Enhanced Machine Learning

Quantum Kernels are projected to provide early-stage usefulness for quantum machine learning. However, highly sophisticated classical models are hard to surpass without losing interpretability, particularly when vast datasets can be exploited. Nonetheless, classical models struggle once data is scarce and skewed. Quantum feature spaces are projected to find better links between data features and the target class to be predicted even in such challenging scenarios and, most importantly, to offer enhanced generalization capabilities. In this work, we propose a novel approach called Systemic Quantum Score (SQS) and provide preliminary results indicating a potential advantage over purely classical models in a production-grade use case for the finance sector. In our specific study, SQS shows an increased capacity to extract patterns from fewer data points as well as improved performance over data-hungry algorithms such as XGBoost, providing an advantage in a competitive market such as the FinTech and Neobank regime.

Updated: 2024-04-02 05:54:55

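Title: Empowering Credit Scoring Systems with Quantum-Enhanced Machine Learning

Abstract: Quantum kernels are expected to provide early-stage usefulness for quantum machine learning. However, highly sophisticated classical models are hard to surpass without losing interpretability, particularly when vast datasets can be exploited. Classical models nonetheless struggle once data is scarce and skewed. Quantum feature spaces are expected to find better links between data features and the target class even in such challenging scenarios and, most importantly, to offer enhanced generalization capabilities. In this work, we propose a novel approach called Systemic Quantum Score (SQS) and provide preliminary results indicating a potential advantage over purely classical models in a production-grade use case in the finance sector. In our specific study, SQS shows a greater capacity to extract patterns from fewer data points and better performance than data-hungry algorithms such as XGBoost, providing an advantage in a competitive market such as the FinTech and Neobank regime.

Updated: 2024-04-02 05:54:55

Categories: q-fin.RM,cs.LG,q-fin.ST,quant-ph,stat.ML

Download: http://arxiv.org/abs/2404.00015v2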

AI WALKUP: A Computer-Vision Approach to Quantifying MDS-UPDRS in Parkinson's Disease

Parkinson's Disease (PD) is the second most common neurodegenerative disorder. PD is usually assessed with the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS), which rates the severity of various types of motor symptoms and disease progression. However, manual assessment suffers from high subjectivity, lack of consistency, and the high cost and low efficiency of manual communication. We propose a computer-vision-based solution that captures human pose images with a camera, reconstructs the motion and analyzes it algorithmically, and extracts features of the amount of motion through feature engineering. The proposed approach can be deployed on different smartphones, and the video recording and artificial intelligence analysis can be done quickly and easily through our app.

Updated: 2024-04-02 05:53:34

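Title: AI WALKUP: A Computer-Vision Approach to Quantifying MDS-UPDRS in Parkinson's Disease

Abstract: Parkinson's Disease (PD) is the second most common neurodegenerative disorder. The current method for assessing PD is usually the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS), which rates the severity of various types of motor symptoms and disease progression. However, manual assessment suffers from high subjectivity, lack of consistency, and the high cost and low efficiency of manual communication. We aim to use a computer-vision-based solution that captures human pose images with a camera, reconstructs the motion and analyzes it algorithmically, and extracts features of the amount of motion through feature engineering. The proposed approach can be deployed on different smartphones, and video recording and artificial intelligence analysis can be done quickly and easily through our app.

Updated: 2024-04-02 05:53:34

Categories: cs.CV,cs.AI,eess.IV,eess.SP

Download: http://arxiv.org/abs/2404.01654v1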

Kiki or Bouba? Sound Symbolism in Vision-and-Language Models

Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism. Among the many dimensions of meaning, sound symbolism is particularly salient and well-demonstrated with regards to cross-modal associations between language and the visual domain. In this work, we address the question of whether sound symbolism is reflected in vision-and-language models such as CLIP and Stable Diffusion. Using zero-shot knowledge probing to investigate the inherent knowledge of these models, we find strong evidence that they do show this pattern, paralleling the well-known kiki-bouba effect in psycholinguistics. Our work provides a novel method for demonstrating sound symbolism and understanding its nature using computational tools. Our code will be made publicly available.

Updated: 2024-04-02 05:50:21

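Title: Kiki or Bouba? Sound Symbolism in Vision-and-Language Models

Abstract: Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism. Among the many dimensions of meaning, sound symbolism is particularly salient and well demonstrated in cross-modal associations between language and the visual domain. In this work, we ask whether sound symbolism is reflected in vision-and-language models such as CLIP and Stable Diffusion. Using zero-shot knowledge probing to investigate the inherent knowledge of these models, we find strong evidence that they do show this pattern, paralleling the well-known kiki-bouba effect in psycholinguistics. Our work provides a novel method for demonstrating sound symbolism and understanding its nature using computational tools. Our code will be made publicly available.

Updated: 2024-04-02 05:50:21

Categories: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2310.16781v3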

i-Rebalance: Personalized Vehicle Repositioning for Supply Demand Balance

Ride-hailing platforms have been facing the challenge of balancing demand and supply. Existing vehicle reposition techniques often treat drivers as homogeneous agents and relocate them deterministically, assuming compliance with the reposition. In this paper, we consider a more realistic and driver-centric scenario where drivers have unique cruising preferences and can decide whether to take the recommendation or not on their own. We propose i-Rebalance, a personalized vehicle reposition technique with deep reinforcement learning (DRL). i-Rebalance estimates drivers' decisions on accepting reposition recommendations through an on-field user study involving 99 real drivers. To optimize supply-demand balance and enhance preference satisfaction simultaneously, i-Rebalance has a sequential reposition strategy with dual DRL agents: Grid Agent to determine the reposition order of idle vehicles, and Vehicle Agent to provide personalized recommendations to each vehicle in the pre-defined order. This sequential learning strategy facilitates more effective policy training within a smaller action space compared to traditional joint-action methods. Evaluation of real-world trajectory data shows that i-Rebalance improves driver acceptance rate by 38.07% and total driver income by 9.97%.

Updated: 2024-04-02 05:50:00

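Title: i-Rebalance: Personalized Vehicle Repositioning for Supply Demand Balance

Abstract: Ride-hailing platforms have long faced the challenge of balancing demand and supply. Existing vehicle reposition techniques often treat drivers as homogeneous agents and relocate them deterministically, assuming that they comply with the reposition. In this paper, we consider a more realistic, driver-centric scenario in which drivers have unique cruising preferences and can decide for themselves whether to accept a recommendation. We propose i-Rebalance, a personalized vehicle reposition technique based on deep reinforcement learning (DRL). i-Rebalance estimates drivers' decisions on accepting reposition recommendations through an on-field user study involving 99 real drivers. To optimize supply-demand balance and enhance preference satisfaction at the same time, i-Rebalance uses a sequential reposition strategy with dual DRL agents: a Grid Agent that determines the reposition order of idle vehicles, and a Vehicle Agent that provides personalized recommendations to each vehicle in that pre-defined order. Compared with traditional joint-action methods, this sequential learning strategy enables more effective policy training within a smaller action space. Evaluation on real-world trajectory data shows that i-Rebalance improves the driver acceptance rate by 38.07% and total driver income by 9.97%.

Updated: 2024-04-02 05:50:00

Categories: cs.AI,cs.MA

Download: http://arxiv.org/abs/2401.04429v2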

Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization

Open-domain Question Answering (OpenQA) aims at answering factual questions with an external large-scale knowledge corpus. However, real-world knowledge is not static; it updates and evolves continually. Such a dynamic characteristic of knowledge poses a vital challenge for these models, as the trained models need to constantly adapt to the latest information to make sure that the answers remain accurate. In addition, it is still unclear how well an OpenQA model can transfer to completely new knowledge domains. In this paper, we investigate the generalization performance of a retrieval-augmented QA model in two specific scenarios: 1) adapting to updated versions of the same knowledge corpus; 2) switching to completely different knowledge domains. We observe that the generalization challenges of OpenQA models stem from the reader's over-reliance on memorizing the knowledge from the external corpus, which hinders the model from generalizing to a new knowledge corpus. We introduce Corpus-Invariant Tuning (CIT), a simple but effective training strategy, to mitigate the knowledge over-memorization by controlling the likelihood of retrieved contexts during training. Extensive experimental results on multiple OpenQA benchmarks show that CIT achieves significantly better generalizability without compromising the model's performance in its original corpus and domain.

Updated: 2024-04-02 05:44:50

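Title: Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization

Abstract: Open-domain Question Answering (OpenQA) aims to answer factual questions using an external large-scale knowledge corpus. However, real-world knowledge is not static; it is continually updated and evolving. This dynamic nature of knowledge poses a major challenge for these models, because trained models must constantly adapt to the latest information to keep their answers accurate. In addition, it remains unclear how well an OpenQA model can transfer to completely new knowledge domains. In this paper, we study the generalization performance of a retrieval-augmented QA model in two specific scenarios: 1) adapting to updated versions of the same knowledge corpus; 2) switching to completely different knowledge domains. We observe that the generalization challenges of OpenQA models stem from the reader's over-reliance on memorizing knowledge from the external corpus, which hinders the model from generalizing to a new knowledge corpus. We introduce Corpus-Invariant Tuning (CIT), a simple but effective training strategy that mitigates knowledge over-memorization by controlling the likelihood of retrieved contexts during training. Extensive experimental results on multiple OpenQA benchmarks show that CIT significantly improves generalizability without compromising the model's performance on its original corpus and domain.

Updated: 2024-04-02 05:44:50

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.01652v1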

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

Recent work by Power et al. (2022) highlighted a surprising "grokking" phenomenon in learning arithmetic tasks: a neural net first "memorizes" the training set, resulting in perfect training accuracy but near-random test accuracy, and after training for sufficiently longer, it suddenly transitions to perfect test accuracy. This paper studies the grokking phenomenon in theoretical setups and shows that it can be induced by a dichotomy of early and late phase implicit biases. Specifically, when training homogeneous neural nets with large initialization and small weight decay on both classification and regression tasks, we prove that the training process gets trapped at a solution corresponding to a kernel predictor for a long time, and then a very sharp transition to min-norm/max-margin predictors occurs, leading to a dramatic change in test accuracy.

Updated: 2024-04-02 05:43:18

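Title: A Dichotomy of Early- and Late-Phase Implicit Biases Can Provably Induce Grokking

Abstract: Recent work by Power et al. (2022) highlighted a surprising "grokking" phenomenon in learning arithmetic tasks: a neural network first "memorizes" the training set, yielding perfect training accuracy but near-random test accuracy, and after sufficiently longer training it suddenly transitions to perfect test accuracy. This paper studies grokking in theoretical settings and shows that it can be induced by a dichotomy of early- and late-phase implicit biases. Specifically, when homogeneous neural networks with large initialization and small weight decay are trained on classification and regression tasks, we prove that training stays trapped for a long time at a solution corresponding to a kernel predictor, after which a very sharp transition to min-norm/max-margin predictors occurs, causing a dramatic change in test accuracy.

Updated: 2024-04-02 05:43:18

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2311.18817v2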

Test-Time Model Adaptation with Only Forward Passes

Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts. However, in real-world scenarios, models are usually deployed on resource-limited devices, e.g., FPGAs, and are often quantized and hard-coded with non-modifiable parameters for acceleration. In light of this, existing methods are often infeasible, since they depend heavily on computation-intensive backpropagation for model updating, which may not be supported. To address this, we propose a test-time Forward-Only Adaptation (FOA) method. In FOA, we learn only a newly added prompt (as the model's input) via a derivative-free covariance matrix adaptation evolution strategy. To make this strategy work stably under our online unsupervised setting, we devise a novel fitness function that measures the test-training statistic discrepancy and the model prediction entropy. Moreover, we design an activation shifting scheme that directly tunes the model activations for shifted test samples, aligning them with the source training domain and thereby further enhancing adaptation performance. Without using any backpropagation or altering model weights, FOA running on a quantized 8-bit ViT outperforms gradient-based TENT on a full-precision 32-bit ViT, while achieving up to a 24-fold memory reduction on ImageNet-C. The source code will be released.
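To make the fitness idea in the FOA abstract concrete, here is a minimal sketch of a score that combines prediction entropy with a discrepancy between test-batch and source feature statistics. The function names, the choice of mean and standard deviation as the statistics, the Euclidean discrepancy, and the weight `lam` are all assumptions for illustration; the abstract does not specify the exact form, and the evolution strategy that would minimize this score is omitted.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def foa_style_fitness(batch_probs, batch_feats, src_mean, src_std, lam=1.0):
    """Lower is better: confident predictions and source-like feature statistics.

    batch_probs: list of per-sample class probability lists.
    batch_feats: list of per-sample feature vectors (hypothetical activations).
    src_mean, src_std: per-dimension statistics collected on training data.
    lam: hypothetical weight balancing the two terms.
    """
    n = len(batch_feats)
    dim = len(src_mean)
    mean_entropy = sum(entropy(p) for p in batch_probs) / len(batch_probs)
    mean = [sum(f[d] for f in batch_feats) / n for d in range(dim)]
    std = [
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in batch_feats) / n)
        for d in range(dim)
    ]
    # Euclidean gap between test-batch and source statistics.
    gap_mean = math.sqrt(sum((m - s) ** 2 for m, s in zip(mean, src_mean)))
    gap_std = math.sqrt(sum((a - b) ** 2 for a, b in zip(std, src_std)))
    return mean_entropy + lam * (gap_mean + gap_std)
```

A derivative-free optimizer would evaluate this score for each candidate prompt using forward passes only and keep the candidates that score lowest.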

Updated: 2024-04-02 05:34:33

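Title: Test-Time Model Adaptation with Only Forward Passes

Abstract: Test-time adaptation has proven effective at adapting a trained model to unseen test samples that may exhibit distribution shift. In real-world scenarios, however, models are usually deployed on resource-limited devices such as FPGAs, and are often quantized and hard-coded with non-modifiable parameters for acceleration. Existing methods are therefore often infeasible, because they rely heavily on computation-intensive backpropagation for model updates, which may not be supported. To address this, we propose a test-time Forward-Only Adaptation (FOA) method. In FOA, we learn only a newly added prompt (as the model's input) through a derivative-free covariance matrix adaptation evolution strategy. To make this strategy run stably in our online unsupervised setting, we design a novel fitness function that measures the test-training statistic discrepancy and the model's prediction entropy. We also design an activation shifting scheme that directly adjusts model activations for shifted test samples so that they align with the source training domain, further improving adaptation performance. Without any backpropagation or changes to model weights, FOA on a quantized 8-bit ViT outperforms gradient-based TENT on a full-precision 32-bit ViT, while reducing memory use by up to 24 times on ImageNet-C. The source code will be released.

Updated: 2024-04-02 05:34:33

Categories: cs.LG

Download: http://arxiv.org/abs/2404.01650v1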

Transformer meets wcDTW to improve real-time battery bids: A new approach to scenario selection

Stochastic battery bidding in real-time energy markets is a nuanced process, with its efficacy depending on the accuracy of forecasts and the representative scenarios chosen for optimization. In this paper, we introduce a methodology that combines Transformer-based forecasting with weighted constrained Dynamic Time Warping (wcDTW) to refine scenario selection. Our approach uses the predictive capabilities of Transformers to forecast energy prices, while wcDTW ensures the selection of pertinent historical scenarios by maintaining the coherence between multiple uncertain products. Through extensive simulations in the PJM market for July 2023, our method exhibited a 10% increase in revenue compared to the conventional method, highlighting its potential to improve battery bidding strategies in real-time markets.
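The scenario-selection step can be illustrated with a small sketch. It assumes that "constrained" means a Sakoe-Chiba band on the warping path and that "weighted" means a weighted sum of per-product distances; both are assumptions, since the abstract does not define wcDTW precisely. All function and variable names are hypothetical.

```python
import math


def constrained_dtw(a, b, window):
    """DTW distance between two sequences, warping limited to a band of `window` steps."""
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must be wide enough to reach the end cell
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def weighted_distance(forecast, scenario, weights, window):
    """Weighted sum of per-product DTW distances (products are dict keys)."""
    return sum(
        weights[name] * constrained_dtw(forecast[name], scenario[name], window)
        for name in forecast
    )


def select_scenarios(forecast, history, weights, window, top_k):
    """Return the top_k historical scenarios closest to the forecast."""
    ranked = sorted(history, key=lambda s: weighted_distance(forecast, s, weights, window))
    return ranked[:top_k]
```

Here `forecast` would hold the Transformer's predicted series for each uncertain product, and the selected historical scenarios would feed the stochastic bidding optimization.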

Updated: 2024-04-02 05:30:54

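Title: Transformer Meets wcDTW to Improve Real-Time Battery Bids: A New Approach to Scenario Selection

Abstract: Stochastic battery bidding in real-time energy markets is a nuanced process whose effectiveness depends on the accuracy of forecasts and on the representative scenarios chosen for optimization. In this paper, we introduce a methodology that combines Transformer-based forecasting with weighted constrained Dynamic Time Warping (wcDTW) to improve scenario selection. Our approach uses the predictive capability of Transformers to forecast energy prices, while wcDTW ensures that relevant historical scenarios are selected by maintaining coherence between multiple uncertain products. In extensive simulations of the PJM market for July 2023, our method increased revenue by 10% compared with the conventional method, highlighting its potential to improve battery bidding strategies in real-time markets.

Updated: 2024-04-02 05:30:54

Categories: cs.LG,cs.CY

Download: http://arxiv.org/abs/2404.01646v1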

ContrastCAD: Contrastive Learning-based Representation Learning for Computer-Aided Design Models

The success of Transformer-based models has encouraged many researchers to learn CAD models using sequence-based approaches. However, learning CAD models is still a challenge, because they can be represented as complex shapes with long construction sequences. Furthermore, the same CAD model can be expressed using different CAD construction sequences. We propose a novel contrastive learning-based approach, named ContrastCAD, that effectively captures semantic information within the construction sequences of the CAD model. ContrastCAD generates augmented views using dropout techniques without altering the shape of the CAD model. We also propose a new CAD data augmentation method, called the Random Replace and Extrude (RRE) method, to enhance the learning performance of the model when training on an imbalanced CAD dataset. Experimental results show that the proposed RRE augmentation method significantly enhances the learning performance of Transformer-based autoencoders, even for complex CAD models having very long construction sequences. The proposed ContrastCAD model is shown to be robust to permutation changes of construction sequences and performs better representation learning by generating representation spaces where similar CAD models are more closely clustered. Our code is available at https://github.com/cm8908/ContrastCAD.

Updated: 2024-04-02 05:30:39

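Title: ContrastCAD: Contrastive Learning-based Representation Learning for Computer-Aided Design Models

Abstract: The success of Transformer-based models has encouraged many researchers to learn CAD models with sequence-based approaches. Learning CAD models remains a challenge, however, because they can be represented as complex shapes with long construction sequences. In addition, the same CAD model can be expressed by different CAD construction sequences. We propose a novel contrastive learning approach, named ContrastCAD, that effectively captures the semantic information in the construction sequences of a CAD model. ContrastCAD generates augmented views using dropout techniques without changing the shape of the CAD model. We also propose a new CAD data augmentation method, called the Random Replace and Extrude (RRE) method, to improve learning performance when training on an imbalanced CAD dataset. Experimental results show that the proposed RRE augmentation significantly improves the learning performance of Transformer-based autoencoders, even for complex CAD models with very long construction sequences. The proposed ContrastCAD model is shown to be robust to permutation changes in construction sequences and achieves better representation learning by generating representation spaces in which similar CAD models are clustered more closely. Our code is available at https://github.com/cm8908/ContrastCAD.

Updated: 2024-04-02 05:30:39

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.01645v1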

Bridging the Projection Gap: Overcoming Projection Bias Through Parameterized Distance Learning

Generalized zero-shot learning (GZSL) aims to recognize samples from both seen and unseen classes using only seen class samples for training. However, GZSL methods are prone to bias towards seen classes during inference due to the projection function being learned from seen classes. Most methods focus on learning an accurate projection, but bias in the projection is inevitable. We address this projection bias by proposing to learn a parameterized Mahalanobis distance metric for robust inference. Our key insight is that the distance computation during inference is critical, even with a biased projection. We make two main contributions: (1) We extend the VAEGAN (Variational Autoencoder & Generative Adversarial Networks) architecture with two branches to separately output the projection of samples from seen and unseen classes, enabling more robust distance learning. (2) We introduce a novel loss function to optimize the Mahalanobis distance representation and reduce projection bias. Extensive experiments on four datasets show that our approach outperforms state-of-the-art GZSL techniques with improvements of up to 3.5% on the harmonic mean metric.
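A parameterized Mahalanobis distance can be written as d(x, y) = ||L(x - y)||, which corresponds to the metric matrix M = LᵀL. The sketch below shows how such a metric changes a nearest-prototype decision. It is only an illustration of the distance itself: the matrix `L` is fixed by hand rather than learned, and the prototype names and values are hypothetical.

```python
import math


def mahalanobis(x, y, L):
    """Distance ||L (x - y)||, i.e. a Mahalanobis metric with M = L^T L."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    projected = [sum(row[j] * diff[j] for j in range(len(diff))) for row in L]
    return math.sqrt(sum(p * p for p in projected))


def nearest_class(x, prototypes, L):
    """Name of the class prototype closest to x under the metric defined by L."""
    return min(prototypes, key=lambda name: mahalanobis(x, prototypes[name], L))
```

With the identity matrix the metric reduces to Euclidean distance; a learned `L` can down-weight directions in which the projection is known to be biased, which is the role the abstract assigns to the learned metric.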

Updated: 2024-04-02 05:20:01

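Title: Bridging the Projection Gap: Overcoming Projection Bias Through Parameterized Distance Learning

Abstract: Generalized zero-shot learning (GZSL) aims to recognize samples from both seen and unseen classes while training only on samples from seen classes. During inference, however, GZSL methods are prone to bias toward seen classes because the projection function is learned from seen classes. Most methods focus on learning an accurate projection, but bias in the projection is unavoidable. We address this projection bias by learning a parameterized Mahalanobis distance metric for robust inference. Our key insight is that the distance computation during inference is critical, even when the projection is biased. Our two main contributions are: (1) we extend the VAEGAN (Variational Autoencoder and Generative Adversarial Networks) architecture with two branches that separately output the projections of samples from seen and unseen classes, enabling more robust distance learning; (2) we introduce a new loss function to optimize the Mahalanobis distance representation and reduce projection bias. Extensive experiments on four datasets show that our approach outperforms state-of-the-art GZSL techniques, with improvements of up to 3.5% on the harmonic mean metric.

Updated: 2024-04-02 05:20:01

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2309.01390v2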

A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection

Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) The resolution and size of CT scans often vary considerably, which places strict requirements on the input size and adaptability of models. (2) A CT scan contains a large number of out-of-distribution (OOD) slices, and the crucial features may be present only in specific spatial regions and slices of the entire scan. How can we effectively figure out where these are located? To deal with this, we introduce an enhanced Spatial-Slice Feature Learning (SSFL++) framework specifically designed for CT scans. It aims to filter out OOD data within the whole CT scan, enabling us to select the crucial spatial slices for analysis while reducing redundancy by 70% in total. Meanwhile, we propose a Kernel-Density-based slice Sampling (KDS) method to improve stability during the training and inference stages, thereby speeding up convergence and boosting performance. The experiments demonstrate the promising performance of our model using a simple EfficientNet-2D (E2D) model, even with only 1% of the training data. The efficacy of our approach has been validated on the COVID-19-CT-DB datasets provided by the DEF-AI-MIA workshop, held in conjunction with CVPR 2024. Our source code will be made available.

Updated: 2024-04-02 05:19:27

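Title: A Closer Look at Spatial-Slice Feature Learning for COVID-19 Detection

Abstract: Conventional computed tomography (CT) imaging recognition faces two major challenges: (1) the resolution and size of CT scans often vary considerably, which places strict requirements on the input size and adaptability of models; (2) a CT scan contains a large number of out-of-distribution (OOD) slices, and the key features may exist only in specific spatial regions and slices of the whole scan. How can these locations be found effectively? To address this, we introduce an enhanced Spatial-Slice Feature Learning (SSFL++) framework designed specifically for CT scans. It aims to filter out OOD data within the whole CT scan, so that the key spatial slices can be selected for analysis while redundancy is reduced by 70% in total. We also propose a Kernel-Density-based slice Sampling (KDS) method to improve stability during training and inference, which speeds up convergence and improves performance. Experiments show that our model performs promisingly with a simple EfficientNet-2D (E2D) model, even when only 1% of the training data is used. The effectiveness of our approach has been validated on the COVID-19-CT-DB datasets provided by the DEF-AI-MIA workshop, held in conjunction with CVPR 2024. Our source code will be made available.

Updated: 2024-04-02 05:19:27

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.01643v1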

ADVREPAIR: Provable Repair of Adversarial Attack

Deep neural networks (DNNs) are increasingly deployed in safety-critical domains, but their vulnerability to adversarial attacks poses serious safety risks. Existing neuron-level methods using limited data lack efficacy in fixing adversaries due to the inherent complexity of adversarial attack mechanisms, while adversarial training, leveraging a large number of adversarial samples to enhance robustness, lacks provability. In this paper, we propose ADVREPAIR, a novel approach for provable repair of adversarial attacks using limited data. By utilizing formal verification, ADVREPAIR constructs patch modules that, when integrated with the original network, deliver provable and specialized repairs within the robustness neighborhood. Additionally, our approach incorporates a heuristic mechanism for assigning patch modules, allowing this defense against adversarial attacks to generalize to other inputs. ADVREPAIR demonstrates superior efficiency, scalability and repair success rate. Different from existing DNN repair methods, our repair can generalize to general inputs, thereby improving the robustness of the neural network globally, which indicates a significant breakthrough in the generalization capability of ADVREPAIR.

Updated: 2024-04-02 05:16:59

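Title: ADVREPAIR: Provable Repair of Adversarial Attack

Abstract: Deep neural networks (DNNs) are increasingly deployed in safety-critical domains, but their vulnerability to adversarial attacks poses serious safety risks. Existing neuron-level methods that use limited data are ineffective at repairing adversarial examples because of the inherent complexity of adversarial attack mechanisms, while adversarial training, which uses a large number of adversarial samples to improve robustness, lacks provability. In this paper, we propose ADVREPAIR, a new approach for provable repair of adversarial attacks using limited data. Using formal verification, ADVREPAIR constructs patch modules that, when integrated with the original network, provide provable and specialized repairs within the robustness neighborhood. Our approach also includes a heuristic mechanism for assigning patch modules, which allows the defense against adversarial attacks to generalize to other inputs. ADVREPAIR demonstrates superior efficiency, scalability, and repair success rate. Unlike existing DNN repair methods, our repair generalizes to general inputs and thereby improves the robustness of the neural network globally, which indicates a significant advance in the generalization capability of ADVREPAIR.

Updated: 2024-04-02 05:16:59

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2404.01642v1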

Flames: Benchmarking Value Alignment of LLMs in Chinese

The widespread adoption of large language models (LLMs) across various regions underscores the urgent need to evaluate their alignment with human values. Current benchmarks, however, fall short of effectively uncovering safety vulnerabilities in LLMs. Despite numerous models achieving high scores and 'topping the chart' in these evaluations, there is still a significant gap in LLMs' deeper alignment with human values and achieving genuine harmlessness. To this end, this paper proposes a value alignment benchmark named Flames, which encompasses both common harmlessness principles and a unique morality dimension that integrates specific Chinese values such as harmony. Accordingly, we carefully design adversarial prompts that incorporate complex scenarios and jailbreaking methods, mostly with implicit malice. By prompting 17 mainstream LLMs, we obtain model responses and rigorously annotate them for detailed evaluation. Our findings indicate that all the evaluated LLMs demonstrate relatively poor performance on Flames, particularly in the safety and fairness dimensions. We also develop a lightweight specified scorer capable of scoring LLMs across multiple dimensions to efficiently evaluate new models on the benchmark. The complexity of Flames has far exceeded existing benchmarks, setting a new challenge for contemporary LLMs and highlighting the need for further alignment of LLMs. Our benchmark is publicly available at https://github.com/AIFlames/Flames.

Updated: 2024-04-02 05:15:19

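Title: Flames: Benchmarking the Value Alignment of LLMs in Chinese

Abstract: The widespread adoption of large language models (LLMs) across regions underscores the urgency of evaluating their alignment with human values. Current benchmarks, however, fail to uncover safety vulnerabilities in LLMs effectively. Although many models achieve high scores and "top the chart" in these evaluations, a significant gap remains in LLMs' deeper alignment with human values and in achieving genuine harmlessness. This paper therefore proposes a value alignment benchmark named Flames, which covers both common harmlessness principles and a unique morality dimension that integrates specific Chinese values such as harmony. Accordingly, we carefully design adversarial prompts that incorporate complex scenarios and jailbreaking methods, mostly with implicit malice. By prompting 17 mainstream LLMs, we obtain model responses and rigorously annotate them for detailed evaluation. Our findings show that all evaluated LLMs perform relatively poorly on Flames, particularly in the safety and fairness dimensions. We also develop a lightweight specified scorer that can score LLMs across multiple dimensions, so that new models can be evaluated on the benchmark efficiently. The complexity of Flames far exceeds that of existing benchmarks, setting a new challenge for contemporary LLMs and highlighting the need for further alignment of LLMs. Our benchmark is publicly available at https://github.com/AIFlames/Flames.

Updated: 2024-04-02 05:15:19

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2311.06899v3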

Classification for everyone: Building geography agnostic models for fairer recognition

In this paper, we analyze different methods to mitigate inherent geographical biases present in state-of-the-art image classification models. We first quantify this bias in two datasets, the Dollar Street Dataset and ImageNet, using images with location information. We then present different methods that can be employed to reduce this bias. Finally, we analyze how effectively the different techniques make these models more robust to the geographical locations of the images.

Updated: 2024-04-02 05:12:10

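Title: Classification for Everyone: Building Geography-Agnostic Models for Fairer Recognition

Abstract: In this paper, we analyze different methods for mitigating the geographical biases present in existing image classification models. We first quantify this bias in two datasets, the Dollar Street Dataset and ImageNet, using images with location information. We then present different methods that can be used to reduce the bias. Finally, we analyze how effective the different techniques are at making these models more robust to the geographical locations of the images.

Updated: 2024-04-02 05:12:10

Categories: cs.CV,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2312.02957v3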

Learning to Control Camera Exposure via Reinforcement Learning

Adjusting camera exposure in arbitrary lighting conditions is the first step to ensure the functionality of computer vision applications. Poorly adjusted camera exposure often leads to critical failure and performance degradation. Traditional camera exposure control methods require multiple convergence steps and time-consuming processes, making them unsuitable for dynamic lighting conditions. In this paper, we propose a new camera exposure control framework that rapidly controls camera exposure while performing real-time processing by exploiting deep reinforcement learning. The proposed framework consists of four contributions: 1) a simplified training ground to simulate the real world's diverse and dynamic lighting changes, 2) flickering and image attribute-aware reward design, along with lightweight state design for real-time processing, 3) a static-to-dynamic lighting curriculum to gradually improve the agent's exposure-adjusting capability, and 4) domain randomization techniques to alleviate the limitation of the training ground and achieve seamless generalization in the wild. As a result, our proposed method rapidly reaches a desired exposure level within five steps with real-time processing (1 ms). Also, the acquired images are well-exposed and show superiority in various computer vision tasks, such as feature extraction and object detection.

Updated: 2024-04-02 04:53:39

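Title: Learning to Control Camera Exposure via Reinforcement Learning

Abstract: Adjusting camera exposure under arbitrary lighting conditions is the first step in ensuring that computer vision applications function properly. Poorly adjusted exposure often leads to critical failures and degraded performance. Traditional camera exposure control methods require multiple convergence steps and time-consuming processes, which makes them unsuitable for dynamic lighting conditions. This paper proposes a new camera exposure control framework that uses deep reinforcement learning to control exposure rapidly while processing in real time. The framework consists of four contributions: 1) a simplified training ground that simulates the diverse and dynamic lighting changes of the real world; 2) a flickering- and image-attribute-aware reward design, together with a lightweight state design for real-time processing; 3) a static-to-dynamic lighting curriculum that gradually improves the agent's exposure-adjusting capability; and 4) domain randomization techniques that alleviate the limitations of the training ground and achieve seamless generalization in the wild. As a result, the proposed method reaches the desired exposure level within five steps with real-time processing (1 ms). The acquired images are also well exposed and perform better in various computer vision tasks such as feature extraction and object detection.

Updated: 2024-04-02 04:53:39

Categories: cs.CV,cs.AI,cs.LG,cs.RO,cs.SY,eess.SY

Download: http://arxiv.org/abs/2404.01636v1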

TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting

The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Meanwhile, for natural language processing, the Generative Pre-trained Transformer (GPT) has demonstrated impressive performance via training one general-purpose model across various textual datasets. It is intriguing to explore whether GPT-type architectures can be effective for time series, capturing the intrinsic dynamic attributes and leading to significant accuracy improvements. In this paper, we propose a novel framework, TEMPO, that can effectively learn time series representations. We focus on utilizing two essential inductive biases of the time series task for pre-trained models: (i) decomposition of the complex interaction between trend, seasonal and residual components; and (ii) introducing the design of prompts to facilitate distribution adaptation in different types of time series. TEMPO expands the capability for dynamically modeling real-world temporal phenomena from data within diverse domains. Our experiments demonstrate the superior performance of TEMPO over state-of-the-art methods in the zero-shot setting on a number of time series benchmark datasets. This performance gain is observed not only in scenarios involving previously unseen datasets but also in scenarios with multi-modal inputs. This compelling finding highlights TEMPO's potential to constitute a foundational model-building framework.
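The first inductive bias, splitting a series into trend, seasonal, and residual components, can be sketched with a simple additive decomposition. This is an illustrative stand-in using a centered moving average and per-phase means; the abstract does not state which decomposition TEMPO uses, and the function name and `period` parameter are assumptions.

```python
def decompose(series, period):
    """Additive split of a series into trend, seasonal and residual parts.

    Assumes len(series) >= period. The trend is a centered moving average
    (truncated at the edges), the seasonal part is the mean detrended value
    at each phase of the period, and the residual is whatever remains.
    """
    n = len(series)
    half = period // 2
    trend = []
    for t in range(n):
        window = series[max(0, t - half):min(n, t + half + 1)]
        trend.append(sum(window) / len(window))
    detrended = [x - tr for x, tr in zip(series, trend)]
    phase_means = []
    for phase in range(period):
        values = detrended[phase::period]
        phase_means.append(sum(values) / len(values))
    seasonal = [phase_means[t % period] for t in range(n)]
    residual = [x - tr - s for x, tr, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual
```

By construction the three components sum back to the original series, so a model can process each component separately and recombine the outputs.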

Updated: 2024-04-02 04:39:08

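Title: TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting

Abstract: Over the past decade, deep learning has made significant progress in time series modeling. Although state-of-the-art results have been achieved, the best-performing architectures differ widely across applications and domains. Meanwhile, in natural language processing, the Generative Pre-trained Transformer (GPT) has shown impressive performance by training one general-purpose model across many textual datasets. It is intriguing to explore whether GPT-type architectures can be effective for time series, capturing their intrinsic dynamic attributes and yielding significant accuracy improvements. This paper proposes a novel framework, TEMPO, that can effectively learn time series representations. We focus on using two essential inductive biases of the time series task for pre-trained models: (i) decomposition of the complex interaction between trend, seasonal, and residual components; and (ii) prompt design that facilitates distribution adaptation across different types of time series. TEMPO extends the ability to dynamically model real-world temporal phenomena from data in diverse domains. Our experiments show that TEMPO outperforms state-of-the-art methods in the zero-shot setting on several time series benchmark datasets. This gain is observed not only in scenarios involving previously unseen datasets but also with multi-modal inputs. This finding highlights TEMPO's potential as a foundational model-building framework.

Updated: 2024-04-02 04:39:08

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2310.04948v3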

Enhancing Functional Safety in Automotive AMS Circuits through Unsupervised Machine Learning

Given the widespread use of safety-critical applications in the automotive field, it is crucial to ensure the Functional Safety (FuSa) of circuits and components within automotive systems. The Analog and Mixed-Signal (AMS) circuits prevalent in these systems are more vulnerable to faults induced by parametric perturbations, noise, environmental stress, and other factors, in comparison to their digital counterparts. However, their continuous signal characteristics present an opportunity for early anomaly detection, enabling the implementation of safety mechanisms to prevent system failure. To address this need, we propose a novel framework based on unsupervised machine learning for early anomaly detection in AMS circuits. The proposed approach involves injecting anomalies at various circuit locations and individual components to create a diverse and comprehensive anomaly dataset, followed by the extraction of features from the observed circuit signals. Subsequently, we employ clustering algorithms to facilitate anomaly detection. Finally, we propose a time series framework to enhance and expedite anomaly detection performance. Our approach encompasses a systematic analysis of anomaly abstraction at multiple levels pertaining to the automotive domain, from hardware- to block-level, where anomalies are injected to create diverse fault scenarios. By monitoring the system behavior under these anomalous conditions, we capture the propagation of anomalies and their effects at different abstraction levels, thereby potentially paving the way for the implementation of reliable safety mechanisms to ensure the FuSa of automotive SoCs. Our experimental findings indicate that our approach achieves 100% anomaly detection accuracy and significantly optimizes the associated latency by 5X, underscoring the effectiveness of our devised solution.

Updated: 2024-04-02 04:33:03

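Title: Enhancing Functional Safety in Automotive AMS Circuits through Unsupervised Machine Learning

Abstract: Given the widespread use of safety-critical applications in the automotive field, it is crucial to ensure the Functional Safety (FuSa) of circuits and components in automotive systems. The Analog and Mixed-Signal (AMS) circuits common in these systems are more vulnerable than their digital counterparts to faults caused by parametric perturbations, noise, environmental stress, and other factors. Their continuous signal characteristics, however, offer an opportunity for early anomaly detection, which allows safety mechanisms to be applied to prevent system failure. To meet this need, we propose a novel framework based on unsupervised machine learning for early anomaly detection in AMS circuits. The approach injects anomalies at various circuit locations and into individual components to create a diverse and comprehensive anomaly dataset, and then extracts features from the observed circuit signals. We then use clustering algorithms to support anomaly detection. Finally, we propose a time series framework to improve and speed up anomaly detection. Our approach includes a systematic analysis of anomaly abstraction at multiple levels in the automotive domain, from the hardware level to the block level, where anomalies are injected to create diverse fault scenarios. By monitoring system behavior under these anomalous conditions, we capture how anomalies propagate and what effects they have at different abstraction levels, which may pave the way for reliable safety mechanisms that ensure the FuSa of automotive SoCs. Our experimental results show that the approach achieves 100% anomaly detection accuracy and reduces the associated latency by 5X, underscoring the effectiveness of the solution.

Updated: 2024-04-02 04:33:03

Categories: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2404.01632v1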

Learning Equi-angular Representations for Online Continual Learning

Online continual learning suffers from an underfitted solution due to insufficient training for prompt model update (e.g., single-epoch training). To address the challenge, we propose an efficient online continual learning method using the neural collapse phenomenon. In particular, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space so that the continuously learned model with a single epoch can better fit to the streamed data by proposing preparatory data training and residual correction in the representation space. With an extensive set of empirical validations using CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K, we show that our proposed method outperforms state-of-the-art methods by a noticeable margin in various online continual learning scenarios such as disjoint and Gaussian scheduled continuous (i.e., boundary-free) data setups.
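For readers unfamiliar with the simplex equiangular tight frame (ETF) named in this abstract, the sketch below builds one for K classes and checks its defining geometry: every class vector has unit norm and every pair has inner product -1/(K-1). It uses the simplest case, where the feature dimension equals K; the general construction multiplies by a partial orthogonal matrix to reach other dimensions. How the paper induces this structure during training is not shown, and the function names are hypothetical.

```python
import math


def simplex_etf(num_classes):
    """Rows are K unit vectors with pairwise inner product -1/(K-1).

    Row i equals sqrt(K/(K-1)) * (e_i - (1/K) * ones), where e_i is the
    i-th standard basis vector.
    """
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))
```

Fixing such vectors as class targets gives every class the same, maximal angular separation from every other class, which is the geometry that neural collapse describes.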

Updated: 2024-04-02 04:29:01

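Title: Learning Equi-angular Representations for Online Continual Learning

Abstract: Online continual learning suffers from underfitted solutions because training for prompt model updates is insufficient (e.g., single-epoch training). To address this challenge, we propose an efficient online continual learning method that uses the neural collapse phenomenon. Specifically, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space, so that a model learned continually with a single epoch can better fit the streamed data; to this end we propose preparatory data training and residual correction in the representation space. Through extensive empirical validation on CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K, we show that the proposed method clearly outperforms state-of-the-art methods in various online continual learning scenarios, such as disjoint and Gaussian-scheduled continuous (i.e., boundary-free) data setups.

Updated: 2024-04-02 04:29:01

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.01628v1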

Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis

We present a novel approach to automatically synthesize "wayfinding instructions" for an embodied robot agent. In contrast to prior approaches that are heavily reliant on human-annotated datasets designed exclusively for specific simulation platforms, our algorithm uses in-context learning to condition an LLM to generate instructions using just a few references. Using an LLM-based Visual Question Answering strategy, we gather detailed information about the environment which is used by the LLM for instruction synthesis. We implement our approach on multiple simulation platforms including Matterport3D, AI Habitat and ThreeDWorld, thereby demonstrating its platform-agnostic nature. We subjectively evaluate our approach via a user study and observe that 83.3% of users find the synthesized instructions accurately capture the details of the environment and show characteristics similar to those of human-generated instructions. Further, we conduct zero-shot navigation with multiple approaches on the REVERIE dataset using the generated instructions, and observe very close correlation with the baseline on standard success metrics (< 1% change in SR), quantifying the viability of generated instructions in replacing human-annotated data. We finally discuss the applicability of our approach in enabling a generalizable evaluation of embodied navigation policies. To the best of our knowledge, ours is the first LLM-driven approach capable of generating "human-like" instructions in a platform-agnostic manner, without training.

Updated: 2024-04-02 04:27:55

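Title: Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis

Abstract: We present a novel approach for automatically synthesizing "wayfinding instructions" for an embodied robot agent. Unlike prior approaches, which rely heavily on human-annotated datasets designed for specific simulation platforms, our algorithm uses in-context learning to condition an LLM so that it generates instructions from only a few references. Using an LLM-based visual question answering strategy, we collect detailed information about the environment, which the LLM then uses for instruction synthesis. We implement our approach on multiple simulation platforms, including Matterport3D, AI Habitat, and ThreeDWorld, demonstrating that it is platform-agnostic. We evaluate the approach subjectively through a user study and observe that 83.3% of users find that the synthesized instructions accurately capture the details of the environment and show characteristics similar to those of human-generated instructions. We also run zero-shot navigation with multiple approaches on the REVERIE dataset using the generated instructions and observe results very close to the baseline on standard success metrics (less than 1% change in SR), which quantifies the viability of generated instructions as a replacement for human-annotated data. Finally, we discuss how our approach can enable a generalizable evaluation of embodied navigation policies. To the best of our knowledge, ours is the first LLM-driven approach that can generate "human-like" instructions in a platform-agnostic manner without training.

Updated: 2024-04-02 04:27:55

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2403.11487v3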

Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering

Recent progress with LLM-based agents has shown promising results across various tasks. However, their use in answering questions from knowledge bases remains largely unexplored. Implementing a KBQA system using traditional methods is challenging due to the shortage of task-specific training data and the complexity of creating task-focused model structures. In this paper, we present Triad, a unified framework that utilizes an LLM-based agent with three roles for KBQA tasks. The agent is assigned three roles to tackle different KBQA subtasks: agent as a generalist for mastering various subtasks, as a decision maker for the selection of candidates, and as an advisor for answering questions with knowledge. Our KBQA framework is executed in four phases, involving the collaboration of the agent's multiple roles. We evaluated the performance of our framework using three benchmark datasets, and the results show that our framework outperforms state-of-the-art systems on the LC-QuAD and YAGO-QA benchmarks, yielding F1 scores of 11.8% and 20.7%, respectively.

Updated: 2024-04-02 04:23:44

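Title: Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering

Abstract: Recent progress with LLM-based agents has shown promising results across a variety of tasks. Their use for answering questions over knowledge bases, however, remains largely unexplored. Implementing a KBQA system with traditional methods is challenging because task-specific training data is scarce and task-focused model structures are complex to create. In this paper, we present Triad, a unified framework that uses an LLM-based agent with three roles for KBQA tasks. The agent is assigned three roles to handle different KBQA subtasks: a generalist that masters various subtasks, a decision maker that selects candidates, and an advisor that answers questions with knowledge. Our KBQA framework runs in four phases that involve collaboration among the agent's roles. We evaluated the framework on three benchmark datasets, and the results show that it outperforms state-of-the-art systems on the LC-QuAD and YAGO-QA benchmarks, yielding F1 scores of 11.8% and 20.7%, respectively.

Updated: 2024-04-02 04:23:44

Categories: cs.CL,cs.AI,68T50,I.2.7

Download: http://arxiv.org/abs/2402.14320v3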

AAA: an Adaptive Mechanism for Locally Differential Private Mean Estimation

Local differential privacy (LDP) is a strong privacy standard that has been adopted by popular software systems. The main idea is that each individual perturbs their own data locally, and only submits the resulting noisy version to a data aggregator. Although much effort has been devoted to computing various types of aggregates and building machine learning applications under LDP, research on fundamental perturbation mechanisms has not achieved significant improvement in recent years. Towards a more refined result utility, existing works mainly focus on improving the worst-case guarantee. However, this approach does not necessarily promise a better average performance given the fact that the data in practice obey a certain distribution, which is not known beforehand. In this paper, we propose the advanced adaptive additive (AAA) mechanism, which is a distribution-aware approach that addresses the average utility and tackles the classical mean estimation problem. AAA is carried out in a two-step approach: first, as the global data distribution is not available beforehand, the data aggregator selects a random subset of individuals to compute a (noisy) quantized data descriptor; then, the data aggregator collects data from the remaining individuals, which are perturbed in a distribution-aware fashion. The perturbation involved in the latter step is obtained by solving an optimization problem, which is formulated with the data descriptor obtained in the former step and the desired properties of task-determined utilities. We provide rigorous privacy proofs, utility analyses, and extensive experiments comparing AAA with state-of-the-art mechanisms. The evaluation results demonstrate that the AAA mechanism consistently outperforms existing solutions with a clear margin in terms of result utility, on a wide range of privacy constraints and real-world and synthetic datasets.
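As background for the setting described in this abstract, the sketch below shows a plain, distribution-agnostic baseline for mean estimation under LDP: each user clips their value to [-1, 1], adds Laplace noise with scale 2/ε locally, and the aggregator averages the noisy reports. This is the standard Laplace mechanism, not the AAA mechanism; AAA's descriptor step and optimized perturbation are not reproduced here. The function names and the [-1, 1] value range are assumptions.

```python
import random


def perturb(value, epsilon, rng):
    """Report a value in [-1, 1] with Laplace noise added locally.

    The value range has width 2, so Laplace noise of scale 2/epsilon
    gives epsilon-LDP for a single report.
    """
    clipped = max(-1.0, min(1.0, value))
    scale = 2.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped + noise


def estimate_mean(values, epsilon, rng):
    """Aggregator side: average the noisy reports (an unbiased estimate)."""
    reports = [perturb(v, epsilon, rng) for v in values]
    return sum(reports) / len(reports)
```

Because the noise is calibrated to the worst case over the whole value range, it ignores how the data are actually distributed; that gap between worst-case and average utility is what the abstract says AAA targets.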

Updated: 2024-04-02 04:22:07

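Title: AAA: An Adaptive Mechanism for Locally Differentially Private Mean Estimation

Abstract: Local differential privacy (LDP) is a strong privacy standard that has been adopted by popular software systems. Its main idea is that each individual perturbs their own data locally and submits only the resulting noisy version to a data aggregator. Although much work has gone into computing various types of aggregates and building machine learning applications under LDP, research on fundamental perturbation mechanisms has not made significant progress in recent years. In pursuit of more refined result utility, existing work focuses mainly on improving the worst-case guarantee. This approach does not necessarily give better average performance, however, because data in practice follow some distribution that is not known in advance. In this paper, we propose the advanced adaptive additive (AAA) mechanism, a distribution-aware approach that targets average utility and addresses the classical mean estimation problem. AAA proceeds in two steps: first, because the global data distribution is not available in advance, the data aggregator selects a random subset of individuals to compute a (noisy) quantized data descriptor; then the data aggregator collects data from the remaining individuals, which are perturbed in a distribution-aware fashion. The perturbation used in the second step is obtained by solving an optimization problem formulated from the data descriptor obtained in the first step and the desired properties of task-determined utilities. We provide rigorous privacy proofs, utility analyses, and extensive experiments comparing AAA with state-of-the-art mechanisms. The evaluation shows that the AAA mechanism consistently outperforms existing solutions by a clear margin in result utility, across a wide range of privacy constraints and on both real-world and synthetic datasets.

Updated: 2024-04-02 04:22:07

Categories: cs.CR

Download: http://arxiv.org/abs/2404.01625v1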

Autonomous Data Selection with Language Models for Mathematical Texts

To improve language models' proficiency in mathematical reasoning via continual pretraining, we introduce a novel strategy that leverages base language models for autonomous data selection. Departing from conventional supervised fine-tuning or trained classifiers with human-annotated data, our approach Autonomous Data Selection (AutoDS) utilizes meta-prompted language models as zero-shot verifiers to evaluate and select high-quality mathematical content autonomously. To demonstrate the efficacy of our method, we continuously pretrained a 7B-parameter language model on our curated dataset, achieving substantial improvements in downstream performance on the MATH, GSM8K, and BIG-Bench Hard (BBH) tasks with a token amount reduced by orders of magnitude compared to previous continual pretraining works. Our method showcases a 2 times increase in pretraining token efficiency compared to state-of-the-art baselines, underscoring the potential of our approach in enhancing models' mathematical reasoning capabilities. The AutoMathText dataset is available at https://huggingface.co/datasets/math-ai/AutoMathText. The code is available at https://github.com/yifanzhang-pro/AutoMathText.

Updated: 2024-04-02 04:17:30

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.07625v2

Transcribing Bengali Text with Regional Dialects to IPA using District Guided Tokens

Accurate transcription of Bengali text to the International Phonetic Alphabet (IPA) is a challenging task due to the complex phonology of the language and context-dependent sound changes. The challenge is even greater for regional Bengali dialects because these dialects lack standardized spelling conventions, contain local and foreign words popular in those regions, and vary phonologically from region to region. This paper presents an approach to this sequence-to-sequence problem by introducing the District Guided Tokens (DGT) technique on a new dataset spanning six districts of Bangladesh. The key idea is to provide the model with explicit information about the regional dialect or "district" of the input text before generating the IPA transcription. This is achieved by prepending a district token to the input sequence, effectively guiding the model to understand the unique phonetic patterns associated with each district. The DGT technique is applied to fine-tune several transformer-based models on this new dataset. Experimental results demonstrate the effectiveness of DGT, with the ByT5 model achieving superior performance over word-based models like mT5, BanglaT5, and umT5. This is attributed to ByT5's ability to handle a high percentage of out-of-vocabulary words in the test set. The proposed approach highlights the importance of incorporating regional dialect information into ubiquitous natural language processing systems for languages with diverse phonological variations. This work is a result of the "Bhashamul" challenge, which is dedicated to the problem of transcribing Bengali text with regional dialects to IPA: https://www.kaggle.com/competitions/regipa/. The training and inference notebooks are available through the competition link.
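The token-prepending step described in the abstract can be pictured with a minimal sketch. The district names and token strings below are hypothetical placeholders (the abstract does not list the actual tokens), and `add_district_token` is an illustrative helper, not the authors' code.

```python
# Hypothetical district-to-token map; the real dataset covers six districts
# of Bangladesh, whose token strings are not given in the abstract.
DISTRICT_TOKENS = {
    "district_a": "<district_a>",
    "district_b": "<district_b>",
}


def add_district_token(text: str, district: str) -> str:
    """Prepend the district token so a seq2seq model sees the dialect first."""
    token = DISTRICT_TOKENS[district]
    return f"{token} {text}"


if __name__ == "__main__":
    print(add_district_token("example input text", "district_a"))
```

The resulting string would then be passed to the tokenizer of whichever sequence-to-sequence model is being fine-tuned.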

Updated: 2024-04-02 04:15:36

Categories: cs.CL,cs.AI,cs.LG,F.2.2; I.2.7

Download: http://arxiv.org/abs/2403.17407v3

Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena

Recent advancements in Large Language Models (LLMs) showcase advanced reasoning, yet NLP evaluations often depend on static benchmarks. Evaluating this necessitates environments that test strategic reasoning in dynamic, competitive scenarios requiring long-term planning. We introduce AucArena, a novel evaluation suite that simulates auctions, a setting chosen for being highly unpredictable and involving many skills related to resource and risk management, while also being easy to evaluate. We conduct controlled experiments using state-of-the-art LLMs to power bidding agents to benchmark their planning and execution skills. Our research demonstrates that LLMs, such as GPT-4, possess key skills for auction participation, such as budget management and goal adherence, which improve with adaptive strategies. This highlights LLMs' potential in modeling complex social interactions in competitive contexts. However, variability in LLM performance and occasional outperformance by simpler methods indicate opportunities for further advancements in LLM design and the value of our simulation environment for ongoing testing and refinement.

Updated: 2024-04-02 04:12:53

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2310.05746v2

Gen4DS: Workshop on Data Storytelling in an Era of Generative AI

Storytelling is an ancient and precious human ability that has been rejuvenated in the digital age. Over the last decade, there has been a notable surge in the recognition and application of data storytelling, both in academia and industry. Recently, the rapid development of generative AI has brought new opportunities and challenges to this field, sparking numerous new questions. These questions may not necessarily be quickly transformed into papers, but we believe it is necessary to promptly discuss them to help the community better clarify important issues and research agendas for the future. We thus invite you to join our workshop (Gen4DS) to discuss questions such as: How can generative AI facilitate the creation of data stories? How might generative AI alter the workflow of data storytellers? What are the pitfalls and risks of incorporating AI in storytelling? We have designed both paper presentations and interactive activities (including hands-on creation, group discussion pods, and debates on controversial issues) for the workshop. We hope that participants will learn about the latest advances and pioneering work in data storytelling, engage in critical conversations with each other, and have an enjoyable, unforgettable, and meaningful experience at the event.

Updated: 2024-04-02 04:11:37

Categories: cs.HC,cs.AI,cs.GR

Download: http://arxiv.org/abs/2404.01622v1

Voice EHR: Introducing Multimodal Audio Data for Health

Large AI models trained on audio data may have the potential to rapidly classify patients, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets using expensive recording equipment in high-income, English-speaking countries. This challenges deployment in resource-constrained, high-volume settings where audio data may have a profound impact. This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application. This application ultimately results in an audio electronic health record (voice EHR) which may contain complex biomarkers of health from conventional voice/respiratory features, speech patterns, and language with semantic meaning - compensating for the typical limitations of unimodal clinical datasets. This report introduces a consortium of partners for global work, presents the application used for data collection, and showcases the potential of informative voice EHR to advance the scalability and diversity of audio AI.

Updated: 2024-04-02 04:07:22

Categories: cs.SD,cs.AI,cs.CY,eess.AS

Download: http://arxiv.org/abs/2404.01620v1

Making Privacy-preserving Federated Graph Analytics with Strong Guarantees Practical (for Certain Queries)

Privacy-preserving federated graph analytics is an emerging area of research. The goal is to run graph analytics queries over a set of devices that are organized as a graph while keeping the raw data on the devices rather than centralizing it. Further, no entity may learn any new information except for the final query result. For instance, a device may not learn a neighbor's data. The state-of-the-art prior work for this problem provides privacy guarantees for a broad set of queries in a strong threat model where the devices can be malicious. However, it imposes an impractical overhead: each device locally requires over 8.79 hours of CPU time and 5.73 GiB of network transfers per query. This paper presents Colo, a new, low-cost system for privacy-preserving federated graph analytics that requires minutes of CPU time and a few MiB in network transfers, for a particular subset of queries. At the heart of Colo is a new secure computation protocol that enables a device to securely and efficiently evaluate a graph query in its local neighborhood while hiding device data, edge data, and topology data. An implementation and evaluation of Colo shows that for running a variety of COVID-19 queries over a population of 1M devices, it requires less than 8.4 minutes of a device's CPU time and 4.93 MiB in network transfers - improvements of up to three orders of magnitude.

Updated: 2024-04-02 04:01:31

Categories: cs.CR

Download: http://arxiv.org/abs/2404.01619v1

LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models

We present LLM-ABR, the first system that utilizes the generative capabilities of large language models (LLMs) to autonomously design adaptive bitrate (ABR) algorithms tailored for diverse network characteristics. Operating within a reinforcement learning framework, LLM-ABR empowers LLMs to design key components such as states and neural network architectures. We evaluate LLM-ABR across diverse network settings, including broadband, satellite, 4G, and 5G. LLM-ABR consistently outperforms default ABR algorithms.

Updated: 2024-04-02 03:43:55

Categories: cs.NI,cs.LG,cs.MM

Download: http://arxiv.org/abs/2404.01617v1

Cumulative Reasoning with Large Language Models

Despite the recent advancements in language models (LMs), their ability to solve complex problems remains limited. This paper introduces Cumulative Reasoning (CR), a novel approach that utilizes LMs cumulatively and iteratively, mirroring human thought processes for problem-solving. CR decomposes tasks into smaller, manageable components and leverages previous propositions for effective composition, significantly enhancing problem-solving capabilities. We demonstrate CR's superiority through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over the prior state-of-the-art. Additionally, CR sets new state-of-the-art on the MATH dataset, achieving a 4.2% increase from previous methods and a 43% relative improvement in the most challenging problems. By extending CR to incorporate a code environment without external aids like retrieval or web browsing, we further harness the computational and logical reasoning capabilities of LMs, achieving a remarkable 72.2% accuracy on the MATH dataset and outperforming the PAL/PoT method by 38.8%. Our work not only sets new state-of-the-art but also paves the way toward more sophisticated AI reasoning methods. The code is available at https://github.com/iiis-ai/cumulative-reasoning.

Updated: 2024-04-02 03:37:39

Categories: cs.AI

Download: http://arxiv.org/abs/2308.04371v6

Meta Prompting for AI Systems

In this work, we present a comprehensive study of Meta Prompting (MP), an innovative technique reshaping the utilization of language models (LMs) and AI systems in problem-solving and data interaction. Grounded in type theory and category theory, Meta Prompting emphasizes the structure and syntax of information over traditional content-centric methods. The paper explores the formal definitions of Meta Prompting, sets it apart from few-shot prompting, and underlines its effectiveness in various AI applications. A key focus is applying Meta Prompting to complex reasoning tasks, showing how it effectively deconstructs intricate problems into simpler sub-problems, enhancing token efficiency and enabling more equitable problem-solving comparisons, especially against few-shot prompting methods. Additionally, the paper introduces Meta Prompting for prompting tasks, allowing LLMs to self-generate new prompts in a recursive, metaprogramming-like manner. Empirical experiments demonstrate Meta Prompting's efficacy in achieving high accuracy and efficiency: a Qwen-72B base language model equipped with a meta prompt, without instruction tuning, solves MATH problems with 46.3% accuracy, surpassing its supervised fine-tuned counterpart trained with extensive mathematical QA instruction pairs and even the initial version of GPT-4; the same zero-shot meta-prompted Qwen-72B base model solves GSM8K problems with 83.5% accuracy; and GPT-4 solves the Game of 24 tasks with a 100% success rate. These results showcase Meta Prompting's transformative impact on AI problem-solving. The code is available at https://github.com/meta-prompting/meta-prompting.

Updated: 2024-04-02 03:36:57

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2311.11482v5

A new lightweight additive homomorphic encryption algorithm

This article describes a lightweight additive homomorphic algorithm with the same encryption and decryption keys. Compared to standard additive homomorphic algorithms like Paillier, this algorithm reduces the computational cost of encryption and decryption from modular exponentiation to modular multiplication, and reduces the computational cost of ciphertext addition from modular multiplication to modular addition. This algorithm is based on a new mathematical problem: in two division operations, whether it is possible to infer the remainder or divisor based on the dividend when two remainders are related. Currently, it is not obvious how to break this problem, but further exploration is needed to determine if it is sufficiently difficult. In addition to this mathematical problem, we have also designed two interesting mathematical structures for decryption, which are used in the two algorithms mentioned in the main text. It is possible that the decryption structure of Algorithm 2 introduces new security vulnerabilities, but we have not investigated this issue thoroughly.

Updated: 2024-04-02 03:36:37

Categories: cs.CR

Download: http://arxiv.org/abs/2312.06987v3

Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation

The recent advancement of large and powerful models with Text-to-Image (T2I) generation abilities -- such as OpenAI's DALLE-3 and Google's Gemini -- enables users to generate high-quality images from textual prompts. However, it has become increasingly evident that even simple prompts could cause T2I models to exhibit conspicuous social bias in generated images. Such bias might lead to both allocational and representational harms in society, further marginalizing minority groups. Noting this problem, a large body of recent works has been dedicated to investigating different dimensions of bias in T2I systems. However, an extensive review of these studies is lacking, hindering a systematic understanding of current progress and research gaps. We present the first extensive survey on bias in T2I generative models. In this survey, we review prior studies on dimensions of bias: Gender, Skintone, and Geo-Culture. Specifically, we discuss how these works define, evaluate, and mitigate different aspects of bias. We found that: (1) while gender and skintone biases are widely studied, geo-cultural bias remains under-explored; (2) most works on gender and skintone bias investigated occupational association, while other aspects are less frequently studied; (3) almost all gender bias works overlook non-binary identities in their studies; (4) evaluation datasets and metrics are scattered, with no unified framework for measuring biases; and (5) current mitigation methods fail to resolve biases comprehensively. Based on current limitations, we point out future research directions that contribute to human-centric definitions, evaluations, and mitigation of biases. We hope to highlight the importance of studying biases in T2I systems, as well as encourage future efforts to holistically understand and tackle biases, building fair and trustworthy T2I technologies for everyone.

Updated: 2024-04-02 03:36:28

Categories: cs.CV,cs.AI,cs.CY

Download: http://arxiv.org/abs/2404.01030v2

Discovering Effective Policies for Land-Use Planning with Neuroevolution

How areas of land are allocated for different uses, such as forests, urban areas, and agriculture, has a large effect on the terrestrial carbon balance, and therefore climate change. Based on available historical data on land-use changes and a simulation of the associated carbon emissions and removals, a surrogate model can be learned that makes it possible to evaluate the different options available to decision-makers efficiently. An evolutionary search process can then be used to discover effective land-use policies for specific locations. Such a system was built on the Project Resilience platform and evaluated with the Land-Use Harmonization dataset LUH2 and the bookkeeping model BLUE. It generates Pareto fronts that trade off carbon impact and amount of land-use change customized to different locations, thus providing a potentially useful tool for land-use planning.
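To make the notion of a Pareto front concrete, here is a minimal sketch that filters candidate policies scored on two objectives, both to be minimized. The tuple layout `(carbon_impact, land_change)` and the function name `pareto_front` are assumptions for illustration; the actual system evaluates policies with a learned surrogate model and an evolutionary search, neither of which is reproduced here.

```python
from typing import List, Tuple

# Each candidate is (carbon_impact, land_change); lower is better for both.
Candidate = Tuple[float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates."""
    front = []
    for c in candidates:
        dominated = any(
            o[0] <= c[0] and o[1] <= c[1] and (o[0] < c[0] or o[1] < c[1])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


if __name__ == "__main__":
    scores = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
    print(pareto_front(scores))
```

A decision-maker would then pick a point on the front according to how much land-use change is acceptable at a given location.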

Updated: 2024-04-02 03:35:02

Categories: cs.NE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2311.12304v5

Distributed and Rate-Adaptive Feature Compression

We study the problem of distributed and rate-adaptive feature compression for linear regression. A set of distributed sensors collect disjoint features of regressor data. A fusion center is assumed to contain a pretrained linear regression model, trained on a dataset of the entire uncompressed data. At inference time, the sensors compress their observations and send them to the fusion center through communication-constrained channels, whose rates can change with time. Our goal is to design a feature compression scheme that can adapt to the varying communication constraints, while maximizing the inference performance at the fusion center. We first obtain the form of optimal quantizers assuming knowledge of underlying regressor data distribution. Under a practically reasonable approximation, we then propose a distributed compression scheme which works by quantizing a one-dimensional projection of the sensor data. We also propose a simple adaptive scheme for handling changes in communication constraints. We demonstrate the effectiveness of the distributed adaptive compression scheme through simulated experiments.
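As a rough illustration of quantizing a one-dimensional projection, the sketch below assumes a uniform scalar quantizer and a projection along a sensor's slice of the regression weights. Both are simplifying assumptions: the paper derives its own quantizers, and the names `project`, `quantize`, and `dequantize` are hypothetical.

```python
from typing import Sequence


def project(features: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a sensor's feature vector to one scalar (assumed: dot product)."""
    return sum(f * w for f, w in zip(features, weights))


def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a scalar in [lo, hi] to one of 2**bits uniform cells."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    clipped = min(max(value, lo), hi)
    return min(int((clipped - lo) / step), levels - 1)


def dequantize(index: int, lo: float, hi: float, bits: int) -> float:
    """Reconstruct the midpoint of the cell at the fusion center."""
    step = (hi - lo) / 2 ** bits
    return lo + (index + 0.5) * step


if __name__ == "__main__":
    z = project([0.4, -0.2], [1.0, 0.5])  # scalar sent instead of the vector
    for bits in (2, 4, 8):                # rate may change over time
        idx = quantize(z, -1.0, 1.0, bits)
        print(bits, idx, dequantize(idx, -1.0, 1.0, bits))
```

Changing `bits` from message to message is one simple way to follow a varying channel rate; the reconstruction error of this uniform quantizer is at most half a cell width for values inside `[lo, hi]`.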

Updated: 2024-04-02 03:21:06

Categories: cs.IT,cs.AI,math.IT,stat.ML

Download: http://arxiv.org/abs/2404.02179v1

Audio Simulation for Sound Source Localization in Virtual Environment

Non-line-of-sight localization in signal-deprived environments is a challenging yet pertinent problem. Acoustic methods in such predominantly indoor scenarios encounter difficulty due to the reverberant nature of these environments. In this study, we aim to locate sound sources to specific locations within a virtual environment by leveraging physically grounded sound propagation simulations and machine learning methods. This process attempts to overcome the issue of data insufficiency in localizing sound sources to their location of occurrence, especially in post-event localization. We achieve an F1-score of 0.786 ± 0.0136 using an audio transformer spectrogram approach.

Updated: 2024-04-02 03:18:28

Categories: cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2404.01611v1

Online Uniform Risk Times Sampling: First Approximation Algorithms, Learning Augmentation with Full Confidence Interval Integration

In digital health, the strategy of allocating a limited treatment budget across risk times is crucial to reduce user fatigue. This strategy, however, encounters a significant obstacle due to the unknown actual number of risk times, a factor not adequately addressed by existing methods lacking theoretical guarantees. This paper introduces, for the first time, the online uniform risk times sampling problem within the approximation algorithm framework. We propose two online approximation algorithms for this problem, one with and one without learning augmentation, and provide rigorous theoretical performance guarantees for them using competitive ratio analysis. We assess the performance of our algorithms using both synthetic experiments and a real-world case study on HeartSteps mobile applications.

Updated: 2024-04-02 03:10:33

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2402.01995v3

FAIRM: Learning invariant representations for algorithmic fairness and domain generalization with minimax optimality

Machine learning methods often assume that the test data have the same distribution as the training data. However, this assumption may not hold due to multiple levels of heterogeneity in applications, raising issues in algorithmic fairness and domain generalization. In this work, we address the problem of fair and generalizable machine learning by invariant principles. We propose a training environment-based oracle, FAIRM, which has desirable fairness and domain generalization properties under a diversity-type condition. We then provide an empirical FAIRM with finite-sample theoretical guarantees under weak distributional assumptions. Next, we develop efficient algorithms to realize FAIRM in linear models and demonstrate the nonasymptotic performance with minimax optimality. We evaluate our method in numerical experiments with synthetic data and MNIST data and show that it outperforms its counterparts.

Updated: 2024-04-02 03:06:25

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2404.01608v1

Learning Memory Kernels in Generalized Langevin Equations

We introduce a novel approach for learning memory kernels in Generalized Langevin Equations. This approach initially utilizes a regularized Prony method to estimate correlation functions from trajectory data, followed by regression over a Sobolev norm-based loss function with RKHS regularization. Our method guarantees improved performance within an exponentially weighted L^2 space, with the kernel estimation error controlled by the error in estimated correlation functions. We demonstrate the superiority of our estimator compared to other regression estimators that rely on L^2 loss functions and also an estimator derived from the inverse Laplace transform, using numerical examples that highlight its consistent advantage across various weight parameter selections. Additionally, we provide examples that include the application of force and drift terms in the equation.

Updated: 2024-04-02 03:04:09

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2402.11705v2

Haina Storage: A Decentralized Secure Storage Framework Based on Improved Blockchain Structure

Decentralized storage technology based on the blockchain can effectively realize secure data storage on cloud services. However, existing schemes still have problems, such as low storage capacity and low efficiency. To address these issues, we propose a novel decentralized storage framework with four main components: (1) A Bi-direction Circular Linked Chain Structure (BCLCS) is proposed, which improves data's storage capacity and applicability in decentralized storage. (2) A Proof of Resources (PoR) decision model is proposed. By introducing the network environment as an essential evaluation parameter of storage right decision, the energy and time consumption of decision-making are reduced, and the fairness of decision-making is improved. (3) A chain structure dynamic locking mechanism (CSDLM) is designed to realize anti-traverse and access control. (4) A Bi-directional data Access Mechanism (BDAM) is proposed, which improves the efficiency of data access and acquisition in decentralized storage mode. The experimental results show that the framework significantly mitigates the shortcomings of current decentralized storage.

Updated: 2024-04-02 02:56:27

Categories: cs.CR,cs.DC

Download: http://arxiv.org/abs/2404.01606v1

Benchmarking Private Population Data Release Mechanisms: Synthetic Data vs. TopDown

Differential privacy (DP) is increasingly used to protect the release of hierarchical, tabular population data, such as census data. A common approach for implementing DP in this setting is to release noisy responses to a predefined set of queries. For example, this is the approach of the TopDown algorithm used by the US Census Bureau. Such methods have an important shortcoming: they cannot answer queries for which they were not optimized. An appealing alternative is to generate DP synthetic data, which is drawn from some generating distribution. Like the TopDown method, synthetic data can also be optimized to answer specific queries, while also allowing the data user to later submit arbitrary queries over the synthetic population data. To our knowledge, there has not been a head-to-head empirical comparison of these approaches. This study conducts such a comparison between the TopDown algorithm and private synthetic data generation to determine how accuracy is affected by query complexity, in-distribution vs. out-of-distribution queries, and privacy guarantees. Our results show that for in-distribution queries, the TopDown algorithm achieves significantly better privacy-fidelity tradeoffs than any of the synthetic data methods we evaluated; for instance, in our experiments, TopDown achieved at least $20\times$ lower error on counting queries than the leading synthetic data method at the same privacy budget. Our findings suggest guidelines for practitioners and the synthetic data research community.
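The "noisy responses to a predefined set of queries" approach can be illustrated with the textbook Laplace mechanism for a counting query. This is a generic sketch, not the TopDown algorithm itself (whose noise distribution and post-processing are not described in the abstract); the function names and the sensitivity of 1 (one person changes a count by at most one) are assumptions.

```python
import random
from typing import Callable, Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(
    records: Iterable[object],
    predicate: Callable[[object], bool],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Answer a counting query with epsilon-DP, assuming sensitivity 1."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 67, 70, 18, 52]
    # Predefined query: how many people are 65 or older?
    print(noisy_count(ages, lambda a: a >= 65, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy and larger expected error, which is the privacy-fidelity tradeoff the study measures.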

Updated: 2024-04-02 02:54:24

Categories: cs.CR

Download: http://arxiv.org/abs/2401.18024v2

Generative AI in the Wild: Prospects, Challenges, and Strategies

Propelled by their remarkable capabilities to generate novel and engaging content, Generative Artificial Intelligence (GenAI) technologies are disrupting traditional workflows in many industries. While prior research has examined GenAI from a techno-centric perspective, there is still a lack of understanding about how users perceive and utilize GenAI in real-world scenarios. To bridge this gap, we conducted semi-structured interviews with GenAI users (N=18) in creative industries, investigating the human-GenAI co-creation process within a holistic LUA (Learning, Using and Assessing) framework. Our study uncovered an intriguingly complex landscape. Prospects: GenAI greatly fosters the co-creation between human expertise and GenAI capabilities, profoundly transforming creative workflows. Challenges: meanwhile, users face substantial uncertainties and complexities arising from resource availability, tool usability, and regulatory compliance. Strategies: in response, users actively devise various strategies to overcome many of these challenges. Our study reveals key implications for the design of future GenAI tools.

Updated: 2024-04-02 02:54:04

标题: 野外环境中的生成式人工智能：前景、挑战和策略

摘要: 由于其出色的能力生成新颖且引人入胜的内容，生成式人工智能（GenAI）技术正在许多行业中颠覆传统工作流程。尽管先前的研究已经从技术中心的角度研究了GenAI，但对于用户如何在现实场景中感知和利用GenAI仍存在一定的理解不足。为了弥补这一差距，我们对（N=18）创意产业中的GenAI用户进行了半结构化访谈，研究了人类-GenAI共同创作过程在全面的LUA（学习、使用和评估）框架内。我们的研究揭示了一个复杂而有趣的景观：前景-GenAI极大地促进了人类专业知识与GenAI能力之间的共同创作，深刻改变了创意工作流程；挑战-同时，用户面临着由资源可用性、工具可用性和合规性所引起的重大不确定性和复杂性；策略-作为回应，用户积极设计各种策略来克服许多这类挑战。我们的研究揭示了未来GenAI工具设计的关键启示。

更新时间: 2024-04-02 02:54:04

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2302.10827v2

Quantifying Self-diagnostic Atomic Knowledge in Chinese Medical Foundation Model: A Computational Analysis

Foundation Models (FMs) have the potential to revolutionize the way users self-diagnose through search engines by offering direct and efficient suggestions. Recent studies have primarily focused on the quality of FMs as evaluated by GPT-4 or on their ability to pass medical exams, but no studies have quantified the extent of self-diagnostic atomic knowledge stored in FMs' memory, which is the basis on which foundation models provide factual and reliable suggestions. In this paper, we first constructed a benchmark of Self-diagnostic Atomic Knowledge (SdAK), including the most common types of atomic knowledge involved in self-diagnostic queries, with 17 atomic types and a total of 14,048 pieces of atomic knowledge. Then, we evaluated both generic and open-source Chinese medical FMs on the benchmark. The experimental results show that generic FMs perform better than medical FMs in terms of self-diagnostic atomic knowledge. Error analysis revealed that both generic and medical FMs are sycophantic, e.g., always catering to users' claims when it comes to unknown knowledge. We further explored different types of data commonly adopted for fine-tuning medical FMs, i.e., real-world, semi-distilled, and distilled data, and found that distilled data can benefit FMs most. The code and data are available at https://github.com/FreedomIntelligence/SDAK.

Updated: 2024-04-02 02:48:22

标题: 量化中文医学基础模型中的自诊断原子知识：一项计算分析

摘要: 基础模型（FMs）有潜力通过搜索引擎革新用户自我诊断的方式，提供直接高效的建议。最近的研究主要集中在通过GPT-4评估FMs的质量或它们通过医学考试的能力，但没有研究量化存储在FMs记忆中的自诊断原子知识的程度，这是基础模型提供事实和可靠建议的基础。在本文中，我们首先构建了一个自诊断原子知识（SdAK）的基准，包括自诊断查询中涉及的最常见类型的原子知识，共17种原子类型和14,048个原子知识。然后，我们在该基准上评估了通用和开源的中文医学FMs。实验结果显示，通用FMs在自诊断原子知识方面表现优于医学FMs。错误分析显示，通用和医学FMs都是阿谀奉承的，例如，在涉及未知知识时总是迎合用户的要求。我们进一步探讨了用于微调医学FMs的不同类型数据，即真实世界、半蒸馏和蒸馏数据，并发现蒸馏数据对FMs最有益。代码和数据可在https://github.com/FreedomIntelligence/SDAK找到。

更新时间: 2024-04-02 02:48:22

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2310.11722v3

Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game

Large language models (LLMs) have exhibited memorable strategic behaviors in social deductive games. However, the significance of opinion leadership exhibited by LLM-based agents has been overlooked, which is crucial for practical applications in multi-agent and human-AI interaction settings. Opinion leaders are individuals who have a noticeable impact on the beliefs and behaviors of others within a social group. In this work, we employ the Werewolf game as a simulation platform to assess the opinion leadership of LLMs. The game features the role of the Sheriff, tasked with summarizing arguments and recommending decision options, and therefore serves as a credible proxy for an opinion leader. We develop a framework integrating the Sheriff role and devise two novel metrics for evaluation based on the critical characteristics of opinion leaders. The first metric measures the reliability of the opinion leader, and the second assesses the influence of the opinion leader on other players' decisions. We conduct extensive experiments to evaluate LLMs of different scales. In addition, we collect a Werewolf question-answering dataset (WWQA) to assess and enhance LLM's grasp of the game rules, and we also incorporate human participants for further analysis. The results suggest that the Werewolf game is a suitable test bed to evaluate the opinion leadership of LLMs and few LLMs possess the capacity for opinion leadership.

Updated: 2024-04-02 02:46:18

标题: 大众的舵手？评估大型语言模型在狼人游戏中的舆论领袖地位

摘要: 大型语言模型（LLMs）在社会推理游戏中表现出令人难忘的战略行为。然而，基于LLM的代理表现出的意见领袖性质被忽视了，这对于多代理和人工智能交互环境中的实际应用至关重要。意见领袖是在社会群体中对他人的信念和行为产生显著影响的个体。在这项工作中，我们利用狼人游戏作为模拟平台来评估LLMs的意见领袖性质。该游戏设有警长角色，负责总结论点并推荐决策选项，因此可作为意见领袖的可靠代理。我们开发了一个整合了警长角色的框架，并设计了两个基于意见领袖关键特征的新指标进行评估。第一个指标衡量意见领袖的可靠性，第二个评估意见领袖对其他玩家决策的影响。我们进行了大量实验评估不同规模的LLMs。此外，我们收集了一个狼人问答数据集（WWQA）来评估和增强LLM对游戏规则的掌握，并且还结合了人类参与者进行进一步分析。结果表明，狼人游戏是一个适合评估LLMs意见领袖性质的测试平台，很少有LLMs具有意见领袖的能力。

更新时间: 2024-04-02 02:46:18

领域: cs.CL,cs.AI,cs.HC

下载: http://arxiv.org/abs/2404.01602v1

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks

We study the capabilities of the transformer architecture with varying depth. Specifically, we designed a novel set of sequence learning tasks to systematically evaluate and comprehend how the depth of a transformer affects its ability to perform memorization, reasoning, generalization, and contextual generalization. We show that a transformer with only one attention layer can excel in memorization but falls short in the other tasks. Then, we show that exhibiting reasoning and generalization ability requires the transformer to have at least two attention layers, while contextual generalization ability may necessitate three attention layers. Additionally, we identify a class of simple operations that a single attention layer can execute, and show that the complex tasks can be approached as combinations of these simple operations and thus can be resolved by stacking multiple attention layers. This sheds light on studying more practical and complex tasks beyond our design. Numerical experiments corroborate our theoretical findings.

Updated: 2024-04-02 02:45:12

标题: 变幻深度下Transformer可以学到什么？序列学习任务的案例研究

摘要: 我们研究了变化深度的transformer架构的能力。具体地，我们设计了一组新颖的序列学习任务，系统地评估和理解transformer的深度如何影响其进行记忆、推理、泛化和上下文泛化的能力。我们展示了一个只有一个注意力层的transformer在记忆方面表现优异，但在其他任务中表现不佳。然后，我们展示了表现出推理和泛化能力需要transformer至少具有两个注意力层，而上下文泛化能力可能需要三个注意力层。此外，我们确定了一个单一注意力层可以执行的简单操作类别，并展示了复杂任务可以被看作是这些简单操作的组合，因此可以通过堆叠多个注意力层来解决。这为研究超出我们设计的更实际和复杂的任务提供了启示。数值实验证实了我们的理论发现。

更新时间: 2024-04-02 02:45:12

领域: cs.LG

下载: http://arxiv.org/abs/2404.01601v1

Extremum-Seeking Action Selection for Accelerating Policy Optimization

Reinforcement learning for control over continuous spaces typically uses high-entropy stochastic policies, such as Gaussian distributions, for local exploration and estimating policy gradient to optimize performance. Many robotic control problems deal with complex unstable dynamics, where applying actions that are off the feasible control manifolds can quickly lead to undesirable divergence. In such cases, most samples taken from the ambient action space generate low-value trajectories that hardly contribute to policy improvement, resulting in slow or failed learning. We propose to improve action selection in this model-free RL setting by introducing additional adaptive control steps based on Extremum-Seeking Control (ESC). On each action sampled from stochastic policies, we apply sinusoidal perturbations and query for estimated Q-values as the response signal. Based on ESC, we then dynamically improve the sampled actions to be closer to nearby optima before applying them to the environment. Our methods can be easily added in standard policy optimization to improve learning efficiency, which we demonstrate in various control learning environments.

Updated: 2024-04-02 02:39:17

标题: 寻找极值的行动选择以加速策略优化

摘要: 强化学习用于控制连续空间通常使用高熵随机策略，如高斯分布，用于局部探索和估计策略梯度以优化性能。许多机器人控制问题涉及复杂不稳定动态，其中应用远离可行控制流形的动作可能会迅速导致不良发散。在这种情况下，大多数从环境动作空间中抽取的样本生成低价值轨迹，几乎无助于策略改进，导致学习缓慢或失败。我们提出通过引入基于极值寻找控制（ESC）的额外自适应控制步骤来改进这种无模型RL设置中的动作选择。对于从随机策略中抽样的每个动作，我们应用正弦扰动并查询估计的Q值作为响应信号。基于ESC，我们动态改进抽样的动作以更接近附近的最优值，然后再将其应用于环境中。我们的方法可以轻松添加到标准策略优化中以提高学习效率，在各种控制学习环境中进行了演示。

更新时间: 2024-04-02 02:39:17

领域: cs.LG,cs.AI,cs.RO

下载: http://arxiv.org/abs/2404.01598v1
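
The action-refinement idea in the abstract above (sinusoidal perturbation of a sampled action, with estimated Q-values as the response signal) can be sketched on a one-dimensional toy problem. This is a simplified, discrete-time illustration rather than the paper's update rule: the critic `q_value`, the gains, and the averaging-based demodulation (used here in place of the filters in classical extremum-seeking control) are all assumptions.

```python
import math


def q_value(action):
    """Toy stand-in for a learned critic (hypothetical); its peak is at action = 0.7."""
    return -(action - 0.7) ** 2


def esc_refine(action, q_fn, amplitude=0.05, gain=0.2, steps=50, samples=16):
    """Move a sampled action toward a nearby optimum of q_fn.

    Each step probes q_fn along one full sinusoid period around the current
    action, correlates the responses with the sinusoid to estimate the local
    slope, and takes a small ascent step.
    """
    for _ in range(steps):
        correlation = 0.0
        for k in range(samples):
            s = math.sin(2.0 * math.pi * k / samples)
            correlation += q_fn(action + amplitude * s) * s
        # For a smooth q_fn, the averaged correlation is about slope * amplitude / 2.
        slope_estimate = 2.0 * correlation / (amplitude * samples)
        action += gain * slope_estimate
    return action


refined = esc_refine(0.0, q_value)
```

In a policy-optimization loop, `esc_refine` would be applied to each action sampled from the stochastic policy before it is sent to the environment, which is where the abstract places the extra adaptive control steps.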

Learning to Compress Prompt in Natural Language Formats

Large language models (LLMs) are great at processing multiple natural language processing tasks, but their abilities are constrained by inferior performance with long context, slow inference speed, and the high cost of computing the results. Deploying LLMs with precise and informative context helps users process large-scale datasets more effectively and cost-efficiently. Existing works rely on compressing long prompt contexts into soft prompts. However, soft prompt compression encounters limitations in transferability across different LLMs, especially API-based LLMs. To this end, this work aims to compress lengthy prompts in the form of natural language with LLM transferability. This poses two challenges: (i) Natural Language (NL) prompts are incompatible with back-propagation, and (ii) NL prompts lack flexibility in imposing length constraints. In this work, we propose a Natural Language Prompt Encapsulation (Nano-Capsulator) framework that compresses original prompts into an NL-formatted Capsule Prompt while maintaining prompt utility and transferability. Specifically, to tackle the first challenge, the Nano-Capsulator is optimized by a reward function that interacts with the proposed semantics-preserving loss. To address the second challenge, the Nano-Capsulator is optimized by a reward function featuring length constraints. Experimental results demonstrate that the Capsule Prompt can reduce the original length by 81.4%, decrease inference latency by up to 4.5x, and save 80.1% of budget overheads while providing transferability across diverse LLMs and different datasets.

Updated: 2024-04-02 02:38:31

标题: 学习在自然语言格式中压缩提示

摘要: 大型语言模型（LLMs）在处理多个自然语言处理任务方面表现出色，但它们的能力受到长上下文、推理速度慢和计算结果成本高的限制。部署具有精确和信息丰富上下文的LLMs有助于用户更有效地和节省成本地处理大规模数据集。现有工作依赖于将长提示上下文压缩为软提示。然而，软提示压缩在不同LLMs之间的可迁移性上存在限制，特别是基于API的LLMs。因此，本研究旨在以自然语言形式压缩冗长提示，以提高LLM的可迁移性。这带来了两个挑战：（i）自然语言（NL）提示与反向传播不兼容，（ii）NL提示在施加长度约束方面缺乏灵活性。在本研究中，我们提出了一种自然语言提示封装（Nano-Capsulator）框架，将原始提示压缩为NL格式的Capsule Prompt，同时保持提示的实用性和可迁移性。具体来说，为了解决第一个挑战，Nano-Capsulator通过与提出的保留语义损失互动的奖励函数进行优化。为了解决第二个问题，Nano-Capsulator通过具有长度约束的奖励函数进行优化。实验结果表明，Capsule Prompt可以减少原始长度的81.4％，将推理延迟降低最多4.5倍，并节省80.1％的预算开销，同时提供跨不同LLMs和不同数据集的可迁移性。

更新时间: 2024-04-02 02:38:31

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.18700v2

Online Local False Discovery Rate Control: A Resource Allocation Approach

We consider the problem of sequentially conducting multiple experiments where each experiment corresponds to a hypothesis testing task. At each time point, the experimenter must make an irrevocable decision of whether to reject the null hypothesis (or equivalently claim a discovery) before the next experimental result arrives. The goal is to maximize the number of discoveries while maintaining a low error rate at all time points, as measured by the local False Discovery Rate (FDR). We formulate the problem as an online knapsack problem with exogenous random budget replenishment. We start with general arrival distributions and show that a simple policy achieves an $O(\sqrt{T})$ regret. We complement the result by showing that such a regret rate is in general not improvable. We then shift our focus to discrete arrival distributions. We find that many existing re-solving heuristics in the online resource allocation literature, although they achieve bounded loss in canonical settings, may incur an $\Omega(\sqrt{T})$ or even an $\Omega(T)$ regret. Observing that canonical policies tend to be too optimistic and over-claim discoveries, we propose a novel policy that incorporates budget safety buffers. It turns out that a little more safety can greatly enhance efficiency: small additional logarithmic buffers suffice to reduce the regret from $\Omega(\sqrt{T})$ or even $\Omega(T)$ to $O(\ln^2 T)$. From a practical perspective, we extend the policy to the scenario with continuous arrival distributions as well as time-dependent information structures. We conduct both synthetic experiments and empirical applications on time series data from New York City taxi passengers to validate the performance of our proposed policies. Our results emphasize how effective policies should be designed in online resource allocation problems with exogenous budget replenishment.

Updated: 2024-04-02 02:36:56

标题: 在线局部虚假发现率控制：一种资源分配方法

摘要: 我们考虑顺序进行多个实验的问题，其中每个实验对应于一个假设检验任务。在每个时间点，实验者必须在下一个实验结果到达之前做出一个不可撤销的决定，即是否拒绝零假设（或等效地宣称发现）。目标是在所有时间点保持低错误率（由局部虚假发现率（FDR）测量），同时最大化发现的数量。我们将问题制定为具有外生随机预算补充的在线背包问题。我们从一般到达分布开始，并展示了一个简单的策略可以实现$O(\sqrt{T})$的后悔。我们通过展示这种后悔率在一般情况下是无法改进的来补充结果。然后，我们将焦点转移到离散到达分布。我们发现在线资源分配文献中许多现有的重新解决启发式方法，尽管在经典设置中实现了有界损失，但可能会导致$\Omega(\sqrt{T})$甚至$\Omega(T)$的后悔。通过观察到经典策略往往过于乐观和过度声称发现，我们提出了一种结合预算安全缓冲区的新策略。事实证明，稍微更安全可以极大地提高效率 - 小额的额外对数缓冲区就足以将后悔从$\Omega(\sqrt{T})$甚至$\Omega(T)$降低到$O(\ln^2 T)$。从实际角度看，我们将该策略扩展到具有连续到达分布以及时变信息结构的情景。我们在纽约市出租车乘客的时间序列数据上进行了合成实验和实证应用，以验证我们提出的策略的性能。我们的结果强调了如何在具有外生预算补充的在线资源分配问题中设计有效的策略。

更新时间: 2024-04-02 02:36:56

领域: stat.ME,cs.LG,math.OC,math.PR

下载: http://arxiv.org/abs/2402.11425v3
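
The knapsack view in the abstract above can be made concrete with a small sketch under one assumed reading of the constraint: the average local fdr value among rejected hypotheses must stay at or below a target `alpha` at every time point. Under that reading, rejecting a hypothesis with local fdr `w` changes a running budget by `alpha - w`, so confident discoveries replenish the budget and risky ones spend it. The greedy rule and the constant `buffer` below are illustrative assumptions, not the policy analyzed in the paper.

```python
def greedy_with_buffer(lfdr_values, alpha, buffer=0.0):
    """Make irrevocable reject decisions one arrival at a time.

    budget tracks the sum of (alpha - w) over rejected hypotheses; keeping it
    non-negative keeps the average local fdr of the rejections at or below
    alpha. A risky rejection is allowed only if the budget left afterwards is
    at least `buffer` (the safety margin).
    """
    budget = 0.0
    decisions = []
    for w in lfdr_values:
        cost = w - alpha  # a negative cost replenishes the budget
        reject = cost <= 0 or budget - cost >= buffer
        if reject:
            budget -= cost
        decisions.append(reject)
    return decisions


arrivals = [0.01, 0.3, 0.02, 0.12]  # hypothetical local fdr values
no_buffer = greedy_with_buffer(arrivals, alpha=0.1)
with_buffer = greedy_with_buffer(arrivals, alpha=0.1, buffer=0.16)
```

With a positive `buffer`, the rule declines some borderline discoveries that the buffer-free rule would claim, which mirrors the abstract's point that canonical policies tend to over-claim.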

PhysORD: A Neuro-Symbolic Approach for Physics-infused Motion Prediction in Off-road Driving

Motion prediction is critical for autonomous off-road driving, but it presents significantly more challenges than on-road driving because of the complex interaction between the vehicle and the terrain. Traditional physics-based approaches encounter difficulties in accurately modeling dynamic systems and external disturbance. In contrast, data-driven neural networks require extensive datasets and struggle to explicitly capture the fundamental physical laws, which can easily lead to poor generalization. By merging the advantages of both methods, neuro-symbolic approaches present a promising direction. These methods embed physical laws into neural models, potentially significantly improving generalization capabilities. However, no prior works were evaluated in real-world settings for off-road driving. To bridge this gap, we present PhysORD, a neuro-symbolic approach that integrates the conservation law, i.e., the Euler-Lagrange equation, into data-driven neural models for motion prediction in off-road driving. Our experiments showed that PhysORD can accurately predict vehicle motion and tolerate external disturbance by modeling uncertainties. It outperforms existing methods in both accuracy and efficiency and demonstrates data-efficient learning and generalization ability in long-term prediction.

Updated: 2024-04-02 02:36:31

标题: PhysORD：一种神经符号方法，用于越野驾驶中注入物理的运动预测

摘要: 运动预测对于自主越野驾驶至关重要，然而，由于车辆与地形之间的复杂交互作用，它比在路上驾驶要面临更多挑战。传统基于物理的方法在准确建模动态系统和外部干扰方面遇到困难。相反，基于数据驱动的神经网络需要大量数据集，并且在明确捕捉基本物理定律方面存在困难，这很容易导致泛化能力差。通过融合这两种方法的优势，神经符号化方法呈现出一种有前途的方向。这些方法将物理定律嵌入神经模型中，可能显著提高泛化能力。然而，以往的研究并未在实际的越野驾驶环境中进行评估。为了弥补这一差距，我们提出了PhysORD，这是一种神经符号化方法，将守恒定律，即欧拉-拉格朗日方程，整合到基于数据驱动的神经模型中，用于越野驾驶中的运动预测。我们的实验表明，PhysORD能够准确预测车辆运动并容忍外部干扰，通过建模不确定性。它在准确性和效率方面均优于现有方法，并展示了长期预测中的数据效率学习和泛化能力。

更新时间: 2024-04-02 02:36:31

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2404.01596v1

Propensity Score Alignment of Unpaired Multimodal Data

Multimodal representation learning techniques typically rely on paired samples to learn common representations, but paired samples are challenging to collect in fields such as biology where measurement devices often destroy the samples. This paper presents an approach to address the challenge of aligning unpaired samples across disparate modalities in multimodal representation learning. We draw an analogy between potential outcomes in causal inference and potential views in multimodal observations, which allows us to use Rubin's framework to estimate a common space in which to match samples. Our approach assumes we collect samples that are experimentally perturbed by treatments, and uses this to estimate a propensity score from each modality, which encapsulates all shared information between a latent state and treatment and can be used to define a distance between samples. We experiment with two alignment techniques that leverage this distance, shared nearest neighbours (SNN) and optimal transport (OT) matching, and find that OT matching results in significant improvements over state-of-the-art alignment approaches both in a synthetic multi-modal setting and in real-world data from the NeurIPS Multimodal Single-Cell Integration Challenge.

Updated: 2024-04-02 02:36:21

标题: 未配对多模态数据的倾向评分对齐

摘要: 多模态表示学习技术通常依赖于配对样本来学习共同表示，但在生物学等领域，由于测量设备经常会破坏样本，配对样本的收集具有挑战性。本文提出了一种方法来解决多模态表示学习中不同模态之间配对样本的对齐挑战。我们在因果推断中的潜在结果和多模态观察中的潜在视图之间建立了类比，从而可以利用Rubin的框架来估计一个匹配样本的共同空间。我们的方法假设我们收集的样本是经过处理实验干扰的，并利用此来估计每种模态的倾向得分，这个倾向得分包含了潜在状态与处理之间的所有共享信息，并可用于定义样本之间的距离。我们尝试了两种利用这种距离的对齐技术--共享最近邻（SNN）和最优输运（OT）匹配--发现OT匹配在合成多模态设置和NeurIPS多模态单细胞整合挑战的真实数据中均显著优于最先进的对齐方法。

更新时间: 2024-04-02 02:36:21

领域: cs.LG,stat.ME,stat.ML

下载: http://arxiv.org/abs/2404.01595v1

FairRAG: Fair Human Generation via Fair Retrieval Augmentation

Existing text-to-image generative models reflect or even amplify societal biases ingrained in their training data. This is especially concerning for human image generation where models are biased against certain demographic groups. Existing attempts to rectify this issue are hindered by the inherent limitations of the pre-trained models and fail to substantially improve demographic diversity. In this work, we introduce Fair Retrieval Augmented Generation (FairRAG), a novel framework that conditions pre-trained generative models on reference images retrieved from an external image database to improve fairness in human generation. FairRAG enables conditioning through a lightweight linear module that projects reference images into the textual space. To enhance fairness, FairRAG applies simple-yet-effective debiasing strategies, providing images from diverse demographic groups during the generative process. Extensive experiments demonstrate that FairRAG outperforms existing methods in terms of demographic diversity, image-text alignment, and image fidelity while incurring minimal computational overhead during inference.

Updated: 2024-04-02 02:34:22

标题: 公平的人类生成技术：通过公平的检索增强FairRAG

摘要: 现有的文本到图像生成模型反映甚至放大了根植于其训练数据中的社会偏见。这对人类图像生成特别令人担忧，因为模型对某些人口群体存在偏见。现有尝试纠正这一问题的努力受到预训练模型固有限制的阻碍，并未能在很大程度上改善人口多样性。在这项工作中，我们介绍了一种新颖的框架Fair Retrieval Augmented Generation（FairRAG），它将预训练生成模型条件化为从外部图像数据库检索的参考图像，以提高人类生成的公平性。FairRAG通过一个轻量级线性模块将参考图像投影到文本空间，以实现条件化。为了增强公平性，FairRAG应用简单但有效的去偏倚策略，在生成过程中提供来自不同人口群体的图像。大量实验表明，FairRAG在人口多样性、图像-文本对齐和图像保真度方面优于现有方法，在推断过程中带来的计算开销最小。

更新时间: 2024-04-02 02:34:22

领域: cs.CV,cs.CY,cs.LG

下载: http://arxiv.org/abs/2403.19964v2

Classifying Cancer Stage with Open-Source Clinical Large Language Models

Cancer stage classification is important for making treatment and care management plans for oncology patients. Information on staging is often included in unstructured form in clinical, pathology, radiology and other free-text reports in the electronic health record system, requiring extensive work to parse and obtain. To facilitate the extraction of this information, previous NLP approaches rely on labeled training datasets, which are labor-intensive to prepare. In this study, we demonstrate that without any labeled training data, open-source clinical large language models (LLMs) can extract pathologic tumor-node-metastasis (pTNM) staging information from real-world pathology reports. Our experiments compare LLMs and a BERT-based model fine-tuned using the labeled data. Our findings suggest that while LLMs still exhibit subpar performance in Tumor (T) classification, with the appropriate adoption of prompting strategies, they can achieve comparable performance on Metastasis (M) classification and improved performance on Node (N) classification.

Updated: 2024-04-02 02:30:47

标题: 使用开源临床大型语言模型对癌症分期进行分类

摘要: 癌症分期分类对于制定肿瘤患者的治疗和护理管理计划至关重要。分期信息通常以非结构化形式包含在临床、病理、放射学和其他自由文本报告中，存储在电子健康记录系统中，需要大量工作来解析和获取。为了促进该信息的提取，先前的自然语言处理方法依赖于标记的训练数据集，这些数据集准备工作繁重。在本研究中，我们证明，在没有任何标记的训练数据的情况下，开源的临床大型语言模型（LLM）可以从现实世界的病理报告中提取病理肿瘤-淋巴结-转移（pTNM）分期信息。我们的实验比较了LLM和使用标记数据微调的基于BERT的模型。我们的研究结果表明，虽然LLM在肿瘤（T）分类方面仍表现出较差的性能，但通过适当采用提示策略，它们可以实现与转移（M）分类相当的性能，并在淋巴结（N）分类上表现更好。

更新时间: 2024-04-02 02:30:47

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.01589v1

Hallucination Diversity-Aware Active Learning for Text Summarization

Large Language Models (LLMs) have shown a propensity to generate hallucinated outputs, i.e., texts that are factually incorrect or unsupported. Existing methods for alleviating hallucinations typically require costly human annotations to identify and correct hallucinations in LLM outputs. Moreover, most of these methods focus on a specific type of hallucination, e.g., entity or token errors, which limits their effectiveness in addressing the various types of hallucinations exhibited in LLM outputs. To the best of our knowledge, this paper proposes the first active learning framework to alleviate LLM hallucinations, reducing the costly human annotation of hallucinations that is otherwise needed. By measuring fine-grained hallucinations from errors in semantic frame, discourse, and content verifiability in text summarization, we propose HAllucination Diversity-Aware Sampling (HADAS) to select diverse hallucinations for annotation in active learning for LLM finetuning. Extensive experiments on three datasets and different backbone models demonstrate the advantages of our method in effectively and efficiently mitigating LLM hallucinations.

Updated: 2024-04-02 02:30:27

标题: 幻觉多样性感知的文本摘要主动学习

摘要: 大型语言模型（LLMs）已显示出生成虚构输出的倾向，即在事实上不正确或不支持的文本。现有的缓解幻觉的方法通常需要昂贵的人工标注来识别和纠正LLM输出中的幻觉。此外，大多数这些方法侧重于特定类型的幻觉，例如实体或标记错误，这限制了它们在解决LLM输出中展示的各种类型幻觉的有效性。据我们所知，在本文中，我们提出了第一个主动学习框架来缓解LLM幻觉，减少了对幻觉所需的昂贵人工标注。通过在文本摘要中的语义框架、话语和内容可验证性错误中测量细粒度幻觉，我们提出了HAllucination Diversity-Aware Sampling（HADAS）来选择多样的幻觉进行主动学习，以便进行LLM微调的标注。对三个数据集和不同的骨干模型进行的广泛实验表明，我们的方法在有效和高效地减轻LLM幻觉方面具有优势。

更新时间: 2024-04-02 02:30:27

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.01588v1

GLEMOS: Benchmark for Instantaneous Graph Learning Model Selection

The choice of a graph learning (GL) model (i.e., a GL algorithm and its hyperparameter settings) has a significant impact on the performance of downstream tasks. However, selecting the right GL model becomes increasingly difficult and time consuming as more and more GL models are developed. Accordingly, it is of great significance and practical value to equip users of GL with the ability to perform a near-instantaneous selection of an effective GL model without manual intervention. Despite the recent attempts to tackle this important problem, there has been no comprehensive benchmark environment to evaluate the performance of GL model selection methods. To bridge this gap, we present GLEMOS in this work, a comprehensive benchmark for instantaneous GL model selection that makes the following contributions. (i) GLEMOS provides extensive benchmark data for fundamental GL tasks, i.e., link prediction and node classification, including the performances of 366 models on 457 graphs on these tasks. (ii) GLEMOS designs multiple evaluation settings, and assesses how effectively representative model selection techniques perform in these different settings. (iii) GLEMOS is designed to be easily extended with new models, new graphs, and new performance records. (iv) Based on the experimental results, we discuss the limitations of existing approaches and highlight future research directions. To promote research on this significant problem, we make the benchmark data and code publicly available at https://github.com/facebookresearch/glemos.

Updated: 2024-04-02 02:13:00

标题: GLEMOS：瞬时图学习模型选择基准

摘要: 图学习（GL）模型的选择（即GL算法及其超参数设置）对下游任务的性能有重大影响。然而，随着越来越多的GL模型被开发出来，选择合适的GL模型变得越来越困难和耗时。因此，赋予GL用户在无需手动干预的情况下能够快速选择有效GL模型的能力具有重要意义和实际价值。尽管最近有人尝试解决这一重要问题，但尚无全面的基准环境来评估GL模型选择方法的性能。为了填补这一空白，我们在这项工作中提出了GLEMOS，一个用于即时GL模型选择的全面基准，具有以下贡献：（i）GLEMOS为基本的GL任务（例如链接预测和节点分类）提供了广泛的基准数据，包括366个模型在457个图上的表现。（ii）GLEMOS设计了多种评估设置，并评估代表性模型选择技术在这些不同设置中的表现。（iii）GLEMOS设计得易于扩展新模型、新图形和新性能记录。（iv）基于实验结果，我们讨论现有方法的局限性并强调未来的研究方向。为促进对这一重要问题的研究，我们将基准数据和代码公开在https://github.com/facebookresearch/glemos。

更新时间: 2024-04-02 02:13:00

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2404.01578v1
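
To make "near-instantaneous selection" from the abstract above concrete, here is a minimal sketch of choosing a model purely from previously recorded performances, with no training on the new graph. It implements a simple "best average over observed graphs" baseline; the data layout, model names, and scores are hypothetical and are not taken from the GLEMOS benchmark.

```python
def global_best_model(perf):
    """Return the model with the highest mean score across observed graphs.

    perf maps graph name -> {model name -> score}, where higher is better.
    Models missing on some graphs are averaged over the graphs they have.
    """
    totals, counts = {}, {}
    for scores in perf.values():
        for model, score in scores.items():
            totals[model] = totals.get(model, 0.0) + score
            counts[model] = counts.get(model, 0) + 1
    return max(totals, key=lambda m: totals[m] / counts[m])


# Hypothetical performance records on two previously seen graphs.
records = {
    "graph_a": {"gcn": 0.8, "sage": 0.7},
    "graph_b": {"gcn": 0.6, "sage": 0.9},
}
choice = global_best_model(records)
```

This baseline ignores the properties of the new graph entirely; a stronger selector could condition on graph features, and comparing selectors of that kind is the sort of evaluation a benchmark of performance records makes possible.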

Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

Efficient exploration remains a challenging problem in reinforcement learning, especially for tasks where extrinsic rewards from environments are sparse or even totally disregarded. Significant advances based on intrinsic motivation show promising results in simple environments but often get stuck in environments with multimodal and stochastic dynamics. In this work, we propose a variational dynamic model based on conditional variational inference to model the multimodality and stochasticity. We consider the environmental state-action transition as a conditional generative process by generating the next-state prediction under the condition of the current state, action, and latent variable, which provides a better understanding of the dynamics and leads to better performance in exploration. We derive an upper bound of the negative log-likelihood of the environmental transition and use this upper bound as the intrinsic reward for exploration, which allows the agent to learn skills by self-supervised exploration without observing extrinsic rewards. We evaluate the proposed method on several image-based simulation tasks and a real robotic manipulation task. Our method outperforms several state-of-the-art environment model-based exploration approaches.

Updated: 2024-04-02 02:09:15

标题: 深度强化学习中自监督探索的变分动力学

摘要: 有效的探索在强化学习中仍然是一个具有挑战性的问题，特别是对于那些外部环境奖励稀缺甚至完全被忽视的任务。基于内在动机的重要进展在简单环境中表现出有希望的结果，但往往在具有多模态和随机动态的环境中陷入困境。在这项工作中，我们提出了一个基于条件变分推断的变分动态模型，用于建模多模态性和随机性。我们将环境状态-动作转换视为一种条件生成过程，通过在当前状态、动作和潜变量的条件下生成下一个状态预测，从而更好地理解动态并在探索中取得更好的性能。我们推导出环境转换的负对数似然的上界，并将该上界作为内在奖励用于探索，使代理能够通过自我监督的探索学习技能而无需观察外部奖励。我们在几个基于图像的模拟任务和一个真实的机器人操作任务上评估了所提出的方法。我们的方法优于几种最先进的基于环境模型的探索方法。

更新时间: 2024-04-02 02:09:15

领域: cs.LG,cs.CV,cs.RO

下载: http://arxiv.org/abs/2010.08755v3

Multi-granular Adversarial Attacks against Black-box Neural Ranking Models

Adversarial ranking attacks have gained increasing attention due to their success in probing vulnerabilities, and, hence, enhancing the robustness, of neural ranking models. Conventional attack methods employ perturbations at a single granularity, e.g., word-level or sentence-level, to a target document. However, limiting perturbations to a single level of granularity may reduce the flexibility of creating adversarial examples, thereby diminishing the potential threat of the attack. Therefore, we focus on generating high-quality adversarial examples by incorporating multi-granular perturbations. Achieving this objective involves tackling a combinatorial explosion problem, which requires identifying an optimal combination of perturbations across all possible levels of granularity, positions, and textual pieces. To address this challenge, we transform the multi-granular adversarial attack into a sequential decision-making process, where perturbations in the next attack step are influenced by the perturbed document in the current attack step. Since the attack process can only access the final state without direct intermediate signals, we use reinforcement learning to perform multi-granular attacks. During the reinforcement learning process, two agents work cooperatively to identify multi-granular vulnerabilities as attack targets and organize perturbation candidates into a final perturbation sequence. Experimental results show that our attack method surpasses prevailing baselines in both attack effectiveness and imperceptibility.

Updated: 2024-04-02 02:08:29

标题: 多粒度对抗攻击黑盒神经排序模型

摘要: 对抗性排名攻击因其在探测漏洞并增强神经排名模型的健壮性方面取得的成功而受到越来越多的关注。传统的攻击方法在目标文档中应用单一粒度的扰动，例如单词级或句子级。然而，将扰动限制在单一粒度可能会降低创建对抗性示例的灵活性，从而降低攻击的潜在威胁。因此，我们专注于通过整合多粒度扰动来生成高质量的对抗性示例。实现这一目标涉及解决组合爆炸问题，需要确定最佳组合的扰动，跨越所有可能的粒度、位置和文本片段。为了解决这一挑战，我们将多粒度对抗攻击转变为一个序贯决策过程，其中下一个攻击步骤中的扰动受当前攻击步骤中扰动文档的影响。由于攻击过程只能访问最终状态而无法获得直接的中间信号，我们使用强化学习来执行多粒度攻击。在强化学习过程中，两个代理合作工作，以确定多粒度漏洞作为攻击目标，并将扰动候选者组织成最终扰动序列。实验结果表明，我们的攻击方法在攻击效果和难以察觉性方面超过了现有基准线。

更新时间: 2024-04-02 02:08:29

领域: cs.IR,cs.CR,cs.LG

下载: http://arxiv.org/abs/2404.01574v1

MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models

Unlike a unimodal model, whose input comes from a single modality, a multi-modal model takes input (called multi-modal input) from multiple modalities such as image, 3D points, audio, and text. As with unimodal models, many existing studies show that a multi-modal model is also vulnerable to adversarial perturbations, where an attacker adds small perturbations to all modalities of a multi-modal input such that the multi-modal model makes incorrect predictions for it. Existing certified defenses are mostly designed for unimodal models and achieve sub-optimal certified robustness guarantees when extended to multi-modal models, as shown in our experimental results. In our work, we propose MMCert, the first certified defense against adversarial attacks on a multi-modal model. We derive a lower bound on the performance of MMCert under arbitrary adversarial attacks with bounded perturbations to both modalities (e.g., in the context of autonomous driving, we bound the number of changed pixels in both the RGB image and the depth image). We evaluate MMCert using two benchmark datasets: one for the multi-modal road segmentation task and the other for the multi-modal emotion recognition task. Moreover, we compare MMCert with a state-of-the-art certified defense extended from unimodal models. Our experimental results show that MMCert outperforms the baseline.

Updated: 2024-04-02 02:05:46

标题: MMCert：针对多模态模型对抗攻击的可证明防御

摘要: 与单模态模型不同，多模态模型的输入（称为多模态输入）来自多个模态，如图像、3D 点、音频、文本等。与单模态模型类似，许多现有研究表明，多模态模型也容易受到对抗性扰动的影响，攻击者可以向多模态输入的所有模态添加小扰动，使多模态模型对其进行错误预测。现有的认证防御主要针对单模态模型设计，当扩展到多模态模型时，这些防御措施的认证鲁棒性保证并不理想，正如我们的实验结果所示。在我们的工作中，我们提出了MMCert，这是针对多模态模型的首个认证防御措施。我们推导了我们的MMCert 在受到有界扰动的任意对抗性攻击下的性能下限（例如，在自动驾驶的情境中，我们限制了 RGB 图像和深度图像中像素变化的数量）。我们使用两个基准数据集评估我们的MMCert：一个用于多模态道路分割任务，另一个用于多模态情绪识别任务。此外，我们将我们的MMCert 与从单模态模型扩展而来的最先进的认证防御进行比较。我们的实验结果表明，我们的MMCert 胜过基线。

更新时间: 2024-04-02 02:05:46

领域: cs.CV,cs.CR

下载: http://arxiv.org/abs/2403.19080v3

Evaluating Large Language Models Using Contrast Sets: An Experimental Approach

In the domain of Natural Language Inference (NLI), especially in tasks involving the classification of multiple input texts, the Cross-Entropy Loss metric is widely employed as a standard for error measurement. However, this metric falls short in effectively evaluating a model's capacity to understand language entailments. In this study, we introduce an innovative technique for generating a contrast set for the Stanford Natural Language Inference (SNLI) dataset. Our strategy involves the automated substitution of verbs, adverbs, and adjectives with their synonyms to preserve the original meaning of sentences. This method aims to assess whether a model's performance is based on genuine language comprehension or simply on pattern recognition. We conducted our analysis using the ELECTRA-small model. The model achieved an accuracy of 89.9% on the conventional SNLI dataset but showed a reduced accuracy of 72.5% on our contrast set, a substantial decline of about 17 percentage points. This outcome led us to conduct a detailed examination of the model's learning behaviors. Following this, we improved the model's resilience by fine-tuning it with a contrast-enhanced training dataset specifically designed for SNLI, which increased its accuracy to 85.5% on the contrast sets. Our findings highlight the importance of incorporating diverse linguistic expressions into datasets for NLI tasks. We hope that our research will encourage the creation of more inclusive datasets, thereby contributing to the development of NLI models that are both more sophisticated and effective.

Updated: 2024-04-02 02:03:28

标题: 使用对比集评估大型语言模型：一种实验方法

摘要: 在自然语言推理（NLI）领域，特别是涉及多个输入文本分类的任务中，交叉熵损失度量被广泛应用作为错误测量的标准。然而，这种度量在有效评估模型理解语言推论能力方面存在不足。本研究引入了一种创新技术，用于为斯坦福自然语言推理（SNLI）数据集生成对比集。我们的策略包括自动用动词、副词和形容词的同义词替换，以保留句子的原始含义。这种方法旨在评估模型的性能是基于真正的语言理解还是简单的模式识别。我们使用ELECTRA-small模型进行了分析。该模型在传统的SNLI数据集上实现了89.9%的准确率，但在我们的对比集上显示出降低的准确率，为72.5%，表明有显著的17%下降。这一结果促使我们对模型的学习行为进行了详细的研究。随后，我们通过使用专门设计用于SNLI的对比增强训练数据集对模型进行微调，从而将其准确率提高至85.5%。我们的研究结果强调了将不同语言表达形式纳入NLI任务数据集的重要性。我们希望我们的研究能促进更具包容性数据集的创建，从而为开发更复杂和有效的NLI模型作出贡献。

更新时间: 2024-04-02 02:03:28

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.01569v1

Image Captioning in news report scenario

Image captioning strives to generate pertinent captions for specified images, situating itself at the crossroads of Computer Vision (CV) and Natural Language Processing (NLP). This endeavor is of paramount importance with far-reaching applications in recommendation systems, news outlets, social media, and beyond. Particularly within the realm of news reporting, captions are expected to encompass detailed information, such as the identities of celebrities captured in the images. However, much of the existing body of work primarily centers around understanding scenes and actions. In this paper, we explore the realm of image captioning specifically tailored for celebrity photographs, illustrating its broad potential for enhancing news industry practices. This exploration aims to augment automated news content generation, thereby facilitating a more nuanced dissemination of information. Our endeavor shows a broader horizon, enriching the narrative in news reporting through a more intuitive image captioning framework.

Updated: 2024-04-02 01:57:00

标题: 新闻报道情境下的图像字幕生成

摘要: 图像字幕旨在为指定的图像生成相关的字幕，位于计算机视觉（CV）和自然语言处理（NLP）的交叉点。这一努力具有重要意义，在推荐系统、新闻媒体、社交媒体等领域具有广泛的应用。特别是在新闻报道领域，预期字幕能够包含详细信息，如图像中捕捉到的名人身份。然而，现有大部分研究主要集中在理解场景和动作方面。本文探讨了专门针对名人照片定制的图像字幕领域，展示了其在增强新闻行业实践方面的广泛潜力。这一探索旨在增强自动化新闻内容生成，从而促进更细致的信息传播。我们的努力展示了更广阔的视野，通过更直观的图像字幕框架丰富了新闻报道的叙述。

更新时间: 2024-04-02 01:57:00

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2403.16209v3

Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability

What is the relationship between model architecture and the ability to perform in-context learning? In this empirical study, we take the first steps toward answering this question. We evaluate thirteen model architectures capable of causal language modeling across a suite of synthetic in-context learning tasks. These selected architectures represent a broad range of paradigms, including recurrent and convolution-based neural networks, transformers, state-space-model-inspired architectures, and other emerging attention alternatives. We discover that all the considered architectures can perform in-context learning under a wider range of conditions than previously documented. Additionally, we observe stark differences in statistical efficiency and consistency by varying the number of in-context examples and task difficulty. We also measure each architecture's predisposition towards in-context learning when presented with the option to memorize rather than leverage in-context examples. Finally, and somewhat surprisingly, we find that several attention alternatives are sometimes competitive with or better in-context learners than transformers. However, no single architecture demonstrates consistency across all tasks, with performance either plateauing or declining when confronted with a significantly larger number of in-context examples than those encountered during gradient-based training.

Updated: 2024-04-02 01:54:53

标题: 注意力在ICL中是否是必需的？探讨模型架构与上下文学习能力之间的关系

摘要: 这项实证研究探讨了模型架构与在上下文学习能力之间的关系。我们评估了十三种能够进行因果语言建模的模型架构在一系列合成上下文学习任务中的表现。这些选择的架构代表了一系列范式，包括循环和基于卷积的神经网络、变压器、受状态空间模型启发以及其他新兴的注意力替代方案。我们发现所有考虑的架构在更广泛的条件下均能进行上下文学习，这一点比以往记录的情况更为广泛。此外，我们观察到通过改变上下文示例的数量和任务难度，统计效率和一致性存在显著差异。当给予架构记忆而不是利用上下文示例的选择时，我们还衡量了每种架构对于上下文学习的倾向。最后，令人惊讶的是，我们发现一些注意力替代方案有时与变压器相竞争或更好地适应上下文学习。然而，没有单一架构在所有任务中表现一致，当面临比梯度训练期间遇到的上下文示例数量显著更多时，性能要么趋于稳定，要么下降。

更新时间: 2024-04-02 01:54:53

领域: cs.LG

下载: http://arxiv.org/abs/2310.08049v3

Rumor Detection with a novel graph neural network approach

The wide spread of rumors on social media has had a negative impact on people's daily lives, leading to potential panic, fear, and mental health problems for the public. How to debunk rumors as early as possible remains a challenging problem. Existing studies mainly leverage the information propagation structure to detect rumors, while very few works focus on correlation among users, who may coordinate to spread rumors in order to gain popularity. In this paper, we propose a new detection model that jointly learns representations of both user correlation and information propagation to detect rumors on social media. Specifically, we leverage graph neural networks to learn the representations of user correlation from a bipartite graph that describes the correlations between users and source tweets, and the representations of information propagation with a tree structure. Then we combine the learned representations from these two modules to classify the rumors. Since malicious users intend to subvert our model after deployment, we further develop a greedy attack scheme to analyze the cost of three adversarial attacks: graph attack, comment attack, and joint attack. Evaluation results on two public datasets illustrate that the proposed model outperforms the state-of-the-art rumor detection models. We also demonstrate that our method performs well for early rumor detection. Moreover, the proposed detection method is more robust to adversarial attacks than the best existing method. Importantly, we show that attackers incur a high cost to subvert the user correlation pattern, demonstrating the importance of considering user correlation for rumor detection.

Updated: 2024-04-02 01:52:13

标题: 用一种新颖的图神经网络方法进行谣言检测

摘要: 社交媒体上谣言的广泛传播对人们的日常生活造成了负面影响，导致公众可能产生恐慌、恐惧和心理健康问题。如何尽早揭穿谣言仍然是一个具有挑战性的问题。现有研究主要利用信息传播结构来检测谣言，而很少有研究关注用户之间的相关性，他们可能协调传播谣言以获得大量的关注。在本文中，我们提出了一个新的检测模型，它联合学习用户相关性和信息传播的表示，以便在社交媒体上检测谣言。具体来说，我们利用图神经网络从描述用户和源推文之间相关性的二部图中学习用户相关性的表示，以及利用树结构学习信息传播的表示。然后，我们结合这两个模块学到的表示来对谣言进行分类。由于恶意用户意图在部署后破坏我们的模型，我们进一步开发了一种贪婪攻击方案，分析了三种敌对攻击的成本：图攻击、评论攻击和联合攻击。在两个公共数据集上的评估结果表明，所提出的模型优于最先进的谣言检测模型。我们还展示了我们的方法在早期谣言检测方面表现良好。此外，与最好的现有方法相比，所提出的检测方法对敌对攻击更加稳健。重要的是，我们表明攻击者要破坏用户相关性模式需要付出高昂的代价，这表明考虑用户相关性对于谣言检测的重要性。

更新时间: 2024-04-02 01:52:13

领域: cs.AI

下载: http://arxiv.org/abs/2403.16206v3

Automated User Story Generation with Test Case Specification Using Large Language Model

The modern software engineering era is moving fast with the assistance of artificial intelligence (AI), especially Large Language Models (LLMs). Researchers have already started automating many parts of the software development workflow. Requirements Engineering (RE) is a crucial phase that begins the software development cycle through multiple discussions on a proposed scope of work documented in different forms. The RE phase ends with a list of user stories for each unit task identified through discussions, and these are usually created and tracked in a project management tool such as Jira, AzurDev, etc. In this research, we developed a tool, "GeneUS", that uses GPT-4.0 to automatically create user stories from the requirements document that is the outcome of the RE phase. The output is provided in JSON format, leaving open the possibility of downstream integration with popular project management tools. Analyzing requirements documents takes significant effort and multiple meetings with stakeholders. We believe that automating this process will certainly reduce the load on software engineers and increase productivity, since they will be able to spend their time on other prioritized tasks.

Updated: 2024-04-02 01:45:57

标题: 使用大型语言模型进行自动化用户故事生成与测试用例规范化

摘要: 现代软件工程时代在人工智能（AI）的帮助下发展迅速，尤其是大型语言模型（LLM）。研究人员已经开始自动化软件开发工作流的许多部分。需求工程（RE）是一个至关重要的阶段，通过对不同形式的文档中提出的工作范围进行多次讨论，开始软件开发周期。需求工程阶段以为每个单元任务确定的用户故事清单结束，通常这些用户故事是在项目管理工具（如Jira、AzurDev等）上创建和跟踪的。在这项研究中，我们使用GPT-4.0开发了一个名为"GeneUS"的工具，可以从需求文档自动生成用户故事，这是需求工程阶段的结果。输出以JSON格式提供，为将来与流行的项目管理工具集成留下了可能性。分析需求文档需要大量的工作量和与利益相关者的多次会议。我们相信，自动化这个过程肯定会减轻软件工程师的额外负担，并提高生产效率，因为他们将能够利用时间处理其他优先任务。

更新时间: 2024-04-02 01:45:57

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2404.01558v1

TeleChat Technical Report

In this technical report, we present TeleChat, a collection of large language models (LLMs) with 3 billion, 7 billion and 12 billion parameters. It includes pretrained language models as well as fine-tuned chat models that are aligned with human preferences. TeleChat is initially pretrained on an extensive corpus containing a diverse collection of texts from both English and Chinese languages, including trillions of tokens. Subsequently, the model undergoes fine-tuning to align with human preferences, following a detailed methodology that we describe. We evaluate the performance of TeleChat on various tasks, including language understanding, mathematics, reasoning, code generation, and knowledge-based question answering. Our findings indicate that TeleChat achieves comparable performance to other open-source models of similar size across a wide range of public benchmarks. To support future research and applications utilizing LLMs, we release the fine-tuned model checkpoints of TeleChat's 7B and 12B variants, along with code and a portion of our pretraining data, to the public community.

Updated: 2024-04-02 01:45:11

标题: TeleChat技术报告

摘要: 在这份技术报告中，我们介绍了TeleChat，这是一个拥有30亿、70亿和120亿参数的大型语言模型（LLMs）集合。它包括预训练语言模型以及与人类偏好相符的微调聊天模型。TeleChat最初在包含来自英语和中文语言的各种文本的广泛语料库上进行预训练，包括数万亿个标记。随后，该模型经过微调以与人类偏好保持一致，遵循我们描述的详细方法论。我们评估了TeleChat在各种任务上的表现，包括语言理解、数学、推理、代码生成和基于知识的问题回答。我们的研究结果表明，TeleChat在各种公共基准测试中取得了与其他开源模型相似规模的可比性能。为了支持未来利用LLMs进行研究和应用，我们向公众社区发布了TeleChat的70亿和120亿变体的微调模型检查点，以及代码和部分预训练数据。

更新时间: 2024-04-02 01:45:11

领域: cs.CL,cs.AI,I.2.7

下载: http://arxiv.org/abs/2401.03804v2

Distributed Autonomous Swarm Formation for Dynamic Network Bridging

Effective operation and seamless cooperation of robotic systems are a fundamental component of next-generation technologies and applications. In contexts such as disaster response, swarm operations require coordinated behavior and mobility control to be handled in a distributed manner, with the quality of the agents' actions heavily relying on the communication between them and the underlying network. In this paper, we formulate the problem of dynamic network bridging in a novel Decentralized Partially Observable Markov Decision Process (Dec-POMDP), where a swarm of agents cooperates to form a link between two distant moving targets. Furthermore, we propose a Multi-Agent Reinforcement Learning (MARL) approach for the problem based on Graph Convolutional Reinforcement Learning (DGN) which naturally applies to the networked, distributed nature of the task. The proposed method is evaluated in a simulated environment and compared to a centralized heuristic baseline showing promising results. Moreover, a further step in the direction of sim-to-real transfer is presented, by additionally evaluating the proposed approach in a near Live Virtual Constructive (LVC) UAV framework.

Updated: 2024-04-02 01:45:03

标题: 分布式自主群体形成用于动态网络桥接

摘要: 机器人系统的有效运作和无缝合作是下一代技术和应用的基本组成部分。在灾难响应等情境中，群体操作需要协调的行为和移动控制以分布式方式处理，代理的行动质量在很大程度上取决于它们之间和基础网络之间的通信。本文中，我们在一个新颖的分布式部分可观察马尔可夫决策过程（Dec-POMDP）中制定了动态网络桥接的问题，其中一群代理协作形成两个远距离移动目标之间的连接。此外，我们提出了一种基于图卷积强化学习（DGN）的多代理强化学习（MARL）方法，该方法自然适用于任务的网络化、分布式性质。提出的方法在模拟环境中进行了评估，并与集中式启发式基线进行了比较，展示了有希望的结果。此外，还提出了朝着模拟到现实转移的更进一步的步骤，通过在接近实时虚拟构建（LVC）无人机框架中额外评估所提出的方法。

更新时间: 2024-04-02 01:45:03

领域: cs.MA,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2404.01557v1

"It's a Fair Game", or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational Agents

The widespread use of Large Language Model (LLM)-based conversational agents (CAs), especially in high-stakes domains, raises many privacy concerns. Building ethical LLM-based CAs that respect user privacy requires an in-depth understanding of the privacy risks that concern users the most. However, existing research, primarily model-centered, does not provide insight into users' perspectives. To bridge this gap, we analyzed sensitive disclosures in real-world ChatGPT conversations and conducted semi-structured interviews with 19 LLM-based CA users. We found that users are constantly faced with trade-offs between privacy, utility, and convenience when using LLM-based CAs. However, users' erroneous mental models and the dark patterns in system design limited their awareness and comprehension of the privacy risks. Additionally, the human-like interactions encouraged more sensitive disclosures, which complicated users' ability to navigate the trade-offs. We discuss practical design guidelines and the needs for paradigm shifts to protect the privacy of LLM-based CA users.

Updated: 2024-04-02 01:32:06

标题: "这是一个公平的游戏，还是？研究用户在使用基于LLM的对话代理时如何平衡风险和利益"

摘要: 大规模语言模型（LLM）为基础的对话代理（CAs）的广泛使用，特别是在高风险领域，引发了许多隐私问题。构建尊重用户隐私的道德LLM基础的CAs需要深入了解最关注用户的隐私风险。然而，现有研究，主要是以模型为中心，未能提供用户的观点。为了弥合这一差距，我们分析了现实世界ChatGPT对话中的敏感披露，并进行了与19名LLM基础CA用户的半结构化访谈。我们发现，用户在使用LLM基础的CA时不断面临隐私、实用性和便利性之间的权衡。然而，用户错误的心智模型和系统设计中的黑暗模式限制了他们对隐私风险的意识和理解。此外，类似人类的互动促使了更多敏感披露，这使得用户难以权衡。我们讨论了实用设计指南和保护LLM基础CA用户隐私的范式转变的需求。

更新时间: 2024-04-02 01:32:06

领域: cs.HC,cs.AI,cs.CR

下载: http://arxiv.org/abs/2309.11653v2

Vulnerabilities of Foundation Model Integrated Federated Learning Under Adversarial Threats

Federated Learning (FL) addresses critical issues in machine learning related to data privacy and security, yet suffers from data insufficiency and imbalance under certain circumstances. The emergence of foundation models (FMs) offers potential solutions to the limitations of existing FL frameworks, e.g., by generating synthetic data for model initialization. However, due to the inherent safety concerns of FMs, integrating FMs into FL could introduce new risks, which remain largely unexplored. To address this gap, we conduct the first investigation of the vulnerability of FM-integrated FL (FM-FL) under adversarial threats. Based on a unified framework of FM-FL, we introduce a novel attack strategy that exploits safety issues of FMs to compromise FL client models. Through extensive experiments with well-known models and benchmark datasets in both image and text domains, we reveal the high susceptibility of FM-FL to this new threat under various FL configurations. Furthermore, we find that existing FL defense strategies offer limited protection against this novel attack approach. This research highlights the critical need for enhanced security measures in FL in the era of FMs.

Updated: 2024-04-02 01:31:24

标题: 基于对抗性威胁的基于集成联邦学习的基础模型的脆弱性

摘要: 联邦学习（FL）解决了与数据隐私和安全相关的机器学习中的关键问题，但在某些情况下受到数据不足和不平衡的困扰。基础模型（FMs）的出现为现有FL框架的限制提供了潜在解决方案，例如通过生成合成数据进行模型初始化。然而，由于FMs固有的安全问题，将FMs集成到FL中可能会引入新的风险，这方面尚未得到充分探讨。为了填补这一空白，我们对FM集成FL（FM-FL）在对抗威胁下的脆弱性进行了首次调查。基于FM-FL的统一框架，我们引入了一种利用FM安全问题来破坏FL客户端模型的新型攻击策略。通过对图像和文本领域中知名模型和基准数据集进行广泛实验，我们揭示了FM-FL在各种FL配置下对这一新威胁的高度敏感性。此外，我们发现现有的FL防御策略对这种新型攻击方法提供的保护有限。这项研究强调了在基础模型时代加强FL安全措施的迫切需要。

更新时间: 2024-04-02 01:31:24

领域: cs.CR,cs.DC,cs.LG

下载: http://arxiv.org/abs/2401.10375v2

Multi-Agent Reinforcement Learning with Control-Theoretic Safety Guarantees for Dynamic Network Bridging

Addressing complex cooperative tasks in safety-critical environments poses significant challenges for Multi-Agent Systems, especially under conditions of partial observability. This work introduces a hybrid approach that integrates Multi-Agent Reinforcement Learning with control-theoretic methods to ensure safe and efficient distributed strategies. Our contributions include a novel setpoint update algorithm that dynamically adjusts agents' positions to preserve safety conditions without compromising the mission's objectives. Through experimental validation, we demonstrate significant advantages over conventional MARL strategies, achieving comparable task performance with zero safety violations. Our findings indicate that integrating safe control with learning approaches not only enhances safety compliance but also achieves good performance in mission objectives.

Updated: 2024-04-02 01:30:41

标题: 动态网络桥接的具有控制理论安全保证的多智能体强化学习

摘要: 在安全关键环境中解决复杂的合作任务对于多智能体系统提出了重大挑战，尤其是在部分可观察性条件下。本研究引入了一种混合方法，将多智能体强化学习与控制理论方法相结合，以确保安全和高效的分布式策略。我们的贡献包括一种新颖的设定点更新算法，动态调整智能体的位置以保持安全条件而不损害任务目标。通过实验验证，我们展示了与传统的多智能体强化学习策略相比的显著优势，实现了与零安全违规相媲美的任务性能。我们的研究结果表明，将安全控制与学习方法相结合不仅增强了安全合规性，还实现了良好的任务目标性能。

更新时间: 2024-04-02 01:30:41

领域: cs.MA,cs.AI,cs.LG,cs.NI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2404.01551v1

mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning

In the fields of computer vision and natural language processing, multimodal chart question-answering, especially involving color, structure, and textless charts, poses significant challenges. Traditional methods, which typically involve either direct multimodal processing or a table-to-text conversion followed by language model analysis, have limitations in effectively handling these complex scenarios. This paper introduces a novel multimodal chart question-answering model, specifically designed to address these intricate tasks. Our model integrates visual and linguistic processing, overcoming the constraints of existing methods. We adopt a dual-phase training approach: the initial phase focuses on aligning image and text representations, while the subsequent phase concentrates on optimizing the model's interpretative and analytical abilities in chart-related queries. This approach has demonstrated superior performance on multiple public datasets, particularly in handling color, structure, and textless chart questions, indicating its effectiveness in complex multimodal tasks.

Updated: 2024-04-02 01:28:44

标题: mChartQA：基于视觉-语言对齐和推理的多模态图表问题回答的通用基准

摘要: 在计算机视觉和自然语言处理领域，多模态图表问答，特别是涉及颜色、结构和无文本图表的问答，面临着重大挑战。传统方法通常涉及直接的多模态处理或表格到文本转换，然后进行语言模型分析，对于有效处理这些复杂场景存在局限性。本文介绍了一种新颖的多模态图表问答模型，专门设计用于解决这些复杂任务。我们的模型整合了视觉和语言处理，克服了现有方法的限制。我们采用双阶段训练方法：初始阶段专注于对齐图像和文本表示，而随后的阶段集中于优化模型在与图表相关查询中的解释和分析能力。这种方法在多个公开数据集上表现出优越性能，特别是在处理颜色、结构和无文本图表问题方面，表明其在复杂多模态任务中的有效性。

更新时间: 2024-04-02 01:28:44

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.01548v1

InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning

Continual learning requires the model to learn multiple tasks sequentially. In continual learning, the model should possess the ability to maintain its performance on old tasks (stability) and the ability to adapt to new tasks continuously (plasticity). Recently, parameter-efficient fine-tuning (PEFT), which involves freezing a pre-trained model and injecting a small number of learnable parameters to adapt to downstream tasks, has gained increasing popularity in continual learning. Although existing continual learning methods based on PEFT have demonstrated superior performance compared to those not based on PEFT, most of them do not consider how to eliminate the interference of the new task on the old tasks, which inhibits the model from making a good trade-off between stability and plasticity. In this work, we propose a new PEFT method, called interference-free low-rank adaptation (InfLoRA), for continual learning. InfLoRA injects a small number of parameters to reparameterize the pre-trained weights and shows that fine-tuning these injected parameters is equivalent to fine-tuning the pre-trained weights within a subspace. Furthermore, InfLoRA designs this subspace to eliminate the interference of the new task on the old tasks, making a good trade-off between stability and plasticity. Experimental results show that InfLoRA outperforms existing state-of-the-art continual learning methods on multiple datasets.
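
The subspace claim in this abstract can be illustrated with a generic low-rank update. The sketch below is an assumption-laden toy, not InfLoRA's actual construction: it uses a reparameterized weight W + A·B in which B is held fixed and only A would be trained, and the matrices and vectors are made-up values. The point it shows is that such an update only reacts to the part of an input that lies in the row space of B, so inputs orthogonal to that subspace see no change.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply(w, x):
    """Apply weight matrix w to an input vector x (a flat list)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


# Frozen pre-trained weight, shape 2 x 3 (illustrative values).
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

# Fixed projection B, shape 1 x 3: its row spans the subspace the update can "see".
B = [[0.0, 0.0, 1.0]]

# Trainable factor A, shape 2 x 1; these numbers stand in for a learned update.
A = [[0.5],
     [-2.0]]

# Reparameterized weight W + A B.
W_new = add(W, matmul(A, B))

x_old = [1.0, -1.0, 0.0]  # orthogonal to the row of B: output is unchanged
x_new = [0.0, 0.0, 1.0]   # inside the row space of B: output changes
```

Here `apply(W_new, x_old)` equals `apply(W, x_old)`, while `apply(W_new, x_new)` differs from `apply(W, x_new)`. If inputs from old tasks were (approximately) orthogonal to the rows of B, training A on a new task would leave the old-task outputs untouched, which is the intuition behind choosing the subspace to avoid interference.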

Updated: 2024-04-02 01:16:20

标题: InfLoRA：无干扰低秩适应连续学习

摘要: 持续学习要求模型按顺序学习多个任务。在持续学习中，模型应具备在旧任务上保持性能（稳定性）和不断适应新任务（可塑性）的能力。最近，基于参数高效微调（PEFT）的方法越来越受欢迎，在这种方法中，冻结预训练模型并注入少量可学习参数以适应下游任务。尽管基于PEFT的现有持续学习方法表现优于非PEFT的方法，但大多数方法并未考虑如何消除新任务对旧任务的干扰，这会阻碍模型在稳定性和可塑性之间取得良好的平衡。在这项工作中，我们提出了一种新的PEFT方法，称为无干扰低秩适应（InfLoRA），用于持续学习。InfLoRA注入少量参数重新参数化预训练权重，并表明微调这些注入的参数等同于在一个子空间内微调预训练权重。此外，InfLoRA设计了这个子空间以消除新任务对旧任务的干扰，实现了在稳定性和可塑性之间的良好平衡。实验结果表明，InfLoRA在多个数据集上优于现有的最先进持续学习方法。

更新时间: 2024-04-02 01:16:20

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2404.00228v2

ExpertQA: Expert-Curated Questions and Attributed Answers

As language models are adopted by a more sophisticated and diverse set of users, the importance of guaranteeing that they provide factually correct information supported by verifiable sources is critical across fields of study. This is especially the case for high-stakes fields, such as medicine and law, where the risk of propagating false information is high and can lead to undesirable societal consequences. Previous work studying attribution and factuality has not focused on analyzing these characteristics of language model outputs in domain-specific scenarios. In this work, we conduct human evaluation of responses from a few representative systems along various axes of attribution and factuality, by bringing domain experts in the loop. Specifically, we collect expert-curated questions from 484 participants across 32 fields of study, and then ask the same experts to evaluate generated responses to their own questions. In addition, we ask experts to improve upon responses from language models. The output of our analysis is ExpertQA, a high-quality long-form QA dataset with 2177 questions spanning 32 fields, along with verified answers and attributions for claims in the answers.

Updated: 2024-04-02 01:07:05

标题: 专家问答：专家筛选问题和对应答案

摘要: 随着语言模型被更复杂和多样化的用户采用，确保它们提供由可验证来源支持的事实正确信息的重要性在各个研究领域中至关重要。这在高风险领域，如医学和法律中尤为重要，因为传播虚假信息的风险很高，可能导致不良社会后果。先前研究关于归因和事实性并未集中分析语言模型在特定领域情景下的这些特征。在这项工作中，我们通过将领域专家纳入评估，对几个代表性系统的响应在归因和事实性各个方面进行人类评估。具体而言，我们从32个研究领域的484名参与者中收集专家策划的问题，然后请同一专家评估对其问题的生成回答。此外，我们要求专家改进语言模型的回答。我们分析的结果是ExpertQA，这是一个高质量的长格式问答数据集，涵盖32个领域的2177个问题，以及回答中主张的验证答案和归因。

更新时间: 2024-04-02 01:07:05

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2309.07852v2

Predicting the Performance of Foundation Models via Agreement-on-the-Line

Estimating the out-of-distribution performance in regimes where labels are scarce is critical to safely deploy foundation models. Recently, it was shown that ensembles of neural networks exhibit the phenomenon "agreement-on-the-line", which can be leveraged to reliably predict OOD performance without labels. However, in contrast to classical neural networks that are trained on in-distribution data from scratch for numerous epochs, foundation models undergo minimal finetuning from heavily pretrained weights, which may reduce the ensemble diversity needed to observe agreement-on-the-line. In our work, we first demonstrate that when lightly finetuning multiple runs from a $\textit{single}$ foundation model, the choice of randomness during training (linear head initialization, data ordering, and data subsetting) can lead to drastically different levels of agreement-on-the-line in the resulting ensemble. Surprisingly, only random head initialization is able to reliably induce agreement-on-the-line in finetuned foundation models across vision and language benchmarks. Second, we demonstrate that ensembles of $\textit{multiple}$ foundation models pretrained on different datasets but finetuned on the same task can also show agreement-on-the-line. In total, by careful construction of a diverse ensemble, we can utilize agreement-on-the-line-based methods to predict the OOD performance of foundation models with high precision.
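
The label-free estimation idea in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: agreement between two models is taken to be the fraction of inputs on which they predict the same class, a straight line is fitted to (in-distribution agreement, OOD agreement) pairs, and that same line is then applied to in-distribution accuracy. Any rescaling of the axes that published methods may use is omitted, and all function names are hypothetical.

```python
from itertools import combinations


def agreement(preds_a, preds_b):
    """Fraction of inputs on which two models predict the same class.

    No ground-truth labels are needed, so this can be computed on OOD data.
    """
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def pairwise_agreements(ensemble_preds):
    """Agreement for every unordered pair of models in the ensemble."""
    return [agreement(a, b) for a, b in combinations(ensemble_preds, 2)]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_ood_accuracy(id_accuracy, id_agreements, ood_agreements):
    """Map in-distribution accuracy to an OOD estimate via the agreement line."""
    slope, intercept = fit_line(id_agreements, ood_agreements)
    return slope * id_accuracy + intercept
```

The fit needs agreement values that actually vary across model pairs, which is one way to read the abstract's emphasis on ensemble diversity: if all pairs agree to the same degree, there is no line to fit.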

Updated: 2024-04-02 00:54:38

标题: 通过“在线协议”预测基础模型的性能

摘要: 在标签稀缺的情况下，估计基础模型在分布之外的性能是安全部署基础模型的关键。最近，研究表明神经网络集成观察到“在线一致性”现象，可以利用这一现象可靠地预测无标签的OOD性能。然而，与经过多次训练的经典神经网络不同，基础模型经历了从预训练权重到微调的最小程度的调优，这可能降低了观察到“在线一致性”所需的集成多样性。在我们的工作中，我们证明当从单个基础模型进行轻微微调多次运行时，训练过程中的随机性选择（线性头初始化、数据排序和数据子集）可以导致得到的集成中的“在线一致性”水平截然不同。令人惊讶的是，只有随机头初始化能够可靠地在视觉和语言基准上引发微调基础模型中的“在线一致性”。其次，我们证明预训练于不同数据集但在相同任务上微调的多个基础模型集成也可以展现“在线一致性”。总的来说，通过精心构建多样化的集成，我们可以利用基于“在线一致性”的方法高精度预测基础模型的OOD性能。

更新时间: 2024-04-02 00:54:38

领域: cs.LG

下载: http://arxiv.org/abs/2404.01542v1

How Effective Are Neural Networks for Fixing Security Vulnerabilities

Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise: (1) large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and (2) automated program repair (APR) techniques that use deep learning (DL) models to automatically fix software bugs. This paper is the first to study and compare Java vulnerability repair capabilities of LLMs and DL-based APR models. The contributions include that we (1) apply and evaluate five LLMs (Codex, CodeGen, CodeT5, PLBART and InCoder), four fine-tuned LLMs, and four DL-based APR techniques on two real-world Java vulnerability benchmarks (Vul4J and VJBench), (2) design code transformations to address the training and test data overlapping threat to Codex, (3) create a new Java vulnerability repair benchmark VJBench, and its transformed version VJBench-trans, and (4) evaluate LLMs and APR techniques on the transformed vulnerabilities in VJBench-trans. Our findings include that (1) existing LLMs and APR models fix very few Java vulnerabilities. Codex fixes 10.2 (20.4%), the largest number of vulnerabilities. (2) Fine-tuning with general APR data improves LLMs' vulnerability-fixing capabilities. (3) Our new VJBench reveals that LLMs and APR models fail to fix many Common Weakness Enumeration (CWE) types, such as CWE-325 Missing cryptographic step and CWE-444 HTTP request smuggling. (4) Codex still fixes 8.3 transformed vulnerabilities, outperforming all the other LLMs and APR models on transformed vulnerabilities. The results call for innovations to enhance automated Java vulnerability repair such as creating larger vulnerability repair training data, tuning LLMs with such data, and applying code simplification transformation to facilitate vulnerability repair.

Updated: 2024-04-02 00:48:11

标题: 神经网络在修复安全漏洞方面有多有效？

摘要: 安全漏洞修复是一项艰巨的任务，迫切需要自动化。两组技术显示出了希望：（1）已经在源代码上进行了预训练以用于代码完成等任务的大型代码语言模型（LLMs），以及（2）使用深度学习（DL）模型自动修复软件缺陷的自动化程序修复（APR）技术。本文是第一篇研究和比较LLMs和基于DL的APR模型的Java漏洞修复能力的论文。贡献包括：我们（1）在两个真实世界的Java漏洞基准（Vul4J和VJBench）上应用和评估五个LLMs（Codex、CodeGen、CodeT5、PLBART和InCoder）、四个经过微调的LLMs和四个基于DL的APR技术，（2）设计代码转换来解决对Codex的训练和测试数据重叠的威胁，（3）创建一个新的Java漏洞修复基准VJBench及其转换版本VJBench-trans，以及（4）评估LLMs和APR技术对VJBench-trans中转换的漏洞的修复情况。我们的研究结果包括：（1）现有的LLMs和APR模型修复的Java漏洞数量很少。Codex修复了10.2个（20.4%）漏洞，数量最多。（2）使用通用APR数据进行微调可以提高LLMs的漏洞修复能力。（3）我们的新VJBench显示LLMs和APR模型未能修复许多常见弱点枚举（CWE）类型，例如CWE-325缺少加密步骤和CWE-444 HTTP请求欺骗。（4）Codex仍然修复了8.3个转换后的漏洞，在转换后的漏洞上表现优于所有其他LLMs和APR模型。结果呼吁创新，以增强自动化Java漏洞修复，例如创建更大的漏洞修复训练数据，使用这些数据调整LLMs，并应用代码简化转换以促进漏洞修复。

更新时间: 2024-04-02 00:48:11

领域: cs.SE,cs.AI,cs.CR

下载: http://arxiv.org/abs/2305.18607v2

Contrastive Credibility Propagation for Reliable Semi-Supervised Learning

Producing labels for unlabeled data is error-prone, making semi-supervised learning (SSL) troublesome. Often, little is known about when and why an algorithm fails to outperform a supervised baseline. Using benchmark datasets, we craft five common real-world SSL data scenarios: few-label, open-set, noisy-label, and class distribution imbalance/misalignment in the labeled and unlabeled sets. We propose a novel algorithm called Contrastive Credibility Propagation (CCP) for deep SSL via iterative transductive pseudo-label refinement. CCP unifies semi-supervised learning and noisy label learning for the goal of reliably outperforming a supervised baseline in any data scenario. Compared to prior methods which focus on a subset of scenarios, CCP uniquely outperforms the supervised baseline in all scenarios, supporting practitioners when the qualities of labeled or unlabeled data are unknown.

Updated: 2024-04-02 00:44:45

标题: 对比可信传播用于可靠的半监督学习

摘要: 生产无标签数据的标签是容易出错的，这使得半监督学习（SSL）变得麻烦。通常，我们很少知道算法何时以及为什么无法超越监督基线。利用基准数据集，我们构建了五种常见的真实世界半监督学习数据情形：少量标签、开放集、嘈杂标签以及标记和未标记集中的类分布不平衡/不匹配。我们提出了一种名为对比可信传播（CCP）的新算法，通过迭代的传导伪标签细化来进行深度SSL。CCP将半监督学习和嘈杂标签学习统一起来，目的是在任何数据情形中可靠地超越监督基线。与以前专注于某些情形的方法相比，CCP在所有情形中独特地超越监督基线，支持从业者在标记或未标记数据的质量未知时的决策。

更新时间: 2024-04-02 00:44:45

领域: cs.LG

下载: http://arxiv.org/abs/2211.09929v4

Interpretable Dimensionality Reduction by Feature Preserving Manifold Approximation and Projection

Nonlinear dimensionality reduction lacks interpretability due to the absence of source features in low-dimensional embedding space. We propose an interpretable method featMAP to preserve source features by tangent space embedding. The core of our proposal is to utilize local singular value decomposition (SVD) to approximate the tangent space which is embedded to low-dimensional space by maintaining the alignment. Based on the embedding tangent space, featMAP enables the interpretability by locally demonstrating the source features and feature importance. Furthermore, featMAP embeds the data points by anisotropic projection to preserve the local similarity and original density. We apply featMAP to interpreting digit classification, object detection and MNIST adversarial examples. FeatMAP uses source features to explicitly distinguish the digits and objects and to explain the misclassification of adversarial examples. We also compare featMAP with other state-of-the-art methods on local and global metrics.

Updated: 2024-04-02 00:33:42

标题: 可解释的降维方法：通过特征保持流形近似和投影达到

摘要: 非线性降维由于在低维嵌入空间中缺乏源特征而缺乏可解释性。我们提出了一种名为featMAP的可解释方法，通过切线空间嵌入来保留源特征。我们提议的核心是利用局部奇异值分解（SVD）来近似切线空间，通过保持对齐将其嵌入到低维空间中。基于嵌入的切线空间，featMAP能够通过局部展示源特征和特征重要性实现可解释性。此外，featMAP通过各向异性投影嵌入数据点，以保留局部相似性和原始密度。我们将featMAP应用于解释数字分类、物体检测和MNIST对抗示例。FeatMAP使用源特征明确区分数字和物体，解释对抗示例的误分类。我们还将featMAP与其他最先进的方法在局部和全局指标上进行了比较。

更新时间: 2024-04-02 00:33:42

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2211.09321v2

MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval

Due to the success of large-scale visual-language pretraining (VLP) models and the widespread use of image-text retrieval in industry areas, it is now critically necessary to reduce the model size and streamline their mobile-device deployment. Single- and dual-stream model structures are commonly used in image-text retrieval with the goal of closing the semantic gap between textual and visual modalities. While single-stream models use deep feature fusion to achieve more accurate cross-model alignment, dual-stream models are better at offline indexing and fast inference. We propose a Multi-teacher Cross-modality Alignment Distillation (MCAD) technique to integrate the advantages of single- and dual-stream models. By incorporating the fused single-stream features into the image and text features of the dual-stream model, we formulate new modified teacher similarity distributions and features. Then, we conduct both distribution and feature distillation to boost the capability of the student dual-stream model, achieving high retrieval performance without increasing inference complexity. Extensive experiments demonstrate the remarkable performance and high efficiency of MCAD on image-text retrieval tasks. Furthermore, we implement a lightweight CLIP model on Snapdragon/Dimensity chips with only $\sim$100M running memory and $\sim$8.0ms search latency, achieving the mobile-device application of VLP models.
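
The two distillation terms mentioned in this abstract can be sketched in plain Python. The abstract does not say which losses are used, so the choices here are assumptions made for illustration: a KL divergence between softmax-normalized teacher and student similarity scores for the distribution term, and a mean squared error between feature vectors for the feature term. The function names and the temperature parameter are likewise hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of similarity scores."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same candidates."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distribution_distillation_loss(teacher_sims, student_sims, temperature=1.0):
    """Push the student's retrieval distribution toward the teacher's."""
    teacher = softmax(teacher_sims, temperature)
    student = softmax(student_sims, temperature)
    return kl_divergence(teacher, student)


def feature_distillation_loss(teacher_feat, student_feat):
    """Mean squared error between teacher and student feature vectors."""
    diffs = [(t - s) ** 2 for t, s in zip(teacher_feat, student_feat)]
    return sum(diffs) / len(diffs)
```

In this sketch both terms are computed only during training; at inference time the student dual-stream model would be used on its own, which is consistent with the abstract's statement that inference complexity does not increase.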

Updated: 2024-04-02 00:12:21

标题: MCAD：多教师跨模态对齐蒸馏用于高效图像-文本检索

摘要: 由于大规模视觉-语言预训练（VLP）模型的成功和图像-文本检索在行业领域的广泛应用，现在迫切需要减小模型尺寸并简化它们在移动设备上的部署。单流和双流模型结构通常用于图像-文本检索，旨在缩小文本和视觉模态之间的语义差距。单流模型使用深度特征融合来实现更准确的跨模型对齐，而双流模型则更适用于离线索引和快速推断。我们提出了一种多教师跨模态对齐蒸馏（MCAD）技术，以整合单流和双流模型的优势。通过将融合的单流特征整合到双流模型的图像和文本特征中，我们形成新的修改后的教师相似度分布和特征。然后，我们进行分布和特征蒸馏，以提升学生双流模型的能力，实现高效的检索性能而不增加推断复杂性。大量实验展示了MCAD在图像-文本检索任务中的出色性能和高效性。此外，我们在Snapdragon/Dimensity芯片上实现了一个轻量级的CLIP模型，仅需约100M的运行内存和约8.0ms的搜索延迟，实现了VLP模型在移动设备上的应用。

更新时间: 2024-04-02 00:12:21

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2310.19654v3

Laying Anchors: Semantically Priming Numerals in Language Modeling

Off-the-shelf pre-trained language models have become the de facto standard in NLP pipelines for a multitude of downstream tasks. However, the inability of these models to properly encode numerals limits their performance on tasks requiring numeric comprehension. We introduce strategies to semantically prime numerals in any corpus by generating anchors governed by the distribution of numerals in said corpus, thereby enabling mathematically grounded representations of these numeral tokens. We establish the superiority of our proposed techniques through evaluation on a range of numeracy tasks for both in-domain (seen) and out-domain (unseen) numerals. Further, we expand our empirical evaluations to numerals ranging from 1 to 10 billion, a significantly broader range compared to previous studies of the same nature, and we demonstrate significant improvements in the mathematical grounding of our learned embeddings.

Updated: 2024-04-02 00:02:00

标题: 《打下锚点：语义启动语言建模中的数字》

摘要: 现成的预训练语言模型已成为自然语言处理管道中多种下游任务的事实标准。然而，这些模型无法正确编码数字，限制了它们在需要数值理解的任务上的性能。我们引入了一种策略，通过生成由语料库中数字分布控制的锚点来在任何语料库中语义上启动数字，从而使这些数字令牌的表示具有数学基础。我们通过对一系列数字任务进行评估，证明了我们提出的技术的优越性，包括领域内（已知）和领域外（未知）数字。此外，我们将经验评估扩展到从1到100亿的数字范围，与之前同类研究相比，这是一个显著更广泛的范围，我们展示了我们学习到的嵌入在数学基础上的显著改进。

更新时间: 2024-04-02 00:02:00

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.01536v1