ML KPI Prediction in 5G and B5G Networks
Network operators are facing new challenges when meeting the needs of their customers. The challenges arise due to the rise of new services, such as HD video streaming, IoT, autonomous driving, etc., and the exponential growth of network traffic. In this context, 5G and B5G networks have been evolving to accommodate a wide range of applications and use cases. Additionally, this evolution brings new features, like the ability to create multiple end-to-end isolated virtual networks using network slicing. Nevertheless, to ensure the quality of service, operators must maintain and optimize their networks in accordance with the key performance indicators (KPIs) and the slice service-level agreements (SLAs). In this paper, we introduce a machine learning (ML) model to estimate throughput in 5G and B5G networks with end-to-end (E2E) network slices. Then, we combine the predicted throughput with the current network state to derive an estimate of other network KPIs, which can be used to further improve service assurance. To assess the efficiency of our solution, we propose a performance metric. Numerical evaluations demonstrate that our KPI prediction model outperforms those derived from other methods with the same or nearly the same computational time.
Updated: 2024-04-01 23:34:28
Categories: cs.NI,cs.LG,cs.SY,eess.SY
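To make the two-stage idea above concrete, here is a minimal sketch assuming a gradient-boosting regressor over synthetic per-slice state features, with a toy queueing rule deriving a second KPI (latency) from the predicted throughput. The feature set, model choice, and latency rule are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch: learn slice throughput from network-state features,
# then derive a secondary KPI (latency) from predicted throughput + state.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Synthetic per-slice state: PRB utilization, active users, signal quality.
X = np.column_stack([rng.uniform(0, 1, n),      # PRB utilization
                     rng.integers(1, 100, n),   # active UEs
                     rng.uniform(-10, 30, n)])  # SINR (dB)
throughput = 100 * (1 - X[:, 0]) * np.log2(1 + 10 ** (X[:, 2] / 10)) / X[:, 1]

X_tr, X_te, y_tr, y_te = train_test_split(X, throughput, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
pred_tput = model.predict(X_te)                 # Mbps, in this toy setup

# Derive another KPI from predicted throughput plus current network state
# (toy queueing-style latency estimate; not the paper's formula).
queue_bytes = rng.uniform(1e3, 1e6, len(pred_tput))
pred_latency_ms = queue_bytes / np.maximum(pred_tput * 1e6 / 8, 1.0) * 1e3
print("mean predicted latency (ms):", pred_latency_ms.mean())
```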
Categorical semiotics: Foundations for Knowledge Integration
The integration of knowledge extracted from diverse models, whether described by domain experts or generated by machine learning algorithms, has historically been challenged by the absence of a suitable framework for specifying and integrating structures, learning processes, data transformations, and data models or rules. In this work, we extend algebraic specification methods to provide such a framework, tackling the challenging task of developing a comprehensive framework for defining and analyzing deep learning architectures. We believe that previous efforts have fallen short by failing to establish a clear connection between the constraints a model must adhere to and its actual implementation. Our methodology employs graphical structures that resemble Ehresmann's sketches, interpreted within a universe of fuzzy sets. This approach offers a unified theory that elegantly encompasses both deterministic and non-deterministic neural network designs. Furthermore, we highlight how this theory naturally incorporates fundamental concepts from computer science and automata theory. Our extended algebraic specification framework, grounded in graphical structures akin to Ehresmann's sketches, offers a promising solution for integrating knowledge across disparate models and domains. By bridging the gap between domain-specific expertise and machine-generated insights, we pave the way for more comprehensive, collaborative, and effective approaches to knowledge integration and modeling.
Updated: 2024-04-01 23:19:01
Categories: cs.AI
Estimating truncation effects of quantum bosonic systems using sampling algorithms
To simulate bosons on a qubit- or qudit-based quantum computer, one has to regularize the theory by truncating infinite-dimensional local Hilbert spaces to finite dimensions. In the search for practical quantum applications, it is important to know how big the truncation errors can be. In general, it is not easy to estimate errors unless we have a good quantum computer. In this paper, we show that traditional sampling methods on classical devices, specifically Markov Chain Monte Carlo, can address this issue for a rather generic class of bosonic systems with a reasonable amount of computational resources available today. As a demonstration, we apply this idea to the scalar field theory on a two-dimensional lattice, with a size that goes beyond what is achievable using exact diagonalization methods. This method can be used to estimate the resources needed for realistic quantum simulations of bosonic theories, and also, to check the validity of the results of the corresponding quantum simulations.
Updated: 2024-04-01 23:15:00
Categories: quant-ph,cs.AI,cs.LG,hep-lat,hep-th
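A minimal Metropolis sketch of the sampling idea, assuming the quantity of interest is the tail probability P(|phi| > phi_max) of sampled field values, used as a proxy for how much a finite Hilbert-space cutoff would matter. The lattice size, action parameters, and cutoff are assumptions of this sketch, not the paper's setup.

```python
# Classical Markov Chain Monte Carlo for a 2D lattice phi^4 scalar field.
import numpy as np

rng = np.random.default_rng(1)
L, m2, lam = 16, 0.5, 0.1          # lattice size, mass^2, quartic coupling
phi = rng.normal(size=(L, L))

def local_action(phi, i, j, val):
    # Site contribution to S with periodic neighbors; 2D kinetic term gives (4 + m^2).
    nn = phi[(i+1) % L, j] + phi[(i-1) % L, j] + phi[i, (j+1) % L] + phi[i, (j-1) % L]
    return (4 + m2) * val**2 / 2 - val * nn + lam * val**4 / 4

for sweep in range(2000):
    for i in range(L):
        for j in range(L):
            new = phi[i, j] + rng.normal(0, 0.5)
            dS = local_action(phi, i, j, new) - local_action(phi, i, j, phi[i, j])
            if dS < 0 or rng.random() < np.exp(-dS):   # Metropolis accept/reject
                phi[i, j] = new

phi_max = 3.0                       # assumed truncation cutoff in field space
tail = np.mean(np.abs(phi) > phi_max)
print(f"fraction of sites beyond cutoff: {tail:.4f}")
```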
On Train-Test Class Overlap and Detection for Image Retrieval
How important is it for training and evaluation sets to not have class overlap in image retrieval? We revisit Google Landmarks v2 clean, the most popular training set, by identifying and removing class overlap with Revisited Oxford and Paris [34], the most popular evaluation set. By comparing the original and the new RGLDv2-clean on a benchmark of reproduced state-of-the-art methods, our findings are striking. Not only is there a dramatic drop in performance, but it is inconsistent across methods, changing the ranking. What does it take to focus on objects of interest and ignore background clutter when indexing? Do we need to train an object detector and the representation separately? Do we need location supervision? We introduce Single-stage Detect-to-Retrieve (CiDeR), an end-to-end, single-stage pipeline to detect objects of interest and extract a global image representation. We outperform the previous state-of-the-art on both existing training sets and the new RGLDv2-clean. Our dataset is available at https://github.com/dealicious-inc/RGLDv2-clean.
Updated: 2024-04-01 23:11:15
Categories: cs.CV,cs.AI
Fair MP-BOOST: Fair and Interpretable Minipatch Boosting
Ensemble methods, particularly boosting, have established themselves as highly effective and widely embraced machine learning techniques for tabular data. In this paper, we aim to leverage the robust predictive power of traditional boosting methods while enhancing fairness and interpretability. To achieve this, we develop Fair MP-Boost, a stochastic boosting scheme that balances fairness and accuracy by adaptively learning features and observations during training. Specifically, Fair MP-Boost sequentially samples small subsets of observations and features, termed minipatches (MP), according to adaptively learned feature and observation sampling probabilities. We devise these probabilities by combining loss functions, or by combining feature importance scores to address accuracy and fairness simultaneously. Hence, Fair MP-Boost prioritizes important and fair features along with challenging instances, to select the most relevant minipatches for learning. The learned probability distributions also yield intrinsic interpretations of feature importance and important observations in Fair MP-Boost. Through empirical evaluation of simulated and benchmark datasets, we showcase the interpretability, accuracy, and fairness of Fair MP-Boost.
Updated: 2024-04-01 23:01:07
Categories: stat.ML,cs.LG
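A schematic sketch of minipatch boosting with adaptive sampling, assuming observation probabilities track per-sample loss and feature probabilities track accumulated importance scores; the exact probability updates (and where the fairness score enters) in Fair MP-Boost may differ.

```python
# Toy minipatch (MP) boosting: sample small row/column subsets adaptively.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, n) > 0).astype(float)

n_mp, m_obs, m_feat, lr = 50, 100, 5, 0.1
p_obs = np.full(n, 1 / n)
p_feat = np.full(d, 1 / d)
F = np.zeros(n)                                     # ensemble scores
learners = []

for t in range(n_mp):
    rows = rng.choice(n, m_obs, replace=False, p=p_obs)
    cols = rng.choice(d, m_feat, replace=False, p=p_feat)
    resid = y - 1 / (1 + np.exp(-F))                # logistic gradient
    tree = DecisionTreeRegressor(max_depth=3).fit(X[np.ix_(rows, cols)], resid[rows])
    learners.append((tree, cols))
    F += lr * tree.predict(X[:, cols])

    # Adapt probabilities: upweight hard observations and important features.
    # (A fairness score would be mixed into feat_score here.)
    loss = np.abs(y - 1 / (1 + np.exp(-F)))
    p_obs = loss / loss.sum()
    feat_score = np.full(d, 1e-3)
    feat_score[cols] += tree.feature_importances_
    p_feat = feat_score / feat_score.sum()
```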
The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance
Quantifying variable importance is essential for answering high-stakes questions in fields like genetics, public policy, and medicine. Current methods generally calculate variable importance for a given model trained on a given dataset. However, for a given dataset, there may be many models that explain the target outcome equally well; without accounting for all possible explanations, different researchers may arrive at many conflicting yet equally valid conclusions given the same data. Additionally, even when accounting for all possible explanations for a given dataset, these insights may not generalize because not all good explanations are stable across reasonable data perturbations. We propose a new variable importance framework that quantifies the importance of a variable across the set of all good models and is stable across the data distribution. Our framework is extremely flexible and can be integrated with most existing model classes and global variable importance metrics. We demonstrate through experiments that our framework recovers variable importance rankings for complex simulation setups where other methods fail. Further, we show that our framework accurately estimates the true importance of a variable for the underlying data distribution. We provide theoretical guarantees on the consistency and finite sample error rates for our estimator. Finally, we demonstrate its utility with a real-world case study exploring which genes are important for predicting HIV load in persons with HIV, highlighting an important gene that has not previously been studied in connection with HIV. Code is available at https://github.com/jdonnelly36/Rashomon_Importance_Distribution.
Updated: 2024-04-01 22:59:31
Categories: cs.LG,q-bio.GN,stat.ML
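A conceptual sketch of importance-over-good-models, assuming "good models" are those within a small tolerance of the best bootstrap accuracy and using permutation importance as the base metric; the actual RID estimator is more involved, so this is a stand-in for the idea only.

```python
# Aggregate variable importance across near-optimal models and bootstraps.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.utils import resample

rng = np.random.default_rng(0)
n, d = 500, 6
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)

importances = []
for b in range(20):                          # bootstrap over the data distribution
    Xb, yb = resample(X, y, random_state=b)
    candidates = [RandomForestClassifier(max_depth=k, random_state=b).fit(Xb, yb)
                  for k in (2, 4, 8)]        # a tiny stand-in for the Rashomon set
    best = max(m.score(Xb, yb) for m in candidates)
    for m in candidates:
        if m.score(Xb, yb) >= best - 0.02:   # eps-Rashomon threshold
            r = permutation_importance(m, Xb, yb, n_repeats=5, random_state=0)
            importances.append(r.importances_mean)

dist = np.array(importances)                 # one importance vector per good model
print("mean importance per variable:", dist.mean(axis=0).round(3))
```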
Deep transfer learning for intrusion detection in industrial control networks: A comprehensive review
Globally, the external internet is increasingly being connected to industrial control systems. As a result, there is an immediate need to protect these networks from a variety of threats. The key infrastructure of industrial activity can be protected from harm using an intrusion detection system (IDS), a preventive mechanism that seeks to recognize new kinds of dangerous threats and hostile activities. This review examines the most recent artificial-intelligence techniques that are used to create IDSs in many kinds of industrial control networks, with a particular emphasis on IDSs based on deep transfer learning (DTL). DTL can be seen as a type of information-fusion approach that merges and/or adapts knowledge from multiple domains to enhance the performance of a target task, particularly when labeled data in the target domain is scarce. Publications issued after 2015 were considered. These selected publications were divided into three categories: DTL-only and IDS-only works are examined in the introduction and background section, and DTL-based IDS papers are considered in the core section of this review. By reading this review paper, researchers will be able to gain a better grasp of the current state of DTL approaches used in IDSs in many different types of networks. Other useful information, such as the datasets used, the type of DTL employed, the pre-trained network, IDS techniques, the evaluation metrics including accuracy/F-score and false-alarm rate, and the improvements gained, is also covered. The algorithms and methods used in several studies are presented, and the principles of DTL-based IDS subcategories are presented to the reader and illustrated deeply and clearly.
Updated: 2024-04-01 22:57:46
Categories: cs.CR,cs.AI,cs.LG,cs.NI,cs.SY,eess.SY
Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
We propose a novel approach to the action segmentation task for long, untrimmed videos, based on solving an optimal transport problem. By encoding a temporal consistency prior into a Gromov-Wasserstein problem, we are able to decode a temporally consistent segmentation from a noisy affinity/matching cost matrix between video frames and action classes. Unlike previous approaches, our method does not require knowing the action order for a video to attain temporal consistency. Furthermore, our resulting (fused) Gromov-Wasserstein problem can be efficiently solved on GPUs using a few iterations of projected mirror descent. We demonstrate the effectiveness of our method in an unsupervised learning setting, where our method is used to generate pseudo-labels for self-training. We evaluate our segmentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desktop Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.
Updated: 2024-04-01 22:53:47
Categories: cs.CV,cs.LG,eess.IV
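A stripped-down sketch of decoding a segmentation from a frame-action cost matrix. The paper's fused Gromov-Wasserstein objective and projected mirror descent are substituted here by the simplest balanced-OT baseline (plain Sinkhorn) just to show the decode step; the temporal-consistency prior is omitted.

```python
# Entropic OT decode: soft-assign frames to actions, then take the argmax.
import numpy as np

rng = np.random.default_rng(0)
T, K = 300, 4                         # frames, action classes
C = rng.random((T, K))                # noisy frame-to-action matching costs

def sinkhorn(C, a, b, eps=0.05, iters=200):
    Kmat = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):            # alternate marginal scalings
        v = b / (Kmat.T @ u)
        u = a / (Kmat @ v)
    return u[:, None] * Kmat * v[None, :]

a = np.full(T, 1 / T)                 # uniform mass over frames
b = np.full(K, 1 / K)                 # uniform mass over actions
P = sinkhorn(C, a, b)                 # transport plan, shape (T, K)
labels = P.argmax(axis=1)             # frame-wise action assignment
print("number of segments:", np.flatnonzero(np.diff(labels)).size + 1)
```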
Addressing Heterogeneity in Federated Load Forecasting with Personalization Layers
The advent of smart meters has enabled pervasive collection of energy consumption data for training short-term load forecasting models. In response to privacy concerns, federated learning (FL) has been proposed as a privacy-preserving approach for training, but the quality of trained models degrades as client data becomes heterogeneous. In this paper we propose the use of personalization layers for load forecasting in a general framework called PL-FL. We show that PL-FL outperforms FL and purely local training, while requiring lower communication bandwidth than FL. This is done through extensive simulations on three different datasets from the NREL ComStock repository.
Updated: 2024-04-01 22:53:09
Categories: cs.LG,eess.SP
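A minimal sketch of personalization layers in federated learning, assuming the shared "base" is every parameter except the final layer, which each client keeps local; PL-FL's actual layer split, model, and aggregation details may differ.

```python
# FedAvg variant: aggregate only shared layers; personal layers stay local.
import copy
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Linear(64, 1))

PERSONAL = "2."                       # parameter-name prefix kept local (last Linear)

def fedavg_round(global_state, clients, data):
    states = []
    for model, (x, y) in zip(clients, data):
        own = model.state_dict()      # load shared base, keep personal layer as-is
        own.update({k: v for k, v in global_state.items() if not k.startswith(PERSONAL)})
        model.load_state_dict(own)
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        for _ in range(5):            # local epochs
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(x), y)
            loss.backward()
            opt.step()
        states.append(model.state_dict())
    # Aggregate only the shared (non-personal) parameters.
    return {k: torch.stack([s[k] for s in states]).mean(0)
            for k in states[0] if not k.startswith(PERSONAL)}

clients = [make_model() for _ in range(3)]
data = [(torch.randn(32, 24), torch.randn(32, 1)) for _ in range(3)]
global_state = copy.deepcopy(clients[0].state_dict())
for rnd in range(10):
    global_state = fedavg_round(global_state, clients, data)
```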
Hierarchical Generative Adversarial Imitation Learning with Mid-level Input Generation for Autonomous Driving on Urban Environments
Deriving robust control policies for realistic urban navigation scenarios is not a trivial task. In an end-to-end approach, these policies must map high-dimensional images from the vehicle's cameras to low-level actions such as steering and throttle. While pure Reinforcement Learning (RL) approaches are based exclusively on engineered rewards, Generative Adversarial Imitation Learning (GAIL) agents learn from expert demonstrations while interacting with the environment, which favors GAIL on tasks for which a reward signal is difficult to derive, such as autonomous driving. However, training deep networks directly from raw images on RL tasks is known to be unstable and troublesome. To deal with that, this work proposes a hierarchical GAIL-based architecture (hGAIL) which decouples representation learning from the driving task to solve the autonomous navigation of a vehicle. The proposed architecture consists of two modules: a GAN (Generative Adversarial Net) which generates an abstract mid-level input representation, namely the Bird's-Eye View (BEV) of the vehicle's surroundings; and the GAIL module, which learns to control the vehicle based on the BEV predictions from the GAN as input. hGAIL is able to learn both the policy and the mid-level representation simultaneously as the agent interacts with the environment. Our experiments in the CARLA simulation environment have shown that GAIL trained exclusively on camera images (without BEV) fails to even learn the task, while hGAIL, after training exclusively on one city, was able to navigate autonomously through 98% of the intersections of a new city not used in the training phase.
Updated: 2024-04-01 22:51:21
Categories: cs.LG,cs.AI,cs.RO
A Continued Pretrained LLM Approach for Automatic Medical Note Generation
LLMs are revolutionizing NLP tasks. However, the use of the most advanced LLMs, such as GPT-4, is often prohibitively expensive for most specialized fields. We introduce HEAL, the first continuously trained 13B LLaMA2-based LLM that is purpose-built for medical conversations and measured on automated scribing. Our results demonstrate that HEAL outperforms GPT-4 and PMC-LLaMA in PubMedQA, with an accuracy of 78.4%. It also achieves parity with GPT-4 in generating medical notes. Remarkably, HEAL surpasses GPT-4 and Med-PaLM 2 in identifying more correct medical concepts and exceeds the performance of human scribes and other comparable models in correctness and completeness.
Updated: 2024-04-01 22:48:56
Categories: cs.CL,cs.AI
Functional Encryption in the Bounded Storage Models
Functional encryption is a powerful paradigm for public-key encryption that allows for controlled access to encrypted data. Achieving the ideal simulation based security for this primitive is generally impossible in the plain model, so we investigate possibilities in the bounded quantum storage model (BQSM) and the bounded classical storage model (BCSM), where adversaries are limited with respect to their quantum and classical memories, respectively. The impossibility results on functional encryption do not apply to these settings, which allows us to obtain positive outcomes. Firstly, in the BQSM, we construct non-interactive functional encryption satisfying information-theoretic simulation based security with $q = O(\sqrt{s/r})$. Here $r$ denotes the number of times that an adversary is restricted to $s$ qubits of quantum memory in the protocol and $q$ denotes the quantum memory required to run the protocol honestly. We then show that our scheme is optimal by proving that it is impossible to attain information-theoretic security with $q < \sqrt{s/r}$. However, by assuming the existence of one-way functions, we achieve (interactive) functional encryption with $q = 0$ and $r = 1$. Secondly, in the BCSM, we construct non-interactive functional encryption satisfying information-theoretic subexponential simulation based security assuming the existence of subexponential grey-box obfuscation. We then demonstrate that this assumption is minimal by constructing subexponential grey-box obfuscation from non-interactive functional encryption. We also consider the computational setting, obtaining (interactive) functional encryption satisfying simulation based security assuming grey-box obfuscation and one-way functions.
Updated: 2024-04-01 22:38:16
Categories: cs.CR
Concept-based Analysis of Neural Networks via Vision-Language Models
The analysis of vision-based deep neural networks (DNNs) is highly desirable but it is very challenging due to the difficulty of expressing formal specifications for vision tasks and the lack of efficient verification procedures. In this paper, we propose to leverage emerging multimodal, vision-language, foundation models (VLMs) as a lens through which we can reason about vision models. VLMs have been trained on a large body of images accompanied by their textual description, and are thus implicitly aware of high-level, human-understandable concepts describing the images. We describe a logical specification language $\texttt{Con}_{\texttt{spec}}$ designed to facilitate writing specifications in terms of these concepts. To define and formally check $\texttt{Con}_{\texttt{spec}}$ specifications, we build a map between the internal representations of a given vision model and a VLM, leading to an efficient verification procedure of natural-language properties for vision models. We demonstrate our techniques on a ResNet-based classifier trained on the RIVAL-10 dataset using CLIP as the multimodal model.
Updated: 2024-04-01 22:34:37
Categories: cs.LG,cs.AI,cs.CL,cs.CV,cs.LO
Solving Attention Kernel Regression Problem via Pre-conditioner
The attention mechanism is the key to large language models, and the attention matrix serves as an algorithmic and computational bottleneck for such a scheme. In this paper, we define two problems, motivated by designing fast algorithms for proxy of attention matrix and solving regressions against them. Given an input matrix $A\in \mathbb{R}^{n\times d}$ with $n\gg d$ and a response vector $b$, we first consider the matrix exponential of the matrix $A^\top A$ as a proxy, and we in turn design algorithms for two types of regression problems: $\min_{x\in \mathbb{R}^d}\|(A^\top A)^jx-b\|_2$ and $\min_{x\in \mathbb{R}^d}\|A(A^\top A)^jx-b\|_2$ for any positive integer $j$. Studying algorithms for these regressions is essential, as matrix exponential can be approximated term-by-term via these smaller problems. The second proxy is applying exponential entrywise to the Gram matrix, denoted by $\exp(AA^\top)$ and solving the regression $\min_{x\in \mathbb{R}^n}\|\exp(AA^\top)x-b \|_2$. We call this problem the attention kernel regression problem, as the matrix $\exp(AA^\top)$ could be viewed as a kernel function with respect to $A$. We design fast algorithms for these regression problems, based on sketching and preconditioning. We hope these efforts will provide an alternative perspective of studying efficient approximation of attention matrices.
Updated: 2024-04-01 22:30:22
Categories: cs.LG
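A small sketch of the first family of regressions, solving $\min_x \|(A^\top A)^j x - b\|_2$ by applying $j$ successive solves with the PSD operator $A^\top A$ via conjugate gradients. The paper's sketching-based preconditioner is omitted for brevity, so this shows the structure of the computation only.

```python
# Solve (A^T A)^j x = b by chaining j CG solves against A^T A (matrix-free).
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, d, j = 5000, 50, 3
A = rng.normal(size=(n, d))
b = rng.normal(size=d)

gram = LinearOperator((d, d), matvec=lambda v: A.T @ (A @ v))  # A^T A unformed

x = b
for _ in range(j):                    # apply (A^T A)^{-1} a total of j times
    x, info = cg(gram, x)
    assert info == 0

residual = x
for _ in range(j):                    # verify (A^T A)^j x ~ b
    residual = A.T @ (A @ residual)
print("relative error:", np.linalg.norm(residual - b) / np.linalg.norm(b))
```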
Can Biases in ImageNet Models Explain Generalization?
The robust generalization of models to rare, in-distribution (ID) samples drawn from the long tail of the training distribution and to out-of-training-distribution (OOD) samples is one of the major challenges of current deep learning methods. For image classification, this manifests in the existence of adversarial attacks, the performance drops on distorted images, and a lack of generalization to concepts such as sketches. The current understanding of generalization in neural networks is very limited, but some biases that differentiate models from human vision have been identified and might be causing these limitations. Consequently, several attempts with varying success have been made to reduce these biases during training to improve generalization. We take a step back and sanity-check these attempts. Fixing the architecture to the well-established ResNet-50, we perform a large-scale study on 48 ImageNet models obtained via different training methods to understand how and if these biases - including shape bias, spectral biases, and critical bands - interact with generalization. Our extensive study results reveal that contrary to previous findings, these biases are insufficient to accurately predict the generalization of a model holistically. We provide access to all checkpoints and evaluation code at https://github.com/paulgavrikov/biases_vs_generalization
Updated: 2024-04-01 22:25:48
Categories: cs.CV,cs.AI,cs.LG,stat.ML
Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time
Given a matrix $M\in \mathbb{R}^{m\times n}$, the low rank matrix completion problem asks us to find a rank-$k$ approximation of $M$ as $UV^\top$ for $U\in \mathbb{R}^{m\times k}$ and $V\in \mathbb{R}^{n\times k}$ by only observing a few entries specified by a set of entries $\Omega\subseteq [m]\times [n]$. In particular, we examine an approach that is widely used in practice -- the alternating minimization framework. Jain, Netrapalli, and Sanghavi [JNS13] showed that if $M$ has incoherent rows and columns, then alternating minimization provably recovers the matrix $M$ by observing a nearly linear in $n$ number of entries. While the sample complexity has been subsequently improved [GLZ17], alternating minimization steps are required to be computed exactly. This hinders the development of more efficient algorithms and fails to depict the practical implementation of alternating minimization, where the updates are usually performed approximately in favor of efficiency. In this paper, we take a major step towards a more efficient and error-robust alternating minimization framework. To this end, we develop an analytical framework for alternating minimization that can tolerate a moderate amount of errors caused by approximate updates. Moreover, our algorithm runs in time $\widetilde O(|\Omega| k)$, which is nearly linear in the time to verify the solution while preserving the sample complexity. This improves upon all prior known alternating minimization approaches which require $\widetilde O(|\Omega| k^2)$ time.
Updated: 2024-04-01 22:17:22
Categories: cs.LG,cs.DS,math.OC,stat.ML
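For reference, here is the plain alternating-minimization baseline the paper builds on: with $V$ fixed, each row of $U$ is a least-squares solve over the observed entries in that row, and vice versa. The paper's contribution is tolerating *approximate* versions of these solves in roughly $O(|\Omega| k)$ total time; in this sketch each solve is exact.

```python
# Alternating minimization for low-rank matrix completion on observed set Omega.
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 80, 5
M = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))   # ground-truth rank-k
mask = rng.random((m, n)) < 0.3                         # observed entries Omega

U = rng.normal(size=(m, k))
V = rng.normal(size=(n, k))
for it in range(30):
    for i in range(m):                                  # update rows of U
        cols = np.flatnonzero(mask[i])
        U[i] = np.linalg.lstsq(V[cols], M[i, cols], rcond=None)[0]
    for j in range(n):                                  # update rows of V
        rows = np.flatnonzero(mask[:, j])
        V[j] = np.linalg.lstsq(U[rows], M[rows, j], rcond=None)[0]

err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
print(f"relative recovery error: {err:.2e}")
```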
Some Orders Are Important: Partially Preserving Orders in Top-Quality Planning
The ability to generate multiple plans is central to using planning in real-life applications. Top-quality planners generate sets of such top-cost plans, allowing flexibility in determining equivalent ones. In terms of the order between actions in a plan, the literature only considers two extremes -- either all orders are important, making each plan unique, or all orders are unimportant, treating two plans differing only in the order of actions as equivalent. To allow flexibility in selecting important orders, we propose specifying a subset of actions the orders between which are important, interpolating between the top-quality and unordered top-quality planning problems. We explore the ways of adapting partial order reduction search pruning techniques to address this new computational problem and present experimental evaluations demonstrating the benefits of exploiting such techniques in this setting.
Updated: 2024-04-01 22:10:12
Categories: cs.AI
Pathspace Kalman Filters with Dynamic Process Uncertainty for Analyzing Time-course Data
Kalman Filter (KF) is an optimal linear state prediction algorithm, with applications in fields as diverse as engineering, economics, robotics, and space exploration. Here, we develop an extension of the KF, called a Pathspace Kalman Filter (PKF) which allows us to a) dynamically track the uncertainties associated with the underlying data and prior knowledge, and b) take as input an entire trajectory and an underlying mechanistic model, and using a Bayesian methodology quantify the different sources of uncertainty. An application of this algorithm is to automatically detect temporal windows where the internal mechanistic model deviates from the data in a time-dependent manner. First, we provide theorems characterizing the convergence of the PKF algorithm. Then, we numerically demonstrate that the PKF outperforms conventional KF methods on a synthetic dataset lowering the mean-squared-error by several orders of magnitude. Finally, we apply this method to biological time-course dataset involving over 1.8 million gene expression measurements.
Updated: 2024-04-01 21:58:10
Categories: stat.ML,cs.LG,q-bio.QM
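A baseline sketch in the PKF's spirit, not the PKF itself: a scalar Kalman filter whose process noise Q is inflated when the standardized innovations grow, flagging windows where the mechanistic model (here, constant-state dynamics) disagrees with the data. The actual PKF operates on whole trajectories in a Bayesian fashion.

```python
# Adaptive-Q Kalman filter on a series whose dynamics break at t = 200.
import numpy as np

rng = np.random.default_rng(0)
T = 400
truth = np.concatenate([np.zeros(200), np.linspace(0, 5, 200)])
y = truth + rng.normal(0, 0.3, T)

x, P, Q, R = 0.0, 1.0, 1e-4, 0.09
innovations, Qs = [], []
for t in range(T):
    P = P + Q                        # predict (model: x_t = x_{t-1})
    S = P + R
    nu = y[t] - x                    # innovation
    K = P / S
    x = x + K * nu                   # update
    P = (1 - K) * P
    innovations.append(nu / np.sqrt(S))
    window = innovations[-20:]
    if len(window) == 20 and np.var(window) > 2.0:   # data deviates from model
        Q = min(Q * 1.5, 1.0)        # inflate process uncertainty
    else:
        Q = max(Q * 0.95, 1e-4)
    Qs.append(Q)

print("first index where Q inflated:", int(np.argmax(np.array(Qs) > 1e-3)))
```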
MosquitoFusion: A Multiclass Dataset for Real-Time Detection of Mosquitoes, Swarms, and Breeding Sites Using Deep Learning
In this paper, we present an integrated approach to real-time mosquito detection using our multiclass dataset (MosquitoFusion) containing 1204 diverse images and leverage cutting-edge technologies, specifically computer vision, to automate the identification of Mosquitoes, Swarms, and Breeding Sites. The pre-trained YOLOv8 model, trained on this dataset, achieved a mean Average Precision (mAP@50) of 57.1%, with precision at 73.4% and recall at 50.5%. The integration of Geographic Information Systems (GIS) further enriches the depth of our analysis, providing valuable insights into spatial patterns. The dataset and code are available at https://github.com/faiyazabdullah/MosquitoFusion.
Updated: 2024-04-01 21:49:05
Categories: cs.CV,cs.LG
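The abstract implies the standard Ultralytics YOLOv8 workflow; a sketch of that interface follows. The dataset YAML and image paths are illustrative placeholders, not files from the repository.

```python
# Fine-tune and run a YOLOv8 detector, as the abstract's pipeline suggests.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                        # pre-trained checkpoint
model.train(data="mosquitofusion.yaml", epochs=100, imgsz=640)
metrics = model.val()                             # reports mAP@50, precision, recall
results = model.predict("field_photo.jpg", conf=0.25)
for box in results[0].boxes:
    print(int(box.cls), float(box.conf))          # class: mosquito / swarm / site
```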
Scalable Distributed Algorithms for Size-Constrained Submodular Maximization in the MapReduce and Adaptive Complexity Models
Distributed maximization of a submodular function in the MapReduce (MR) model has received much attention, culminating in two frameworks that allow a centralized algorithm to be run in the MR setting without loss of approximation, as long as the centralized algorithm satisfies a certain consistency property - which had previously only been known to be satisfied by the standard greedy and continuous greedy algorithms. A separate line of work has studied parallelizability of submodular maximization in the adaptive complexity model, where each thread may have access to the entire ground set. For the size-constrained maximization of a monotone and submodular function, we show that several sublinearly adaptive (highly parallelizable) algorithms satisfy the consistency property required to work in the MR setting, which yields practical, parallelizable and distributed algorithms. Separately, we develop the first distributed algorithm with linear query complexity for this problem. Finally, we provide a method to increase the maximum cardinality constraint for MR algorithms at the cost of additional MR rounds.
Updated: 2024-04-01 21:47:26
Categories: cs.DS,cs.DC,cs.LG
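For context, the sequential baseline these frameworks distribute is the standard greedy for monotone submodular maximization under a cardinality constraint, shown here on a set-cover-style objective (a toy stand-in for f).

```python
# Standard greedy: add the element with the largest marginal gain, k times.
import numpy as np

rng = np.random.default_rng(0)
n_elems, n_sets, k = 200, 60, 10
covers = rng.random((n_sets, n_elems)) < 0.05     # candidate sets over a universe

def f(S):                                         # monotone submodular: coverage
    if not S:
        return 0
    return int(np.any(covers[list(S)], axis=0).sum())

S = set()
for _ in range(k):
    base = f(S)
    gain, best = max((f(S | {e}) - base, e) for e in range(n_sets) if e not in S)
    if gain <= 0:
        break
    S.add(best)
print("coverage with", len(S), "sets:", f(S))
```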
CP-PINNs: Data-Driven Changepoints Detection in PDEs Using Online Optimized Physics-Informed Neural Networks
We investigate the inverse problem for Partial Differential Equations (PDEs) in scenarios where the parameters of the given PDE dynamics may exhibit changepoints at random time. We employ Physics-Informed Neural Networks (PINNs) - universal approximators capable of estimating the solution of any physical law described by a system of PDEs, which serves as a regularization during neural network training, restricting the space of admissible solutions and enhancing function approximation accuracy. We demonstrate that when the system exhibits sudden changes in the PDE dynamics, this regularization is either insufficient to accurately estimate the true dynamics, or it may result in model miscalibration and failure. Consequently, we propose a PINNs extension using a Total-Variation penalty, which allows to accommodate multiple changepoints in the PDE dynamics and significantly improves function approximation. These changepoints can occur at random locations over time and are estimated concurrently with the solutions. Additionally, we introduce an online learning method for re-weighting loss function terms dynamically. Through empirical analysis using examples of various equations with parameter changes, we showcase the advantages of our proposed model. In the absence of changepoints, the model reverts to the original PINNs model. However, when changepoints are present, our approach yields superior parameter estimation, improved model fitting, and reduced training error compared to the original PINNs model.
Updated: 2024-04-01 21:36:20
Categories: stat.ML,cs.AI,cs.LG,cs.NA,math.DS,math.NA
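A fragment showing just the changepoint-enabling ingredient: a per-time PDE coefficient lambda(t) regularized with a total-variation penalty, added to a PINN-style residual loss. The network, the toy ODE in place of a PDE, and the loss weights are schematic assumptions; the paper's online loss re-weighting is omitted.

```python
# PINN-style fit with a TV penalty on a time-varying coefficient lambda(t).
import torch
import torch.nn as nn

t = torch.linspace(0, 1, 200).unsqueeze(1)
t.requires_grad_(True)
u_obs = torch.exp(-2.0 * t.detach())              # toy data generated with lambda = 2
u_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
lam = nn.Parameter(torch.zeros(200))              # one coefficient per time point

opt = torch.optim.Adam(list(u_net.parameters()) + [lam], lr=1e-3)
for step in range(2000):
    u = u_net(t)
    du = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    residual = du.squeeze() + lam * u.squeeze()   # toy ODE: u' + lambda(t) u = 0
    tv = (lam[1:] - lam[:-1]).abs().sum()         # total-variation changepoint penalty
    loss = (u - u_obs.squeeze()).squeeze().pow(2).mean() \
        + residual.pow(2).mean() + 1e-3 * tv
    opt.zero_grad()
    loss.backward()
    opt.step()
print("recovered lambda (mean):", float(lam.mean()))
```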
Defending Against Unforeseen Failure Modes with Latent Adversarial Training
Despite extensive diagnostics and debugging by developers, AI systems sometimes exhibit harmful unintended behaviors. Finding and fixing these is challenging because the attack surface is so large -- it is not tractable to exhaustively search for inputs that may elicit harmful behaviors. Red-teaming and adversarial training (AT) are commonly used to improve robustness, however, they empirically struggle to fix failure modes that differ from the attacks used during training. In this work, we utilize latent adversarial training (LAT) to defend against vulnerabilities without generating inputs that elicit them. LAT leverages the compressed, abstract, and structured latent representations of concepts that the network actually uses for prediction. We use it to remove trojans and defend against held-out classes of adversarial attacks. We show in image classification, text classification, and text generation tasks that LAT usually improves both robustness to novel attacks and performance on clean data relative to AT. This suggests that LAT can be a promising tool for defending against failure modes that are not explicitly identified by developers.
Updated: 2024-04-01 21:32:18
Categories: cs.CR,cs.AI,cs.LG
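A compact sketch of the LAT idea: instead of perturbing inputs, take a gradient-ascent step on a hidden activation and train the later layers to be robust to it. The split point, step size, and loss weighting below are illustrative choices, not the paper's configuration.

```python
# Latent adversarial training: inner max in latent space, outer min on weights.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
head = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam([*encoder.parameters(), *head.parameters()], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 32)
y = torch.randint(0, 2, (128,))
for step in range(100):
    z = encoder(x)
    # Inner maximization: one normalized gradient step on the latent activation.
    delta = torch.zeros_like(z, requires_grad=True)
    adv_loss = loss_fn(head(z.detach() + delta), y)
    grad, = torch.autograd.grad(adv_loss, delta)
    delta = 0.5 * grad / (grad.norm(dim=1, keepdim=True) + 1e-8)
    # Outer minimization: clean loss + latent-adversarial loss.
    loss = loss_fn(head(z), y) + loss_fn(head(z + delta.detach()), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```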
Modality Translation for Object Detection Adaptation Without Forgetting Prior Knowledge
A common practice in deep learning consists of training large neural networks on massive datasets to perform accurately for different domains and tasks. While this methodology may work well in numerous application areas, it does not readily transfer across modalities due to the larger distribution shift in data captured using different sensors. This paper focuses on the problem of adapting a large object detection model to one or multiple modalities while being efficient. To do so, we propose ModTr as an alternative to the common approach of fine-tuning large models. ModTr consists of adapting the input with a small transformation network trained to minimize the detection loss directly. The original model can therefore work on the translated inputs without any further change or fine-tuning to its parameters. Experimental results on translating from IR to RGB images on two well-known datasets show that this simple ModTr approach provides detectors that can perform comparably or better than the standard fine-tuning without forgetting the original knowledge. This opens the doors to a more flexible and efficient service-based detection pipeline in which, instead of using a different detector for each modality, a unique and unaltered server is constantly running, where multiple modalities with the corresponding translations can query it. Code: https://github.com/heitorrapela/ModTr.
Updated: 2024-04-01 21:28:50
Categories: cs.CV,cs.AI
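The core idea in miniature: freeze a pre-trained detector and train only a small image-to-image translation net on the new modality, using the detector's own loss dictionary as supervision. The detector, translator architecture, and data below are illustrative stand-ins, not ModTr's actual components.

```python
# Train a tiny IR -> RGB-like translator through a frozen detector's loss.
import torch
import torch.nn as nn
import torchvision

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
for p in detector.parameters():
    p.requires_grad = False           # detector stays unaltered
detector.train()                      # train mode so the loss dict is returned

translator = nn.Sequential(           # minimal 1-channel -> 3-channel translation
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())
opt = torch.optim.Adam(translator.parameters(), lr=1e-4)

ir = [torch.rand(1, 256, 256)]        # one fake IR frame
targets = [{"boxes": torch.tensor([[30.0, 40.0, 120.0, 160.0]]),
            "labels": torch.tensor([1])}]
for step in range(10):
    rgb_like = [translator(im.unsqueeze(0)).squeeze(0) for im in ir]
    losses = detector(rgb_like, targets)   # detection loss on translated input
    loss = sum(losses.values())
    opt.zero_grad()
    loss.backward()                   # gradients flow only into the translator
    opt.step()
```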
Explainable AI Integrated Feature Engineering for Wildfire Prediction
Wildfires present intricate challenges for prediction, necessitating the use of sophisticated machine learning techniques for effective modeling (Jain et al., 2020). In our research, we conducted a thorough assessment of various machine learning algorithms for both classification and regression tasks relevant to predicting wildfires. We found that for classifying different types or stages of wildfires, the XGBoost model outperformed others in terms of accuracy and robustness. Meanwhile, the Random Forest regression model showed superior results in predicting the extent of wildfire-affected areas, excelling in both prediction error and explained variance. Additionally, we developed a hybrid neural network model that integrates numerical data and image information for simultaneous classification and regression. To gain deeper insights into the decision-making processes of these models and identify key contributing features, we utilized eXplainable Artificial Intelligence (XAI) techniques, including TreeSHAP, LIME, Partial Dependence Plots (PDP), and Gradient-weighted Class Activation Mapping (Grad-CAM). These interpretability tools shed light on the significance and interplay of various features, highlighting the complex factors influencing wildfire predictions. Our study not only demonstrates the effectiveness of specific machine learning models in wildfire-related tasks but also underscores the critical role of model transparency and interpretability in environmental science applications.
Updated: 2024-04-01 21:12:44
Categories: cs.LG
QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving
A self-driving vehicle must understand its environment to determine the appropriate action. Traditional autonomy systems rely on object detection to find the agents in the scene. However, object detection assumes a discrete set of objects and loses information about uncertainty, so any errors compound when predicting the future behavior of those agents. Alternatively, dense occupancy grid maps have been utilized to understand free-space. However, predicting a grid for the entire scene is wasteful since only certain spatio-temporal regions are reachable and relevant to the self-driving vehicle. We present a unified, interpretable, and efficient autonomy framework that moves away from cascading modules that first perceive, then predict, and finally plan. Instead, we shift the paradigm to have the planner query occupancy at relevant spatio-temporal points, restricting the computation to those regions of interest. Exploiting this representation, we evaluate candidate trajectories around key factors such as collision avoidance, comfort, and progress for safety and interpretability. Our approach achieves better highway driving quality than the state-of-the-art in high-fidelity closed-loop simulations.
Updated: 2024-04-01 21:11:43
Categories: cs.RO,cs.AI,cs.CV,cs.LG
Long-form factuality in large language models
Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process comprising sending search queries to Google Search and determining whether a fact is supported by the search results. Furthermore, we propose extending F1 score as an aggregated metric for long-form factuality. To do so, we balance the percentage of supported facts in a response (precision) with the percentage of provided facts relative to a hyperparameter representing a user's preferred response length (recall). Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators - on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time. At the same time, SAFE is more than 20 times cheaper than human annotators. We also benchmark thirteen language models on LongFact across four model families (Gemini, GPT, Claude, and PaLM-2), finding that larger language models generally achieve better long-form factuality. LongFact, SAFE, and all experimental code are available at https://github.com/google-deepmind/long-form-factuality.
Updated: 2024-04-01 21:02:37
Categories: cs.CL,cs.AI,cs.LG
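The aggregated metric described above, written out as a sketch: precision is the fraction of supported facts in the response, recall is measured against a preferred fact count K, and the two combine as a standard F1. The exact functional form (in particular, capping recall at 1) is this sketch's reading of the abstract; the counts and K are toy values.

```python
# F1@K for long-form factuality, per the description above.
def f1_at_k(num_supported: int, num_facts: int, k: int) -> float:
    if num_facts == 0:
        return 0.0
    precision = num_supported / num_facts      # supported share of provided facts
    recall = min(num_supported / k, 1.0)       # supported facts vs preferred length K
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_at_k(num_supported=40, num_facts=50, k=64))  # precision 0.8, recall 0.625
```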
TraveLER: A Multi-LMM Agent Framework for Video Question-Answering
Recently, Large Multimodal Models (LMMs) have made significant progress in video question-answering using a frame-wise approach by leveraging large-scale, image-based pretraining in a zero-shot manner. While image-based methods for videos have shown impressive performance, a current limitation is that they often overlook how key timestamps are selected and cannot adjust when incorrect timestamps are identified. Moreover, they are unable to extract details relevant to the question, instead providing general descriptions of the frame. To overcome this, we design a multi-LMM agent framework that travels along the video, iteratively collecting relevant information from keyframes through interactive question-asking until there is sufficient information to answer the question. Specifically, we propose TraveLER, a model that can create a plan to "Traverse" through the video, ask questions about individual frames to "Locate" and store key information, and then "Evaluate" if there is enough information to answer the question. Finally, if there is not enough information, our method is able to "Replan" based on its collected knowledge. Through extensive experiments, we find that the proposed TraveLER approach improves performance on several video question-answering benchmarks, such as NExT-QA, STAR, and Perception Test, without the need to fine-tune on specific datasets.
Updated: 2024-04-01 20:58:24
Categories: cs.CV,cs.AI,cs.CL,cs.LG
FeatUp: A Model-Agnostic Framework for Features at Any Resolution
Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime. However, these features often lack the spatial resolution to directly perform dense prediction tasks like segmentation and depth prediction because models aggressively pool information over large areas. In this work, we introduce FeatUp, a task- and model-agnostic framework to restore lost spatial information in deep features. We introduce two variants of FeatUp: one that guides features with high-resolution signal in a single forward pass, and one that fits an implicit model to a single image to reconstruct features at any resolution. Both approaches use a multi-view consistency loss with deep analogies to NeRFs. Our features retain their original semantics and can be swapped into existing applications to yield resolution and performance gains even without re-training. We show that FeatUp significantly outperforms other feature upsampling and image super-resolution approaches in class activation map generation, transfer learning for segmentation and depth prediction, and end-to-end training for semantic segmentation.
Updated: 2024-04-01 20:57:45
Categories: cs.CV,cs.AI,cs.IR,cs.LG
Are large language models superhuman chemists?
Large language models (LLMs) have gained widespread interest due to their ability to process human language and perform tasks on which they have not been explicitly trained. This is relevant for the chemical sciences, which face the problem of small and diverse datasets that are frequently in the form of text. LLMs have shown promise in addressing these issues and are increasingly being harnessed to predict chemical properties, optimize reactions, and even design and conduct experiments autonomously. However, we still have only a very limited systematic understanding of the chemical reasoning capabilities of LLMs, which would be required to improve models and mitigate potential harms. Here, we introduce "ChemBench," an automated framework designed to rigorously evaluate the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of human chemists. We curated more than 7,000 question-answer pairs for a wide array of subfields of the chemical sciences, evaluated leading open and closed-source LLMs, and found that the best models outperformed the best human chemists in our study on average. The models, however, struggle with some chemical reasoning tasks that are easy for human experts and provide overconfident, misleading predictions, such as about chemicals' safety profiles. These findings underscore the dual reality that, although LLMs demonstrate remarkable proficiency in chemical tasks, further research is critical to enhancing their safety and utility in chemical sciences. Our findings also indicate a need for adaptations to chemistry curricula and highlight the importance of continuing to develop evaluation frameworks to improve safe and useful LLMs.
Updated: 2024-04-01 20:56:25
Categories: cs.LG,cond-mat.mtrl-sci,cs.AI,physics.chem-ph
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
Large Language Models (LLMs) are regularly being used to label data across many domains and for myriad tasks. By simply asking the LLM for an answer, or "prompting," practitioners are able to use LLMs to quickly get a response for an arbitrary task. This prompting is done through a series of decisions by the practitioner, from simple wording of the prompt, to requesting the output in a certain data format, to jailbreaking in the case of prompts that address more sensitive topics. In this work, we ask: do variations in the way a prompt is constructed change the ultimate decision of the LLM? We answer this using a series of prompt variations across a variety of text classification tasks. We find that even the smallest of perturbations, such as adding a space at the end of a prompt, can cause the LLM to change its answer. Further, we find that requesting responses in XML and commonly used jailbreaks can have cataclysmic effects on the data labeled by LLMs.
Updated: 2024-04-01 20:56:11
Categories: cs.CL,cs.AI
Exploring Quantum-Enhanced Machine Learning for Computer Vision: Applications and Insights on Noisy Intermediate-Scale Quantum Devices
As medium-scale quantum computers progress, the application of quantum algorithms across diverse fields like simulating physical systems, chemistry, optimization, and cryptography becomes more prevalent. However, these quantum computers, known as Noisy Intermediate Scale Quantum (NISQ) devices, are susceptible to noise, prompting the search for applications that can capitalize on quantum advantage without extensive error correction procedures. Meanwhile, Machine Learning (ML), particularly Deep Learning (DL), faces challenges due to resource-intensive training and algorithmic opacity. This study therefore explores the intersection of quantum computing and ML, focusing on computer vision tasks. Specifically, it evaluates the effectiveness of hybrid quantum-classical algorithms, such as the data re-uploading scheme and the patch Generative Adversarial Networks (GAN) model, on small-scale quantum devices. Through practical implementation and testing, the study reveals comparable or superior performance of these algorithms compared to classical counterparts, highlighting the potential of leveraging quantum algorithms in ML tasks.
Updated: 2024-04-01 20:55:03
Categories: quant-ph,cs.LG
You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments
The versatility of Large Language Models (LLMs) on natural language understanding tasks has made them popular for research in social sciences. To properly understand the properties and innate personas of LLMs, researchers have performed studies that involve using prompts in the form of questions that ask LLMs about particular opinions. In this study, we take a cautionary step back and examine whether the current format of prompting LLMs elicits responses in a consistent and robust manner. We first construct a dataset that contains 693 questions encompassing 39 different instruments of persona measurement on 115 persona axes. Additionally, we design a set of prompts containing minor variations and examine LLMs' capabilities to generate answers, as well as prompt variations to examine their consistency with respect to content-level variations such as switching the order of response options or negating the statement. Our experiments on 17 different LLMs reveal that even simple perturbations significantly downgrade a model's question-answering ability, and that most LLMs have low negation consistency. Our results suggest that the currently widespread practice of prompting is insufficient to accurately and reliably capture model perceptions, and we therefore discuss potential alternatives to improve these issues.
Updated: 2024-04-01 20:51:03
Categories: cs.CL,cs.AI
TS-CausalNN: Learning Temporal Causal Relations from Non-linear Non-stationary Time Series Data
The growing availability and importance of time series data across various domains, including environmental science, epidemiology, and economics, has led to an increasing need for time-series causal discovery methods that can identify the intricate relationships in the non-stationary, non-linear, and often noisy real world data. However, the majority of current time series causal discovery methods assume stationarity and linear relations in data, making them infeasible for the task. Further, the recent deep learning-based methods rely on the traditional causal structure learning approaches making them computationally expensive. In this paper, we propose a Time-Series Causal Neural Network (TS-CausalNN) - a deep learning technique to discover contemporaneous and lagged causal relations simultaneously. Our proposed architecture comprises (i) convolutional blocks comprising parallel custom causal layers, (ii) acyclicity constraint, and (iii) optimization techniques using the augmented Lagrangian approach. In addition to the simple parallel design, an advantage of the proposed model is that it naturally handles the non-stationarity and non-linearity of the data. Through experiments on multiple synthetic and real world datasets, we demonstrate the empirical proficiency of our proposed approach as compared to several state-of-the-art methods. The inferred graphs for the real world dataset are in good agreement with the domain understanding.
Updated: 2024-04-01 20:33:29
Categories: cs.LG,stat.ME
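The acyclicity constraint mentioned above is typically enforced with a differentiable penalty. Below is a minimal NOTEARS-style sketch; the paper's exact constraint and optimization details may differ, so the function names and the outer-loop comment are assumptions.

```python
# NOTEARS-style acyclicity penalty with an augmented-Lagrangian term, as a
# sketch of how such constraints are commonly made differentiable.
import numpy as np
from scipy.linalg import expm

def acyclicity(W: np.ndarray) -> float:
    """h(W) = tr(exp(W * W)) - d; equals zero iff the weighted graph W is acyclic."""
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)

def augmented_lagrangian(mse: float, W: np.ndarray, lam: float, rho: float) -> float:
    h = acyclicity(W)
    return mse + lam * h + 0.5 * rho * h ** 2

# A typical outer loop increases rho and updates lam <- lam + rho * h(W)
# until h(W) falls below a tolerance.
```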
Temporal Cross-Attention for Dynamic Embedding and Tokenization of Multimodal Electronic Health Records
The breadth, scale, and temporal granularity of modern electronic health records (EHR) systems offers great potential for estimating personalized and contextual patient health trajectories using sequential deep learning. However, learning useful representations of EHR data is challenging due to its high dimensionality, sparsity, multimodality, irregular and variable-specific recording frequency, and timestamp duplication when multiple measurements are recorded simultaneously. Although recent efforts to fuse structured EHR and unstructured clinical notes suggest the potential for more accurate prediction of clinical outcomes, less focus has been placed on EHR embedding approaches that directly address temporal EHR challenges by learning time-aware representations from multimodal patient time series. In this paper, we introduce a dynamic embedding and tokenization framework for precise representation of multimodal clinical time series that combines novel methods for encoding time and sequential position with temporal cross-attention. Our embedding and tokenization framework, when integrated into a multitask transformer classifier with sliding window attention, outperformed baseline approaches on the exemplar task of predicting the occurrence of nine postoperative complications of more than 120,000 major inpatient surgeries using multimodal data from three hospitals and two academic health centers in the United States.
Updated: 2024-04-01 20:26:01
Categories: cs.LG
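As a rough illustration of attending across modalities with explicit time encodings, the sketch below embeds continuous timestamps sinusoidally and lets one modality query another with standard cross-attention. The module structure and the `labs`/`notes` names are placeholders, not the authors' architecture.

```python
# Schematic time-aware cross-attention between two EHR modalities.
import math
import torch
import torch.nn as nn

def time_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal embedding of continuous timestamps: (B, L) -> (B, L, dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half).float() / half)
    ang = t.unsqueeze(-1) * freqs                    # (B, L, half)
    return torch.cat([ang.sin(), ang.cos()], dim=-1)

class TemporalCrossAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, labs, lab_times, notes, note_times):
        q = labs + time_embedding(lab_times, labs.size(-1))
        kv = notes + time_embedding(note_times, notes.size(-1))
        fused, _ = self.attn(q, kv, kv)              # labs attend to notes over time
        return fused
```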
Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images
4D medical images, which represent 3D images with temporal information, are crucial in clinical practice for capturing dynamic changes and monitoring long-term disease progression. However, acquiring 4D medical images poses challenges due to factors such as radiation exposure and imaging duration, necessitating a balance between achieving high temporal resolution and minimizing adverse effects. Given these circumstances, not only is data acquisition challenging, but increasing the frame rate for each dataset also proves difficult. To address this challenge, this paper proposes a simple yet effective Unsupervised Volumetric Interpolation framework, UVI-Net. This framework facilitates temporal interpolation without the need for any intermediate frames, distinguishing it from the majority of other existing unsupervised methods. Experiments on benchmark datasets demonstrate significant improvements across diverse evaluation metrics compared to unsupervised and supervised baselines. Remarkably, our approach achieves this superior performance even when trained with a dataset as small as one, highlighting its exceptional robustness and efficiency in scenarios with sparse supervision. This positions UVI-Net as a compelling alternative for 4D medical imaging, particularly in settings where data availability is limited. The source code is available at https://github.com/jungeun122333/UVI-Net.
Updated: 2024-04-01 20:25:04
Categories: eess.IV,cs.AI,cs.CV,cs.LG
Local and global topological complexity measures of ReLU neural network functions
We apply a generalized piecewise-linear (PL) version of Morse theory due to Grunert-Kuhnel-Rote to define and study new local and global notions of topological complexity for fully-connected feedforward ReLU neural network functions, $F: \mathbb{R}^n \to \mathbb{R}$. Along the way, we show how to construct, for each such $F$, a canonical polytopal complex K(F) and a deformation retract of the domain onto K(F), yielding a convenient compact model for performing calculations. We also give a construction showing that local complexity can be arbitrarily high.
Updated: 2024-04-01 20:19:49
Categories: math.AT,cs.CG,cs.LG,math.GT,57R70, 57Q99, 52B70, 52C35
Benchmarking Model Predictive Control Algorithms in Building Optimization Testing Framework (BOPTEST)
We present a data-driven modeling and control framework for physics-based building emulators. Our approach consists of: (a) Offline training of differentiable surrogate models that accelerate model evaluations, provide cost-effective gradients, and maintain good predictive accuracy for the receding horizon in Model Predictive Control (MPC), and (b) Formulating and solving nonlinear building HVAC MPC problems. We extensively evaluate the modeling and control performance using multiple surrogate models and optimization frameworks across various test cases available in the Building Optimization Testing Framework (BOPTEST). Our framework is compatible with other modeling techniques and can be customized with different control formulations, making it adaptable and future-proof for test cases currently under development for BOPTEST. This modularity provides a path towards prototyping predictive controllers in large buildings, ensuring scalability and robustness in real-world applications.
Updated: 2024-04-01 20:18:04
Categories: eess.SY,cs.AI,cs.LG,cs.SY
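A minimal sketch of stages (a) and (b) combined: a pre-trained differentiable surrogate is rolled forward over the horizon and the control sequence is optimized by gradient descent. The `surrogate` callable, the comfort/energy costs, and all constants are illustrative assumptions.

```python
# Gradient-based MPC through a differentiable surrogate (receding horizon).
import torch

def plan(surrogate, x0, horizon=12, steps=50, lr=0.05):
    comfort = lambda x: ((x - 22.0) ** 2).sum()     # e.g., track a 22 C setpoint
    energy = lambda u: (u ** 2).sum()
    u = torch.zeros(horizon, requires_grad=True)    # control sequence to optimize
    opt = torch.optim.Adam([u], lr=lr)
    for _ in range(steps):
        x, cost = x0, 0.0
        for t in range(horizon):                    # roll the surrogate forward
            x = surrogate(x, u[t])
            cost = cost + comfort(x) + 0.1 * energy(u[t])
        opt.zero_grad(); cost.backward(); opt.step()
    return u.detach()[0]   # receding horizon: apply only the first control
```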
OpenChemIE: An Information Extraction Toolkit For Chemistry Literature
Information extraction from chemistry literature is vital for constructing up-to-date reaction databases for data-driven chemistry. Complete extraction requires combining information across text, tables, and figures, whereas prior work has mainly investigated extracting reactions from single modalities. In this paper, we present OpenChemIE to address this complex challenge and enable the extraction of reaction data at the document level. OpenChemIE approaches the problem in two steps: extracting relevant information from individual modalities and then integrating the results to obtain a final list of reactions. For the first step, we employ specialized neural models that each address a specific task for chemistry information extraction, such as parsing molecules or reactions from text or figures. We then integrate the information from these modules using chemistry-informed algorithms, allowing for the extraction of fine-grained reaction data from reaction condition and substrate scope investigations. Our machine learning models attain state-of-the-art performance when evaluated individually, and we meticulously annotate a challenging dataset of reaction schemes with R-groups to evaluate our pipeline as a whole, achieving an F1 score of 69.5%. Additionally, the reaction extraction results of OpenChemIE attain an accuracy score of 64.3% when directly compared against the Reaxys chemical database. We provide OpenChemIE freely to the public as an open-source package, as well as through a web interface.
Updated: 2024-04-01 20:16:21
Categories: cs.LG,cs.CL,cs.IR
Game-Theoretic Deep Reinforcement Learning to Minimize Carbon Emissions and Energy Costs for AI Inference Workloads in Geo-Distributed Data Centers
Data centers are increasingly using more energy due to the rise in Artificial Intelligence (AI) workloads, which negatively impacts the environment and raises operational costs. Reducing operating expenses and carbon emissions while maintaining performance in data centers is a challenging problem. This work introduces a unique approach combining Game Theory (GT) and Deep Reinforcement Learning (DRL) for optimizing the distribution of AI inference workloads in geo-distributed data centers to reduce carbon emissions and cloud operating (energy + data transfer) costs. The proposed technique integrates the principles of non-cooperative Game Theory into a DRL framework, enabling data centers to make intelligent decisions regarding workload allocation while considering the heterogeneity of hardware resources, the dynamic nature of electricity prices, inter-data center data transfer costs, and carbon footprints. We conducted extensive experiments comparing our game-theoretic DRL (GT-DRL) approach with current DRL-based and other optimization techniques. The results demonstrate that our strategy outperforms the state-of-the-art in reducing carbon emissions and minimizing cloud operating costs without compromising computational performance. This work has significant implications for achieving sustainability and cost-efficiency in data centers handling AI inference workloads across diverse geographic locations.
Updated: 2024-04-01 20:13:28
Categories: cs.DC,cs.AI,cs.LG
Learning Collective Variables with Synthetic Data Augmentation through Physics-inspired Geodesic Interpolation
In molecular dynamics simulations, rare events, such as protein folding, are typically studied using enhanced sampling techniques, most of which are based on the definition of a collective variable (CV) along which acceleration occurs. Obtaining an expressive CV is crucial, but often hindered by the lack of information about the particular event, e.g., the transition from unfolded to folded conformation. We propose a simulation-free data augmentation strategy using physics-inspired metrics to generate geodesic interpolations resembling protein folding transitions, thereby improving sampling efficiency without true transition state samples. Leveraging interpolation progress parameters, we introduce a regression-based learning scheme for CV models, which outperforms classifier-based methods when transition state data are limited and noisy.
Updated: 2024-04-01 20:11:07
Categories: physics.chem-ph,cs.LG,q-bio.BM
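The regression-based scheme can be pictured as follows: a small network is trained to predict each interpolated frame's progress parameter, and the trained network then serves as the CV. The feature dimension and architecture below are placeholders, not the authors' setup.

```python
# Sketch: regress the geodesic interpolation progress t in [0, 1] as a CV.
import torch
import torch.nn as nn

cv_net = nn.Sequential(nn.Linear(60, 128), nn.SiLU(),
                       nn.Linear(128, 128), nn.SiLU(),
                       nn.Linear(128, 1), nn.Sigmoid())

def train_step(features, progress, opt):
    """features: (B, 60) descriptors of interpolated frames;
    progress: (B,) interpolation parameter used as the regression target."""
    pred = cv_net(features).squeeze(-1)
    loss = nn.functional.mse_loss(pred, progress)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```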
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
Understanding social interactions involving both verbal and non-verbal cues is essential for effectively interpreting social situations. However, most prior works on multimodal social cues focus predominantly on single-person behaviors or rely on holistic visual representations that are not aligned to utterances in multi-party environments. Consequently, they are limited in modeling the intricate dynamics of multi-party interactions. In this paper, we introduce three new challenging tasks to model the fine-grained dynamics between multiple people: speaking target identification, pronoun coreference resolution, and mentioned player prediction. We contribute extensive data annotations to curate these new challenges in social deduction game settings. Furthermore, we propose a novel multimodal baseline that leverages densely aligned language-visual representations by synchronizing visual features with their corresponding utterances. This facilitates concurrently capturing verbal and non-verbal cues pertinent to social reasoning. Experiments demonstrate the effectiveness of the proposed approach with densely aligned multimodal representations in modeling fine-grained social interactions. Project website: https://sangmin-git.github.io/projects/MMSI.
Updated: 2024-04-01 20:03:38
Categories: cs.CV,cs.CL,cs.LG
Robust One-Class Classification with Signed Distance Function using 1-Lipschitz Neural Networks
We propose a new method, dubbed One Class Signed Distance Function (OCSDF), to perform One Class Classification (OCC) by provably learning the Signed Distance Function (SDF) to the boundary of the support of any distribution. The distance to the support can be interpreted as a normality score, and its approximation using 1-Lipschitz neural networks provides robustness bounds against $\ell_2$ adversarial attacks, an under-explored weakness of deep learning-based OCC algorithms. As a result, OCSDF comes with a new metric, certified AUROC, that can be computed at the same cost as any classical AUROC. We show that OCSDF is competitive against concurrent methods on tabular and image data while being way more robust to adversarial attacks, illustrating its theoretical properties. Finally, as exploratory research perspectives, we theoretically and empirically show how OCSDF connects OCC with image generation and implicit neural surface parametrization. Our code is available at https://github.com/Algue-Rythme/OneClassMetricLearning
Updated: 2024-04-01 20:01:10
Categories: cs.LG
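As a rough sketch of the 1-Lipschitz ingredient: spectral normalization bounds each linear layer's Lipschitz constant, so the network output can shift by at most the input perturbation's l2 norm, which is the mechanism behind the certified bounds. This illustrates the general idea, not the exact OCSDF training objective.

```python
# Spectrally normalized MLP whose layers are (approximately) 1-Lipschitz.
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

def lipschitz_mlp(d_in: int, width: int = 64) -> nn.Sequential:
    return nn.Sequential(
        spectral_norm(nn.Linear(d_in, width)), nn.ReLU(),  # ReLU is 1-Lipschitz
        spectral_norm(nn.Linear(width, width)), nn.ReLU(),
        spectral_norm(nn.Linear(width, 1)),
    )

# Interpreting f as a signed distance estimate: a point x is anomalous when
# f(x) < 0, and |f(x)| lower-bounds how far an l2 adversary must move x to
# flip the decision.
```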
Understanding the Dataset Practitioners Behind Large Language Model Development
As large language models (LLMs) become more advanced and impactful, it is increasingly important to scrutinize the data that they rely upon and produce. What is it to be a dataset practitioner doing this work? We approach this in two parts: first, we define the role of "dataset practitioners" by performing a retrospective analysis on the responsibilities of teams contributing to LLM development at a technology company, Google. Then, we conduct semi-structured interviews with a cross-section of these practitioners (N=10). We find that although data quality is a top priority, there is little consensus around what data quality is and how to evaluate it. Consequently, practitioners either rely on their own intuition or write custom code to evaluate their data. We discuss potential reasons for this phenomenon and opportunities for alignment.
Updated: 2024-04-01 19:58:43
Categories: cs.CL,cs.AI,cs.HC
Investigating the Ability of PINNs To Solve Burgers' PDE Near Finite-Time BlowUp
Physics Informed Neural Networks (PINNs) have been achieving ever newer feats of solving complicated PDEs numerically while offering an attractive trade-off between accuracy and speed of inference. A particularly challenging aspect of PDEs is that there exist simple PDEs which can evolve into singular solutions in finite time starting from smooth initial conditions. In recent times some striking experiments have suggested that PINNs might be good at even detecting such finite-time blow-ups. In this work, we embark on a program to investigate this stability of PINNs from a rigorous theoretical viewpoint. Firstly, we derive generalization bounds for PINNs for Burgers' PDE, in arbitrary dimensions, under conditions that allow for a finite-time blow-up. Then we demonstrate via experiments that our bounds are significantly correlated to the $\ell_2$-distance of the neurally found surrogate from the true blow-up solution, when computed on sequences of PDEs that are getting increasingly close to a blow-up.
Updated: 2024-04-01 19:57:59
Categories: cs.LG,cs.NA,math.AP,math.NA
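For concreteness, the residual a PINN minimizes for 1D viscous Burgers, u_t + u u_x = nu u_xx, can be written directly with automatic differentiation. The network, viscosity value, and collocation sampling below are schematic.

```python
# PINN residual for 1D viscous Burgers via autograd; `net` maps (x, t) -> u.
import math
import torch

def burgers_residual(net, x, t, nu=0.01 / math.pi):
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = net(torch.stack([x, t], dim=-1)).squeeze(-1)
    grad = lambda y, v: torch.autograd.grad(
        y, v, grad_outputs=torch.ones_like(y), create_graph=True)[0]
    u_t, u_x = grad(u, t), grad(u, x)
    u_xx = grad(u_x, x)
    return u_t + u * u_x - nu * u_xx   # vanishes on true solutions

# Training minimizes burgers_residual(...).pow(2).mean() plus initial- and
# boundary-condition losses at sampled collocation points.
```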
Unveiling Divergent Inductive Biases of LLMs on Temporal Data
Unraveling the intricate details of events in natural language necessitates a subtle understanding of temporal dynamics. Despite the adeptness of Large Language Models (LLMs) in discerning patterns and relationships from data, their inherent comprehension of temporal dynamics remains a formidable challenge. This research meticulously explores these intrinsic challenges within LLMs, with a specific emphasis on evaluating the performance of GPT-3.5 and GPT-4 models in the analysis of temporal data. Employing two distinct prompt types, namely Question Answering (QA) format and Textual Entailment (TE) format, our analysis probes into both implicit and explicit events. The findings underscore noteworthy trends, revealing disparities in the performance of GPT-3.5 and GPT-4. Notably, biases toward specific temporal relationships come to light, with GPT-3.5 demonstrating a preference for "AFTER" in the QA format for both implicit and explicit events, while GPT-4 leans towards "BEFORE". Furthermore, a consistent pattern surfaces wherein GPT-3.5 tends towards "TRUE", and GPT-4 exhibits a preference for "FALSE" in the TE format for both implicit and explicit events. This persistent discrepancy between GPT-3.5 and GPT-4 in handling temporal data highlights the intricate nature of inductive bias in LLMs, suggesting that the evolution of these models may not merely mitigate bias but may introduce new layers of complexity.
Updated: 2024-04-01 19:56:41
Categories: cs.CL,cs.AI
Versatile Navigation under Partial Observability via Value-guided Diffusion Policy
Route planning for navigation under partial observability plays a crucial role in modern robotics and autonomous driving. Existing route planning approaches can be categorized into two main classes: traditional autoregressive and diffusion-based methods. The former often fails due to its myopic nature, while the latter either assumes full observability or struggles to adapt to unfamiliar scenarios, due to strong couplings with behavior cloning from experts. To address these deficiencies, we propose a versatile diffusion-based approach for both 2D and 3D route planning under partial observability. Specifically, our value-guided diffusion policy first generates plans to predict actions across various timesteps, providing ample foresight to the planning. It then employs a differentiable planner with state estimations to derive a value function, directing the agent's exploration and goal-seeking behaviors without seeking experts while explicitly addressing partial observability. During inference, our policy is further enhanced by a best-plan-selection strategy, substantially boosting the planning success rate. Moreover, we propose projecting point clouds, derived from RGB-D inputs, onto 2D grid-based bird-eye-view maps via semantic segmentation, generalizing to 3D environments. This simple yet effective adaption enables zero-shot transfer from 2D-trained policy to 3D, cutting across the laborious training for 3D policy, and thus certifying our versatility. Experimental results demonstrate our superior performance, particularly in navigating situations beyond expert demonstrations, surpassing state-of-the-art autoregressive and diffusion-based baselines for both 2D and 3D scenarios.
Updated: 2024-04-01 19:52:08
Categories: cs.RO,cs.AI
Prior Frequency Guided Diffusion Model for Limited Angle (LA)-CBCT Reconstruction
Cone-beam computed tomography (CBCT) is widely used in image-guided radiotherapy. Reconstructing CBCTs from limited-angle acquisitions (LA-CBCT) is highly desired for improved imaging efficiency, dose reduction, and better mechanical clearance. LA-CBCT reconstruction, however, suffers from severe under-sampling artifacts, making it a highly ill-posed inverse problem. Diffusion models can generate data/images by reversing a data-noising process through learned data distributions; and can be incorporated as a denoiser/regularizer in LA-CBCT reconstruction. In this study, we developed a diffusion model-based framework, prior frequency-guided diffusion model (PFGDM), for robust and structure-preserving LA-CBCT reconstruction. PFGDM uses a conditioned diffusion model as a regularizer for LA-CBCT reconstruction, and the condition is based on high-frequency information extracted from patient-specific prior CT scans which provides a strong anatomical prior for LA-CBCT reconstruction. Specifically, we developed two variants of PFGDM (PFGDM-A and PFGDM-B) with different conditioning schemes. PFGDM-A applies the high-frequency CT information condition until a pre-optimized iteration step, and drops it afterwards to enable both similar and differing CT/CBCT anatomies to be reconstructed. PFGDM-B, on the other hand, continuously applies the prior CT information condition in every reconstruction step, while with a decaying mechanism, to gradually phase out the reconstruction guidance from the prior CT scans. The two variants of PFGDM were tested and compared with current available LA-CBCT reconstruction solutions, via metrics including PSNR and SSIM. PFGDM outperformed all traditional and diffusion model-based methods. PFGDM reconstructs high-quality LA-CBCTs under very-limited gantry angles, allowing faster and more flexible CBCT scans with dose reductions.
Updated: 2024-04-01 19:41:33
Categories: physics.med-ph,cs.LG
Finding Regions of Interest in Whole Slide Images Using Multiple Instance Learning
Whole Slide Images (WSI), obtained by high-resolution digital scanning of microscope slides at multiple scales, are the cornerstone of modern Digital Pathology. However, they represent a particular challenge to AI-based/AI-mediated analysis because pathology labeling is typically done at slide-level, instead of tile-level. Not only are medical diagnoses recorded at the specimen level; the detection of oncogene mutations is also obtained experimentally and recorded, by initiatives like The Cancer Genome Atlas (TCGA), at the slide level. This configures a dual challenge: a) accurately predicting the overall cancer phenotype and b) finding out what cellular morphologies are associated with it at the tile level. To address these challenges, a weakly supervised Multiple Instance Learning (MIL) approach was explored for two prevalent cancer types, Invasive Breast Carcinoma (TCGA-BRCA) and Lung Squamous Cell Carcinoma (TCGA-LUSC). This approach was explored for tumor detection at low magnification levels and TP53 mutations at various levels. Our results show that a novel additive implementation of MIL matched the performance of the reference implementation (AUC 0.96), and was only slightly outperformed by Attention MIL (AUC 0.97). More interestingly from the perspective of the molecular pathologist, these different AI architectures identify distinct sensitivities to morphological features (through the detection of Regions of Interest, RoI) at different magnification levels. Tellingly, TP53 mutation was most sensitive to features at the higher magnifications, where cellular morphology is resolved.
Updated: 2024-04-01 19:33:41
Categories: cs.CV,cs.AI
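For readers unfamiliar with MIL pooling, the gated-attention baseline of Ilse et al. (2018), sketched below, shows how tile-level attention both aggregates a slide embedding and exposes Regions of Interest. The paper's additive variant aggregates tile scores differently, so this is a generic baseline, not their exact model.

```python
# Gated-attention MIL pooling over tile embeddings from one slide.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, d: int, hidden: int = 128, n_classes: int = 2):
        super().__init__()
        self.V = nn.Linear(d, hidden)
        self.U = nn.Linear(d, hidden)
        self.w = nn.Linear(hidden, 1)
        self.head = nn.Linear(d, n_classes)

    def forward(self, tiles: torch.Tensor):
        """tiles: (n_tiles, d) embeddings of one slide's tiles."""
        a = self.w(torch.tanh(self.V(tiles)) * torch.sigmoid(self.U(tiles)))
        a = torch.softmax(a, dim=0)           # (n_tiles, 1) attention weights
        slide = (a * tiles).sum(dim=0)        # attention-weighted bag embedding
        return self.head(slide), a            # attention doubles as an RoI map
```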
From algorithms to action: improving patient care requires causality
In cancer research there is much interest in building and validating outcome prediction models to support treatment decisions. However, because most outcome prediction models are developed and validated without regard to the causal aspects of treatment decision making, many published outcome prediction models may cause harm when used for decision making, despite being found accurate in validation studies. Guidelines on prediction model validation and the checklist for risk model endorsement by the American Joint Committee on Cancer do not protect against prediction models that are accurate during development and validation but harmful when used for decision making. We explain why this is the case and how to build and validate models that are useful for decision making.
Updated: 2024-04-01 19:28:12
Categories: cs.LG,cs.CY,stat.ML
Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects
We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states. We decompose the problem into two stages, each addressing distinct aspects. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model including part segmentation and joint articulations that associate the two states. By explicitly modeling point-level correspondences and exploiting cues from images, 3D reconstructions, and kinematics, our method yields more accurate and stable results compared to prior work. It also handles more than one movable part and does not rely on any object shape or structure priors. Project page: https://github.com/NVlabs/DigitalTwinArt
Updated: 2024-04-01 19:23:00
Categories: cs.CV,cs.AI,cs.GR,cs.RO
Creating emoji lexica from unsupervised sentiment analysis of their descriptions
Online media, such as blogs and social networking sites, generate massive volumes of unstructured data that are of great interest for analyzing the opinions and sentiments of individuals and organizations. Novel approaches beyond Natural Language Processing are necessary to quantify these opinions with polarity metrics. So far, the sentiment expressed by emojis has received little attention. The use of these symbols, however, has boomed in the past four years. About twenty billion emojis are typed on Twitter nowadays, and new emojis keep appearing in each new Unicode version, making them increasingly relevant to sentiment analysis tasks. This has motivated us to propose a novel approach to predict the sentiments expressed by emojis in online textual messages, such as tweets, that does not require human effort to manually annotate data and saves valuable time for other analysis tasks. For this purpose, we automatically constructed a novel emoji sentiment lexicon using an unsupervised sentiment analysis system based on the definitions given by emoji creators in Emojipedia. Additionally, we automatically created lexicon variants by also considering the sentiment distribution of the informal texts accompanying emojis. All these lexica are evaluated and compared regarding the improvement obtained by including them in sentiment analysis of the annotated datasets provided by Kralj Novak et al. (2015). The results confirm the competitiveness of our approach.
Updated: 2024-04-01 19:22:58
Categories: cs.CL,cs.AI,cs.LG
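Purely as an illustration of scoring emoji descriptions with an off-the-shelf unsupervised sentiment tool, the snippet below uses NLTK's VADER, which is not the authors' system; the descriptions are toy stand-ins for Emojipedia entries.

```python
# Building a tiny emoji sentiment lexicon from description texts with VADER.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

descriptions = {
    "😍": "a face with heart eyes, expressing love and adoration",
    "😡": "an angry red face, expressing rage and frustration",
}
# Compound score in [-1, 1]: positive above 0, negative below.
lexicon = {e: sia.polarity_scores(d)["compound"] for e, d in descriptions.items()}
print(lexicon)   # e.g. a positive score for 😍 and a negative one for 😡
```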
Generation and Detection of Sign Language Deepfakes - A Linguistic and Visual Analysis
A question slowly emerging in the realm of deepfakes is whether we can go beyond facial deepfakes and whether doing so would benefit society. This research therefore presents a positive application of deepfake technology: upper-body generation that performs sign language for the Deaf and Hard of Hearing (DHoH) community. The resulting videos are then vetted by a sign language expert. This is particularly helpful given the intricate nature of sign language, the scarcity of sign language experts, and the potential benefits for health and education. The objectives of this work encompass constructing a reliable deepfake dataset, evaluating its technical and visual credibility through computer vision and natural language processing models, and assessing the plausibility of the generated content. With over 1200 videos featuring both previously seen and unseen individuals for the generation model, and with the help of a sign language expert, we establish a deepfake dataset in sign language that can further be utilized to detect fake videos that may target certain people of determination.
Updated: 2024-04-01 19:22:43
Categories: cs.CV,cs.AI
Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance
This paper provides the first tight convergence analyses for RMSProp and Adam in non-convex optimization under the most relaxed assumptions of coordinate-wise generalized smoothness and affine noise variance. We first analyze RMSProp, which is a special case of Adam with adaptive learning rates but without first-order momentum. Specifically, to solve the challenges due to dependence among adaptive update, unbounded gradient estimate and Lipschitz constant, we demonstrate that the first-order term in the descent lemma converges and its denominator is upper bounded by a function of gradient norm. Based on this result, we show that RMSProp with proper hyperparameters converges to an $\epsilon$-stationary point with an iteration complexity of $\mathcal O(\epsilon^{-4})$. We then generalize our analysis to Adam, where the additional challenge is due to a mismatch between the gradient and first-order momentum. We develop a new upper bound on the first-order term in the descent lemma, which is also a function of the gradient norm. We show that Adam with proper hyperparameters converges to an $\epsilon$-stationary point with an iteration complexity of $\mathcal O(\epsilon^{-4})$. Our complexity results for both RMSProp and Adam match the complexity lower bound established by Arjevani et al. (2023).
Updated: 2024-04-01 19:17:45
Categories: stat.ML,cs.LG,math.OC
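For reference, the iterates analyzed above take the standard coordinate-wise form below (bias correction omitted for brevity; Adam's second moment v_t uses its own decay rate, often written beta_2):

```latex
% g_t is the stochastic gradient; all operations are coordinate-wise.
\begin{aligned}
\text{RMSProp:}\quad v_t &= \beta\, v_{t-1} + (1-\beta)\, g_t^{2}, &
x_{t+1} &= x_t - \eta\, \frac{g_t}{\sqrt{v_t} + \epsilon},\\
\text{Adam:}\quad m_t &= \beta_1\, m_{t-1} + (1-\beta_1)\, g_t, &
x_{t+1} &= x_t - \eta\, \frac{m_t}{\sqrt{v_t} + \epsilon}.
\end{aligned}
```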
FasterViT: Fast Vision Transformers with Hierarchical Attention
We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus on high image throughput for computer vision (CV) applications. FasterViT combines the benefits of fast local representation learning in CNNs and global modeling properties in ViT. Our newly introduced Hierarchical Attention (HAT) approach decomposes global self-attention with quadratic complexity into a multi-level attention with reduced computational costs. We benefit from efficient window-based self-attention. Each window has access to dedicated carrier tokens that participate in local and global representation learning. At a high level, global self-attentions enable the efficient cross-window communication at lower costs. FasterViT achieves a SOTA Pareto-front in terms of accuracy and image throughput. We have extensively validated its effectiveness on various CV tasks including classification, object detection and segmentation. We also show that HAT can be used as a plug-and-play module for existing networks and enhance them. We further demonstrate significantly faster and more accurate performance than competitive counterparts for images with high resolution. Code is available at https://github.com/NVlabs/FasterViT.
Updated: 2024-04-01 19:14:25
Categories: cs.CV,cs.AI,cs.LG
Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs
Recent advances in large language models (LLMs) have enhanced their ability to process long input contexts. This development is particularly crucial for tasks that involve retrieving knowledge from an external datastore, which can result in long inputs. However, recent studies show a positional bias in LLMs, demonstrating varying performance depending on the location of useful information within the input sequence. In this study, we conduct extensive experiments to investigate the root causes of positional bias. Our findings indicate that the primary contributor to LLM positional bias stems from the inherent positional preferences of different models. We demonstrate that merely employing prompt-based solutions is inadequate for overcoming the positional preferences. To address this positional bias issue of a pre-trained LLM, we developed a Position-Aware Parameter Efficient Fine-Tuning (PAPEFT) approach which is composed of a data augmentation technique and a parameter efficient adapter, enhancing a uniform attention distribution across the input context. Our experiments demonstrate that the proposed approach effectively reduces positional bias, improving LLMs' effectiveness in handling long context sequences for various tasks that require externally retrieved knowledge.
Updated: 2024-04-01 19:04:17
Categories: cs.CL,cs.AI,cs.LG
DiffiT: Diffusion Vision Transformers for Image Generation
Diffusion models with their powerful expressivity and high sample quality have achieved State-Of-The-Art (SOTA) performance in the generative domain. The pioneering Vision Transformer (ViT) has also demonstrated strong modeling capabilities and scalability, especially for recognition tasks. In this paper, we study the effectiveness of ViTs in diffusion-based generative learning and propose a new model denoted as Diffusion Vision Transformers (DiffiT). Specifically, we propose a methodology for fine-grained control of the denoising process and introduce the Time-dependent Multihead Self Attention (TMSA) mechanism. DiffiT is surprisingly effective in generating high-fidelity images with significantly better parameter efficiency. We also propose latent and image space DiffiT models and show SOTA performance on a variety of class-conditional and unconditional synthesis tasks at different resolutions. The Latent DiffiT model achieves a new SOTA FID score of 1.73 on ImageNet-256 dataset while having 19.85%, 16.88% less parameters than other Transformer-based diffusion models such as MDT and DiT, respectively. Code: https://github.com/NVlabs/DiffiT
Updated: 2024-04-01 18:55:16
Categories: cs.CV,cs.AI,cs.LG
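A schematic of what time-dependent self-attention could look like: a learned projection of the diffusion timestep shifts the token features before the attention projections, so attention patterns vary with the denoising step. This conveys the idea behind TMSA, not the paper's exact parameterization.

```python
# Time-conditioned self-attention: timestep embedding shifts tokens pre-attention.
import torch
import torch.nn as nn

class TimeConditionedAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8, t_dim: int = 128):
        super().__init__()
        self.t_proj = nn.Sequential(nn.Linear(t_dim, dim), nn.SiLU(),
                                    nn.Linear(dim, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, t_emb: torch.Tensor):
        """tokens: (B, N, dim); t_emb: (B, t_dim) diffusion timestep embedding."""
        h = tokens + self.t_proj(t_emb).unsqueeze(1)   # broadcast over tokens
        out, _ = self.attn(h, h, h)
        return out
```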
Large Language Models for Education: A Survey and Outlook
The advent of Large Language Models (LLMs) has brought in a new era of possibilities in the realm of education. This survey paper summarizes the various technologies of LLMs in educational settings from multifaceted perspectives, encompassing student and teacher assistance, adaptive learning, and commercial tools. We systematically review the technological advancements in each perspective, organize related datasets and benchmarks, and identify the risks and challenges associated with deploying LLMs in education. Furthermore, we outline future research opportunities, highlighting the potential promising directions. Our survey aims to provide a comprehensive technological picture for educators, researchers, and policymakers to harness the power of LLMs to revolutionize educational practices and foster a more effective personalized learning environment.
Updated: 2024-04-01 18:47:45
Categories: cs.CL,cs.AI
Twin Transformer using Gated Dynamic Learnable Attention mechanism for Fault Detection and Diagnosis in the Tennessee Eastman Process
Fault detection and diagnosis (FDD) is a crucial task for ensuring the safety and efficiency of industrial processes. We propose a novel FDD methodology for the Tennessee Eastman Process (TEP), a widely used benchmark for chemical process control. The model employs two separate Transformer branches, enabling independent processing of input data and potential extraction of diverse information. A novel attention mechanism, Gated Dynamic Learnable Attention (GDLAttention), is introduced which integrates a gating mechanism and dynamic learning capabilities. The gating mechanism modulates the attention weights, allowing the model to focus on the most relevant parts of the input. The dynamic learning approach adapts the attention strategy during training, potentially leading to improved performance. The attention mechanism uses a bilinear similarity function, providing greater flexibility in capturing complex relationships between query and key vectors. In order to assess the effectiveness of our approach, we tested it against 21 and 18 distinct fault scenarios in TEP, and compared its performance with several established FDD techniques. The outcomes indicate that the method outperforms others in terms of accuracy, false alarm rate, and misclassification rate. This underscores the robustness and efficacy of the approach for FDD in intricate industrial processes.
Updated: 2024-04-01 18:37:10
Categories: cs.LG,cs.AI
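The gating and bilinear similarity described above might be sketched as follows; reducing the "dynamic learning" component to a single learnable temperature is our simplifying assumption, not the paper's mechanism.

```python
# Gated attention with a bilinear similarity between query and key vectors.
import torch
import torch.nn as nn

class GatedBilinearAttention(nn.Module):
    def __init__(self, d_q: int, d_k: int):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_q, d_k) * 0.02)  # bilinear form
        self.gate = nn.Linear(d_q, 1)
        self.temperature = nn.Parameter(torch.ones(1))       # learned during training

    def forward(self, q, k, v):
        """q: (B, Lq, d_q); k: (B, Lk, d_k); v: (B, Lk, d_v)."""
        scores = torch.einsum("bqd,de,bke->bqk", q, self.W, k)  # bilinear similarity
        attn = torch.softmax(scores / self.temperature, dim=-1)
        attn = attn * torch.sigmoid(self.gate(q))             # per-query gate
        return attn @ v
```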
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops discovered that such loops can lead to model collapse, a phenomenon where performance progressively degrades with each model-fitting iteration until the latest model becomes useless. However, several recent papers studying model collapse assumed that new data replace old data over time rather than assuming data accumulate over time. In this paper, we compare these two settings and show that accumulating data prevents model collapse. We begin by studying an analytically tractable setup in which a sequence of linear models are fit to the previous models' predictions. Previous work showed if data are replaced, the test error increases linearly with the number of model-fitting iterations; we extend this result by proving that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations. We next empirically test whether accumulating data similarly prevents model collapse by pretraining sequences of language models on text corpora. We confirm that replacing data does indeed cause model collapse, then demonstrate that accumulating data prevents model collapse; these results hold across a range of model sizes, architectures and hyperparameters. We further show that similar results hold for other deep generative models on real data: diffusion models for molecule generation and variational autoencoders for image generation. Our work provides consistent theoretical and empirical evidence that data accumulation mitigates model collapse.
Updated: 2024-04-01 18:31:24
Categories: cs.LG,cs.AI,cs.CL,cs.ET,stat.ML
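The replace-vs-accumulate contrast in the linear setting is easy to reproduce in a few lines: each generation refits least squares on labels produced by the previous generation's model plus noise. All constants below are arbitrary toy choices.

```python
# Toy replication of model-data feedback loops: replace vs. accumulate.
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma, iters = 10, 200, 0.5, 30
w_true = rng.normal(size=d)
X_test = rng.normal(size=(1000, d))

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

for mode in ["replace", "accumulate"]:
    X = rng.normal(size=(n, d))
    y = X @ w_true + sigma * rng.normal(size=n)
    Xs, ys, w = X, y, fit(X, y)
    for _ in range(iters):
        X_new = rng.normal(size=(n, d))
        y_new = X_new @ w + sigma * rng.normal(size=n)  # synthetic labels
        if mode == "replace":
            Xs, ys = X_new, y_new                       # discard old data
        else:
            Xs, ys = np.vstack([Xs, X_new]), np.concatenate([ys, y_new])
        w = fit(Xs, ys)
    err = np.mean((X_test @ w - X_test @ w_true) ** 2)
    # Per the analysis above: error drifts upward under "replace" but stays
    # bounded under "accumulate".
    print(mode, round(float(err), 4))
```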
DEM: A Method for Certifying Deep Neural Network Classifier Outputs in Aerospace
Software development in the aerospace domain requires adhering to strict, high-quality standards. While there exist regulatory guidelines for commercial software in this domain (e.g., ARP-4754 and DO-178), these do not apply to software with deep neural network (DNN) components. Consequently, it is unclear how to allow aerospace systems to benefit from the deep learning revolution. Our work here seeks to address this challenge with a novel, output-centric approach for DNN certification. Our method employs statistical verification techniques, and has the key advantage of being able to flag specific inputs for which the DNN's output may be unreliable - so that they may be later inspected by a human expert. To achieve this, our method conducts a statistical analysis of the DNN's predictions for other, nearby inputs, in order to detect inconsistencies. This is in contrast to existing techniques, which typically attempt to certify the entire DNN, as opposed to individual outputs. Our method uses the DNN as a black-box, and makes no assumptions about its topology. We hope that this work constitutes another step towards integrating DNNs in safety-critical applications - especially in the aerospace domain, where high standards of quality and reliability are crucial.
Updated: 2024-04-01 18:27:34
Categories: cs.SE,cs.LG
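An output-centric flagging rule in this spirit can be sketched as follows: sample perturbed copies of an input and flag it when the classifier's label is insufficiently stable. The Gaussian perturbation model and thresholds are illustrative assumptions, not the paper's statistical test.

```python
# Flag inputs whose predicted label is unstable under small perturbations.
import torch

def flag_unreliable(model, x, eps=0.02, n=100, min_agreement=0.9):
    model.eval()
    with torch.no_grad():
        label = model(x.unsqueeze(0)).argmax(dim=-1)          # nominal prediction
        noisy = x.unsqueeze(0) + eps * torch.randn(n, *x.shape)
        agree = (model(noisy).argmax(dim=-1) == label).float().mean()
    return agree.item() < min_agreement   # True => route to a human expert
```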
Low-Rank MDPs with Continuous Action Spaces
Low-Rank Markov Decision Processes (MDPs) have recently emerged as a promising framework within the domain of reinforcement learning (RL), as they allow for provably approximately correct (PAC) learning guarantees while also incorporating ML algorithms for representation learning. However, current methods for low-rank MDPs are limited in that they only consider finite action spaces, and give vacuous bounds as $|\mathcal{A}| \to \infty$, which greatly limits their applicability. In this work, we study the problem of extending such methods to settings with continuous actions, and explore multiple concrete approaches for performing this extension. As a case study, we consider the seminal FLAMBE algorithm (Agarwal et al., 2020), which is a reward-agnostic method for PAC RL with low-rank MDPs. We show that, without any modifications to the algorithm, we obtain a similar PAC bound when actions are allowed to be continuous. Specifically, when the model for transition functions satisfies a Hölder smoothness condition w.r.t. actions, and either the policy class has a uniformly bounded minimum density or the reward function is also Hölder smooth, we obtain a polynomial PAC bound that depends on the order of smoothness.
Updated: 2024-04-01 18:26:36
Categories: cs.LG,cs.AI,stat.ML
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation
In the realm of food computing, segmenting ingredients from images poses substantial challenges due to the large intra-class variance among the same ingredients, the emergence of new ingredients, and the high annotation costs associated with large food segmentation datasets. Existing approaches primarily utilize a closed-vocabulary and static text embeddings setting. These methods often fall short in effectively handling the ingredients, particularly new and diverse ones. In response to these limitations, we introduce OVFoodSeg, a framework that adopts an open-vocabulary setting and enhances text embeddings with visual context. By integrating vision-language models (VLMs), our approach enriches text embedding with image-specific information through two innovative modules, e.g., an image-to-text learner FoodLearner and an Image-Informed Text Encoder. The training process of OVFoodSeg is divided into two stages: the pre-training of FoodLearner and the subsequent learning phase for segmentation. The pre-training phase equips FoodLearner with the capability to align visual information with corresponding textual representations that are specifically related to food, while the second phase adapts both the FoodLearner and the Image-Informed Text Encoder for the segmentation task. By addressing the deficiencies of previous models, OVFoodSeg demonstrates a significant improvement, achieving a 4.9% increase in mean Intersection over Union (mIoU) on the FoodSeg103 dataset, setting a new milestone for food image segmentation.
Updated: 2024-04-01 18:26:29
Categories: cs.CV,cs.AI,cs.MM
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
In the absence of parallax cues, a learning-based single image depth estimation (SIDE) model relies heavily on shading and contextual cues in the image. While this simplicity is attractive, it is necessary to train such models on large and varied datasets, which are difficult to capture. It has been shown that using embeddings from pre-trained foundational models, such as CLIP, improves zero-shot transfer in several applications. Taking inspiration from this, in our paper we explore the use of global image priors generated from a pre-trained ViT model to provide more detailed contextual information. We argue that the embedding vector from a ViT model, pre-trained on a large dataset, captures greater relevant information for SIDE than the usual route of generating pseudo image captions followed by CLIP-based text embeddings. Based on this idea, we propose a new SIDE model using a diffusion backbone which is conditioned on ViT embeddings. Our proposed design establishes a new state-of-the-art (SOTA) for SIDE on the NYUv2 dataset, achieving an Abs Rel error of 0.059 (a 14% improvement) compared to 0.069 by the current SOTA (VPD), and on the KITTI dataset, achieving a Sq Rel error of 0.139 (a 2% improvement) compared to 0.142 by the current SOTA (GEDepth). For zero-shot transfer with a model trained on NYUv2, we report mean relative improvements of (20%, 23%, 81%, 25%) over NeWCRFs on the (Sun-RGBD, iBims1, DIODE, HyperSim) datasets, compared to (16%, 18%, 45%, 9%) by ZoeDepth. The project page is available at https://ecodepth-iitd.github.io
Updated: 2024-04-01 18:26:22
Categories: cs.CV,cs.AI,cs.LG
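A minimal sketch of the conditioning idea described above, assuming a generic frozen ViT encoder and a denoising network that accepts extra context tokens via cross-attention; the module names, shapes, and `context` interface are hypothetical placeholders, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class ViTConditionedDenoiser(nn.Module):
    # Hypothetical sketch: a diffusion denoiser whose cross-attention context
    # is the global embedding of a pre-trained ViT, instead of text embeddings.
    def __init__(self, vit_encoder, unet, vit_dim=768, ctx_dim=1024):
        super().__init__()
        self.vit = vit_encoder.eval()            # frozen pre-trained ViT
        self.proj = nn.Linear(vit_dim, ctx_dim)  # map ViT embedding to context dim
        self.unet = unet                         # diffusion backbone

    def forward(self, x_t, t, image):
        with torch.no_grad():
            emb = self.vit(image)                # (B, vit_dim) global image prior
        ctx = self.proj(emb).unsqueeze(1)        # (B, 1, ctx_dim) context tokens
        return self.unet(x_t, t, context=ctx)    # noise prediction given ViT prior
```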
ContactHandover: Contact-Guided Robot-to-Human Object Handover
Robot-to-human object handover is an important step in many human-robot collaboration tasks. A successful handover requires the robot to maintain a stable grasp on the object while ensuring the human receives the object in a natural and easy-to-use manner. We propose ContactHandover, a robot-to-human handover system that consists of two phases: a contact-guided grasping phase and an object delivery phase. During the grasping phase, ContactHandover predicts both 6-DoF robot grasp poses and a 3D affordance map of human contact points on the object. The robot grasp poses are reranked by penalizing those that block human contact points, and the robot executes the highest-ranking grasp. During the delivery phase, the robot end-effector pose is computed by maximizing the human contact points exposed close to the human while minimizing the human arm joint torques and displacements. We evaluate our system on 27 diverse household objects and show that our system achieves better visibility and reachability of human contacts for the receiver compared to several baselines. More results can be found at https://clairezixiwang.github.io/ContactHandover.github.io
Updated: 2024-04-01 18:12:09
Categories: cs.RO,cs.AI,cs.CV
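A simplified sketch of the contact-guided reranking step, assuming grasp candidates are scored elsewhere and that predicted human contact points come with affordance weights; the linear penalty form and blocking radius are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rerank_grasps(grasps, scores, contact_points, contact_weights,
                  block_radius=0.03, penalty=1.0):
    # grasps: (G, 3) array of gripper contact locations (simplified to points).
    # contact_points: (C, 3) predicted human contact points on the object,
    # with affordance weights (C,). Penalty form is an illustrative assumption.
    reranked = []
    for g, s in zip(grasps, scores):
        d = np.linalg.norm(contact_points - g, axis=1)     # distance to human contacts
        blocked = contact_weights[d < block_radius].sum()  # affordance mass blocked
        reranked.append(s - penalty * blocked)             # penalize blocking grasps
    order = np.argsort(reranked)[::-1]                     # best grasp first
    return grasps[order], np.asarray(reranked)[order]
```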
End-to-End Crystal Structure Prediction from Powder X-Ray Diffraction
Crystal structure prediction (CSP) has made significant progress, but most methods focus on the unconditional generation of inorganic crystals with few atoms in the unit cell. This study introduces XtalNet, the first equivariant deep generative model for end-to-end CSP from powder X-ray diffraction (PXRD). Unlike previous methods that rely solely on composition, XtalNet leverages PXRD as an additional condition, eliminating ambiguity and enabling the generation of complex organic structures with up to 400 atoms in the unit cell. XtalNet comprises two modules: a Contrastive PXRD-Crystal Pretraining (CPCP) module that aligns PXRD space with crystal structure space, and a Conditional Crystal Structure Generation (CCSG) module that generates candidate crystal structures conditioned on PXRD patterns. Evaluation on two MOF datasets (hMOF-100 and hMOF-400) demonstrates XtalNet's effectiveness: it achieves top-10 match rates of 90.2% and 79% on the hMOF-100 and hMOF-400 datasets, respectively, in the conditional crystal structure prediction task. XtalNet represents a significant advance in CSP, enabling the direct prediction of complex structures from experimental PXRD measurements without the need for external databases or manual intervention. It has the potential to revolutionize PXRD analysis, opening up new possibilities for automated crystal structure determination and the accelerated discovery of novel materials.
Updated: 2024-04-01 18:09:08
Categories: physics.chem-ph,cs.LG
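The CPCP module is described as aligning PXRD space with crystal-structure space; below is a minimal sketch of such a CLIP-style symmetric contrastive objective over matched (PXRD, crystal) embedding pairs. The encoders and the temperature value are hypothetical assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(pxrd_emb, crystal_emb, temperature=0.07):
    # pxrd_emb, crystal_emb: (B, D) embeddings of matched PXRD patterns and
    # crystal structures from two hypothetical encoders. Symmetric InfoNCE;
    # the temperature value is an illustrative assumption.
    p = F.normalize(pxrd_emb, dim=-1)
    c = F.normalize(crystal_emb, dim=-1)
    logits = p @ c.t() / temperature                # (B, B) similarity matrix
    targets = torch.arange(p.size(0), device=p.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```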
Object-conditioned Bag of Instances for Few-Shot Personalized Instance Recognition
Nowadays, users demand increased personalization of vision systems that can localize and identify personal instances of objects (e.g., my dog rather than dog) from only a few-shot dataset. Despite outstanding results of deep networks on classical label-abundant benchmarks (e.g., those of the latest YOLOv8 model for standard object detection), they struggle to capture the within-class variability needed to represent distinct instances rather than object categories alone. We construct an Object-conditioned Bag of Instances (OBoI) based on multi-order statistics of extracted features, where generic object detection models are extended to search for and identify personal instances from the OBoI's metric space, without need for backpropagation. By relying on multi-order statistics, OBoI achieves consistently superior accuracy in distinguishing different instances. In our results, we achieve 77.1% personal object recognition accuracy in the case of 18 personal instances, showing about a 12% relative gain over the state of the art.
Updated: 2024-04-01 18:08:58
Categories: cs.CV,cs.AI,cs.RO
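A minimal sketch of building an instance descriptor from multi-order feature statistics and matching by nearest neighbor in that metric space, with no backpropagation. Which orders OBoI actually uses is not stated above, so the mean, variance, and skewness below are assumptions for illustration.

```python
import numpy as np

def multi_order_descriptor(features):
    # features: (N, D) local features extracted for one object instance.
    # Concatenate first-, second-, and third-order statistics per dimension.
    mu = features.mean(axis=0)
    var = features.var(axis=0)
    centered = features - mu
    skew = (centered ** 3).mean(axis=0) / (var ** 1.5 + 1e-8)
    return np.concatenate([mu, var, skew])          # (3 * D,) instance descriptor

def recognize(query_feats, bag):
    # bag: dict mapping instance name -> stored descriptor (the "bag of instances").
    q = multi_order_descriptor(query_feats)
    return min(bag, key=lambda k: np.linalg.norm(bag[k] - q))  # nearest instance
```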
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Neural fields excel in computer vision and robotics due to their ability to understand the 3D visual world, such as inferring semantics, geometry, and dynamics. Given the capabilities of neural fields in densely representing a 3D scene from 2D images, we ask the question: can we scale their self-supervised pretraining, specifically using masked autoencoders, to generate effective 3D representations from posed RGB images? Owing to the astounding success of extending transformers to novel data modalities, we employ standard 3D Vision Transformers to suit the unique formulation of NeRFs. We leverage NeRF's volumetric grid as a dense input to the transformer, contrasting it with other 3D representations such as pointclouds, where the information density can be uneven and the representation is irregular. Due to the difficulty of applying masked autoencoders to an implicit representation such as NeRF, we opt for extracting an explicit representation that canonicalizes scenes across domains by employing the camera trajectory for sampling. Our goal is made possible by masking random patches from NeRF's radiance and density grid and employing a standard 3D Swin Transformer to reconstruct the masked patches. In doing so, the model can learn the semantic and spatial structure of complete scenes. We pretrain this representation at scale on our proposed curated posed-RGB data, totaling over 1.6 million images. Once pretrained, the encoder is used for effective 3D transfer learning. Our novel self-supervised pretraining for NeRFs, NeRF-MAE, scales remarkably well and improves performance on various challenging 3D tasks. Utilizing unlabeled posed 2D data for pretraining, NeRF-MAE significantly outperforms self-supervised 3D pretraining and NeRF scene understanding baselines on the Front3D and ScanNet datasets, with an absolute performance improvement of over 20% AP50 and 8% AP25 for 3D object detection.
Updated: 2024-04-01 17:59:55
Categories: cs.CV,cs.AI,cs.LG
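A small sketch of the masking step on an explicit radiance-and-density grid: random 3D patches are dropped before reconstruction, in the spirit of masked autoencoding. The grid layout, patch size, and mask ratio below are illustrative assumptions, not the paper's configuration.

```python
import torch

def mask_radiance_grid(grid, patch=4, mask_ratio=0.75):
    # grid: (C, R, R, R) NeRF radiance + density sampled on a regular 3D grid
    # (e.g., C = 4 for RGB + sigma); R assumed divisible by patch.
    C, R, _, _ = grid.shape
    n = R // patch                                   # patches per axis
    keep = torch.rand(n, n, n) > mask_ratio          # True -> patch stays visible
    mask = keep.repeat_interleave(patch, 0) \
               .repeat_interleave(patch, 1) \
               .repeat_interleave(patch, 2)          # (R, R, R) voxel-level mask
    masked = grid * mask.unsqueeze(0)                # zero out masked patches
    return masked, mask                              # reconstruct the masked voxels
```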
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning analysis. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series. With thoughtful questions and multi-level answers, our dataset contains much longer causal chains embedded in dynamic interactions and visuals; at the same time, the principles of animation allow animators to create well-defined, unambiguous causal relationships. These factors allow models to solve more challenging, yet well-defined, causal relationships. We also introduce hard negative mining, including a CausalConfusion version. While models perform well, there is much room for improvement, especially on open-ended answers. We identify more advanced/explicit causal relationship modeling and joint modeling of vision and language as the immediate areas for future efforts to focus upon. Along with other complementary datasets, our new challenging dataset will pave the way for these developments in the field. We will release our dataset, codes, and models to help future efforts in this domain.
Updated: 2024-04-01 17:59:53
Categories: cs.CV,cs.AI,cs.CL,cs.LG
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their sampling efficiency. While improved network architectures and inference algorithms have been shown to effectively boost the sampling efficiency of diffusion models, the role of model size -- a critical determinant of sampling efficiency -- has not been thoroughly examined. Through empirical analysis of established text-to-image diffusion models, we conduct an in-depth investigation into how model size influences sampling efficiency across varying sampling steps. Our findings unveil a surprising trend: when operating under a given inference budget, smaller models frequently outperform their larger equivalents in generating high-quality results. Moreover, we extend our study to demonstrate the generalizability of these findings by applying various diffusion samplers, exploring diverse downstream tasks, evaluating post-distilled models, as well as comparing performance relative to training compute. These findings open up new pathways for the development of LDM scaling strategies which can be employed to enhance generative capabilities within limited inference budgets.
Updated: 2024-04-01 17:59:48
Categories: cs.CV,cs.LG
Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models
As large language models (LLMs) become easily accessible nowadays, the trade-off between safety and helpfulness can significantly impact user experience. A model that prioritizes safety can leave users feeling less engaged and assisted, while one that prioritizes helpfulness can potentially cause harm. Possible harms include teaching people how to build a bomb, exposing youth to inappropriate content, and hurting users' mental health. In this work, we propose to balance safety and helpfulness in diverse use cases by controlling both attributes in the LLM. We explore training-free and fine-tuning methods that do not require extra human annotations, and analyze the challenges of controlling safety and helpfulness in LLMs. Our experiments demonstrate that our method can rewind a learned model and unlock its controllability.
Updated: 2024-04-01 17:59:06
Categories: cs.CL,cs.AI
Measuring Style Similarity in Diffusion Models
Generative models are now widely used by graphic designers and artists. Prior works have shown that these models remember and often replicate content from their training data during generation. Hence, as their proliferation increases, it has become important to run a database search before a generated image is used for professional purposes, to determine whether its properties are attributable to specific training data. Existing tools for this purpose focus on retrieving images of similar semantic content. Meanwhile, many artists are concerned with style replication in text-to-image models. We present a framework for understanding and extracting style descriptors from images. Our framework comprises a new dataset curated using the insight that style is a subjective property of an image that captures complex yet meaningful interactions of factors including, but not limited to, colors, textures, and shapes. We also propose a method to extract style descriptors that can be used to attribute the style of a generated image to the images used in the training dataset of a text-to-image model. We showcase promising results in various style retrieval tasks. We also quantitatively and qualitatively analyze style attribution and matching in the Stable Diffusion model. Code and artifacts are available at https://github.com/learn2phoenix/CSD.
Updated: 2024-04-01 17:58:30
Categories: cs.CV,cs.LG
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Despite significant progress in generative AI, comprehensive evaluation remains challenging because of the lack of effective metrics and standardized benchmarks. For instance, the widely-used CLIPScore measures the alignment between a (generated) image and text prompt, but it fails to produce reliable scores for complex prompts involving compositions of objects, attributes, and relations. One reason is that text encoders of CLIP can notoriously act as a "bag of words", conflating prompts such as "the horse is eating the grass" with "the grass is eating the horse". To address this, we introduce the VQAScore, which uses a visual-question-answering (VQA) model to produce an alignment score by computing the probability of a "Yes" answer to a simple "Does this figure show '{text}'?" question. Though simpler than prior art, VQAScore computed with off-the-shelf models produces state-of-the-art results across many (8) image-text alignment benchmarks. We also compute VQAScore with an in-house model that follows best practices in the literature. For example, we use a bidirectional image-question encoder that allows image embeddings to depend on the question being asked (and vice versa). Our in-house model, CLIP-FlanT5, outperforms even the strongest baselines that make use of the proprietary GPT-4V. Interestingly, although we train with only images, VQAScore can also align text with video and 3D models. VQAScore allows researchers to benchmark text-to-visual generation using complex texts that capture the compositional structure of real-world prompts. We introduce GenAI-Bench, a more challenging benchmark with 1,600 compositional text prompts that require parsing scenes, objects, attributes, relationships, and high-order reasoning like comparison and logic. GenAI-Bench also offers over 15,000 human ratings for leading image and video generation models such as Stable Diffusion, DALL-E 3, and Gen2.
Updated: 2024-04-01 17:58:06
Categories: cs.CV,cs.AI,cs.CL,cs.LG,cs.MM
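The scoring rule itself is simple enough to sketch: ask a VQA model the stated yes/no question and read off the probability of "Yes". The snippet below assumes a generic VQA wrapper exposing answer probabilities; `vqa_model` and its interface are hypothetical placeholders, not the paper's actual API.

```python
def vqa_score(vqa_model, image, text):
    # VQAScore: P("Yes" | image, question), using the question template quoted
    # in the abstract. vqa_model is a hypothetical wrapper around any VQA model
    # that exposes answer probabilities.
    question = f"Does this figure show '{text}'? Please answer yes or no."
    probs = vqa_model.answer_probabilities(image, question)  # e.g. {"Yes": .., "No": ..}
    return probs["Yes"]

# Usage sketch: a compositional prompt and its word-swapped variant should
# receive very different scores, unlike a bag-of-words CLIPScore.
# s1 = vqa_score(model, img, "the horse is eating the grass")
# s2 = vqa_score(model, img, "the grass is eating the horse")
```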
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
The diffusion model has long been plagued by scalability and quadratic complexity issues, especially within transformer-based structures. In this study, we aim to leverage the long sequence modeling capability of a State-Space Model called Mamba to extend its applicability to visual data generation. Firstly, we identify a critical oversight in most current Mamba-based vision methods, namely the lack of consideration for spatial continuity in the scan scheme of Mamba. Secondly, building upon this insight, we introduce a simple, plug-and-play, zero-parameter method named Zigzag Mamba, which outperforms Mamba-based baselines and demonstrates improved speed and memory utilization compared to transformer-based baselines. Lastly, we integrate Zigzag Mamba with the Stochastic Interpolant framework to investigate the scalability of the model on large-resolution visual datasets, such as FacesHQ ($1024\times 1024$), UCF101, MultiModal-CelebA-HQ, and MS COCO ($256\times 256$). Code will be released at https://taohu.me/zigma/
Updated: 2024-04-01 17:58:02
Categories: cs.CV,cs.AI,cs.CL,cs.LG
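The spatial-continuity fix can be illustrated independently of Mamba itself: the sketch below builds a zigzag (boustrophedon) scan order over a 2D token grid, so consecutive sequence positions are always spatially adjacent, unlike a plain row-major raster scan that jumps across the image at every row boundary.

```python
def zigzag_order(height, width):
    # Visit the (height x width) token grid row by row, reversing direction
    # on every other row so neighboring sequence positions stay adjacent.
    order = []
    for i in range(height):
        cols = range(width) if i % 2 == 0 else range(width - 1, -1, -1)
        order.extend(i * width + j for j in cols)
    return order

# tokens: (B, H*W, D) in row-major order -> reorder before the state-space scan:
# idx = zigzag_order(H, W); tokens = tokens[:, idx, :]
```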
Prompt-prompted Mixture of Experts for Efficient LLM Generation
Transformer-based large language models (LLMs) have been applied to many fields owing to their remarkable utility, but this comes at a considerable computational cost at deployment. Fortunately, some methods such as pruning or constructing a mixture of experts (MoE) aim at exploiting sparsity in transformer feedforward (FF) blocks to gain boosts in speed and reductions in memory requirements. However, these techniques can be very costly and inflexible in practice, as they often require training or are restricted to specific types of architectures. To address this, we introduce GRIFFIN, a novel training-free MoE that selects unique FF experts at the sequence level for efficient generation across a plethora of LLMs with different non-ReLU activation functions. This is possible due to a critical observation that many trained LLMs naturally produce highly structured FF activation patterns within a sequence, which we call flocking. Despite our method's simplicity, we show that with 50% of the FF parameters, GRIFFIN maintains the original model's performance with little to no degradation on a variety of classification and generation tasks, all while improving latency (e.g., a 1.25$\times$ speed-up for Llama 2 13B on an NVIDIA L40). Code will be available at https://github.com/hdong920/GRIFFIN.
Updated: 2024-04-01 17:56:06
Categories: cs.LG,cs.AI,cs.CL
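A minimal sketch of sequence-level expert selection in a two-matrix FF block: aggregate each hidden unit's activation magnitude over the prompt (one simple "flocking" statistic), keep the top fraction, and use only those units during generation. The 50% keep ratio follows the abstract; the exact aggregation rule is an assumption.

```python
import torch

def select_ff_experts(activations, keep_ratio=0.5):
    # activations: (T, D_ff) FF hidden activations over the prompt's T tokens.
    # Score each neuron by its aggregate magnitude across the sequence; the
    # paper's exact statistic may differ.
    scores = activations.abs().sum(dim=0)           # (D_ff,) per-neuron score
    k = int(keep_ratio * scores.numel())
    return torch.topk(scores, k).indices            # indices of kept neurons

def pruned_ff(x, W_in, W_out, idx, act=torch.nn.functional.gelu):
    # Generation-time FF pass using only the selected neurons (no retraining).
    return act(x @ W_in[:, idx]) @ W_out[idx, :]
```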
A Survey on Multimodal Large Language Models
Recently, Multimodal Large Language Model (MLLM) represented by GPT-4V has been a new rising research hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLM, such as writing stories based on images and OCR-free math reasoning, are rare in traditional multimodal methods, suggesting a potential path to artificial general intelligence. To this end, both academia and industry have endeavored to develop MLLMs that can compete with or even better than GPT-4V, pushing the limit of research at a surprising speed. In this paper, we aim to trace and summarize the recent progress of MLLMs. First of all, we present the basic formulation of MLLM and delineate its related concepts, including architecture, training strategy and data, as well as evaluation. Then, we introduce research topics about how MLLMs can be extended to support more granularity, modalities, languages, and scenarios. We continue with multimodal hallucination and extended techniques, including Multimodal ICL (M-ICL), Multimodal CoT (M-CoT), and LLM-Aided Visual Reasoning (LAVR). To conclude the paper, we discuss existing challenges and point out promising research directions. In light of the fact that the era of MLLM has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub link collecting the latest papers is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.
Updated: 2024-04-01 17:51:54
Categories: cs.CV,cs.AI,cs.CL,cs.LG
TWIN-GPT: Digital Twins for Clinical Trials via Large Language Model
Recently, there has been a burgeoning interest in virtual clinical trials, which simulate real-world scenarios and hold the potential to significantly enhance patient safety, expedite development, reduce costs, and contribute to the broader scientific knowledge in healthcare. Existing research often focuses on leveraging electronic health records (EHRs) to support clinical trial outcome prediction. Yet, trained with limited clinical trial outcome data, existing approaches frequently struggle to make accurate predictions. Some research has attempted to generate EHRs to augment model development but has fallen short in personalizing the generation for individual patient profiles. Recently, the emergence of large language models has illuminated new possibilities, as their embedded comprehensive clinical knowledge has proven beneficial in addressing medical issues. In this paper, we propose a large language model-based digital twin creation approach, called TWIN-GPT. TWIN-GPT can establish cross-dataset associations of medical information given limited data, generating unique personalized digital twins for different patients, thereby preserving individual patient characteristics. Comprehensive experiments show that using digital twins created by TWIN-GPT can boost clinical trial outcome prediction, exceeding various previous prediction approaches. Besides, we also demonstrate that TWIN-GPT can generate high-fidelity trial data that closely approximate specific patients, aiding in more accurate result predictions in data-scarce situations. Moreover, our study provides practical evidence for the application of digital twins in healthcare, highlighting its potential significance.
Updated: 2024-04-01 17:48:55
Categories: cs.LG,cs.CL,stat.ME
Decentralized Collaborative Learning Framework with External Privacy Leakage Analysis
This paper presents two methodological advancements in decentralized multi-task learning under privacy constraints, aiming to pave the way for future developments in next-generation Blockchain platforms. First, we expand the existing framework for collaborative dictionary learning (CollabDict), which has previously been limited to Gaussian mixture models, by incorporating deep variational autoencoders (VAEs) into the framework, with a particular focus on anomaly detection. We demonstrate that the VAE-based anomaly score function shares the same mathematical structure as the non-deep model, and provide a comprehensive qualitative comparison. Second, considering the widespread use of "pre-trained models," we provide a mathematical analysis of data privacy leakage when models trained with CollabDict are shared externally. We show that the CollabDict approach, when applied to Gaussian mixtures, adheres to a Renyi differential privacy criterion. Additionally, we propose a practical metric for monitoring internal privacy breaches during the learning process.
Updated: 2024-04-01 17:46:17
Categories: cs.LG,cs.CR,cs.DC
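As a sketch of a VAE-based anomaly score, one standard choice is the negative evidence lower bound (ELBO) of a test point, which plays the same role as the negative log-likelihood under a Gaussian mixture; whether CollabDict uses exactly this form is an assumption here, and `vae.encode`/`vae.decode` are hypothetical interfaces.

```python
import torch

def vae_anomaly_score(vae, x, n_samples=8):
    # Negative ELBO as an anomaly score: high score -> poorly explained input.
    # Assumed interface: vae.encode -> (mu, logvar); vae.decode -> reconstruction.
    mu, logvar = vae.encode(x)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1)
    rec = 0.0
    for _ in range(n_samples):                      # Monte Carlo reconstruction term
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        rec = rec + ((x - vae.decode(z)) ** 2).sum(dim=-1)
    return rec / n_samples + kl                     # -ELBO up to constants and scale
```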
Mapping the Increasing Use of LLMs in Scientific Papers
Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent this tool might have an effect on global scientific practices. However, we lack a precise measure of the proportion of academic writing substantially modified or produced by LLMs. To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time. Our statistical estimation operates on the corpus level and is more robust than inference on individual instances. Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers (up to 17.5%). In comparison, Mathematics papers and the Nature portfolio showed the least LLM modification (up to 6.3%). Moreover, at an aggregate level, our analysis reveals that higher levels of LLM-modification are associated with papers whose first authors post preprints more frequently, papers in more crowded research areas, and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writing.
Updated: 2024-04-01 17:45:15
Categories: cs.CL,cs.AI,cs.DL,cs.LG,cs.SI
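The corpus-level idea can be sketched as estimating the mixing weight of a two-component mixture: if per-document statistics have known densities under human-written and LLM-modified text, the prevalence is fit by maximum likelihood over the whole corpus rather than classified document by document. The one-dimensional statistic and grid search below are illustrative simplifications of the paper's framework.

```python
import numpy as np

def estimate_prevalence(stats, p_human, p_llm, grid=np.linspace(0, 1, 1001)):
    # stats: per-document summary statistics for the corpus (1D array).
    # p_human / p_llm: vectorized density functions of the statistic under
    # each class, estimated on reference corpora. Returns the alpha that
    # maximizes the corpus log-likelihood of (1 - a) * p_human + a * p_llm.
    h, l = p_human(stats), p_llm(stats)
    loglik = [np.sum(np.log((1 - a) * h + a * l + 1e-12)) for a in grid]
    return grid[int(np.argmax(loglik))]
```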
A Robust Semantics-based Watermark for Large Language Model against Paraphrasing
Large language models (LLMs) have shown great ability in various natural language tasks. However, there are concerns that LLMs may be used improperly or even illegally. To prevent the malicious usage of LLMs, detecting LLM-generated text becomes crucial in the deployment of LLM applications. Watermarking is an effective strategy for detecting LLM-generated content by encoding a pre-defined secret watermark into the generated text to facilitate the detection process. However, the majority of existing watermark methods leverage simple hashes of preceding tokens to partition the vocabulary. Such watermarks can be easily eliminated by paraphrasing, and the detection effectiveness will correspondingly be greatly compromised. Thus, to enhance robustness against paraphrasing, we propose a semantics-based watermark framework, SemaMark. It leverages semantics as an alternative to simple hashes of tokens, since a paraphrase will likely preserve the semantic meaning of the sentences. Comprehensive experiments are conducted to demonstrate the effectiveness and robustness of SemaMark under different paraphrases.
Updated: 2024-04-01 17:44:19
Categories: cs.CR
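A minimal sketch of replacing the token-hash seed with a semantic one: quantize a sentence embedding of the text generated so far and use it to seed the green/red vocabulary split, so a meaning-preserving paraphrase tends to reproduce the same split. The embedding model, quantization scheme, and green-list fraction are illustrative assumptions, not SemaMark's exact construction.

```python
import hashlib
import numpy as np

def semantic_green_list(embed, prefix_text, vocab_size, gamma=0.5, levels=4):
    # embed: any sentence-embedding function returning a (D,) float vector.
    # Coarsely quantize the embedding so nearby semantics map to the same seed.
    v = embed(prefix_text)
    q = np.round(levels * v / (np.linalg.norm(v) + 1e-8))
    seed = int(hashlib.sha256(q.tobytes()).hexdigest(), 16) % (2 ** 32)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(vocab_size)
    return set(perm[: int(gamma * vocab_size)].tolist())  # "green" token ids
```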
ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy
Language agents have demonstrated autonomous decision-making abilities by reasoning with foundation models. Recently, efforts have been made to train language agents for performance improvement, with multi-step reasoning and action trajectories as the training data. However, collecting such trajectories still requires considerable human effort, by either manual annotation or the implementation of diverse prompting frameworks. In this work, we propose A$^3$T, a framework that enables the Autonomous Annotation of Agent Trajectories in the style of ReAct. The central role is an ActRe prompting agent, which explains the reason for an arbitrary action. When randomly sampling an external action, the ReAct-style agent could query the ActRe agent with the action to obtain its textual rationales. Novel trajectories are then synthesized by prepending the posterior reasoning from ActRe to the sampled action. In this way, the ReAct-style agent executes multiple trajectories for the failed tasks, and selects the successful ones to supplement its failed trajectory for contrastive self-training. Realized by policy gradient methods with binarized rewards, the contrastive self-training with accumulated trajectories facilitates a closed loop for multiple rounds of language agent self-improvement. We conduct experiments using QLoRA fine-tuning with the open-sourced Mistral-7B-Instruct-v0.2. In AlfWorld, the agent trained with A$^3$T obtains a 1-shot success rate of 96%, and 100% success with 4 iterative rounds. In WebShop, the 1-shot performance of the A$^3$T agent matches human average, and 4 rounds of iterative refinement lead to the performance approaching human experts. A$^3$T agents significantly outperform existing techniques, including prompting with GPT-4, advanced agent frameworks, and fully fine-tuned LLMs.
Updated: 2024-04-01 17:37:15
Categories: cs.AI,cs.CL,cs.LG
Efficient Benchmarking of Language Models
The increasing versatility of language models (LMs) has given rise to a new class of benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks are associated with massive computational costs, extending to thousands of GPU hours per model. However, the efficiency aspect of these evaluation efforts has received little discussion in the literature. In this work, we present the problem of Efficient Benchmarking, namely, intelligently reducing the computation costs of LM evaluation without compromising reliability. Using the HELM benchmark as a test case, we investigate how different benchmark design choices affect the computation-reliability trade-off. We propose to evaluate the reliability of such decisions by using a new measure -- Decision Impact on Reliability, DIoR for short. We find, for example, that the benchmark leader may change merely by removing a low-ranked model from the benchmark, and observe that a correct benchmark ranking can be obtained by considering only a fraction of the evaluation examples. Based on our findings, we outline a set of concrete recommendations for efficient benchmark design and utilization practices. Going a step further, we use our findings to propose an evaluation algorithm that, when applied to the HELM benchmark, leads to dramatic cost savings with minimal loss of benchmark reliability, often reducing computation by x100 or more.
Updated: 2024-04-01 17:34:34
Categories: cs.CL,cs.AI,cs.CV,cs.LG
Information Plane Analysis Visualization in Deep Learning via Transfer Entropy
In a feedforward network, Transfer Entropy (TE) can be used to measure the influence that one layer has on another by quantifying the information transfer between them during training. According to the Information Bottleneck principle, a neural model's internal representation should compress the input data as much as possible while still retaining sufficient information about the output. Information Plane analysis is a visualization technique used to understand the trade-off between compression and information preservation in the context of the Information Bottleneck method by plotting the amount of information in the input data against the compressed representation. The claim that there is a causal link between information-theoretic compression and generalization, measured by mutual information, is plausible, but results from different studies are conflicting. In contrast to mutual information, TE can capture temporal relationships between variables. To explore such links, in our novel approach we use TE to quantify information transfer between neural layers and perform Information Plane analysis. We obtained encouraging experimental results, opening the possibility for further investigations.
Updated: 2024-04-01 17:34:18
Categories: cs.LG,cs.AI,cs.HC,cs.IT,math.IT
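For reference, the sketch below estimates transfer entropy between two discrete-valued activation sequences with plug-in (histogram) probabilities, following the standard definition $TE(X \to Y) = \sum p(y_{t+1}, y_t, x_t) \log \frac{p(y_{t+1} \mid y_t, x_t)}{p(y_{t+1} \mid y_t)}$; binning continuous activations beforehand, and history length one, are simplifying assumptions.

```python
from collections import Counter
import math

def transfer_entropy(x, y):
    # Plug-in estimate of TE(X -> Y) for discrete sequences x, y of equal length.
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))        # (y_next, y_prev, x_prev)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))
    pairs_yy = Counter(zip(y[1:], y[:-1]))
    singles_y = Counter(y[:-1])
    n = len(y) - 1
    te = 0.0
    for (yn, yp, xp), c in triples.items():
        p_joint = c / n
        p_cond_full = c / pairs_yx[(yp, xp)]             # p(y_next | y_prev, x_prev)
        p_cond_marg = pairs_yy[(yn, yp)] / singles_y[yp] # p(y_next | y_prev)
        te += p_joint * math.log(p_cond_full / p_cond_marg)
    return te                                            # in nats
```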
FABLES: Evaluating faithfulness and content selection in book-length summarization
While long-context large language models (LLMs) can technically summarize book-length documents (>100K tokens), the length and complexity of the documents have so far prohibited evaluations of input-dependent aspects like faithfulness. In this paper, we conduct the first large-scale human evaluation of faithfulness and content selection on LLM-generated summaries of fictional books. Our study mitigates the issue of data contamination by focusing on summaries of books published in 2023 or 2024, and we hire annotators who have fully read each book prior to the annotation task to minimize cost and cognitive burden. We collect FABLES, a dataset of annotations on 3,158 claims made in LLM-generated summaries of 26 books, at a cost of $5.2K USD, which allows us to rank LLM summarizers based on faithfulness: Claude-3-Opus significantly outperforms all closed-source LLMs, while the open-source Mixtral is on par with GPT-3.5-Turbo. An analysis of the annotations reveals that most unfaithful claims relate to events and character states, and they generally require indirect reasoning over the narrative to invalidate. While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially with regard to detecting unfaithful claims. Our experiments suggest that detecting unfaithful claims is an important future direction not only for summarization evaluation but also as a testbed for long-context understanding. Finally, we move beyond faithfulness by exploring content selection errors in book-length summarization: we develop a typology of omission errors related to crucial narrative elements and also identify a systematic over-emphasis on events occurring towards the end of the book.
Updated: 2024-04-01 17:33:38
Categories: cs.CL,cs.AI
AIOps Solutions for Incident Management: Technical Guidelines and A Comprehensive Literature Review
The management of modern IT systems poses unique challenges, necessitating scalability, reliability, and efficiency in handling extensive data streams. Traditional methods, reliant on manual tasks and rule-based approaches, prove inefficient for the substantial data volumes and alerts generated by IT systems. Artificial Intelligence for IT Operations (AIOps) has emerged as a solution, leveraging advanced analytics like machine learning and big data to enhance incident management. AIOps detects and predicts incidents, identifies root causes, and automates healing actions, improving quality and reducing operational costs. However, despite its potential, the AIOps domain is still in its early stages, decentralized across multiple sectors, and lacking standardized conventions. Research and industrial contributions are distributed without consistent frameworks for data management, target problems, implementation details, requirements, and capabilities. This study proposes an AIOps terminology and taxonomy, establishing a structured incident management procedure and providing guidelines for constructing an AIOps framework. The research also categorizes contributions based on criteria such as incident management tasks, application areas, data sources, and technical approaches. The goal is to provide a comprehensive review of technical and research aspects in AIOps for incident management, aiming to structure knowledge, identify gaps, and establish a foundation for future developments in the field.
Updated: 2024-04-01 17:32:22
Categories: cs.OS,cs.AI,cs.SE
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
In the realm of geospatial analysis, the diversity of remote sensors, encompassing both optical and microwave technologies, offers a wealth of distinct observational capabilities. Recognizing this, we present msGFM, a multisensor geospatial foundation model that effectively unifies data from four key sensor modalities. This integration spans an expansive dataset of two million multisensor images. msGFM is uniquely adept at handling both paired and unpaired sensor data. For data originating from identical geolocations, our model employs an innovative cross-sensor pretraining approach in masked image modeling, enabling the synthesis of joint representations from diverse sensors. msGFM, incorporating four remote sensors, upholds strong performance, forming a comprehensive model adaptable to various sensor types. msGFM has demonstrated enhanced proficiency in a range of both single-sensor and multisensor downstream tasks. These include scene classification, segmentation, cloud removal, and pan-sharpening. A key discovery of our research is that representations derived from natural images are not always compatible with the distinct characteristics of geospatial remote sensors, underscoring the limitations of existing representations in this field. Our work can serve as a guide for developing multisensor geospatial pretraining models, paving the way for more advanced geospatial capabilities.
Updated: 2024-04-01 17:30:56
Categories: cs.CV,cs.AI,cs.LG
LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset
Chemistry plays a crucial role in many domains, such as drug discovery and material science. While large language models (LLMs) such as GPT-4 exhibit remarkable capabilities on natural language processing tasks, existing research indicates that their performance on chemistry tasks is discouragingly low. In this paper, however, we demonstrate that our developed LLMs can achieve very strong results on a comprehensive set of chemistry tasks, outperforming the most advanced GPT-4 and Claude 3 Opus by a substantial margin. To accomplish this, we propose SMolInstruct, a large-scale, comprehensive, and high-quality dataset for instruction tuning. It contains 14 selected chemistry tasks and over three million samples, laying a solid foundation for training and evaluating LLMs for chemistry. Using SMolInstruct, we fine-tune a set of open-source LLMs, among which, we find that Mistral serves as the best base model for chemistry tasks. Our analysis further demonstrates the critical role of the proposed dataset in driving the performance improvements.
Updated: 2024-04-01 17:28:16
Categories: cs.AI,cs.CE,cs.CL
New logarithmic step size for stochastic gradient descent
In this paper, we propose a novel warm restart technique using a new logarithmic step size for the stochastic gradient descent (SGD) approach. For smooth and non-convex functions, we establish an $O(\frac{1}{\sqrt{T}})$ convergence rate for SGD. We conduct a comprehensive implementation to demonstrate the efficiency of the newly proposed step size on the FashionMNIST, CIFAR10, and CIFAR100 datasets. Moreover, we compare our results with nine other existing approaches and demonstrate that the new logarithmic step size improves test accuracy by $0.9\%$ on the CIFAR100 dataset when we utilize a convolutional neural network (CNN) model.
Updated: 2024-04-01 17:25:27
Categories: cs.LG,math.OC
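The abstract does not spell out the schedule's exact functional form, so the sketch below only illustrates the general recipe of pairing a logarithmically decaying step size with warm restarts; the specific decay $\eta_0 / \ln(s + 2)$ is an assumption for illustration, not the paper's formula.

```python
import math

def log_step_size(step_in_cycle, eta0=0.1):
    # One plausible logarithmic decay, assumed for illustration; the paper's
    # exact form may differ. step_in_cycle resets to 0 at each warm restart,
    # which re-raises the step size back toward eta0.
    return eta0 / math.log(step_in_cycle + 2.0)

# Warm-restart SGD loop sketch:
# for t in range(T):
#     lr = log_step_size(t % cycle_len)
#     w = w - lr * grad(w)
```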
Shape-Guided Diffusion with Inside-Outside Attention
We introduce precise object silhouette as a new form of user control in text-to-image diffusion models, which we dub Shape-Guided Diffusion. Our training-free method uses an Inside-Outside Attention mechanism during the inversion and generation process to apply a shape constraint to the cross- and self-attention maps. Our mechanism designates which spatial region is the object (inside) vs. background (outside) then associates edits to the correct region. We demonstrate the efficacy of our method on the shape-guided editing task, where the model must replace an object according to a text prompt and object mask. We curate a new ShapePrompts benchmark derived from MS-COCO and achieve SOTA results in shape faithfulness without a degradation in text alignment or image realism according to both automatic metrics and annotator ratings. Our data and code will be made available at https://shape-guided-diffusion.github.io.
Updated: 2024-04-01 17:19:02
Categories: cs.CV,cs.AI,cs.LG
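A simplified sketch of constraining a cross-attention map with an object silhouette: tokens describing the object are restricted to inside-mask patches and background tokens to outside-mask patches. The hard masking, renormalization, and token partition below are illustrative assumptions about the mechanism, shown for cross-attention only.

```python
import torch

def inside_outside_attention(attn, mask, object_tokens):
    # attn: (heads, P, T) attention from P image patches to T text tokens.
    # mask: (P,) bool, True where the patch lies inside the object silhouette.
    # object_tokens: (T,) bool, True for tokens describing the object.
    inside = mask.view(1, -1, 1) & object_tokens.view(1, 1, -1)
    outside = (~mask).view(1, -1, 1) & (~object_tokens).view(1, 1, -1)
    allowed = inside | outside                      # object<->inside, bg<->outside
    attn = attn.masked_fill(~allowed, 0.0)
    return attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)  # renormalize
```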
A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules
Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provable detection of LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical efficiency of watermarks and designing powerful detection rules. Inspired by the hypothesis testing formulation of watermark detection, our framework starts by selecting a pivotal statistic of the text and a secret key -- provided by the LLM to the verifier -- to enable controlling the false positive rate (the error of mistakenly detecting human-written text as LLM-generated). Next, this framework allows one to evaluate the power of watermark detection rules by obtaining a closed-form expression of the asymptotic false negative rate (the error of incorrectly classifying LLM-generated text as human-written). Our framework further reduces the problem of determining the optimal detection rule to solving a minimax optimization program. We apply this framework to two representative watermarks -- one of which has been internally implemented at OpenAI -- and obtain several findings that can be instrumental in guiding the practice of implementing watermarks. In particular, we derive optimal detection rules for these watermarks under our framework. These theoretically derived detection rules are demonstrated to be competitive and sometimes enjoy a higher power than existing detection approaches through numerical experiments.
Updated: 2024-04-01 17:03:41
Categories: math.ST,cs.CL,cs.CR,cs.LG,stat.ML,stat.TH
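To make the hypothesis-testing view concrete, the sketch below runs the generic detection test for a green-list watermark: under the null hypothesis of human-written text, each token lands in the green list with probability $\gamma$, so a one-sided test on the green-token count controls the false positive rate. This is the standard pivotal-statistic recipe, not the specific optimal rules derived in the paper.

```python
import math

def watermark_pvalue(tokens, green_list, gamma=0.5):
    # One-sided test of H0: "human-written", under which each token is green
    # with probability gamma. A small p-value is evidence of watermarking.
    n = len(tokens)
    g = sum(t in green_list for t in tokens)        # pivotal statistic
    z = (g - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)
    return 0.5 * math.erfc(z / math.sqrt(2.0))      # normal-approximation p-value
```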
SymTC: A Symbiotic Transformer-CNN Net for Instance Segmentation of Lumbar Spine MRI
Intervertebral disc disease, a prevalent ailment, frequently leads to intermittent or persistent low back pain, and the diagnosis and assessment of this disease rely on accurate measurement of vertebral bone and intervertebral disc geometries from lumbar MR images. Deep neural network (DNN) models may assist clinicians with more efficient image segmentation of individual instances (discs and vertebrae) of the lumbar spine in an automated way, which is termed instance image segmentation. In this work, we propose SymTC, an innovative lumbar spine MR image segmentation model that combines the strengths of the Transformer and the Convolutional Neural Network (CNN). Specifically, we design a parallel dual-path architecture to merge CNN layers and Transformer layers, and we integrate a novel position embedding into the self-attention module of the Transformer, enhancing the utilization of positional information for more accurate segmentation. To further improve model performance, we introduce a new data augmentation technique to create a synthetic yet realistic MR image dataset, named SSMSpine, which is made publicly available. We evaluate our SymTC and 15 other existing image segmentation models on our private in-house dataset and the public SSMSpine dataset, using two metrics, the Dice Similarity Coefficient and the 95% Hausdorff Distance. The results show that our SymTC has the best performance for segmenting vertebral bones and intervertebral discs in lumbar spine MR images. The SymTC code and SSMSpine dataset are available at https://github.com/jiasongchen/SymTC.
Updated: 2024-04-01 17:03:08
Categories: eess.IV,cs.CV,cs.LG
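For reference, the first of the two evaluation metrics has a one-line form; the sketch below computes the Dice Similarity Coefficient between a predicted and a ground-truth instance mask.

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-8):
    # pred, gt: boolean arrays of the same shape (one instance mask each).
    # Dice = 2|A ∩ B| / (|A| + |B|), from 0 (disjoint) to 1 (identical).
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
```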
Multimodal Representation Learning by Alternating Unimodal Adaptation
Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle when some modalities appear more dominant than others during multimodal learning, resulting in suboptimal performance. To address this challenge, we propose MLA (Multimodal Learning with Alternating Unimodal Adaptation). MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process, thereby minimizing interference between modalities. Simultaneously, it captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities. This optimization process is controlled by a gradient modification mechanism to prevent the shared head from losing previously acquired information. During the inference phase, MLA utilizes a test-time uncertainty-based model fusion mechanism to integrate multimodal information. Extensive experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities. These experiments demonstrate the superiority of MLA over competing prior approaches. Our code is available at https://github.com/Cecile-hi/Multimodal-Learning-with-Alternating-Unimodal-Adaptation.
Updated: 2024-04-01 16:56:13
Domains: cs.LG,cs.CV
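To make the alternating scheme concrete, here is a minimal PyTorch-style sketch (our illustration, not the authors' code; the module interfaces and the inverse-entropy fusion weights are simplifying assumptions):

    import torch

    def train_alternating(encoders, shared_head, loaders, optimizer, epochs):
        # Alternate over modalities: each step updates one unimodal encoder plus
        # the shared head, so modalities do not interfere within a step.
        ce = torch.nn.CrossEntropyLoss()
        for _ in range(epochs):
            for m, loader in enumerate(loaders):          # one loader per modality
                for x, y in loader:
                    loss = ce(shared_head(encoders[m](x)), y)
                    optimizer.zero_grad()
                    loss.backward()      # MLA additionally modifies this gradient to
                    optimizer.step()     # keep the shared head from forgetting

    @torch.no_grad()
    def fuse_predictions(encoders, shared_head, inputs):
        # Test-time fusion: weight each modality by its (inverse-entropy) certainty.
        probs = [torch.softmax(shared_head(enc(x)), -1) for enc, x in zip(encoders, inputs)]
        ent = torch.stack([-(p * p.clamp_min(1e-8).log()).sum(-1) for p in probs])
        w = torch.softmax(-ent, dim=0)                    # lower entropy -> higher weight
        return sum(w[i].unsqueeze(-1) * probs[i] for i in range(len(probs)))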
Optimal Ridge Regularization for Out-of-Distribution Prediction
We study the behavior of optimal ridge regularization and optimal ridge risk for out-of-distribution prediction, where the test distribution deviates arbitrarily from the train distribution. We establish general conditions that determine the sign of the optimal regularization level under covariate and regression shifts. These conditions capture the alignment between the covariance and signal structures in the train and test data and reveal stark differences compared to the in-distribution setting. For example, a negative regularization level can be optimal under covariate shift or regression shift, even when the training features are isotropic or the design is underparameterized. Furthermore, we prove that the optimally-tuned risk is monotonic in the data aspect ratio, even in the out-of-distribution setting and when optimizing over negative regularization levels. In general, our results do not make any modeling assumptions for the train or the test distributions, except for moment bounds, and allow for arbitrary shifts and the widest possible range of (negative) regularization levels.
Updated: 2024-04-01 16:51:19
Domains: math.ST,cs.LG,stat.ML,stat.TH
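A small numerical sketch of the phenomenon, assuming a toy Gaussian design and an anisotropic covariate shift (both our choices, not the paper's setting); note that the lambda grid deliberately includes negative values:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 200, 100
    beta = rng.normal(size=p) / np.sqrt(p)
    X = rng.normal(size=(n, p))                    # isotropic training covariates
    y = X @ beta + 0.5 * rng.normal(size=n)
    Xt = rng.normal(size=(2000, p)) * np.linspace(0.2, 3.0, p)   # shifted test design
    yt = Xt @ beta + 0.5 * rng.normal(size=2000)

    def ridge(lam):
        # lam may be negative as long as the regularized matrix stays invertible
        return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

    lams = np.linspace(-0.05, 2.0, 100)
    risks = [np.mean((yt - Xt @ ridge(l)) ** 2) for l in lams]
    print("optimal lambda:", lams[int(np.argmin(risks))])  # may come out negative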
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
It is commonplace to produce application-specific models by fine-tuning large pre-trained models using a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including the vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
Updated: 2024-04-01 16:50:54
Domains: cs.CR,cs.LG
Introducing an ensemble method for the early detection of Alzheimer's disease through the analysis of PET scan images
Alzheimer's disease is a progressive neurodegenerative disorder that primarily affects cognitive functions such as memory, thinking, and behavior. The disease has a critical phase, mild cognitive impairment (MCI), whose early diagnosis is important because some patients with progressive MCI will go on to develop the disease. This study addresses the challenging task of classifying Alzheimer's disease into four distinct groups: control normal (CN), progressive mild cognitive impairment (pMCI), stable mild cognitive impairment (sMCI), and Alzheimer's disease (AD). The classification is based on a thorough examination of PET scan images obtained from the ADNI dataset, which provides a detailed view of the disease's progression. Several deep-learning and traditional machine-learning models have been used to detect Alzheimer's disease. In this paper, three deep-learning models, namely VGG16, AlexNet, and a custom Convolutional Neural Network (CNN), are used for classification with 8-fold cross-validation. Finally, an ensemble technique is used to improve the overall result of these models. The results show that using deep-learning models to distinguish among MCI patients gives an overall average accuracy of 93.13% and an AUC of 94.4%.
Updated: 2024-04-01 16:37:08
Domains: eess.SP,cs.AI,cs.CV,cs.LG,eess.IV
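The ensemble step can be sketched as follows (softmax averaging is a common combination rule that we assume here; the abstract does not pin down the exact rule):

    import numpy as np

    def ensemble_predict(prob_list):
        # Average the per-class probabilities of several models (e.g., VGG16,
        # AlexNet, a custom CNN) and pick the most probable of the four classes
        # (CN, pMCI, sMCI, AD).
        avg = np.mean(np.stack(prob_list), axis=0)   # (n_models, n, 4) -> (n, 4)
        return avg.argmax(axis=1)

    # Hypothetical per-model probabilities for two scans:
    p1 = np.array([[0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1]])
    p2 = np.array([[0.6, 0.2, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1]])
    p3 = np.array([[0.8, 0.1, 0.05, 0.05], [0.3, 0.4, 0.2, 0.1]])
    print(ensemble_predict([p1, p2, p3]))            # -> [0 1], i.e., CN and pMCI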
Collaborative Pareto Set Learning in Multiple Multi-Objective Optimization Problems
Pareto Set Learning (PSL) is an emerging research area in multi-objective optimization, focusing on training neural networks to learn the mapping from preference vectors to Pareto optimal solutions. However, existing PSL methods are limited to addressing a single Multi-objective Optimization Problem (MOP) at a time. When faced with multiple MOPs, this limitation not only leads to significant inefficiencies but also fails to exploit the potential synergies across varying MOPs. In this paper, we propose a Collaborative Pareto Set Learning (CoPSL) framework, which simultaneously learns the Pareto sets of multiple MOPs in a collaborative manner. CoPSL employs an architecture consisting of shared and MOP-specific layers, where shared layers aim to capture common relationships among MOPs collaboratively, and MOP-specific layers process these relationships to generate solution sets for each MOP. This collaborative approach enables CoPSL to efficiently learn the Pareto sets of multiple MOPs in a single run while leveraging the relationships among various MOPs. To further understand these relationships, we experimentally demonstrate that there exist shareable representations among MOPs. Leveraging these collaboratively shared representations can effectively improve the capability to approximate Pareto sets. Extensive experiments underscore the superior efficiency and robustness of CoPSL in approximating Pareto sets compared to state-of-the-art approaches on a variety of synthetic and real-world MOPs. Code is available at https://github.com/ckshang/CoPSL.
Updated: 2024-04-01 16:31:06
Domains: cs.LG,math.OC
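A skeletal version of the shared-plus-specific architecture (layer sizes, activations, and the box-constrained output are illustrative assumptions):

    import torch.nn as nn

    class CoPSLNet(nn.Module):
        # Shared layers capture structure common to all MOPs; one head per MOP
        # maps a preference vector to a candidate Pareto-optimal solution.
        def __init__(self, n_objs, n_vars, n_mops, hidden=64):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(n_objs, hidden), nn.ReLU(),
                                        nn.Linear(hidden, hidden), nn.ReLU())
            self.heads = nn.ModuleList(
                [nn.Sequential(nn.Linear(hidden, n_vars), nn.Sigmoid())
                 for _ in range(n_mops)])

        def forward(self, pref):                 # pref: (batch, n_objs) preferences
            h = self.shared(pref)
            return [head(h) for head in self.heads]   # one solution set per MOP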
Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing
Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the physical properties of objects. We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics from vision-language foundation models that are grounded by natural language. Our first contribution is a way to distill high-quality, object-centric vision-language features into 3D Gaussians, which enables semi-automatic scene decomposition using text queries. Our second contribution is a way to synthesize physics-based dynamics from an otherwise static scene using a particle-based simulator, in which material properties are assigned automatically via text queries. We ablate key techniques used in this pipeline to illustrate the challenges and opportunities in using feature-carrying 3D Gaussians as a unified format for appearance, geometry, material properties, and semantics grounded in natural language. Project website: https://feature-splatting.github.io/
Updated: 2024-04-01 16:31:04
Domains: cs.CV,cs.AI,cs.GR,cs.LG
Entity-Centric Reinforcement Learning for Object Manipulation from Pixels
Manipulating objects is a hallmark of human intelligence, and an important task in domains such as robotics. In principle, Reinforcement Learning (RL) offers a general approach to learn object manipulation. In practice, however, domains with more than a few objects are difficult for RL agents due to the curse of dimensionality, especially when learning from raw image observations. In this work we propose a structured approach for visual RL that is suitable for representing multiple objects and their interaction, and use it to learn goal-conditioned manipulation of several objects. Key to our method is the ability to handle goals with dependencies between the objects (e.g., moving objects in a certain order). We further relate our architecture to the generalization capability of the trained agent, based on a theoretical result for compositional generalization, and demonstrate agents that learn with 3 objects but generalize to similar tasks with over 10 objects. Videos and code are available on the project website: https://sites.google.com/view/entity-centric-rl
Updated: 2024-04-01 16:25:08
Domains: cs.RO,cs.CV,cs.LG
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph
Cognitive research indicates that abstraction ability is essential in human intelligence, yet it remains under-explored in language models. In this paper, we present AbsPyramid, a unified entailment graph of 221K textual descriptions of abstraction knowledge. While existing resources only touch nouns or verbs within simplified events or specific domains, AbsPyramid collects abstract knowledge for three components of diverse events to comprehensively evaluate the abstraction ability of language models in the open domain. Experimental results demonstrate that current LLMs face challenges comprehending abstraction knowledge in zero-shot and few-shot settings. By training on our rich abstraction knowledge, we find LLMs can acquire basic abstraction abilities and generalize to unseen events. In the meantime, we empirically show that our benchmark is comprehensive enough to enhance LLMs across two previous abstraction tasks.
Updated: 2024-04-01 16:24:24
Domains: cs.CL,cs.AI
Towards System Modelling to Support Diseases Data Extraction from the Electronic Health Records for Physicians Research Activities
The use of Electronic Health Records (EHRs) has increased dramatically in the past 15 years, as they are considered an important source for managing patient data. EHRs are primary sources of disease diagnoses and demographic data of patients worldwide, so the data can be utilized for secondary tasks such as research. This paper aims to make such data usable for research activities such as monitoring disease statistics for a specific population, so that researchers can relate disease causes to the behavior and lifestyle of the target group. One limitation of EHR systems is that the data is not available in a standard format but in various forms. It is therefore required to first convert the disease names and demographic data into one standardized form to make them usable for research activities. A large amount of EHR data is available, and solving these standardization issues requires optimized techniques. We used a first-hand EHR dataset extracted from EHR systems. Our application uploads the dataset from the EHRs and converts it to the ICD-10 coding system to solve the standardization problem. We first apply pre-processing, annotation, and transformation steps to convert the data into the standard form. Pre-processing normalizes the demographic formats. In the annotation step, a machine learning model recognizes diseases from the text. The transformation step then converts each disease name to the ICD-10 coding format. The model was evaluated manually by comparing its disease-recognition performance with an available dictionary-based system (MetaMap). The accuracy of the proposed machine learning model is 81%, which outperforms MetaMap's accuracy of 67%. This paper contributes system modelling for EHR data extraction to support research activities.
Updated: 2024-04-01 16:18:40
Domains: cs.LG,cs.IR
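The three-step pipeline can be sketched as below (the toy dictionary and the regex recognizer are placeholders for the trained ML model and the full ICD-10 mapping):

    import re

    ICD10 = {"type 2 diabetes": "E11", "hypertension": "I10"}   # toy excerpt

    def normalize_demographics(record):
        # Pre-processing: normalize demographic formats (e.g., sex field variants).
        sex = record.get("sex", "").strip().lower()
        record["sex"] = {"m": "male", "f": "female"}.get(sex, sex)
        return record

    def annotate_diseases(text, recognizer):
        # Annotation: `recognizer` stands in for the trained model that
        # extracts disease mentions from free text.
        return recognizer(text)

    def transform_to_icd10(mentions):
        # Transformation: map each recognized disease name to its ICD-10 code.
        return [ICD10[m.lower()] for m in mentions if m.lower() in ICD10]

    toy = lambda t: [m for m in ICD10 if re.search(m, t, re.I)]  # stand-in recognizer
    rec = normalize_demographics({"sex": "F", "note": "History of Type 2 Diabetes."})
    print(transform_to_icd10(annotate_diseases(rec["note"], toy)))   # -> ['E11']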
Incorporating Domain Differential Equations into Graph Convolutional Networks to Lower Generalization Discrepancy
Ensuring both accuracy and robustness in time series prediction is critical to many applications, ranging from urban planning to pandemic management. With sufficient training data where all spatiotemporal patterns are well-represented, existing deep-learning models can make reasonably accurate predictions. However, existing methods fail when the training data are drawn from different circumstances (e.g., traffic patterns on regular days) compared to test data (e.g., traffic patterns after a natural disaster). Such challenges are usually classified under domain generalization. In this work, we show that one way to address this challenge in the context of spatiotemporal prediction is by incorporating domain differential equations into Graph Convolutional Networks (GCNs). We theoretically derive conditions where GCNs incorporating such domain differential equations are robust to mismatched training and testing data compared to baseline domain agnostic models. To support our theory, we propose two domain-differential-equation-informed networks called Reaction-Diffusion Graph Convolutional Network (RDGCN), which incorporates differential equations for traffic speed evolution, and Susceptible-Infectious-Recovered Graph Convolutional Network (SIRGCN), which incorporates a disease propagation model. Both RDGCN and SIRGCN are based on reliable and interpretable domain differential equations that allow the models to generalize to unseen patterns. We experimentally show that RDGCN and SIRGCN are more robust with mismatched testing data than the state-of-the-art deep learning methods.
Updated: 2024-04-01 16:17:11
Domains: cs.LG,cs.AI
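One way to read the reaction-diffusion idea as code (the explicit-Euler discretization and the learned reaction term are our assumptions; the paper derives the exact operators from the domain differential equations):

    import torch
    import torch.nn as nn

    class ReactionDiffusionLayer(nn.Module):
        # One explicit-Euler step of  dx/dt = -alpha * L x + r(x),  where L is the
        # graph Laplacian (diffusion along edges) and r is a learned reaction term.
        def __init__(self, dim, dt=0.1):
            super().__init__()
            self.alpha = nn.Parameter(torch.tensor(1.0))
            self.react = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
            self.dt = dt

        def forward(self, x, laplacian):          # x: (n_nodes, dim)
            diffusion = -self.alpha * (laplacian @ x)
            return x + self.dt * (diffusion + self.react(x))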
Novel Node Category Detection Under Subpopulation Shift
In real-world graph data, distribution shifts can manifest in various ways, such as the emergence of new categories and changes in the relative proportions of existing categories. It is often important to detect nodes of novel categories under such distribution shifts for safety or insight discovery purposes. We introduce a new approach, Recall-Constrained Optimization with Selective Link Prediction (RECO-SLIP), to detect nodes belonging to novel categories in attributed graphs under subpopulation shifts. By integrating a recall-constrained learning framework with a sample-efficient link prediction mechanism, RECO-SLIP addresses the dual challenges of resilience against subpopulation shifts and the effective exploitation of graph structure. Our extensive empirical evaluation across multiple graph datasets demonstrates the superior performance of RECO-SLIP over existing methods.
Updated: 2024-04-01 16:16:19
Domains: cs.LG,cs.SI,stat.ML
Causal Bayesian Optimization via Exogenous Distribution Learning
Maximizing a target variable as an operational objective in a structured causal model is an important problem. Existing Causal Bayesian Optimization (CBO) methods either rely on hard interventions that alter the causal structure to maximize the reward; or introduce action nodes to endogenous variables so that the data generation mechanisms are adjusted to achieve the objective. In this paper, a novel method is introduced to learn the distribution of exogenous variables, which is typically ignored or marginalized through expectation by existing methods. Exogenous distribution learning improves the approximation accuracy of structured causal models in a surrogate model that is usually trained with limited observational data. Moreover, the learned exogenous distribution extends existing CBO to general causal schemes beyond Additive Noise Models (ANM). The recovery of exogenous variables allows us to use a more flexible prior for noise or unobserved hidden variables. A new CBO method is developed by leveraging the learned exogenous distribution. Experiments on different datasets and applications show the benefits of our proposed method.
Updated: 2024-04-01 16:13:23
Domains: cs.LG,stat.ML
DeepEdit: Knowledge Editing as Decoding with Constraints
We propose a new perspective of knowledge editing (KE) for large language models (LLMs) that treats it as a constrained decoding problem. We design decoding constraints to regulate LLMs, ensuring coherence between reasoning steps when incorporating new knowledge. To enforce these constraints, we utilize a depth-first search to adaptively substitute new knowledge for the LLMs' original reasoning steps, greedily seeking the optimal path of multi-hop reasoning with new knowledge. From this vantage, we propose DEEPEDIT: Depth-first Search-based Decoding for Knowledge Editing. DEEPEDIT improves the KE of LLMs by enhancing the conciseness, coherence, pertinence, and receptiveness of reasoning with new knowledge. DEEPEDIT is flexibly applicable to any black-box LLM without requiring access to model parameters or token-wise distributions. In addition to DEEPEDIT, we propose two new KE benchmarks: MQuAKE-2002 and MQuAKE-hard, which are designed to provide more precise and challenging assessments of KE approaches. Qualitatively, DEEPEDIT enables LLMs to produce more succinct reasoning outputs in accordance with new knowledge. Quantitatively, it yields significant improvements on multiple KE benchmarks.
Updated: 2024-04-01 16:12:50
Domains: cs.CL,cs.AI
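The depth-first substitution loop might look like the following sketch (`propose_steps` and `constraints` stand in for the LLM call and the coherence/pertinence checks; all names here are ours):

    def deepedit_dfs(path, propose_steps, constraints, depth, max_depth):
        # Depth-first search over reasoning steps: candidate next steps (possibly
        # substituting new knowledge) are kept only if every decoding constraint
        # holds; the first constraint-satisfying complete path is returned.
        if depth == max_depth:
            return path
        for step in propose_steps(path):               # best candidates first (greedy)
            if not all(check(path + [step]) for check in constraints):
                continue                               # prune incoherent branches early
            found = deepedit_dfs(path + [step], propose_steps,
                                 constraints, depth + 1, max_depth)
            if found is not None:
                return found
        return None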
The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions
Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyper-parameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging.
Updated: 2024-04-01 16:11:58
Domains: eess.IV,cs.CV,cs.LG,q-bio.QM
Machine Unlearning for Traditional Models and Large Language Models: A Short Survey
With the implementation of personal data privacy regulations, the field of machine learning (ML) faces the challenge of the "right to be forgotten". Machine unlearning has emerged to address this issue, aiming to delete data and reduce its impact on models according to user requests. Despite the widespread interest in machine unlearning, comprehensive surveys of its latest advancements, especially in the field of Large Language Models (LLMs), are lacking. This survey aims to fill this gap by providing an in-depth exploration of machine unlearning, including the definition, classification and evaluation criteria, as well as challenges in different environments and their solutions. Specifically, this paper categorizes and investigates unlearning on both traditional models and LLMs, and proposes methods for evaluating the effectiveness and efficiency of unlearning, and standards for performance measurement. This paper reveals the limitations of current unlearning techniques and emphasizes the importance of a comprehensive unlearning evaluation to avoid arbitrary forgetting. This survey not only summarizes the key concepts of unlearning technology but also points out its prominent issues and feasible directions for future research, providing valuable guidance for scholars in the field.
Updated: 2024-04-01 16:08:18
Domains: cs.LG,cs.CR
The twin peaks of learning neural networks
Recent works demonstrated the existence of a double-descent phenomenon for the generalization error of neural networks, where highly overparameterized models escape overfitting and achieve good test performance, at odds with the standard bias-variance trade-off described by statistical learning theory. In the present work, we explore a link between this phenomenon and the increase of complexity and sensitivity of the function represented by neural networks. In particular, we study the Boolean mean dimension (BMD), a metric developed in the context of Boolean function analysis. Focusing on a simple teacher-student setting for the random feature model, we derive a theoretical analysis based on the replica method that yields an interpretable expression for the BMD, in the high dimensional regime where the number of data points, the number of features, and the input size grow to infinity. We find that, as the degree of overparameterization of the network is increased, the BMD reaches an evident peak at the interpolation threshold, in correspondence with the generalization error peak, and then slowly approaches a low asymptotic value. The same phenomenology is then traced in numerical experiments with different model classes and training setups. Moreover, we find empirically that adversarially initialized models tend to show higher BMD values, and that models that are more robust to adversarial attacks exhibit a lower BMD.
Updated: 2024-04-01 16:07:28
Domains: cs.LG,cond-mat.dis-nn,math.PR,math.ST,stat.TH
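Under the standard Boolean-function definitions, the mean dimension is the total influence divided by the variance, which admits a simple Monte Carlo estimator (a generic sketch, not the paper's replica-method analysis):

    import numpy as np

    def boolean_mean_dimension(f, n, n_samples=2000, seed=0):
        # Estimate MD(f) = sum_i Inf_i(f) / Var(f) for f: {-1,1}^n -> R,
        # with Inf_i(f) = E[((f(x) - f(x with bit i flipped)) / 2)^2].
        rng = np.random.default_rng(seed)
        X = rng.choice([-1.0, 1.0], size=(n_samples, n))
        fx = f(X)
        total_inf = 0.0
        for i in range(n):
            Xf = X.copy()
            Xf[:, i] *= -1
            total_inf += np.mean(((fx - f(Xf)) / 2) ** 2)
        return total_inf / fx.var()

    # Sanity check: a parity over k coordinates has mean dimension exactly k.
    print(boolean_mean_dimension(lambda X: X[:, :3].prod(axis=1), n=10))   # ~3.0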
Foundations of Cyber Resilience: The Confluence of Game, Control, and Learning Theories
Cyber resilience is a complementary concept to cybersecurity, focusing on the preparation, response, and recovery from cyber threats that are challenging to prevent. Organizations increasingly face such threats in an evolving cyber threat landscape. Understanding and establishing foundations for cyber resilience provide a quantitative and systematic approach to cyber risk assessment, mitigation policy evaluation, and risk-informed defense design. A systems-scientific view toward cyber risks provides holistic and system-level solutions. This chapter starts with a systemic view toward cyber risks and presents the confluence of game theory, control theory, and learning theories, which are three major pillars for the design of cyber resilience mechanisms to counteract increasingly sophisticated and evolving threats in our networks and organizations. Game and control theoretic methods provide a set of modeling frameworks to capture the strategic and dynamic interactions between defenders and attackers. Control and learning frameworks together provide a feedback-driven mechanism that enables autonomous and adaptive responses to threats. Game and learning frameworks offer a data-driven approach to proactively reason about adversarial behaviors and resilient strategies. The confluence of the three lays the theoretical foundations for the analysis and design of cyber resilience. This chapter presents various theoretical paradigms, including dynamic asymmetric games, moving horizon control, conjectural learning, and meta-learning, as recent advances at the intersection. This chapter concludes with future directions and discussions of the role of neurosymbolic learning and the synergy between foundation models and game models in cyber resilience.
Updated: 2024-04-01 16:02:21
Domains: eess.SY,cs.CR,cs.GT,cs.SY
Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization
Distributionally robust optimization (DRO) is a powerful framework for training robust models against data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions, and exclude the practical and challenging case with non-convex loss functions, e.g., neural networks. This paper develops a stochastic algorithm and its performance analysis for non-convex constrained DRO. The computational complexity of our stochastic algorithm at each iteration is independent of the overall dataset size, and thus is suitable for large-scale applications. We focus on the general Cressie-Read family divergence defined uncertainty set which includes $\chi^2$-divergences as a special case. We prove that our algorithm finds an $\epsilon$-stationary point with a computational complexity of $\mathcal O(\epsilon^{-3k_*-5})$, where $k_*$ is the parameter of the Cressie-Read divergence. The numerical results indicate that our method outperforms existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO.
Updated: 2024-04-01 15:56:58
Domains: stat.ML,cs.LG
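For the CVaR special case mentioned at the end, the per-batch objective takes the familiar Rockafellar-Uryasev form (a generic sketch; the paper's algorithm for general Cressie-Read divergences is more involved):

    import torch

    def cvar_dro_loss(losses, eta, alpha=0.1):
        # CVaR_alpha = min_eta  eta + E[(loss - eta)_+] / alpha.
        # `eta` is an extra learnable scalar (e.g., an nn.Parameter) optimized
        # jointly with the model; the per-batch estimate keeps each iteration's
        # cost independent of the overall dataset size, as claimed above.
        return eta + torch.relu(losses - eta).mean() / alpha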
Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem
We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem. An instance of this problem has $k$ arms, each of whose reward function is a concave and increasing function of the number of times that arm has been pulled so far. We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward. We then provide a randomized online algorithm that guarantees an $O(\sqrt{k})$ approximation factor, if it is told the maximum reward achievable by the optimal arm in advance. We then show how to remove this assumption at the cost of an extra $O(\log k)$ approximation factor, achieving an overall $O(\sqrt{k} \log k)$ approximation relative to optimal.
Updated: 2024-04-01 15:55:45
Domains: cs.LG,cs.DS,stat.ML
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
Learning world models can teach an agent how the world works in an unsupervised manner. Even though it can be viewed as a special case of sequence modeling, progress for scaling world models on robotic applications such as autonomous driving has been somewhat less rapid than scaling language models with Generative Pre-trained Transformers (GPT). We identify two reasons as major bottlenecks: dealing with complex and unstructured observation space, and having a scalable generative model. Consequently, we propose Copilot4D, a novel world modeling approach that first tokenizes sensor observations with VQVAE, then predicts the future via discrete diffusion. To efficiently decode and denoise tokens in parallel, we recast Masked Generative Image Transformer as discrete diffusion and enhance it with a few simple changes, resulting in notable improvement. When applied to learning world models on point cloud observations, Copilot4D reduces prior SOTA Chamfer distance by more than 65% for 1s prediction, and more than 50% for 3s prediction, across NuScenes, KITTI Odometry, and Argoverse2 datasets. Our results demonstrate that discrete diffusion on tokenized agent experience can unlock the power of GPT-like unsupervised learning for robotics.
Updated: 2024-04-01 15:41:50
Domains: cs.CV,cs.AI,cs.LG,cs.RO
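A MaskGIT-style parallel denoising loop, which the abstract builds on, can be sketched as follows (the confidence-based unmasking schedule and the model interface are simplifying assumptions):

    import math
    import torch

    @torch.no_grad()
    def discrete_diffusion_decode(model, tokens, mask_id, steps=8):
        # Start from (partially) masked future tokens; at each step keep the most
        # confident predictions and leave the rest masked for the next pass.
        # `model` maps a 1-D tensor of token ids to per-position logits.
        n = tokens.numel()
        for t in range(steps):
            probs = torch.softmax(model(tokens), dim=-1)      # (n, vocab)
            conf, pred = probs.max(dim=-1)
            decoded = tokens != mask_id
            conf[decoded] = float("inf")                      # kept tokens stay kept
            keep = max(1, int(n * (1 - math.cos(math.pi / 2 * (t + 1) / steps))))
            idx = conf.topk(keep).indices                     # cosine unmasking schedule
            tokens = tokens.clone()
            tokens[idx] = torch.where(decoded[idx], tokens[idx], pred[idx])
        return tokens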
Efficient Motion Planning for Manipulators with Control Barrier Function-Induced Neural Controller
Sampling-based motion planning methods for manipulators in crowded environments often suffer from expensive collision checking and high sampling complexity, which make them difficult to use in real time. To address this issue, we propose a new generalizable control barrier function (CBF)-based steering controller to reduce the number of samples needed in a sampling-based motion planner RRT. Our method combines the strength of CBF for real-time collision-avoidance control and RRT for long-horizon motion planning, by using a CBF-induced neural controller (CBF-INC) to generate control signals that steer the system towards configurations sampled by RRT. CBF-INC is learned as a neural network and has two variants handling different inputs: state (signed-distance) input and point-cloud input from LiDAR. In the latter case, we also study two different settings: fully and partially observed environmental information. Compared to a manually crafted CBF, which suffers from over-approximating the robot geometry, CBF-INC can better balance safety and goal-reaching without being over-conservative. Given state-based input, our neural CBF-induced neural controller-enhanced RRT (CBF-INC-RRT) can increase the success rate by 14% while reducing the number of nodes explored by 30%, compared with vanilla RRT on hard test cases. Given LiDAR input, where vanilla RRT is not directly applicable, we demonstrate that our CBF-INC-RRT can improve the success rate by 10%, compared with planning with other steering controllers. Our project page with supplementary material is at https://mit-realm.github.io/CBF-INC-RRT-website/.
Updated: 2024-04-01 15:36:39
Domains: cs.RO,cs.LG
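The CBF mechanism itself reduces, for a single-integrator toy system with one barrier, to a closed-form safety filter (the hand-crafted h below is for illustration; CBF-INC replaces it with a learned neural barrier):

    import numpy as np

    def cbf_safety_filter(u_nom, x, obstacle, radius, alpha=1.0):
        # Single integrator x_dot = u with barrier h(x) = ||x - obstacle||^2 - r^2.
        # Safety requires grad_h . u + alpha * h >= 0; for one affine constraint,
        # the QP "closest safe u" is a half-space projection in closed form.
        grad_h = 2.0 * (x - obstacle)
        h = np.dot(x - obstacle, x - obstacle) - radius ** 2
        violation = grad_h @ u_nom + alpha * h
        if violation >= 0:
            return u_nom                              # nominal input already safe
        return u_nom - violation * grad_h / (grad_h @ grad_h)

    u = cbf_safety_filter(np.array([1.0, 0.0]), x=np.array([0.0, 0.1]),
                          obstacle=np.array([1.0, 0.0]), radius=0.5)
    print(u)    # the commanded velocity is deflected away from the obstacle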
BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning
Data mixing methods play a crucial role in semi-supervised learning (SSL), but their application is unexplored in long-tailed semi-supervised learning (LTSSL). The primary reason is that the in-batch mixing manner fails to address class imbalance. Furthermore, existing LTSSL methods mainly focus on re-balancing data quantity but ignore class-wise uncertainty, which is also vital for class balance. For instance, some classes with sufficient samples might still exhibit high uncertainty due to indistinguishable features. To this end, this paper introduces the Balanced and Entropy-based Mix (BEM), a pioneering mixing approach to re-balance the class distribution of both data quantity and uncertainty. Specifically, we first propose a class balanced mix bank to store data of each class for mixing. This bank samples data based on the estimated quantity distribution, thus re-balancing data quantity. Then, we present an entropy-based learning approach to re-balance class-wise uncertainty, including entropy-based sampling strategy, entropy-based selection module, and entropy-based class balanced loss. Our BEM first leverages data mixing for improving LTSSL, and it can also serve as a complement to the existing re-balancing methods. Experimental results show that BEM significantly enhances various LTSSL frameworks and achieves state-of-the-art performances across multiple benchmarks.
Updated: 2024-04-01 15:31:04
Domains: cs.CV,cs.LG
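The quantity-and-uncertainty re-balancing can be sketched as a class-sampling rule (the multiplicative combination and the temperature are illustrative assumptions, not the paper's exact formulas):

    import numpy as np

    def bem_sampling_probs(class_counts, class_entropy, tau=1.0):
        # Sample classes for the mix bank with probability that grows both with
        # rarity (inverse frequency) and with class-wise uncertainty (mean
        # prediction entropy), so well-represented but confusable classes are
        # still mixed often.
        inv_freq = 1.0 / np.asarray(class_counts, dtype=float)
        score = inv_freq * np.exp(np.asarray(class_entropy, dtype=float) / tau)
        return score / score.sum()

    # Head class (low entropy), tail class, head class with high entropy:
    print(bem_sampling_probs([1000, 50, 1000], [0.2, 1.5, 1.4]))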
Poisoning Decentralized Collaborative Recommender System and Its Countermeasures
To make room for privacy and efficiency, the deployment of many recommender systems is experiencing a shift from central servers to personal devices, where the federated recommender systems (FedRecs) and decentralized collaborative recommender systems (DecRecs) are arguably the two most representative paradigms. While both leverage knowledge (e.g., gradients) sharing to facilitate learning local models, FedRecs rely on a central server to coordinate the optimization process, yet in DecRecs, the knowledge sharing directly happens between clients. Knowledge sharing also opens a backdoor for model poisoning attacks, where adversaries disguise themselves as benign clients and disseminate polluted knowledge to achieve malicious goals like promoting an item's exposure rate. Although research on such poisoning attacks provides valuable insights into finding security loopholes and corresponding countermeasures, existing attacks mostly focus on FedRecs, and are either inapplicable or ineffective for DecRecs. Compared with FedRecs where the tampered information can be universally distributed to all clients once uploaded to the cloud, each adversary in DecRecs can only communicate with neighbor clients of a small size, confining its impact to a limited range. To fill the gap, we present a novel attack method named Poisoning with Adaptive Malicious Neighbors (PAMN). With item promotion in top-K recommendation as the attack objective, PAMN effectively boosts target items' ranks with several adversaries that emulate benign clients and transfers adaptively crafted gradients conditioned on each adversary's neighbors. Moreover, with the vulnerabilities of DecRecs uncovered, a dedicated defensive mechanism based on user-level gradient clipping with sparsified updating is proposed. Extensive experiments demonstrate the effectiveness of the poisoning attack and the robustness of our defensive mechanism.
Updated: 2024-04-01 15:30:02
Domains: cs.CR,cs.IR
Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction
Developing a generalist agent is a longstanding objective in artificial intelligence. Previous efforts utilizing extensive offline datasets from various tasks demonstrate remarkable performance in multitasking scenarios within Reinforcement Learning. However, these works encounter challenges in extending their capabilities to new tasks. Recent approaches integrate textual guidance or visual trajectory into decision networks to provide task-specific contextual cues, representing a promising direction. However, it is observed that relying solely on textual guidance or visual trajectory is insufficient for accurately conveying the contextual information of tasks. This paper explores enhanced forms of task guidance for agents, enabling them to comprehend gameplay instructions, thereby facilitating a "read-to-play" capability. Drawing inspiration from the success of multimodal instruction tuning in visual tasks, we treat the visual-based RL task as a long-horizon vision task and construct a set of multimodal game instructions to incorporate instruction tuning into a decision transformer. Experimental results demonstrate that incorporating multimodal game instructions significantly enhances the decision transformer's multitasking and generalization capabilities.
Updated: 2024-04-01 15:18:57
Domains: cs.AI,cs.LG
CGCL: Collaborative Graph Contrastive Learning without Handcrafted Graph Data Augmentations
Unsupervised graph representation learning is a non-trivial topic. The success of contrastive methods in unsupervised representation learning on structured data inspires similar attempts on graphs. Existing graph contrastive learning (GCL) aims to learn the invariance across multiple augmentation views, which renders it heavily reliant on handcrafted graph augmentations. However, inappropriate graph data augmentations can potentially jeopardize such invariance. In this paper, we show the potential hazards of inappropriate augmentations and then propose a novel Collaborative Graph Contrastive Learning framework (CGCL). This framework harnesses multiple graph encoders to observe the graph. Features observed from different encoders serve as the contrastive views in contrastive learning, which avoids inducing unstable perturbations and guarantees invariance. To ensure the collaboration among diverse graph encoders, we propose the concepts of asymmetric architecture and complementary encoders as the design principles. To further prove the rationality, we utilize two quantitative metrics to measure the assembly of CGCL. Extensive experiments demonstrate the advantages of CGCL in unsupervised graph-level representation learning and the potential of the collaborative framework. The source code for reproducibility is available at https://github.com/zhangtia16/CGCL
Updated: 2024-04-01 15:14:06
Domains: cs.LG,cs.AI
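The core idea, contrasting two encoders' views of the same graph rather than two handcrafted augmentations, can be sketched with a standard NT-Xent loss (a generic formulation; the asymmetric-architecture details are omitted):

    import torch
    import torch.nn.functional as F

    def cross_encoder_contrastive_loss(z1, z2, temperature=0.5):
        # z1, z2: (batch, dim) graph-level embeddings of the SAME graphs produced
        # by two different encoders; positives are the diagonal pairs.
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature
        labels = torch.arange(z1.size(0), device=z1.device)
        # Symmetrize so both encoders receive a gradient signal.
        return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2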
Capturing Shock Waves by Relaxation Neural Networks
In this paper, we put forward a neural network framework to solve nonlinear hyperbolic systems. This framework, named relaxation neural networks (RelaxNN), is a simple and scalable extension of physics-informed neural networks (PINN). We show that a typical PINN framework struggles to handle the shock waves that arise in the solutions of hyperbolic systems, which ultimately causes the gradient-descent-based optimization to fail during training. Relaxation systems provide a smooth asymptotic approximation to the discontinuous solution, under the expectation that macroscopic problems can be solved from a microscopic perspective. Based on relaxation systems, the RelaxNN framework alleviates the conflict of losses in the training process of the PINN framework. In addition to the remarkable results demonstrated in numerical simulations, most of the acceleration techniques and improvement strategies aimed at the standard PINN framework can also be applied to the RelaxNN framework.
Updated: 2024-04-01 15:13:46
Domains: math.NA,cs.AI,cs.NA,76L05, 35D99, 68T07, 65D15
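For a scalar conservation law such as Burgers' equation, a relaxation-system residual might look like this sketch (the Jin-Xin-style system, the constant a, and eps are illustrative choices; a must dominate f'(u)^2 for stability):

    import torch

    def relaxnn_residual(net, x, t, eps=1e-2, a=4.0):
        # Burgers' equation u_t + (u^2/2)_x = 0 replaced by the relaxation system
        #   u_t + v_x = 0,   v_t + a * u_x = (f(u) - v) / eps,  f(u) = u^2 / 2.
        # `net` is assumed to predict both fields (u, v) at (x, t).
        x.requires_grad_(True)
        t.requires_grad_(True)
        u, v = net(x, t)
        grad = lambda y, s: torch.autograd.grad(y.sum(), s, create_graph=True)[0]
        r1 = grad(u, t) + grad(v, x)
        r2 = grad(v, t) + a * grad(u, x) - (0.5 * u ** 2 - v) / eps
        return (r1 ** 2).mean() + (r2 ** 2).mean()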
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. Especially, the expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, particularly over hardware platforms constrained by computational capabilities. Parameter Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adapting the large models to the various downstream tasks. In particular, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the supporting system platform design. In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to the algorithmic perspective, we overview various real-world system designs to investigate the implementation costs associated with different PEFT algorithms. This survey serves as an indispensable resource for researchers aiming to understand both the PEFT algorithms and their system implementations, offering detailed insights into recent advancements and practical applications.
Updated: 2024-04-01 15:11:36
Domains: cs.LG
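As one representative PEFT method in the survey's scope, LoRA freezes the pretrained weight and trains only a low-rank update (a minimal sketch):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # y = W x + (alpha/r) * B A x, with W frozen; only r*(d_in + d_out)
        # parameters are trained instead of d_in * d_out.
        def __init__(self, base: nn.Linear, r=8, alpha=16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                   # frozen pretrained weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init:
            self.scale = alpha / r                        # the update starts at zero

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())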
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
The advancements in Large Language Models (LLMs) have been hindered by their substantial sizes, which necessitate LLM compression methods for practical deployment. Singular Value Decomposition (SVD) offers a promising solution for LLM compression. However, state-of-the-art SVD-based LLM compression methods have two key limitations: truncating smaller singular values may lead to higher compression loss, and the lack of update on the compressed weight after SVD truncation. In this work, we propose SVD-LLM, a new SVD-based LLM compression method that addresses the limitations of existing methods. SVD-LLM incorporates a truncation-aware data whitening strategy to ensure a direct mapping between singular values and compression loss. Moreover, SVD-LLM adopts a layer-wise closed-form model parameter update strategy to compensate for accuracy degradation under high compression ratios. We evaluate SVD-LLM on a total of 10 datasets and eight models from three different LLM families at four different scales. Our results demonstrate the superiority of SVD-LLM over state-of-the-arts, especially at high model compression ratios.
Updated: 2024-04-01 15:04:15
Domains: cs.CL,cs.LG
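The whitening-plus-truncation idea can be sketched as follows (the Cholesky-based whitening and the ridge term reflect our reading of the approach; consult the paper for the exact construction and the closed-form parameter update):

    import torch

    def whitened_truncated_factorization(W, X, rank):
        # Whiten W with the Cholesky factor S of the calibration-input second
        # moment, so each truncated singular value maps directly to output
        # reconstruction loss; then truncate the SVD and fold S back in.
        # W: (d_out, d_in), X: (n, d_in) calibration activations.
        S = torch.linalg.cholesky(X.t() @ X / X.shape[0]
                                  + 1e-5 * torch.eye(X.shape[1]))
        U, sigma, Vh = torch.linalg.svd(W @ S, full_matrices=False)
        W1 = U[:, :rank] * sigma[:rank]                    # (d_out, rank)
        W2 = torch.linalg.solve(S.t(), Vh[:rank].t()).t()  # (rank, d_in)
        return W1, W2                                      # W ~= W1 @ W2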
Differentially Private Next-Token Prediction of Large Language Models
Ensuring the privacy of Large Language Models (LLMs) is becoming increasingly important. The most widely adopted technique to accomplish this is DP-SGD, which trains a model to guarantee Differential Privacy (DP). However, DP-SGD overestimates an adversary's capabilities in having white-box access to the model and, as a result, causes longer training times and larger memory usage than SGD. On the other hand, commercial LLM deployments are predominantly cloud-based; hence, adversarial access to LLMs is black-box. Motivated by these observations, we present Private Mixing of Ensemble Distributions (PMixED): a private prediction protocol for next-token prediction that utilizes the inherent stochasticity of next-token sampling and a public model to achieve Differential Privacy. We formalize this by introducing RD-mollifiers which project each of the model's output distributions from an ensemble of fine-tuned LLMs onto a set around a public LLM's output distribution, then average the projected distributions and sample from it. Unlike DP-SGD, which needs to consider the model architecture during training, PMixED is model agnostic, which makes PMixED a very appealing solution for current deployments. Our results show that PMixED achieves a stronger privacy guarantee than sample-level privacy and outperforms DP-SGD for privacy $\epsilon = 8$ on large-scale datasets. Thus, PMixED offers a practical alternative to DP training methods for achieving strong generative utility without compromising privacy.
Updated: 2024-04-01 15:02:32
Domains: cs.CR,cs.CL,cs.LG
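A heavily simplified sketch of the prediction step (the linear mixing below stands in for the paper's RD-mollifier projection; lam and the plain averaging are illustrative):

    import numpy as np

    def pmixed_next_token(private_dists, public_dist, lam, rng):
        # Pull each fine-tuned model's next-token distribution toward the public
        # model's, average the projected distributions, and sample one token;
        # the sampling noise itself contributes to the privacy guarantee.
        projected = [lam * p + (1 - lam) * public_dist for p in private_dists]
        avg = np.mean(projected, axis=0)
        return rng.choice(len(avg), p=avg / avg.sum())

    rng = np.random.default_rng(0)
    pub = np.array([0.5, 0.3, 0.2])
    priv = [np.array([0.8, 0.1, 0.1]), np.array([0.6, 0.3, 0.1])]
    print(pmixed_next_token(priv, pub, lam=0.3, rng=rng))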
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
Vision-language models (VLMs) have made significant strides in cross-modal understanding through large-scale paired datasets. However, in fashion domain, datasets often exhibit a disparity between the information conveyed in image and text. This issue stems from datasets containing multiple images of a single fashion item all paired with one text, leading to cases where some textual details are not visible in individual images. This mismatch, particularly when non-co-occurring elements are masked, undermines the training of conventional VLM objectives like Masked Language Modeling and Masked Image Modeling, thereby hindering the model's ability to accurately align fine-grained visual and textual features. Addressing this problem, we propose Synchronized attentional Masking (SyncMask), which generate masks that pinpoint the image patches and word tokens where the information co-occur in both image and text. This synchronization is accomplished by harnessing cross-attentional features obtained from a momentum model, ensuring a precise alignment between the two modalities. Additionally, we enhance grouped batch sampling with semi-hard negatives, effectively mitigating false negative issues in Image-Text Matching and Image-Text Contrastive learning objectives within fashion datasets. Our experiments demonstrate the effectiveness of the proposed approach, outperforming existing methods in three downstream tasks.
Updated: 2024-04-01 15:01:38
Domains: cs.CV,cs.AI
Uncovering the Text Embedding in Text-to-Image Diffusion Models
The correspondence between input text and the generated image is opaque: minor textual modifications can induce substantial deviations in the generated image. Meanwhile, the text embedding, as the pivotal intermediary between text and images, remains relatively underexplored. In this paper, we address this research gap by delving into the text embedding space, unleashing its capacity for controllable image editing and explicable semantic direction attributes within a learning-free framework. Specifically, we identify two critical insights regarding the importance of per-word embeddings and their contextual correlations within the text embedding, providing instructive principles for learning-free image editing. Additionally, we find that the text embedding inherently possesses diverse semantic potentials, and further reveal this property through the lens of singular value decomposition (SVD). These uncovered properties offer practical utility for image editing and semantic discovery. More importantly, we expect the in-depth analyses and findings of the text embedding to enhance the understanding of text-to-image diffusion models.
Updated: 2024-04-01 14:59:13
Categories: cs.CV,cs.AI
TransFusion: Covariate-Shift Robust Transfer Learning for High-Dimensional Regression
The main challenge that sets transfer learning apart from traditional supervised learning is the distribution shift, reflected as the shift between the source and target models and that between the marginal covariate distributions. In this work, we tackle model shifts in the presence of covariate shifts in the high-dimensional regression setting. Specifically, we propose a two-step method with a novel fused-regularizer that effectively leverages samples from source tasks to improve the learning performance on a target task with limited samples. A nonasymptotic bound is provided for the estimation error of the target model, showing the robustness of the proposed method to covariate shifts. We further establish conditions under which the estimator is minimax-optimal. Additionally, we extend the method to a distributed setting, allowing for a pretraining-finetuning strategy, requiring just one round of communication while retaining the estimation rate of the centralized version. Numerical tests validate our theory, highlighting the method's robustness to covariate shifts.
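The abstract does not spell out the estimator, but a generic two-step fused-penalty formulation consistent with the description would jointly fit a shared coefficient vector and per-task contrasts (all notation below is assumed, not taken from the paper):

$$\min_{w,\,\delta^{(1)},\dots,\delta^{(K)}}\ \sum_{k=1}^{K}\frac{1}{2n_k}\left\|y^{(k)}-X^{(k)}\bigl(w+\delta^{(k)}\bigr)\right\|_2^2+\lambda_0\|w\|_1+\lambda\sum_{k=1}^{K}\bigl\|\delta^{(k)}\bigr\|_1,$$

with a second step that refits or debiases on the target samples alone; an $\ell_1$ penalty on the contrasts $\delta^{(k)}$ is what would let informative source tasks shrink toward the shared vector while discordant ones do not.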
Updated: 2024-04-01 14:58:16
Categories: stat.ML,cs.DC,cs.LG,math.ST,stat.ME,stat.TH
Self-Organization Towards $1/f$ Noise in Deep Neural Networks
The presence of $1/f$ noise, also known as pink noise, is a well-established phenomenon in biological neural networks, and is thought to play an important role in information processing in the brain. In this study, we find that such $1/f$ noise is also found in deep neural networks trained on natural language, resembling that of their biological counterparts. Specifically, we trained Long Short-Term Memory (LSTM) networks on the "IMDb" AI benchmark dataset, then measured the neuron activations. The detrended fluctuation analysis (DFA) on the time series of the different neurons demonstrates clear $1/f$ patterns, which are absent in the time series of the inputs to the LSTM. Interestingly, when the neural network is at overcapacity, having more than enough neurons to achieve the learning task, the activation patterns deviate from $1/f$ noise and shift towards white noise. This is because many of the neurons are not effectively used, showing little fluctuation when fed with input data. We further examine the exponent values of the $1/f$ noise in "internal" and "external" activations in the LSTM cell, finding some resemblance to the variations of the exponents in fMRI signals of the human brain. Our findings further support the hypothesis that $1/f$ noise is a signature of optimal learning. With deep learning models approaching or surpassing humans in certain tasks, and being more "experimentable" than their biological counterparts, our study suggests that they are good candidates for understanding the fundamental origins of $1/f$ noise.
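For reference, DFA itself is compact; a minimal numpy version follows (the scales and linear detrending order are conventional choices, not necessarily the paper's settings):

import numpy as np

def dfa_exponent(x, scales=(8, 16, 32, 64, 128), order=1):
    # exponent near 1.0 indicates 1/f (pink) noise, near 0.5 white noise
    y = np.cumsum(x - np.mean(x))            # integrated profile
    flucts = []
    for s in scales:
        rms = []
        for i in range(len(y) // s):
            seg = y[i * s:(i + 1) * s]
            t = np.arange(s)
            trend = np.polyval(np.polyfit(t, seg, order), t)
            rms.append(np.sqrt(np.mean((seg - trend) ** 2)))
        flucts.append(np.mean(rms))
    # slope of log F(s) versus log s is the DFA exponent
    return np.polyfit(np.log(scales), np.log(flucts), 1)[0]

print(dfa_exponent(np.random.randn(4096)))   # ~0.5 for white noise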
Updated: 2024-04-01 14:47:08
Categories: physics.data-an,cs.AI,nlin.AO
Do LLMs Find Human Answers To Fact-Driven Questions Perplexing? A Case Study on Reddit
Large language models (LLMs) have been shown to be proficient in correctly answering questions in the context of online discourse. However, the study of using LLMs to model human-like answers to fact-driven social media questions is still under-explored. In this work, we investigate how LLMs model the wide variety of human answers to fact-driven questions posed on several topic-specific Reddit communities, or subreddits. We collect and release a dataset of 409 fact-driven questions and 7,534 diverse, human-rated answers from 15 r/Ask{Topic} communities across 3 categories: profession, social identity, and geographic location. We find that LLMs are considerably better at modeling highly-rated human answers to such questions, as opposed to poorly-rated human answers. We present several directions for future research based on our initial findings.
Updated: 2024-04-01 14:46:20
Categories: cs.CL,cs.LG
Safe and Interpretable Estimation of Optimal Treatment Regimes
Recent statistical and reinforcement learning methods have significantly advanced patient care strategies. However, these approaches face substantial challenges in high-stakes contexts, including missing data, inherent stochasticity, and the critical requirements for interpretability and patient safety. Our work operationalizes a safe and interpretable framework to identify optimal treatment regimes. This approach involves matching patients with similar medical and pharmacological characteristics, allowing us to construct an optimal policy via interpolation. We perform a comprehensive simulation study to demonstrate the framework's ability to identify optimal policies even in complex settings. Ultimately, we operationalize our approach to study regimes for treating seizures in critically ill patients. Our findings strongly support personalized treatment strategies based on a patient's medical history and pharmacological features. Notably, we identify that reducing medication doses for patients with mild and brief seizure episodes, while adopting aggressive treatment for patients in the intensive care unit experiencing intense seizures, leads to more favorable outcomes.
Updated: 2024-04-01 14:46:17
Categories: cs.LG,stat.AP,stat.ME
Sequential-in-time training of nonlinear parametrizations for solving time-dependent partial differential equations
Sequential-in-time methods solve a sequence of training problems to fit nonlinear parametrizations such as neural networks to approximate solution trajectories of partial differential equations over time. This work shows that sequential-in-time training methods can be understood broadly as either optimize-then-discretize (OtD) or discretize-then-optimize (DtO) schemes, which are well known concepts in numerical analysis. The unifying perspective leads to novel stability and a posteriori error analysis results that provide insights into theoretical and numerical aspects that are inherent to either OtD or DtO schemes such as the tangent space collapse phenomenon, which is a form of over-fitting. Additionally, the unified perspective facilitates establishing connections between variants of sequential-in-time training methods, which is demonstrated by identifying natural gradient descent methods on energy functionals as OtD schemes applied to the corresponding gradient flows.
Updated: 2024-04-01 14:45:16
Categories: math.NA,cs.LG,cs.NA
Condition-Aware Neural Network for Controlled Image Generation
We present Condition-Aware Neural Network (CAN), a new method for adding control to image generative models. In parallel to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weight of the neural network. This is achieved by introducing a condition-aware weight generation module that generates conditional weight for convolution/linear layers based on the input condition. We test CAN on class-conditional image generation on ImageNet and text-to-image generation on COCO. CAN consistently delivers significant improvements for diffusion transformer models, including DiT and UViT. In particular, CAN combined with EfficientViT (CaT) achieves 2.78 FID on ImageNet 512x512, surpassing DiT-XL/2 while requiring 52x fewer MACs per sampling step.
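A toy PyTorch sketch of the core mechanism: a layer whose convolution kernel is generated from the condition embedding rather than stored as a fixed parameter. Module and dimension names are illustrative, and the paper's actual weight-generation design may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CondConv2d(nn.Module):
    # The conv weight itself becomes a function of the condition.
    def __init__(self, cond_dim, in_ch, out_ch, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        self.weight_gen = nn.Linear(cond_dim, out_ch * in_ch * k * k)

    def forward(self, x, cond):   # x: (1, C, H, W), cond: (cond_dim,)
        w = self.weight_gen(cond).view(self.out_ch, self.in_ch, self.k, self.k)
        return F.conv2d(x, w, padding=self.k // 2)

layer = CondConv2d(cond_dim=128, in_ch=16, out_ch=32)
out = layer(torch.randn(1, 16, 8, 8), torch.randn(128))

A batched variant would generate one kernel per sample and apply it as a grouped convolution.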
Updated: 2024-04-01 14:42:57
Categories: cs.CV,cs.AI
DANSE: Data-driven Non-linear State Estimation of Model-free Process in Unsupervised Learning Setup
We address the tasks of Bayesian state estimation and forecasting for a model-free process in an unsupervised learning setup. For a model-free process, we do not have any a-priori knowledge of the process dynamics. In the article, we propose DANSE -- a Data-driven Nonlinear State Estimation method. DANSE provides a closed-form posterior of the state of the model-free process, given linear measurements of the state. In addition, it provides a closed-form posterior for forecasting. A data-driven recurrent neural network (RNN) is used in DANSE to provide the parameters of a prior of the state. The prior depends on the past measurements as input, and then we find the closed-form posterior of the state using the current measurement as input. The data-driven RNN captures the underlying non-linear dynamics of the model-free process. The training of DANSE, mainly learning the parameters of the RNN, is executed using an unsupervised learning approach. In unsupervised learning, we have access to a training dataset comprising only a set of measurement data trajectories, but we do not have any access to the state trajectories. Therefore, DANSE does not have access to state information in the training data and can not use supervised learning. Using simulated linear and non-linear process models (Lorenz attractor and Chen attractor), we evaluate the unsupervised learning-based DANSE. We show that the proposed DANSE, without knowledge of the process model and without supervised learning, provides a competitive performance against model-driven methods, such as the Kalman filter (KF), extended KF (EKF), unscented KF (UKF), a data-driven deep Markov model (DMM) and a recently proposed hybrid method called KalmanNet. In addition, we show that DANSE works for high-dimensional state estimation.
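For context, the closed-form Gaussian update the abstract refers to can be sketched as below; the RNN that produces the prior is stubbed out, and all shapes are illustrative.

import numpy as np

def gaussian_posterior(m, P, y, H, R):
    # Posterior of x ~ N(m, P) given a linear measurement y = Hx + v,
    # v ~ N(0, R). In DANSE, (m, P) come from an RNN fed with past
    # measurements; here they are simply inputs.
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # gain
    return m + K @ (y - H @ m), P - K @ H @ P

m, P = np.zeros(3), np.eye(3)              # prior from the (stubbed) RNN
H, R = np.eye(2, 3), 0.1 * np.eye(2)       # linear measurement model
m_post, P_post = gaussian_posterior(m, P, np.array([0.5, -0.2]), H, R)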
Updated: 2024-04-01 14:40:30
Categories: eess.SY,cs.LG,cs.SY,eess.SP
SoK: A Review of Differentially Private Linear Models For High-Dimensional Data
Linear models are ubiquitous in data science, but are particularly prone to overfitting and data memorization in high dimensions. To guarantee the privacy of training data, differential privacy can be used. Many papers have proposed optimization techniques for high-dimensional differentially private linear models, but a systematic comparison between these methods does not exist. We close this gap by providing a comprehensive review of optimization methods for private high-dimensional linear models. Empirical tests on all methods demonstrate that robust and coordinate-optimized algorithms perform best, which can inform future research. Code for implementing all methods is released online.
Updated: 2024-04-01 14:38:51
Categories: cs.LG,cs.CR,stat.ML,I.2
Enhancing Reasoning Capacity of SLM using Cognitive Enhancement
Large Language Models (LLMs) have been applied to automate cyber security activities and processes, including cyber investigation and digital forensics. However, the use of such models for cyber investigation and digital forensics should address accountability and security considerations. Accountability ensures models have the means to provide explainable reasoning and outcomes. This information can be extracted through explicit prompt requests. For security considerations, it is also crucial to address the privacy and confidentiality of the involved data during data processing. One approach to deal with this consideration is to have the data processed locally using a local instance of the model. Due to limitations of locally available resources, namely memory and GPU capacities, a Smaller Large Language Model (SLM) will typically be used. These SLMs have significantly fewer parameters compared to LLMs. However, such size reductions come with notable performance degradation, especially when the model is tasked to provide reasoning explanations. In this paper, we aim to mitigate this performance reduction through the integration of cognitive strategies that humans use for problem-solving. We term this cognitive enhancement through prompts. Our experiments showed significant improvement gains in the SLMs' performance when such enhancements were applied. We believe that our exploration study paves the way for further investigation into the use of cognitive enhancement to optimize SLMs for cyber security applications.
Updated: 2024-04-01 14:29:58
Categories: cs.CR,cs.AI
GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems
For multi-agent reinforcement learning systems (MARLS), the problem formulation generally involves investing massive reward engineering effort specific to a given problem. However, this effort often cannot be translated to other problems; worse, it gets wasted when system dynamics change drastically. This problem is further exacerbated in sparse reward scenarios, where a meaningful heuristic can assist in the policy convergence task. We propose GOVerned Reward Engineering Kernels (GOV-REK), which dynamically assign reward distributions to agents in MARLS during its learning stage. We also introduce governance kernels, which exploit the underlying structure in either state or joint action space for assigning meaningful agent reward distributions. During the agent learning stage, it iteratively explores different reward distribution configurations with a Hyperband-like algorithm to learn ideal agent reward models in a problem-agnostic manner. Our experiments demonstrate that our meaningful reward priors robustly jumpstart the learning process for effectively learning different MARL problems.
Updated: 2024-04-01 14:19:00
Categories: cs.MA,cs.AI
NJUST-KMG at TRAC-2024 Tasks 1 and 2: Offline Harm Potential Identification
This report provides a detailed description of the method we proposed for the TRAC-2024 Offline Harm Potential Identification shared task, which comprises two sub-tasks. The investigation utilized a rich dataset comprised of social media comments in several Indian languages, annotated with precision by expert judges to capture the nuanced implications for offline context harm. The objective assigned to the participants was to design algorithms capable of accurately assessing the likelihood of harm in given situations and identifying the most likely target(s) of offline harm. Our approach ranked second in two separate tracks, with F1 values of 0.73 and 0.96 respectively. Our method principally involved selecting pretrained models for finetuning, incorporating contrastive learning techniques, and culminating in an ensemble approach for the test set.
Updated: 2024-04-01 14:16:42
Categories: cs.CL,cs.LG
Metalearning with Very Few Samples Per Task
Metalearning and multitask learning are two frameworks for solving a group of related learning tasks more efficiently than we could hope to solve each of the individual tasks on their own. In multitask learning, we are given a fixed set of related learning tasks and need to output one accurate model per task, whereas in metalearning we are given tasks that are drawn i.i.d. from a metadistribution and need to output some common information that can be easily specialized to new tasks from the metadistribution. We consider a binary classification setting where tasks are related by a shared representation, that is, every task $P$ can be solved by a classifier of the form $f_{P} \circ h$ where $h \in H$ is a map from features to a representation space that is shared across tasks, and $f_{P} \in F$ is a task-specific classifier from the representation space to labels. The main question we ask is how much data do we need to metalearn a good representation? Here, the amount of data is measured in terms of the number of tasks $t$ that we need to see and the number of samples $n$ per task. We focus on the regime where $n$ is extremely small. Our main result shows that, in a distribution-free setting where the feature vectors are in $\mathbb{R}^d$, the representation is a linear map from $\mathbb{R}^d \to \mathbb{R}^k$, and the task-specific classifiers are halfspaces in $\mathbb{R}^k$, we can metalearn a representation with error $\varepsilon$ using $n = k+2$ samples per task, and $d \cdot (1/\varepsilon)^{O(k)}$ tasks. Learning with so few samples per task is remarkable because metalearning would be impossible with $k+1$ samples per task, and because we cannot even hope to learn an accurate task-specific classifier with $k+2$ samples per task. Our work also yields a characterization of distribution-free multitask learning and reductions between meta and multitask learning.
Updated: 2024-04-01 14:13:22
Categories: cs.LG,cs.DS
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Large Multimodal Models (LMMs) have shown significant reasoning capabilities by connecting a visual encoder and a large language model. LMMs typically use a fixed amount of visual tokens, such as the penultimate layer features in the CLIP visual encoder, as the prefix content. Recent LMMs incorporate more complex visual inputs, such as high-resolution images and videos, which increase the number of visual tokens significantly. However, due to the design of the Transformer architecture, computational costs associated with these models tend to increase quadratically with the number of input tokens. To tackle this problem, we explore a token reduction mechanism and find, similar to prior work, that many visual tokens are spatially redundant. Based on this, we propose PruMerge, a novel adaptive visual token reduction approach, which largely reduces the number of visual tokens while maintaining comparable model performance. We first select the unpruned visual tokens based on their similarity to class tokens and spatial tokens. We then cluster the pruned tokens based on key similarity and merge the clustered tokens with the unpruned tokens to supplement their information. Empirically, when applied to LLaVA-1.5, our approach can compress the visual tokens by 18 times on average, and achieve comparable performance across diverse visual question-answering and reasoning tasks. Code and checkpoints are at https://llava-prumerge.github.io/.
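A rough PyTorch sketch of the prune-then-merge idea, assuming CLS-attention scores and key vectors have already been extracted from the visual encoder; the fixed keep ratio and the simple averaging merge rule are illustrative stand-ins for the paper's adaptive selection.

import torch

def prumerge(tokens, keys, cls_attn, keep_ratio=0.06):
    # Keep the patches the [CLS] token attends to most, then fold each
    # pruned token into its most similar kept token (by key similarity).
    n = tokens.size(0)
    n_keep = max(1, int(keep_ratio * n))
    keep = cls_attn.topk(n_keep).indices
    pruned = torch.tensor([i for i in range(n) if i not in set(keep.tolist())])
    nearest = (keys[pruned] @ keys[keep].T).argmax(dim=1)  # cluster label
    merged = tokens[keep].clone()
    for j in range(n_keep):
        members = pruned[nearest == j]
        if len(members):                       # average into the kept token
            merged[j] = (merged[j] + tokens[members].mean(0)) / 2
    return merged

vis = torch.randn(576, 1024)                   # e.g. ViT patch tokens
out = prumerge(vis, torch.randn(576, 64), torch.rand(576))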
Updated: 2024-04-01 14:08:06
Categories: cs.CV,cs.AI,cs.CL
Medical Visual Prompting (MVP): A Unified Framework for Versatile and High-Quality Medical Image Segmentation
Accurate segmentation of lesion regions is crucial for clinical diagnosis and treatment across various diseases. While deep convolutional networks have achieved satisfactory results in medical image segmentation, they face challenges such as loss of lesion shape information due to continuous convolution and downsampling, as well as the high cost of manually labeling lesions with varying shapes and sizes. To address these issues, we propose a novel medical visual prompting (MVP) framework that leverages pre-training and prompting concepts from natural language processing (NLP). The framework utilizes three key components: Super-Pixel Guided Prompting (SPGP) for superpixelating the input image, Image Embedding Guided Prompting (IEGP) for freezing patch embedding and merging with superpixels to provide visual prompts, and Adaptive Attention Mechanism Guided Prompting (AAGP) for pinpointing prompt content and efficiently adapting all layers. By integrating SPGP, IEGP, and AAGP, the MVP enables the segmentation network to better learn shape prompting information and facilitates mutual learning across different tasks. Extensive experiments conducted on five datasets demonstrate superior performance of this method in various challenging medical image tasks, while simplifying single-task medical segmentation models. This novel framework offers improved performance with fewer parameters and holds significant potential for accurate segmentation of lesion regions in various medical tasks, making it clinically valuable.
Updated: 2024-04-01 14:06:48
Categories: cs.CV,cs.AI
Enhanced Precision in Rainfall Forecasting for Mumbai: Utilizing Physics Informed ConvLSTM2D Models for Finer Spatial and Temporal Resolution
Forecasting rainfall in tropical areas is challenging due to complex atmospheric behaviour, elevated humidity levels, and the common presence of convective rain events. In the Indian context, the difficulty is further exacerbated by the monsoon intra-seasonal oscillations, which introduce significant variability in rainfall patterns over short periods. Earlier investigations into rainfall prediction leveraged numerical weather prediction methods, along with statistical and deep learning approaches. This study introduces a deep learning spatial model aimed at enhancing rainfall prediction accuracy on a finer scale. We hypothesize that integrating physical understanding improves the precipitation prediction skill of deep learning models with high precision for finer spatial scales, such as cities. To test this hypothesis, we introduce a physics-informed ConvLSTM2D model to predict precipitation 6 hr and 12 hr ahead for Mumbai, India. We utilize ERA5 reanalysis data to select predictor variables across various geopotential levels. The ConvLSTM2D model was trained on the target variable precipitation for 4 different grids representing different spatial grid locations of Mumbai. Thus, the use of the ConvLSTM2D model for rainfall prediction, utilizing physics-informed data from specific grids with limited spatial information, reflects current advancements in meteorological research that emphasize both efficiency and localized precision.
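A minimal Keras sketch of a ConvLSTM2D forecaster of the kind described, with hypothetical grid size, history length, and channel count; the physics-informed components (the ERA5 predictor choice and any constraint terms in the loss) are not shown.

import tensorflow as tf
from tensorflow.keras import layers, models

# Input: a short history of predictor fields on a lat-lon grid,
# shaped (time, rows, cols, channels); output: one rainfall map.
model = models.Sequential([
    layers.ConvLSTM2D(32, (3, 3), padding="same", return_sequences=True,
                      input_shape=(6, 32, 32, 8)),   # 6 steps, 8 ERA5 vars
    layers.ConvLSTM2D(16, (3, 3), padding="same", return_sequences=False),
    layers.Conv2D(1, (1, 1), activation="relu"),     # precipitation >= 0
])
model.compile(optimizer="adam", loss="mse")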
Updated: 2024-04-01 13:56:12
Categories: cs.LG
OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
Large language models (LLMs) often struggle with maintaining accuracy throughout multiple reasoning steps, especially in mathematical reasoning, where an error in earlier steps can propagate to subsequent ones, ultimately leading to an incorrect answer. To reduce error propagation, guided decoding is employed to direct the LM decoding on a step-by-step basis. We argue that in guided decoding, assessing the potential of an incomplete reasoning path can be more advantageous than simply ensuring per-step correctness, as the former approach leads towards a correct final answer. This transforms the task into a $\textit{value estimation}$ problem in planning. Inspired by the finding that $\textit{outcome supervision for guided decoding essentially acts as a value model}$, we propose the Outcome-supervised Value Model (OVM), which employs outcome supervision for training a value model that prioritizes steps leading to accurate conclusions. Furthermore, the OVM eliminates the need for labor-intensive annotations of step-level correctness, thereby significantly enhancing its scalability. Our experiments on two multi-step mathematical reasoning datasets, GSM8K and Game of 24, demonstrate the superior performance of the OVM model. Notably, in GSM8K, our $\textbf{OVM-7B model achieves state-of-the-art results among LLMs up to 13B parameters}$; in particular, it does not utilize GPT-4 or code execution. These findings offer a novel perspective on the role of outcome supervision in training value models for multi-step reasoning tasks and provide theoretical justification for its advantage in value estimation for guided decoding.
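A schematic Python sketch of value-guided, step-level decoding: propose_steps (an LLM step sampler) and value (the trained outcome-supervised value model) are stand-in callables, and the beam width and stopping convention are invented for illustration.

def value_guided_decode(question, propose_steps, value, beam=3, max_steps=8):
    # Keep the partial reasoning paths the value model scores highest,
    # i.e. those most likely to reach a correct final answer, rather
    # than checking per-step correctness.
    paths = [""]
    for _ in range(max_steps):
        candidates = [p + step for p in paths
                      for step in propose_steps(question, p)]
        if not candidates:
            break
        candidates.sort(key=lambda p: value(question, p), reverse=True)
        paths = candidates[:beam]
        if all(p.endswith("[ANSWER]") for p in paths):
            break
    return paths[0]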
Updated: 2024-04-01 13:50:51
Categories: cs.AI,cs.CL
An incremental hybrid adaptive network-based IDS in Software Defined Networks to detect stealth attacks
Network attacks have become increasingly sophisticated and stealthy due to advances in technology and the growing sophistication of attackers. Advanced Persistent Threats (APTs) are a type of attack that implements a wide range of strategies to evade detection and stay under the defence radar. Software Defined Network (SDN) is a network paradigm that implements dynamic configuration by separating the control plane from the data plane. This approach improves security by facilitating the employment of network intrusion detection systems. Implementing Machine Learning (ML) techniques in Intrusion Detection Systems (IDSs) is widely used to detect such attacks but faces a challenge when the data distribution changes. Concept drift is a term that describes the change in the relationship between the input data and the target value (label or class). The model is expected to degrade as certain forms of change occur. In this paper, the primary form of change is in user behaviour (particularly changes in attacker behaviour). It is essential for a model to adapt itself to deviations in data distribution, and SDN can help in monitoring such changes. This paper discusses changes in stealth attacker behaviour and investigates various concept drift detection algorithms. An incremental hybrid adaptive Network Intrusion Detection System (NIDS) is proposed to tackle the issue of concept drift in SDN; it can detect known and unknown attacks. The model is evaluated over different datasets, showing promising results.
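As one concrete example of a detector such a system could build on, below is a minimal Python version of the classic Drift Detection Method (DDM) run over the IDS's online error rate; the paper's actual detector ensemble may differ.

class DDM:
    # Signal drift when the error rate rises significantly above its
    # historical minimum, e.g. when attacker behaviour shifts.
    def __init__(self):
        self.n, self.p = 0, 1.0
        self.p_min, self.s_min = float("inf"), float("inf")

    def update(self, error):                 # error: 1 if misclassified
        self.n += 1
        self.p += (error - self.p) / self.n  # running error rate
        s = (self.p * (1 - self.p) / self.n) ** 0.5
        if self.n > 30 and self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s >= self.p_min + 3 * self.s_min:
            return "drift"                   # retrain / adapt the model
        if self.p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"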
Updated: 2024-04-01 13:33:40
Categories: cs.CR,cs.AI
MagLive: Near-Field Magnetic Sensing-Based Voice Liveness Detection on Smartphones
Voice authentication has been widely used on smartphones. However, it remains vulnerable to spoofing attacks, where the attacker replays recorded voice samples from authentic humans using loudspeakers to bypass the voice authentication system. In this paper, we present MagLive, a robust voice liveness detection scheme designed for smartphones to mitigate such spoofing attacks. MagLive leverages differences in magnetic field patterns generated by different speakers (i.e., humans or loudspeakers) when speaking for liveness detection. It uses the built-in magnetometer on smartphones to capture these magnetic field changes. Specifically, MagLive utilizes two CNN-based submodels and a self-attention-based feature fusion model to extract effective and robust features. Supervised contrastive learning is then employed to achieve user-irrelevance, device-irrelevance, and content-irrelevance. MagLive imposes no additional burdens on users and does not rely on active sensing or extra devices. We conducted comprehensive experiments with various settings to evaluate the security and robustness of MagLive. Our results demonstrate that MagLive effectively distinguishes between humans and attackers (i.e., loudspeakers), achieving a balanced accuracy of 99.01% and an equal error rate of 0.77%.
Updated: 2024-04-01 13:27:24
Categories: cs.CR
Diffusion based Zero-shot Medical Image-to-Image Translation for Cross Modality Segmentation
Cross-modality image segmentation aims to segment the target modalities using a method designed in the source modality. Deep generative models can translate the target modality images into the source modality, thus enabling cross-modality segmentation. However, a vast body of existing cross-modality image translation methods relies on supervised learning. In this work, we aim to address the challenge of zero-shot learning-based image translation tasks (extreme scenarios where the target modality is unseen in the training phase). To leverage generative learning for zero-shot cross-modality image segmentation, we propose a novel unsupervised image translation method. The framework learns to translate the unseen source image to the target modality for image segmentation by leveraging the inherent statistical consistency between different modalities for diffusion guidance. Our framework captures identical cross-modality features in the statistical domain, offering diffusion guidance without relying on direct mappings between the source and target domains. This advantage allows our method to adapt to changing source domains without the need for retraining, making it highly practical when sufficient labeled source domain data is not available. The proposed framework is validated on zero-shot cross-modality image segmentation tasks through empirical comparisons with influential generative models, including adversarial-based and diffusion-based models.
Updated: 2024-04-01 13:23:04
Categories: eess.IV,cs.CV,cs.LG
UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models
Diffusion models are vulnerable to backdoor attacks, where malicious attackers inject backdoors by poisoning some parts of the training samples during the training stage. This poses a serious threat to downstream users, who query the diffusion models through the API or directly download them from the internet. To mitigate the threat of backdoor attacks, there has been a plethora of investigations on backdoor detection. However, none of them has designed a specialized backdoor detection method for diffusion models, leaving the area much under-explored. Moreover, these prior methods mainly focus on traditional neural networks in the classification task, and cannot be easily adapted to backdoor detection on the generative task. Additionally, most of the prior methods require white-box access to model weights and architectures, or the probability logits as additional information, which is not always practical. In this paper, we propose a Unified Framework for Input-level backdoor Detection (UFID) on diffusion models, which is motivated by observations in diffusion models and further validated with a theoretical causality analysis. Extensive experiments across different datasets on both conditional and unconditional diffusion models show that our method achieves superb performance in detection effectiveness and run-time efficiency. The code is available at https://github.com/GuanZihan/official_UFID.
Updated: 2024-04-01 13:21:05
Categories: cs.CR,cs.CV,cs.LG
LLM Attributor: Interactive Visual Attribution for LLM Generation
While large language models (LLMs) have shown a remarkable capability to generate convincing text across diverse domains, concerns around their potential risks have highlighted the importance of understanding the rationale behind text generation. We present LLM Attributor, a Python library that provides interactive visualizations for training data attribution of an LLM's text generation. Our library offers a new way to quickly attribute an LLM's text generation to training data points, in order to inspect model behaviors, enhance its trustworthiness, and compare model-generated text with user-provided text. We describe the visual and interactive design of our tool and highlight usage scenarios for LLaMA2 models fine-tuned with two different datasets: online articles about recent disasters and finance-related question-answer pairs. Thanks to LLM Attributor's broad support for computational notebooks, users can easily integrate it into their workflow to interactively visualize attributions of their models. For easier access and extensibility, we open-source LLM Attributor at https://github.com/poloclub/LLM-Attribution. The video demo is available at https://youtu.be/mIG2MDQKQxM.
Updated: 2024-04-01 13:16:34
Categories: cs.CL,cs.AI,cs.HC,cs.LG
Blind-Touch: Homomorphic Encryption-Based Distributed Neural Network Inference for Privacy-Preserving Fingerprint Authentication
Fingerprint authentication is a popular security mechanism for smartphones and laptops. However, its adoption in web and cloud environments has been limited due to privacy concerns over storing and processing biometric data on servers. This paper introduces Blind-Touch, a novel machine learning-based fingerprint authentication system leveraging homomorphic encryption to address these privacy concerns. Homomorphic encryption allows computations on encrypted data without decrypting it. Thus, Blind-Touch can keep fingerprint data encrypted on the server while performing machine learning operations. Blind-Touch combines three strategies to efficiently utilize homomorphic encryption in machine learning: (1) it optimizes the feature vector for a distributed architecture, processing the first fully connected layer (FC-16) in plaintext on the client side and the subsequent layer (FC-1) post-encryption on the server, thereby minimizing encrypted computations; (2) it employs a homomorphic encryption-compatible data compression technique capable of handling 8,192 authentication results concurrently; and (3) it utilizes a clustered server architecture to simultaneously process authentication results, thereby enhancing scalability with increasing user numbers. Blind-Touch achieves high accuracy on two benchmark fingerprint datasets, with a 93.6% F1-score for the PolyU dataset and a 98.2% F1-score for the SOKOTO dataset. Moreover, Blind-Touch can match a fingerprint among 5,000 in about 0.65 seconds. With its privacy-focused design, high accuracy, and efficiency, Blind-Touch is a promising alternative to conventional fingerprint authentication for web and cloud applications.
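The paper's exact pipeline is not reproduced here; below is a minimal Python sketch of the split plaintext/ciphertext idea using the TenSEAL CKKS library, where TenSEAL itself and all layer shapes are assumptions rather than details from the paper.

import numpy as np
import tenseal as ts

# Client side: run the first layer (FC-16) in plaintext, then encrypt.
W16 = np.random.randn(16, 128)                 # hypothetical shapes
feat = np.maximum(W16 @ np.random.randn(128), 0)

ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()
enc_feat = ts.ckks_vector(ctx, feat.tolist())

# Server side: FC-1 is a single dot product evaluated on the
# ciphertext; the server never sees the plaintext features.
w1 = np.random.randn(16)
enc_score = enc_feat.dot(w1.tolist())

# Client side again: decrypt and threshold the match score.
score = enc_score.decrypt()[0]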
Updated: 2024-04-01 13:15:29
Categories: cs.CR
Finite Sample Frequency Domain Identification
We study non-parametric frequency-domain system identification from a finite-sample perspective. We assume an open loop scenario where the excitation input is periodic and consider the Empirical Transfer Function Estimate (ETFE), where the goal is to estimate the frequency response at certain desired (evenly-spaced) frequencies, given input-output samples. We show that under sub-Gaussian colored noise (in time-domain) and stability assumptions, the ETFE estimates are concentrated around the true values. The error rate is of the order of $\mathcal{O}((d_{\mathrm{u}}+\sqrt{d_{\mathrm{u}}d_{\mathrm{y}}})\sqrt{M/N_{\mathrm{tot}}})$, where $N_{\mathrm{tot}}$ is the total number of samples, $M$ is the number of desired frequencies, and $d_{\mathrm{u}},\,d_{\mathrm{y}}$ are the dimensions of the input and output signals respectively. This rate remains valid for general irrational transfer functions and does not require a finite order state-space representation. By tuning $M$, we obtain a $N_{\mathrm{tot}}^{-1/3}$ finite-sample rate for learning the frequency response over all frequencies in the $ \mathcal{H}_{\infty}$ norm. Our result draws upon an extension of the Hanson-Wright inequality to semi-infinite matrices. We study the finite-sample behavior of ETFE in simulations.
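A minimal numpy sketch of the ETFE for a periodic input: average the per-period DFTs, then take the ratio at the excited bins (the helper name and the empty-bin guard are illustrative).

import numpy as np

def etfe(u, y, n_per_period, n_periods):
    # Averaging over periods suppresses the noise in each frequency bin.
    U = np.mean([np.fft.rfft(u[i*n_per_period:(i+1)*n_per_period])
                 for i in range(n_periods)], axis=0)
    Y = np.mean([np.fft.rfft(y[i*n_per_period:(i+1)*n_per_period])
                 for i in range(n_periods)], axis=0)
    G = np.full_like(Y, np.nan)                   # complex NaN elsewhere
    excited = np.abs(U) > 1e-8 * np.abs(U).max()  # input-energy bins only
    G[excited] = Y[excited] / U[excited]
    return G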
Updated: 2024-04-01 13:13:25
Categories: eess.SY,cs.LG,cs.SY,math.OC,stat.ML
What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety
Current Large Language Models (LLMs), even those tuned for safety and alignment, are susceptible to jailbreaking. Some have found that just further fine-tuning an aligned model with benign data (i.e., data without harmful content) surprisingly leads to substantial degradation in safety. We delve into the data-centric aspects of why benign fine-tuning inadvertently contributes to jailbreaking. First, we represent fine-tuning data through two lenses: representation and gradient spaces. Furthermore, we propose a bi-directional anchoring method that prioritizes data points that are close to harmful examples and distant from benign ones. By doing so, our approach effectively identifies subsets of benign data that are more likely to degrade the model's safety after fine-tuning. Training on just 100 of these seemingly benign datapoints can lead to the fine-tuned model affirmatively responding to > 70% of tested harmful requests, compared to < 20% after fine-tuning on randomly selected data. We further find that selected data are often in the form of lists and bullet points, or math questions.
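A rough numpy sketch of the bi-directional anchoring idea: score each benign example by its proximity to harmful anchors minus its proximity to other benign points, in some feature space (the paper works in representation and gradient spaces; the featurization is abstracted away here and all names are illustrative).

import numpy as np

def anchor_scores(benign, harmful, k=5):
    # Rows are assumed L2-normalized, so inner product = cosine sim.
    top_h = np.sort(benign @ harmful.T, axis=1)[:, -k:].mean(axis=1)
    sim_b = benign @ benign.T
    np.fill_diagonal(sim_b, -np.inf)              # ignore self-similarity
    top_b = np.sort(sim_b, axis=1)[:, -k:].mean(axis=1)
    return top_h - top_b                          # high = likely safety-degrading

B = np.random.randn(1000, 256); B /= np.linalg.norm(B, axis=1, keepdims=True)
H = np.random.randn(50, 256);   H /= np.linalg.norm(H, axis=1, keepdims=True)
suspects = np.argsort(anchor_scores(B, H))[::-1][:100]   # top-100 picks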
Updated: 2024-04-01 13:12:30
Categories: cs.LG,cs.AI,cs.CL,cs.CR
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Contrastive pretraining of image-text foundation models, such as CLIP, demonstrated excellent zero-shot performance and improved robustness on a wide range of downstream tasks. However, these models utilize large transformer-based encoders with significant memory and latency overhead which pose challenges for deployment on mobile devices. In this work, we introduce MobileCLIP -- a new family of efficient image-text models optimized for runtime performance along with a novel and efficient training approach, namely multi-modal reinforced training. The proposed training approach leverages knowledge transfer from an image captioning model and an ensemble of strong CLIP encoders to improve the accuracy of efficient models. Our approach avoids train-time compute overhead by storing the additional knowledge in a reinforced dataset. MobileCLIP sets a new state-of-the-art latency-accuracy tradeoff for zero-shot classification and retrieval tasks on several datasets. Our MobileCLIP-S2 variant is 2.3$\times$ faster while more accurate compared to previous best CLIP model based on ViT-B/16. We further demonstrate the effectiveness of our multi-modal reinforced training by training a CLIP model based on ViT-B/16 image backbone and achieving +2.9% average performance improvement on 38 evaluation benchmarks compared to the previous best. Moreover, we show that the proposed approach achieves 10$\times$-1000$\times$ improved learning efficiency when compared with non-reinforced CLIP training. Code and models are available at https://github.com/apple/ml-mobileclip .
Updated: 2024-04-01 13:06:06
Categories: cs.CV,cs.CL,cs.LG
Event Concealment and Concealability Enforcement in Discrete Event Systems Under Partial Observation
Inspired by privacy problems where the behavior of a system should not be revealed to an external curious observer, we investigate event concealment and concealability enforcement in discrete event systems modeled as non-deterministic finite automata under partial observation. Given a subset of secret events in a given system, concealability holds if the occurrence of all secret events remains hidden to a curious observer (an eavesdropper). A secret event is said to be (at least under some executions) unconcealable (inferable) if its occurrence can be indirectly determined with certainty after a finite number of observations. When concealability of a system does not hold (i.e., one or more secret events are unconcealable), we analyze how a defender, placed at the interface of the system with the eavesdropper, can be used to enforce concealability. The defender takes as input each observed event of the system and outputs a carefully modified event sequence (seen by the eavesdropper) using event deletion, insertion, or replacement. The defender is said to be C-enforceable if, following the occurrence of the secret events and regardless of subsequent activity generated by the system, it can always deploy a strategy to manipulate observations and conceal the events perpetually. We discuss systematic procedures to detect the presence of unconcealable secret events and verify C-enforceability using techniques from state estimation and event diagnosis. We also propose a polynomial complexity construction for obtaining one necessary and one sufficient condition for C-enforceability.
Updated: 2024-04-01 12:58:00
Categories: cs.CR,cs.SY,eess.SY
Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
Image-based virtual try-on is an increasingly important task for online shopping. It aims to synthesize images of a specific person wearing a specified garment. Diffusion model-based approaches have recently become popular, as they are excellent at image synthesis tasks. However, these approaches usually employ additional image encoders and rely on the cross-attention mechanism for texture transfer from the garment to the person image, which affects the try-on's efficiency and fidelity. To address these issues, we propose a Texture-Preserving Diffusion (TPD) model for virtual try-on, which enhances the fidelity of the results and introduces no additional image encoders. Accordingly, we make contributions from two aspects. First, we propose to concatenate the masked person and reference garment images along the spatial dimension and utilize the resulting image as the input for the diffusion model's denoising UNet. This enables the original self-attention layers contained in the diffusion model to achieve efficient and accurate texture transfer. Second, we propose a novel diffusion-based method that predicts a precise inpainting mask based on the person and reference garment images, further enhancing the reliability of the try-on results. In addition, we integrate mask prediction and image synthesis into a single compact model. The experimental results show that our approach can be applied to various try-on tasks, e.g., garment-to-person and person-to-person try-ons, and significantly outperforms state-of-the-art methods on the popular VITON and VITON-HD databases.
Updated: 2024-04-01 12:43:22
Categories: cs.CV,cs.AI
Valid prediction intervals for regression problems
Over the last few decades, various methods have been proposed for estimating prediction intervals in regression settings, including Bayesian methods, ensemble methods, direct interval estimation methods and conformal prediction methods. An important issue is the calibration of these methods: the generated prediction intervals should have a predefined coverage level, without being overly conservative. In this work, we review the above four classes of methods from a conceptual and experimental point of view. Results on benchmark data sets from various domains highlight large fluctuations in performance from one data set to another. These observations can be attributed to the violation of certain assumptions that are inherent to some classes of methods. We illustrate how conformal prediction can be used as a general calibration procedure for methods that deliver poor results without a calibration step.
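For reference, the split conformal calibration step is short enough to sketch; the quantile rule below is the standard one, applied to illustrative data.

import numpy as np

def split_conformal(residuals_cal, y_pred_test, alpha=0.1):
    # Choose the residual quantile on a held-out calibration set so the
    # intervals cover 1 - alpha of new points under exchangeability,
    # regardless of the underlying regressor.
    n = len(residuals_cal)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(residuals_cal, q_level, method="higher")
    return y_pred_test - q, y_pred_test + q

res = np.abs(np.random.randn(500))          # |y - y_hat| on calibration
lo, hi = split_conformal(res, np.array([1.2, -0.3]))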
Updated: 2024-04-01 12:30:49
Categories: stat.ML,cs.LG
AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles
In this paper, we outline our submission for the SemEval-2024 Task 9 competition: 'BRAINTEASER: A Novel Task Defying Common Sense'. We engage in both sub-tasks: Sub-task A-Sentence Puzzle and Sub-task B-Word Puzzle. We evaluate a plethora of pre-trained transformer-based language models of different sizes through fine-tuning. Subsequently, we undertake an analysis of their scores and responses to aid future researchers in understanding and utilizing these models effectively. Our top-performing approaches secured competitive positions on the competition leaderboard across both sub-tasks. In the evaluation phase, our best submission attained an average accuracy score of 81.7% in the Sentence Puzzle, and 85.4% in the Word Puzzle, significantly outperforming the best neural baseline (ChatGPT) by more than 20% and 30% respectively.
Updated: 2024-04-01 12:27:55
Categories: cs.CL,cs.AI
BERT-Enhanced Retrieval Tool for Homework Plagiarism Detection System
Text plagiarism detection is a common natural language processing task that aims to detect whether a given text contains plagiarism or copying from other texts. In existing research, detection of high-level plagiarism is still a challenge due to the lack of high-quality datasets. In this paper, we propose a plagiarized-text data generation method based on GPT-3.5, which produces 32,927 pairs of text plagiarism detection data covering a wide range of plagiarism methods, bridging the gap in this part of the research. Meanwhile, we propose a plagiarism identification method based on Faiss with BERT that offers high efficiency and high accuracy. Our experiments show that this model outperforms other models on several metrics, achieving 98.86%, 98.90%, 98.86%, and 0.9888 for Accuracy, Precision, Recall, and F1 Score, respectively. Finally, we also provide a user-friendly demo platform that allows users to upload a text library and intuitively participate in the plagiarism analysis.
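A minimal sketch of the Faiss retrieval side, with random vectors standing in for BERT sentence embeddings and a hypothetical similarity cutoff:

import numpy as np
import faiss

d = 768                                     # BERT sentence-embedding dim
library = np.random.rand(10000, d).astype("float32")   # reference texts
faiss.normalize_L2(library)                 # inner product = cosine sim

index = faiss.IndexFlatIP(d)
index.add(library)

query = np.random.rand(1, d).astype("float32")          # submitted homework
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)        # top-5 most similar sources
flagged = scores[0] > 0.9                   # hypothetical plagiarism cutoff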
Updated: 2024-04-01 12:20:34
Categories: cs.CL,cs.AI,cs.IR
Energy Model-based Accurate Shapley Value Estimation for Interpretable Deep Learning Predictive Modelling
As a favorable tool for explainable artificial intelligence (XAI), the Shapley value has been widely used to interpret deep learning based predictive models. However, accurate and efficient estimation of Shapley values is a difficult task, since the computational load grows exponentially with the number of input features. Most existing accelerated Shapley value estimation methods have to trade estimation accuracy for efficiency. In this article, we present EmSHAP (Energy model-based Shapley value estimation), which can effectively approximate the expectation of the Shapley contribution function (the deep learning model) under an arbitrary subset of features, given the rest. In order to determine the proposal conditional distribution in the energy model, a gated recurrent unit (GRU) is introduced that maps the input features onto a hidden space, so that the impact of input feature orderings can be eliminated. In addition, a dynamic masking scheme is proposed to improve the generalization ability. It is proved in Theorems 1, 2 and 3 that EmSHAP achieves a tighter error bound than state-of-the-art methods like KernelSHAP and VAEAC, leading to higher estimation accuracy. Finally, case studies on a medical application and an industrial application show that the proposed Shapley value-based explainable framework exhibits enhanced estimation accuracy without compromising efficiency.
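For contrast with EmSHAP, the following is a plain Monte Carlo permutation estimator of Shapley values for a black-box model — the kind of baseline whose exponential exact cost motivates accelerated estimators. Mean imputation of absent features is a common simplification assumed here for illustration.

```python
# Sketch: permutation-sampling Shapley values for a black-box model f.
# Absent features are imputed with a background mean (a common simplification).
import numpy as np

def shapley_mc(f, x, background, n_perm=200, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    d = x.shape[0]
    base = background.mean(axis=0)          # fill-in for "absent" features
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z = base.copy()
        prev = f(z[None, :])[0]
        for j in order:                     # add features one by one
            z[j] = x[j]
            cur = f(z[None, :])[0]
            phi[j] += cur - prev            # marginal contribution of feature j
            prev = cur
    return phi / n_perm

# Sanity check on a toy linear model: estimates approximate w * (x - mean).
w = np.array([1.0, -2.0, 0.5])
f = lambda X: X @ w
bg = np.random.default_rng(1).normal(size=(100, 3))
print(shapley_mc(f, np.array([1.0, 1.0, 1.0]), bg))
```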
Updated: 2024-04-01 12:19:33
标题: 基于能量模型的准确Shapley值估计,用于解释性深度学习预测建模
摘要: 作为可解释人工智能(XAI)的有利工具,Shapley值已被广泛用于解释基于深度学习的预测模型。然而,准确和高效地估计Shapley值是一项困难的任务,因为随着输入特征的增加,计算负载呈指数增长。大多数现有的加速Shapley值估计方法不得不在估计精度和效率之间进行妥协。本文提出了EmSHAP(基于能量模型的Shapley值估计),可以有效地近似给定其余特征集的Shapley贡献函数/深度学习模型的期望。为了确定能量模型中的提议条件分布,引入了门控循环单元(GRU),通过将输入特征映射到隐藏空间,从而消除输入特征排序的影响。此外,提出了一种动态掩模方案以提高泛化能力。在定理1、2和3中证明了EmSHAP比KernelSHAP和VAEAC等最先进方法实现了更紧密的误差界限,从而实现更高的估计精度。最后,对医疗应用和工业应用进行的案例研究表明,所提出的基于Shapley值的可解释框架在不牺牲效率的情况下表现出增强的估计精度。
更新时间: 2024-04-01 12:19:33
领域: cs.LG
Prompt Learning for Oriented Power Transmission Tower Detection in High-Resolution SAR Images
Detecting transmission towers from synthetic aperture radar (SAR) images remains a challenging task due to the comparatively small size and side-looking geometry, with background clutter interference frequently hindering tower identification. A large number of interfering signals superimpose on the return signal from the tower. We found that localizing or prompting positions of power transmission towers is beneficial for addressing this obstacle. Based on this revelation, this paper introduces prompt learning into the oriented object detector (P2Det) for multimodal information learning. P2Det contains sparse prompt coding and cross-attention between the multimodal data. Specifically, a sparse prompt encoder (SPE) is proposed to represent point locations, converting prompts into sparse embeddings. The image embeddings are generated through Transformer layers. Then a two-way fusion module (TWFM) is proposed to calculate the cross-attention of the two different embeddings. The interaction of image-level and prompt-level features is utilized to address the clutter interference. A shape-adaptive refinement module (SARM) is proposed to reduce the effect of aspect ratio. Extensive experiments demonstrated the effectiveness of the proposed model on high-resolution SAR images. P2Det provides a novel insight for multimodal object detection due to its competitive performance.
Updated: 2024-04-01 12:16:00
标题: 高分辨率SAR图像中定向电力输电塔检测的提示学习
摘要: 从合成孔径雷达(SAR)图像中检测输电塔仍然是一个具有挑战性的任务,这是由于其相对较小的尺寸和侧视几何结构,背景杂波干扰经常会阻碍塔的识别。大量的干扰信号叠加在塔的返回信号上。我们发现,定位或提示电力输电塔的位置有助于解决这一障碍。基于这一发现,本文将提示学习引入定向物体检测器(P2Det)中进行多模态信息学习。P2Det包含稀疏提示编码和多模态数据之间的交叉注意力。具体而言,提出了稀疏提示编码器(SPE)来表示点位置,将提示转换为稀疏嵌入。图像嵌入是通过Transformer层生成的。然后提出了一个双向融合模块(TWFM)来计算两种不同嵌入的交叉注意力。利用图像级和提示级特征的相互作用来应对杂波干扰。提出了一个形状自适应细化模块(SARM)来减少长宽比的影响。大量实验证明了所提出模型在高分辨率SAR图像上的有效性。P2Det通过其竞争性能提供了一种多模态目标检测的新视角。
更新时间: 2024-04-01 12:16:00
领域: cs.CV,cs.LG
GI-PIP: Do We Require Impractical Auxiliary Dataset for Gradient Inversion Attacks?
Deep gradient inversion attacks pose a serious threat to Federated Learning (FL) by accurately recovering private data from shared gradients. However, the state of the art heavily relies on impractical assumptions to access excessive auxiliary data, which violates the basic data partitioning principle of FL. In this paper, a novel method, Gradient Inversion Attack using Practical Image Prior (GI-PIP), is proposed under a revised threat model. GI-PIP exploits anomaly detection models to capture the underlying distribution from less data, while GAN-based methods consume significantly more data to synthesize images. The extracted distribution is then leveraged to regulate the attack process as an Anomaly Score loss. Experimental results show that GI-PIP achieves a 16.12 dB PSNR recovery using only 3.8% of the ImageNet data, while GAN-based methods necessitate over 70%. Moreover, GI-PIP exhibits superior distribution generalization compared to GAN-based methods. Our approach significantly alleviates the auxiliary data requirement, in both amount and distribution, in gradient inversion attacks, hence posing a more substantial threat to real-world FL.
Updated: 2024-04-01 12:15:44
标题: GI-PIP:我们在梯度反转攻击中需要不切实际的辅助数据集吗?
摘要: 深度梯度反演攻击能够从共享梯度中准确恢复私人数据,对联邦学习(Federated Learning, FL)构成严重威胁。然而,现有最先进的方法依赖于获取过多辅助数据这一不切实际的假设,违反了FL的基本数据分区原则。本文在一个修订的威胁模型下提出了一种新颖的方法,即使用实用图像先验的梯度反演攻击(GI-PIP)。GI-PIP利用异常检测模型从较少的数据中捕捉潜在分布,而基于GAN的方法则需要消耗多得多的数据来合成图像。提取出的分布随后以异常分数损失的形式被用来调节攻击过程。实验结果表明,GI-PIP仅使用ImageNet的3.8%数据就能实现16.12 dB的PSNR恢复,而基于GAN的方法则需要超过70%的数据。此外,与基于GAN的方法相比,GI-PIP表现出更好的分布泛化能力。我们的方法显著降低了梯度反演攻击在数量和分布两方面对辅助数据的要求,因此对现实世界的FL构成更大的威胁。
更新时间: 2024-04-01 12:15:44
领域: cs.CR,cs.AI,cs.LG
Advancing AI with Integrity: Ethical Challenges and Solutions in Neural Machine Translation
This paper addresses the ethical challenges of Artificial Intelligence in Neural Machine Translation (NMT) systems, emphasizing the imperative for developers to ensure fairness and cultural sensitivity. We investigate the ethical competence of AI models in NMT, examining the ethical considerations at each stage of NMT development, including data handling, privacy, data ownership, and consent. We identify and address ethical issues through empirical studies, which include employing Transformer models for Luganda-English translations and enhancing efficiency with sentence mini-batching, as well as complementary studies that refine data labeling techniques and fine-tune BERT and Longformer models for analyzing Luganda and English social media content. Our second approach is a literature review of databases such as Google Scholar and platforms like GitHub. Additionally, the paper probes the distribution of responsibility between AI systems and humans, underscoring the essential role of human oversight in upholding NMT ethical standards. Incorporating a biblical perspective, we discuss the societal impact of NMT and the broader ethical responsibilities of developers, positing them as stewards accountable for the societal repercussions of their creations.
Updated: 2024-04-01 12:03:35
标题: 以诚信推进人工智能:神经机器翻译中的道德挑战和解决方案
摘要: 本文讨论了人工智能在神经机器翻译(NMT)系统中的道德挑战,强调开发者确保公平性和文化敏感性的必要性。我们调查了NMT中AI模型的道德素养,审查了NMT开发的每个阶段的道德考虑,包括数据处理、隐私、数据所有权和同意。我们通过实证研究确定和解决了道德问题。这些问题包括使用Transformer模型进行卢干达语-英语翻译,并通过句子小批量处理提高效率。以及通过细化数据标记技术和微调BERT和Longformer模型分析卢干达语和英语社交媒体内容的互补研究。我们的第二个方法是从Google Scholar等数据库和GitHub等平台进行文献综述。此外,本文探讨了AI系统和人类之间责任的分配,强调人类监督在维护NMT道德标准中的重要作用。结合圣经的观点,我们讨论了NMT的社会影响以及开发者更广泛的道德责任,将他们定位为对其创造的社会后果负有责任的管理者。
更新时间: 2024-04-01 12:03:35
领域: cs.CL,cs.AI
Towards White Box Deep Learning
This paper introduces semantic features as a candidate conceptual framework for white-box neural networks. A proof-of-concept model for an informative subproblem of MNIST consists of four such layers with a total of 5K learnable parameters. The model is well-motivated, inherently interpretable, requires little hyperparameter tuning and achieves almost human-level adversarial test metrics - with no form of adversarial training! These results and the general nature of the approach warrant further research on semantic features. The code is available at https://github.com/314-Foundation/white-box-nn
Updated: 2024-04-01 12:03:24
标题: 朝着白盒深度学习前进
摘要: 本文介绍了语义特征作为白盒神经网络的候选概念框架。一个针对MNIST信息性子问题的概念验证模型由4个这样的层组成,总共有5K个可学习参数。该模型动机充分,固有可解释,几乎不需要超参数调整,并且在没有任何形式的对抗训练的情况下达到了接近人类水平的对抗测试指标!这些结果以及该方法的普适性表明语义特征值得进一步研究。代码可在 https://github.com/314-Foundation/white-box-nn 获取。
更新时间: 2024-04-01 12:03:24
领域: cs.LG,cs.AI,cs.NE
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Without proper safeguards, large language models will readily follow malicious instructions and generate toxic content. This risk motivates safety efforts such as red-teaming and large-scale feedback learning, which aim to make models both helpful and harmless. However, there is a tension between these two objectives, since harmlessness requires models to refuse to comply with unsafe prompts, and thus not be helpful. Recent anecdotal evidence suggests that some models may have struck a poor balance, so that even clearly safe prompts are refused if they use similar language to unsafe prompts or mention sensitive topics. In this paper, we introduce a new test suite called XSTest to identify such eXaggerated Safety behaviours in a systematic way. XSTest comprises 250 safe prompts across ten prompt types that well-calibrated models should not refuse to comply with, and 200 unsafe prompts as contrasts that models, for most applications, should refuse. We describe XSTest's creation and composition, and then use the test suite to highlight systematic failure modes in state-of-the-art language models as well as more general challenges in building safer language models.
Updated: 2024-04-01 11:50:35
标题: XSTest:用于识别大型语言模型中夸大安全行为的测试套件
摘要: 没有适当的保障措施,大型语言模型很容易遵循恶意指令并生成有毒内容。这种风险促使安全工作,如红队测试和大规模反馈学习,旨在使模型既有用又无害。然而,这两个目标之间存在紧张关系,因为无害性要求模型拒绝遵循不安全的提示,因此不会有帮助。最近的一些个例证据表明,一些模型可能达到了一个不良的平衡,以至于即使是明显安全的提示,如果使用类似不安全提示的语言或提及敏感话题也会被拒绝。在本文中,我们引入了一个名为XSTest的新测试套件,以系统化地识别这种夸张的安全行为。XSTest包括250个安全提示,涵盖十种提示类型,对于良好校准的模型来说,不应该拒绝遵从,并且包括200个不安全提示作为对比,对于大多数应用程序来说,模型应该拒绝。我们描述了XSTest的创建和组成,然后使用测试套件来凸显最先进的语言模型中的系统性失败模式,以及构建更安全语言模型时所面临的更一般挑战。
更新时间: 2024-04-01 11:50:35
领域: cs.CL,cs.AI
A comparison of Single- and Double-generator formalisms for Thermodynamics-Informed Neural Networks
The development of inductive biases has been shown to be a very effective way to increase the accuracy and robustness of neural networks, particularly when they are used to predict physical phenomena. These biases significantly increase the certainty of predictions, decrease the error made and allow considerably smaller datasets to be used. There are a multitude of methods in the literature to develop these biases. One of the most effective ways, when dealing with physical phenomena, is to introduce physical principles of recognised validity into the network architecture. The problem becomes more complex without knowledge of the physical principles governing the phenomena under study. A very interesting possibility then is to turn to the principles of thermodynamics, which are universally valid, regardless of the level of abstraction of the description sought for the phenomenon under study. To ensure compliance with the principles of thermodynamics, there are formulations that have a long tradition in many branches of science. In the field of rheology, for example, two main types of formalisms are used to ensure compliance with these principles: one-generator and two-generator formalisms. In this paper we study the advantages and disadvantages of each, using classical problems with known solutions and synthetic data.
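For concreteness, the two families of formalisms contrasted above can be written in their standard textbook forms; these generic equations are background, not necessarily the exact formulation used in the paper.

```latex
% Single-generator form: dynamics driven by one potential, e.g. the energy E,
% through an operator L(z) (purely reversible when L is skew-symmetric)
\frac{\mathrm{d}z}{\mathrm{d}t} = L(z)\,\frac{\partial E}{\partial z}

% Two-generator (GENERIC) form: a reversible part generated by the energy E
% plus an irreversible part generated by the entropy S
\frac{\mathrm{d}z}{\mathrm{d}t}
  = L(z)\,\frac{\partial E}{\partial z} + M(z)\,\frac{\partial S}{\partial z},
\qquad
L(z)\,\frac{\partial S}{\partial z} = 0,
\quad
M(z)\,\frac{\partial E}{\partial z} = 0
% with L skew-symmetric and M symmetric positive semi-definite; the two
% degeneracy conditions enforce energy conservation and non-negative entropy
% production, which is what a thermodynamics-informed network must respect.
```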
Updated: 2024-04-01 11:48:03
标题: 热力学信息神经网络中单生成器和双生成器形式体系的比较
摘要: 研究表明,引入归纳偏差是增加神经网络准确性和稳健性的一种非常有效的方法,特别是在预测物理现象时。这些偏差显著增加了预测的确定性,减少了错误,并允许使用相对较小的数据集。 文献中有许多方法可以开发这些偏差。在处理物理现象时,其中一种最有效的方法是将被公认为有效性的物理原理引入网络架构中。 在没有了解所研究现象的物理原理的情况下,问题变得更加复杂。一个非常有趣的可能性是转向热力学原理,这些原理是普遍有效的,无论所研究现象描述的抽象级别如何。 为确保符合热力学原理,有许多在许多科学领域中具有悠久传统的表述。例如,在流变学领域,有两种主要类型的形式主义用于确保符合这些原理:单生成器和双生成器形式主义。在本文中,我们使用具有已知解决方案和合成数据的经典问题来研究每种形式主义的优缺点。
更新时间: 2024-04-01 11:48:03
领域: cs.LG,I.2.6, K.3.2
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
With the remarkable advent of text-to-image diffusion models, image editing methods have become more diverse and continue to evolve. A promising recent approach in this realm is Delta Denoising Score (DDS) - an image editing technique based on the Score Distillation Sampling (SDS) framework that leverages the rich generative prior of text-to-image diffusion models. However, relying solely on the difference between scoring functions is insufficient for preserving specific structural elements from the original image, a crucial aspect of image editing. To address this, here we present an embarrassingly simple yet very powerful modification of DDS, called Contrastive Denoising Score (CDS), for latent diffusion models (LDM). Inspired by the similarities and differences between DDS and contrastive learning for unpaired image-to-image translation (CUT), we introduce a straightforward approach using the CUT loss within the DDS framework. Rather than employing auxiliary networks as in the original CUT approach, we leverage the intermediate features of the LDM, specifically those from the self-attention layers, which possess rich spatial information. Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output while maintaining content controllability. Qualitative results and comparisons demonstrate the effectiveness of our proposed method. Project page: https://hyelinnam.github.io/CDS/
Updated: 2024-04-01 11:44:25
标题: 文本引导的潜在扩散图像编辑的对比去噪分数
摘要: 随着文本到图像扩散模型的显著出现,图像编辑方法变得更加多样化并不断发展。在这个领域中一个有前途的最近方法是Delta Denoising Score (DDS) - 一种基于Score Distillation Sampling (SDS)框架的图像编辑技术,利用了文本到图像扩散模型的丰富生成先验。然而,仅依赖于评分函数之间的差异不足以保留原始图像的特定结构元素,而这是图像编辑的一个关键方面。为了解决这个问题,我们在这里提出了一种极其简单却非常强大的DDS改进,称为Contrastive Denoising Score (CDS),用于潜在扩散模型(LDM)。受到DDS与非配对图像到图像翻译的对比学习(CUT)之间异同的启发,我们介绍了一种在DDS框架内使用CUT损失的简单方法。与原始CUT方法中使用辅助网络不同,我们利用LDM的中间特征,特别是来自自注意力层的特征,这些特征具有丰富的空间信息。我们的方法实现了零样本图像到图像翻译和神经辐射场(NeRF)编辑,在实现输入和输出之间结构对应的同时保持内容可控性。定性结果和比较展示了我们提出方法的有效性。项目页面:https://hyelinnam.github.io/CDS/
更新时间: 2024-04-01 11:44:25
领域: cs.CV,cs.AI,cs.LG
Harnessing Data and Physics for Deep Learning Phase Recovery
Phase recovery, calculating the phase of a light wave from its intensity measurements, is essential for various applications, such as coherent diffraction imaging, adaptive optics, and biomedical imaging. It enables the reconstruction of an object's refractive index distribution or topography as well as the correction of imaging system aberrations. In recent years, deep learning has been proven to be highly effective in addressing phase recovery problems. Two main deep learning phase recovery strategies are data-driven (DD) with supervised learning mode and physics-driven (PD) with self-supervised learning mode. DD and PD achieve the same goal in different ways and lack the necessary study to reveal similarities and differences. Therefore, in this paper, we comprehensively compare these two deep learning phase recovery strategies in terms of time consumption, accuracy, generalization ability, ill-posedness adaptability, and prior capacity. What's more, we propose a co-driven (CD) strategy of combining datasets and physics for the balance of high- and low-frequency information. The codes for DD, PD, and CD are publicly available at https://github.com/kqwang/DLPR.
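A hedged sketch of the physics-driven (PD) strategy described above: the network is trained with a measurement-consistency loss and no ground-truth phase labels. The Fourier-magnitude forward model, the toy network, and the data below are illustrative stand-ins for the true optical system.

```python
# Sketch: self-supervised (physics-driven) phase recovery.
# The forward model maps a phase estimate to an intensity measurement;
# here it is Fourier-magnitude-squared, a stand-in for the real optics.
import torch

def forward_model(phase):                       # |F{exp(i*phase)}|^2
    field = torch.exp(1j * phase)
    return torch.abs(torch.fft.fft2(field, norm="ortho")) ** 2

net = torch.nn.Sequential(                      # toy phase-predicting network
    torch.nn.Conv2d(1, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

measured = forward_model(torch.rand(1, 1, 32, 32) * 2 * torch.pi)  # fake data

for step in range(100):
    phase_hat = net(measured)                   # predict phase from intensity
    # Measurement consistency replaces labels: re-simulate and compare.
    loss = torch.mean((forward_model(phase_hat) - measured) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
```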
Updated: 2024-04-01 11:42:43
标题: 利用数据和物理学进行深度学习相位恢复
摘要: 相位恢复是从光强度测量中计算出光波的相位,对于各种应用非常重要,如相干衍射成像、自适应光学和生物医学成像。它可以重建物体的折射率分布或地形,以及校正成像系统的像差。近年来,深度学习已被证明在解决相位恢复问题方面非常有效。两种主要的深度学习相位恢复策略是基于数据驱动的(DD)使用监督学习模式和基于物理驱动的(PD)使用自监督学习模式。DD和PD以不同的方式实现相同的目标,但缺乏必要的研究来揭示相似性和差异。因此,在本文中,我们全面比较这两种深度学习相位恢复策略在时间消耗、准确性、泛化能力、不适定性适应性和先验容量方面的差异。此外,我们提出了一种结合数据集和物理知识以平衡高低频信息的共驱动(CD)策略。DD、PD和CD的代码可在https://github.com/kqwang/DLPR 上公开获取。
更新时间: 2024-04-01 11:42:43
领域: eess.IV,cs.LG,physics.optics
A Novel Audio Representation for Music Genre Identification in MIR
For Music Information Retrieval (MIR) downstream tasks, the most common audio representation is time-frequency-based, such as Mel spectrograms. This study explores the possibilities of a new form of audio representation for one of the most common MIR downstream tasks: identifying musical genres. To this end, music is discretely encoded using deep vector quantization, yielding a novel audio representation created for the innovative generative music model Jukebox. The effectiveness of Jukebox's audio representation is compared to Mel spectrograms using a dataset that is almost equivalent to the State-of-the-Art (SOTA) one and an almost identical transformer design. The results of this study imply that, at least when the transformers are pretrained using a very modest dataset of 20k tracks, Jukebox's audio representation is not superior to Mel spectrograms. This could be explained by the fact that Jukebox's audio representation does not sufficiently take into account the peculiarities of human auditory perception. On the other hand, Mel spectrograms are specifically created with the human auditory sense in mind.
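For reference, the Mel spectrogram baseline discussed above takes only a few lines with librosa; the test tone and the parameter values here are illustrative defaults, not the study's settings.

```python
# Sketch: compute a Mel spectrogram, the baseline audio representation above.
import librosa
import numpy as np

sr = 22050
t = np.linspace(0, 2.0, int(2.0 * sr), endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440 * t).astype(np.float32)   # 440 Hz test tone

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)   # log scale, as usually fed to models
print(mel_db.shape)                             # (n_mels, n_frames)
```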
Updated: 2024-04-01 11:40:09
标题: 一种用于音乐信息检索中音乐流派识别的新型音频表示
摘要: 在音乐信息检索(MIR)下游任务中,最常见的音频表示是基于时频的,例如Mel频谱图。本研究针对最常见的MIR下游任务之一——音乐流派识别,探讨了一种新形式音频表示的可能性。为此,使用深度向量量化对音乐进行离散编码,得到了为创新的生成音乐模型Jukebox创建的一种新型音频表示。我们使用一个几乎等同于最先进技术(SOTA)的数据集和几乎相同的Transformer设计,比较了Jukebox的音频表示与Mel频谱图的有效性。本研究的结果表明,至少在Transformer使用一个非常小的数据集(2万条音轨)进行预训练时,Jukebox的音频表示并不优于Mel频谱图。这可能是因为Jukebox的音频表示没有充分考虑人类听觉感知的特殊性。另一方面,Mel频谱图正是专门针对人类听觉感知而设计的。
更新时间: 2024-04-01 11:40:09
领域: cs.SD,cs.IR,cs.LG,eess.AS
Unlocking Emergent Modularity in Large Language Models
Modular Neural Networks (MNNs) demonstrate various advantages over monolithic models. Existing MNNs are generally $\textit{explicit}$: their modular architectures are pre-defined, with individual modules expected to implement distinct functions. Recent works reveal that there exists $\textit{implicit}$ modularity in standard pre-trained transformers, namely $\textit{Emergent Modularity}$. They indicate that such modular structures spontaneously exhibit during the early pre-training phase. Despite the benefits of modularity, most Language Models (LMs) are still treated as monolithic models in the pre-train and fine-tune paradigm, with their emergent modularity locked and underutilized. In this work, focusing on unlocking the emergent modularity in LMs, we showcase that standard LMs could be fine-tuned as their Mixture-of-Expert (MoEs) counterparts without introducing any extra parameters. Such MoEs are derived from emergent modularity and are referred to as Emergent MoEs (EMoE). Our experiments demonstrate that fine-tuning EMoE effectively improves downstream in-domain and out-of-domain generalization compared with vanilla fine-tuning. Our analysis and ablation studies further illustrate that it is robust to various configurations and can scale up to Large Language Models (i.e., Llama2-7B and Llama-30B). Code is available at https://github.com/qiuzh20/EMoE.
Updated: 2024-04-01 11:37:39
标题: 解锁大型语言模型中的新兴模块化特性
摘要: 模块化神经网络(MNNs)展示了相对于单体模型的各种优势。现有的MNNs通常是明确的:它们的模块化架构是预先定义的,预期各个模块实现不同的功能。最近的研究揭示了在标准预训练transformers中存在隐含的模块化,即 Emergent Modularity。它们表明这种模块化结构在早期的预训练阶段会自发地展现出来。尽管模块化的好处,大多数语言模型(LMs)在预训练和微调范式中仍被视为单体模型,它们的 emergent modularity 被锁定且未被充分利用。在这项工作中,我们专注于解锁LMs中的 emergent modularity,展示了标准LMs可以像其Mixture-of-Expert(MoEs)对应物一样进行微调,而不引入任何额外的参数。这些MoEs源自 emergent modularity,并被称为 Emergent MoEs(EMoE)。我们的实验表明,相比于传统的微调,微调EMoE可以有效提高领域内外的泛化能力。我们的分析和消融研究进一步说明了它对各种配置都是稳健的,并且可以扩展到大型语言模型(即Llama2-7B和Llama-30B)。代码可在https://github.com/qiuzh20/EMoE找到。
更新时间: 2024-04-01 11:37:39
领域: cs.LG,cs.AI
Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment
Best-of-N (BoN) sampling with a reward model has been shown to be an effective strategy for aligning Large Language Models (LLMs) to human preferences at the time of decoding. BoN sampling is susceptible to a problem known as reward hacking. Because the reward model is an imperfect proxy for the true objective, over-optimizing its value can compromise its performance on the true objective. A common solution to prevent reward hacking in preference learning techniques is to optimize a reward using proximity regularization (e.g., KL regularization), which ensures that the language model remains close to the reference model. In this research, we propose Regularized Best-of-N (RBoN), a variant of BoN that aims to mitigate reward hacking by incorporating a proximity term in response selection, similar to preference learning techniques. We evaluate two variants of RBoN on the AlpacaFarm dataset and find that they outperform BoN, especially when the proxy reward model has a low correlation with the true objective.
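A hedged sketch of the proximity-regularized selection rule described above: among N sampled responses, choose the one maximizing the proxy reward plus a scaled proximity term toward the reference policy. The length-normalized reference log-likelihood used here as the proximity term, and the weight beta, are illustrative simplifications rather than the paper's exact variants.

```python
# Sketch: Best-of-N vs. a proximity-regularized Best-of-N selection.
# reward and ref_logprob are placeholders for a reward model's score and the
# reference policy's (length-normalized) log-likelihood of a response.
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    reward: float        # proxy reward model score
    ref_logprob: float   # log p_ref(y | x), length-normalized

def best_of_n(cands):
    return max(cands, key=lambda c: c.reward)

def regularized_best_of_n(cands, beta=0.5):
    # Penalize responses the reference model finds unlikely: this keeps the
    # selected output close to the reference policy and resists reward hacking.
    return max(cands, key=lambda c: c.reward + beta * c.ref_logprob)

cands = [
    Candidate("terse but safe answer", reward=0.62, ref_logprob=-1.1),
    Candidate("reward-hacky exploit",  reward=0.90, ref_logprob=-6.3),
]
print(best_of_n(cands).text)                 # picks the exploit
print(regularized_best_of_n(cands).text)     # picks the safer answer
```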
Updated: 2024-04-01 11:26:50
标题: 正则化最佳N采样以减轻语言模型对齐中的奖励欺骗
摘要: 最佳N(BoN)采样与奖励模型已被证明是一种有效的策略,用于在解码时将大型语言模型(LLMs)与人类偏好对齐。BoN采样容易受到一种称为奖励欺骗的问题的影响。由于奖励模型是真实目标的不完美代理,过度优化其值可能会影响其在真实目标上的性能。在偏好学习技术中防止奖励欺骗的常见解决方案是使用接近正则化(例如KL正则化)优化奖励,从而确保语言模型保持接近参考模型。在这项研究中,我们提出了一种名为正则化最佳N(RBoN)的BoN变体,旨在通过在响应选择中结合接近项来减轻奖励欺骗,类似于偏好学习技术。我们在AlpacaFarm数据集上评估了两种RBoN变体,并发现它们的性能优于BoN,特别是当代理奖励模型与真实目标的相关性较低时。
更新时间: 2024-04-01 11:26:50
领域: cs.CL,cs.AI
Social Dynamics of Consumer Response: A Unified Framework Integrating Statistical Physics and Marketing Dynamics
Comprehending how consumers react to advertising inputs is essential for marketers aiming to optimize advertising strategies and improve campaign effectiveness. This study examines the complex nature of consumer behaviour by applying theoretical frameworks derived from physics and social psychology. We present an innovative equation that captures the relation between spending on advertising and consumer response, using concepts such as symmetries, scaling laws, and phase transitions. By validating our equation against well-known models such as the Michaelis-Menten and Hill equations, we prove its effectiveness in accurately representing the complexity of consumer response dynamics. The analysis emphasizes the importance of key model parameters, such as marketing effectiveness, response sensitivity, and behavioural sensitivity, in influencing consumer behaviour. The work explores the practical implications for advertisers and marketers, as well as discussing the limitations and future research directions. In summary, this study provides a thorough framework for comprehending and forecasting consumer reactions to advertising, which has implications for optimizing advertising strategies and allocating resources.
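For reference, the two benchmark models named above have the following standard forms, where R is the consumer response and S the advertising spend (textbook versions; the paper's own equation generalizes beyond these):

```latex
% Michaelis-Menten-type saturating response
R(S) = \frac{R_{\max}\, S}{K + S}

% Hill equation: a cooperativity exponent n produces an S-shaped response
R(S) = \frac{R_{\max}\, S^{\,n}}{K^{\,n} + S^{\,n}}
```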
Updated: 2024-04-01 11:23:31
标题: 消费者反应的社会动态:统一框架集成统计物理学和市场动态
摘要: 理解消费者对广告输入的反应对于旨在优化广告策略并提高广告效果的营销人员至关重要。本研究通过应用从物理学和社会心理学中得出的理论框架来研究消费者行为的复杂性。我们提出了一个创新的方程,捕捉了广告支出与消费者反应之间的关系,使用了诸如对称性、比例律和相变等概念。通过将我们的方程与著名模型(如迈克尔斯-门特恩方程和希尔方程)进行验证,我们证明了它在准确表示消费者反应动态的复杂性方面的有效性。分析强调了关键模型参数的重要性,例如营销效果、反应敏感性和行为敏感性,这些因素影响了消费者行为。该工作探讨了对广告商和营销人员的实际影响,同时讨论了局限性和未来研究方向。总之,本研究为理解和预测消费者对广告的反应提供了一个全面的框架,这对于优化广告策略和分配资源具有重要意义。
更新时间: 2024-04-01 11:23:31
领域: physics.soc-ph,cs.LG,q-fin.GN
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, causing global latent map alterations. This results in imprecise preservation of the original content and unsuccessful editing due to gradient vanishing. In contrast, we present DragNoise, offering robust and accelerated editing without retracing the latent map. The core rationale of DragNoise lies in utilizing the predicted noise output of each U-Net as a semantic editor. This approach is grounded in two critical observations: firstly, the bottleneck features of U-Net inherently possess semantically rich features ideal for interactive editing; secondly, high-level semantics, established early in the denoising process, show minimal variation in subsequent stages. Leveraging these insights, DragNoise edits diffusion semantics in a single denoising step and efficiently propagates these changes, ensuring stability and efficiency in diffusion editing. Comparative experiments reveal that DragNoise achieves superior control and semantic retention, reducing the optimization time by over 50% compared to DragDiffusion. Our codes are available at https://github.com/haofengl/DragNoise.
Updated: 2024-04-01 11:09:40
标题: 拖动您的噪音:通过扩散语义传播进行交互式基于点的编辑
摘要: 基于点的交互式编辑是一种重要的工具,用来补充现有生成模型的可控性。与之同时进行的工作DragDiffusion根据用户输入更新扩散潜在图,导致全局潜在图的改变。这导致原始内容的不精确保留和由于梯度消失而失败的编辑。相比之下,我们提出了DragNoise,提供了稳健且加速的编辑,而无需重新跟踪潜在图。DragNoise的核心理念在于利用每个U-Net预测的噪声输出作为语义编辑器。这种方法基于两个关键观察:首先,U-Net的瓶颈特征本身具有丰富的语义特征,非常适合交互式编辑;其次,在去噪过程中早期建立的高级语义在随后的阶段中变化很小。利用这些观察结果,DragNoise在单个去噪步骤中编辑扩散语义,并有效传播这些变化,确保扩散编辑的稳定性和效率。比较实验表明,与DragDiffusion相比,DragNoise实现了更好的控制和语义保留,将优化时间缩短了超过50%。我们的代码可在https://github.com/haofengl/DragNoise找到。
更新时间: 2024-04-01 11:09:40
领域: cs.CV,cs.GR,cs.HC,cs.LG
A Novel Sector-Based Algorithm for an Optimized Star-Galaxy Classification
This paper introduces a novel sector-based methodology for star-galaxy classification, leveraging the latest Sloan Digital Sky Survey data (SDSS-DR18). By strategically segmenting the sky into sectors aligned with SDSS observational patterns and employing a dedicated convolutional neural network (CNN), we achieve state-of-the-art performance for star-galaxy classification. Our preliminary results demonstrate a promising pathway for efficient and precise astronomical analysis, especially in real-time observational settings.
Updated: 2024-04-01 11:08:53
标题: 一种用于优化恒星-星系分类的新型基于扇区的算法
摘要: 本文介绍了一种新颖的基于扇区的恒星-星系分类方法,利用了最新的斯隆数字巡天数据(SDSS-DR18)。通过将天空战略性地分割成与SDSS观测模式对齐的扇区,并使用专用的卷积神经网络(CNN),我们在恒星-星系分类上实现了最先进的性能。我们的初步结果展示了一条在实时观测环境中进行高效、精确天文分析的有前途的途径。
更新时间: 2024-04-01 11:08:53
领域: astro-ph.IM,cs.LG
A Survey on Hypergraph Neural Networks: An In-Depth and Step-By-Step Guide
Higher-order interactions (HOIs) are ubiquitous in real-world complex systems and applications, and thus investigation of deep learning for HOIs has become a valuable agenda for the data mining and machine learning communities. As networks of HOIs are expressed mathematically as hypergraphs, hypergraph neural networks (HNNs) have emerged as a powerful tool for representation learning on hypergraphs. Given the emerging trend, we present the first survey dedicated to HNNs, with an in-depth and step-by-step guide. Broadly, the present survey overviews HNN architectures, training strategies, and applications. First, we break existing HNNs down into four design components: (i) input features, (ii) input structures, (iii) message-passing schemes, and (iv) training strategies. Second, we examine how HNNs address and learn HOIs with each of their components. Third, we overview the recent applications of HNNs in recommendation, biological and medical science, time series analysis, and computer vision. Lastly, we conclude with a discussion on limitations and future directions.
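As one concrete instance of the message-passing component surveyed above, the widely used HGNN-style hypergraph convolution can be written directly from the node-hyperedge incidence matrix; this generic sketch is background for the survey, not a summary of any single method in it.

```python
# Sketch: one HGNN-style hypergraph convolution layer,
#   X' = ReLU(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta),
# where H is the node-hyperedge incidence matrix and W holds edge weights.
import numpy as np

def hypergraph_conv(X, H, Theta, w=None):
    n, m = H.shape
    w = np.ones(m) if w is None else w
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(H @ w))   # weighted vertex degrees
    De_inv = np.diag(1.0 / H.sum(axis=0))         # hyperedge degrees
    A = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(A @ X @ Theta, 0.0)         # ReLU nonlinearity

# Three nodes, two hyperedges: e0 = {0, 1}, e1 = {0, 1, 2}.
H = np.array([[1, 1],
              [1, 1],
              [0, 1]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                       # node features
Theta = rng.normal(size=(4, 8))                   # learnable weights
print(hypergraph_conv(X, H, Theta).shape)         # (3, 8)
```

The node-to-hyperedge-to-node aggregation in A is exactly what lets a single layer mix information across all members of a higher-order interaction, rather than only across pairwise edges.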
Updated: 2024-04-01 10:50:34
标题: 一份关于超图神经网络的调查:一份深入和逐步指南
摘要: 高阶相互作用(HOIs)在现实世界复杂系统和应用中普遍存在,因此深度学习对于HOIs的研究已成为数据挖掘和机器学习社区的宝贵议程。由于HOIs网络在数学上被表达为超图,超图神经网络(HNNs)已经成为在超图上表示学习的强大工具。鉴于这一新兴趋势,我们提出了第一份专门致力于HNNs的调查报告,具有深入和逐步指导。总体而言,这份调查概述了HNNs的体系结构、训练策略和应用。首先,我们将现有的HNNs分解为四个设计组件:(i)输入特征,(ii)输入结构,(iii)消息传递方案,以及(iv)训练策略。其次,我们检查HNNs如何通过每个组件解决和学习HOIs。第三,我们概述了HNNs在推荐、生物和医学科学、时间序列分析以及计算机视觉中的最新应用。最后,我们总结了关于限制和未来方向的讨论。
更新时间: 2024-04-01 10:50:34
领域: cs.LG
Higher education assessment practice in the era of generative AI tools
The higher education (HE) sector benefits every nation's economy and society at large. However, their contributions are challenged by advanced technologies like generative artificial intelligence (GenAI) tools. In this paper, we provide a comprehensive assessment of GenAI tools towards assessment and pedagogic practice and, subsequently, discuss the potential impacts. This study experimented using three assessment instruments from data science, data analytics, and construction management disciplines. Our findings are two-fold: first, the findings revealed that GenAI tools exhibit subject knowledge, problem-solving, analytical, critical thinking, and presentation skills and thus can limit learning when used unethically. Secondly, the design of the assessment of certain disciplines revealed the limitations of the GenAI tools. Based on our findings, we made recommendations on how AI tools can be utilised for teaching and learning in HE.
Updated: 2024-04-01 10:43:50
标题: 生成式人工智能工具时代的高等教育评估实践
摘要: Higher education (HE) sector plays a crucial role in the economy and society of every nation. However, advancements in technologies such as generative artificial intelligence (GenAI) tools pose challenges to their contributions. This paper offers a thorough evaluation of GenAI tools in assessment and pedagogic practices, discussing their potential impacts. The study conducted experiments using assessment instruments from data science, data analytics, and construction management disciplines. The results indicate that GenAI tools demonstrate subject knowledge, problem-solving, analytical, critical thinking, and presentation skills, but can hinder learning when used unethically. Additionally, the assessment design of certain disciplines revealed limitations of GenAI tools. Recommendations are provided on how AI tools can be effectively utilized for teaching and learning in higher education based on the findings.
更新时间: 2024-04-01 10:43:50
领域: cs.IR,cs.AI,cs.CV,cs.LG,I.2.7; I.2.10; H.3.3
Observation-Guided Diffusion Probabilistic Models
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM), which effectively addresses the tradeoff between quality control and fast sampling. Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain in a principled way. This is achieved by introducing an additional loss term derived from the observation based on a conditional discriminator on noise level, which employs a Bernoulli distribution indicating whether its input lies on the (noisy) real manifold or not. This strategy allows us to optimize the more accurate negative log-likelihood induced in the inference stage especially when the number of function evaluations is limited. The proposed training scheme is also advantageous even when incorporated only into the fine-tuning process, and it is compatible with various fast inference strategies since our method yields better denoising networks using the exactly the same inference procedure without incurring extra computational cost. We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines. Our implementation is available at https://github.com/Junoh-Kang/OGDM_edm.
Updated: 2024-04-01 10:42:02
标题: 观测引导的扩散概率模型
摘要: 我们提出了一种新颖的基于扩散的图像生成方法,称为观察引导的扩散概率模型(OGDM),有效地解决了质量控制和快速抽样之间的权衡。我们的方法通过以原则性方式将观察过程的指导与马尔可夫链集成起来,重新建立了训练目标。这是通过引入一个基于条件鉴别器的附加损失项来实现的,该鉴别器基于噪声水平使用伯努利分布,指示其输入是否位于(嘈杂的)真实流形上。这种策略使我们能够在推理阶段优化更准确的负对数似然,特别是当函数评估次数有限时。提出的训练方案即使仅用于微调过程,也具有优势,并且与各种快速推理策略兼容,因为我们的方法在不产生额外计算成本的情况下使用完全相同的推理过程产生更好的去噪网络。我们使用强扩散模型基线展示了我们的训练算法的有效性。我们的实现可在https://github.com/Junoh-Kang/OGDM_edm上找到。
更新时间: 2024-04-01 10:42:02
领域: cs.LG,cs.AI
Communication-Efficient Federated Learning with Accelerated Client Gradient
Federated learning often suffers from slow and unstable convergence due to the heterogeneous characteristics of participating client datasets. Such a tendency is aggravated when the client participation ratio is low since the information collected from the clients has large variations. To address this challenge, we propose a simple but effective federated learning framework, which improves the consistency across clients and facilitates the convergence of the server model. This is achieved by making the server broadcast a global model with a lookahead gradient. This strategy enables the proposed approach to convey the projected global update information to participants effectively without additional client memory and extra communication costs. We also regularize local updates by aligning each client with the overshot global model to reduce bias and improve the stability of our algorithm. We provide the theoretical convergence rate of our algorithm and demonstrate remarkable performance gains in terms of accuracy and communication efficiency compared to the state-of-the-art methods, especially with low client participation rates. The source code is available at our project page.
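A hedged sketch of the round structure described above: the server broadcasts a momentum-lookahead model, and each client adds a proximal term pulling its local model toward that broadcast. The function names, the lookahead weight lam, and the proximal weight mu are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch: one federated round with a lookahead-broadcast server model.
import copy
import torch

def server_broadcast(global_model, momentum, lam=0.85):
    # Send w + lam * m: the projected ("overshot") global update direction.
    sent = copy.deepcopy(global_model)
    with torch.no_grad():
        for p, m in zip(sent.parameters(), momentum):
            p.add_(lam * m)
    return sent

def client_update(sent_model, data_loader, loss_fn, mu=0.01, lr=0.1, epochs=1):
    local = copy.deepcopy(sent_model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in data_loader:
            loss = loss_fn(local(x), y)
            # Proximal term keeps the client aligned with the broadcast model,
            # reducing client drift under heterogeneous data.
            prox = sum((p - q.detach()).pow(2).sum()
                       for p, q in zip(local.parameters(),
                                       sent_model.parameters()))
            (loss + 0.5 * mu * prox).backward()
            opt.step(); opt.zero_grad()
    return local

# Toy usage: one server -> client round on a linear model.
model = torch.nn.Linear(5, 1)
momentum = [torch.zeros_like(p) for p in model.parameters()]
sent = server_broadcast(model, momentum)
loader = [(torch.randn(8, 5), torch.randn(8, 1))]
updated = client_update(sent, loader, torch.nn.functional.mse_loss)
```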
Updated: 2024-04-01 10:37:53
标题: 具有加速客户端梯度的通信高效联邦学习
摘要: 联邦学习通常由于参与客户端数据集的异质性特征而导致收敛速度慢且不稳定。当客户参与比例较低时,这种趋势会加剧,因为从客户端收集的信息具有较大的变化。为了解决这一挑战,我们提出了一个简单但有效的联邦学习框架,该框架改善了客户端之间的一致性,并促进了服务器模型的收敛。通过让服务器广播具有前瞻梯度的全局模型,实现了这一目标。这种策略使得所提出的方法能够有效地向参与者传达预期的全局更新信息,而无需额外的客户端内存和额外的通信成本。我们还通过将每个客户端与前瞻(overshoot)后的全局模型对齐来正则化本地更新,以减少偏差并提高算法的稳定性。我们提供了算法的理论收敛速度,并展示了在准确性和通信效率方面相对于最先进的方法的显著性能提升,尤其在客户参与率较低的情况下。源代码可以在我们的项目页面上找到。
更新时间: 2024-04-01 10:37:53
领域: cs.LG,cs.AI
Parallel Proportional Fusion of Spiking Quantum Neural Network for Optimizing Image Classification
The recent emergence of the hybrid quantum-classical neural network (HQCNN) architecture has garnered considerable attention due to the potential advantages associated with integrating quantum principles to enhance various facets of machine learning algorithms and computations. However, the current investigated serial structure of HQCNN, wherein information sequentially passes from one network to another, often imposes limitations on the trainability and expressivity of the network. In this study, we introduce a novel architecture termed Parallel Proportional Fusion of Quantum and Spiking Neural Networks (PPF-QSNN). The dataset information is simultaneously fed into both the spiking neural network and the variational quantum circuits, with the outputs amalgamated in proportion to their individual contributions. We systematically assess the impact of diverse PPF-QSNN parameters on network performance for image classification, aiming to identify the optimal configuration. Numerical results on the MNIST dataset unequivocally illustrate that our proposed PPF-QSNN outperforms both the existing spiking neural network and the serial quantum neural network across metrics such as accuracy, loss, and robustness. This study introduces a novel and effective amalgamation approach for HQCNN, thereby laying the groundwork for the advancement and application of quantum advantage in artificial intelligent computations.
Updated: 2024-04-01 10:35:35
标题: 平行比例融合的脉冲量子神经网络用于优化图像分类
摘要: 最近出现的混合量子-经典神经网络(HQCNN)架构引起了广泛关注,因为将量子原理整合进来有望在多个方面增强机器学习算法和计算。然而,目前研究的HQCNN串行结构——信息依次从一个网络传递到另一个网络——往往会限制网络的可训练性和表达能力。在本研究中,我们引入了一种新颖的架构,称为脉冲神经网络与量子神经网络的并行比例融合(PPF-QSNN)。数据集信息同时输入到脉冲神经网络和变分量子电路中,输出按照各自贡献的比例合并。我们系统地评估了不同PPF-QSNN参数对图像分类网络性能的影响,旨在确定最佳配置。在MNIST数据集上的数值结果明确表明,我们提出的PPF-QSNN在准确率、损失和鲁棒性等指标上均优于现有的脉冲神经网络和串行量子神经网络。这项研究为HQCNN引入了一种新颖而有效的融合方法,从而为量子优势在人工智能计算中的发展和应用奠定了基础。
更新时间: 2024-04-01 10:35:35
领域: quant-ph,cs.AI,cs.NE
Evaluating Fair Feature Selection in Machine Learning for Healthcare
With the universal adoption of machine learning in healthcare, the potential for the automation of societal biases to further exacerbate health disparities poses a significant risk. We explore algorithmic fairness from the perspective of feature selection. Traditional feature selection methods identify features for better decision making by removing resource-intensive, correlated, or non-relevant features but overlook how these factors may differ across subgroups. To counter these issues, we evaluate a fair feature selection method that considers equal importance to all demographic groups. We jointly considered a fairness metric and an error metric within the feature selection process to ensure a balance between minimizing both bias and global classification error. We tested our approach on three publicly available healthcare datasets. On all three datasets, we observed improvements in fairness metrics coupled with a minimal degradation of balanced accuracy. Our approach addresses both distributive and procedural fairness within the fair machine learning context.
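A hedged sketch of the joint criterion described above: score candidate feature subsets by a weighted sum of classification error and a group-fairness gap, then select greedily. Demographic parity difference and the 50/50 weighting are one concrete illustrative choice, not necessarily the metrics used in the paper.

```python
# Sketch: greedy feature selection balancing error and a fairness gap.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def dp_gap(y_pred, group):
    # Demographic parity difference between the two groups.
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def subset_score(X, y, g, feats, w=0.5):
    Xtr, Xte, ytr, yte, gtr, gte = train_test_split(
        X[:, feats], y, g, test_size=0.3, random_state=0)
    pred = LogisticRegression(max_iter=1000).fit(Xtr, ytr).predict(Xte)
    err = (pred != yte).mean()
    return w * err + (1 - w) * dp_gap(pred, gte)      # lower is better

def greedy_fair_selection(X, y, g, k=3):
    chosen = []
    while len(chosen) < k:
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        best = min(rest, key=lambda j: subset_score(X, y, g, chosen + [j]))
        chosen.append(best)
    return chosen

# Toy data: 8 features, binary label, binary sensitive attribute g.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
g = rng.integers(0, 2, 500)
y = (X[:, 0] + 0.5 * g + rng.normal(scale=0.5, size=500) > 0).astype(int)
print(greedy_fair_selection(X, y, g))
```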
Updated: 2024-04-01 10:20:09
标题: 评估机器学习中用于医疗保健的公平特征选择
摘要: 随着机器学习在医疗保健领域的普遍采用,自动化社会偏见进一步加剧健康差距的潜力构成了重大风险。我们从特征选择的角度探讨算法公平性。传统的特征选择方法通过去除资源密集、相关或不相关的特征,识别出更好的决策特征,但忽视了这些因素在不同子群体中可能存在差异。为了解决这些问题,我们评估了一种考虑给所有人口群体平等重要性的公平特征选择方法。在特征选择过程中,我们同时考虑了公平度量和误差度量,以确保在最小化偏见和全局分类错误之间取得平衡。我们在三个公开可用的医疗保健数据集上测试了我们的方法。在所有三个数据集上,我们观察到公平性度量的改善,同时平衡准确度仅有轻微下降。我们的方法在公平机器学习背景下解决了分配和程序公平性问题。
更新时间: 2024-04-01 10:20:09
领域: cs.LG,cs.CY
Utilizing AI and Social Media Analytics to Discover Adverse Side Effects of GLP-1 Receptor Agonists
Adverse side effects (ASEs) of drugs, revealed after FDA approval, pose a threat to patient safety. To promptly detect overlooked ASEs, we developed a digital health methodology capable of analyzing massive public data from social media, published clinical research, manufacturers' reports, and ChatGPT. We uncovered ASEs associated with the glucagon-like peptide 1 receptor agonists (GLP-1 RA), a market expected to grow exponentially to $133.5 billion USD by 2030. Using a Named Entity Recognition (NER) model, our method successfully detected 21 potential ASEs overlooked upon FDA approval, including irritability and numbness. Our data-analytic approach revolutionizes the detection of unreported ASEs associated with newly deployed drugs, leveraging cutting-edge AI-driven social media analytics. It can increase the safety of new drugs in the marketplace by unlocking the power of social media to support regulators and manufacturers in the rapid discovery of hidden ASE risks.
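A hedged sketch of the NER step described above using the Hugging Face pipeline API. The generic CoNLL-trained checkpoint named here is a placeholder; detecting adverse side effects in practice would require a domain-tuned model, as the study's own pipeline presumably uses.

```python
# Sketch: extract entity candidates from social-media text with a NER model.
from transformers import pipeline

# Placeholder checkpoint: a real ASE system would swap in a model fine-tuned
# on adverse-event annotations rather than this general-purpose NER model.
ner = pipeline("ner", model="dslim/bert-base-NER",
               aggregation_strategy="simple")

posts = [
    "Been on Ozempic for a month, dealing with constant numbness in my hands.",
    "Semaglutide works but the irritability is rough.",
]

for post in posts:
    for ent in ner(post):
        # Each entity comes with a span, a label, and a confidence score.
        print(ent["word"], ent["entity_group"], round(ent["score"], 3))
```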
Updated: 2024-04-01 09:48:14
标题: 利用人工智能和社交媒体分析发现GLP-1受体激动剂的不良副作用
摘要: 药物的不良副作用(ASEs)在FDA批准后才被揭示,对患者安全构成威胁。为了及时发现被忽视的ASEs,我们开发了一种数字健康方法,能够分析来自社交媒体、已发表的临床研究、制造商报告和ChatGPT的大量公开数据。我们发现了与胰高血糖素样肽-1受体激动剂(GLP-1 RA)相关的ASEs,该市场预计到2030年将呈指数级增长至1335亿美元。使用命名实体识别(NER)模型,我们的方法成功检测到21种在FDA批准时被忽视的潜在ASEs,包括易怒和麻木。我们的数据分析方法利用最先进的人工智能驱动的社交媒体分析,革新了与新上市药物相关的未报告ASEs的检测。它可以通过释放社交媒体的力量,支持监管机构和制造商快速发现隐藏的ASE风险,从而提高市场上新药的安全性。
更新时间: 2024-04-01 09:48:14
领域: q-bio.QM,cs.AI,cs.CL,cs.IR,cs.LG,cs.SI,62
Source-Aware Training Enables Knowledge Attribution in Language Models
Large language models (LLMs) learn a vast amount of knowledge during pretraining, but they are often oblivious to the source(s) of such knowledge. We investigate the problem of intrinsic source citation, where LLMs are required to cite the pretraining source supporting a generated response. Intrinsic source citation can enhance LLM transparency, interpretability, and verifiability. To give LLMs such ability, we explore source-aware training -- a post pretraining recipe that involves (i) training the LLM to associate unique source document identifiers with the knowledge in each document, followed by (ii) an instruction-tuning to teach the LLM to cite a supporting pretraining source when prompted. Source-aware training can easily be applied to pretrained LLMs off the shelf, and diverges minimally from existing pretraining/fine-tuning frameworks. Through experiments on carefully curated data, we demonstrate that our training recipe can enable faithful attribution to the pretraining data without a substantial impact on the model's quality compared to standard pretraining. Our results also highlight the importance of data augmentation in achieving attribution.
Updated: 2024-04-01 09:39:38
标题: 源感知训练使语言模型中的知识归因成为可能
摘要: 大型语言模型(LLMs)在预训练期间学习了大量知识,但它们通常对这些知识的来源毫不知情。我们调查了内在来源引用的问题,其中LLMs被要求引用支持生成响应的预训练来源。内在来源引用可以增强LLM的透明度、可解释性和可验证性。为了赋予LLMs这种能力,我们探索了源感知训练——这是一种在预训练后进行的配方,涉及(i)训练LLM将唯一的源文档标识符与每个文档中的知识相关联,然后(ii)进行指令调整,教导LLM在提示时引用支持的预训练来源。源感知训练可以轻松应用于现成的预训练LLMs,并且与现有的预训练/微调框架几乎没有偏差。通过对精心策划的数据进行实验,我们证明我们的训练配方可以实现对预训练数据的忠实归因,而与标准预训练相比对模型质量的影响不大。我们的结果还强调了数据增强在实现归因方面的重要性。
更新时间: 2024-04-01 09:39:38
领域: cs.CL,cs.AI
Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Anthropic Prior Knowledge
Teeth localization, segmentation, and labeling in 2D images have great potential in modern dentistry to enhance dental diagnostics, treatment planning, and population-based studies on oral health. However, general instance segmentation frameworks are incompetent due to 1) the subtle differences between some teeth' shapes (e.g., maxillary first premolar and second premolar), 2) the teeth's position and shape variation across subjects, and 3) the presence of abnormalities in the dentition (e.g., caries and edentulism). To address these problems, we propose a ViT-based framework named TeethSEG, which consists of stacked Multi-Scale Aggregation (MSA) blocks and an Anthropic Prior Knowledge (APK) layer. Specifically, to compose the two modules, we design 1) a unique permutation-based upscaler to ensure high efficiency while establishing clear segmentation boundaries with 2) multi-head self/cross-gating layers to emphasize particular semantics meanwhile maintaining the divergence between token embeddings. Besides, we collect 3) the first open-sourced intraoral image dataset IO150K, which comprises over 150k intraoral photos, and all photos are annotated by orthodontists using a human-machine hybrid algorithm. Experiments on IO150K demonstrate that our TeethSEG outperforms the state-of-the-art segmentation models on dental image segmentation.
Updated: 2024-04-01 09:34:51
标题: Teeth-SEG:基于人类先验知识的牙齿正畸治疗高效实例分割框架
摘要: 牙齿在2D图像中的定位、分割和标记在现代牙科学中具有巨大潜力,可以提高牙科诊断、治疗规划和口腔健康的基于人口的研究。然而,由于一些牙齿形状之间的微妙差异(例如,上颌第一前磨牙和第二前磨牙)、牙齿在不同受试者之间的位置和形状变化以及牙齿异常(例如,龋齿和缺失),通用实例分割框架无法胜任。为了解决这些问题,我们提出了一种基于ViT的框架,命名为TeethSEG,它由堆叠的多尺度聚合(MSA)块和人类先验知识(APK)层组成。具体来说,为了组成这两个模块,我们设计了一种独特的基于排列的放大器,以确保高效性同时建立清晰的分割边界,并使用多头自/交叉门控层来强调特定语义,同时保持令牌嵌入之间的差异。此外,我们收集了第一个开源的口腔图像数据集IO150K,包括超过15万张口腔照片,并且所有照片都由正畸医生使用人机混合算法进行了注释。对IO150K的实验证明,我们的TeethSEG在牙科图像分割上优于最先进的分割模型。
更新时间: 2024-04-01 09:34:51
领域: cs.CV,cs.AI
Query Performance Prediction using Relevance Judgments Generated by Large Language Models
Query performance prediction (QPP) aims to estimate the retrieval quality of a search system for a query without human relevance judgments. Previous QPP methods typically return a single scalar value and do not require the predicted values to approximate a specific information retrieval (IR) evaluation measure, leading to certain drawbacks: (i) a single scalar is insufficient to accurately represent different IR evaluation measures, especially when metrics do not highly correlate, and (ii) a single scalar limits the interpretability of QPP methods because solely using a scalar is insufficient to explain QPP results. To address these issues, we propose a QPP framework using automatically generated relevance judgments (QPP-GenRE), which decomposes QPP into independent subtasks of judging the relevance of each item in a ranked list to a given query. This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels; Also, this allows us to interpret predicted IR evaluation measures, and identify, track and rectify errors in generated relevance judgments to improve QPP quality. We judge relevance by leveraging a leading open-source large language model (LLM), LLaMA, to ensure scientific reproducibility. In doing so, we address two main challenges: (i) excessive computational costs of judging the entire corpus for predicting a recall-based metric, and (ii) poor performance in prompting LLaMA in a zero-/few-shot manner. We devise an approximation strategy to predict a recall-oriented IR measure and propose to fine-tune LLaMA using human-labeled relevance judgments. Experiments on the TREC 2019-2022 deep learning tracks show that QPP-GenRE achieves state-of-the-art QPP accuracy for both lexical and neural rankers in both precision- and recall-oriented metrics.
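To make the decomposition above concrete, here is a hedged sketch of turning per-item relevance judgments into a predicted precision-oriented measure; the keyword-overlap judge() below is a stub standing in for prompting an LLM per (query, document) pair.

```python
# Sketch: predict an IR measure from item-level relevance judgments.
def judge(query: str, doc: str) -> int:
    # Placeholder: a real system would prompt a (fine-tuned) LLM here and
    # parse its relevance verdict; keyword overlap merely keeps this runnable.
    return int(any(tok in doc.lower() for tok in query.lower().split()))

def predicted_precision_at_k(query, ranked_docs, k=3):
    labels = [judge(query, d) for d in ranked_docs[:k]]   # pseudo-labels
    return sum(labels) / k

ranked = [
    "Conformal prediction yields calibrated intervals.",
    "A recipe for sourdough bread.",
    "Calibration of prediction intervals in regression.",
]
print(predicted_precision_at_k("calibrated prediction intervals", ranked))
```

Because the judgments are per item, any ranking-based measure can be assembled from them, and an erroneous judgment can be inspected and corrected individually, which is the interpretability benefit claimed above.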
Updated: 2024-04-01 09:33:05
标题: 使用大型语言模型生成的相关性判断进行查询性能预测
摘要: 查询性能预测(QPP)旨在估计搜索系统对查询的检索质量,而无需人工相关性判断。先前的QPP方法通常返回单个标量值,并不要求预测值近似特定信息检索(IR)评估指标,导致某些缺点:(i)单个标量无法准确表示不同的IR评估指标,尤其是当指标之间相关性不高时,(ii)单个标量限制了QPP方法的可解释性,因为仅使用标量无法解释QPP结果。为了解决这些问题,我们提出了一个使用自动生成的相关性判断(QPP-GenRE)的QPP框架,将QPP分解为独立的子任务,即判断排名列表中每个项目与给定查询的相关性。这使我们能够使用生成的相关性判断作为伪标签来预测任何IR评估指标;同时,这使我们能够解释预测的IR评估指标,并识别、跟踪和纠正生成的相关性判断中的错误,以改进QPP质量。我们利用一种领先的开源大型语言模型(LLM)LLaMA来判断相关性,以确保科学再现性。通过这样做,我们解决了两个主要挑战:(i)判断整个语料库以预测基于召回率的指标的过度计算成本,以及(ii)在零/少次提示LLaMA时性能不佳。我们设计了一种近似策略来预测基于召回率的IR指标,并建议使用人工标记的相关性判断来对LLaMA进行微调。在TREC 2019-2022深度学习轨迹上的实验表明,QPP-GenRE在精度和召回率导向的指标上均实现了词法和神经排序器的最先进QPP准确性。
更新时间: 2024-04-01 09:33:05
领域: cs.IR,cs.AI,cs.CL,cs.LG,H.3.3
Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution
Diffusion models, as a kind of powerful generative model, have given impressive results on image super-resolution (SR) tasks. However, due to the randomness introduced in the reverse process of diffusion models, the performances of diffusion-based SR models are fluctuating at every time of sampling, especially for samplers with few resampled steps. This inherent randomness of diffusion models results in ineffectiveness and instability, making it challenging for users to guarantee the quality of SR results. However, our work takes this randomness as an opportunity: fully analyzing and leveraging it leads to the construction of an effective plug-and-play sampling method that owns the potential to benefit a series of diffusion-based SR methods. More in detail, we propose to steadily sample high-quality SR images from pre-trained diffusion-based SR models by solving diffusion ordinary differential equations (diffusion ODEs) with optimal boundary conditions (BCs) and analyze the characteristics between the choices of BCs and their corresponding SR results. Our analysis shows the route to obtain an approximately optimal BC via an efficient exploration in the whole space. The quality of SR results sampled by the proposed method with fewer steps outperforms the quality of results sampled by current methods with randomness from the same pre-trained diffusion-based SR model, which means that our sampling method "boosts" current diffusion-based SR models without any additional training.
Updated: 2024-04-01 09:29:49
标题: 使用最佳边界条件解决扩散ODE以实现更好的图像超分辨率
摘要: 扩散模型作为一种强大的生成模型,在图像超分辨率(SR)任务中取得了令人印象深刻的结果。然而,由于扩散模型反向过程中引入的随机性,基于扩散的SR模型的性能在每次抽样时波动较大,特别是对于具有少量重采样步骤的采样器。扩散模型固有的随机性导致了其无效性和不稳定性,使用户难以保证SR结果的质量。然而,我们的工作将这种随机性视为机会:全面分析和利用它导致了构建了一种有效的即插即用抽样方法,该方法有潜力使一系列基于扩散的SR方法受益。更具体地说,我们提出通过求解带有最佳边界条件(BCs)的扩散常微分方程(扩散ODEs)稳定地从预训练的基于扩散的SR模型中抽样高质量的SR图像,并分析BCs的选择与其相应的SR结果之间的特征。我们的分析显示了通过在整个空间中进行高效探索获得近似最佳BC的方法。所提出的方法抽取的SR结果质量比当前方法从相同预训练的基于扩散的SR模型中抽样的具有随机性的结果质量更高,这意味着我们的抽样方法“增强”了当前的基于扩散的SR模型,而无需进行额外的训练。
更新时间: 2024-04-01 09:29:49
领域: eess.IV,cs.CV,cs.LG
The Double-Edged Sword of Input Perturbations to Robust Accurate Fairness
Deep neural networks (DNNs) are known to be sensitive to adversarial input perturbations, leading to a reduction in either prediction accuracy or individual fairness. To jointly characterize the susceptibility of prediction accuracy and individual fairness to adversarial perturbations, we introduce a novel robustness definition termed robust accurate fairness. Informally, robust accurate fairness requires that predictions for an instance and its similar counterparts consistently align with the ground truth when subjected to input perturbations. We propose an adversarial attack approach dubbed RAFair to expose false or biased adversarial defects in DNN, which either deceive accuracy or compromise individual fairness. Then, we show that such adversarial instances can be effectively addressed by carefully designed benign perturbations, correcting their predictions to be accurate and fair. Our work explores the double-edged sword of input perturbations to robust accurate fairness in DNN and the potential of using benign perturbations to correct adversarial instances.
Updated: 2024-04-01 09:29:16
标题: 输入扰动对稳健准确公平性的双刃剑
摘要: 深度神经网络(DNNs)对对抗性输入扰动十分敏感,这会导致预测准确性或个体公平性下降。为了共同刻画预测准确性和个体公平性对对抗扰动的敏感性,我们引入了一个新颖的鲁棒性定义,称为鲁棒准确公平。简言之,鲁棒准确公平要求一个实例及其相似对应物在受到输入扰动时,其预测结果始终与真实情况一致。我们提出了一种称为RAFair的对抗攻击方法,用于暴露DNN中虚假或有偏的对抗缺陷,这些缺陷会欺骗准确性或损害个体公平性。然后,我们展示了这些对抗实例可以通过精心设计的良性扰动得到有效修正,使其预测结果变得准确且公平。我们的工作探讨了输入扰动对DNN鲁棒准确公平性的双刃剑作用,以及使用良性扰动纠正对抗实例的潜力。
更新时间: 2024-04-01 09:29:16
领域: cs.LG,cs.AI,cs.CY
DiJiang: Efficient Large Language Models through Compact Kernelization
In an effort to reduce the computational load of Transformers, research on linear attention has gained significant momentum. However, the improvement strategies for attention mechanisms typically necessitate extensive retraining, which is impractical for large language models with a vast array of parameters. In this paper, we present DiJiang, a novel Frequency Domain Kernelization approach that enables the transformation of a pre-trained vanilla Transformer into a linear complexity model with little training costs. By employing a weighted Quasi-Monte Carlo method for sampling, the proposed approach theoretically offers superior approximation efficiency. To further reduce the training computational complexity, our kernelization is based on Discrete Cosine Transform (DCT) operations. Extensive experiments demonstrate that the proposed method achieves comparable performance to the original Transformer, but with significantly reduced training costs and much faster inference speeds. Our DiJiang-7B achieves comparable performance with LLaMA2-7B on various benchmark while requires only about 1/50 training cost. Code is available at https://github.com/YuchuanTian/DiJiang.
Updated: 2024-04-01 09:17:01
标题: DiJiang:通过紧凑核化实现高效的大型语言模型
摘要: 为了减少Transformer的计算负担,线性注意力的研究获得了显著的动力。然而,注意机制的改进策略通常需要大量的重新训练,这对于具有大量参数的大型语言模型来说是不切实际的。在本文中,我们提出了DiJiang,一种新颖的频域核化方法,可以将预训练的普通Transformer转化为具有较小训练成本的线性复杂度模型。通过采用加权拟蒙特卡洛方法进行采样,所提出的方法在理论上提供了优越的逼近效率。为了进一步减少训练计算复杂度,我们的核化基于离散余弦变换(DCT)操作。大量实验证明,所提出的方法实现了与原始Transformer相当的性能,但训练成本大大降低,并且推理速度更快。我们的DiJiang-7B在各种基准上实现了与LLaMA2-7B相当的性能,但只需要大约1/50的训练成本。代码可在https://github.com/YuchuanTian/DiJiang找到。
更新时间: 2024-04-01 09:17:01
领域: cs.CL,cs.LG
Time-aware Metapath Feature Augmentation for Ponzi Detection in Ethereum
With the development of Web 3.0 which emphasizes decentralization, blockchain technology ushers in its revolution and also brings numerous challenges, particularly in the field of cryptocurrency. Recently, a large number of criminal behaviors continuously emerge on blockchain, such as Ponzi schemes and phishing scams, which severely endanger decentralized finance. Existing graph-based abnormal behavior detection methods on blockchain usually focus on constructing homogeneous transaction graphs without distinguishing the heterogeneity of nodes and edges, resulting in partial loss of transaction pattern information. Although existing heterogeneous modeling methods can depict richer information through metapaths, the extracted metapaths generally neglect temporal dependencies between entities and do not reflect real behavior. In this paper, we introduce Time-aware Metapath Feature Augmentation (TMFAug) as a plug-and-play module to capture the real metapath-based transaction patterns during Ponzi scheme detection on Ethereum. The proposed module can be adaptively combined with existing graph-based Ponzi detection methods. Extensive experimental results show that our TMFAug can help existing Ponzi detection methods achieve significant performance improvements on the Ethereum dataset, indicating the effectiveness of heterogeneous temporal information for Ponzi scheme detection.
Updated: 2024-04-01 09:13:22
标题: 以太坊中用于庞氏骗局检测的基于时间感知的元路径特征增强
摘要: 随着强调去中心化的Web 3.0的发展,区块链技术引领其革命,同时也带来了许多挑战,特别是在加密货币领域。最近,大量犯罪行为不断出现在区块链上,如庞氏骗局和钓鱼诈骗,严重危害了去中心化金融。现有的基于图的异常行为检测方法通常侧重于构建同质交易图,而没有区分节点和边的异质性,导致部分交易模式信息的丢失。虽然现有的异质建模方法可以通过元路径描述更丰富的信息,但提取出的元路径通常忽略实体之间的时间依赖性,也不能反映真实的行为。在本文中,我们引入了Time-aware Metapath Feature Augmentation(TMFAug)作为一个即插即用的模块,用于在以太坊上检测庞氏骗局时捕捉真实的基于元路径的交易模式。所提出的模块可以与现有基于图的庞氏检测方法自适应地结合。大量实验结果表明,我们的TMFAug可以帮助现有的庞氏检测方法在以太坊数据集上取得显著的性能改进,表明异质时间信息对庞氏骗局检测的有效性。
更新时间: 2024-04-01 09:13:22
领域: cs.LG,q-fin.ST
Improved weight initialization for deep and narrow feedforward neural network
Appropriate weight initialization settings, along with the ReLU activation function, have become cornerstones of modern deep learning, enabling the training and deployment of highly effective and efficient neural network models across diverse areas of artificial intelligence. The problem of "dying ReLU", where ReLU neurons become inactive and yield zero output, presents a significant challenge in the training of deep neural networks with the ReLU activation function. Theoretical research and various methods have been introduced to address the problem. However, even with these methods and research, training remains challenging for extremely deep and narrow feedforward networks with the ReLU activation function. In this paper, we propose a novel weight initialization method to address this issue. We establish several properties of our initial weight matrix and demonstrate how these properties enable the effective propagation of signal vectors. Through a series of experiments and comparisons with existing methods, we demonstrate the effectiveness of the novel initialization method.
Updated: 2024-04-01 09:05:20
标题: 深度且窄的前馈神经网络的改进权重初始化
摘要: 适当的权重初始化设置,以及ReLU激活函数,已成为现代深度学习的基石,使得高效和有效的神经网络模型能够在人工智能的各个领域进行训练和部署。ReLU神经元变为不活跃并产生零输出的问题“死亡ReLU”在具有ReLU激活函数的深度神经网络的训练中构成了重大挑战。理论研究和各种方法已被引入以解决这个问题。然而,即使有了这些方法和研究,对于具有ReLU激活函数的极深和窄的前馈网络来说,训练仍然具有挑战性。在本文中,我们提出了一种新颖的权重初始化方法来解决这个问题。我们建立了初始权重矩阵的几个属性,并展示了这些属性如何使信号向量有效传播。通过一系列实验证明和与现有方法的比较,我们展示了新颖初始化方法的有效性。
更新时间: 2024-04-01 09:05:20
领域: cs.LG,cs.NE
LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation
Evaluating generated radiology reports is crucial for the development of radiology AI, but existing metrics fail to reflect the task's clinical requirements. This study proposes a novel evaluation framework using large language models (LLMs) to compare radiology reports for assessment. We compare the performance of various LLMs and demonstrate that, when using GPT-4, our proposed metric achieves evaluation consistency close to that of radiologists. Furthermore, to reduce costs and improve accessibility, making this method practical, we construct a dataset using LLM evaluation results and perform knowledge distillation to train a smaller model. The distilled model achieves evaluation capabilities comparable to GPT-4. Our framework and distilled model offer an accessible and efficient evaluation method for radiology report generation, facilitating the development of more clinically relevant models. The model will be further open-sourced and accessible.
Updated: 2024-04-01 09:02:12
标题: LLM-RadJudge:实现X射线报告生成的放射科医师级评估
摘要: 评估生成的放射学报告对于放射学人工智能的发展至关重要,但现有的指标未能反映任务的临床要求。本研究提出了一种新颖的评估框架,利用大型语言模型(LLMs)来比较放射学报告以进行评估。我们比较了各种LLMs的性能,并证明,当使用GPT-4时,我们提出的度量标准实现了与放射科医师评估一致性接近的结果。此外,为降低成本和提高可访问性,使这种方法实用化,我们使用LLM评估结果构建了一个数据集,并进行知识蒸馏来训练一个较小的模型。蒸馏模型实现了与GPT-4相媲美的评估能力。我们的框架和蒸馏模型提供了一种可访问且高效的放射学报告生成评估方法,促进了更具临床相关性模型的开发。该模型将进一步开源并可访问。
更新时间: 2024-04-01 09:02:12
领域: cs.CL,cs.AI
360+x: A Panoptic Multi-modal Scene Understanding Dataset
Human perception of the world is shaped by a multitude of viewpoints and modalities. While many existing datasets focus on scene understanding from a certain perspective (e.g. egocentric or third-person views), our dataset offers a panoptic perspective (i.e. multiple viewpoints with multiple data modalities). Specifically, we encapsulate third-person panoramic and front views, as well as egocentric monocular/binocular views with rich modalities including video, multi-channel audio, directional binaural delay, location data and textual scene descriptions within each scene captured, presenting comprehensive observation of the world. Figure 1 offers a glimpse of all 28 scene categories of our 360+x dataset. To the best of our knowledge, this is the first database that covers multiple viewpoints with multiple data modalities to mimic how daily information is accessed in the real world. Through our benchmark analysis, we presented 5 different scene understanding tasks on the proposed 360+x dataset to evaluate the impact and benefit of each data modality and perspective in panoptic scene understanding. We hope this unique dataset could broaden the scope of comprehensive scene understanding and encourage the community to approach these problems from more diverse perspectives.
Updated: 2024-04-01 08:34:42
标题: 360+x:一个全视角多模态场景理解数据集
摘要: 人类对世界的感知受到多种观点和模态的影响。虽然许多现有数据集侧重于从某种视角(例如自我中心或第三人称视角)理解场景,但我们的数据集提供了一个全景视角(即多个视角和多个数据模态)。具体而言,我们包含了第三人称全景和前视图,以及带有丰富模态的自我中心单眼/双眼视图,包括视频、多通道音频、定向双耳延迟、位置数据和文本场景描述在每个捕获的场景中,呈现了对世界的全面观察。图1展示了我们360+x数据集的所有28个场景类别。据我们所知,这是第一个涵盖多个视角和多个数据模态的数据库,以模仿在现实世界中访问日常信息的方式。通过我们的基准分析,我们在提议的360+x数据集上提出了5个不同的场景理解任务,以评估每个数据模态和视角在全景场景理解中的影响和益处。我们希望这个独特的数据集能够拓宽全面场景理解的范围,并鼓励社区从更多不同的视角解决这些问题。
更新时间: 2024-04-01 08:34:42
领域: cs.CV,cs.AI,cs.MM,cs.SD,eess.AS
Robustifying a Policy in Multi-Agent RL with Diverse Cooperative Behaviors and Adversarial Style Sampling for Assistive Tasks
Autonomous assistance of people with motor impairments is one of the most promising applications of autonomous robotic systems. Recent studies have reported encouraging results using deep reinforcement learning (RL) in the healthcare domain. Previous studies showed that assistive tasks can be formulated as multi-agent RL, wherein there are two agents: a caregiver and a care-receiver. However, policies trained in multi-agent RL are often sensitive to the policies of other agents. In such a case, a trained caregiver's policy may not work for different care-receivers. To alleviate this issue, we propose a framework that learns a robust caregiver's policy by training it for diverse care-receiver responses. In our framework, diverse care-receiver responses are autonomously learned through trials and errors. In addition, to robustify the care-giver's policy, we propose a strategy for sampling a care-receiver's response in an adversarial manner during the training. We evaluated the proposed method using tasks in an Assistive Gym. We demonstrate that policies trained with a popular deep RL method are vulnerable to changes in policies of other agents and that the proposed framework improves the robustness against such changes.
Updated: 2024-04-01 08:29:44
标题: 增强多智能体强化学习中的政策,通过多样化的合作行为和对抗风格采样来增强辅助任务
摘要: 自主协助运动受损人群是自主机器人系统最有前途的应用之一。最近的研究在医疗领域使用深度强化学习(RL)取得了令人鼓舞的结果。先前的研究表明,辅助任务可以被制定为多智能体RL,其中有两个智能体:护理人员和受护者。然而,在多智能体RL中训练的策略通常对其他智能体的策略敏感。在这种情况下,训练有素的护理人员策略可能不适用于不同的受护者。为了缓解这个问题,我们提出了一个框架,通过训练适应不同受护者反应的护理人员策略来学习一个强大的护理人员策略。在我们的框架中,不同受护者反应通过试错自主学习。此外,为了加强护理人员策略的鲁棒性,我们提出了一种在训练过程中以对抗方式采样受护者反应的策略。我们使用Assistive Gym中的任务评估了所提出的方法。我们展示了使用流行的深度RL方法训练的策略对其他智能体策略的变化是脆弱的,并且所提出的框架提高了对这种变化的鲁棒性。
更新时间: 2024-04-01 08:29:44
领域: cs.RO,cs.LG
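To make the adversarial style sampling described above concrete, here is a minimal sketch, assuming we already hold a pool of diverse care-receiver response styles and running estimates of the caregiver's return against each; the sampler then favors the styles the caregiver currently handles worst. The names and the return-update rule are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of adversarial style sampling (illustrative, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

num_styles = 4
# Running estimate of the caregiver's return against each care-receiver style.
returns = np.array([10.0, 6.0, 8.0, 3.0])
temperature = 2.0

def adversarial_style_probs(returns, temperature):
    """Put more probability on styles where the caregiver currently does worst."""
    logits = -returns / temperature          # low return -> high logit
    logits -= logits.max()                   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

for episode in range(5):
    probs = adversarial_style_probs(returns, temperature)
    style = rng.choice(num_styles, p=probs)
    # ... roll out the caregiver against care-receiver `style` and obtain
    # episode_return from the environment; here we fake it for the sketch:
    episode_return = returns[style] + rng.normal(scale=0.5)
    # Exponential moving average of the return estimate for that style.
    returns[style] = 0.9 * returns[style] + 0.1 * episode_return
    print(f"episode {episode}: trained against style {style}, probs={probs.round(2)}")
```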
Make Continual Learning Stronger via C-Flat
Model generalization ability upon incrementally acquiring dynamically updated knowledge from sequentially arriving tasks is crucial to tackling the sensitivity-stability dilemma in Continual Learning (CL). Minimizing the sharpness of the weight loss landscape, i.e., seeking flat minima that lie in neighborhoods with uniformly low loss or smooth gradients, has proven to be a strong training regime that improves model generalization compared with loss-minimization optimizers such as SGD. Yet only a few works have discussed this training regime for CL, showing that a dedicated zeroth-order sharpness optimizer can improve CL performance. In this work, we propose a Continual Flatness (C-Flat) method featuring a flatter loss landscape tailored for CL. C-Flat can be called with only one line of code and is plug-and-play with any CL method. A general framework of C-Flat applied to all CL categories and a thorough comparison with loss-minima optimizers and flat-minima-based CL approaches is presented in this paper, showing that our method can boost CL performance in almost all cases. Code will be publicly available upon publication.
Updated: 2024-04-01 08:18:38
标题: 通过C-Flat使持续学习更加强大
摘要: 随着从顺序到达的任务中不断获取动态更新知识,模型的泛化能力对于解决不断学习中的敏感性-稳定性困境至关重要。寻求平坦最小值的权重损失景观锐度最小化,位于具有均匀低损失或平滑梯度的邻域中的最小值,已被证明是一种强大的训练方案,与基于损失最小化的优化器(如SGD)相比,可以改善模型的泛化能力。然而,只有少数研究探讨了这种训练方案对于不断学习的影响,证明专门设计的零阶锐度优化器可以改善不断学习的性能。在这项工作中,我们提出了一种专为不断学习定制的更平坦的损失景观的Continual Flatness(C-Flat)方法。C-Flat只需一行代码即可轻松调用,并且可以与任何不断学习方法插入并使用。本文提出了适用于所有不断学习类别的C-Flat的一般框架,并与最小损失优化器和基于平坦最小值的不断学习方法进行了彻底比较,结果显示我们的方法几乎在所有情况下都可以提高不断学习的性能。代码将在发表后公开。
更新时间: 2024-04-01 08:18:38
领域: cs.LG,cs.CV
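The abstract does not spell out C-Flat's update rule, so the following is only a minimal sketch of the generic sharpness-aware (SAM-style) training step that flat-minima methods build on: ascend to a worst-case weight perturbation within a small ball, take the gradient there, then descend from the original weights.

```python
# A minimal SAM-style sharpness-aware update in PyTorch. This illustrates the
# generic flat-minima-seeking regime only; C-Flat's actual algorithm differs.
import torch

def sharpness_aware_step(model, loss_fn, x, y, base_opt, rho=0.05):
    # Step 1: gradient at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(g, alpha=rho / grad_norm)       # ascend to worst-case point
    base_opt.zero_grad()
    # Step 2: gradient at the perturbed point, then restore and descend.
    loss_adv = loss_fn(model(x), y)
    loss_adv.backward()
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(g, alpha=rho / grad_norm)       # undo the perturbation
    base_opt.step()                                 # descend using adv. gradient
    base_opt.zero_grad()
    return loss.item()

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
print(sharpness_aware_step(model, torch.nn.functional.cross_entropy, x, y, opt))
```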
Continual Learning for Smart City: A Survey
With the digitization of modern cities, large data volumes and powerful computational resources facilitate the rapid update of intelligent models deployed in smart cities. Continual learning (CL) is a novel machine learning paradigm that constantly updates models to adapt to changing environments, where the learning tasks, data, and distributions can vary over time. Our survey provides a comprehensive review of continual learning methods that are widely used in smart city development. The content consists of three parts: 1) Methodology-wise. We categorize a large number of basic CL methods and advanced CL frameworks in combination with other learning paradigms including graph learning, spatial-temporal learning, multi-modal learning, and federated learning. 2) Application-wise. We present numerous CL applications covering transportation, environment, public health, safety, networks, and associated datasets related to urban computing. 3) Challenges. We discuss current problems and challenges and envision several promising research directions. We believe this survey can help relevant researchers quickly familiarize themselves with the current state of continual learning research used in smart city development and direct them to future research trends.
Updated: 2024-04-01 07:59:29
标题: 智慧城市的持续学习:一项调查
摘要: 随着现代城市的数字化,大数据量和强大的计算资源促进了智能模型在智慧城市中的快速更新。持续学习(CL)是一种不断更新模型以适应不断变化环境的新颖机器学习范式,其中学习任务、数据和分布可能随时间变化。我们的调查全面审查了在智慧城市发展中广泛使用的持续学习方法。内容包括三个部分:1)方法学。我们对大量基本CL方法和结合其他学习范式的高级CL框架进行分类,包括图学习、时空学习、多模态学习和联邦学习。2)应用方面。我们展示了涵盖交通、环境、公共卫生、安全、网络以及与城市计算相关的众多CL应用和相关数据集。3)挑战。我们讨论当前问题和挑战,并展望几个具有前景的研究方向。我们相信这项调查可以帮助相关研究人员快速熟悉在智慧城市发展中使用的持续学习研究的当前状态,并引导他们关注未来的研究趋势。
更新时间: 2024-04-01 07:59:29
领域: cs.LG,cs.AI
Nonlinear Impulse Pattern Formulation dynamical social and political prediction algorithm for city planning and public participation
A nonlinear-dynamical algorithm for city planning is proposed as an Impulse Pattern Formulation (IPF) for predicting relevant parameters such as health, artistic freedom, or financial developments of different social or political stakeholders over the course of a planning process. The IPF has already shown high predictive precision at low computational cost in musical instrument simulations, brain dynamics, and human-human interactions. The social and political IPF consists of three basic equations of system state development, self-adaptation of stakeholders, two adaptive interactions, and external impact terms suitable for respective planning situations. Typical scenarios of stakeholder interactions and developments are modeled by adjusting a set of system parameters. These include stakeholder reaction to external input, enhanced system stability through self-adaptation, stakeholder convergence due to mediative interaction adaptation, as well as complex dynamics in terms of direct stakeholder impacts. A workflow for implementing the algorithm in real city planning scenarios is outlined. This workflow includes machine learning of a suitable set of parameters suggesting best-practice planning to aim at the desired development of the planning process and its output.
Updated: 2024-04-01 07:49:10
标题: 非线性冲动模式制定:城市规划和公众参与的动态社会政治预测算法
摘要: 提出了一种用于城市规划的非线性动力学算法,被称为脉冲模式制定(IPF),用于预测不同社会或政治利益相关方的健康、艺术自由或财务发展等相关参数。IPF已经在乐器模拟、脑动态和人际互动等领域表现出高精度的预测能力,且计算成本较低。社会和政治IPF由系统状态发展的三个基本方程、利益相关方的自适应、两种自适应相互作用和适用于相应规划情况的外部影响项组成。通过调整一组系统参数来建模利益相关方的互动和发展的典型场景。这些参数包括利益相关方对外部输入的反应、通过自适应增强系统稳定性、利益相关方因调解互动适应而趋于一致,以及直接利益相关方影响方面的复杂动态。提纲了在实际城市规划场景中实施算法的工作流程。此工作流程包括机器学习适当的参数集,以提出最佳实践规划,以实现规划过程及其输出的期望发展。
更新时间: 2024-04-01 07:49:10
领域: nlin.AO,cs.AI,math.DS
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Despite the recent advances of the artificial intelligence, building social intelligence remains a challenge. Among social signals, laughter is one of the distinctive expressions that occurs during social interactions between humans. In this work, we tackle a new challenge for machines to understand the rationale behind laughter in video, Video Laugh Reasoning. We introduce this new task to explain why people laugh in a particular video and a dataset for this task. Our proposed dataset, SMILE, comprises video clips and language descriptions of why people laugh. We propose a baseline by leveraging the reasoning capacity of large language models (LLMs) with textual video representation. Experiments show that our baseline can generate plausible explanations for laughter. We further investigate the scalability of our baseline by probing other video understanding tasks and in-the-wild videos. We release our dataset, code, and model checkpoints on https://github.com/postech-ami/SMILE-Dataset.
Updated: 2024-04-01 07:47:54
标题: SMILE:用于语言模型理解视频中笑声的多模态数据集
摘要: 尽管人工智能取得了近期的进展,但构建社交智能仍然是一个挑战。在社交信号中,笑声是人类社交互动中的一个独特表达。在这项工作中,我们应对了一个新的机器理解视频中笑声背后原因的挑战,即视频笑声推理。我们引入了这一新任务来解释为什么人们在特定视频中笑,并为此任务提供了一个数据集。我们提出的数据集SMILE由视频片段和人们笑的语言描述组成。我们通过利用大型语言模型(LLMs)的推理能力与文本视频表示提出了一个基线。实验表明,我们的基线可以生成笑声的合理解释。我们进一步调查了我们基线的可扩展性,通过探究其他视频理解任务和野外视频。我们在https://github.com/postech-ami/SMILE-Dataset上发布了我们的数据集、代码和模型检查点。
更新时间: 2024-04-01 07:47:54
领域: cs.CL,cs.AI
Efficiently Distilling LLMs for Edge Applications
Supernet training of LLMs is of great interest in industrial applications as it confers the ability to produce a palette of smaller models at constant cost, regardless of the number of models (of different size / latency) produced. We propose a new method called Multistage Low-rank Fine-tuning of Super-transformers (MLFS) for parameter-efficient supernet training. We show that it is possible to obtain high-quality encoder models that are suitable for commercial edge applications, and that while decoder-only models are resistant to a comparable degree of compression, decoders can be effectively sliced for a significant reduction in training time.
Updated: 2024-04-01 07:35:15
标题: 高效提炼边缘应用的LLMs
摘要: LLMs的超网络训练在工业应用中备受关注,因为它能够以恒定成本生成一系列不同大小/延迟的较小模型,而与所生成模型的数量无关。我们提出了一种名为Multistage Low-rank Fine-tuning of Super-transformers(MLFS)的新方法,用于参数高效的超网络训练。我们展示了可以获得适用于商业边缘应用的高质量编码器模型;虽然仅解码器模型难以被压缩到相当的程度,但可以对解码器进行有效切片,从而显著减少训练时间。
更新时间: 2024-04-01 07:35:15
领域: cs.LG,cs.AI,cs.CL
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation
The rise of Large Language Models (LLMs) has significantly advanced many applications on software engineering tasks, particularly in code generation. Despite the promising performance, LLMs are prone to generate hallucinations, which means LLMs might produce outputs that deviate from users' intent, exhibit internal inconsistencies, or misalign with the factual knowledge, making the deployment of LLMs potentially risky in a wide range of applications. Existing work mainly focuses on investigating hallucination in the domain of natural language generation (NLG), leaving a gap in understanding the types and extent of hallucinations in the context of code generation. To bridge the gap, we conducted a thematic analysis of the LLM-generated code to summarize and categorize the hallucinations present in it. Our study established a comprehensive taxonomy of hallucinations in LLM-generated code, encompassing 5 primary categories of hallucinations depending on the conflicting objectives and varying degrees of deviation observed in code generation. Furthermore, we systematically analyzed the distribution of hallucinations, exploring variations among different LLMs and their correlation with code correctness. Based on the results, we proposed HalluCode, a benchmark for evaluating the performance of code LLMs in recognizing hallucinations. Hallucination recognition and mitigation experiments with HalluCode and HumanEval show existing LLMs face great challenges in recognizing hallucinations, particularly in identifying their types, and are hardly able to mitigate hallucinations. We believe our findings will shed light on future research about hallucination evaluation, detection, and mitigation, ultimately paving the way for building more effective and reliable code LLMs in the future.
Updated: 2024-04-01 07:31:45
标题: 探索和评估LLM驱动的代码生成中的幻觉
摘要: 大型语言模型(LLMs)的兴起显著推进了许多软件工程任务的应用,特别是在代码生成方面。尽管表现有希望,LLMs很容易产生幻觉,这意味着LLMs可能会产生与用户意图不符的输出,表现出内部不一致性,或者与事实知识不一致,使LLMs在各种应用中的部署潜在风险。现有工作主要集中在自然语言生成(NLG)领域中进行幻觉研究,留下了对代码生成环境中幻觉类型和程度的理解空白。为了弥合这一差距,我们对由LLM生成的代码进行了主题分析,总结和分类其中存在的幻觉。我们的研究建立了LLM生成代码中幻觉的综合分类法,包括5个主要类别的幻觉,取决于代码生成中所观察到的冲突目标和不同程度的偏离。此外,我们系统地分析了幻觉的分布,探索了不同LLMs之间的变化及其与代码正确性的相关性。根据结果,我们提出了HalluCode,一个用于评估代码LLMs在识别幻觉方面表现的基准。通过使用HalluCode和HumanEval进行幻觉识别和缓解实验,我们发现现有的LLMs在识别幻觉方面面临巨大挑战,特别是在识别其类型方面,并且几乎无法减轻幻觉。我们相信我们的研究结果将为幻觉评估、检测和缓解的未来研究提供启示,最终为未来构建更有效可靠的代码LLMs铺平道路。
更新时间: 2024-04-01 07:31:45
领域: cs.SE,cs.AI
Towards Learning a Generalist Model for Embodied Navigation
Building a generalist agent that can interact with the world is an intriguing target of AI systems, thus spurring research on embodied navigation, where an agent is required to navigate according to instructions or respond to queries. Despite the major progress attained, previous works primarily focus on task-specific agents and lack generalizability to unseen scenarios. Recently, LLMs have presented remarkable capabilities across various fields, and provided a promising opportunity for embodied navigation. Drawing on this, we propose the first generalist model for embodied navigation, NaviLLM. It adapts LLMs to embodied navigation by introducing schema-based instruction. The schema-based instruction flexibly casts various tasks into generation problems, thereby unifying a wide range of tasks. This approach allows us to integrate diverse data sources from various datasets into the training, equipping NaviLLM with a wide range of capabilities required by embodied navigation. We conduct extensive experiments to evaluate the performance and generalizability of our model. The experimental results demonstrate that our unified model achieves state-of-the-art performance on CVDN, SOON, and ScanQA. Specifically, it surpasses the previous state-of-the-art method by a significant margin of 29% in goal progress on CVDN. Moreover, our model also demonstrates strong generalizability and presents impressive results on unseen tasks, e.g., embodied question answering and 3D captioning.
Updated: 2024-04-01 07:21:52
标题: 朝向学习一个通用模型以实现具象导航
摘要: 构建一个可以与世界互动的通用代理是人工智能系统的一个引人注目的目标,因此推动了对具体导航的研究,其中代理需要根据指示进行导航或回答查询。尽管取得了重大进展,先前的工作主要集中在特定任务代理上,缺乏对未知情景的泛化能力。最近,大型语言模型在各个领域展示了显著的能力,并为具体导航提供了一个有前途的机会。基于此,我们提出了第一个具体导航的通用模型NaviLLM。它通过引入基于模式的指示将大型语言模型适应到具体导航中。基于模式的指示将各种任务灵活地转化为生成问题,从而统一了各种任务。这种方法使我们能够将来自各种数据集的多样化数据源整合到训练中,为NaviLLM提供具体导航所需的广泛能力。我们进行了广泛的实验来评估我们模型的性能和泛化能力。实验结果表明,我们的统一模型在CVDN、SOON和ScanQA上实现了最先进的性能。具体来说,在CVDN上,它在目标进展方面超过了以前的最先进方法29%。此外,我们的模型还展示了很强的泛化能力,并在未知任务上取得了令人印象深刻的结果,例如具体问题回答和3D标题。
更新时间: 2024-04-01 07:21:52
领域: cs.CV,cs.AI
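A tiny sketch of the schema-based instruction idea: heterogeneous embodied tasks are cast into a single generation problem by filling the same prompt schema. The field names below are illustrative assumptions, not NaviLLM's exact schema.

```python
# Illustrative schema-based prompt assembly (field names are assumed).
def build_prompt(task, history, observation, output_hint):
    return (f"Task: {task}\n"
            f"History: {history}\n"
            f"Observation: {observation}\n"
            f"Output ({output_hint}):")

# Two very different tasks become the same generation problem.
print(build_prompt("vision-language navigation",
                   "<visited viewpoints>",
                   "<current panorama features>",
                   "next action"))
print(build_prompt("embodied question answering",
                   "<scene features>",
                   "question: where is the mug?",
                   "answer"))
```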
Diffusion-Driven Domain Adaptation for Generating 3D Molecules
Can we train a molecule generator that can generate 3D molecules from a new domain, circumventing the need to collect data? This problem can be cast as the problem of domain adaptive molecule generation. This work presents a novel and principled diffusion-based approach, called GADM, that allows shifting a generative model to desired new domains without the need to collect even a single molecule. As the domain shift is typically caused by the structure variations of molecules, e.g., scaffold variations, we leverage a designated equivariant masked autoencoder (MAE) along with various masking strategies to capture the structural-grained representations of the in-domain varieties. In particular, with an asymmetric encoder-decoder module, the MAE can generalize to unseen structure variations from the target domains. These structure variations are encoded with an equivariant encoder and treated as domain supervisors to control denoising. We show that, with these encoded structural-grained domain supervisors, GADM can generate effective molecules within the desired new domains. We conduct extensive experiments across various domain adaptation tasks over benchmarking datasets. We show that our approach can improve up to 65.6% in terms of success rate defined based on molecular validity, uniqueness, and novelty compared to alternative baselines.
Updated: 2024-04-01 07:12:27
标题: 扩散驱动的领域适应性用于生成3D分子
摘要: 我们能否训练一个能够从新领域生成3D分子的分子生成器,避免收集数据的必要性?这个问题可以被看作是领域自适应分子生成的问题。这项工作提出了一种新颖且原则性的基于扩散的方法,名为GADM,它允许将一个生成模型转移到所需的新领域,而无需收集一种分子。由于领域转移通常是由分子的结构变化引起的,例如支架变化,我们利用一个指定的等变掩码自编码器(MAE)以及各种掩码策略来捕捉领域内变化的结构粒度表示。特别是,通过一个不对称的编码器-解码器模块,MAE可以泛化到目标领域中看不见的结构变化。这些结构变化被编码为等变编码器,并被视为领域监督员来控制去噪。我们展示了,通过这些编码的结构粒度领域监督员,GADM可以在所需的新领域内生成有效的分子。我们在各种基准数据集上进行了大量实验,涵盖了各种领域自适应任务。我们展示了,相对于替代基线,我们的方法在分子的有效性、独特性和新颖性定义的成功率方面可以提高高达65.6%。
更新时间: 2024-04-01 07:12:27
领域: cs.LG,physics.chem-ph,q-bio.BM
ArEEG_Chars: Dataset for Envisioned Speech Recognition using EEG for Arabic Characters
Brain-computer interfaces (BCIs) have been a hot research topic in recent years, with the potential to help paralyzed people in their daily lives. Several studies have been conducted to classify electroencephalography (EEG) signals automatically into English characters and words. Arabic is one of the most widely used languages around the world. However, to the best of our knowledge, there is no EEG dataset for Arabic characters. In this paper, we have created an EEG dataset for Arabic characters and named it ArEEG_Chars. Moreover, several deep learning experiments were conducted on ArEEG_Chars. The best results were achieved using an LSTM, reaching an accuracy of 97%. The ArEEG_Chars dataset will be made public for researchers.
Updated: 2024-04-01 06:59:21
标题: ArEEG_Chars:使用脑电图进行阿拉伯字符的设想语音识别数据集
摘要: 脑-计算机接口(BCI)是近几年热门研究课题,可以帮助瘫痪患者改善生活。有几项研究旨在将脑电图(EEG)信号自动分类为英文字母和单词。阿拉伯语是世界上使用最广泛的语言之一。然而据我们所知,尚无针对阿拉伯字符的EEG信号数据集。本文创建了一个用于阿拉伯字符的EEG数据集,并命名为ArEEG_Chars。此外,还使用深度学习对ArEEG_Chars进行了多次实验。最佳结果是使用LSTM获得了97%的准确度。ArEEG_Chars数据集将对研究人员公开。
更新时间: 2024-04-01 06:59:21
领域: cs.HC,cs.CL,cs.LG,eess.SP
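As a minimal sketch of the LSTM setup the abstract reports best results with: a stacked LSTM over multi-channel EEG windows followed by a linear head over the 28 Arabic characters. The channel count, window length, and all hyperparameters are assumptions, since the abstract does not specify them.

```python
# A minimal LSTM classifier for EEG windows (shapes are illustrative assumptions).
import torch
import torch.nn as nn

class EEGCharLSTM(nn.Module):
    def __init__(self, n_channels=14, hidden=128, n_classes=28):  # 28 Arabic letters
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden,
                            num_layers=2, batch_first=True, dropout=0.3)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, channels)
        _, (h_n, _) = self.lstm(x)        # h_n: (layers, batch, hidden)
        return self.head(h_n[-1])         # logits from last layer's final state

model = EEGCharLSTM()
dummy = torch.randn(8, 256, 14)           # 8 windows of 256 samples, 14 electrodes
print(model(dummy).shape)                  # torch.Size([8, 28])
```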
Invariant Representation via Decoupling Style and Spurious Features from Images
This paper considers the out-of-distribution (OOD) generalization problem under the setting that both style distribution shift and spurious features exist and domain labels are missing. This setting frequently arises in real-world applications and has been overlooked because previous approaches mainly handle only one of these two factors. The critical challenge is decoupling style and spurious features in the absence of domain labels. To address this challenge, we first propose a structural causal model (SCM) for the image generation process, which captures both style distribution shift and spurious features. The proposed SCM enables us to design a new framework called IRSS, which can gradually separate style distribution and spurious features from images by introducing adversarial neural networks and multi-environment optimization, thus achieving OOD generalization. Moreover, it does not require additional supervision (e.g., domain labels) other than the images and their corresponding labels. Experiments on benchmark datasets demonstrate that IRSS outperforms traditional OOD methods and solves the problem of Invariant risk minimization (IRM) degradation, enabling the extraction of invariant features under distribution shift.
Updated: 2024-04-01 06:57:31
标题: 通过将风格和虚假特征与图像解耦,实现不变表示
摘要: 本文考虑了在风格分布偏移和虚假特征同时存在且缺少域标签的情况下的分布外(OOD)泛化问题。这种设置在现实世界的应用中经常出现,但却被忽视,因为先前的方法主要只处理这两个因素之一。关键挑战是在没有域标签的情况下解耦风格和虚假特征。为了解决这一挑战,我们首先提出了一个用于图像生成过程的结构因果模型(SCM),它同时刻画了风格分布偏移和虚假特征。所提出的SCM使我们能够设计一个名为IRSS的新框架,通过引入对抗神经网络和多环境优化,逐渐从图像中分离风格分布和虚假特征,从而实现OOD泛化。此外,除了图像及其对应标签之外,它不需要额外的监督(例如域标签)。基准数据集上的实验证明,IRSS优于传统OOD方法,并解决了不变风险最小化(IRM)退化问题,实现了在分布偏移下提取不变特征。
更新时间: 2024-04-01 06:57:31
领域: cs.CV,cs.AI,I.2.6; I.2.10
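The adversarial decoupling the abstract describes is usually built on a gradient reversal layer (GRL); below is a minimal generic GRL in PyTorch, shown only as the standard mechanism for adversarially removing a factor such as style from features, not as IRSS's exact architecture.

```python
# A minimal gradient reversal layer (GRL), the generic adversarial-decoupling
# building block; IRSS's actual networks are described in the paper.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)                # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Negated (scaled) gradient in the backward pass: the feature extractor
        # learns to *fool* whatever discriminator is attached after this layer.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

feat = torch.randn(4, 16, requires_grad=True)
style_logits = torch.nn.Linear(16, 3)(grad_reverse(feat))
style_logits.sum().backward()
print(feat.grad[0, :4])  # gradients flow back negated through the GRL
```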
Energy-Guided Data Sampling for Traffic Prediction with Mini Training Datasets
Recent endeavors aimed at forecasting future traffic flow states through deep learning encounter various challenges and yield diverse outcomes. A notable obstacle arises from the substantial data requirements of deep learning models, a resource often scarce in traffic flow systems. Despite the abundance of domain knowledge concerning traffic flow dynamics, prevailing deep learning methodologies frequently fail to fully exploit it. To address these issues, we propose an innovative solution that merges Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) architecture to enhance the prediction of traffic flow dynamics. A key revelation of our research is the feasibility of sampling training data for large traffic systems from simulations conducted on smaller traffic systems. This insight suggests the potential for referencing a macroscopic-level distribution to inform the sampling of microscopic data. Such sampling is facilitated by the observed scale invariance in the normalized energy distribution of the statistical mechanics model, thereby streamlining the data generation process for large-scale traffic systems. Our simulations demonstrate promising agreement between predicted and actual traffic flow dynamics, underscoring the efficacy of our proposed approach.
Updated: 2024-04-01 06:51:56
标题: 能量引导数据抽样技术在小型训练数据集下的交通预测应用
摘要: 最近通过深度学习预测未来交通流状态的努力遇到了各种挑战,并产生了不同的结果。一个显著的障碍来自深度学习模型对大量数据的需求,而这在交通流系统中往往是稀缺资源。尽管关于交通流动力学的领域知识丰富,但主流的深度学习方法往往未能充分利用这些知识。为了解决这些问题,我们提出了一种创新方案,将卷积神经网络(CNNs)与长短期记忆(LSTM)架构相结合,以增强交通流动力学的预测能力。我们研究的一个关键发现是,可以从较小交通系统上进行的模拟中为大型交通系统采样训练数据。这一见解表明,可以参考宏观层面的分布来指导微观数据的采样。统计力学模型的归一化能量分布中观察到的尺度不变性为这种采样提供了便利,从而简化了大规模交通系统的数据生成过程。我们的模拟显示预测的交通流动力学与实际情况之间具有良好的一致性,凸显了我们所提方法的有效性。
更新时间: 2024-04-01 06:51:56
领域: cs.LG
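A minimal sketch of the CNN+LSTM combination the abstract proposes: a 1-D convolution extracts spatial patterns over road cells in each frame, and an LSTM models their temporal evolution. Shapes and layer sizes are illustrative assumptions.

```python
# A minimal CNN+LSTM traffic forecaster (shapes are illustrative assumptions).
import torch
import torch.nn as nn

class CNNLSTMTraffic(nn.Module):
    def __init__(self, n_cells=64, conv_ch=16, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, conv_ch, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(conv_ch * n_cells, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_cells)    # next-step state per road cell

    def forward(self, x):                          # x: (batch, time, cells)
        b, t, c = x.shape
        z = self.conv(x.reshape(b * t, 1, c))      # per-frame spatial features
        z = z.reshape(b, t, -1)
        out, _ = self.lstm(z)                      # temporal dynamics
        return self.head(out[:, -1])               # forecast from last state

model = CNNLSTMTraffic()
print(model(torch.randn(2, 10, 64)).shape)         # torch.Size([2, 64])
```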
Simple Policy Optimization
The PPO (Proximal Policy Optimization) algorithm has demonstrated excellent performance in many fields and is considered a simplified version of the TRPO (Trust Region Policy Optimization) algorithm. However, the ratio clipping operation in PPO may not always effectively enforce the trust region constraint, which can be a potential factor affecting the stability of the algorithm. In this paper, we propose the Simple Policy Optimization (SPO) algorithm, which introduces a novel clipping method for the KL divergence between the old and current policies. Extensive experimental results in Atari 2600 environments indicate that, compared to the mainstream variants of PPO, SPO achieves better sample efficiency, extremely low KL divergence, and higher policy entropy, and is robust to increases in network depth or complexity. More importantly, SPO maintains the simplicity of an unconstrained first-order algorithm. Code is available at https://github.com/MyRepositories-hub/Simple-Policy-Optimization.
Updated: 2024-04-01 06:51:38
标题: 简单的策略优化
摘要: PPO(Proximal Policy Optimization)算法在许多领域表现出色,被认为是TRPO(Trust Region Policy Optimization)算法的简化版本。然而,在PPO中的比例剪切操作并不总是有效地强制执行信任区域约束,这可能是影响算法稳定性的潜在因素。本文提出了Simple Policy Optimization(SPO)算法,引入了一种新颖的剪切方法,用于旧策略和当前策略之间的KL散度。在Atari 2600环境中进行的大量实验结果表明,与PPO的主流变体相比,SPO实现了更好的样本效率,极低的KL散度和更高的策略熵,并且对网络深度或复杂性的增加具有鲁棒性。更重要的是,SPO保持了一个无约束的一阶算法的简单性。代码可在https://github.com/MyRepositories-hub/Simple-Policy-Optimization找到。
更新时间: 2024-04-01 06:51:38
领域: cs.LG
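To contrast the two objectives discussed above, here is PPO's ratio-clipped surrogate next to a KL-based clipping surrogate in the spirit of SPO. The exact SPO objective is given in the paper; this sketch, with an assumed per-sample KL budget and the standard k3 KL estimator, only illustrates "clip the ratio" versus "gate samples whose policy change exceeds a KL budget".

```python
# PPO ratio clipping vs. an assumed KL-budget surrogate (not SPO's exact loss).
import torch

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
    return -torch.min(unclipped, clipped).mean()

def kl_clip_loss(logp_new, logp_old, adv, kl_budget=0.02):
    ratio = torch.exp(logp_new - logp_old)
    # Per-sample KL(old||new) estimate (the "k3" estimator): non-negative.
    kl = (ratio - 1.0) - (logp_new - logp_old)
    surrogate = ratio * adv
    # Drop samples whose policy change already exceeds the KL budget.
    mask = (kl <= kl_budget).float()
    return -(mask * surrogate).mean()

logp_old = torch.randn(128)
logp_new = (logp_old + 0.1 * torch.randn(128)).requires_grad_()
adv = torch.randn(128)
print(ppo_clip_loss(logp_new, logp_old, adv).item(),
      kl_clip_loss(logp_new, logp_old, adv).item())
```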
Resolving Ethics Trade-offs in Implementing Responsible AI
While the operationalisation of high-level AI ethics principles into practical AI/ML systems has made progress, there is still a theory-practice gap in managing tensions between the underlying AI ethics aspects. We cover five approaches for addressing the tensions via trade-offs, ranging from rudimentary to complex. The approaches differ in the types of considered context, scope, methods for measuring contexts, and degree of justification. None of the approaches is likely to be appropriate for all organisations, systems, or applications. To address this, we propose a framework which consists of: (i) proactive identification of tensions, (ii) prioritisation and weighting of ethics aspects, (iii) justification and documentation of trade-off decisions. The proposed framework aims to facilitate the implementation of well-rounded AI/ML systems that are appropriate for potential regulatory requirements.
Updated: 2024-04-01 06:50:45
标题: 解决在实施负责任人工智能中的伦理权衡问题
摘要: 尽管将高级AI伦理原则的操作化转化为实际的AI/ML系统取得了进展,但在处理潜在AI伦理方面的紧张关系方面仍存在理论实践差距。我们涵盖了通过权衡来解决这些紧张关系的五种方法,从基本到复杂不等。这些方法在考虑的背景、范围、衡量背景的方法和正当化程度上有所不同。这些方法中没有一种可能适用于所有组织、系统或应用。为了解决这个问题,我们提出了一个框架,包括:(i)主动识别紧张关系,(ii)对伦理方面进行优先排序和加权,(iii)正当化和记录权衡决策。所提出的框架旨在促进实施全面的AI/ML系统,以满足潜在的监管要求。
更新时间: 2024-04-01 06:50:45
领域: cs.CY,cs.AI,68T01,K.4.1; I.2.m; C.4
iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer
In the last few years, the success of Transformers in computer vision has stimulated the discovery of many alternative models that compete with Transformers, such as the MLP-Mixer. Despite their weak inductive bias, these models have achieved performance comparable to well-studied convolutional neural networks. Recent studies on modern Hopfield networks suggest the correspondence between certain energy-based associative memory models and Transformers or MLP-Mixer, and shed some light on the theoretical background of the Transformer-type architectures design. In this paper, we generalize the correspondence to the recently introduced hierarchical Hopfield network, and find iMixer, a novel generalization of MLP-Mixer model. Unlike ordinary feedforward neural networks, iMixer involves MLP layers that propagate forward from the output side to the input side. We characterize the module as an example of invertible, implicit, and iterative mixing module. We evaluate the model performance with various datasets on image classification tasks, and find that iMixer, despite its unique architecture, exhibits stable learning capabilities and achieves performance comparable to or better than the baseline vanilla MLP-Mixer. The results imply that the correspondence between the Hopfield networks and the Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.
Updated: 2024-04-01 06:42:17
标题: iMixer:层次Hopfield网络意味着可逆、隐式和迭代的MLP-Mixer
摘要: 在过去几年中,Transformers在计算机视觉领域取得了成功,这激发了许多与Transformers竞争的替代模型的发现,如MLP-Mixer。尽管它们的归纳偏差较弱,这些模型已经实现了与经过深入研究的卷积神经网络相媲美的性能。最近对现代Hopfield网络的研究表明,基于能量的关联记忆模型与Transformers或MLP-Mixer之间存在对应关系,并对Transformer型架构设计的理论背景进行了一些阐释。在本文中,我们将这种对应关系推广到最近引入的分层Hopfield网络,并发现了iMixer,这是MLP-Mixer模型的一种新的泛化。与普通的前馈神经网络不同,iMixer包括从输出端向输入端传播的MLP层。我们将该模块描述为可逆、隐式和迭代混合模块的一个示例。我们通过在各种数据集上进行图像分类任务的模型性能评估发现,尽管其独特的架构,iMixer表现出稳定的学习能力,并实现了与基线vanilla MLP-Mixer相媲美或更好的性能。结果表明,Hopfield网络与Mixer模型之间的对应关系可作为理解更广泛的Transformer-like架构设计原则。
更新时间: 2024-04-01 06:42:17
领域: cs.LG,cond-mat.dis-nn,cs.CV,cs.NE
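A minimal illustration of an "invertible, implicit and iterative" MLP block in the deep-equilibrium style the abstract evokes: the layer's output is (approximately) a fixed point of an update driven by the input. This conveys the concept only and is not iMixer's actual block.

```python
# An implicit, iterative MLP layer solved by fixed-point iteration
# (a conceptual illustration, not iMixer's architecture).
import torch
import torch.nn as nn

class ImplicitMixLayer(nn.Module):
    def __init__(self, dim, n_iters=20):
        super().__init__()
        self.W = nn.Linear(dim, dim)
        self.U = nn.Linear(dim, dim)
        self.n_iters = n_iters

    def forward(self, x):
        z = torch.zeros_like(x)
        for _ in range(self.n_iters):             # iterate z = tanh(Wz + Ux)
            z = torch.tanh(self.W(z) + self.U(x))
        return z                                   # approximate fixed point

layer = ImplicitMixLayer(dim=32)
tokens = torch.randn(4, 8, 32)                     # (batch, tokens, channels)
print(layer(tokens).shape)
```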
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models
In this paper, we investigate the in-context learning ability of retrieval-augmented encoder-decoder language models. We first conduct a comprehensive analysis of existing models and identify their limitations in in-context learning, primarily due to a mismatch between pretraining and inference, as well as a restricted context length. To address these issues, we propose RAVEN, a model that combines retrieval-augmented masked language modeling and prefix language modeling. We further introduce Fusion-in-Context Learning to enhance the few-shot performance by enabling the model to leverage more in-context examples without requiring additional training. Through extensive experiments, we demonstrate that our simple yet effective design significantly improves performance, achieving results comparable to the most advanced language models in certain scenarios, despite having substantially fewer parameters. Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning and encourages further research in this direction.
Updated: 2024-04-01 06:32:12
标题: RAVEN:具有检索增强编码器-解码器语言模型的上下文学习
摘要: 在这篇论文中,我们研究了检索增强编码器-解码器语言模型的上下文学习能力。我们首先对现有模型进行了全面分析,并确定了它们在上下文学习中的局限性,主要是由于预训练和推理之间的不匹配,以及受限的上下文长度。为了解决这些问题,我们提出了RAVEN,这是一个结合了检索增强掩蔽语言建模和前缀语言建模的模型。我们进一步引入了Fusion-in-Context Learning来增强少样本性能,使模型能够利用更多上下文示例而无需额外训练。通过大量实验证明,我们简单而有效的设计显著提高了性能,在某些情况下达到了与最先进的语言模型相当的结果,尽管参数要少得多。我们的工作强调了检索增强编码器-解码器语言模型在上下文学习中的潜力,并鼓励在这个方向进一步研究。
更新时间: 2024-04-01 06:32:12
领域: cs.CL,cs.AI,cs.LG
Evalverse: Unified and Accessible Library for Large Language Model Evaluation
This paper introduces Evalverse, a novel library that streamlines the evaluation of Large Language Models (LLMs) by unifying disparate evaluation tools into a single, user-friendly framework. Evalverse enables individuals with limited knowledge of artificial intelligence to easily request LLM evaluations and receive detailed reports, facilitated by an integration with communication platforms like Slack. Thus, Evalverse serves as a powerful tool for the comprehensive assessment of LLMs, offering both researchers and practitioners a centralized and easily accessible evaluation framework. Finally, we also provide a demo video for Evalverse, showcasing its capabilities and implementation in a two-minute format.
Updated: 2024-04-01 06:03:39
标题: Evalverse: 大型语言模型评估的统一和易用库
摘要: 本文介绍了Evalverse,这是一个新颖的库,通过将不同的评估工具统一到一个用户友好的框架中,简化了大型语言模型(LLMs)的评估过程。Evalverse使对人工智能了解有限的个人能够轻松请求LLM评估并收到详细报告,通过与Slack等通信平台的集成实现。因此,Evalverse可作为LLMs全面评估的强大工具,为研究人员和实践者提供了一个集中且易于访问的评估框架。最后,我们还提供了Evalverse的演示视频,展示其能力和实现过程,以两分钟的格式呈现。
更新时间: 2024-04-01 06:03:39
领域: cs.CL,cs.AI
Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs
The advent of Large Language Models (LLMs) has significantly transformed the AI landscape, enhancing machine learning and AI capabilities. Factuality is a critical concern for LLMs, as they may generate factually incorrect responses. In this paper, we propose GraphEval to evaluate an LLM's performance using a substantially large test dataset. Specifically, the test dataset is retrieved from a large knowledge graph with more than 10 million facts without expensive human effort. Unlike conventional methods that evaluate LLMs based on generated responses, GraphEval streamlines the evaluation process by creating a judge model to estimate the correctness of the answers given by the LLM. Our experiments demonstrate that the judge model's factuality assessment aligns closely with the correctness of the LLM's generated outputs, while also substantially reducing evaluation costs. Besides, our findings offer valuable insights into LLM performance across different metrics and highlight the potential for future improvements in ensuring the factual integrity of LLM outputs. The code is publicly available at https://github.com/xz-liu/GraphEval.
Updated: 2024-04-01 06:01:17
标题: 使用大规模知识图谱评估大型语言模型的真实性
摘要: 大语言模型(LLMs)的出现显著改变了人工智能领域,增强了机器学习和人工智能的能力。事实性问题是LLMs面临的一个关键问题,因为它们可能生成事实不准确的响应。在本文中,我们提出了GraphEval,通过使用一个相当大的测试数据集来评估LLM的性能。具体来说,测试数据集是从一个包含超过1000万条事实的大型知识图中检索出来的,而无需昂贵的人力。与传统方法根据生成的响应对LLMs进行评估不同,GraphEval通过创建一个判定模型来简化评估过程,以估计LLM给出的答案的正确性。我们的实验表明,判定模型的事实性评估与LLM生成的输出的正确性密切相关,同时大大降低了评估成本。此外,我们的发现为LLM在不同指标上的表现提供了有价值的见解,并突显了未来改进以确保LLM输出事实完整性的潜力。代码可在https://github.com/xz-liu/GraphEval上公开获得。
更新时间: 2024-04-01 06:01:17
领域: cs.CL,cs.AI,cs.LG
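One common way to obtain large labeled factuality probes from a knowledge graph without human effort is to keep true triples as positives and corrupt their tails for negatives; the sketch below illustrates that generic recipe. GraphEval's actual retrieval pipeline and judge model are described in the paper; everything here is illustrative.

```python
# Generating true/false factuality probes from KG triples (generic recipe).
import random

random.seed(0)
triples = [
    ("Paris", "capital_of", "France"),
    ("Tokyo", "capital_of", "Japan"),
    ("Berlin", "capital_of", "Germany"),
]
entities = sorted({t[0] for t in triples} | {t[2] for t in triples})

def make_probes(triples, n_neg_per_pos=1):
    probes = []
    for h, r, t in triples:
        rel = r.replace("_", " ")
        probes.append((f"Is it true that {h} is the {rel} {t}?", True))
        for _ in range(n_neg_per_pos):
            t_bad = random.choice([e for e in entities if e != t])  # corrupt tail
            probes.append((f"Is it true that {h} is the {rel} {t_bad}?", False))
    return probes

for question, label in make_probes(triples):
    print(label, question)
```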
LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer
Video recognition systems are vulnerable to adversarial examples. Recent studies show that style transfer-based and patch-based unrestricted perturbations can effectively improve attack efficiency. These attacks, however, face two main challenges: 1) Adding large stylized perturbations to all pixels reduces the naturalness of the video and such perturbations can be easily detected. 2) Patch-based video attacks are not extensible to targeted attacks due to the limited search space of reinforcement learning that has been widely used in video attacks recently. In this paper, we focus on the video black-box setting and propose a novel attack framework named LogoStyleFool by adding a stylized logo to the clean video. We separate the attack into three stages: style reference selection, reinforcement-learning-based logo style transfer, and perturbation optimization. We solve the first challenge by scaling down the perturbation range to a regional logo, while the second challenge is addressed by complementing an optimization stage after reinforcement learning. Experimental results substantiate the overall superiority of LogoStyleFool over three state-of-the-art patch-based attacks in terms of attack performance and semantic preservation. Meanwhile, LogoStyleFool still maintains its performance against two existing patch-based defense methods. We believe that our research is beneficial in increasing the attention of the security community to such subregional style transfer attacks.
Updated: 2024-04-01 05:57:55
标题: LogoStyleFool: 通过标志风格转移破坏视频识别系统
摘要: 视频识别系统容易受到对抗性样本的攻击。最近的研究表明,基于风格转移和基于补丁的无限制扰动可以有效提高攻击效率。然而,这些攻击面临两个主要挑战:1)向所有像素添加大量风格化扰动会降低视频的自然性,并且这种扰动很容易被检测到。2)基于补丁的视频攻击无法扩展到有针对性的攻击,因为最近在视频攻击中广泛使用的强化学习的搜索空间有限。在本文中,我们专注于视频黑盒设置,并提出了一种名为LogoStyleFool的新型攻击框架,通过向干净视频添加风格化的标志来实现攻击。我们将攻击分为三个阶段:风格参考选择、基于强化学习的标志风格转移和扰动优化。我们通过将扰动范围缩小到区域性标志来解决第一个挑战,而通过在强化学习之后补充优化阶段来解决第二个挑战。实验结果证实,在攻击性能和语义保留方面,LogoStyleFool相对于三种最先进的基于补丁的攻击具有全面的优势。同时,LogoStyleFool仍然保持对两种现有基于补丁的防御方法的性能。我们相信我们的研究有助于引起安全社区对这种子区域风格转移攻击的关注。
更新时间: 2024-04-01 05:57:55
领域: cs.CV,cs.CR
StyleFool: Fooling Video Classification Systems via Style Transfer
Video classification systems are vulnerable to adversarial attacks, which can create severe security problems in video verification. Current black-box attacks need a large number of queries to succeed, resulting in high computational overhead in the process of attack. On the other hand, attacks with restricted perturbations are ineffective against defenses such as denoising or adversarial training. In this paper, we focus on unrestricted perturbations and propose StyleFool, a black-box video adversarial attack via style transfer to fool the video classification system. StyleFool first utilizes color theme proximity to select the best style image, which helps avoid unnatural details in the stylized videos. Meanwhile, the target class confidence is additionally considered in targeted attacks to influence the output distribution of the classifier by moving the stylized video closer to or even across the decision boundary. A gradient-free method is then employed to further optimize the adversarial perturbations. We carry out extensive experiments to evaluate StyleFool on two standard datasets, UCF-101 and HMDB-51. The experimental results demonstrate that StyleFool outperforms the state-of-the-art adversarial attacks in terms of both the number of queries and the robustness against existing defenses. Moreover, 50% of the stylized videos in untargeted attacks do not need any query since they can already fool the video classification model. Furthermore, we evaluate the indistinguishability through a user study to show that the adversarial samples of StyleFool look imperceptible to human eyes, despite unrestricted perturbations.
Updated: 2024-04-01 05:51:31
标题: StyleFool:通过风格转移愚弄视频分类系统
摘要: 视频分类系统容易受到对抗性攻击,这可能在视频验证中造成严重的安全问题。当前的黑盒攻击需要大量查询才能成功,导致攻击过程中的计算开销很高。另一方面,对带有限制扰动的攻击对于防御措施(如去噪或对抗性训练)是无效的。在本文中,我们专注于无限制扰动,并提出了StyleFool,一种通过风格转移进行黑盒视频对抗攻击以欺骗视频分类系统。StyleFool首先利用颜色主题接近性来选择最佳的风格图像,这有助于避免风格化视频中的不自然细节。同时,在有针对性的攻击中还额外考虑了目标类别的置信度,通过将风格化视频移动到或甚至穿过决策边界来影响分类器的输出分布。然后采用无梯度方法进一步优化对抗性扰动。我们进行了大量实验评估StyleFool在两个标准数据集UCF-101和HMDB-51上的表现。实验结果表明,StyleFool在查询数量和对抗现有防御措施的鲁棒性方面优于最先进的对抗性攻击。此外,50%的风格化视频在无针对性攻击中不需要任何查询,因为它们已经可以欺骗视频分类模型。此外,我们通过用户研究评估了不可辨别性,结果表明StyleFool的对抗样本在无限制扰动的情况下对人眼看起来是不可察觉的。
更新时间: 2024-04-01 05:51:31
领域: cs.CV,cs.CR
Can Large Language Models Write Parallel Code?
Large language models are increasingly becoming a popular tool for software development. Their ability to model and generate source code has been demonstrated in a variety of contexts, including code completion, summarization, translation, and lookup. However, they often struggle to generate code for complex programs. In this paper, we study the capabilities of state-of-the-art language models to generate parallel code. In order to evaluate language models, we create a benchmark, ParEval, consisting of prompts that represent 420 different coding tasks. We use ParEval to evaluate the effectiveness of several state-of-the-art open- and closed-source language models on these tasks. We introduce novel metrics for evaluating the performance of generated code, and use them to explore how well each LLM performs for 12 different computational problem types and six different parallel programming models.
Updated: 2024-04-01 05:34:36
标题: 大型语言模型能编写并行代码吗?
摘要: 大型语言模型越来越成为软件开发的热门工具。它们已经在各种情境下展示了对源代码建模和生成的能力,包括代码补全、摘要、翻译和查找。然而,它们通常难以为复杂的程序生成代码。本文研究了最先进的语言模型生成并行代码的能力。为了评估语言模型,我们创建了一个基准测试ParEval,其中包含代表420个不同编码任务的提示。我们使用ParEval来评估几种最先进的开源和闭源语言模型在这些任务上的有效性。我们引入了用于评估生成代码性能的新型指标,并使用它们来探索每个LLM在12种不同的计算问题类型和六种不同的并行编程模型中的表现如何。
更新时间: 2024-04-01 05:34:36
领域: cs.DC,cs.AI
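ParEval introduces its own novel metrics; for flavor, the snippet below computes the standard unbiased pass@k estimator widely used for generated code, from n samples of which c pass the tests.

```python
# The standard unbiased pass@k estimator: pass@k = 1 - C(n-c, k) / C(n, k).
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples passes), given
    n generated samples of which c passed the tests."""
    if n - c < k:
        return 1.0                       # every size-k subset contains a pass
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

print(pass_at_k(n=20, c=3, k=1))   # 0.15
print(pass_at_k(n=20, c=3, k=5))   # higher: more attempts allowed
```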
FABind+: Enhancing Molecular Docking through Improved Pocket Prediction and Pose Generation
Molecular docking is a pivotal process in drug discovery. While traditional techniques rely on extensive sampling and simulation governed by physical principles, these methods are often slow and costly. The advent of deep learning-based approaches has shown significant promise, offering increases in both accuracy and efficiency. Building upon the foundational work of FABind, a model designed with a focus on speed and accuracy, we present FABind+, an enhanced iteration that largely boosts the performance of its predecessor. We identify pocket prediction as a critical bottleneck in molecular docking and propose a novel methodology that significantly refines pocket prediction, thereby streamlining the docking process. Furthermore, we introduce modifications to the docking module to enhance its pose generation capabilities. In an effort to bridge the gap with conventional sampling/generative methods, we incorporate a simple yet effective sampling technique coupled with a confidence model, requiring only minor adjustments to the regression framework of FABind. Experimental results and analysis reveal that FABind+ remarkably outperforms the original FABind, achieves competitive state-of-the-art performance, and delivers insightful modeling strategies. This demonstrates FABind+ represents a substantial step forward in molecular docking and drug discovery. Our code is in https://github.com/QizhiPei/FABind.
Updated: 2024-04-01 05:18:57
标题: FABind+: 通过改进口袋预测和姿态生成增强分子对接
摘要: 分子对接是药物发现中的关键过程。传统技术依赖于由物理原理控制的广泛采样和模拟,这些方法通常速度缓慢且成本高昂。基于深度学习的方法的出现展现了显著的潜力,提高了准确性和效率。在以速度和准确性为重点设计的模型FABind的基础工作之上,我们提出了FABind+,这是一个大大提升了其前身性能的增强版本。我们认为口袋预测是分子对接中的一个关键瓶颈,并提出了一种显著改进口袋预测的新方法,从而简化了对接过程。此外,我们对对接模块进行了修改,以增强其姿态生成能力。为了弥合与传统采样/生成方法之间的差距,我们结合了一种简单而有效的采样技术和置信模型,只需要对FABind的回归框架进行轻微调整。实验结果和分析显示,FABind+明显优于原始的FABind,实现了有竞争力的最新性能,并提供了深入的建模策略。这表明FABind+代表了分子对接和药物发现领域的重大进步。我们的代码在https://github.com/QizhiPei/FABind。
更新时间: 2024-04-01 05:18:57
领域: q-bio.BM,cs.AI,cs.LG
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
Based on the foundation of Large Language Models (LLMs), Multilingual Large Language Models (MLLMs) have been developed to address the challenges of multilingual natural language processing tasks, hoping to achieve knowledge transfer from high-resource to low-resource languages. However, significant limitations and challenges still exist, such as language imbalance, multilingual alignment, and inherent bias. In this paper, we aim to provide a comprehensive analysis of MLLMs, delving deeply into discussions surrounding these critical issues. First, we present an overview of MLLMs, covering their evolution, key techniques, and multilingual capacities. Secondly, we explore widely utilized multilingual corpora for MLLMs' training and multilingual datasets oriented for downstream tasks that are crucial for enhancing the cross-lingual capability of MLLMs. Thirdly, we survey the existing studies on multilingual representations and investigate whether the current MLLMs can learn a universal language representation. Fourthly, we discuss bias on MLLMs including its category and evaluation metrics, and summarize the existing debiasing techniques. Finally, we discuss existing challenges and point out promising research directions. By demonstrating these aspects, this paper aims to facilitate a deeper understanding of MLLMs and their potentiality in various domains.
Updated: 2024-04-01 05:13:56
标题: 一份关于多语言大型语言模型的调查:语料库、对齐和偏见
摘要: 基于大型语言模型(LLMs)的基础,多语言大型语言模型(MLLMs)已经被开发出来以解决多语言自然语言处理任务所面临的挑战,希望实现从高资源语言到低资源语言的知识转移。然而,仍然存在重要的限制和挑战,如语言不平衡、多语言对齐和固有偏见。本文旨在对MLLMs进行全面分析,深入探讨围绕这些关键问题的讨论。首先,我们开始通过概述MLLMs,涵盖它们的演变、关键技术和多语言能力。其次,我们探讨了广泛使用的多语言语料库用于MLLMs的训练和面向下游任务的多语言数据集,这对增强MLLMs的跨语言能力至关重要。第三,我们调查了现有关于多语言表示的研究,并探讨当前MLLMs是否能够学习到通用语言表示。第四,我们讨论了MLLMs上的偏见,包括其类别和评估指标,并总结了现有的去偏见技术。最后,我们讨论了现有的挑战,并指出了有前途的研究方向。通过展示这些方面,本文旨在促进对MLLMs及其在各个领域中的潜力的更深入理解。
更新时间: 2024-04-01 05:13:56
领域: cs.CL,cs.AI
VortexViz: Finding Vortex Boundaries by Learning from Particle Trajectories
Vortices are studied in various scientific disciplines, offering insights into fluid flow behavior. Visualizing the boundary of vortices is crucial for understanding flow phenomena and detecting flow irregularities. This paper addresses the challenge of accurately extracting vortex boundaries using deep learning techniques. While existing methods primarily train on velocity components, we propose a novel approach incorporating particle trajectories (streamlines or pathlines) into the learning process. By leveraging the regional/local characteristics of the flow field captured by streamlines or pathlines, our methodology aims to enhance the accuracy of vortex boundary extraction.
Updated: 2024-04-01 05:12:55
标题: VortexViz:通过学习粒子轨迹找到涡旋边界
摘要: 涡旋在各种科学学科中得到研究,为理解流体流动行为提供了见解。可视化涡旋边界对于理解流动现象和检测流动不规则性至关重要。本文探讨了使用深度学习技术准确提取涡旋边界的挑战。现有方法主要基于速度分量进行训练,而我们提出了一种新颖的方法,将粒子轨迹(流线或路径线)纳入学习过程。通过利用流线或路径线所捕捉的流场的区域/局部特征,我们的方法旨在提高涡旋边界提取的准确性。
更新时间: 2024-04-01 05:12:55
领域: physics.flu-dyn,cs.AI,cs.CV,cs.GR
Instance-Aware Group Quantization for Vision Transformers
Post-training quantization (PTQ) is an efficient model compression technique that quantizes a pretrained full-precision model using only a small calibration set of unlabeled samples without retraining. PTQ methods for convolutional neural networks (CNNs) provide quantization results comparable to full-precision counterparts. Directly applying them to vision transformers (ViTs), however, incurs severe performance degradation, mainly due to the differences in architectures between CNNs and ViTs. In particular, the distribution of activations for each channel varies drastically according to input instances, making PTQ methods for CNNs inappropriate for ViTs. To address this, we introduce instance-aware group quantization for ViTs (IGQ-ViT). To this end, we propose to split the channels of activation maps into multiple groups dynamically for each input instance, such that activations within each group share similar statistical properties. We also extend our scheme to quantize softmax attentions across tokens. In addition, the number of groups for each layer is adjusted to minimize the discrepancies between predictions from quantized and full-precision models, under a bit-operation (BOP) constraint. We show extensive experimental results on image classification, object detection, and instance segmentation, with various transformer architectures, demonstrating the effectiveness of our approach.
Updated: 2024-04-01 05:12:30
标题: 视觉变压器的实例感知组量化
摘要: 后训练量化(PTQ)是一种高效的模型压缩技术,它利用未标记样本的小型校准集对预训练的全精度模型进行量化,而无需重新训练。对卷积神经网络(CNNs)的PTQ方法提供了与全精度对应物相媲美的量化结果。然而,直接将它们应用于视觉变换器(ViTs)会导致严重的性能下降,主要是因为CNNs和ViTs之间的架构差异。特别是,每个通道的激活分布根据输入实例而变化,使得CNNs的PTQ方法不适用于ViTs。为了解决这个问题,我们引入了适用于ViTs的实例感知组量化(IGQ-ViT)。为此,我们建议动态地将激活图的通道分割成多个组,以便为每个输入实例,这样每个组内的激活具有相似的统计特性。我们还将我们的方案扩展到跨标记的softmax注意力的量化。此外,每个层的组数被调整以在位运算(BOP)约束下最小化量化模型和全精度模型之间的预测差异。我们展示了在图像分类、目标检测和实例分割方面的广泛实验结果,使用各种变换器架构,展示了我们方法的有效性。
更新时间: 2024-04-01 05:12:30
领域: cs.CV,cs.LG
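A minimal sketch of instance-aware group quantization: for one input instance, channels are sorted by dynamic range, split into groups with similar statistics, and each group is fake-quantized with its own scale. IGQ-ViT's actual grouping criterion, softmax-attention quantization, and BOP-constrained group allocation are in the paper.

```python
# Instance-aware group quantization sketch (grouping rule is an assumption).
import torch

def instance_group_quantize(act, n_groups=4, n_bits=8):
    """act: (channels, tokens) activations for ONE input instance."""
    qmax = 2 ** (n_bits - 1) - 1
    ranges = act.abs().amax(dim=1)                     # per-channel max |value|
    order = torch.argsort(ranges)                      # group similar ranges together
    groups = torch.chunk(order, n_groups)
    out = torch.empty_like(act)
    for idx in groups:
        scale = act[idx].abs().max().clamp(min=1e-8) / qmax
        q = torch.clamp(torch.round(act[idx] / scale), -qmax - 1, qmax)
        out[idx] = q * scale                           # fake-quantized values
    return out

# Channels with wildly different ranges, as the abstract describes for ViTs.
act = torch.randn(16, 32) * torch.logspace(-1, 1, 16).unsqueeze(1)
deq = instance_group_quantize(act)
print((act - deq).abs().mean())                        # small reconstruction error
```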
Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Healthcare Professionals
This paper explores the evolving relationship between clinician trust in LLMs, the transformation of data sources from predominantly human-generated to AI-generated content, and the subsequent impact on the precision of LLMs and clinician competence. One of the primary concerns identified is the potential feedback loop that arises as LLMs become more reliant on their outputs for learning, which may lead to a degradation in output quality and a reduction in clinician skills due to decreased engagement with fundamental diagnostic processes. While theoretical at this stage, this feedback loop poses a significant challenge as the integration of LLMs in healthcare deepens, emphasizing the need for proactive dialogue and strategic measures to ensure the safe and effective use of LLM technology. A key takeaway from our investigation is the critical role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs. The paper highlights how expert users, particularly clinicians, can leverage LLMs to enhance productivity by offloading routine tasks while maintaining a critical oversight to identify and correct potential inaccuracies in AI-generated content. This balance of trust and skepticism is vital for ensuring that LLMs augment rather than undermine the quality of patient care. Moreover, we delve into the potential risks associated with LLMs' self-referential learning loops and the deskilling of healthcare professionals. The risk of LLMs operating within an echo chamber, where AI-generated content feeds into the learning algorithms, threatens the diversity and quality of the data pool, potentially entrenching biases and reducing the efficacy of LLMs.
Updated: 2024-04-01 05:03:45
标题: 大型语言模型与用户信任:自指学习循环的后果和医疗专业人员技能丧失
摘要: 本文探讨了临床医生对LLM的信任与数据来源从主要由人类生成转变为AI生成内容之间的关系的演变,以及对LLM的精准度和临床医生能力的影响。其中一个主要关注点是随着LLM越来越依赖其输出进行学习,可能导致输出质量下降以及由于与基础诊断过程的参与减少而导致临床医生技能减弱的潜在反馈循环的产生。尽管在这个阶段还是理论上的,但这种反馈循环提出了一个重要挑战,因为LLM在医疗保健中的整合加深,强调了需要积极对话和战略措施以确保LLM技术的安全有效使用。我们调查得出的一个关键结论是用户专业知识的关键作用以及对信任和验证LLM输出的审慎方法的必要性。本文强调了专业用户,特别是临床医生,如何利用LLM来增强生产力,通过卸载例行任务同时保持关键监督,以识别和纠正AI生成内容中潜在的不准确之处。信任和怀疑的平衡对于确保LLM增强而非削弱患者护理质量至关重要。此外,我们深入探讨了与LLM的自我参考学习循环和医疗保健专业人员技能下降相关的潜在风险。LLM在一个回音室内运行的风险,即AI生成内容反馈到学习算法中,威胁到数据池的多样性和质量,可能巩固偏见并降低LLM的效力。
更新时间: 2024-04-01 05:03:45
领域: cs.CY,cs.AI
Evaluation of large language models for discovery of gene set function
Gene set analysis is a mainstay of functional genomics, but it relies on curated databases of gene functions that are incomplete. Here we evaluate five Large Language Models (LLMs) for their ability to discover the common biological functions represented by a gene set, substantiated by supporting rationale, citations and a confidence assessment. Benchmarking against canonical gene sets from the Gene Ontology, GPT-4 confidently recovered the curated name or a more general concept (73% of cases), while benchmarking against random gene sets correctly yielded zero confidence. Gemini-Pro and Mixtral-Instruct showed ability in naming but were falsely confident for random sets, whereas Llama2-70b had poor performance overall. In gene sets derived from 'omics data, GPT-4 identified novel functions not reported by classical functional enrichment (32% of cases), which independent review indicated were largely verifiable and not hallucinations. The ability to rapidly synthesize common gene functions positions LLMs as valuable 'omics assistants.
Updated: 2024-04-01 05:02:23
标题: 评估大型语言模型在发现基因集功能方面的应用
摘要: 基因集分析是功能基因组学的支柱,但它依赖于不完整的基因功能数据库。在这里,我们评估了五种大型语言模型(LLMs)的能力,以发现由基因集表示的常见生物功能,并通过支持理由、引用和置信度评估予以证实。与基因本体论的规范基因集进行基准测试时,GPT-4自信地恢复了已编辑的名称或更一般的概念(73%的情况),而与随机基因集进行基准测试时,正确地产生了零置信度。Gemini-Pro和Mixtral-Instruct在命名方面表现出能力,但对于随机集却错误地表现出自信,而Llama2-70b的整体表现较差。在从'omics数据中导出的基因集中,GPT-4识别出了经典功能富集未报告的新功能(32%的情况),独立审查表明这些功能大部分是可验证的,而非幻觉。快速综合常见基因功能的能力使LLMs成为有价值的'omics助手。
更新时间: 2024-04-01 05:02:23
领域: q-bio.GN,cs.AI,cs.CL,q-bio.MN
MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements
Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves 3x improvement in tracking and 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map. Project Webpage: https://vita-group.github.io/MM3DGS-SLAM
Updated: 2024-04-01 04:57:41
标题: MM3DGS SLAM:利用视觉、深度和惯性测量的多模态3D高斯喷涂SLAM
摘要: 同时定位与地图构建对于位置跟踪和场景理解至关重要。基于3D高斯的地图表示使得能够使用多个摆放的相机对场景进行逼真重建和实时渲染。我们首次展示,使用3D高斯对地图表示,结合未摆放相机图像和惯性测量,可以实现准确的SLAM。我们的方法MM3DGS通过解决先前基于神经辐射场的表示的限制,实现了更快的渲染、尺度感知和改进的轨迹跟踪。我们的框架实现了基于关键帧的地图构建和跟踪,利用包含从预积分惯性测量、深度估计和光度渲染质量度量的相对姿态变换的损失函数。我们还发布了一个多模态数据集UT-MM,该数据集是从配备相机和惯性测量单元的移动机器人收集而来。对数据集中的几个场景进行的实验评估显示,与当前的3DGS SLAM最先进技术相比,MM3DGS在跟踪方面实现了3倍的改进,并且在光度渲染质量方面实现了5%的改进,同时允许实时渲染高分辨率密集的3D地图。项目网页:https://vita-group.github.io/MM3DGS-SLAM
更新时间: 2024-04-01 04:57:41
领域: cs.CV,cs.AI,cs.RO
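A sketch of the kind of composite keyframe loss the abstract describes, combining a photometric rendering term, a depth term, and a relative-pose term from pre-integrated inertial measurements. The weights, the pose parameterization, and all names are assumptions, not MM3DGS's actual formulation.

```python
# An assumed composite keyframe loss for illustration only.
import torch

def keyframe_loss(rendered, image, depth_pred, depth_meas,
                  rel_pose_pred, rel_pose_imu, w=(1.0, 0.5, 0.1)):
    l_photo = (rendered - image).abs().mean()             # photometric quality
    l_depth = (depth_pred - depth_meas).abs().mean()      # depth supervision
    l_pose = (rel_pose_pred - rel_pose_imu).pow(2).sum()  # pre-integrated IMU prior
    return w[0] * l_photo + w[1] * l_depth + w[2] * l_pose

print(keyframe_loss(torch.rand(3, 64, 64), torch.rand(3, 64, 64),
                    torch.rand(64, 64), torch.rand(64, 64),
                    torch.randn(6), torch.randn(6)))      # 6-DoF pose assumed
```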
Role Similarity Metric Based on Spanning Rooted Forest
As a fundamental issue in network analysis, structural node similarity has received much attention in academia and is adopted in a wide range of applications. Among the proposed structural node similarity measures, role similarity stands out because it satisfies several axiomatic properties, including automorphism confirmation. Existing role similarity metrics cannot handle top-k queries on large real-world networks due to their high time and space cost. In this paper, we propose a new role similarity metric, namely ForestSim. We prove that ForestSim is an admissible role similarity metric and devise the corresponding top-k similarity search algorithm, namely ForestSimSearch, which is able to process a top-k query in $O(k)$ time once the precomputation is finished. Moreover, we speed up the precomputation by using a fast approximate algorithm to compute the diagonal entries of the forest matrix, which reduces the time and space complexity of the precomputation to $O(\epsilon^{-2}m\log^5{n}\log{\frac{1}{\epsilon}})$ and $O(m\log^3{n})$, respectively. Finally, we conduct extensive experiments on 26 real-world networks. The results show that ForestSim works efficiently on million-scale networks and achieves performance comparable to the state-of-the-art methods.
Updated: 2024-04-01 04:51:37
标题: 基于跨越根树的角色相似性度量
摘要: 作为网络分析中的一个基本问题,结构节点相似性在学术界引起了广泛关注,并被应用于各种应用中。在已提出的结构节点相似性度量中,角色相似性因满足包括自同构确认在内的若干公理性质而脱颖而出。现有的角色相似性度量由于时间和空间成本高昂,无法处理大型现实网络上的top-k查询。在本文中,我们提出了一种新的角色相似性度量,即ForestSim。我们证明ForestSim是一个可接受的角色相似性度量,并设计了相应的top-k相似性搜索算法ForestSimSearch,一旦预计算完成,就能在$O(k)$时间内处理一个top-k查询。此外,我们通过使用一种快速近似算法计算森林矩阵的对角线元素来加速预计算,将预计算的时间和空间复杂度分别降低到$O(\epsilon^{-2}m\log^5{n}\log{\frac{1}{\epsilon}})$和$O(m\log^3{n})$。最后,我们在26个真实世界网络上进行了大量实验。结果表明,ForestSim在百万级网络上高效运行,并达到了与最先进方法相当的性能。
更新时间: 2024-04-01 04:51:37
领域: cs.SI,cs.AI
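For intuition, the snippet below computes the forest matrix Omega = (I + L)^{-1} of a toy graph exactly (in place of the paper's fast approximation of its diagonal) and ranks nodes by proximity of their diagonal entries as an illustrative stand-in; the actual ForestSim definition and its O(k)-per-query index are in the paper.

```python
# Forest matrix diagonal on a toy graph (exact inverse; ranking rule is an
# illustrative assumption, not ForestSim's definition).
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)      # small undirected graph
L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian
omega = np.linalg.inv(np.eye(len(A)) + L)      # forest matrix Omega = (I + L)^{-1}
diag = np.diag(omega)

def topk_similar(u, k):
    # Precomputing the sorted order once would make each query O(k),
    # mirroring the paper's query-time claim.
    others = [v for v in range(len(A)) if v != u]
    return sorted(others, key=lambda v: abs(diag[v] - diag[u]))[:k]

print(diag.round(3), topk_similar(0, k=2))
```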
Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes
We study offline reinforcement learning (RL) in partially observable Markov decision processes. In particular, we aim to learn an optimal policy from a dataset collected by a behavior policy which possibly depends on the latent state. Such a dataset is confounded in the sense that the latent state simultaneously affects the action and the observation, which is prohibitive for existing offline RL algorithms. To this end, we propose the Proxy variable Pessimistic Policy Optimization (P3O) algorithm, which addresses the confounding bias and the distributional shift between the optimal and behavior policies in the context of general function approximation. At the core of P3O is a coupled sequence of pessimistic confidence regions constructed via proximal causal inference, which is formulated as minimax estimation. Under a partial coverage assumption on the confounded dataset, we prove that P3O achieves an $n^{-1/2}$-suboptimality, where $n$ is the number of trajectories in the dataset. To our best knowledge, P3O is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
Updated: 2024-04-01 04:49:15
标题: 在困扰因素面前的悲观主义:部分可观测马尔可夫决策过程中可证实高效的离线强化学习
摘要: 我们研究部分可观测马尔可夫决策过程中的离线强化学习(RL)。特别是,我们的目标是从一个由可能依赖于潜在状态的行为策略收集的数据集中学习最优策略。这样的数据集是混淆的,因为潜在状态同时影响动作和观测,这对现有的离线RL算法是无法处理的。为此,我们提出了代理变量悲观策略优化(P3O)算法,该算法在一般函数逼近的背景下解决了混淆偏差以及最优策略与行为策略之间的分布偏移。P3O的核心是通过近端因果推断构建的一系列耦合的悲观置信区域,其被表述为极小极大估计。在对混淆数据集的部分覆盖假设下,我们证明了P3O实现了$n^{-1/2}$-次优性,其中$n$是数据集中轨迹的数量。据我们所知,P3O是第一个针对具有混淆数据集的POMDP的可证明高效的离线RL算法。
更新时间: 2024-04-01 04:49:15
领域: cs.LG,cs.AI,math.ST,stat.ME,stat.ML,stat.TH
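Pessimism in its simplest tabular form penalizes value estimates by an uncertainty bonus that shrinks with dataset coverage, so the learned policy avoids poorly covered regions. P3O's construction via proximal causal inference and minimax estimation is far more involved; this sketch only conveys the underlying principle.

```python
# Tabular lower-confidence-bound pessimism (the principle only, not P3O).
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 5, 3
counts = rng.integers(0, 50, size=(n_states, n_actions))      # dataset coverage
q_hat = rng.normal(size=(n_states, n_actions))                 # fitted Q estimates

beta = 1.0
bonus = beta / np.sqrt(np.maximum(counts, 1))                  # shrinks with coverage
q_pessimistic = q_hat - bonus                                  # lower confidence bound
greedy = q_pessimistic.argmax(axis=1)                          # pessimistic policy
print(greedy, "\n", bonus.round(2))
```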
Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation
Talking face generation is the challenging task of synthesizing a natural and realistic face that requires accurate synchronization with a given audio. Due to co-articulation, where an isolated phone is influenced by the preceding or following phones, the articulation of a phone varies upon the phonetic context. Therefore, modeling lip motion with the phonetic context can generate more spatio-temporally aligned lip movement. In this respect, we investigate the phonetic context in generating lip motion for talking face generation. We propose Context-Aware Lip-Sync framework (CALS), which explicitly leverages phonetic context to generate lip movement of the target face. CALS is comprised of an Audio-to-Lip module and a Lip-to-Face module. The former is pretrained based on masked learning to map each phone to a contextualized lip motion unit. The contextualized lip motion unit then guides the latter in synthesizing a target identity with context-aware lip motion. From extensive experiments, we verify that simply exploiting the phonetic context in the proposed CALS framework effectively enhances spatio-temporal alignment. We also demonstrate the extent to which the phonetic context assists in lip synchronization and find the effective window size for lip generation to be approximately 1.2 seconds.
Updated: 2024-04-01 04:45:30
标题: 探索基于语音上下文的唇部同步技术,用于生成说话人脸
摘要: 说话面部生成是合成一个自然和逼真面部的挑战性任务,需要与给定音频准确同步。由于共同发音,其中一个孤立的音素受到前后音素的影响,一个音素的发音会根据语音上下文而变化。因此,用语音上下文建模唇部运动可以生成更多时空对齐的唇部运动。在这方面,我们研究了用于说话面部生成的唇部运动的语音上下文。我们提出了一种明确利用语音上下文生成目标面部唇部运动的上下文感知唇部同步框架(CALS)。CALS由一个从音频到唇部的模块和一个从唇部到面部的模块组成。前者基于掩蔽学习进行预训练,将每个音素映射到一个上下文化的唇部运动单元。然后,上下文化的唇部运动单元指导后者合成具有上下文感知唇部运动的目标身份。通过广泛的实验,我们验证了在提出的CALS框架中简单地利用语音上下文有效地增强了时空对齐。我们还展示了语音上下文在唇部同步中的帮助程度,并发现唇部生成的有效窗口大小约为1.2秒。
更新时间: 2024-04-01 04:45:30
领域: cs.CV,cs.AI,cs.SD,eess.AS,eess.IV
Token-Efficient Leverage Learning in Large Language Models
Large Language Models (LLMs) have excelled in various tasks but perform better in high-resource scenarios, which presents challenges in low-resource scenarios. Data scarcity and the inherent difficulty of adapting LLMs to specific tasks compound the challenge. To address these twin hurdles, we introduce Leverage Learning. We present a streamlined implementation of this methodology called Token-Efficient Leverage Learning (TELL). TELL showcases the potential of Leverage Learning, demonstrating effectiveness across various LLMs and low-resource tasks, ranging from $10^4$ to $10^6$ tokens. It reduces task data requirements by up to nearly an order of magnitude compared to conventional Supervised Fine-Tuning (SFT) while delivering competitive performance. With the same amount of task data, TELL leads in improving task performance compared to SFT. We discuss the mechanism of Leverage Learning, suggesting it aligns with the quantization hypothesis, and explore its promising potential through empirical testing.
Updated: 2024-04-01 04:39:44
标题: 大语言模型中令牌高效杠杆学习
摘要: 大型语言模型(LLMs)在各种任务中表现出色,但在高资源场景中表现更好,这在低资源场景中会带来挑战。数据稀缺和将LLMs调整到特定任务的固有困难使这一挑战更加复杂。为了解决这一双重障碍,我们引入了\textbf{Leverage Learning}。我们提出了这一方法的简化实现,称为Token-Efficient Leverage Learning(TELL)。TELL展示了Leverage Learning的潜力,展示了在各种LLMs和低资源任务中的有效性,涵盖了从$10^4$到$10^6$个标记的范围。与传统的监督微调(SFT)相比,它将任务数据需求降低了近一个数量级,同时提供了竞争性的性能。在相同数量的任务数据的情况下,TELL在提高任务性能方面领先于SFT。我们讨论了Leverage Learning的机制,提出它与量化假设一致,并通过经验测试探索了其有前途的潜力。
更新时间: 2024-04-01 04:39:44
领域: cs.CL,cs.AI,cs.LG
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Existing methods to fine-tune LLMs, like Adapter, Prefix-tuning, and LoRA, which introduce extra modules or additional input sequences to inject new skills or knowledge, may compromise the innate abilities of LLMs. In this paper, we propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile information. Specifically, the LLaMA-Excitor does not directly change the intermediate hidden state during the self-attention calculation of the transformer structure. We designed the Excitor block as a bypass module for the similarity score computation in LLMs' self-attention to reconstruct keys and change the importance of values by learnable prompts. LLaMA-Excitor ensures a self-adaptive allocation of additional attention to input instructions, thus effectively preserving LLMs' pre-trained knowledge when fine-tuning LLMs on low-quality instruction-following datasets. Furthermore, we unify the modeling of multi-modal tuning and language-only tuning, extending LLaMA-Excitor to a powerful visual instruction follower without the need for complex multi-modal alignment. Our proposed approach is evaluated in language-only and multi-modal tuning experimental scenarios. Notably, LLaMA-Excitor is the only method that maintains basic capabilities while achieving a significant improvement (+6%) on the MMLU benchmark. In the visual instruction tuning, we achieve a new state-of-the-art image captioning performance of 157.5 CIDEr on MSCOCO, and a comparable performance (88.39%) on ScienceQA to cutting-edge models with more parameters and extensive vision-language pertaining.
Updated: 2024-04-01 04:39:21
标题: LLaMA-Excitor:通过间接特征交互进行通用指令调整
摘要: 现有的微调LLMs的方法,如Adapter、Prefix-tuning和LoRA,通过引入额外的模块或附加输入序列来注入新的技能或知识,但可能会损害LLMs的固有能力。在本文中,我们提出了LLaMA-Excitor,这是一种轻量级方法,通过逐渐更多地关注有价值的信息来激发LLMs更好地遵循指令的潜力。具体而言,LLaMA-Excitor不会直接改变transformer结构中自注意力计算过程中的中间隐藏状态。我们将Excitor块设计为LLMs自注意力中相似度评分计算的旁路模块,通过可学习的提示重构键并改变值的重要性。LLaMA-Excitor确保对输入指令自适应地分配额外的关注,从而在低质量指令遵循数据集上微调LLMs时有效保留其预训练知识。此外,我们统一了多模态调整和仅语言调整的建模,将LLaMA-Excitor扩展为一个强大的视觉指令跟随者,无需复杂的多模态对齐。我们提出的方法在仅语言和多模态调整实验场景中进行了评估。值得注意的是,LLaMA-Excitor是唯一一种能够在MMLU基准测试上实现显著改进(+6%)的同时保持基本能力的方法。在视觉指令调整中,我们在MSCOCO上取得了新的最先进图像字幕性能(CIDEr 157.5),并在ScienceQA上取得了88.39%的成绩,与具有更多参数和经过大量视觉-语言预训练的尖端模型相当。
更新时间: 2024-04-01 04:39:21
领域: cs.CV,cs.AI,cs.CL
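To make the bypass idea above concrete, here is a minimal, hedged PyTorch sketch (an illustrative reading of the abstract, not the authors' code): keys are reconstructed as a prompt-conditioned residual inside the similarity-score path, and a zero-initialized gate keeps the block inert at initialization so pre-trained behaviour is preserved. The prompt count and all shapes are assumptions.

    import torch
    import torch.nn as nn

    class ExcitorAttention(nn.Module):
        def __init__(self, d_model: int, n_prompts: int = 16):
            super().__init__()
            self.q = nn.Linear(d_model, d_model)
            self.k = nn.Linear(d_model, d_model)
            self.v = nn.Linear(d_model, d_model)
            # Learnable prompts, used only in the similarity-score bypass.
            self.prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)
            # Zero-initialized gate: the bypass contributes nothing at first.
            self.gate = nn.Parameter(torch.zeros(1))

        def forward(self, x):                            # x: (batch, seq, d_model)
            q, k, v = self.q(x), self.k(x), self.v(x)
            # Bypass: reconstruct keys as a prompt-conditioned residual; only the
            # similarity scores change, hidden states are never edited directly.
            sim = (k @ self.prompts.T).softmax(dim=-1)   # (batch, seq, n_prompts)
            k_recon = k + torch.tanh(self.gate) * (sim @ self.prompts)
            attn = (q @ k_recon.transpose(-2, -1) / q.size(-1) ** 0.5).softmax(dim=-1)
            return attn @ v                              # value importance is shifted

    block = ExcitorAttention(d_model=32)
    print(block(torch.randn(2, 5, 32)).shape)            # torch.Size([2, 5, 32])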
Causal State Distillation for Explainable Reinforcement Learning
Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be quite challenging. This lack of transparency in RL models has been a long-standing problem, making it difficult for users to grasp the reasons behind an agent's behaviour. Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD). RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner. RD works by exposing various facets of the rewards that contribute to the agent's objectives during training. However, RD alone has limitations as it primarily offers insights based on sub-rewards and does not delve into the intricate cause-and-effect relationships that occur within an RL agent's neural model. In this paper, we present an extension of RD that goes beyond sub-rewards to provide more informative explanations. Our approach is centred on a causal learning framework that leverages information-theoretic measures for explanation objectives that encourage three crucial properties of causal factors: causal sufficiency, sparseness, and orthogonality. These properties help us distill the cause-and-effect relationships between the agent's states and actions or rewards, allowing for a deeper understanding of its decision-making processes. Our framework is designed to generate local explanations and can be applied to a wide range of RL tasks with multiple reward channels. Through a series of experiments, we demonstrate that our approach offers more meaningful and insightful explanations for the agent's action selections.
Updated: 2024-04-01 04:31:34
标题: 因果状态提取用于可解释的强化学习
摘要: 强化学习(RL)是训练智能代理的强大技术,但理解这些代理为什么做出特定决策可能非常具有挑战性。RL模型缺乏透明度是一个长期存在的问题,使用户难以理解代理行为背后的原因。为解决这一问题,已经探索了各种方法,其中一种有前途的途径是奖励分解(RD)。RD具有吸引力,因为它避开了其他试图事后合理化代理行为的方法所涉及的一些问题。RD通过暴露在训练过程中有助于代理目标的奖励的各个方面来工作。然而,RD本身存在局限性,因为它主要基于子奖励提供见解,而不深入研究RL代理神经模型内部发生的复杂因果关系。在本文中,我们提出了一个超越子奖励的RD扩展,提供更具信息量的解释。我们的方法以因果学习框架为中心,利用信息论度量作为解释目标,以促进因果因素的三个关键属性:因果充分性、稀疏性和正交性。这些属性帮助我们提炼代理状态与动作或奖励之间的因果关系,从而更深入地理解其决策过程。我们的框架旨在生成局部解释,并可应用于具有多个奖励通道的各种RL任务。通过一系列实验,我们证明我们的方法为代理的动作选择提供了更有意义和深刻的解释。
更新时间: 2024-04-01 04:31:34
领域: cs.LG,cs.AI,stat.ME
AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation
Test-time adaptation (TTA) has emerged as a viable solution to adapt pre-trained models to domain shifts using unlabeled test data. However, TTA faces challenges of adaptation failures due to its reliance on blind adaptation to unknown test samples in dynamic scenarios. Traditional methods for out-of-distribution performance estimation are limited by unrealistic assumptions in the TTA context, such as requiring labeled data or re-training models. To address this issue, we propose AETTA, a label-free accuracy estimation algorithm for TTA. We propose the prediction disagreement as the accuracy estimate, calculated by comparing the target model prediction with dropout inferences. We then improve the prediction disagreement to extend the applicability of AETTA under adaptation failures. Our extensive evaluation with four baselines and six TTA methods demonstrates that AETTA shows an average of 19.8%p more accurate estimation compared with the baselines. We further demonstrate the effectiveness of accuracy estimation with a model recovery case study, showcasing the practicality of our model recovery based on accuracy estimation. The source code is available at https://github.com/taeckyung/AETTA.
Updated: 2024-04-01 04:21:49
标题: AETTA: 无标签准确性估计用于测试时间适应
摘要: 测试时适应(TTA)已经成为一种可行的解决方案,可以利用未标记的测试数据使预训练模型适应领域转移。然而,TTA面临适应失败的挑战,因为它依赖于对动态场景中未知测试样本的盲目适应。传统的分布外性能估计方法在TTA情境下受到不切实际假设的限制,比如需要有标记的数据或重新训练模型。为了解决这个问题,我们提出了AETTA,一种用于TTA的无标签准确度估计算法。我们提出以预测不一致性作为准确度估计,通过将目标模型的预测与dropout推断的结果进行比较来计算。然后,我们改进了预测不一致性,以扩展AETTA在适应失败情况下的适用性。我们通过四个基线和六种TTA方法的广泛评估表明,AETTA的准确度估计平均比基线精确19.8个百分点。我们进一步通过一个模型恢复案例研究展示了准确度估计的有效性,展示了基于准确度估计的模型恢复的实用性。源代码可在https://github.com/taeckyung/AETTA 上找到。
更新时间: 2024-04-01 04:21:49
领域: cs.LG,cs.AI,cs.CV
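The core estimator, prediction disagreement against dropout inferences, fits in a few lines of PyTorch. This is a simplified reading of the abstract (the released AETTA code further refines the disagreement to stay reliable under adaptation failures); it assumes a model that contains dropout layers.

    import torch

    @torch.no_grad()
    def disagreement_accuracy_estimate(model, x, n_dropout: int = 10):
        model.eval()
        target_pred = model(x).argmax(dim=-1)       # deterministic target prediction
        model.train()                               # enable dropout at inference time
        disagree = torch.zeros(x.size(0), device=x.device)
        for _ in range(n_dropout):
            drop_pred = model(x).argmax(dim=-1)     # one stochastic dropout pass
            disagree += (drop_pred != target_pred).float()
        model.eval()
        # Average disagreement serves as the estimated error; accuracy = 1 - error.
        return 1.0 - (disagree / n_dropout).mean().item()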
Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape
In federated learning (FL), a set of local clients is coordinated by a global server to cooperatively train one model with privacy protection. Due to the multiple local updates and the isolated non-iid datasets, clients are prone to overfit to their own optima, which deviates significantly from the global objective and severely undermines performance. Most previous works focus only on enhancing the consistency between the local and global objectives to alleviate this harmful client drift from the optimization perspective, and their performance deteriorates markedly under high heterogeneity. In this work, we propose a novel and general algorithm {\ttfamily FedSMOO} that jointly considers the optimization and generalization targets to efficiently improve performance in FL. Concretely, {\ttfamily FedSMOO} adopts a dynamic regularizer to steer the local optima toward the global objective, which is meanwhile revised by the global Sharpness Aware Minimization (SAM) optimizer to search for consistent flat minima. Our theoretical analysis indicates that {\ttfamily FedSMOO} achieves a fast $\mathcal{O}(1/T)$ convergence rate with a low generalization bound. Extensive numerical studies are conducted on real-world datasets to verify its peerless efficiency and excellent generality.
Updated: 2024-04-01 04:21:28
标题: 《联邦学习中的动态正则化尖锐度感知最小化:接近全局一致性和平滑景观》
摘要: 在联邦学习(FL)中,一组本地客户端在全局服务器的协调下共同训练一个具有隐私保护的模型。由于多个本地更新和孤立的非独立同分布数据集,客户端容易过度拟合到自己的最优点,这与全局目标极端偏离,严重损害性能。大多数先前的工作只关注提升本地和全局目标之间的一致性,以减轻这种有偏见的客户端漂移,从优化视角来看,其性能在高异质性下会明显恶化。在这项工作中,我们提出了一种新颖且通用的算法FedSMOO,通过同时考虑优化和泛化目标来有效提高FL的性能。具体而言,FedSMOO采用动态正则化器来保证本地最优点朝向全局目标,同时由全局Sharpness Aware Minimization(SAM)优化器修正,以搜索一致的平坦极小值。我们的理论分析表明,FedSMOO在低泛化界限下实现了快速的O(1/T)收敛速度。对真实世界数据集进行了广泛的数值研究,验证了其无与伦比的效率和出色的通用性。
更新时间: 2024-04-01 04:21:28
领域: cs.LG,cs.DC,math.OC
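FedSMOO builds on a SAM-style perturb-then-descend step at each client. The sketch below shows only that generic inner step, as a hedged simplification: the paper additionally couples a dynamic regularizer and a globally agreed perturbation so all clients search for the same flat minimum.

    import torch

    def sam_step(model, loss_fn, batch, rho=0.05, lr=0.01):
        x, y = batch
        params = list(model.parameters())
        grads = torch.autograd.grad(loss_fn(model(x), y), params)
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
        eps = [rho * g / norm for g in grads]      # ascend to the local sharpest point
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.add_(e)
        grads_pert = torch.autograd.grad(loss_fn(model(x), y), params)
        with torch.no_grad():
            for p, e, g in zip(params, eps, grads_pert):
                p.sub_(e)                          # undo the perturbation
                p.sub_(lr * g)                     # descend with the SAM gradient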
Maximizing User Experience with LLMOps-Driven Personalized Recommendation Systems
The integration of LLMOps into personalized recommendation systems marks a significant advancement in managing LLM-driven applications. This innovation presents both opportunities and challenges for enterprises, requiring specialized teams to navigate the complexity of engineering technology while prioritizing data security and model interpretability. By leveraging LLMOps, enterprises can enhance the efficiency and reliability of large-scale machine learning models, driving personalized recommendations aligned with user preferences. Despite ethical considerations, LLMOps is poised for widespread adoption, promising more efficient and secure machine learning services that elevate user experience and shape the future of personalized recommendation systems.
Updated: 2024-04-01 04:13:42
标题: 通过基于LLMOps的个性化推荐系统最大化用户体验
摘要: 将LLMOps集成到个性化推荐系统中标志着LLM驱动应用程序管理的重大进步。这种创新为企业带来了机遇和挑战,需要专门的团队在优先考虑数据安全和模型可解释性的同时,应对工程技术的复杂性。通过利用LLMOps,企业可以提高大规模机器学习模型的效率和可靠性,推动与用户偏好一致的个性化推荐。尽管存在伦理考虑,LLMOps已经准备好被广泛采用,承诺提供更高效和安全的机器学习服务,提升用户体验并塑造个性化推荐系统的未来。
更新时间: 2024-04-01 04:13:42
领域: cs.IR,cs.AI
Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years. However, most of the existing theoretical support for AC algorithms focuses on the case of linear function approximations, or linearized neural networks, where the feature representation is fixed throughout training. Such a limitation fails to capture the key aspect of representation learning in neural AC, which is pivotal in practical problems. In this work, we take a mean-field perspective on the evolution and convergence of feature-based neural AC. Specifically, we consider a version of AC where the actor and critic are represented by overparameterized two-layer neural networks and are updated with two-timescale learning rates. The critic is updated by temporal-difference (TD) learning with a larger stepsize while the actor is updated via proximal policy optimization (PPO) with a smaller stepsize. In the continuous-time and infinite-width limiting regime, when the timescales are properly separated, we prove that neural AC finds the globally optimal policy at a sublinear rate. Additionally, we prove that the feature representation induced by the critic network is allowed to evolve within a neighborhood of the initial one.
Updated: 2024-04-01 04:09:22
标题: Wasserstein流遇上复制动力学:演员-评论家中表示学习的均场分析
摘要: 由神经网络赋能的演员-评论家(AC)算法近年来取得了显著的实证成功。然而,现有的AC算法理论支持大多集中在线性函数逼近或线性化神经网络的情况,其中特征表示在整个训练过程中是固定的。这种限制未能捕捉神经AC中表示学习的关键方面,而这在实际问题中至关重要。在这项工作中,我们从平均场的角度看待基于特征的神经AC的演变和收敛。具体地,我们考虑一个AC的版本,其中演员和评论家由过参数化的两层神经网络表示,并且使用两个时间尺度的学习率进行更新。评论家通过具有较大步长的时序差分(TD)学习进行更新,而演员通过较小步长的近端策略优化(PPO)进行更新。在连续时间和无限宽度的极限情况下,当时间尺度恰当分离时,我们证明神经AC以次线性速率找到全局最优策略。此外,我们证明评论家网络引发的特征表示允许在初始特征表示的邻域内演变。
更新时间: 2024-04-01 04:09:22
领域: cs.LG,math.OC,stat.ML
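The two-timescale structure analyzed above can be made concrete with a toy update routine: a TD(0) critic step taken with the larger stepsize, followed by a PPO-style clipped actor step with the smaller one. This illustrates only the timescale separation, not the paper's mean-field analysis; actor.log_prob is an assumed helper, and the two optimizers are expected to carry the two different learning rates.

    import torch

    def two_timescale_update(actor, critic, transition, opt_actor, opt_critic,
                             gamma=0.99, clip=0.2):
        s, a, r, s_next, logp_old = transition
        # Critic: semi-gradient TD(0), larger learning rate set in opt_critic.
        with torch.no_grad():
            td_target = r + gamma * critic(s_next)
        critic_loss = (critic(s) - td_target).pow(2).mean()
        opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()
        # Actor: PPO clipped objective, smaller learning rate set in opt_actor.
        with torch.no_grad():
            adv = td_target - critic(s)            # one-step advantage estimate
        ratio = (actor.log_prob(s, a) - logp_old).exp()
        actor_loss = -torch.min(ratio * adv,
                                ratio.clamp(1 - clip, 1 + clip) * adv).mean()
        opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()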
Using Persuasive Writing Strategies to Explain and Detect Health Misinformation
The spread of misinformation is a prominent problem in today's society, and many researchers in academia and industry are trying to combat it. Due to the vast amount of misinformation created every day, it is unrealistic to leave this task to human fact-checkers. Data scientists and researchers have been working on automated misinformation detection for years, and it remains a challenging problem today. The goal of our research is to add a new level to automated misinformation detection: classifying segments of text by persuasive writing technique in order to produce interpretable reasoning for why an article can be marked as misinformation. To accomplish this, we present a novel annotation scheme containing many common persuasive writing tactics, along with a dataset with corresponding human annotations. For this task, we make use of a RoBERTa model for text classification, due to its high performance in NLP. We develop several language model-based baselines and present the results of our persuasive strategy label predictions as well as the improvements these intermediate labels make in detecting misinformation and producing interpretable results.
Updated: 2024-04-01 04:08:28
标题: 使用说服性写作策略解释和检测健康错误信息
摘要: 虚假信息的传播是当今社会的一个突出问题,许多学术界和工业界的研究人员正致力于解决这个问题。由于每天产生的大量虚假信息,将这一任务交给人工事实核查人员是不现实的。数据科学家和研究人员多年来一直在致力于自动化虚假信息检测,至今仍然是一个具有挑战性的问题。我们的研究目标是为自动化虚假信息检测增加一个新的层次;通过对具有说服力写作技巧的文本段落进行分类,以便为为何一篇文章可能被标记为虚假信息提供可解释的推理。为了实现这一目标,我们提出了一个包含许多常见说服性写作策略的新颖注释方案,以及一个包含相应人类注释的数据集。对于这项任务,我们利用RoBERTa模型进行文本分类,因为它在自然语言处理领域具有高性能。我们开发了几种基于语言模型的基线模型,并呈现了我们对说服策略标签预测的结果,以及这些中间标签在检测虚假信息和产生可解释结果方面的改进。
更新时间: 2024-04-01 04:08:28
领域: cs.CL,cs.AI,cs.LG
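The modeling recipe, RoBERTa as a classifier over persuasive-strategy labels, looks roughly like the sketch below using the Hugging Face transformers API. The label names are invented placeholders, not the paper's annotation scheme, and the model here is untrained; fine-tuning on the human annotations is the missing step.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    labels = ["appeal_to_fear", "anecdote", "appeal_to_authority", "none"]  # placeholders
    tok = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=len(labels))

    segment = "Doctors don't want you to know this one simple trick."
    inputs = tok(segment, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(-1)
    print(labels[probs.argmax().item()])   # meaningful only after fine-tuning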
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks. At the core of their empirical successes is the learned feature representation, which embeds rich observations, e.g., images and texts, into the latent space that encodes semantic structures. Meanwhile, the evolution of such a feature representation is crucial to the convergence of temporal-difference and Q-learning. In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise. We aim to answer the following questions: When the function approximator is a neural network, how does the associated feature representation evolve? If it converges, does it converge to the optimal one? We prove that, utilizing an overparameterized two-layer neural network, temporal-difference and Q-learning globally minimize the mean-squared projected Bellman error at a sublinear rate. Moreover, the associated feature representation converges to the optimal one, generalizing the previous analysis of Cai et al. (2019) in the neural tangent kernel regime, where the associated feature representation stabilizes at the initial one. The key to our analysis is a mean-field perspective, which connects the evolution of a finite-dimensional parameter to its limiting counterpart over an infinite-dimensional Wasserstein space. Our analysis generalizes to soft Q-learning, which is further connected to policy gradient.
Updated: 2024-04-01 04:03:28
标题: 时间差异和Q学习能学习表示吗?一种均场理论
摘要: 时间差异和Q学习在深度强化学习中发挥关键作用,它们通过神经网络等表达力强大的非线性函数逼近器得以实现。它们的经验成功的核心是学习到的特征表示,将丰富的观察结果(如图像和文本)嵌入到编码语义结构的潜在空间中。同时,这种特征表示的演变对于时间差异和Q学习的收敛至关重要。 具体来说,当函数逼近器在特征表示中是线性的且在整个学习过程中保持不变时,时间差异学习才会收敛,否则可能会发散。我们的目标是回答以下问题:当函数逼近器是神经网络时,相关的特征表示如何演化?如果它收敛,它是否会收敛到最优解? 我们证明,利用过参数化的两层神经网络,时间差异和Q学习以次线性速率全局最小化均方投影贝尔曼误差。此外,相关的特征表示会收敛到最优解,推广了Cai等人(2019年)在神经切线核领域的先前分析,其中相关的特征表示会稳定在初始状态。我们分析的关键在于一种均场透视,它将有限维参数的演变与其在无限维Wasserstein空间中的极限对应起来。我们的分析推广到软Q学习,进一步与策略梯度相连接。
更新时间: 2024-04-01 04:03:28
领域: cs.LG,math.OC,stat.ML
Towards Universal Fake Image Detectors that Generalize Across Generative Models
With generative models proliferating at a rapid rate, there is a growing need for general purpose fake image detectors. In this work, we first show that the existing paradigm, which consists of training a deep network for real-vs-fake classification, fails to detect fake images from newer breeds of generative models when trained to detect GAN fake images. Upon analysis, we find that the resulting classifier is asymmetrically tuned to detect patterns that make an image fake. The real class becomes a sink class holding anything that is not fake, including generated images from models not accessible during training. Building upon this discovery, we propose to perform real-vs-fake classification without learning; i.e., using a feature space not explicitly trained to distinguish real from fake images. We use nearest neighbor and linear probing as instantiations of this idea. When given access to the feature space of a large pretrained vision-language model, the very simple baseline of nearest neighbor classification has surprisingly good generalization ability in detecting fake images from a wide variety of generative models; e.g., it improves upon the SoTA by +15.07 mAP and +25.90% acc when tested on unseen diffusion and autoregressive models.
Updated: 2024-04-01 04:00:31
标题: 通向能够跨生成模型泛化的通用虚假图像检测器
摘要: 随着生成模型迅速增多,对于通用的假图像检测器的需求日益增长。在这项工作中,我们首先展示了现有的范式,即训练一个深度网络进行真假分类,当训练用于检测 GAN 假图像时,无法检测出最新一代生成模型产生的假图像。经过分析,我们发现所得到的分类器被不对称地调整以检测使图像变假的模式。真实类别成为一个容纳任何不是假的东西的汇集类别,包括在训练期间无法访问的模型生成的图像。基于这一发现,我们提出在不学习的情况下进行真假分类;即使用一个未经明确训练以区分真假图像的特征空间。我们使用最近邻和线性探测作为这一思想的实例。当给予一个大型预训练的视觉语言模型的特征空间时,最简单的最近邻分类基线在检测各种生成模型产生的假图像时具有出人意料的好的泛化能力;例如,在对未见过的扩散和自回归模型进行测试时,它的 mAP 提高了 +15.07,准确率提高了 +25.90%。
更新时间: 2024-04-01 04:00:31
领域: cs.CV,cs.LG
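The paper's simplest baseline, nearest-neighbor real-vs-fake classification in a frozen pretrained feature space, needs only a few lines once features are extracted (for instance from a CLIP image encoder; the random arrays below are placeholders for such precomputed features).

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Frozen features for a small labeled bank: 1 = fake, 0 = real (placeholders).
    bank_feats = np.random.randn(2000, 512).astype(np.float32)
    bank_labels = np.random.randint(0, 2, size=2000)

    knn = KNeighborsClassifier(n_neighbors=1, metric="cosine")
    knn.fit(bank_feats, bank_labels)       # no feature learning is involved

    test_feats = np.random.randn(10, 512).astype(np.float32)
    print(knn.predict(test_feats))         # real-vs-fake votes from nearest neighbors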
Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization
We consider the optimization problem of minimizing a functional defined over a family of probability distributions, where the objective functional is assumed to possess a variational form. Such distributional optimization problems arise widely in machine learning and statistics, with Monte-Carlo sampling, variational inference, policy optimization, and generative adversarial networks as examples. For this problem, we propose a novel particle-based algorithm, dubbed variational transport, which approximately performs Wasserstein gradient descent over the manifold of probability distributions via iteratively pushing a set of particles. Specifically, we prove that moving along the geodesic in the direction of the functional gradient with respect to the second-order Wasserstein distance is equivalent to applying a pushforward mapping to a probability distribution, which can be approximated accurately by pushing a set of particles. In each iteration of variational transport, we first solve the variational problem associated with the objective functional using the particles, whose solution yields the Wasserstein gradient direction. Then we update the current distribution by pushing each particle along the direction specified by such a solution. By characterizing both the statistical error incurred in estimating the Wasserstein gradient and the progress of the optimization algorithm, we prove that when the objective functional satisfies a functional version of the Polyak-\L{}ojasiewicz (PL) (Polyak, 1963) and smoothness conditions, variational transport converges linearly to the global minimum of the objective functional up to a certain statistical error, which decays to zero sublinearly as the number of particles goes to infinity.
Updated: 2024-04-01 03:56:23
标题: 变分传输:一种收敛的基于粒子的分布优化算法
摘要: 我们考虑在一族概率分布上定义的泛函的最小化问题,其中假定目标泛函具有变分形式。这类分布优化问题在机器学习和统计学中广泛出现,例如蒙特卡罗采样、变分推理、策略优化和生成对抗网络。针对这个问题,我们提出了一种新颖的基于粒子的算法,称为变分传输,通过迭代地推动一组粒子,在概率分布流形上近似执行Wasserstein梯度下降。具体地,我们证明沿测地线按关于二阶Wasserstein距离的泛函梯度方向移动,等价于对概率分布施加一个推前映射,而这可以通过推动一组粒子来精确近似。在变分传输的每次迭代中,我们首先使用粒子求解与目标泛函相关的变分问题,其解给出Wasserstein梯度方向;然后沿该解指定的方向推动每个粒子,以更新当前分布。通过刻画估计Wasserstein梯度所产生的统计误差和优化算法的进展,我们证明当目标泛函满足Polyak-\L{}ojasiewicz(PL)(Polyak,1963)条件的泛函版本和光滑性条件时,变分传输以线性速度收敛到目标泛函的全局最小值,仅相差一定的统计误差;随着粒子数量趋于无穷,该误差以次线性速度衰减为零。
更新时间: 2024-04-01 03:56:23
领域: cs.LG,math.OC,math.ST,stat.ML,stat.TH
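For the simplest linear functional F(rho) = E_rho[V], the Wasserstein gradient flow moves every particle along -grad V, so pushing particles literally performs the descent. The toy below shows only that special case; for general variational-form objectives the algorithm first solves an inner dual problem with the particles to obtain the push direction.

    import numpy as np

    def grad_V(x):                     # toy potential V(x) = ||x||^2 / 2
        return x

    particles = np.random.randn(500, 2) * 3.0
    step = 0.1
    for _ in range(100):
        particles -= step * grad_V(particles)      # pushforward along -grad V
    print(np.linalg.norm(particles.mean(axis=0)))  # mass collapses toward argmin V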
CAAP: Class-Dependent Automatic Data Augmentation Based On Adaptive Policies For Time Series
Data augmentation is a common technique used to enhance the performance of deep learning models by expanding the training dataset. Automatic Data Augmentation (ADA) methods are gaining popularity because of their capacity to generate policies for various datasets. However, existing ADA methods have primarily focused on overall performance improvement, neglecting the problem of class-dependent bias that leads to performance reduction in specific classes. This bias poses significant challenges when deploying models in real-world applications. Furthermore, ADA for time series remains an underexplored domain, highlighting the need for advancements in this field. In particular, applying ADA techniques to vital signals like the electrocardiogram (ECG) is a compelling example due to its potential in medical domains such as heart disease diagnostics. We propose a novel deep learning-based approach called the Class-dependent Automatic Adaptive Policies (CAAP) framework to overcome the notable class-dependent bias problem while maintaining the overall improvement in time-series data augmentation. Specifically, we utilize the policy network to generate effective sample-wise policies with balanced difficulty through class and feature information extraction. Second, we design the augmentation probability regulation method to minimize class-dependent bias. Third, we introduce the information region concept into the ADA framework to preserve essential regions in the sample. Through a series of experiments on real-world ECG datasets, we demonstrate that CAAP outperforms representative methods in achieving lower class-dependent bias combined with superior overall performance. These results highlight the reliability of CAAP as a promising ADA method for time series modeling that fits the demands of real-world applications.
Updated: 2024-04-01 03:51:38
标题: CAAP:基于自适应策略的时间序列的类别相关自动数据增强
摘要: 数据增强是一种常用的技术,通过扩展训练数据集来增强深度学习模型的性能。自动数据增强(ADA)方法因其能够为各种数据集生成策略而变得流行。然而,现有的ADA方法主要集中在整体性能的提升上,忽视了导致特定类别性能下降的类别相关偏差问题。这种偏差在实际应用中部署模型时会带来重要挑战。此外,时间序列的ADA仍然是一个尚未深入研究的领域,突显了这一领域的进展需求。特别是,将ADA技术应用于像心电图(ECG)这样的重要信号是一个引人注目的例子,因为它在心脏病诊断等医学领域具有潜力。 我们提出了一个新颖的基于深度学习的方法,称为Class-dependent Automatic Adaptive Policies(CAAP)框架,旨在克服明显的类别相关偏差问题,同时保持时间序列数据增强的整体改进。具体来说,我们利用策略网络通过类别和特征信息提取生成有效的样本级策略,以平衡难度。其次,我们设计了增强概率调节方法来最小化类别相关偏差。第三,我们将信息区域的概念引入ADA框架中,以保留样本中的关键区域。通过一系列在真实ECG数据集上的实验,我们证明CAAP在实现更低的类别相关偏差和更优异的整体性能方面优于代表性方法。这些结果突显了CAAP作为一种有前途的ADA方法,适用于时间序列建模,符合实际应用需求的可靠性。
更新时间: 2024-04-01 03:51:38
领域: cs.LG
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
With the attention mechanism, transformers achieve significant empirical successes. Despite the intuitive understanding that transformers perform relational inference over long sequences to produce desirable representations, we lack a rigorous theory on how the attention mechanism achieves it. In particular, several intriguing questions remain open: (a) What makes a desirable representation? (b) How does the attention mechanism infer the desirable representation within the forward pass? (c) How does a pretraining procedure learn to infer the desirable representation through the backward pass? We observe that, as is the case in BERT and ViT, input tokens are often exchangeable since they already include positional encodings. The notion of exchangeability induces a latent variable model that is invariant to input sizes, which enables our theoretical analysis. - To answer (a) on representation, we establish the existence of a sufficient and minimal representation of input tokens. In particular, such a representation instantiates the posterior distribution of the latent variable given input tokens, which plays a central role in predicting output labels and solving downstream tasks. - To answer (b) on inference, we prove that attention with the desired parameter infers the latent posterior up to an approximation error, which is decreasing in input sizes. In detail, we quantify how attention approximates the conditional mean of the value given the key, which characterizes how it performs relational inference over long sequences. - To answer (c) on learning, we prove that both supervised and self-supervised objectives allow empirical risk minimization to learn the desired parameter up to a generalization error, which is independent of input sizes. Particularly, in the self-supervised setting, we identify a condition number that is pivotal to solving downstream tasks.
Updated: 2024-04-01 03:51:06
标题: 通过可换性和潜变量模型的视角对注意力进行分析
摘要: 通过注意力机制,transformers取得了显著的实证成功。尽管我们直觉地理解transformers是通过对长序列进行关系推断来产生理想的表示,但我们缺乏关于注意力机制如何实现这一点的严格理论。特别是,仍有几个引人注目的问题尚未解决:(a)什么样的表示是理想的?(b)注意力机制如何在前向传递中推断出理想的表示?(c)预训练过程如何通过反向传递学习推断出理想的表示? 我们观察到,在BERT和ViT中的情况也是如此,输入标记通常是可交换的,因为它们已经包括位置编码。可交换性的概念引出了一个对输入规模不变的潜变量模型,这使得我们的理论分析成为可能。 - 为了回答关于表示的问题(a),我们建立了输入标记的充分且最小表示的存在性。特别是,这种表示实例化了给定输入标记下潜变量的后验分布,后者在预测输出标签和解决下游任务中扮演着核心角色。 - 为了回答关于推断的问题(b),我们证明了具有所需参数的注意力机制能够在一个随输入规模增大而减小的近似误差内推断出潜变量的后验分布。具体来说,我们量化了注意力机制如何逼近给定键的值的条件均值,这表征了其如何在长序列上执行关系推断。 - 为了回答关于学习的问题(c),我们证明了有监督和自监督目标都允许经验风险最小化学习所需参数,泛化误差与输入规模无关。特别是,在自监督设置中,我们确定了一个对解决下游任务至关重要的条件数。
更新时间: 2024-04-01 03:51:06
领域: cs.LG
Machine Learning Robustness: A Primer
This chapter explores the foundational concept of robustness in Machine Learning (ML) and its integral role in establishing trustworthiness in Artificial Intelligence (AI) systems. The discussion begins with a detailed definition of robustness, portraying it as the ability of ML models to maintain stable performance across varied and unexpected environmental conditions. ML robustness is dissected through several lenses: its complementarity with generalizability; its status as a requirement for trustworthy AI; its adversarial vs non-adversarial aspects; its quantitative metrics; and its indicators such as reproducibility and explainability. The chapter delves into the factors that impede robustness, such as data bias, model complexity, and the pitfalls of underspecified ML pipelines. It surveys key techniques for robustness assessment from a broad perspective, including adversarial attacks, encompassing both digital and physical realms. It covers non-adversarial data shifts and nuances of Deep Learning (DL) software testing methodologies. The discussion progresses to explore amelioration strategies for bolstering robustness, starting with data-centric approaches like debiasing and augmentation. Further examination includes a variety of model-centric methods such as transfer learning, adversarial training, and randomized smoothing. Lastly, post-training methods are discussed, including ensemble techniques, pruning, and model repairs, emerging as cost-effective strategies to make models more resilient against the unpredictable. This chapter underscores the ongoing challenges and limitations in estimating and achieving ML robustness by existing approaches. It offers insights and directions for future research on this crucial concept, as a prerequisite for trustworthy AI systems.
Updated: 2024-04-01 03:49:42
标题: 机器学习的稳健性:入门指南
摘要: 这一章探讨了机器学习(ML)中鲁棒性的基础概念及其在建立人工智能(AI)系统可信度中的关键作用。讨论从详细定义鲁棒性开始,将其描绘为ML模型在各种多变和意外环境条件下保持稳定性能的能力。ML鲁棒性通过多个角度进行解剖:与泛化性的互补性;作为可信AI的要求;其对抗性与非对抗性方面;其定量指标;以及可重现性和可解释性等指标。该章深入探讨了妨碍鲁棒性的因素,如数据偏差、模型复杂性和未明确规定的ML流程的缺陷。从广泛的角度调查了鲁棒性评估的关键技术,包括对抗性攻击,涵盖数字和物理领域。涵盖了非对抗性数据转移和深度学习(DL)软件测试方法的微妙之处。讨论逐渐深入探讨了增强鲁棒性的改善策略,从数据中心方法如去偏和增强开始。进一步的审查包括各种模型中心方法,如迁移学习、对抗性训练和随机平滑。最后,讨论了后训练方法,包括集成技术、修剪和模型修复,成为使模型更具抗干扰性的成本效益策略。该章强调了通过现有方法估计和实现ML鲁棒性的持续挑战和限制。它为未来关于这一关键概念的研究提供了见解和方向,作为可信AI系统的先决条件。
更新时间: 2024-04-01 03:49:42
领域: cs.LG,cs.AI,cs.SE
A Comprehensive Review of Community Detection in Graphs
The study of complex networks has significantly advanced our understanding of community structures, which are a crucial feature of real-world graphs. Detecting communities in graphs is a challenging problem with applications in sociology, biology, and computer science. Despite the efforts of an interdisciplinary community of scientists, a satisfactory solution to this problem has not yet been achieved. This review article delves into the topic of community detection in graphs, offering a thorough exposition of community detection methods from the perspectives of modularity-based methods, spectral clustering, probabilistic modelling, and deep learning. Alongside these methods, we also present a new community detection method of our own design. Additionally, the performance of these methods on datasets with and without ground truth is compared. In conclusion, this comprehensive review provides a deep understanding of community detection in graphs.
Updated: 2024-04-01 03:47:40
标题: 图中社区检测的全面综述
摘要: 复杂网络的研究显著推进了我们对社区结构的理解,社区结构是现实世界图的一个关键特征。在图中检测社区是一个具有挑战性的问题,在社会学、生物学和计算机科学等领域有广泛应用。尽管跨学科的科学家群体已经做出了努力,但仍未找到令人满意的解决方案。本综述文章深入探讨了图中社区检测的主题,从基于模块度的方法、谱聚类、概率建模和深度学习等角度全面介绍了多种社区检测方法。除了这些方法,我们还提出了一种新的社区检测方法。此外,还比较了这些方法在有和没有真实标签的数据集上的表现。总之,这篇全面的综述提供了对图中社区检测的深入理解。
更新时间: 2024-04-01 03:47:40
领域: cs.SI,cs.LG
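As a small illustration of one family the review covers, modularity-based detection, the sketch below runs networkx's greedy modularity algorithm on the classic karate-club benchmark graph.

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities, modularity

    G = nx.karate_club_graph()                    # classic benchmark graph
    communities = greedy_modularity_communities(G)
    for i, c in enumerate(communities):
        print(f"community {i}: {sorted(c)}")
    print("modularity:", modularity(G, communities))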
Mirage: Model-Agnostic Graph Distillation for Graph Classification
GNNs, like other deep learning models, are data and computation hungry. There is a pressing need to scale training of GNNs on large datasets to enable their usage on low-resource environments. Graph distillation is an effort in that direction with the aim to construct a smaller synthetic training set from the original training data without significantly compromising model performance. While initial efforts are promising, this work is motivated by two key observations: (1) Existing graph distillation algorithms themselves rely on training with the full dataset, which undermines the very premise of graph distillation. (2) The distillation process is specific to the target GNN architecture and hyper-parameters and thus not robust to changes in the modeling pipeline. We circumvent these limitations by designing a distillation algorithm called Mirage for graph classification. Mirage is built on the insight that a message-passing GNN decomposes the input graph into a multiset of computation trees. Furthermore, the frequency distribution of computation trees is often skewed in nature, enabling us to condense this data into a concise distilled summary. By compressing the computation data itself, as opposed to emulating gradient flows on the original training set-a prevalent approach to date-Mirage transforms into an unsupervised and architecture-agnostic distillation algorithm. Extensive benchmarking on real-world datasets underscores Mirage's superiority, showcasing enhanced generalization accuracy, data compression, and distillation efficiency when compared to state-of-the-art baselines.
Updated: 2024-04-01 03:43:22
标题: Mirage:用于图分类的模型无关图蒸馏
摘要: GNNs,像其他深度学习模型一样,对数据和计算需求很高。迫切需要对大型数据集上的GNNs进行训练以在低资源环境中使用。图蒸馏是朝着这个方向努力的一种方式,旨在从原始训练数据构建一个较小的合成训练集,而不会显著影响模型性能。虽然最初的努力是有希望的,但这项工作是基于两个关键观察而激发的:(1)现有的图蒸馏算法本身依赖于使用完整数据集进行训练,这破坏了图蒸馏的基本前提。(2)蒸馏过程针对目标GNN架构和超参数,因此不够稳健以适应建模管道的变化。我们通过设计一种名为Mirage的图分类蒸馏算法来绕过这些限制。Mirage建立在这样一个洞察上:一个消息传递的GNN将输入图分解为一个计算树的多重集合。此外,计算树的频率分布通常是倾斜的,使我们能够将这些数据压缩为简洁的蒸馏摘要。通过压缩计算数据本身,而不是模拟原始训练集上的梯度流(这是迄今为止的主流方法),Mirage转化为一种无监督且与架构无关的蒸馏算法。对真实世界数据集的广泛基准测试强调了Mirage的优越性,展示了与最先进基线相比的增强的泛化精度、数据压缩和蒸馏效率。
更新时间: 2024-04-01 03:43:22
领域: cs.LG,cs.AI
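The decompose-and-count idea can be sketched directly: hash each node's rooted computation tree in a WL-like fashion, tally frequencies across the training graphs, and keep only the most frequent trees as the distilled summary. This is an illustrative reading of the abstract, not the released Mirage code.

    from collections import Counter

    def tree_hash(node, adj, feats, depth):
        if depth == 0:
            return (feats[node],)
        children = sorted(tree_hash(n, adj, feats, depth - 1) for n in adj[node])
        return (feats[node], tuple(children))

    def distill(graphs, depth=2, keep=50):
        counts = Counter()
        for adj, feats in graphs:                 # adj: {node: [neighbors]}
            for node in adj:
                counts[tree_hash(node, adj, feats, depth)] += 1
        # A skewed frequency distribution lets a short head summarize the data.
        return counts.most_common(keep)

    toy = [({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "B", 2: "A"})]
    print(distill(toy, depth=1))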
GraphFM: Graph Factorization Machines for Feature Interaction Modeling
Factorization machine (FM) is a prevalent approach to modeling pairwise (second-order) feature interactions when dealing with high-dimensional sparse data. However, on the one hand, FM fails to capture higher-order feature interactions suffering from combinatorial expansion. On the other hand, taking into account interactions between every pair of features may introduce noise and degrade prediction accuracy. To solve the problems, we propose a novel approach, Graph Factorization Machine (GraphFM), by naturally representing features in the graph structure. In particular, we design a mechanism to select the beneficial feature interactions and formulate them as edges between features. Then the proposed model, which integrates the interaction function of FM into the feature aggregation strategy of Graph Neural Network (GNN), can model arbitrary-order feature interactions on the graph-structured features by stacking layers. Experimental results on several real-world datasets have demonstrated the rationality and effectiveness of our proposed approach. The code and data are available at \href{https://github.com/CRIPAC-DIG/GraphCTR}{https://github.com/CRIPAC-DIG/GraphCTR}.
Updated: 2024-04-01 03:36:20
标题: GraphFM:用于特征交互建模的图因子分解机
摘要: 因子分解机(FM)是在处理高维稀疏数据时建模成对(二阶)特征交互的一种流行方法。然而,一方面,FM无法捕捉高阶特征交互,因为受到组合扩展的影响。另一方面,考虑每对特征之间的交互可能会引入噪音并降低预测准确性。为了解决这些问题,我们提出了一种新颖的方法,图因子分解机(GraphFM),通过自然地将特征表示为图结构来实现。具体来说,我们设计了一种机制来选择有益的特征交互,并将它们构建为特征之间的边。然后,所提出的模型将FM的交互函数整合到图神经网络(GNN)的特征聚合策略中,通过堆叠层来对图结构特征上的任意阶特征交互进行建模。在几个真实数据集上的实验结果表明了我们提出的方法的合理性和有效性。代码和数据可在\href{https://github.com/CRIPAC-DIG/GraphCTR}{https://github.com/CRIPAC-DIG/GraphCTR}上找到。
更新时间: 2024-04-01 03:36:20
领域: cs.LG,cs.AI,cs.IR
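For reference, the classic second-order FM interaction that GraphFM generalizes can be computed in O(kd) with the well-known identity below; GraphFM then replaces this all-pairs sum with GNN-style aggregation over learned beneficial edges.

    import numpy as np

    def fm_second_order(x, V):
        # x: (d,) features; V: (d, k) latent factors.
        # sum_{i<j} <V_i, V_j> x_i x_j = 0.5 * (||V^T x||^2 - sum_i ||V_i||^2 x_i^2)
        s = V.T @ x
        return 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())

    rng = np.random.default_rng(0)
    x, V = rng.random(8), rng.normal(size=(8, 4))
    print(fm_second_order(x, V))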
Fake Alignment: Are LLMs Really Aligned Well?
The growing awareness of safety concerns in large language models (LLMs) has sparked considerable interest in the evaluation of safety. This study investigates an under-explored issue about the evaluation of LLMs, namely the substantial discrepancy in performance between multiple-choice questions and open-ended questions. Inspired by research on jailbreak attack patterns, we argue this is caused by mismatched generalization. That is, LLM only remembers the answer style for open-ended safety questions, which makes it unable to solve other forms of safety tests. We refer to this phenomenon as fake alignment and construct a comparative benchmark to empirically verify its existence in LLMs. We introduce a Fake alIgNment Evaluation (FINE) framework and two novel metrics--Consistency Score (CS) and Consistent Safety Score (CSS), which jointly assess two complementary forms of evaluation to quantify fake alignment and obtain corrected performance estimation. Applying FINE to 14 widely-used LLMs reveals several models with purported safety are poorly aligned in practice. Subsequently, we found that multiple-choice format data can also be used as high-quality contrast distillation-based fine-tuning data, which can strongly improve the alignment consistency of LLMs with minimal fine-tuning overhead. For data and code, see https://github.com/AIFlames/Fake-Alignment.
Updated: 2024-04-01 03:32:14
标题: 虚假对齐:LLMs真的对齐得很好吗?
摘要: 对大型语言模型(LLMs)安全问题的日益关注引发了对安全评估的广泛兴趣。本研究探讨了LLMs评估中一个未充分探讨的问题,即模型在多项选择题和开放式问题之间的显著性能差异。受到关于越狱攻击模式研究的启发,我们认为这是由泛化不匹配导致的。也就是说,LLM只记住了开放式安全问题的答案样式,这使其无法解决其他形式的安全测试。我们将这一现象称为虚假对齐,并构建了一个比较基准来实证验证其在LLMs中的存在。我们引入了一个虚假对齐评估(FINE)框架和两个新颖的指标——一致性得分(CS)和一致性安全得分(CSS),共同评估两种互补的评估形式,以量化虚假对齐并获得校正后的性能估计。将FINE应用于14个广泛使用的LLMs,揭示出一些声称安全的模型在实践中对齐较差。随后,我们发现多项选择格式的数据还可以用作高质量的基于对比蒸馏的微调数据,能够以极小的微调开销显著提高LLMs的对齐一致性。有关数据和代码,请查看https://github.com/AIFlames/Fake-Alignment。
更新时间: 2024-04-01 03:32:14
领域: cs.CL,cs.AI
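A hedged sketch of the kind of check FINE performs: measure how often the open-ended and multiple-choice versions of the same safety question receive the same judgment. The fraction-agreement form below is an assumption for illustration; the exact CS and CSS definitions are given in the paper.

    def consistency_score(open_ended_safe, multiple_choice_safe):
        # Each argument is a list of per-question booleans from a safety judge.
        pairs = list(zip(open_ended_safe, multiple_choice_safe))
        return sum(a == b for a, b in pairs) / len(pairs)

    print(consistency_score([True, True, False, True],
                            [True, False, False, False]))   # -> 0.5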
MTLight: Efficient Multi-Task Reinforcement Learning for Traffic Signal Control
Traffic signal control has a great impact on alleviating traffic congestion in modern cities. Deep reinforcement learning (RL) has been widely used for this task in recent years, demonstrating promising performance but also facing many challenges, such as limited performance and sample inefficiency. To handle these challenges, MTLight is proposed to enhance the agent's observation with a latent state, which is learned from numerous traffic indicators. Meanwhile, multiple auxiliary and supervisory tasks are constructed to learn the latent state, and two types of embedded latent features, the task-specific feature and the task-shared feature, are used to make the latent state more abundant. Extensive experiments conducted on CityFlow demonstrate that MTLight has leading convergence speed and asymptotic performance. We further simulate peak-hour patterns in all scenarios with increasing control difficulty, and the results indicate that MTLight is highly adaptable.
Updated: 2024-04-01 03:27:46
标题: MTLight: 高效的交通信号控制多任务强化学习
摘要: 交通信号控制对缓解现代城市交通拥堵有着巨大影响。近年来,深度强化学习(RL)被广泛用于这一任务,展现出有前途的性能,但也面临许多挑战,如性能受限和样本效率低等。为了解决这些挑战,提出了MTLight,用于增强代理观察结果,其中潜在状态是从众多交通指标中学习的。同时,构建了多个辅助和监督任务来学习潜在状态,并使用两种嵌入潜在特征,即任务特定特征和任务共享特征,使潜在状态更加丰富。在CityFlow上进行的大量实验表明,MTLight具有领先的收敛速度和渐近性能。我们进一步在所有场景下模拟高峰小时模式,增加控制难度,结果表明MTLight具有很高的适应性。
更新时间: 2024-04-01 03:27:46
领域: cs.AI
Modeling Output-Level Task Relatedness in Multi-Task Learning with Feedback Mechanism
Multi-task learning (MTL) is a paradigm that simultaneously learns multiple tasks by sharing information at different levels, enhancing the performance of each individual task. While previous research has primarily focused on feature-level or parameter-level task relatedness, and proposed various model architectures and learning algorithms to improve learning performance, we aim to explore output-level task relatedness. This approach introduces a posteriori information into the model, considering that different tasks may produce correlated outputs with mutual influences. We achieve this by incorporating a feedback mechanism into MTL models, where the output of one task serves as a hidden feature for another task, thereby transforming a static MTL model into a dynamic one. To ensure the training process converges, we introduce a convergence loss that measures the trend of a task's outputs during each iteration. Additionally, we propose a Gumbel gating mechanism to determine the optimal projection of feedback signals. We validate the effectiveness of our method and evaluate its performance through experiments conducted on several baseline models in spoken language understanding.
Updated: 2024-04-01 03:27:34
标题: 使用反馈机制建模多任务学习中的输出级任务关联性
摘要: 多任务学习(MTL)是一种同时学习多个任务的范式,通过在不同层次共享信息来增强每个个体任务的性能。尽管先前的研究主要集中在特征级或参数级任务相关性上,并提出各种模型架构和学习算法来改善学习性能,但我们的目标是探索输出级任务相关性。这种方法将事后信息引入模型,考虑到不同任务可能产生相互影响的相关输出。我们通过将反馈机制纳入MTL模型来实现这一目标,其中一个任务的输出作为另一个任务的隐藏特征,从而将静态MTL模型转化为动态模型。为了确保训练过程收敛,我们引入了一个收敛损失,用于衡量每次迭代期间任务输出的趋势。此外,我们提出了一个Gumbel门控机制来确定反馈信号的最佳投影。我们通过在口语理解中对几个基准模型进行实验来验证我们方法的有效性并评估其性能。
更新时间: 2024-04-01 03:27:34
领域: cs.LG
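The Gumbel gating component maps naturally onto torch's gumbel_softmax: a differentiable, near-discrete choice among candidate projections of one task's output into another task's hidden features. The sizes and the candidate-projection set below are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GumbelFeedbackGate(nn.Module):
        def __init__(self, d_out, d_hidden, n_choices=3, tau=1.0):
            super().__init__()
            self.projs = nn.ModuleList(
                [nn.Linear(d_out, d_hidden) for _ in range(n_choices)])
            self.logits = nn.Parameter(torch.zeros(n_choices))
            self.tau = tau

        def forward(self, task_a_output):
            # Differentiable, near-discrete selection of one feedback projection.
            w = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)
            return sum(w[i] * p(task_a_output) for i, p in enumerate(self.projs))

    gate = GumbelFeedbackGate(d_out=10, d_hidden=32)
    print(gate(torch.randn(4, 10)).shape)   # feedback feature fed to the other task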
SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning
We present SeaEval, a benchmark for multilingual foundation models. In addition to characterizing how these models understand and reason with natural language, we also investigate how well they comprehend cultural practices, nuances, and values. Alongside standard accuracy metrics, we investigate the brittleness of foundation models in the dimensions of semantics and multilinguality. Our analyses span both open-sourced and closed models, leading to empirical results across classic NLP tasks, reasoning, and cultural comprehension. Key findings indicate (1) Most models exhibit varied behavior when given paraphrased instructions. (2) Many models still suffer from exposure bias (e.g., positional bias, majority label bias). (3) For questions rooted in factual, scientific, and commonsense knowledge, consistent responses are expected across multilingual queries that are semantically equivalent. Yet, most models surprisingly demonstrate inconsistent performance on these queries. (4) Multilingually-trained models have not attained "balanced multilingual" capabilities. Our endeavors underscore the need for more generalizable semantic representations and enhanced multilingual contextualization. SeaEval can serve as a launchpad for more thorough investigations and evaluations for multilingual and multicultural scenarios.
Updated: 2024-04-01 03:26:26
标题: SeaEval多语言基础模型:从跨语言对齐到文化推理
摘要: 我们提出了SeaEval,这是一个用于多语言基础模型的基准测试。除了表征这些模型如何理解和推理自然语言之外,我们还调查它们对文化实践、细微差别和价值观的理解程度。除了标准准确度指标外,我们还研究了基础模型在语义和多语言性维度上的脆弱性。我们的分析涵盖了开源和封闭模型,从而得出了经验结果,涵盖了经典的NLP任务、推理和文化理解。关键发现包括:(1)大多数模型在给出释义指令时表现出不同行为。 (2)许多模型仍然受到曝光偏差的影响(例如,位置偏差、多数标签偏差)。 (3)对于根植于事实、科学和常识知识的问题,我们期望在语义上等效的多语言查询中获得一致的响应。然而,大多数模型在这些查询上表现出令人意外的不一致性。 (4)接受多语言训练的模型尚未达到“平衡多语言”能力。我们的努力强调了对更具一般化语义表征和增强多语言情境化的需求。SeaEval可以作为更深入调查和评估多语言和多文化场景的发射台。
更新时间: 2024-04-01 03:26:26
领域: cs.CL,cs.AI
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models
Large language models (LLMs) have shown promising abilities of in-context learning (ICL), adapting swiftly to new tasks with only few-shot demonstrations. However, current few-shot methods heavily depend on high-quality, query-specific demos, which are often lacking. When faced with out-of-demonstration (OOD) queries, methods that rely on hand-crafted demos or external retrievers might fail. To bridge the gap between limited demos and OOD queries, we propose Self-Demos, a novel prompting method that elicits the inherent generalizability in LLMs by query-aware demo generation. The generated demos strategically interpolate between existing demos and the given query, transforming the query from OOD to ID. To evaluate the effectiveness of our approach, we manually constructed OOD-Toolset, a dataset in the tool-using scenario with over 300 real-world APIs and 1000 instances, each consisting of three tool-use cases as demos and an OOD query. Thorough experiments on our dataset and two public math benchmarks have shown that our method can outperform state-of-the-art baselines in the OOD setting. Moreover, we conduct a range of analyses to validate Self-Demos's generalization and provide more insights.
Updated: 2024-04-01 03:25:06
标题: 自我演示:在大型语言模型中引出演示外泛化
摘要: 大型语言模型(LLMs)显示出有前景的上下文学习能力(ICL),仅凭少量示范就能迅速适应新任务。然而,目前的少样本方法严重依赖于高质量、特定于查询的演示,这种演示通常缺乏。当面临超出演示(OOD)的查询时,依赖手工制作的演示或外部检索器的方法可能会失败。为了弥合有限演示与OOD查询之间的差距,我们提出了自我演示(Self-Demos),这是一种新颖的提示方法,通过查询感知演示生成来引出LLMs中固有的泛化能力。生成的演示在现有演示和给定查询之间进行战略插值,将查询从OOD转换为ID。为了评估我们方法的有效性,我们手动构建了OOD-Toolset数据集,在使用工具的场景中包含300多个真实世界API和1000个实例,每个实例包含三个工具使用案例作为演示和一个OOD查询。在我们的数据集和两个公共数学基准测试上进行了彻底的实验,结果显示我们的方法在OOD设置中可以胜过最先进的基线。此外,我们进行了一系列分析来验证Self-Demos的泛化性并提供更多见解。
更新时间: 2024-04-01 03:25:06
领域: cs.CL,cs.AI
Interpretable Multi-View Clustering Based on Anchor Graph Tensor Factorization
The clustering method based on the anchor graph has gained significant attention due to its exceptional clustering performance and ability to process large-scale data. One common approach is to learn bipartite graphs with K-connected components, helping avoid the need for post-processing. However, this method has strict parameter requirements and may not always get K-connected components. To address this issue, an alternative approach is to directly obtain the cluster label matrix by performing non-negative matrix factorization (NMF) on the anchor graph. Nevertheless, existing multi-view clustering methods based on anchor graph factorization lack adequate cluster interpretability for the decomposed matrix and often overlook the inter-view information. We address this limitation by using non-negative tensor factorization to decompose an anchor graph tensor that combines anchor graphs from multiple views. This approach allows us to consider inter-view information comprehensively. The decomposed tensors, namely the sample indicator tensor and the anchor indicator tensor, enhance the interpretability of the factorization. Extensive experiments validate the effectiveness of this method.
Updated: 2024-04-01 03:23:55
标题: 基于锚图张量分解的可解释多视角聚类
摘要: 基于锚图的聚类方法因其出色的聚类性能和处理大规模数据的能力而受到重视。一种常见的方法是学习具有K个连通分量的二部图,有助于避免后处理的必要性。然而,这种方法对参数要求严格,可能并非总能获得K个连通分量。为解决这一问题,另一种方法是通过在锚图上执行非负矩阵分解(NMF)直接获得聚类标签矩阵。然而,现有基于锚图分解的多视图聚类方法缺乏对分解矩阵的充分解释性,并常常忽略视图间的信息。我们通过使用非负张量分解来对结合了多个视图的锚图张量进行分解来解决这一限制。这种方法使我们能够全面考虑视图间的信息。分解后的张量,即样本指示张量和锚点指示张量,增强了分解的解释性。大量实验证实了这种方法的有效性。
更新时间: 2024-04-01 03:23:55
领域: cs.LG
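A single-view sketch of the baseline the paper generalizes: NMF on an anchor graph B (samples x anchors) directly yields a soft cluster-label matrix, and hard labels follow by argmax with no post-processing. The paper's method instead factorizes a tensor stacking B across views, which the matrix case below only hints at.

    import numpy as np
    from sklearn.decomposition import NMF

    n, m, k = 200, 20, 3                    # samples, anchors, clusters
    B = np.abs(np.random.rand(n, m))        # placeholder non-negative anchor graph

    nmf = NMF(n_components=k, init="nndsvda", max_iter=500)
    H = nmf.fit_transform(B)                # (n, k) soft sample-cluster indicators
    labels = H.argmax(axis=1)               # hard labels, no post-processing needed
    print(np.bincount(labels))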
Metric Learning to Accelerate Convergence of Operator Splitting Methods for Differentiable Parametric Programming
Recent work has shown a variety of ways in which machine learning can be used to accelerate the solution of constrained optimization problems. Increasing demand for real-time decision-making capabilities in applications such as artificial intelligence and optimal control has led to a variety of approaches, based on distinct strategies. This work proposes a novel approach to learning optimization, in which the underlying metric space of a proximal operator splitting algorithm is learned so as to maximize its convergence rate. While prior works in optimization theory have derived optimal metrics for limited classes of problems, the results do not extend to many practical problem forms including general Quadratic Programming (QP). This paper shows how differentiable optimization can enable the end-to-end learning of proximal metrics, enhancing the convergence of proximal algorithms for QP problems beyond what is possible based on known theory. Additionally, the results illustrate a strong connection between the learned proximal metrics and active constraints at the optima, leading to an interpretation in which the learning of proximal metrics can be viewed as a form of active set learning.
Updated: 2024-04-01 03:23:43
标题: 度量学习加速可微参数规划的运算分裂方法收敛
摘要: 最近的研究表明,机器学习可以以各种方式加速受限优化问题的解决。在人工智能和最优控制等应用中对实时决策能力的需求不断增加,这导致了基于不同策略的各种方法。本文提出了一种新颖的学习优化方法,其中学习了一个近端算子分裂算法的基本度量空间,以最大化其收敛速度。尽管之前的优化理论作品已经针对有限类问题推导出了最优度量,但这些结果并不适用于许多实际问题形式,包括一般的二次规划(QP)。本文展示了可微优化如何实现端到端学习近端度量,提高了二次规划问题的近端算法的收敛速度,超越了已知理论的可能性。此外,结果表明,学习的近端度量与最优解的活动约束之间存在强烈的联系,导致了一种解释,即近端度量的学习可以被视为一种主动集学习形式。
更新时间: 2024-04-01 03:23:43
领域: cs.LG
Rethinking the Relationship between Recurrent and Non-Recurrent Neural Networks: A Study in Sparsity
Neural networks (NN) can be divided into two broad categories, recurrent and non-recurrent. Both types of neural networks are popular and extensively studied, but they are often treated as distinct families of machine learning algorithms. In this position paper, we argue that there is a closer relationship between these two types of neural networks than is normally appreciated. We show that many common neural network models, such as Recurrent Neural Networks (RNN), Multi-Layer Perceptrons (MLP), and even deep multi-layer transformers, can all be represented as iterative maps. The close relationship between RNNs and other types of NNs should not be surprising. In particular, RNNs are known to be Turing complete, and therefore capable of representing any computable function (such as any other types of NNs), but herein we argue that the relationship runs deeper and is more practical than this. For example, RNNs are often thought to be more difficult to train than other types of NNs, with RNNs being plagued by issues such as vanishing or exploding gradients. However, as we demonstrate in this paper, MLPs, RNNs, and many other NNs lie on a continuum, and this perspective leads to several insights that illuminate both theoretical and practical aspects of NNs.
Updated: 2024-04-01 03:18:42
标题: 重新思考循环神经网络和非循环神经网络之间的关系:稀疏性研究
摘要: 神经网络(NN)可以分为两大类,即循环和非循环。这两种类型的神经网络都很受欢迎并得到广泛研究,但通常被视为机器学习算法的两个不同家族。在这篇立场论文中,我们认为这两类神经网络之间的关系比通常认为的更为密切。我们展示了许多常见的神经网络模型,如循环神经网络(RNN)、多层感知器(MLP)甚至深度多层Transformer,都可以表示为迭代映射。 RNN与其他类型NN之间的密切关系并不令人意外。特别是,RNN被认为是图灵完备的,因此能够表示任何可计算函数(包括其他类型的NN),但我们在这里认为这种关系更深入、也更具实际意义。例如,RNN通常被认为比其他类型的NN更难训练,受到梯度消失或爆炸等问题的困扰。然而,正如我们在本文中所展示的,MLP、RNN和许多其他NN处于同一连续体上,这一视角带来了一些能够阐明NN理论与实践两方面的见解。
更新时间: 2024-04-01 03:18:42
领域: cs.LG
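The iterative-map viewpoint is easy to demonstrate numerically: an RNN iterates one shared map across steps, while an MLP applies a different map per layer; both are compositions of maps and differ mainly in weight tying.

    import numpy as np

    def layer(x, W, h=None):
        return np.tanh(W @ x if h is None else W @ np.concatenate([x, h]))

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)

    # RNN: the *same* weights applied at every step (tying across "depth").
    W_shared = rng.normal(size=(4, 8)) * 0.5
    h = np.zeros(4)
    for _ in range(3):
        h = layer(x, W_shared, h)

    # MLP: a *different* map at each step, but the same iterative skeleton.
    z = x
    for _ in range(3):
        z = layer(z, rng.normal(size=(4, 4)) * 0.5)

    print(h, z)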
Compositional Chain-of-Thought Prompting for Large Multimodal Models
The combination of strong visual backbones and Large Language Model (LLM) reasoning has led to Large Multimodal Models (LMMs) becoming the current standard for a wide range of vision and language (VL) tasks. However, recent research has shown that even the most advanced LMMs still struggle to capture aspects of compositional visual reasoning, such as attributes and relationships between objects. One solution is to utilize scene graphs (SGs)--a formalization of objects and their relations and attributes that has been extensively used as a bridge between the visual and textual domains. Yet, scene graph data requires scene graph annotations, which are expensive to collect and thus not easily scalable. Moreover, finetuning an LMM based on SG data can lead to catastrophic forgetting of the pretraining objective. To overcome this, inspired by chain-of-thought methods, we propose Compositional Chain-of-Thought (CCoT), a novel zero-shot Chain-of-Thought prompting method that utilizes SG representations in order to extract compositional knowledge from an LMM. Specifically, we first generate an SG using the LMM, and then use that SG in the prompt to produce a response. Through extensive experiments, we find that the proposed CCoT approach not only improves LMM performance on several vision and language VL compositional benchmarks but also improves the performance of several popular LMMs on general multimodal benchmarks, without the need for fine-tuning or annotated ground-truth SGs. Code: https://github.com/chancharikmitra/CCoT
Updated: 2024-04-01 03:17:09
标题: 大型多模态模型的构成链式思维提示
摘要: 强大的视觉骨干网络与大型语言模型(LLM)推理相结合,使大型多模态模型(LMM)成为当前各类视觉与语言(VL)任务的标准。然而,最近的研究表明,即使是最先进的LMM仍然难以捕捉组合性视觉推理的方面,如对象的属性和对象之间的关系。一种解决方案是利用场景图(SG),即对象及其关系和属性的形式化表示,它已被广泛用作视觉和文本领域之间的桥梁。然而,场景图数据需要场景图标注,这些标注成本高昂,因此不易扩展。此外,基于SG数据对LMM进行微调可能导致对预训练目标的灾难性遗忘。为了克服这一问题,受链式思维方法的启发,我们提出了组合链式思维(CCoT),这是一种新颖的零样本链式思维提示方法,利用SG表示从LMM中提取组合性知识。具体而言,我们首先使用LMM生成一个SG,然后在提示中使用该SG来产生回答。通过大量实验,我们发现所提出的CCoT方法不仅提升了LMM在多个视觉与语言组合性基准上的性能,还提升了多种流行LMM在通用多模态基准上的性能,且无需微调或人工标注的真实场景图。 代码:https://github.com/chancharikmitra/CCoT
更新时间: 2024-04-01 03:17:09
领域: cs.CV,cs.AI,cs.CL,cs.LG
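The two-stage prompt flow is simple to sketch. Here query_lmm is a hypothetical stand-in for whatever LMM API is used, and the prompt wording is illustrative rather than the paper's exact template.

    def ccot_answer(image, question, query_lmm):
        # Stage 1: ask the LMM itself for a scene graph (zero-shot, no annotations).
        sg = query_lmm(image, "Generate a scene graph for the image in JSON, with "
                              "objects, their attributes, and their relationships.")
        # Stage 2: condition the final answer on that compositional intermediate.
        return query_lmm(image, f"Scene graph: {sg}\n"
                                f"Use the image and the scene graph to answer: {question}")

    # Toy stand-in so the sketch runs end to end without a real LMM.
    fake_lmm = lambda image, prompt: "demo-output"
    print(ccot_answer(None, "How many dogs are left of the tree?", fake_lmm))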
Honeybee: Locality-enhanced Projector for Multimodal LLM
In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite the importance of the visual projector, it has been relatively less explored. In this study, we first identify two essential projector properties: (i) flexibility in managing the number of visual tokens, crucial for MLLMs' overall efficiency, and (ii) preservation of local context from visual features, vital for spatial understanding. Based on these findings, we propose a novel projector design that is both flexible and locality-enhanced, effectively satisfying the two desirable properties. Additionally, we present comprehensive strategies to effectively utilize multiple and multifaceted instruction datasets. Through extensive experiments, we examine the impact of individual design choices. Finally, our proposed MLLM, Honeybee, remarkably outperforms previous state-of-the-art methods across various benchmarks, including MME, MMBench, SEED-Bench, and LLaVA-Bench, achieving significantly higher efficiency. Code and models are available at https://github.com/kakaobrain/honeybee.
Updated: 2024-04-01 03:00:06
标题: Honeybee:面向多模态LLM的局部性增强投影器
摘要: 在多模态大型语言模型(MLLM)中,视觉投影器在连接预训练视觉编码器与LLM方面发挥着关键作用,在借助LLM强大能力的同时实现深入的视觉理解。尽管视觉投影器十分重要,但对它的研究相对较少。在这项研究中,我们首先确定了投影器的两个关键属性:(i)管理视觉令牌数量的灵活性,这对MLLM的整体效率至关重要;(ii)保留视觉特征中的局部上下文,这对空间理解至关重要。基于这些发现,我们提出了一种既灵活又增强局部性的新颖投影器设计,有效满足这两个理想属性。此外,我们提出了全面的策略,以有效利用多个多方面的指令数据集。通过大量实验,我们考察了各项设计选择的影响。最后,我们提出的MLLM,即Honeybee,在包括MME、MMBench、SEED-Bench和LLaVA-Bench在内的各种基准测试中显著优于先前的最先进方法,并实现了明显更高的效率。代码和模型可在https://github.com/kakaobrain/honeybee上找到。
更新时间: 2024-04-01 03:00:06
领域: cs.CV,cs.AI,cs.CL,cs.LG
Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation
In this paper, a Gauss-Newton Temporal Difference (GNTD) learning method is proposed to solve the Q-learning problem with nonlinear function approximation. In each iteration, our method takes one Gauss-Newton (GN) step to optimize a variant of Mean-Squared Bellman Error (MSBE), where target networks are adopted to avoid double sampling. Inexact GN steps are analyzed so that one can safely and efficiently compute the GN updates by cheap matrix iterations. Under mild conditions, non-asymptotic finite-sample convergence to the globally optimal Q function is derived for various nonlinear function approximations. In particular, for neural network parameterization with relu activation, GNTD achieves an improved sample complexity of $\tilde{\mathcal{O}}(\varepsilon^{-1})$, as opposed to the $\mathcal{O}(\varepsilon^{-2})$ sample complexity of the existing neural TD methods. An $\tilde{\mathcal{O}}(\varepsilon^{-1.5})$ sample complexity of GNTD is also established for general smooth function approximations. We validate our method via extensive experiments in several RL benchmarks, where GNTD exhibits both higher rewards and faster convergence than TD-type methods.
Updated: 2024-04-01 02:57:46
标题: 非线性函数逼近下的高斯-牛顿时序差分学习
摘要: 在这篇论文中,提出了一种高斯-牛顿时间差异(GNTD)学习方法,用于解决具有非线性函数逼近的Q学习问题。在每次迭代中,我们的方法采用一个高斯-牛顿(GN)步骤来优化变种的均方贝尔曼误差(MSBE),其中采用目标网络以避免双重采样。分析了不精确的GN步骤,以便可以通过廉价的矩阵迭代安全高效地计算GN更新。在温和条件下,针对各种非线性函数逼近,推导出非渐近有限样本收敛到全局最优Q函数。特别是,对于带有relu激活的神经网络参数化,GNTD实现了改进的样本复杂度$\tilde{\mathcal{O}}(\varepsilon^{-1})$,与现有神经TD方法的样本复杂度$\mathcal{\mathcal{O}}(\varepsilon^{-2})$相反。对于一般光滑函数逼近,还建立了GNTD的$\tilde{\mathcal{O}}(\varepsilon^{-1.5})$样本复杂度。我们通过在几个RL基准测试中进行广泛实验来验证我们的方法,在那里GNTD表现出比TD类型方法更高的奖励和更快的收敛速度。
更新时间: 2024-04-01 02:57:46
领域: math.OC,cs.LG
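For the special case of a Q-function that is linear in fixed features, one damped Gauss-Newton step on the TD residual (with targets frozen, as under a target network) is exactly the regularized least-squares solve below. This is a toy illustration under that assumption; the paper's contribution is the nonlinear, inexact-GN version, where the dense solve is replaced by cheap matrix iterations.

    import numpy as np

    def gntd_linear_step(Phi, rewards, Phi_next, w, gamma=0.99, damping=1e-3):
        y = rewards + gamma * Phi_next @ w      # TD targets, weights frozen
        residual = Phi @ w - y
        J = Phi                                 # Jacobian of the residual in w
        step = np.linalg.solve(J.T @ J + damping * np.eye(J.shape[1]), J.T @ residual)
        return w - step

    rng = np.random.default_rng(0)
    Phi = rng.normal(size=(64, 5))              # features of sampled (s, a) pairs
    Phi_next = rng.normal(size=(64, 5)) * 0.5   # features of greedy next actions
    rewards = rng.normal(size=64)
    w = np.zeros(5)
    for _ in range(20):
        w = gntd_linear_step(Phi, rewards, Phi_next, w)
    print(w)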
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
In some settings neural networks exhibit a phenomenon known as \textit{grokking}, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression, linear regression and Bayesian neural networks. We also uncover a mechanism by which to induce grokking on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures shows that grokking is not restricted to settings considered in current theoretical and empirical studies. Instead, grokking may be possible in any model where solution search is guided by complexity and error.
Updated: 2024-04-01 02:54:46
标题: 超越神经网络的理解:模型复杂性的经验性探究
摘要: 在一些情境中,神经网络表现出一种被称为“理解”的现象,即它们在验证集上达到完美或接近完美的准确率,远早于在训练集上达到相同表现的时间。在本文中,我们发现“理解”不仅限于神经网络,而且在其他情境中也会出现,比如高斯过程(GP)分类、GP回归、线性回归和贝叶斯神经网络。我们还发现一种通过增加包含虚假信息的维度来诱发算法数据集中的“理解”的机制。非神经结构中出现这种现象表明,“理解”并不仅限于目前理论和实证研究中考虑的情境。相反,“理解”可能出现在任何模型中,只要解决方案搜索受到复杂性和误差的引导。
更新时间: 2024-04-01 02:54:46
领域: cs.LG,stat.ML
VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models
The role of data in building AI systems has recently been emphasized by the emerging concept of data-centric AI. Unfortunately, in the real world, datasets may contain dirty samples, such as poisoned samples from backdoor attacks, noisy labels in crowdsourcing, and even hybrids of them. The presence of such dirty samples makes DNNs vulnerable and unreliable. Hence, it is critical to detect dirty samples to improve the quality and reliability of a dataset. Existing detectors only focus on detecting poisoned samples or noisy labels and are often prone to weak generalization when dealing with dirty samples from other domains. In this paper, we find that a commonality of various dirty samples is visual-linguistic inconsistency between images and associated labels. To capture the semantic inconsistency between modalities, we propose the versatile data cleanser (VDC), leveraging the surpassing capabilities of multimodal large language models (MLLMs) in cross-modal alignment and reasoning. It consists of three consecutive modules: the visual question generation module to generate insightful questions about the image; the visual question answering module to acquire the semantics of the visual content by answering the questions with the MLLM; followed by the visual answer evaluation module to evaluate the inconsistency. Extensive experiments demonstrate its superior performance and generalization to various categories and types of dirty samples. The code is available at \url{https://github.com/zihao-ai/vdc}.
Updated: 2024-04-01 02:49:49
标题: VDC:基于多模式大型语言模型的视觉-语言不一致性的多功能数据清洗器
摘要: 最近出现了数据为中心的人工智能概念,强调数据在构建AI系统中的作用。不幸的是,在现实世界中,数据集可能包含脏数据样本,例如来自后门攻击的毒样本,众包中的嘈杂标签,甚至它们的混合体。这些脏样本的存在使深度神经网络(DNNs)容易受到攻击且不可靠。因此,检测脏样本以提高数据集的质量和可靠性至关重要。现有的检测器只侧重于检测毒样本或嘈杂标签,当处理来自其他领域的脏样本时,往往容易出现弱泛化。在本文中,我们发现各种脏样本的一个共同点是图片和相关标签之间的视觉-语言不一致性。为了捕捉模态之间的语义不一致性,我们提出了利用多模态大型语言模型(MLLM)在跨模态对齐和推理方面的出色能力的多功能数据清洁器(VDC)。它由三个连续模块组成:视觉问题生成模块用于生成关于图片的有见地的问题;视觉问题回答模块通过使用MLLM回答问题来获取视觉内容的语义;接着是视觉答案评估模块用于评估不一致性。大量实验证明了其出色的性能和对各种类别和类型的脏样本的泛化能力。该代码可在\url{https://github.com/zihao-ai/vdc}上找到。
更新时间: 2024-04-01 02:49:49
领域: cs.CV,cs.AI
CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure
Fully homomorphic encryption (FHE) is in the spotlight as a definitive solution for privacy, but the high computational overhead of FHE poses a challenge to its practical adoption. Although prior studies have attempted to design ASIC accelerators to mitigate the overhead, their designs require excessive chip resources (e.g., areas) to contain and process massive data for FHE operations. We propose CiFHER, a chiplet-based FHE accelerator with a resizable structure, to tackle the challenge with a cost-effective multi-chip module (MCM) design. First, we devise a flexible core architecture whose configuration is adjustable to conform to the global organization of chiplets and design constraints. Its distinctive feature is a composable functional unit providing varying computational throughput for the number-theoretic transform, the most dominant function in FHE. Then, we establish generalized data mapping methodologies to minimize the interconnect overhead when organizing the chips into the MCM package in a tiled manner, which becomes a significant bottleneck due to the packaging constraints. This study demonstrates that a CiFHER package composed of a number of compact chiplets provides performance comparable to state-of-the-art monolithic ASIC accelerators while significantly reducing the package-wide power consumption and manufacturing cost.
Updated: 2024-04-01 02:45:41
标题: CiFHER:一种基于小芯片(chiplet)、结构可调的全同态加密(FHE)加速器
摘要: 全同态加密(FHE)作为隐私保护的终极解决方案备受关注,但FHE的高计算开销对其实际应用提出了挑战。虽然先前的研究已尝试设计ASIC加速器来减轻开销,但它们的设计需要过多的芯片资源(例如面积)来容纳和处理FHE操作所需的大量数据。我们提出了CiFHER,一种基于小芯片(chiplet)、结构可调的FHE加速器,通过具有成本效益的多芯片模块(MCM)设计来应对这一挑战。首先,我们设计了一个灵活的核心架构,其配置可以调整,以符合小芯片的全局组织和设计约束。其独特之处在于一个可组合的功能单元,为FHE中最主要的运算,即数论变换,提供可变的计算吞吐量。然后,我们建立了通用的数据映射方法,以最小化以瓦片方式将芯片组织进MCM封装时的互连开销,这种开销由于封装约束而成为重要瓶颈。这项研究表明,由若干紧凑小芯片组成的CiFHER封装可提供与最先进的单片ASIC加速器相当的性能,同时显著降低封装整体功耗和制造成本。
更新时间: 2024-04-01 02:45:41
领域: cs.AR,cs.CR
Towards Automated Generation of Smart Grid Cyber Range for Cybersecurity Experiments and Training
Assurance of cybersecurity is crucial to ensure the dependability and resilience of smart power grid systems. In order to evaluate the impact of potential cyber attacks, to assess the deployability and effectiveness of cybersecurity measures, and to enable hands-on exercises and training of personnel, an interactive, virtual environment that emulates the behaviour of a smart grid system, namely a smart grid cyber range, has been demanded by industry players as well as academia. A smart grid cyber range is typically implemented as a combination of cyber system emulation, which allows interactivity, and physical system (i.e., power grid) simulation that are tightly coupled for consistent cyber and physical behaviours. However, its design and implementation require intensive expertise and effort in the cyber and physical aspects of smart power systems as well as in software/system engineering. While many industry players, including power grid operators, device vendors, and the research and education sectors, are interested, the availability of smart grid cyber ranges is limited to a small number of research labs. To address this challenge, we have developed a framework for modelling a smart grid cyber range using an XML-based language, called SG-ML, and for "compiling" the model into an operational cyber range with minimal engineering effort. The modelling language includes standardized schemas from IEC 61850 and IEC 61131, which allows industry players to utilize their existing configurations. The SG-ML framework aims at making smart grid cyber ranges available to a broader user base to facilitate cybersecurity R&D and hands-on exercises.
Updated: 2024-04-01 02:34:53
Areas: cs.CR
Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation
Image segmentation holds a vital position in the realms of diagnosis and treatment within the medical domain. Traditional convolutional neural networks (CNNs) and Transformer models have made significant advancements in this realm, but they still encounter challenges because of their limited receptive fields or high computational complexity. Recently, State Space Models (SSMs), particularly Mamba and its variants, have demonstrated notable performance in the field of vision. However, their feature extraction methods may not be sufficiently effective and retain some redundant structures, leaving room for parameter reduction. Motivated by previous spatial and channel attention methods, we propose Triplet Mamba-UNet. The method leverages residual VSS Blocks to extract intensive contextual features, while a Triplet SSM is employed to fuse features across the spatial and channel dimensions. We conducted experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and Kvasir-Instrument datasets, demonstrating the superior segmentation performance of our proposed TM-UNet. Additionally, compared to the previous VM-UNet, our model achieves a one-third reduction in parameters.
Updated: 2024-04-01 02:31:10
Areas: eess.IV,cs.CV,cs.LG
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
This paper introduces SpecInfer, a system that accelerates generative large language model (LLM) serving with tree-based speculative inference and verification. The key idea behind SpecInfer is leveraging small speculative models to predict the LLM's outputs; the predictions are organized as a token tree, whose nodes each represent a candidate token sequence. The correctness of all candidate token sequences represented by a token tree is verified against the LLM in parallel using a novel tree-based parallel decoding mechanism. SpecInfer uses an LLM as a token tree verifier instead of an incremental decoder, which significantly reduces the end-to-end latency and computational requirement for serving generative LLMs while provably preserving model quality. Our evaluation shows that SpecInfer outperforms existing LLM serving systems by 1.5-2.8x for distributed LLM inference and by 2.6-3.5x for offloading-based LLM inference, while preserving the same generative performance. SpecInfer is publicly available at https://github.com/flexflow/FlexFlow/
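The tree-verification logic can be sketched compactly. In the toy below, the "LLM" is a deterministic stand-in next-token function and acceptance is greedy; the actual system scores all tree nodes in a single batched forward pass with its tree-based parallel decoding mechanism, and its acceptance rule for sampled decoding is more involved.

def llm_next(prefix):
    # Toy stand-in for the LLM's greedy next token.
    return (sum(prefix) * 31 + 7) % 50

def verify_token_tree(prefix, tree):
    """tree maps a path (tuple of speculated tokens) to its child paths.
    Returns the longest speculated path the LLM agrees with, plus one
    'bonus' token produced by the verifier itself."""
    accepted, node = [], ()
    while True:
        target = llm_next(prefix + accepted)
        match = next((c for c in tree.get(node, ()) if c[-1] == target), None)
        if match is None:
            return accepted + [target]          # bonus token from the verifier
        accepted, node = accepted + [target], match

prefix = [1, 2, 3]
tree = {(): ((43,), (10,)), (43,): ((43, 20),)}  # a small speculated token tree
print(verify_token_tree(prefix, tree))           # accepts 43, then emits bonus token 26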
Updated: 2024-04-01 02:18:42
Areas: cs.CL,cs.DC,cs.LG
Uncertainty Quantification for Molecular Property Predictions with Graph Neural Architecture Search
Graph Neural Networks (GNNs) have emerged as a prominent class of data-driven methods for molecular property prediction. However, a key limitation of typical GNN models is their inability to quantify uncertainties in the predictions. This capability is crucial for ensuring the trustworthy use and deployment of models in downstream tasks. To that end, we introduce AutoGNNUQ, an automated uncertainty quantification (UQ) approach for molecular property prediction. AutoGNNUQ leverages architecture search to generate an ensemble of high-performing GNNs, enabling the estimation of predictive uncertainties. Our approach employs variance decomposition to separate data (aleatoric) and model (epistemic) uncertainties, providing valuable insights for reducing them. In our computational experiments, we demonstrate that AutoGNNUQ outperforms existing UQ methods in terms of both prediction accuracy and UQ performance on multiple benchmark datasets. Additionally, we utilize t-SNE visualization to explore correlations between molecular features and uncertainty, offering insight for dataset improvement. AutoGNNUQ has broad applicability in domains such as drug discovery and materials science, where accurate uncertainty quantification is crucial for decision-making.
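The variance decomposition the abstract refers to is the law of total variance applied to an ensemble whose members each predict a mean and a variance; a minimal NumPy sketch follows, with the GNN ensemble produced by architecture search abstracted into fixed arrays.

import numpy as np

means = np.array([[1.9, 2.1, 2.0]])      # shape (n_samples, n_models): per-model predicted means
varis = np.array([[0.04, 0.06, 0.05]])   # per-model predicted variances

aleatoric = varis.mean(axis=1)           # E_m[sigma_m^2]: irreducible data noise
epistemic = means.var(axis=1)            # Var_m[mu_m]: disagreement across models
total = aleatoric + epistemic            # law of total variance
print(aleatoric, epistemic, total)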
Updated: 2024-04-01 02:13:37
Areas: cs.LG,physics.chem-ph,q-bio.BM
Steady-State Error Compensation for Reinforcement Learning with Quadratic Rewards
The selection of a reward function in Reinforcement Learning (RL) has garnered significant attention because of its impact on system performance. Significant steady-state errors often manifest when quadratic reward functions are employed. Although absolute-value-type reward functions alleviate this problem, they tend to induce substantial fluctuations in specific system states, leading to abrupt changes. In response to this challenge, this study proposes an approach that introduces an integral term. By integrating this term into quadratic-type reward functions, the RL algorithm is adeptly tuned, augmenting the system's consideration of reward history and consequently alleviating concerns related to steady-state errors. Through experiments and performance evaluations on Adaptive Cruise Control (ACC) and lane-change models, we validate that the proposed method effectively diminishes steady-state errors and does not cause significant spikes in system states.
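A minimal sketch of the shaped reward, under the assumption that the integral term simply accumulates the tracking error over time; the gains and step size below are illustrative, not the paper's tuning.

class IntegralQuadraticReward:
    def __init__(self, w_e=1.0, w_i=0.1, dt=0.1):
        self.w_e, self.w_i, self.dt = w_e, w_i, dt
        self.integral = 0.0

    def __call__(self, error):
        self.integral += error * self.dt               # accumulate reward history
        return -(self.w_e * error**2 + self.w_i * self.integral**2)

r = IntegralQuadraticReward()
for e in [1.0, 0.5, 0.2, 0.2, 0.2]:                    # a persistent offset keeps growing the penalty
    print(round(r(e), 4))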
Updated: 2024-04-01 02:09:32
Areas: eess.SY,cs.LG,cs.SY
Bailong: Bilingual Transfer Learning based on QLoRA and Zip-tie Embedding
Large language models (LLMs) have demonstrated exceptional performance in various NLP applications. However, the majority of existing open-source LLMs are pre-trained primarily on English data, with only a small fraction of other languages. This deficiency in multilingual training data results in suboptimal performance when applied to languages with fewer available resources. Furthermore, enhancing the performance of LLMs on low-resource languages by full-parameter fine-tuning with additional data requires substantial computational resources, posing computational barriers for research organizations and individual researchers. Consequently, several techniques such as parameter-efficient tuning and advanced embedding initialization have been proposed to address these challenges. In this work, we combine them to facilitate cross-lingual transfer on an English-dominated open-source LLM. To effectively enhance the model's proficiency in Traditional Chinese, we conduct secondary pre-training on Llama 2 7B with Traditional Chinese data by leveraging QLoRA and our proposed zip-tie embedding initialization. The resulting model, called Bailong, stands for Bilingual trAnsfer learnIng based on qLOra and zip-tie embeddiNG. We present Bailong-instruct 7B, a fine-tuned version of Bailong 7B optimized for multi-turn dialogue scenarios. Recognizing the inadequacy of benchmark datasets in Traditional Chinese, we further introduce Bailong-bench to assess the alignment of models with human preferences and their capability to follow instructions in both Traditional Chinese and English tasks. In our evaluation, Bailong-instruct 7B exhibits competitive performance on Bailong-bench and other benchmark datasets when compared to other open-source models of similar or even larger parameter sizes. Bailong-instruct 7B and Bailong-bench are publicly available with the aim of empowering the community to build upon our efforts.
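The exact zip-tie initialization is the paper's own design; as a loudly hypothetical illustration of the general idea behind such vocabulary-extension schemes, the sketch below initializes a newly added token's embedding from the embeddings of the original tokenizer's subwords that compose it (mean pooling).

import numpy as np

# Assumed toy embedding table for the original tokenizer's subwords.
old_emb = {"he": np.array([1.0, 0.0]), "llo": np.array([0.0, 1.0])}

def init_new_token(subwords, table):
    """Initialize a new token from its old-tokenizer decomposition (illustrative baseline)."""
    return np.mean([table[s] for s in subwords], axis=0)

print(init_new_token(["he", "llo"], old_emb))   # starting embedding for a new token "hello"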
Updated: 2024-04-01 02:04:44
Areas: cs.CL,cs.AI
Leveraging Generative AI for Clinical Evidence Summarization Needs to Ensure Trustworthiness
Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.
Updated: 2024-04-01 02:04:25
Areas: cs.AI
Machine Learning-Augmented Optimization of Large Bilevel and Two-stage Stochastic Programs: Application to Cycling Network Design
Motivated by a cycling infrastructure planning application, we present a machine learning approach to solving bilevel programs with a large number of independent followers, which as a special case includes two-stage stochastic programming. We propose an optimization model that explicitly considers a sampled subset of followers and exploits a machine learning model to estimate the objective values of unsampled followers. Unlike existing approaches, we embed machine learning model training into the optimization problem, which allows us to employ follower features that cannot be represented using leader decisions. We prove bounds on the optimality gap of the generated leader decision as measured by the original objective that considers the full follower set. We develop follower sampling algorithms to tighten the bounds and a representation learning approach to learn follower features, which are used as inputs to our machine learning model. Through numerical studies, we show that our approach generates leader decisions of higher quality compared to baselines. Finally, we perform a real-world case study in Toronto, Canada, where we solve a cycling network design problem with over one million followers. Compared to the current practice, our approach improves a transportation metric by 19.2% and can lead to a potential cost saving of $18M.
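A schematic of the sampled-followers idea: evaluate a leader decision by solving a sampled subset of followers exactly and predicting the remaining follower objectives with a learned model. Everything below is a toy stand-in, and, unlike the paper, the regression is simply refit at evaluation time rather than embedded in the optimization.

import numpy as np
rng = np.random.default_rng(0)

n_followers, n_sampled = 1000, 50
features = rng.normal(size=(n_followers, 3))            # follower features

def follower_value(x_leader, feat):
    # Toy "exact" follower solve given the leader decision.
    return float(np.maximum(0.0, feat @ np.ones(3) - x_leader))

def leader_objective(x_leader, sample_idx):
    sampled = np.array([follower_value(x_leader, features[i]) for i in sample_idx])
    # Fit a simple regressor on the exactly solved followers...
    w, *_ = np.linalg.lstsq(features[sample_idx], sampled, rcond=None)
    # ...and predict the objective values of the unsampled followers.
    unsampled = np.setdiff1d(np.arange(n_followers), sample_idx)
    return sampled.sum() + float(features[unsampled] @ w @ np.ones(1))

idx = rng.choice(n_followers, n_sampled, replace=False)
print(leader_objective(0.5, idx))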
Updated: 2024-04-01 02:02:52
Areas: math.OC,cs.LG
Lipsum-FT: Robust Fine-Tuning of Zero-Shot Models Using Random Text Guidance
Large-scale contrastive vision-language pre-trained models provide zero-shot models that achieve competitive performance across a range of image classification tasks without requiring training on downstream data. Recent works have confirmed that while additional fine-tuning of the zero-shot model on reference data enhances downstream performance, it compromises the model's robustness against distribution shifts. Our investigation begins by examining the conditions required to achieve the goals of robust fine-tuning, employing descriptions based on feature distortion theory and joint energy-based models. Subsequently, we propose a novel robust fine-tuning algorithm, Lipsum-FT, that effectively utilizes the language modeling aspect of the vision-language pre-trained models. Extensive experiments conducted on distribution shift scenarios in DomainNet and ImageNet confirm the superiority of our proposed Lipsum-FT approach over existing robust fine-tuning methods.
Updated: 2024-04-01 02:01:33
Areas: cs.LG,cs.CV
Do language models plan ahead for future tokens?
Do transformers "think ahead" during inference at a given position? It is known transformers prepare information in the hidden states of the forward pass at $t$ that is then used in future forward passes $t+\tau$. We posit two explanations for this phenomenon: pre-caching, in which off-diagonal gradient terms present in training result in the model computing features at $t$ irrelevant to the present inference task but useful for the future, and breadcrumbs, in which features most relevant to time step $t$ are already the same as those that would most benefit inference at time $t+\tau$. We test these hypotheses by training language models without propagating gradients to past timesteps, a scheme we formalize as myopic training. In a synthetic data setting, we find clear evidence for pre-caching. In the autoregressive language modeling setting, our experiments are more suggestive of the breadcrumbs hypothesis.
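Myopic training is straightforward to express in an autograd framework: detach the carried state so that no gradient reaches earlier timesteps. The paper studies transformers; the toy PyTorch RNN below is only meant to show where the stop-gradient goes.

import torch
import torch.nn as nn

rnn, head = nn.RNNCell(4, 8), nn.Linear(8, 1)
x = torch.randn(5, 1, 4)                 # 5 timesteps, batch of 1
h = torch.zeros(1, 8)
loss = torch.tensor(0.0)
for t in range(5):
    h = rnn(x[t], h.detach())            # detach: no gradient flows into past timesteps
    loss = loss + head(h).pow(2).mean()
loss.backward()                          # each step's gradient is purely local in time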
Updated: 2024-04-01 02:01:28
Areas: cs.LG,cs.CL
Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency
Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges. (i) It often takes the full history to predict the future, which induces a sample complexity that scales exponentially with the horizon. (ii) The observation and state spaces are often continuous, which induces a sample complexity that scales exponentially with the extrinsic dimension. Addressing such challenges requires learning a minimal but sufficient representation of the observation and state histories by exploiting the structure of the POMDP. To this end, we propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy. (i) For each step, ETC learns to represent the state with a low-dimensional feature, which factorizes the transition kernel. (ii) Across multiple steps, ETC learns to represent the full history with a low-dimensional embedding, which assembles the per-step feature. We integrate (i) and (ii) in a unified framework that allows a variety of estimators (including maximum likelihood estimators and generative adversarial networks). For a class of POMDPs with a low-rank structure in the transition kernel, ETC attains an $O(1/\epsilon^2)$ sample complexity that scales polynomially with the horizon and the intrinsic dimension (that is, the rank). Here $\epsilon$ is the optimality gap. To our best knowledge, ETC is the first sample-efficient algorithm that bridges representation learning and policy optimization in POMDPs with infinite observation and state spaces.
Updated: 2024-04-01 01:53:31
Areas: cs.LG,cs.AI,cs.SY,eess.SY,stat.ML
Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
Recently, there have been efforts to encode the linguistic information of speech using a self-supervised framework for speech synthesis. However, predicting representations from surrounding representations can inadvertently entangle speaker information in the speech representation. This paper aims to remove speaker information by exploiting the structured nature of speech, which is composed of discrete units like phonemes with clear boundaries. A neural network predicts these boundaries, enabling variable-length pooling for event-based representation extraction instead of fixed-rate methods. The boundary predictor outputs a probability between 0 and 1 for each boundary, making the pooling soft. The model is trained to minimize the difference between its pooled representation and that of the same data augmented by time-stretching and pitch-shifting. To confirm that the learned representation includes content information but is independent of speaker information, the model was evaluated on libri-light's phonetic ABX task and SUPERB's speaker identification task.
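One simplified way to realize such soft pooling, given as an illustrative variant rather than the paper's exact operator: cumulatively sum the boundary probabilities into a continuous segment index and let each frame contribute to nearby segments with soft weights.

import numpy as np

def soft_pool(frames, boundary_probs):
    idx = np.cumsum(boundary_probs)                        # continuous segment index per frame
    n_seg = int(np.ceil(idx[-1])) + 1
    k = np.arange(n_seg)[:, None]                          # (n_seg, 1)
    w = np.maximum(0.0, 1.0 - np.abs(idx[None, :] - k))    # triangular soft memberships
    w = w / np.maximum(w.sum(axis=1, keepdims=True), 1e-8) # each segment averages its frames
    return w @ frames                                      # (n_seg, feat_dim)

frames = np.random.randn(10, 4)
probs = np.array([0.1, 0.9, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1, 0.9, 0.1])
print(soft_pool(frames, probs).shape)                      # high-probability boundaries start new segments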
Updated: 2024-04-01 01:49:09
Areas: cs.SD,cs.AI,eess.AS
TSOM: Small Object Motion Detection Neural Network Inspired by Avian Visual Circuit
Detecting small moving objects in complex backgrounds from an overhead perspective is a highly challenging task for machine vision systems. As an inspiration from nature, the avian visual system is capable of processing motion information in various complex aerial scenes, and its Retina-OT-Rt visual circuit is highly sensitive to capturing the motion information of small objects from high altitudes. However, small object motion detection algorithms based on the avian visual system remain underexplored. In this paper, we conducted mathematical modeling based on extensive studies of the biological mechanisms of the Retina-OT-Rt visual circuit. Based on this, we proposed a novel tectum small object motion detection neural network (TSOM). The neural network includes the retina, SGC dendritic, SGC soma, and Rt layers, each layer corresponding to neurons in the visual pathway. The retina layer is responsible for accurately projecting input content, the SGC dendritic layer perceives and encodes spatial-temporal information, the SGC soma layer computes complex motion information and extracts small objects, and the Rt layer integrates and decodes motion information from multiple directions to determine the position of small objects. Extensive experiments on pigeon neurophysiological experiments and image sequence data showed that the TSOM is biologically interpretable and effective in extracting reliable small object motion features from complex high-altitude backgrounds.
Updated: 2024-04-01 01:49:08
Areas: cs.CV,cs.AI
Ensemble Learning for Vietnamese Scene Text Spotting in Urban Environments
This paper presents a simple yet efficient ensemble learning framework for Vietnamese scene text spotting. Leveraging the power of ensemble learning, which combines multiple models to yield more accurate predictions, our approach aims to significantly enhance the performance of scene text spotting in challenging urban settings. Through experimental evaluations on the VinText dataset, our proposed method achieves a significant improvement in accuracy over existing methods, outperforming them by an impressive 5%. These results unequivocally demonstrate the efficacy of ensemble learning in the context of Vietnamese scene text spotting in urban environments, highlighting its potential for real-world applications, such as text detection and recognition in urban signage, advertisements, and various text-rich urban scenes.
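One common way to ensemble detectors, shown here as a hedged sketch since the abstract does not spell out the paper's exact fusion rule: pool boxes from all models, then merge overlapping detections by IoU and average their confidences.

import numpy as np

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def merge(detections, thr=0.5):
    """detections: list of (box, score) pooled from all ensemble members."""
    detections = sorted(detections, key=lambda d: -d[1])
    kept = []
    for box, score in detections:
        group = [k for k in kept if iou(k[0], box) > thr]
        if group:
            group[0][1].append(score)                 # vote for an existing box
        else:
            kept.append((box, [score]))
    return [(box, float(np.mean(scores))) for box, scores in kept]

dets = [((10, 10, 50, 30), 0.9), ((12, 11, 52, 31), 0.8), ((200, 50, 260, 80), 0.7)]
print(merge(dets))                                     # two models agree on the first box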
Updated: 2024-04-01 01:45:30
Areas: cs.CV,cs.LG
Delay-Induced Watermarking for Detection of Replay Attacks in Linear Systems
A state-feedback watermarking signal design for the detection of replay attacks in linear systems is proposed. The control input is augmented with a randomly time-delayed term of the system state estimate, in order to secure the system against attacks of the replay type. We outline the basic analysis of the closed-loop response of the state-feedback watermarking in an LQG-controlled system. Our theoretical results are applied to a temperature process control example. While the proposed secure control scheme requires very involved analysis, it nevertheless holds promise of being superior to conventional, feed-forward watermarking schemes, both in its ability to detect attacks and in the performance of the secured system.
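A hedged toy simulation of the idea on a scalar plant: the input carries a randomly delayed copy of the state, and a replayed sensor stream no longer matches the freshly injected watermark, inflating the one-step model residual. All gains, delays, and the residual test below are illustrative, not the paper's design.

import numpy as np
rng = np.random.default_rng(1)
a, k, g, T, d_max = 0.9, 0.5, 0.5, 500, 5        # plant pole, feedback gain, watermark gain

x, u = np.zeros(T), np.zeros(T)
for t in range(T - 1):
    d = rng.integers(1, d_max + 1)               # random watermark delay, fresh each step
    wm = g * x[t - d] if t >= d else 0.0
    u[t] = -k * x[t] + wm
    x[t + 1] = a * x[t] + u[t] + 0.02 * rng.normal()

honest = x[1:] - (a * x[:-1] + u[:-1])           # residual under honest sensing: just noise

off = 100                                        # attacker replays the stream x[t - off]
r = []
for t in range(off + d_max, T - 1):
    y, y_next = x[t - off], x[t - off + 1]
    d = rng.integers(1, d_max + 1)               # fresh delay, unknown to the attacker
    u_now = -k * y + g * x[t - off - d]          # input the controller applies now
    r.append(y_next - (a * y + u_now))           # replayed y was driven by old, different inputs
print(f"residual std  honest={honest.std():.4f}  replay={np.std(r):.4f}")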
Updated: 2024-04-01 01:34:30
Areas: eess.SY,cs.CR,cs.SY
TinyLLM: Learning a Small Student from Multiple Large Language Models
Transferring the reasoning capability from stronger large language models (LLMs) to smaller ones has been quite appealing, as smaller LLMs are more flexible to deploy with less expense. Among the existing solutions, knowledge distillation stands out due to its outstanding efficiency and generalization. However, existing methods suffer from several drawbacks, including limited knowledge diversity and the lack of rich contextual information. To solve the problems and facilitate the learning of compact language models, we propose TinyLLM, a new knowledge distillation paradigm to learn a small student LLM from multiple large teacher LLMs. In particular, we encourage the student LLM to not only generate the correct answers but also understand the rationales behind these answers. Given that different LLMs possess diverse reasoning skills, we guide the student model to assimilate knowledge from various teacher LLMs. We further introduce an in-context example generator and a teacher-forcing Chain-of-Thought strategy to ensure that the rationales are accurate and grounded in contextually appropriate scenarios. Extensive experiments on six datasets across two reasoning tasks demonstrate the superiority of our method. Results show that TinyLLM can outperform large teacher LLMs significantly, despite a considerably smaller model size.
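The core training signal can be sketched as a ground-truth cross-entropy term plus a soft-label term averaged over several teachers; the rationale-learning and in-context example generation components are omitted here, and plain arrays stand in for model logits.

import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits_list, label, T=2.0, alpha=0.5):
    p_s = softmax(student_logits, T)
    ce = -np.log(softmax(student_logits)[label])            # correctness of the answer
    kl = 0.0
    for t_logits in teacher_logits_list:                    # average KL over multiple teachers
        p_t = softmax(t_logits, T)
        kl += np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    return alpha * ce + (1 - alpha) * kl / len(teacher_logits_list)

student = np.array([1.0, 2.0, 0.5])
teachers = [np.array([0.5, 3.0, 0.1]), np.array([0.2, 2.5, 0.3])]
print(distill_loss(student, teachers, label=1))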
Updated: 2024-04-01 01:28:48
Areas: cs.CL,cs.AI,cs.LG
Predictive Performance Comparison of Decision Policies Under Confounding
Predictive models are often introduced to decision-making tasks under the rationale that they improve performance over an existing decision-making policy. However, it is challenging to compare predictive performance against an existing decision-making policy that is generally under-specified and dependent on unobservable factors. These sources of uncertainty are often addressed in practice by making strong assumptions about the data-generating mechanism. In this work, we propose a method to compare the predictive performance of decision policies under a variety of modern identification approaches from the causal inference and off-policy evaluation literatures (e.g., instrumental variable, marginal sensitivity model, proximal variable). Key to our method is the insight that there are regions of uncertainty that we can safely ignore in the policy comparison. We develop a practical approach for finite-sample estimation of regret intervals under no assumptions on the parametric form of the status quo policy. We verify our framework theoretically and via synthetic data experiments. We conclude with a real-world application using our framework to support a pre-deployment evaluation of a proposed modification to a healthcare enrollment policy.
Updated: 2024-04-01 01:27:07
Areas: cs.LG,cs.CY,stat.ME
Bounds of Block Rewards in Honest PinFi Systems
PinFi is a class of novel protocols for decentralized pricing of dissipative assets, whose value naturally declines over time. Central to the protocol's functionality and its market efficiency is the role of liquidity providers (LPs). This study addresses critical stability and sustainability challenges within the protocol, namely: the propensity of LPs to prefer selling in external markets over participation in the protocol; a similar inclination towards selling within the PinFi system rather than contributing as LPs; and a scenario where LPs are disinclined to sell within the protocol. Employing a game-theoretic approach, we explore PinFi's mechanisms and its broader ramifications. Our findings reveal that, under a variety of common conditions and with an assumption of participant integrity, PinFi is capable of fostering a dynamic equilibrium among LPs, sellers, and buyers. This balance is maintained through a carefully calibrated range of block rewards for LPs, ensuring the protocol's long-term stability and utility.
Updated: 2024-04-01 01:25:40
Areas: cs.GT,cs.AI,cs.CE
Transfer Learning with Point Transformers
Point Transformers are near state-of-the-art models for classification, segmentation, and detection tasks on point cloud data. They utilize a self-attention-based mechanism to model long-range spatial dependencies between multiple point sets. In this project we explore two things: the classification performance of these attention-based networks on the ModelNet10 dataset, and the use of the trained model to classify the 3D MNIST dataset after fine-tuning. We also train the model from scratch on the 3D MNIST dataset to compare the performance of the fine-tuned and from-scratch models. We observe that, since the two datasets differ substantially in their distributions, the transfer-learned models do not outperform the from-scratch models in this case, although we do expect transfer-learned models to converge faster since they already know lower-level features such as edges and corners from the ModelNet10 dataset.
Updated: 2024-04-01 01:23:58
Areas: cs.CV,cs.LG
Provably Efficient Exploration in Policy Optimization
While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably efficient policy optimization algorithm that incorporates exploration. To bridge such a gap, this paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO), which follows an "optimistic version" of the policy gradient direction. This paper proves that, in the problem of episodic Markov decision process with linear function approximation, unknown transition, and adversarial reward with full-information feedback, OPPO achieves $\tilde{O}(\sqrt{d^2 H^3 T})$ regret. Here $d$ is the feature dimension, $H$ is the episode horizon, and $T$ is the total number of steps. To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores.
Updated: 2024-04-01 00:56:31
Areas: cs.LG,math.OC,stat.ML
Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency
We study reinforcement learning for partially observed Markov decision processes (POMDPs) with infinite observation and state spaces, which remains less investigated theoretically. To this end, we make the first attempt at bridging partial observability and function approximation for a class of POMDPs with a linear structure. In detail, we propose a reinforcement learning algorithm (Optimistic Exploration via Adversarial Integral Equation or OP-TENET) that attains an $\epsilon$-optimal policy within $O(1/\epsilon^2)$ episodes. In particular, the sample complexity scales polynomially in the intrinsic dimension of the linear structure and is independent of the size of the observation and state spaces. The sample efficiency of OP-TENET is enabled by a sequence of ingredients: (i) a Bellman operator with finite memory, which represents the value function in a recursive manner, (ii) the identification and estimation of such an operator via an adversarial integral equation, which features a smoothed discriminator tailored to the linear structure, and (iii) the exploration of the observation and state spaces via optimism, which is based on quantifying the uncertainty in the adversarial integral equation.
Updated: 2024-04-01 00:46:06
Areas: cs.LG,math.OC,stat.ML
Automated HER2 Scoring in Breast Cancer Images Using Deep Learning and Pyramid Sampling
Human epidermal growth factor receptor 2 (HER2) is a critical protein in cancer cell growth that signifies the aggressiveness of breast cancer (BC) and helps predict its prognosis. Accurate assessment of immunohistochemically (IHC) stained tissue slides for HER2 expression levels is essential for both treatment guidance and understanding of cancer mechanisms. Nevertheless, the traditional workflow of manual examination by board-certified pathologists encounters challenges, including inter- and intra-observer inconsistency and extended turnaround times. Here, we introduce a deep learning-based approach utilizing pyramid sampling for the automated classification of HER2 status in IHC-stained BC tissue images. Our approach analyzes morphological features at various spatial scales, efficiently managing the computational load and facilitating a detailed examination of cellular and larger-scale tissue-level details. This method addresses the tissue heterogeneity of HER2 expression by providing a comprehensive view, leading to a blind testing classification accuracy of 84.70%, on a dataset of 523 core images from tissue microarrays. Our automated system, proving reliable as an adjunct pathology tool, has the potential to enhance diagnostic precision and evaluation speed, and might significantly impact cancer treatment planning.
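A hedged sketch of pyramid sampling as described: crop patches with increasing fields of view around the same center and resize them to a common input size, so the classifier sees both cellular detail and larger-scale tissue context. Scales and sizes below are illustrative.

import numpy as np

def pyramid_patches(image, center, base=128, levels=3):
    """image: (H, W, C) array; returns `levels` patches, each base x base."""
    cy, cx = center
    patches = []
    for s in range(levels):
        half = (base << s) // 2                           # half-widths 64, 128, 256, ...
        y0, y1 = max(0, cy - half), min(image.shape[0], cy + half)
        x0, x1 = max(0, cx - half), min(image.shape[1], cx + half)
        crop = image[y0:y1, x0:x1]
        # Nearest-neighbour resize to base x base via stride sampling.
        ys = np.linspace(0, crop.shape[0] - 1, base).astype(int)
        xs = np.linspace(0, crop.shape[1] - 1, base).astype(int)
        patches.append(crop[np.ix_(ys, xs)])
    return np.stack(patches)                              # (levels, base, base, C)

img = np.random.rand(1024, 1024, 3)
print(pyramid_patches(img, center=(512, 512)).shape)      # (3, 128, 128, 3)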
Updated: 2024-04-01 00:23:22
Areas: eess.IV,cs.CV,cs.LG,physics.med-ph
Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm
In some applications, edge learning is experiencing a shift in focus from conventional learning from scratch to a new two-stage paradigm unifying pre-training and task-specific fine-tuning. This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system. In this system, model pre-training is first conducted at an edge server via centralized learning on local pre-stored general data, and then task-specific fine-tuning is performed at edge devices based on the pre-trained model via federated edge learning. For the two-stage learning model, we first analyze the convergence behavior (in terms of the average squared gradient norm bound), which characterizes the impacts of various system parameters, such as the number of learning rounds and batch sizes in the two stages, on the convergence rate. Based on our analytical results, we then propose a joint communication and computation resource management design to minimize the average squared gradient norm bound, subject to constraints on the transmit power, overall system energy consumption, and training delay. The decision variables include the number of learning rounds, batch sizes, clock frequencies, and transmit power control for both the pre-training and fine-tuning stages. Finally, numerical results are provided to evaluate the effectiveness of our proposed design. It is shown that the proposed joint resource management over the pre-training and fine-tuning stages well balances the system performance trade-off among training accuracy, delay, and energy consumption. The proposed design is also shown to effectively leverage the inherent trade-off between pre-training and fine-tuning, which arises from the differences in data distribution between pre-stored general data and real-time task-specific data, thus efficiently optimizing overall system performance.
Updated: 2024-04-01 00:21:11
Areas: cs.IT,cs.DC,cs.LG,math.IT