INFusion: Diffusion Regularized Implicit Neural Representations for 2D and 3D accelerated MRI reconstruction
Implicit Neural Representations (INRs) are a learning-based approach to accelerating Magnetic Resonance Imaging (MRI) acquisitions, particularly in scan-specific settings where only data from the under-sampled scan itself are available. Previous work demonstrates that INRs improve rapid MRI through the inherent regularization imposed by neural network architectures. Typically parameterized by fully-connected neural networks, INRs support continuous image representations by taking a physical coordinate location as input and outputting the intensity at that coordinate. Previous work has applied unlearned regularization priors during INR training and has been limited to 2D or low-resolution 3D acquisitions. Meanwhile, diffusion-based generative models have received recent attention as they learn powerful image priors decoupled from the measurement model. This work proposes INFusion, a technique that regularizes the optimization of INRs from under-sampled MR measurements with pre-trained diffusion models for improved image reconstruction. In addition, we propose a hybrid 3D approach with our diffusion regularization that enables INR application on large-scale 3D MR datasets. 2D experiments demonstrate improved INR training with our proposed diffusion regularization, and 3D experiments demonstrate the feasibility of INR training with diffusion regularization on 3D matrix sizes of 256 by 256 by 80.
Updated: 2024-06-19 23:51:26
Subjects: eess.IV,cs.LG
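A minimal sketch of the idea above, assuming a PyTorch setting: a coordinate MLP is fit to under-sampled k-space with a data-consistency loss plus a penalty pulling the rendered image toward the output of a pre-trained denoiser standing in for the diffusion prior. The names, the single-coil Fourier measurement model, and the exact form of the diffusion coupling are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class INR(nn.Module):
    """Fully-connected INR: maps an (x, y) coordinate to an image intensity."""
    def __init__(self, hidden=256, layers=4):
        super().__init__()
        dims = [2] + [hidden] * layers + [1]
        blocks = []
        for i in range(len(dims) - 1):
            blocks.append(nn.Linear(dims[i], dims[i + 1]))
            if i < len(dims) - 2:
                blocks.append(nn.ReLU())
        self.net = nn.Sequential(*blocks)

    def forward(self, coords):              # coords: (N, 2) in [-1, 1]
        return self.net(coords)             # (N, 1) intensities

def infusion_step(inr, coords, y, mask, denoiser, lam, opt, shape):
    opt.zero_grad()
    img = inr(coords).reshape(shape)                 # render the full image
    k = torch.fft.fft2(img)                          # measurement model
    dc = ((mask * k - y).abs() ** 2).mean()          # data consistency
    # Hypothetical diffusion regularization: penalize disagreement with a
    # pre-trained denoiser (one of several possible ways to couple a prior).
    reg = ((img - denoiser(img.detach())) ** 2).mean()
    (dc + lam * reg).backward()
    opt.step()

# Dummy stand-ins so the sketch runs end-to-end.
H = W = 32
coords = torch.stack(torch.meshgrid(
    torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
), dim=-1).reshape(-1, 2)
mask = (torch.rand(H, W) < 0.3).float()              # 30% k-space sampling
y = mask * torch.fft.fft2(torch.rand(H, W))          # "measured" k-space
denoiser = lambda im: 0.9 * im                       # placeholder for the prior

inr = INR()
opt = torch.optim.Adam(inr.parameters(), lr=1e-3)
for _ in range(5):
    infusion_step(inr, coords, y, mask, denoiser, lam=0.1, opt=opt, shape=(H, W))
```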
DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection
LiDAR-based 3D object detection has seen impressive advances in recent times. However, deploying trained 3D detectors in the real world often yields unsatisfactory performance when the distribution of the test data significantly deviates from the training data due to different weather conditions, object sizes, \textit{etc}. A key factor in this performance degradation is the diminished generalizability of pre-trained models, which creates a sharp loss landscape during training. Such sharpness, when encountered during testing, can precipitate significant performance declines, even with minor data variations. To address the aforementioned challenges, we propose \textbf{dual-perturbation optimization (DPO)} for \textbf{\underline{T}est-\underline{t}ime \underline{A}daptation in \underline{3}D \underline{O}bject \underline{D}etection (TTA-3OD)}. We minimize the sharpness to cultivate a flat loss landscape to ensure model resiliency to minor data variations, thereby enhancing the generalization of the adaptation process. To fully capture the inherent variability of the test point clouds, we further introduce adversarial perturbation to the input BEV features to better simulate the noisy test environment. As the dual perturbation strategy relies on trustworthy supervision signals, we utilize a reliable Hungarian matcher to filter out pseudo-labels sensitive to perturbations. Additionally, we introduce early Hungarian cutoff to avoid error accumulation from incorrect pseudo-labels by halting the adaptation process. Extensive experiments across three types of transfer tasks demonstrate that the proposed DPO significantly surpasses previous state-of-the-art approaches, specifically on Waymo $\rightarrow$ KITTI, outperforming the most competitive baseline by 57.72\% in $\text{AP}_\text{3D}$ and reaching 91\% of the fully supervised upper bound.
Updated: 2024-06-19 23:46:08
Subjects: cs.CV,cs.AI
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical evaluation benchmarks face the risk of data leakage or contamination. Secondly, existing benchmarks often neglect the characteristics of multiple departments and specializations in modern medical practice. Thirdly, existing evaluation methods are limited to multiple-choice questions, which do not align with real-world diagnostic scenarios. Lastly, existing evaluation methods lack comprehensive evaluations of end-to-end real clinical scenarios. These benchmark limitations in turn obstruct the advancement of LLMs and agents for medicine. To address these limitations, we introduce ClinicalLab, a comprehensive clinical diagnosis agent alignment suite. ClinicalLab includes ClinicalBench, an end-to-end multi-departmental clinical diagnostic evaluation benchmark for evaluating medical agents and LLMs. ClinicalBench is based on real cases that cover 24 departments and 150 diseases. ClinicalLab also includes four novel metrics (ClinicalMetrics) for evaluating the effectiveness of LLMs in clinical diagnostic tasks. We evaluate 17 LLMs and find that their performance varies significantly across different departments. Based on these findings, in ClinicalLab, we propose ClinicalAgent, an end-to-end clinical agent that aligns with real-world clinical diagnostic practices. We systematically investigate the performance and applicable scenarios of variants of ClinicalAgent on ClinicalBench. Our findings demonstrate the importance of aligning with modern medical practices in designing medical agents.
Updated: 2024-06-19 23:44:25
Subjects: cs.CL,cs.AI
Open Problem: Anytime Convergence Rate of Gradient Descent
Recent results show that vanilla gradient descent can be accelerated for smooth convex objectives, merely by changing the stepsize sequence. We show that this can lead to surprisingly large errors indefinitely, and therefore ask: Is there any stepsize schedule for gradient descent that accelerates the classic $\mathcal{O}(1/T)$ convergence rate, at \emph{any} stopping time $T$?
Updated: 2024-06-19 23:34:47
Subjects: math.OC,cs.LG
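A toy harness for the question above: run gradient descent on a smooth convex quadratic under different stepsize schedules and record the objective at every stopping time $T$, which is exactly the anytime quantity the open problem asks about. The schedules below are illustrative stand-ins, not the accelerated schedules from the literature.

```python
import numpy as np

def gd(grad, x0, stepsizes):
    """Run gradient descent, returning the iterate after every step."""
    x = x0.copy()
    iterates = []
    for eta in stepsizes:
        x = x - eta * grad(x)
        iterates.append(x.copy())
    return iterates

L = 1.0                                   # smoothness constant of f
A = np.diag([L, 0.01])                    # f(x) = x^T A x / 2, smooth and convex
grad = lambda x: A @ x
x0 = np.array([1.0, 1.0])
T = 64

schedules = {
    "constant 1/L": [1.0 / L] * T,
    # A hypothetical schedule with occasional long steps (not the silver schedule).
    "long-step":    [(4.0 if t % 8 == 7 else 1.0) / L for t in range(T)],
}
for name, steps in schedules.items():
    f_vals = [0.5 * x @ A @ x for x in gd(grad, x0, steps)]
    print(name, "f(x_T) at T=8, 32, 64:", f_vals[7], f_vals[31], f_vals[63])
```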
Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever
Knowledge tagging for questions plays a crucial role in contemporary intelligent educational applications, including learning progress diagnosis, practice question recommendations, and course content organization. Traditionally, these annotations have been conducted by pedagogical experts, as the task requires not only a strong semantic understanding of both question stems and knowledge definitions but also deep insight into connecting question-solving logic with the corresponding knowledge concepts. With the recent emergence of advanced text encoding algorithms, such as pre-trained language models, many researchers have developed automatic knowledge tagging systems based on calculating the semantic similarity between knowledge and question embeddings. In this paper, we explore automating the task using Large Language Models (LLMs), in response to the inability of prior encoding-based methods to deal with hard cases that involve strong domain knowledge and complicated concept definitions. By showing strong zero- and few-shot performance on math question knowledge tagging tasks, we demonstrate LLMs' great potential in conquering the challenges faced by prior methods. Furthermore, by proposing a reinforcement learning-based demonstration retriever, we successfully exploit the potential of different-sized LLMs to achieve better performance while keeping in-context demonstration usage efficient.
Updated: 2024-06-19 23:30:01
Subjects: cs.CL,cs.AI
Allocation Requires Prediction Only if Inequality Is Low
Algorithmic predictions are emerging as a promising solution concept for efficiently allocating societal resources. Fueling their use is an underlying assumption that such systems are necessary to identify individuals for interventions. We propose a principled framework for assessing this assumption: Using a simple mathematical model, we evaluate the efficacy of prediction-based allocations in settings where individuals belong to larger units such as hospitals, neighborhoods, or schools. We find that prediction-based allocations outperform baseline methods using aggregate unit-level statistics only when between-unit inequality is low and the intervention budget is high. Our results hold for a wide range of settings for the price of prediction, treatment-effect heterogeneity, and the learnability of unit-level statistics. Taken together, our results highlight the potential limits to improving the efficacy of interventions through prediction.
Updated: 2024-06-19 23:23:32
Subjects: cs.LG,cs.CY,econ.TH
Privacy-Preserving ECG Data Analysis with Differential Privacy: A Literature Review and A Case Study
Differential privacy has become the preeminent technique for protecting the privacy of individuals in a database while still allowing useful results from data analysis to be shared. Notably, it provides a guarantee on the amount of privacy loss in the worst-case scenario. Although many theoretical research papers have been published, practical real-life applications of differential privacy demand estimating several important parameters for which no clear solutions or guidelines exist. In the first part of the paper, we provide an overview of key concepts in differential privacy, followed by a literature review and a discussion of its application to ECG analysis. In the second part of the paper, we explore how to implement differentially private query release on an arrhythmia database using a six-step process. We provide guidelines and discuss the related literature for all the steps involved, such as selection of the $\epsilon$ value, distribution of the total $\epsilon$ budget across the queries, and estimation of the sensitivity of the query functions. At the end, we discuss the shortcomings and challenges of applying differential privacy to ECG datasets.
Updated: 2024-06-19 23:17:16
Subjects: cs.CR
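A minimal sketch of the query-release recipe discussed above: split a total $\epsilon$ budget uniformly across several counting queries and answer each with the Laplace mechanism calibrated to its sensitivity. The data, budget value, and uniform split are illustrative assumptions; the mechanism itself is the standard one.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Answer a query with Laplace noise of scale sensitivity/epsilon."""
    return true_answer + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
records = rng.integers(0, 5, size=1000)        # stand-in for per-record ECG labels

total_epsilon = 1.0                            # total privacy budget
queries = [lambda d, c=c: int(np.sum(d == c)) for c in range(5)]
eps_per_query = total_epsilon / len(queries)   # uniform budget split

for c, q in enumerate(queries):
    # A counting query changes by at most 1 when one record changes,
    # so its sensitivity is 1.
    noisy = laplace_mechanism(q(records), sensitivity=1.0,
                              epsilon=eps_per_query, rng=rng)
    print(f"class {c}: true={q(records)}, private~={noisy:.1f}")
```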
A Catalyst Framework for the Quantum Linear System Problem via the Proximal Point Algorithm
Solving systems of linear equations is a fundamental problem, but it can be computationally intensive for classical algorithms in high dimensions. Existing quantum algorithms can achieve exponential speedups for the quantum linear system problem (QLSP) in terms of the problem dimension, but even such a theoretical advantage is bottlenecked by the condition number of the coefficient matrix. In this work, we propose a new quantum algorithm for QLSP inspired by the classical proximal point algorithm (PPA). Our proposed method can be viewed as a meta-algorithm that allows inverting a modified matrix via an existing \texttt{QLSP\_solver}, thereby directly approximating the solution vector instead of approximating the inverse of the coefficient matrix. By carefully choosing the step size $\eta$, the proposed algorithm can effectively precondition the linear system to mitigate the dependence on condition numbers that hindered the applicability of previous approaches.
Updated: 2024-06-19 23:15:35
Subjects: quant-ph,cs.DS,cs.LG,math.OC
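A classical-numerics sketch of the proximal-point idea (not the quantum algorithm itself): each iterate solves the better-conditioned modified system $(I + \eta A)x_{k+1} = x_k + \eta b$, whose fixed point satisfies $Ax = b$. Here `np.linalg.solve` is a stand-in for a call to an existing QLSP solver, and the test matrix is an illustrative choice.

```python
import numpy as np

def ppa_linear_solve(A, b, eta=0.1, iters=200):
    n = len(b)
    M = np.eye(n) + eta * A            # modified, better-conditioned matrix
    x = np.zeros(n)
    for _ in range(iters):
        x = np.linalg.solve(M, x + eta * b)   # stand-in for a QLSP_solver call
    return x

rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.standard_normal((8, 8)))[0]
A = Q @ np.diag(np.linspace(1.0, 100.0, 8)) @ Q.T   # SPD, condition number 100
b = rng.standard_normal(8)

x = ppa_linear_solve(A, b)
print("cond(A)          =", np.linalg.cond(A))
print("cond(I + eta*A)  =", np.linalg.cond(np.eye(8) + 0.1 * A))  # about 10
print("residual |Ax - b| =", np.linalg.norm(A @ x - b))
```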
A Systematic Literature Review on the Use of Machine Learning in Software Engineering
Software engineering (SE) is a dynamic field that involves multiple phases, all of which are necessary to develop sustainable software systems. Machine learning (ML), a branch of artificial intelligence (AI), has drawn a lot of attention in recent years thanks to its ability to analyze massive volumes of data and extract useful patterns from it. Several studies have focused on examining, categorising, and assessing the application of ML in SE processes, yet a consolidated view of the primary studies is still lacking. We conducted a literature review of primary studies to address this gap. The study was guided by an objective and research questions formulated to explore the current state of the art in applying machine learning techniques to software engineering processes. The review identifies the key areas within software engineering where ML has been applied, including software quality assurance, software maintenance, software comprehension, and software documentation. It also highlights the specific ML techniques that have been leveraged in these domains, such as supervised learning, unsupervised learning, and deep learning. Keywords: machine learning, deep learning, software engineering, natural language processing, source code
Updated: 2024-06-19 23:04:27
Subjects: cs.SE,cs.LG
DeepEdit: Knowledge Editing as Decoding with Constraints
How to edit knowledge in multi-step reasoning has become the major challenge in the knowledge editing (KE) of large language models (LLMs). The difficulty arises because the hallucinations of LLMs during multi-step reasoning often lead to incorrect use of new knowledge and incorrect answers. To address this issue, we design decoding constraints to "regulate" LLMs' reasoning, enhancing logical coherence when incorporating new knowledge. We propose a new KE framework: DEEPEDIT (Depth-first Search-based Constrained Decoding for Knowledge Editing), which enhances LLMs' ability to generate coherent reasoning chains with new knowledge through depth-first search. Our search selects the most important knowledge that satisfies our constraints as the reasoning step to efficiently increase the reasoning depth. In addition to DEEPEDIT, we propose two new KE benchmarks: MQUAKE-2002 and MQUAKE-HARD, which provide more precise and challenging assessments of KE approaches. Qualitatively, DEEPEDIT enables LLMs to produce succinct and coherent reasoning chains involving new knowledge. Quantitatively, it yields significant improvements on multiple KE benchmarks.
Updated: 2024-06-19 22:53:54
Subjects: cs.CL,cs.AI
Slot State Space Models
Recent State Space Models (SSMs) such as S4, S5, and Mamba have shown remarkable computational benefits in long-range temporal dependency modeling. However, in many sequence modeling problems, the underlying process is inherently modular and it is of interest to have inductive biases that mimic this modular structure. In this paper, we introduce SlotSSMs, a novel framework for incorporating independent mechanisms into SSMs to preserve or encourage separation of information. Unlike conventional SSMs that maintain a monolithic state vector, SlotSSMs maintains the state as a collection of multiple vectors called slots. Crucially, the state transitions are performed independently per slot with sparse interactions across slots implemented via the bottleneck of self-attention. In experiments, we evaluate our model in object-centric video understanding, 3D visual reasoning, and video prediction tasks, which involve modeling multiple objects and their long-range temporal dependencies. We find that our proposed design offers substantial performance gains over existing sequence modeling methods.
Updated: 2024-06-19 22:53:36
Subjects: cs.AI
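A minimal single-layer sketch of the mechanism described above, assuming PyTorch. The dimensions and the plain linear transition are illustrative assumptions (real SSMs use S4/S5-style parameterizations); what the sketch shows is the structure: each slot keeps its own state, transitions apply independently per slot, and slots interact only through a self-attention bottleneck.

```python
import torch
import torch.nn as nn

class SlotSSMLayer(nn.Module):
    def __init__(self, num_slots=4, d_state=32):
        super().__init__()
        # One transition matrix pair per slot: independent mechanisms.
        self.A = nn.Parameter(torch.randn(num_slots, d_state, d_state) * 0.01)
        self.B = nn.Parameter(torch.randn(num_slots, d_state, d_state) * 0.01)
        self.attn = nn.MultiheadAttention(d_state, num_heads=4, batch_first=True)

    def forward(self, inputs):                 # inputs: (B, T, slots, d_state)
        B, T, S, D = inputs.shape
        state = inputs.new_zeros(B, S, D)
        outs = []
        for t in range(T):
            # Independent per-slot transition: state_s <- A_s state_s + B_s u_s
            state = torch.einsum("sde,bse->bsd", self.A, state) \
                  + torch.einsum("sde,bse->bsd", self.B, inputs[:, t])
            # Sparse cross-slot interaction via the self-attention bottleneck.
            mixed, _ = self.attn(state, state, state)
            state = state + mixed
            outs.append(state)
        return torch.stack(outs, dim=1)        # (B, T, slots, d_state)

layer = SlotSSMLayer()
print(layer(torch.randn(2, 10, 4, 32)).shape)  # torch.Size([2, 10, 4, 32])
```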
RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Reinforcement Learning with Human Feedback (RLHF) is a methodology designed to align Large Language Models (LLMs) with human preferences, playing an important role in LLM alignment. Despite its advantages, RLHF relies on human annotators to rank the text, which can introduce potential security vulnerabilities if an adversarial annotator (i.e., an attacker) manipulates the ranking scores by up-ranking malicious text to steer the LLM adversarially. To assess the red-teaming of RLHF against human preference data poisoning, we propose RankPoison, a poisoning attack method that flips the preference ranks of selected candidates to elicit certain malicious behaviors (e.g., generating longer sequences, which can increase the computational cost). With the poisoned dataset generated by RankPoison, we can perform poisoning attacks on LLMs so that they generate longer token sequences without hurting the original safety alignment performance. Moreover, applying RankPoison, we also successfully implement a backdoor attack where LLMs generate longer answers to questions containing a trigger word. Our findings highlight critical security challenges in RLHF, underscoring the necessity for more robust alignment methods for LLMs.
Updated: 2024-06-19 22:40:07
Subjects: cs.AI,cs.CL,cs.CR,cs.HC
Convergence analysis of kernel learning FBSDE filter
The kernel learning forward-backward SDE filter is an iterative and adaptive meshfree approach to solving the nonlinear filtering problem. It builds on a forward-backward SDE formulation of the Fokker-Planck equation, which defines the evolving density of the state variable, and employs kernel density estimation (KDE) to approximate the density. This algorithm has shown superior performance to the mainstream particle filter method in both convergence speed and efficiency in solving high-dimensional problems. However, this method has only been shown to converge empirically. In this paper, we present a rigorous analysis to demonstrate its local and global convergence, and provide theoretical support for its empirical results.
Updated: 2024-06-19 22:34:49
Subjects: cs.LG,cs.NA,math.NA,q-fin.MF
A Pure Transformer Pretraining Framework on Text-attributed Graphs
Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges such as feature heterogeneity and structural heterogeneity. Recently, increasing efforts have been made to enhance node feature quality with Large Language Models (LLMs) on text-attributed graphs (TAGs), demonstrating superiority to traditional bag-of-words or word2vec techniques. These high-quality node features reduce the previously critical role of graph structure, resulting in a modest performance gap between Graph Neural Networks (GNNs) and structure-agnostic Multi-Layer Perceptrons (MLPs). Motivated by this, we introduce a feature-centric pretraining perspective by treating graph structure as a prior and leveraging the rich, unified feature space to learn refined interaction patterns that generalize across graphs. Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks and employs masked feature reconstruction to capture pairwise proximity in the LLM-unified feature space using a standard Transformer. By utilizing unified text representations rather than varying structures, our framework achieves significantly better transferability among graphs within the same domain. GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
Updated: 2024-06-19 22:30:08
Subjects: cs.AI
Robust Time Series Forecasting with Non-Heavy-Tailed Gaussian Loss-Weighted Sampler
Forecasting multivariate time series is a computationally intensive task challenged by extreme or redundant samples. Recent resampling methods aim to increase training efficiency by reweighting samples based on their running losses. However, these methods do not solve the problems caused by heavy-tailed loss distributions, such as overfitting to outliers. To tackle these issues, we introduce a novel approach: a Gaussian loss-weighted sampler that multiplies running losses with a Gaussian distribution weight. It reduces the probability of selecting samples with very low or very high losses while favoring those close to the average loss. As it creates a weighted loss distribution that is theoretically not heavy-tailed, there are several advantages to highlight compared to existing methods: 1) it relieves the inefficiency of learning redundant easy samples and the tendency to overfit outliers; 2) it improves training efficiency by preferentially learning samples close to the average loss. Application to real-world time series forecasting datasets demonstrates improvements in prediction quality of 1%-4% by mean squared error in channel-independent settings. The code will be available online after the review.
Updated: 2024-06-19 22:28:18
Subjects: cs.LG
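A minimal sketch of the sampler, assuming the Gaussian weight is used as a sampling probability over running losses (the bandwidth choice and the synthetic data are illustrative assumptions): samples near the average loss are favored, while very easy and very hard (outlier) samples are down-weighted.

```python
import numpy as np

def gaussian_loss_weights(running_losses, sigma=None):
    """Map running losses to sampling probabilities via a Gaussian weight."""
    mu = running_losses.mean()
    sigma = sigma if sigma is not None else running_losses.std() + 1e-8
    w = np.exp(-0.5 * ((running_losses - mu) / sigma) ** 2)
    return w / w.sum()

rng = np.random.default_rng(0)
# 990 ordinary samples plus 10 heavy-tailed outliers.
losses = np.concatenate([rng.normal(1.0, 0.2, 990), rng.normal(8.0, 0.5, 10)])
p = gaussian_loss_weights(losses)
batch = rng.choice(len(losses), size=64, p=p)
print("mean loss of sampled batch:", losses[batch].mean())  # ~1.0: outliers excluded
```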
Construction numbers: How to build a graph?
Counting the number of linear extensions of a partial order was considered by Stanley about 50 years ago. For the partial order on the vertices and edges of a graph given by inclusion, we call a linear extension a {\it construction sequence} for the graph, as each edge follows the vertices to which it is attached. The number of these c-sequences is counted for various graph families. We also consider the set of all length-$n$ c-sequences produced by the graphs with $n$ elements, simplified to their structural skeleton: vertex or edge, and further allow the generating graph to be structurally constrained. Efficiency is analyzed.
Updated: 2024-06-19 22:23:31
Subjects: math.CO,cs.AI,05C30, 05A05, 05A15, 06A05, 90B35
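A brute-force check of the definition for small graphs: a construction sequence orders all vertices and edges so that each edge appears after both of its endpoints, and the construction number counts the linear extensions of that partial order. This enumeration is only feasible for tiny graphs; the paper's counts for graph families come from combinatorial arguments, not search.

```python
from itertools import permutations

def construction_number(vertices, edges):
    """Count orderings of all vertices and edges in which every edge
    appears after both of its endpoints."""
    elements = list(vertices) + list(edges)
    count = 0
    for perm in permutations(elements):
        pos = {el: i for i, el in enumerate(perm)}
        if all(pos[e] > pos[e[0]] and pos[e] > pos[e[1]] for e in edges):
            count += 1
    return count

# Path on 3 vertices: 1 - 2 - 3, with edges (1,2) and (2,3).
print(construction_number([1, 2, 3], [(1, 2), (2, 3)]))   # 16
```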
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a single framework. In this work, we introduce VistaLLM, a powerful visual system that addresses coarse- and fine-grained VL tasks over single and multiple input images using a unified framework. VistaLLM utilizes an instruction-guided image tokenizer that filters global embeddings using task descriptions to extract compressed and refined features from numerous images. Moreover, VistaLLM employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences, significantly improving over previously used uniform sampling. To bolster the desired capability of VistaLLM, we curate CoinIt, a comprehensive coarse-to-fine instruction tuning dataset with 6.8M samples. We also address the lack of multi-image grounding datasets by introducing a novel task, AttCoSeg (Attribute-level Co-Segmentation), which boosts the model's reasoning and grounding capability over multiple input images. Extensive experiments on a wide range of V- and VL tasks demonstrate the effectiveness of VistaLLM by achieving consistent state-of-the-art performance over strong baselines across all downstream tasks. Our project page can be found at https://shramanpramanick.github.io/VistaLLM/.
Updated: 2024-06-19 22:20:40
Subjects: cs.CV,cs.AI
Global Human-guided Counterfactual Explanations for Molecular Properties via Reinforcement Learning
Counterfactual explanations of Graph Neural Networks (GNNs) offer a powerful way to understand data that can naturally be represented by a graph structure. Furthermore, in many domains, it is highly desirable to derive data-driven global explanations or rules that can better explain the high-level properties of the models and data in question. However, evaluating global counterfactual explanations is hard in real-world datasets due to a lack of human-annotated ground truth, which limits their use in areas like molecular sciences. Additionally, the increasing scale of these datasets provides a challenge for random search-based methods. In this paper, we develop a novel global explanation model RLHEX for molecular property prediction. It aligns the counterfactual explanations with human-defined principles, making the explanations more interpretable and easy for experts to evaluate. RLHEX includes a VAE-based graph generator to generate global explanations and an adapter to adjust the latent representation space to human-defined principles. Optimized by Proximal Policy Optimization (PPO), the global explanations produced by RLHEX cover 4.12% more input graphs and reduce the distance between the counterfactual explanation set and the input set by 0.47% on average across three molecular datasets. RLHEX provides a flexible framework to incorporate different human-designed principles into the counterfactual explanation generation process, aligning these explanations with domain expertise. The code and data are released at https://github.com/dqwang122/RLHEX.
Updated: 2024-06-19 22:16:40
Subjects: cs.LG,q-bio.BM
DG-RePlAce: A Dataflow-Driven GPU-Accelerated Analytical Global Placement Framework for Machine Learning Accelerators
Global placement is a fundamental step in VLSI physical design. The wide use of 2D processing element (PE) arrays in machine learning accelerators poses new challenges of scalability and Quality of Results (QoR) for state-of-the-art academic global placers. In this work, we develop DG-RePlAce, a new and fast GPU-accelerated global placement framework built on top of the OpenROAD infrastructure, which exploits the inherent dataflow and datapath structures of machine learning accelerators. Experimental results with a variety of machine learning accelerators using a commercial 12nm enablement show that, compared with RePlAce (DREAMPlace), our approach achieves an average reduction in routed wirelength by 10% (7%) and total negative slack (TNS) by 31% (34%), with faster global placement and on-par total runtimes relative to DREAMPlace. Empirical studies on the TILOS MacroPlacement Benchmarks further demonstrate that post-route improvements over RePlAce and DREAMPlace may reach beyond the motivating application to machine learning accelerators.
Updated: 2024-06-19 22:13:46
Subjects: cs.AR,cs.LG
SDQ: Sparse Decomposed Quantization for LLM Inference
Recently, large language models (LLMs) have shown surprising performance in task-specific workloads as well as general tasks with given prompts. However, to achieve unprecedented performance, recent LLMs use billions to trillions of parameters, which hinders the wide adoption of these models due to their extremely large compute and memory requirements. To resolve this issue, various model compression methods are being actively investigated. In this work, we propose SDQ (Sparse Decomposed Quantization), which exploits both structured sparsity and quantization to achieve high compute and memory efficiency. In our evaluations, we observe that SDQ can achieve 4x effective compute throughput with <1% quality drop.
Updated: 2024-06-19 22:12:51
Subjects: cs.LG,cs.AI
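A toy sketch of a sparse-plus-quantized weight decomposition in the spirit described above (the 4-bit scheme, the 5% sparsity level, and the magnitude-based split are illustrative assumptions, not the paper's exact method): keep the largest-magnitude outlier weights in a sparse high-precision component and quantize the dense remainder.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric 4-bit quantization with levels -7..7 and a single scale."""
    scale = np.abs(w).max() / 7.0 + 1e-12
    return np.clip(np.round(w / scale), -7, 7) * scale

def sdq_decompose(W, sparsity=0.05):
    k = max(1, int(W.size * sparsity))
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]
    sparse_mask = np.abs(W) >= thresh               # top-magnitude outliers
    S = np.where(sparse_mask, W, 0.0)               # sparse, high precision
    Wq = quantize_int4(np.where(sparse_mask, 0.0, W))  # quantized remainder
    return Wq, S

rng = np.random.default_rng(0)
# Weights with a few large outliers, as often seen in LLM layers.
W = rng.standard_normal((256, 256)) * (1 + 5 * (rng.random((256, 256)) < 0.01))
Wq, S = sdq_decompose(W)
print("relative reconstruction error:",
      np.linalg.norm(W - (Wq + S)) / np.linalg.norm(W))
```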
Extreme Solar Flare Prediction Using Residual Networks with HMI Magnetograms and Intensitygrams
Solar flares, especially C, M, and X class, pose significant risks to satellite operations, communication systems, and power grids. We present a novel approach for predicting extreme solar flares using HMI intensitygrams and magnetograms. By detecting sunspots from intensitygrams and extracting magnetic field patches from magnetograms, we train a Residual Network (ResNet) to classify extreme class flares. Our model demonstrates high accuracy, offering a robust tool for predicting extreme solar flares and improving space weather forecasting. Additionally, we show that HMI magnetograms provide more useful data for deep learning compared to other SDO AIA images by better capturing features critical for predicting flare magnitudes. This study underscores the importance of identifying magnetic fields in solar flare prediction, marking a significant advancement in solar activity prediction with practical implications for mitigating space weather impacts.
Updated: 2024-06-19 22:11:28
Subjects: astro-ph.SR,cs.AI,cs.LG
Predictive Modeling of Coronal Hole Areas Using Long Short-Term Memory Networks
In the era of space exploration, the implications of space weather have become increasingly evident. Central to this is the phenomenon of coronal holes, which can significantly influence the functioning of satellites and aircraft. These coronal holes, present on the sun, are distinguished by their open magnetic field lines and comparatively cooler temperatures, leading to the emission of solar winds at heightened rates. To anticipate the effects of these coronal holes on Earth, our study harnesses computer vision to pinpoint the coronal hole regions and estimate their dimensions using imagery from the Solar Dynamics Observatory (SDO). Further, we deploy deep learning methodologies, specifically the Long Short-Term Memory (LSTM) approach, to analyze the trends in the data related to the area of the coronal holes and predict their dimensions across various solar regions over a span of seven days. By evaluating the time series data concerning the area of the coronal holes, our research seeks to uncover patterns in the behavior of coronal holes and comprehend their potential influence on space weather occurrences. This investigation marks a pivotal stride towards bolstering our capacity to anticipate and brace for space weather events that could have ramifications for Earth and its technological apparatuses.
Updated: 2024-06-19 22:08:10
Subjects: astro-ph.SR,astro-ph.EP,cs.LG,physics.space-ph
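A minimal sketch of the forecasting setup, assuming PyTorch; the 14-day window, 7-day horizon, and single-layer architecture are illustrative assumptions. An LSTM maps a history of daily coronal hole areas to predictions for the next seven days.

```python
import torch
import torch.nn as nn

class AreaForecaster(nn.Module):
    def __init__(self, hidden=64, horizon=7):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):            # x: (batch, past_days, 1) normalized areas
        _, (h, _) = self.lstm(x)     # final hidden state summarizes the window
        return self.head(h[-1])      # (batch, horizon) predicted areas

model = AreaForecaster()
past = torch.randn(8, 14, 1)         # 8 series, 14 days of (normalized) areas
print(model(past).shape)             # torch.Size([8, 7])
```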
Evaluating representation learning on the protein structure universe
We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pre-training and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representation and their usefulness in capturing functional relationships for downstream tasks. We find that: (1) large-scale pretraining on AlphaFold structures and auxiliary tasks consistently improve the performance of both rotation-invariant and equivariant GNNs, and (2) more expressive equivariant GNNs benefit from pretraining to a greater extent compared to invariant models. We aim to establish a common ground for the machine learning and computational biology communities to rigorously compare and advance protein structure representation learning. Our open-source codebase reduces the barrier to entry for working with large protein structure datasets by providing: (1) storage-efficient dataloaders for large-scale structural databases including AlphaFoldDB and ESM Atlas, as well as (2) utilities for constructing new tasks from the entire PDB. ProteinWorkshop is available at: github.com/a-r-j/ProteinWorkshop.
Updated: 2024-06-19 21:48:34
Subjects: cs.LG,q-bio.BM
Knowledge Graph-Enhanced Large Language Models via Path Selection
Large Language Models (LLMs) have shown unprecedented performance in various real-world applications. However, they are known to generate factually inaccurate outputs, a.k.a. the hallucination problem. In recent years, incorporating external knowledge extracted from Knowledge Graphs (KGs) has become a promising strategy to improve the factual accuracy of LLM-generated outputs. Nevertheless, most existing explorations rely on LLMs themselves to perform KG knowledge extraction, which is highly inflexible as LLMs can only provide a binary judgment on whether a certain piece of knowledge (e.g., a knowledge path in the KG) should be used. In addition, LLMs tend to pick only knowledge with a direct semantic relationship to the input text, while potentially useful knowledge with indirect semantics can be ignored. In this work, we propose KELP, a principled three-stage framework to handle the above problems. Specifically, KELP achieves finer-grained, flexible knowledge extraction by scoring knowledge paths against input texts via latent semantic matching. Meanwhile, knowledge paths with indirect semantic relationships to the input text can also be considered via trained encoding between the selected paths in the KG and the input text. Experiments on real-world datasets validate the effectiveness of KELP.
Updated: 2024-06-19 21:45:20
Subjects: cs.CL,cs.AI
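A minimal sketch of path selection by scoring knowledge paths against the input text. TF-IDF cosine similarity here is a lexical stand-in for KELP's trained latent semantic matching, and the example paths are hypothetical; the point is the interface: score every candidate path, then keep the top-ranked ones rather than asking an LLM for a binary keep/drop judgment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score_paths(question, paths, top_k=2):
    """Rank KG paths by similarity to the question (lexical stand-in)."""
    texts = [question] + [" ".join(p).replace("_", " ") for p in paths]
    X = TfidfVectorizer().fit_transform(texts)
    sims = cosine_similarity(X[0], X[1:]).ravel()
    ranked = sorted(zip(sims, paths), key=lambda t: -t[0])
    return ranked[:top_k]

paths = [  # hypothetical knowledge paths from a KG
    ("Paris", "capital_of", "France"),
    ("France", "official_language", "French"),
    ("Eiffel Tower", "located_in", "Paris"),
]
print(score_paths("What language is spoken in the capital of France?", paths))
```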
Bridging the Gap in Drug Safety Data Analysis: Large Language Models for SQL Query Generation
Pharmacovigilance (PV) is essential for drug safety, primarily focusing on adverse event monitoring. Traditionally, accessing safety data required database expertise, limiting broader use. This paper introduces a novel application of Large Language Models (LLMs) to democratize database access for non-technical users. Utilizing OpenAI's GPT-4, we developed a chatbot that generates structured query language (SQL) queries from natural language, bridging the gap between domain knowledge and technical requirements. The proposed application aims for more inclusive and efficient data access, enhancing decision making in drug safety. By providing LLMs with plain language summaries of expert knowledge, our approach significantly improves query accuracy over methods relying solely on database schemas. The application of LLMs in this context not only optimizes PV data analysis, ensuring timely and precise drug safety reporting -- a crucial component in adverse drug reaction monitoring -- but also promotes safer pharmacological practices and informed decision making across various data intensive fields.
Updated: 2024-06-19 21:41:11
Subjects: cs.AI,cs.DB,H.3.3; I.2.7
TroL: Traversal of Layers for Large Language and Vision Models
Large language and vision models (LLVMs) have been driven by the generalization power of large language models (LLMs) and the advent of visual instruction tuning. Along with direct scale-up, these advances enable LLVMs to showcase powerful vision-language (VL) performance by covering diverse tasks via natural language instructions. However, existing open-source LLVMs that perform comparably to closed-source LLVMs such as GPT-4V are often considered too large (e.g., 26B, 34B, and 110B parameters), with larger numbers of layers. These large models demand costly, high-end resources for both training and inference. To address this issue, we present a new efficient LLVM family with 1.8B, 3.8B, and 7B LLM model sizes, Traversal of Layers (TroL), which enables the reuse of layers in a token-wise manner. This layer traversing technique simulates the effect of looking back and retracing the answering stream while increasing the number of forward propagation layers without physically adding more layers. We demonstrate that TroL employs a simple layer traversing approach yet efficiently outperforms open-source LLVMs with larger model sizes and rivals the performance of closed-source LLVMs of substantial size.
Updated: 2024-06-19 21:40:03
Subjects: cs.LG,cs.CL,cs.CV
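A toy sketch of layer traversal: the same stack of layers is applied twice per forward pass, simulating extra depth without adding parameters. TroL's actual token-wise mixing of the original and traversed streams is omitted here; this is an illustrative simplification of the reuse idea only.

```python
import torch
import torch.nn as nn

class TraversedStack(nn.Module):
    def __init__(self, d=64, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, x):                 # x: (batch, tokens, d)
        for layer in self.layers:         # first traversal
            x = layer(x)
        for layer in self.layers:         # second traversal: reused weights
            x = layer(x)
        return x

m = TraversedStack()
print(m(torch.randn(2, 16, 64)).shape)           # torch.Size([2, 16, 64])
print(sum(p.numel() for p in m.parameters()))    # unchanged by the reuse
```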
Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data
Data sets with imbalanced class sizes, in which one class is much smaller than the others, occur extremely often in various applications, including those with biological foundations, such as drug discovery and disease diagnosis. Thus, it is extremely important to be able to identify data elements of classes of various sizes, as a failure to detect them can result in heavy costs. However, many data classification algorithms do not perform well on imbalanced data sets as they often fail to detect elements belonging to underrepresented classes. In this paper, we propose the BTDT-MBO algorithm, incorporating Merriman-Bence-Osher (MBO) techniques and a bidirectional transformer, as well as distance correlation and decision threshold adjustments, for data classification problems on highly imbalanced molecular data sets, where the sizes of the classes vary greatly. The proposed method not only integrates adjustments to the classification threshold of the MBO algorithm in order to help deal with the class imbalance, but also uses a bidirectional transformer model based on an attention mechanism for self-supervised learning. Additionally, the method implements distance correlation as a weight function for the similarity graph-based framework on which the adjusted MBO algorithm operates. The proposed model is validated using six molecular data sets, and we also provide a thorough comparison to other competing algorithms. The computational experiments show that the proposed method performs better than competing techniques even when the class imbalance ratio is very high.
Updated: 2024-06-19 21:34:07
Subjects: cs.LG,q-bio.QM
Advancing Blockchain Scalability: An Introduction to Layer 1 and Layer 2 Solutions
Bitcoin's rise has pushed blockchain technology into the mainstream, amplifying its potential and broad utility. While Bitcoin has become incredibly famous, its transaction rate has not seen a corresponding increase: it still takes approximately 10 minutes to mine a block and add it to the chain. This limitation highlights the importance of seeking scale-up solutions that address the low transaction throughput. Blockchain's consensus mechanisms make peer-to-peer transactions feasible and effectively eliminate the need for centralized control. However, decentralized systems also deliver lower speed and throughput than centralized networks, as Bitcoin's block creation rate illustrates. Two mainstream scale-up solutions, Layer 1 scaling and Layer 2 scaling, have been implemented to address these issues. Layer 1 scalability enhancements operate at the level where the traditional blockchain itself runs. This paper provides a deep examination of the components of the Layer 1 protocol and the scale-up methods that directly improve the lower-level blockchain. We also show that Layer 1 solutions encounter inherent limitations even with these improvements, because Layer 1 storage costs and latency remain high. In addition, we discuss Layer 2 protocols, advanced scalability techniques that elevate blockchain performance by handling transactions off the mainnet. Our findings indicate that Layer 2 protocols, with their various implementations such as rollups and channels, significantly outperform Layer 1 solutions in terms of transaction throughput and efficiency. This paper discusses these Layer 2 scaling methods in detail, aiming to provide readers with a comprehensive understanding of these protocols and the underlying logic that drives their effectiveness.
Updated: 2024-06-19 21:30:16
Subjects: cs.CR
Optimizing Quantile-based Trading Strategies in Electricity Arbitrage
Efficiently integrating renewable resources into electricity markets is vital for addressing the challenges of matching real-time supply and demand while reducing the significant energy wastage resulting from curtailments. To address this challenge effectively, the incorporation of storage devices can enhance the reliability and efficiency of the grid, improving market liquidity and reducing price volatility. In short-term electricity markets, participants navigate numerous options, each presenting unique challenges and opportunities, underscoring the critical role of the trading strategy in maximizing profits. This study delves into the optimization of day-ahead and balancing market trading, leveraging quantile-based forecasts. Using three trading approaches with practical constraints, our research enhances forecast assessment, increases trading frequency, and employs flexible timestamp orders. Our findings underscore the profit potential of simultaneous participation in both day-ahead and balancing markets, especially with larger battery storage systems; despite the increased costs and narrower profit margins associated with higher-volume trading, the implementation of high-frequency strategies plays a significant role in maximizing profits and addressing market challenges. Finally, we modelled four commercial battery storage systems and evaluated their economic viability through a scenario analysis, with larger batteries showing a faster return on investment.
Updated: 2024-06-19 21:27:12
Subjects: cs.LG,cs.AI
Text Serialization and Their Relationship with the Conventional Paradigms of Tabular Machine Learning
Recent research has explored how Language Models (LMs) can be used for feature representation and prediction in tabular machine learning tasks. This involves employing text serialization and supervised fine-tuning (SFT) techniques. Despite the simplicity of these techniques, significant gaps remain in our understanding of the applicability and reliability of LMs in this context. Our study assesses how emerging LM technologies compare with traditional paradigms in tabular machine learning and evaluates the feasibility of adopting similar approaches with these advanced technologies. At the data level, we investigate various methods of data representation and curation of serialized tabular data, exploring their impact on prediction performance. At the classification level, we examine whether text serialization combined with LMs enhances performance on challenging tabular datasets (e.g., those with class imbalance, distribution shift, biases, or high dimensionality), and assess whether this method represents a state-of-the-art (SOTA) approach for addressing tabular machine learning challenges. Our findings reveal that current pre-trained models should not replace conventional approaches.
Updated: 2024-06-19 21:19:37
Subjects: cs.CL,cs.LG
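A minimal example of the text serialization step itself; the sentence template and the sample row are illustrative assumptions (work in this line uses several template variants), but they show how a tabular record becomes an LM-ready string.

```python
def serialize_row(row: dict, target_column: str) -> str:
    """Turn a tabular record into a natural-language prompt for an LM."""
    parts = [f"The {k} is {v}." for k, v in row.items() if k != target_column]
    return " ".join(parts) + f" What is the {target_column}?"

row = {"age": 39, "education": "Bachelors", "hours_per_week": 40, "income": ">50K"}
print(serialize_row(row, target_column="income"))
# The age is 39. The education is Bachelors. The hours_per_week is 40. What is the income?
```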
MAMA-MIA: A Large-Scale Multi-Center Breast Cancer DCE-MRI Benchmark Dataset with Expert Segmentations
Current research in breast cancer Magnetic Resonance Imaging (MRI), especially with Artificial Intelligence (AI), faces challenges due to the lack of expert segmentations. To address this, we introduce the MAMA-MIA dataset, comprising 1506 multi-center dynamic contrast-enhanced MRI cases with expert segmentations of primary tumors and non-mass enhancement areas. These cases were sourced from four publicly available collections in The Cancer Imaging Archive (TCIA). Initially, we trained a deep learning model to automatically segment the cases, generating preliminary segmentations that significantly reduced expert segmentation time. Sixteen experts, averaging 9 years of experience in breast cancer, then corrected these segmentations, resulting in the final expert segmentations. Additionally, two radiologists conducted a visual inspection of the automatic segmentations to support future quality control studies. Alongside the expert segmentations, we provide 49 harmonized demographic and clinical variables and the pretrained weights of the well-known nnUNet architecture trained using the DCE-MRI full-images and expert segmentations. This dataset aims to accelerate the development and benchmarking of deep learning models and foster innovation in breast cancer diagnostics and treatment planning.
Updated: 2024-06-19 21:11:46
Subjects: cs.CV,cs.AI,cs.DB
Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes
Autonomous systems often have logical constraints arising, for example, from safety, operational, or regulatory requirements. Such constraints can be expressed using temporal logic specifications. The system state is often partially observable. Moreover, it could encompass a team of multiple agents with a common objective but disparate information structures and constraints. In this paper, we first introduce an optimal control theory for partially observable Markov decision processes (POMDPs) with finite linear temporal logic constraints. We provide a structured methodology for synthesizing policies that maximize a cumulative reward while ensuring that the probability of satisfying a temporal logic constraint is sufficiently high. Our approach comes with guarantees on approximate reward optimality and constraint satisfaction. We then build on this approach to design an optimal control framework for logically constrained multi-agent settings with information asymmetry. We illustrate the effectiveness of our approach by implementing it on several case studies.
Updated: 2024-06-19 21:11:31
标题: 逻辑约束下部分可观察和多智能体马尔可夫决策过程的最优控制
摘要: 自主系统通常会出现逻辑约束,例如来自安全、运营或监管要求。这些约束可以用时间逻辑规范来表达。系统状态通常是部分可观测的。此外,它可能包括一个由多个代理人组成的团队,这些代理人具有共同的目标,但信息结构和约束不同。在本文中,我们首先介绍了一种用于部分可观测马尔可夫决策过程(POMDPs)的有限线性时间逻辑约束的最优控制理论。我们提供了一种结构化方法来合成最大化累积奖励的策略,同时确保满足时间逻辑约束的概率足够高。我们的方法具有关于近似奖励最优性和约束满足的保证。然后,我们在此基础上构建了一个用于逻辑受限多代理设置的最优控制框架,其信息不对称。我们通过在几个案例研究中实施来展示我们方法的有效性。
更新时间: 2024-06-19 21:11:31
领域: cs.AI,cs.FL,cs.SY,eess.SY
Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data
Generative, multimodal artificial intelligence (GenAI) offers transformative potential across industries, but its misuse poses significant risks. Prior research has shed light on the potential of advanced AI systems to be exploited for malicious purposes. However, we still lack a concrete understanding of how GenAI models are specifically exploited or abused in practice, including the tactics employed to inflict harm. In this paper, we present a taxonomy of GenAI misuse tactics, informed by existing academic literature and a qualitative analysis of approximately 200 observed incidents of misuse reported between January 2023 and March 2024. Through this analysis, we illuminate key and novel patterns in misuse during this time period, including potential motivations, strategies, and how attackers leverage and abuse system capabilities across modalities (e.g. image, text, audio, video) in the wild.
Updated: 2024-06-19 21:11:17
标题: 生成式人工智能滥用:战术分类及来自真实数据的见解
摘要: 生成式、多模态人工智能(GenAI)在各行业具有转型潜力,但其滥用也带来了重大风险。先前的研究已经揭示了高级人工智能系统被利用进行恶意目的的潜力。然而,我们仍然缺乏对GenAI模型在实践中被具体利用或滥用的确切理解,包括用于造成伤害的策略。在本文中,我们提出了一种GenAI滥用策略的分类法,该分类法根据现有学术文献和对2023年1月至2024年3月之间报道的大约200起滥用事件的定性分析。通过这一分析,我们阐明了在这一时期滥用中的关键和新颖模式,包括潜在的动机、策略,以及攻击者如何在实际情况中利用和滥用跨模态(例如图像、文本、音频、视频)系统能力。
更新时间: 2024-06-19 21:11:17
领域: cs.AI
Anatomically-Controllable Medical Image Generation with Segmentation-Guided Diffusion Models
Diffusion models have enabled remarkably high-quality medical image generation, yet it is challenging to enforce anatomical constraints in generated images. To this end, we propose a diffusion model-based method that supports anatomically-controllable medical image generation, by following a multi-class anatomical segmentation mask at each sampling step. We additionally introduce a random mask ablation training algorithm to enable conditioning on a selected combination of anatomical constraints while allowing flexibility in other anatomical areas. We compare our method ("SegGuidedDiff") to existing methods on breast MRI and abdominal/neck-to-pelvis CT datasets with a wide range of anatomical objects. Results show that our method reaches a new state-of-the-art in the faithfulness of generated images to input anatomical masks on both datasets, and is on par for general anatomical realism. Finally, our model also enjoys the extra benefit of being able to adjust the anatomical similarity of generated images to real images of choice through interpolation in its latent space. SegGuidedDiff has many applications, including cross-modality translation, and the generation of paired or counterfactual data. Our code is available at https://github.com/mazurowski-lab/segmentation-guided-diffusion.
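A minimal sketch of what random mask ablation could look like (our own illustration; the paper's exact algorithm and tensor conventions may differ):

    import torch

    def ablate_mask(mask: torch.Tensor, keep_prob: float = 0.5) -> torch.Tensor:
        # mask: (batch, num_classes, H, W) one-hot segmentation mask.
        # Each class channel is kept independently with probability keep_prob,
        # so the model learns to condition on arbitrary subsets of anatomical
        # constraints while remaining flexible elsewhere.
        b, c = mask.shape[:2]
        keep = (torch.rand(b, c, 1, 1, device=mask.device) < keep_prob).to(mask.dtype)
        return mask * keep

At each training step, the diffusion model would then be conditioned on ablate_mask(mask) rather than on the full mask.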
Updated: 2024-06-19 21:10:31
标题: 具有解剖学可控性的医学图像生成:受分割引导扩散模型的启发
摘要: 扩散模型已经实现了高质量的医学图像生成,但在生成图像中强制施加解剖约束是具有挑战性的。为此,我们提出了一种基于扩散模型的方法,通过在每个采样步骤中遵循多类解剖分割掩模,支持解剖可控的医学图像生成。我们还引入了一种随机掩模消蚀训练算法,以实现在选定的解剖约束组合上进行条件控制,同时允许其他解剖区域的灵活性。我们将我们的方法("SegGuidedDiff")与现有方法在乳腺MRI和腹部/颈部至盆腔CT数据集上进行了比较,这些数据集涵盖了广泛的解剖对象。结果显示,我们的方法在两个数据集上生成的图像与输入解剖掩模的忠实度达到了新的技术水平,并在一般解剖真实性方面处于同等水平。最后,我们的模型还可以通过在潜在空间中进行插值来调整生成图像与选择的真实图像之间的解剖相似性。SegGuidedDiff有许多应用,包括跨模态转换和生成配对或对照数据。我们的代码可在https://github.com/mazurowski-lab/segmentation-guided-diffusion上找到。
更新时间: 2024-06-19 21:10:31
领域: eess.IV,cs.CV,cs.LG,stat.ML
StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation
Developers spend much time finding information that is relevant to their questions. Stack Overflow has been the leading resource, and with the advent of Large Language Models (LLMs), generative models such as ChatGPT are used frequently. However, there is a catch in using each one separately. Searching for answers is time-consuming and tedious, as shown by the many tools developed by researchers to address this issue. On the other hand, using LLMs is not reliable, as they might produce irrelevant or unreliable answers (i.e., hallucinations). In this work, we present StackRAG, a retrieval-augmented, multi-agent generation tool based on LLMs that combines the two worlds: it aggregates knowledge from Stack Overflow to enhance the reliability of the generated answers. Initial evaluations show that the generated answers are correct, accurate, relevant, and useful.
Updated: 2024-06-19 21:07:35
标题: StackRAG代理:通过检索增强生成改进开发者答案
摘要: 开发人员花费大量时间查找与其问题相关的信息。Stack Overflow一直是主要资源,在大型语言模型(LLM)的出现之后,生成模型如ChatGPT被频繁使用。然而,单独使用每个模型都存在问题。寻找答案是耗时且乏味的,正如研究人员开发的许多工具所显示的那样。另一方面,使用LLM并不可靠,因为它们可能产生无关或不可靠的答案(即幻觉)。在这项工作中,我们提出了StackRAG,一种基于LLM的检索增强型多智能体生成工具,结合了这两个世界:聚合来自Stack Overflow的知识,以增强生成答案的可靠性。初步评估表明,生成的答案是正确的、准确的、相关的和有用的。
更新时间: 2024-06-19 21:07:35
领域: cs.AI,cs.CL
RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design
We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design
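For intuition, a stripped-down flow matching objective on plain Euclidean vectors looks as follows (our simplification; the actual model operates on SE(3) rigid-body frames, whose rotational components this sketch omits):

    import torch

    def flow_matching_loss(model, x1):
        # x1: (batch, dim) target structures, flattened here for simplicity.
        # The network regresses the velocity field v(x_t, t) = x1 - x0 along
        # the linear path x_t = (1 - t) * x0 + t * x1, with x0 ~ N(0, I).
        x0 = torch.randn_like(x1)
        t = torch.rand(x1.shape[0], 1)
        xt = (1 - t) * x0 + t * x1
        target_velocity = x1 - x0
        return ((model(xt, t) - target_velocity) ** 2).mean()

Sampling then integrates the learned velocity field from noise toward a structure.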
Updated: 2024-06-19 21:06:44
标题: RNA-FrameFlow:用于全新3D RNA主干设计的流匹配
摘要: 我们介绍了RNA-FrameFlow,这是第一个用于3D RNA骨架设计的生成模型。我们基于SE(3)流匹配进行蛋白质骨架生成,并建立了数据准备和评估协议,以解决RNA建模带来的独特挑战。我们将RNA结构形式化为一组刚体框架和相关损失函数,这些函数考虑了更大、更构象柔韧的RNA骨架(每个核苷酸13个原子)与蛋白质(每个残基4个原子)的区别。为了解决3D RNA数据集缺乏多样性的问题,我们探索了结构聚类和裁剪增强的训练方法。此外,我们定义了一套评估指标,用于衡量生成的RNA结构是否在全局上自洽(通过逆向折叠后的正向折叠)并且在局部上恢复RNA特定的结构描述符。RNA-FrameFlow的最佳版本可以生成40-150个核苷酸的局部逼真的RNA骨架,其中超过40%的骨架通过我们的有效性标准,即自洽TM分数>=0.45,这意味着两个RNA具有相同的全局折叠形式。开源代码:https://github.com/rish-16/rna-backbone-design
更新时间: 2024-06-19 21:06:44
领域: q-bio.BM,cs.LG,q-bio.GN
Graph Kernel Neural Networks
The convolution operator at the core of many modern neural architectures can effectively be seen as performing a dot product between an input matrix and a filter. While this is readily applicable to data such as images, which can be represented as regular grids in the Euclidean space, extending the convolution operator to work on graphs proves more challenging, due to their irregular structure. In this paper, we propose to use graph kernels, i.e., kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain. This allows us to define an entirely structural model that does not require computing the embedding of the input graph. Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability in terms of the structural masks learned during training, similarly to the convolutional masks in traditional convolutional neural networks. We perform an extensive ablation study to investigate the impact of the model's hyper-parameters and show that our model achieves competitive performance on standard graph classification and regression datasets.
Updated: 2024-06-19 21:03:53
标题: 图核神经网络
摘要: 许多现代神经网络架构的核心是卷积运算符,可以有效地看作是在输入矩阵和滤波器之间执行点积。虽然这在诸如图像这样可以在欧几里得空间中表示为规则网格的数据上很容易应用,但将卷积运算符扩展到图形上的工作则更具挑战性,因为它们具有不规则的结构。在本文中,我们提出使用图内核,即在图上计算内积的核函数,将标准卷积运算符扩展到图领域。这使我们能够定义一个完全结构化的模型,不需要计算输入图的嵌入。我们的架构允许插入任何类型的图内核,并在训练过程中学习到的结构掩模方面提供了一定的可解释性,类似于传统卷积神经网络中卷积掩模的情况。我们进行了大量的消融研究,以研究模型超参数的影响,并展示我们的模型在标准图分类和回归数据集上取得了竞争性能。
更新时间: 2024-06-19 21:03:53
领域: cs.LG
Learning to Maximize Gains From Trade in Small Markets
We study the problem of designing a two-sided market (double auction) to maximize the gains from trade (social welfare) under the constraints of (dominant-strategy) incentive compatibility and budget-balance. Our goal is to do so for an unknown distribution from which we are given a polynomial number of samples. Our first result is a general impossibility result for correlated value distributions, even with just one seller and two buyers, in contrast to the one-seller, one-buyer case (bilateral trade), where this is possible. Our second result is an efficient learning algorithm for one seller and two buyers with independent distributions, based on a novel algorithm for computing optimal mechanisms for finitely supported and explicitly given independent distributions. Both results rely heavily on characterizations of (dominant-strategy) incentive compatible mechanisms that are strongly budget-balanced.
Updated: 2024-06-19 21:02:22
标题: 学会在小市场中最大化贸易收益
摘要: 我们研究设计一个双边市场(双向拍卖)的问题,以在主导策略激励兼容和预算平衡的约束下最大化贸易收益(社会福利)。我们的目标是在一个未知分布中实现这一目标,我们已经获得了多项式数量的样本。我们的第一个结果是,即使只有一个卖家和两个买家之间的值相关分布的情况下,也是不可能的,与一个卖家和一个买家的情况相反(双边交易)在那种情况下是可能的。我们的第二个结果是一个高效的学习算法,用于一个卖家和两个买家的独立分布的情况,该算法基于一种用于计算有限支持和明确给定独立分布的最优机制的新颖算法。这两个结果都严重依赖于对(主导策略)激励兼容机制的特征描述,这些机制是强烈的预算平衡。
更新时间: 2024-06-19 21:02:22
领域: cs.GT,cs.AI,cs.LG,F.0; I.2; I.2.6; J.4
DKEC: Domain Knowledge Enhanced Multi-Label Classification for Diagnosis Prediction
Multi-label text classification (MLTC) tasks in the medical domain often face the long-tail label distribution problem. Prior works have explored hierarchical label structures to find relevant information for few-shot classes, but mostly neglected to incorporate external knowledge from medical guidelines. This paper presents DKEC, Domain Knowledge Enhanced Classification for diagnosis prediction with two innovations: (1) automated construction of heterogeneous knowledge graphs from external sources to capture semantic relations among diverse medical entities, (2) incorporating the heterogeneous knowledge graphs in few-shot classification using a label-wise attention mechanism. We construct DKEC using three online medical knowledge sources and evaluate it on a real-world Emergency Medical Services (EMS) dataset and a public electronic health record (EHR) dataset. Results show that DKEC outperforms the state-of-the-art label-wise attention networks and transformer models of different sizes, particularly for the few-shot classes. More importantly, it helps the smaller language models achieve comparable performance to large language models.
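A minimal sketch of the label-wise attention mechanism (our own illustration; DKEC additionally injects heterogeneous knowledge-graph embeddings, which are omitted here):

    import torch
    import torch.nn as nn

    class LabelWiseAttention(nn.Module):
        # Each label owns a query vector that attends over token states,
        # yielding a label-specific document representation.
        def __init__(self, hidden_dim: int, num_labels: int):
            super().__init__()
            self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_dim))
            self.classifier = nn.Linear(hidden_dim, 1)

        def forward(self, token_states):                      # (batch, seq, hidden)
            scores = token_states @ self.label_queries.T      # (batch, seq, labels)
            attn = scores.softmax(dim=1)                      # normalize over tokens
            label_repr = attn.transpose(1, 2) @ token_states  # (batch, labels, hidden)
            return self.classifier(label_repr).squeeze(-1)    # (batch, labels) logits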
Updated: 2024-06-19 20:58:52
标题: DKEC:领域知识增强的多标签分类用于诊断预测
摘要: 在医学领域的多标签文本分类(MLTC)任务经常面临长尾标签分布问题。之前的研究已经探索了层次化标签结构,以找到少样本类别的相关信息,但大多数忽略了将医学指南中的外部知识纳入其中。本文提出了DKEC,一种用于诊断预测的领域知识增强分类方法,具有两个创新点:(1)从外部来源自动构建异构知识图,以捕捉不同医学实体之间的语义关系,(2)利用标签注意机制在少样本分类中整合异构知识图。我们使用三个在线医学知识来源构建了DKEC,并在实际应用中的急救医疗服务(EMS)数据集和公共电子健康记录(EHR)数据集上进行评估。结果显示,DKEC在少样本类别上表现出色,胜过了最先进的标签注意网络和不同规模的变压器模型,尤其是对于少样本类别。更重要的是,它帮助较小的语言模型达到了与大型语言模型相当的性能水平。
更新时间: 2024-06-19 20:58:52
领域: cs.CL,cs.AI,cs.LG
Feature learning as alignment: a structural property of gradient descent in non-linear neural networks
Understanding the mechanisms through which neural networks extract statistics from input-label pairs through feature learning is one of the most important unsolved problems in supervised learning. Prior works demonstrated that the Gram matrices of the weights (the neural feature matrices, NFM) and the average gradient outer products (AGOP) become correlated during training, in a statement known as the neural feature ansatz (NFA). Building on the NFA, these works propose mapping inputs through the AGOP as a general mechanism for neural feature learning. However, they do not provide a theoretical explanation for this correlation or its origins. In this work, we further clarify the nature of this correlation and explain its emergence. We show that this correlation is equivalent to alignment between the left singular structure of the weight matrices and the newly defined pre-activation tangent features at each layer. We further establish that the alignment is driven by the interaction of weight changes induced by SGD with the pre-activation features, and analyze the resulting dynamics analytically at early training times in terms of simple statistics of the inputs and labels. Finally, motivated by the observation that the NFA is driven by this centered correlation, we introduce a simple optimization rule that dramatically increases the NFA correlations at any given layer and improves the quality of the learned features.
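Concretely, for a layer with weight matrix W and a scalar-output network, the two quantities and their correlation can be computed as in this sketch (our own illustration of the ansatz; some formulations compare the NFM with a matrix power of the AGOP instead):

    import torch

    def nfa_correlation(model_fn, W, X):
        # NFM: Gram matrix of the layer weights.
        nfm = W.T @ W
        # AGOP: average outer product of input gradients over the data.
        grads = []
        for x in X:
            x = x.clone().requires_grad_(True)
            (g,) = torch.autograd.grad(model_fn(x), x)
            grads.append(torch.outer(g, g))
        agop = torch.stack(grads).mean(dim=0)
        # Cosine similarity between the flattened matrices.
        return (nfm * agop).sum() / (nfm.norm() * agop.norm())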
Updated: 2024-06-19 20:56:30
标题: 特征学习作为对齐:非线性神经网络中梯度下降的结构特性
摘要: 理解神经网络通过特征学习从输入标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。先前的研究表明,在训练过程中,权重的格拉姆矩阵(神经特征矩阵,NFM)和平均梯度外积(AGOP)之间存在相关性,这一说法被称为神经特征假设(NFA)。通过NFA,作者引入了将AGOP作为神经特征学习的一般机制。然而,这些研究并未为这种相关性或其起源提供理论解释。在这项工作中,我们进一步澄清了这种相关性的性质,并解释了它的出现。我们表明,这种相关性等同于权重矩阵的左奇异结构与每一层新定义的预激活切线特征之间的对齐。我们进一步确定,这种对齐是由SGD引起的权重变化与预激活特征的相互作用驱动的,并在早期时期以输入和标签的简单统计量来分析所产生的动态。最后,受到NFA受到这种中心化相关性驱动的观察的启发,我们引入了一个简单的优化规则,大幅增加了任何给定层的NFA相关性,并提高了学习到的特征的质量。
更新时间: 2024-06-19 20:56:30
领域: stat.ML,cs.AI,cs.LG
Optimizing Wireless Discontinuous Reception via MAC Signaling Learning
We present a Reinforcement Learning (RL) approach to the problem of controlling the Discontinuous Reception (DRX) policy from a Base Transceiver Station (BTS) in a cellular network. We do so by optimally timing the transmission of fast Layer-2 signaling messages (a.k.a. Medium Access Control (MAC) Control Elements (CEs), as specified in 5G New Radio). Unlike more conventional approaches to DRX optimization, which rely on fine-tuning the values of DRX timers, we assess the gains that can be obtained solely by means of this MAC CE signaling. For the simulation part, we concentrate on traffic types typically encountered in Extended Reality (XR) applications, where the need for battery drain minimization and overheating mitigation is particularly pressing. Both 3GPP 5G New Radio (5G NR) compliant and non-compliant ("beyond 5G") MAC CEs are considered. Our simulation results show that our proposed technique strikes an improved trade-off between latency and energy savings as compared to the conventional timer-based approaches characteristic of most current implementations. Specifically, our RL-based policy can nearly halve the active time for a single User Equipment (UE) with respect to a naïve MAC CE transmission policy, and still achieve a near 20% active time reduction for 9 simultaneously served UEs.
Updated: 2024-06-19 20:55:12
标题: 通过MAC信令学习优化无线不连续接收
摘要: 我们提出了一种强化学习(RL)方法来解决在蜂窝网络中从基站(BTS)控制Discontinuous Reception(DRX)策略的问题。我们通过优化地时机传输快速的第二层信令消息(即5G新无线电规定的介质访问层(MAC)控制元素(CE))来实现这一目标。与更传统的DRX优化方法不同,这些方法依赖于微调DRX定时器的值,我们评估了仅通过这种MAC CE信令可以获得的收益。在模拟部分,我们集中研究了通常在扩展现实(XR)应用程序中遇到的流量类型,其中最需要最小化电池耗尽和降低过热的需求。我们考虑了符合3GPP 5G新无线电(5G NR)标准和不符合标准(“超越5G”)的MAC CE。我们的模拟结果表明,我们提出的技术相比于大多数当前实现中特有的基于定时器的传统方法,可以在延迟和节能之间获得改进的权衡。具体而言,我们基于RL的策略可以将单个用户设备(UE)的活动时间减少近一半,相对于一个天真的MAC CE传输策略,同时为9个同时服务的UE实现近20%的活动时间减少。
更新时间: 2024-06-19 20:55:12
领域: cs.IT,cs.LG,math.IT
LLM-Enhanced Bayesian Optimization for Efficient Analog Layout Constraint Generation
Analog layout synthesis faces significant challenges due to its dependence on manual processes, considerable time requirements, and performance instability. Current Bayesian Optimization (BO)-based techniques for analog layout synthesis, despite their potential for automation, suffer from slow convergence and extensive data needs, limiting their practical application. This paper presents the LLANA framework, a novel approach that leverages Large Language Models (LLMs) to enhance BO by exploiting the few-shot learning abilities of LLMs for more efficient generation of analog design-dependent parameter constraints. Experimental results demonstrate that LLANA not only achieves performance comparable to state-of-the-art (SOTA) BO methods but also enables a more effective exploration of the analog circuit design space, thanks to LLM's superior contextual understanding and learning efficiency. The code is available at https://github.com/dekura/LLANA.
Updated: 2024-06-19 20:49:26
标题: LLM增强贝叶斯优化用于高效模拟布局约束生成
摘要: 模拟布局综合面临着重大挑战,因为它依赖于手动过程、相当长的时间需求和性能不稳定性。当前基于贝叶斯优化(BO)的模拟布局综合技术,尽管具有自动化的潜力,但收敛速度慢且需要大量数据,限制了它们的实际应用。本文介绍了LLANA框架,这是一种利用大型语言模型(LLM)增强BO的新方法,通过利用LLM的少样本学习能力更有效地生成模拟设计相关参数约束。实验结果显示,LLANA不仅实现了与最先进的BO方法相媲美的性能,而且还能更有效地探索模拟电路设计空间,这得益于LLM在上下文理解和学习效率方面的优越性。代码可在https://github.com/dekura/LLANA找到。
更新时间: 2024-06-19 20:49:26
领域: cs.AI,cs.AR,cs.LG
The Challenges of Machine Learning for Trust and Safety: A Case Study on Misinformation Detection
We examine the disconnect between scholarship and practice in applying machine learning to trust and safety problems, using misinformation detection as a case study. We survey literature on automated detection of misinformation across a corpus of 248 well-cited papers in the field. We then examine subsets of papers for data and code availability, design missteps, reproducibility, and generalizability. Our paper corpus includes published work in security, natural language processing, and computational social science. Across these disparate disciplines, we identify common errors in dataset and method design. In general, detection tasks are often meaningfully distinct from the challenges that online services actually face. Datasets and model evaluation are often non-representative of real-world contexts, and evaluation frequently is not independent of model training. We demonstrate the limitations of current detection methods in a series of three representative replication studies. Based on the results of these analyses and our literature survey, we conclude that the current state-of-the-art in fully-automated misinformation detection has limited efficacy in detecting human-generated misinformation. We offer recommendations for evaluating applications of machine learning to trust and safety problems and recommend future directions for research.
Updated: 2024-06-19 20:33:53
标题: 机器学习在信任和安全方面的挑战:关于虚假信息检测的案例研究
摘要: 我们研究了学术研究和实践在将机器学习应用于信任和安全问题中的脱节,以误信息检测为案例研究。我们调查了自动检测误信息的文献,涵盖了248篇领域内被广泛引用的论文。然后,我们检查了数据和代码的可用性、设计错误、可重复性和泛化性等方面的子集论文。我们的论文库包括发表在安全、自然语言处理和计算社会科学领域的研究成果。在这些不同的学科领域中,我们发现了数据集和方法设计方面的共同错误。总的来说,检测任务通常与在线服务实际面临的挑战有着明显的区别。数据集和模型评估往往不代表真实世界的情境,评估经常不独立于模型训练。我们通过一系列三个代表性的复制研究展示了当前检测方法的局限性。基于这些分析结果和我们的文献调查,我们得出结论,目前的全自动误信息检测技术在检测人为生成的误信息方面效果有限。我们提出了评估将机器学习应用于信任和安全问题的建议,并推荐未来研究的方向。
更新时间: 2024-06-19 20:33:53
领域: cs.LG,cs.CL,cs.CY
Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making
Fair decision making has largely been studied with respect to a single decision. Here we investigate the notion of fairness in the context of sequential decision making where multiple stakeholders can be affected by the outcomes of decisions. We observe that fairness often depends on the history of the sequential decision-making process, and in this sense that it is inherently non-Markovian. We further observe that fairness often needs to be assessed at time points within the process, not just at the end of the process. To advance our understanding of this class of fairness problems, we explore the notion of non-Markovian fairness in the context of sequential decision making. We identify properties of non-Markovian fairness, including notions of long-term, anytime, periodic, and bounded fairness. We explore the interplay between non-Markovian fairness and memory and how memory can support construction of fair policies. Finally, we introduce the FairQCM algorithm, which can automatically augment its training data to improve sample efficiency in the synthesis of fair policies via reinforcement learning.
Updated: 2024-06-19 20:29:19
标题: 记得要公平:序贯决策中的非马尔可夫公平
摘要: 公平决策在单一决策方面已被广泛研究。在这里,我们研究了在多个利益相关者可能受到决策结果影响的情况下公平的概念。我们观察到公平通常取决于顺序决策过程的历史,在这个意义上,它是固有的非马尔可夫性。我们进一步观察到公平通常需要在过程中的时间点进行评估,而不仅仅是在过程结束时。为了推进我们对这类公平问题的理解,我们在顺序决策中探讨了非马尔可夫公平的概念。我们确定了非马尔可夫公平的属性,包括长期、随时、周期性和有界公平的概念。我们探讨了非马尔可夫公平与记忆之间的相互作用,以及记忆如何支持构建公平政策。最后,我们介绍了FairQCM算法,该算法可以通过强化学习自动增加其训练数据,以提高在合成公平政策方面的样本效率。
更新时间: 2024-06-19 20:29:19
领域: cs.AI,cs.CY,cs.LG
Can Low-Rank Knowledge Distillation in LLMs be Useful for Microelectronic Reasoning?
In this work, we present empirical results regarding the feasibility of using offline large language models (LLMs) in the context of electronic design automation (EDA). The goal is to investigate and evaluate a contemporary language model's (Llama-2-7B) ability to function as a microelectronics Q&A expert, as well as its reasoning and generation capabilities in solving microelectronics-related problems. Llama-2-7B was tested across a variety of adaptation methods, including a novel low-rank knowledge distillation (LoRA-KD) scheme. Our experiments produce both qualitative and quantitative results.
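While LoRA-KD itself is the paper's contribution, the underlying low-rank adaptation idea it builds on can be sketched generically (not the paper's exact distillation setup):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Frozen base weight plus a trainable low-rank update: W + (alpha/r) B A.
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False          # keep pre-trained weights fixed
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

In a distillation setting, only A and B would be trained against the teacher's outputs, keeping the adapted model small.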
Updated: 2024-06-19 20:14:39
标题: 低秩知识蒸馏在LLMs中对微电子推理是否有用?
摘要: 在这项工作中,我们提出了关于在电子设计自动化(EDA)领域使用离线大型语言模型(LLMs)的可行性的实证结果。研究的目标是调查和评估当代语言模型(Llama-2-7B)在解决微电子相关问题时作为微电子问答专家的能力,以及其推理和生成能力。Llama-2-7B在各种适应方法中进行了测试,包括引入一种新颖的低秩知识蒸馏(LoRA-KD)方案。我们的实验产生了定性和定量结果。
更新时间: 2024-06-19 20:14:39
领域: cs.LG
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding
AI personal assistants deployed via robots or wearables require embodied understanding to collaborate with humans effectively. However, current Vision-Language Models (VLMs) primarily focus on third-person view videos, neglecting the richness of egocentric perceptual experience. To address this gap, we propose three key contributions. First, we introduce the Egocentric Video Understanding Dataset (EVUD) for training VLMs on video captioning and question answering tasks specific to egocentric videos. Second, we present AlanaVLM, a 7B parameter VLM trained using parameter-efficient methods on EVUD. Finally, we evaluate AlanaVLM's capabilities on OpenEQA, a challenging benchmark for embodied video question answering. Our model achieves state-of-the-art performance, outperforming open-source models, including strong Socratic models that use GPT-4 as a planner, by 3.6%. Additionally, we outperform Claude 3 and Gemini Pro Vision 1.0 and showcase competitive results compared to Gemini Pro 1.5 and GPT-4V, even surpassing the latter in spatial reasoning. This research paves the way for building efficient VLMs that can be deployed in robots or wearables, leveraging embodied video understanding to collaborate seamlessly with humans in everyday tasks, contributing to the next generation of Embodied AI.
Updated: 2024-06-19 20:14:14
标题: AlanaVLM:一种用于主观视角视频理解的多模态具象人工智能基础模型
摘要: 人工智能个人助理通过机器人或可穿戴设备部署,需要具有体验理解能力才能有效与人类合作。然而,当前的视觉-语言模型(VLMs)主要关注第三人称视角的视频,忽略了自我中心感知经验的丰富性。为了弥补这一差距,我们提出了三个关键贡献。首先,我们引入了自我中心视频理解数据集(EVUD),用于训练VLMs进行特定于自我中心视频的视频字幕和问题回答任务。其次,我们提出了AlanaVLM,一个使用参数高效方法在EVUD上训练的7B参数VLM。最后,我们评估了AlanaVLM在OpenEQA上的能力,这是一个具有挑战性的基准测试,用于体验视频问题回答。我们的模型实现了最先进的性能,比使用GPT-4作为规划者的强力苏格拉底模型高出3.6%。此外,我们在空间推理方面超越了Claude 3和Gemini Pro Vision 1.0,并展示了与Gemini Pro 1.5和GPT-4V相比的竞争性结果,甚至在空间推理方面超过了后者。这项研究为构建可以部署在机器人或可穿戴设备中的高效VLMs铺平了道路,利用体验视频理解与人类在日常任务中无缝合作,为下一代具有体验的人工智能做出贡献。
更新时间: 2024-06-19 20:14:14
领域: cs.CV,cs.AI,cs.CL
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
Retrieval-augmented generation (RAG) has emerged as a promising solution to mitigate the limitations of large language models (LLMs), such as hallucinations and outdated information. However, it remains unclear how LLMs handle knowledge conflicts arising from different augmented retrieved passages, especially when these passages originate from the same source and have equal trustworthiness. In this work, we conduct a comprehensive evaluation of LLM-generated answers to questions that have varying answers based on contradictory passages from Wikipedia, a dataset widely regarded as a high-quality pre-training resource for most LLMs. Specifically, we introduce WikiContradict, a benchmark consisting of 253 high-quality, human-annotated instances designed to assess LLM performance when augmented with retrieved passages containing real-world knowledge conflicts. We benchmark a diverse range of both closed and open-source LLMs under different QA scenarios, including RAG with a single passage, and RAG with 2 contradictory passages. Through rigorous human evaluations on a subset of WikiContradict instances involving 5 LLMs and over 3,500 judgements, we shed light on the behaviour and limitations of these models. For instance, when provided with two passages containing contradictory facts, all models struggle to generate answers that accurately reflect the conflicting nature of the context, especially for implicit conflicts requiring reasoning. Since human evaluation is costly, we also introduce an automated model that estimates LLM performance using a strong open-source language model, achieving an F-score of 0.8. Using this automated metric, we evaluate more than 1,500 answers from seven LLMs across all WikiContradict instances. To facilitate future work, we release WikiContradict on: https://ibm.biz/wikicontradict.
Updated: 2024-06-19 20:13:42
标题: WikiContradict:一个用于在维基百科上评估LLM的真实知识冲突的基准。
摘要: 检索增强生成(RAG)已经成为减轻大语言模型(LLMs)的限制,如幻觉和过时信息的一个有希望的解决方案。然而,尚不清楚LLMs如何处理由不同的增强检索段落引起的知识冲突,特别是当这些段落来自同一来源并具有相同的可信度时。在这项工作中,我们对LLM生成的回答进行了全面评估,这些回答基于维基百科中包含相互矛盾段落的问题,维基百科被广泛认为是大多数LLMs高质量的预训练资源。具体来说,我们引入了WikiContradict,这是一个由253个高质量、人工注释实例组成的基准,旨在评估增强检索段落中包含真实世界知识冲突时LLM的性能。我们对各种不同的封闭和开源LLM在不同的问答场景下进行基准测试,包括具有单个段落的RAG,以及具有2个相互矛盾段落的RAG。通过对涉及5个LLM和超过3,500个判断的WikiContradict实例的严格人类评估,我们揭示了这些模型的行为和局限性。例如,当提供两个包含相互矛盾事实的段落时,所有模型都难以生成准确反映上下文冲突性质的答案,特别是对需要推理的隐含冲突。由于人类评估成本高昂,我们还引入了一个自动模型,使用一个强大的开源语言模型来估计LLM的性能,实现了0.8的F分数。使用这个自动度量标准,我们评估了来自七个LLM的超过1,500个答案在所有WikiContradict实例上的表现。为了促进未来工作,我们在以下网址上发布了WikiContradict:https://ibm.biz/wikicontradict。
更新时间: 2024-06-19 20:13:42
领域: cs.CL,cs.AI,cs.LG
Detecting Generative Parroting through Overfitting Masked Autoencoders
The advent of generative AI models has revolutionized digital content creation, yet it introduces challenges in maintaining copyright integrity due to generative parroting, where models mimic their training data too closely. Our research presents a novel approach to tackle this issue by employing an overfitted Masked Autoencoder (MAE) to detect such parroted samples effectively. We establish a detection threshold based on the mean loss across the training dataset, allowing for the precise identification of parroted content in modified datasets. Preliminary evaluations demonstrate promising results, suggesting our method's potential to ensure ethical use and enhance the legal compliance of generative models.
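The detection rule reduces to a simple loss threshold, sketched below (illustrative only; mae stands for the overfitted masked autoencoder and reconstruction_loss is a hypothetical handle, not a named API from the paper):

    import numpy as np

    def fit_threshold(mae, train_samples):
        # Mean reconstruction loss over the training set defines the cutoff.
        losses = np.array([mae.reconstruction_loss(x) for x in train_samples])
        return losses.mean()

    def is_parroted(mae, sample, threshold):
        # An overfitted MAE reconstructs memorized content with unusually low
        # loss, so low loss flags a likely near-copy of the training data.
        return mae.reconstruction_loss(sample) <= threshold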
Updated: 2024-06-19 19:53:26
标题: 通过过拟合掩码自动编码器检测生成性模仿
摘要: 生成式人工智能模型的出现彻底改变了数字内容创作,但由于生成模仿,其中模型过于密切地模仿其训练数据,它引入了在维护版权完整性方面的挑战。我们的研究提出了一种新颖的方法来解决这个问题,即利用一个过度拟合的Masked Autoencoder(MAE)来有效地检测这种模仿样本。我们建立了一个基于训练数据集中的平均损失的检测阈值,从而能够精确地识别修改数据集中的模仿内容。初步评估显示出有希望的结果,表明我们的方法有潜力确保生成模型的合法使用和增强法律合规性。
更新时间: 2024-06-19 19:53:26
领域: cs.LG,cs.AI
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
In recent years, with the rapid advancement of transformer models, transformer-based multimodal architectures have found wide application in various downstream tasks, including but not limited to Image Captioning, Visual Question Answering (VQA), and Image-Text Generation. However, contemporary approaches to Remote Sensing (RS) VQA often involve resource-intensive techniques, such as full fine-tuning of large models or the extraction of image-text features from pre-trained multimodal models, followed by modality fusion using decoders. These approaches demand significant computational resources and time, and a considerable number of trainable parameters are introduced. To address these challenges, we introduce a novel method known as RSAdapter, which prioritizes runtime and parameter efficiency. RSAdapter comprises two key components: the Parallel Adapter and an additional linear transformation layer inserted after each fully connected (FC) layer within the Adapter. This approach not only improves adaptation to pre-trained multimodal models but also allows the parameters of the linear transformation layer to be integrated into the preceding FC layers during inference, reducing inference costs. To demonstrate the effectiveness of RSAdapter, we conduct an extensive series of experiments using three distinct RS-VQA datasets and achieve state-of-the-art results on all three datasets. The code for RSAdapter is available online at https://github.com/Y-D-Wang/RSAdapter.
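The inference-time merging step relies on the fact that two stacked affine maps compose into one; a minimal sketch of the fold:

    import torch

    def fold_linear(W_fc, b_fc, W_lin, b_lin):
        # Fold y = W_lin (W_fc x + b_fc) + b_lin into a single affine layer,
        # so the extra linear transformation adds no cost at inference time.
        W_merged = W_lin @ W_fc
        b_merged = W_lin @ b_fc + b_lin
        return W_merged, b_merged

    # Quick numerical check
    W_fc, b_fc = torch.randn(4, 3), torch.randn(4)
    W_lin, b_lin = torch.randn(4, 4), torch.randn(4)
    x = torch.randn(3)
    W_m, b_m = fold_linear(W_fc, b_fc, W_lin, b_lin)
    assert torch.allclose(W_lin @ (W_fc @ x + b_fc) + b_lin, W_m @ x + b_m, atol=1e-5)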
Updated: 2024-06-19 19:39:49
标题: RSAdapter:为遥感视觉问答调整多模型
摘要: 近年来,随着Transformer模型的快速发展,基于Transformer的多模态架构在各种下游任务中得到了广泛应用,包括但不限于图像字幕生成、视觉问答(VQA)和图像文本生成。然而,当代的遥感(RS)VQA方法通常涉及资源密集型技术,例如对大型模型进行完全微调或从预训练的多模态模型中提取图像文本特征,然后使用解码器进行模态融合。这些方法需要大量的计算资源和时间,并引入了大量可训练参数。为了解决这些挑战,我们引入了一种名为RSAdapter的新方法,该方法优先考虑运行时和参数效率。RSAdapter包括两个关键组件:并行适配器和在适配器内的每个全连接(FC)层之后插入的额外线性变换层。这种方法不仅提高了对预训练多模态模型的适应性,还允许在线推断期间将线性变换层的参数整合到前面的FC层中,从而降低推断成本。为了展示RSAdapter的有效性,我们使用三个不同的RS-VQA数据集进行了一系列广泛的实验,并在所有三个数据集上取得了最先进的结果。RSAdapter的代码可在https://github.com/Y-D-Wang/RSAdapter 上获得。
更新时间: 2024-06-19 19:39:49
领域: cs.CV,cs.LG
IoT-Based Preventive Mental Health Using Knowledge Graphs and Standards for Better Well-Being
Sustainable Development Goals (SDGs) give the UN a road map for development, with Agenda 2030 as a target. SDG3, "Good Health and Well-Being", ensures healthy lives and promotes well-being for all ages. Digital technologies can support SDG3. Burnout and even depression could be reduced by encouraging better preventive health. Because patients often lack the knowledge and focus needed to take care of their health, it is necessary to help them before it is too late. New trends such as positive psychology and mindfulness are highly encouraged in the USA. Digital Twins (DTs) can help with the continuous monitoring of emotion using physiological signals (e.g., collected via wearables). Digital twins facilitate monitoring and provide constant health insight to improve quality of life and well-being with better personalization. Key healthcare DT challenges include standardizing data formats, communication protocols, and data exchange mechanisms. To address these data integration and knowledge challenges, we designed the Mental Health Knowledge Graph (ontology and dataset) to boost mental health. The Knowledge Graph (KG) acquires knowledge from ontology-based mental health projects classified within the LOV4IoT ontology catalog (Emotion, Depression, and Mental Health). Furthermore, the KG is mapped to standards (e.g., ontologies) where possible. Standards from ETSI SmartM2M, ITU/WHO, ISO, W3C, NIST, and IEEE are relevant to mental health.
Updated: 2024-06-19 19:35:14
标题: 基于物联网的预防性心理健康,利用知识图谱和标准来提升幸福感
摘要: 可持续发展目标(SDGs)为联合国提供了一个发展路线图,以2030年议程为目标。SDG3“健康与福祉”确保各个年龄段的健康生活并促进福祉。数字技术可以支持SDG3。通过鼓励更好的预防保健,可以减少倦怠甚至抑郁。由于患者缺乏健康知识和关注自身健康,有必要在为时已晚之前帮助患者。在美国,积极心理学和正念等新趋势受到高度鼓励。数字孪生(DT)可以通过使用生理信号(例如通过可穿戴设备收集)进行情绪持续监测。数字孪生有助于监测并提供持续的健康洞察,以改善生活质量和福祉,并实现更好的个性化。医疗数字孪生的挑战在于标准化数据格式、通信协议和数据交换机制。为了解决这些数据集成和知识挑战,我们设计了心理健康知识图(本体和数据集)以促进心理健康。知识图(KG)从基于本体的心理健康项目中获取知识,并分类到LOV4IoT本体目录(情绪、抑郁和心理健康)。此外,KG在可能的情况下与标准(例如本体)进行映射。ETSI SmartM2M、ITU/WHO、ISO、W3C、NIST和IEEE等标准与心理健康相关。
更新时间: 2024-06-19 19:35:14
领域: cs.AI,cs.CL,cs.CY,cs.LG
Resource-Aware Hierarchical Federated Learning in Wireless Video Caching Networks
Backhaul traffic congestion caused by the video traffic of a few popular files can be alleviated by storing the to-be-requested content at various levels in wireless video caching networks. Typically, content service providers (CSPs) own the content, and users request their preferred content from the CSPs via their (wireless) internet service providers (ISPs). As these parties do not reveal their private information and business secrets, traditional techniques may not be readily used to predict the dynamic changes in users' future demands. Motivated by this, we propose a novel resource-aware hierarchical federated learning (RawHFL) solution for predicting users' future content requests. A practical data acquisition technique is used that allows a user to update its local training dataset based on its requested content. Moreover, since networking and other computational resources are limited, and only a subset of the users participate in model training, we derive the convergence bound of the proposed algorithm. Based on this bound, we minimize a weighted utility function to jointly configure the controllable parameters and train RawHFL energy-efficiently under practical resource constraints. Our extensive simulation results validate the proposed algorithm's superiority, in terms of test accuracy and energy cost, over existing baselines.
Updated: 2024-06-19 19:18:42
标题: 资源感知的分层联邦学习在无线视频缓存网络中的应用
摘要: 通过在无线视频缓存网络的各个层次存储即将请求的内容,可以缓解由一些热门文件的视频流量导致的回程流量拥堵。通常,内容服务提供商(CSPs)拥有内容,用户通过他们的(无线)互联网服务提供商(ISPs)向CSPs请求他们喜欢的内容。由于这些各方不会透露他们的私人信息和商业机密,因此传统技术可能无法轻松地用于预测用户未来需求的动态变化。受此启发,我们提出了一种用于预测用户未来内容请求的新颖资源感知分层联合学习(RawHFL)解决方案。采用了一种实用的数据采集技术,允许用户根据其请求的内容更新其本地训练数据集。此外,由于网络和其他计算资源有限,考虑到只有用户的子集参与模型训练,我们推导了所提算法的收敛界限。基于此界限,我们最小化一个加权效用函数,以联合配置可控参数,以在实际资源约束下高效训练RawHFL能量。我们广泛的模拟结果验证了所提算法在测试准确性和能源成本方面优于现有基准的优越性。
更新时间: 2024-06-19 19:18:42
领域: cs.NI,cs.LG,cs.SY,eess.SY
A Primal-Dual Framework for Transformers and Neural Networks
Self-attention is key to the remarkable success of transformers in sequence modeling tasks, including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed through heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that self-attention corresponds to the support vector expansion derived from a support vector regression (SVR) problem, whose primal formulation has the form of a neural network layer. Using our framework, we derive popular attention layers used in practice and propose two new attentions: 1) the Batch Normalized Attention (Attention-BN), derived from the batch normalization layer, and 2) the Attention with Scaled Head (Attention-SH), derived from using less training data to fit the SVR model. We empirically demonstrate the advantages of Attention-BN and Attention-SH in reducing head redundancy, increasing the model's accuracy, and improving the model's efficiency in a variety of practical applications, including image and time-series classification.
Updated: 2024-06-19 19:11:22
标题: 一个适用于Transformer和神经网络的原始-对偶框架
摘要: 自我关注是transformers在序列建模任务中取得非凡成功的关键,包括在自然语言处理和计算机视觉等许多应用中。就像神经网络层一样,这些注意机制通常是通过试错和经验开发的。为了为transformers构建关注层提供一个原则性框架,我们展示了自我关注对应于从支持向量回归问题中导出的支持向量扩展,其原始公式具有神经网络层的形式。使用我们的框架,我们推导出实践中使用的流行关注层,并提出了两种新的关注方式:1)来自批归一化层的批标准化注意(Attention-BN)和2)来自使用更少训练数据拟合SVR模型的缩放头注意(Attention-SH)。我们在多种实际应用中,包括图像和时间序列分类中,实证地证明了Attention-BN和Attention-SH在降低头部冗余、提高模型准确性和提高模型效率方面的优势。
更新时间: 2024-06-19 19:11:22
领域: cs.LG,cs.AI,cs.CL,cs.CV,stat.ML
Timely Communications for Remote Inference
In this paper, we analyze the impact of data freshness on remote inference systems, where a pre-trained neural network infers a time-varying target (e.g., the locations of vehicles and pedestrians) based on features (e.g., video frames) observed at a sensing node (e.g., a camera). One might expect the performance of a remote inference system to degrade monotonically as the feature becomes stale. Using an information-theoretic analysis, we show that this is true if the feature and target data sequence can be closely approximated as a Markov chain, whereas it is not true if the data sequence is far from Markovian. Hence, the inference error is a function of the Age of Information (AoI), and the function can be non-monotonic. To minimize the inference error in real time, we propose a new "selection-from-buffer" model for sending the features, which is more general than the "generate-at-will" model used in earlier studies. In addition, we design low-complexity scheduling policies to improve inference performance. For single-source, single-channel systems, we provide an optimal scheduling policy. In multi-source, multi-channel systems, the scheduling problem becomes a multi-action restless multi-armed bandit problem. For this setting, we design a new scheduling policy by integrating Whittle index-based source selection with duality-based feature selection-from-buffer algorithms. This new scheduling policy is proven to be asymptotically optimal. These scheduling results hold for minimizing general AoI functions (monotonic or non-monotonic). Data-driven evaluations demonstrate the significant advantages of our proposed scheduling policies.
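The "selection-from-buffer" idea can be made concrete with a small sketch (illustrative; err_fn is a hypothetical, possibly non-monotonic, inference-error function of AoI):

    def select_from_buffer(buffer, now, err_fn):
        # buffer: list of (feature, generation_time) pairs.
        # Under a non-monotonic error-vs-AoI curve, the freshest feature is
        # not necessarily the best one to transmit.
        return min(buffer, key=lambda item: err_fn(now - item[1]))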
Updated: 2024-06-19 19:09:20
标题: 远程推理的及时通讯
摘要: 在这篇论文中,我们分析了数据新鲜度对远程推理系统的影响,在这种系统中,一个经过预训练的神经网络蓝色根据在感知节点(例如摄像头)观察到的特征(例如视频帧)推断出一个时变目标(例如车辆和行人的位置)。人们可能会预期,随着特征变得陈旧,远程推理系统的性能会单调下降。通过信息论分析,我们展示了如果特征和目标数据序列可以被近似地看作是马尔可夫链,那么这一点是正确的,而如果数据序列远非马尔可夫链,则不是如此。因此,推理误差是信息时代(AoI)的一个函数,该函数可能是非单调的。为了最小化实时推理误差,我们提出了一个新的“从缓冲区选择”模型来发送特征,这比先前研究中使用的“随意生成”模型更通用。此外,我们设计了低复杂度的调度策略来提高推理性能。对于单源单通道系统,我们提供了一个最优调度策略。在多源多通道系统中,调度问题变成了一个多动作不安定多臂赌博问题。针对这种情况,我们设计了一个新的调度策略,将Whittle指数为基础的源选择和基于对偶的特征选择从缓冲区算法相结合。这个新的调度策略被证明是渐近最优的。这些调度结果适用于最小化一般AoI函数(单调或非单调)。数据驱动的评估展示了我们提出的调度策略的显著优势。
更新时间: 2024-06-19 19:09:20
领域: cs.NI,cs.IT,cs.LG,math.IT
Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs
Humans often express their communicative intents indirectly or non-literally, which requires their interlocutors -- human or AI -- to understand beyond the literal meaning of words. While most existing work has focused on discriminative evaluations, we present a new approach to generatively evaluate large language models' (LLMs') intention understanding by examining their responses to non-literal utterances. Ideally, an LLM should respond in line with the true intention of a non-literal utterance, not its literal interpretation. Our findings show that LLMs struggle to generate pragmatically relevant responses to non-literal language, achieving only 50-55% accuracy on average. While explicitly providing oracle intentions significantly improves performance (e.g., 75% for Mistral-Instruct), this still indicates challenges in leveraging given intentions to produce appropriate responses. Using chain-of-thought to make models spell out intentions yields much smaller gains (60% for Mistral-Instruct). These findings suggest that LLMs are not yet effective pragmatic interlocutors, highlighting the need for better approaches for modeling intentions and utilizing them for pragmatic generation.
Updated: 2024-06-19 19:07:47
标题: 教皇是天主教徒吗?是的,教皇是天主教徒。LLM中非字面意图解析的生成式评估
摘要: 人类通常间接或非字面表达他们的交际意图,这需要他们的交流对象 -- 无论是人类还是人工智能 -- 超越字面意义理解。虽然大多数现有研究侧重于辨别性评估,但我们提出了一种新方法,通过检验大型语言模型(LLMs)对非字面话语的反应来生成性评估其意图理解能力。理想情况下,一个LLM应该根据非字面话语的真实意图而不是其字面解释做出回应。我们的研究结果显示,LLMs在生成符合语用意义的回应方面存在困难,平均只能达到50-55%的准确率。明确提供正确意图显著提高了性能(例如,Mistral-Instruct达到了75%),但这仍然表明利用给定意图产生适当回应的挑战。使用思维链来使模型明确意图只带来了更小的增益(Mistral-Instruct为60%)。这些研究结果表明,LLMs尚未成为有效的语用交流对象,强调了需要更好的方法来建模意图并利用它们进行语用生成。
更新时间: 2024-06-19 19:07:47
领域: cs.CL,cs.AI
Soil respiration signals in response to sustainable soil management practices enhance soil organic carbon stocks
We develop a spatial-temporal, data-driven model of soil respiration at the global scale based on soil temperature, yearly soil moisture, and soil organic carbon (C) estimates. The model predicts soil respiration on an annual basis (1991-2018) with relatively high accuracy (NSE 0.69, CCC 0.82). We find lower soil respiration trends, higher soil respiration magnitudes, and higher soil organic C stocks across areas where sustainable soil management practices are present.
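Both reported accuracy metrics are standard and easy to state precisely; a minimal sketch of each:

    import numpy as np

    def nse(obs, pred):
        # Nash-Sutcliffe efficiency: 1 is perfect; 0 matches the mean predictor.
        obs, pred = np.asarray(obs), np.asarray(pred)
        return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

    def ccc(obs, pred):
        # Lin's concordance correlation coefficient.
        obs, pred = np.asarray(obs), np.asarray(pred)
        cov = np.mean((obs - obs.mean()) * (pred - pred.mean()))
        return 2 * cov / (obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2)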
Updated: 2024-06-19 19:06:36
标题: 可持续土壤管理实践对土壤呼吸信号的响应增强土壤有机碳储量
摘要: 在全球范围内基于土壤温度、年度土壤湿度和土壤有机碳(C)估计开发了一个时空和数据驱动的土壤呼吸模型。通过对年度基础上的土壤呼吸进行预测(1991-2018年),结果显示相对较高的准确性(NSE 0.69,CCC 0.82)。在经历可持续土壤管理实践的区域,土壤呼吸趋势较低,呼吸量较高,土壤有机碳储量也较高。
更新时间: 2024-06-19 19:06:36
领域: cs.LG
Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN
Vehicular controller area networks (CANs) are susceptible to masquerade attacks by malicious adversaries. In masquerade attacks, adversaries silence a targeted ID and then send malicious frames with forged content at the expected timing of benign frames. Because masquerade attacks can seriously harm vehicle functionality and are the stealthiest attacks to detect in CAN, recent work has devoted attention to comparing frameworks for detecting masquerade attacks in CAN. However, most existing works report offline evaluations using CAN logs collected from simulations that do not comply with the domain's real-time constraints. Here we advance the state of the art by introducing a benchmark study of four different non-deep-learning-based unsupervised online intrusion detection systems (IDS) for masquerade attacks in CAN. Our approach differs from existing benchmarks in that we analyze the effect of controlling streaming data conditions in a sliding window setting. In doing so, we use realistic masquerade attacks replayed from the ROAD dataset. We show that although the benchmarked IDS are not effective at detecting every attack type, the method that relies on detecting changes in the hierarchical structure of clusters of time series produces the best results, at the expense of higher computational overhead. We discuss limitations, open challenges, and how the benchmarked methods can be used for practical unsupervised online CAN IDS for masquerade attacks.
Updated: 2024-06-19 19:04:51
标题: 基准测试针对CAN中伪装攻击的无监督在线入侵检测系统
摘要: 车载控制器区域网络(CAN)容易受到恶意对手的伪装攻击。在伪装攻击中,对手会沉默一个目标ID,然后在预期的良性帧时机发送具有伪造内容的恶意帧。由于伪装攻击可能严重影响车辆功能,并且是CAN中最难检测到的攻击,最近的研究已经开始关注比较用于检测CAN中伪装攻击的框架。然而,大多数现有工作报告使用模拟已收集的CAN日志进行离线评估,这些日志不符合领域的实时约束条件。在这里,我们通过引入四种不基于深度学习(DL)的无监督在线入侵检测系统(IDS)对CAN中的伪装攻击进行基准研究,有助于推进技术水平。我们的方法与现有基准的不同之处在于,我们分析了在滑动窗口设置中控制流数据条件的影响。在这样做的过程中,我们使用来自ROAD数据集的真实伪装攻击进行重放。我们发现,尽管经过基准测试的IDS无法有效检测每种攻击类型,但依赖于检测时间序列集群层次结构的变化的方法产生了最佳结果,但代价是更高的计算开销。我们讨论了限制、开放挑战以及如何利用基准方法来实现实际的无监督在线CAN IDS以检测伪装攻击。
更新时间: 2024-06-19 19:04:51
领域: cs.CR,cs.LG
Game of LLMs: Discovering Structural Constructs in Activities using Large Language Models
Human Activity Recognition is a time-series analysis problem. A popular analysis procedure used by the community assumes an optimal window length when designing recognition pipelines. However, in the scenario of smart homes, where activities vary in duration and frequency, the assumption of a constant-sized window does not hold. Additionally, previous works have shown these activities to be made up of building blocks. We focus on identifying these underlying building blocks (structural constructs) using large language models. Identifying these constructs can be especially beneficial in recognizing short-duration and infrequent activities. We also propose the development of an activity recognition procedure that uses these building blocks to model activities, thus helping the downstream task of activity monitoring in smart homes.
Updated: 2024-06-19 19:02:44
标题: LLMs游戏:使用大型语言模型发现活动中的结构构造
摘要: 人类活动识别是一个时间序列分析问题。社区常用的分析程序假设一个最佳的窗口长度来设计识别管道。然而,在智能家居的情景中,活动的持续时间和频率各不相同,常数大小窗口的假设并不成立。此外,先前的研究表明这些活动由构建模块组成。我们专注于识别这些基础构建模块--结构构造,利用大型语言模型。识别这些构造尤其有益于识别短时和不经常的活动。我们还提出开发一种活动识别程序,使用这些构建模块来建模活动,从而帮助智能家居中活动监测的下游任务。
更新时间: 2024-06-19 19:02:44
领域: cs.LG,cs.CL
Confidence Is All You Need for MI Attacks
In this evolving era of machine learning security, membership inference attacks have emerged as a potent threat to the confidentiality of sensitive data. In this attack, adversaries aim to determine whether a particular data point was used during the training of a target model. This paper proposes a new method to gauge a data point's membership in a model's training set. Instead of correlating loss with membership, as is traditionally done, we leverage the fact that training examples generally exhibit higher confidence values when classified into their actual class. During training, the model is essentially being 'fit' to the training data and might face particular difficulties in generalizing to unseen data. This asymmetry leads the model to achieve higher confidence on the training data, as it exploits the specific patterns and noise present there. Our proposed approach leverages the confidence values generated by the machine learning model. These confidence values provide a probabilistic measure of the model's certainty in its predictions and can be used to infer the membership of a given data point. Additionally, we introduce another variant of our method that allows us to carry out this attack without knowing the ground truth (true class) of a given data point, thus offering an edge over existing label-dependent attack methods.
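The label-free variant reduces to thresholding the model's top softmax confidence; a minimal sketch (how the threshold is calibrated, e.g., on shadow-model data, is an assumption of this illustration):

    import numpy as np

    def confidence_mi_attack(probs, threshold):
        # probs: (n, num_classes) predicted class probabilities.
        # Samples whose top confidence exceeds the threshold are flagged as
        # likely training-set members, exploiting the higher confidence a
        # model assigns to data it was fit on. No true label is needed.
        return probs.max(axis=1) >= threshold

    probs = np.array([[0.97, 0.02, 0.01], [0.40, 0.35, 0.25]])
    print(confidence_mi_attack(probs, threshold=0.9))   # [ True False]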
Updated: 2024-06-19 18:58:19
标题: 信心是进行MI攻击所需要的全部
摘要: 在这个不断发展的机器学习安全领域,成员推理攻击已经成为对敏感数据保密性的重要威胁。在这种攻击中,对手的目标是确定一个特定点是否在目标模型的训练中使用过。本文提出了一种新方法来衡量数据点在模型训练集中的成员资格。与传统做法相反,我们利用了训练示例通常在被分类为其实际类时表现出更高的置信度值的事实,而不是将损失与成员资格相关联。在训练过程中,模型实质上是在适应训练数据,并且可能在泛化到未见数据时面临特定困难。这种不对称性导致模型在训练数据上实现更高的置信度,因为它利用了训练数据中存在的特定模式和噪声。我们提出的方法利用了机器学习模型生成的置信度值。这些置信度值提供了模型对其预测的确定性的概率度量,并可以进一步用于推断给定数据点的成员资格。此外,我们还介绍了我们方法的另一种变体,允许我们在不知道给定数据点的地面真相(真实类)的情况下进行这种攻击,从而提供了比现有依赖标签的攻击方法更有优势。
更新时间: 2024-06-19 18:58:19
领域: cs.LG,cs.AI,cs.CR
Elliptical Attention
Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision. This dot-product self-attention computes attention weights among the input tokens using Euclidean distance, which makes the model prone to representation collapse and vulnerable to contaminated samples. In this paper, we propose using a Mahalanobis distance metric for computing the attention weights to stretch the underlying feature space in directions of high contextual relevance. In particular, we define a hyper-ellipsoidal neighborhood around each query to increase the attention weights of the tokens lying in the contextually important directions. We term this novel class of attention Elliptical Attention. Our Elliptical Attention provides two benefits: 1) reducing representation collapse and 2) enhancing the model's robustness as the Elliptical Attention pays more attention to contextually relevant information rather than focusing on some small subset of informative features. We empirically demonstrate the advantages of Elliptical Attention over the baseline dot-product attention and state-of-the-art attention methods on various practical tasks, including object classification, image segmentation, and language modeling across different data modalities.
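A sketch of the core distance computation (our simplification; the paper's construction of the metric from contextual relevance is more involved):

    import torch

    def elliptical_attention(Q, K, V, M):
        # Q, K, V: (batch, seq, d); M: (d, d) positive semi-definite metric
        # stretching contextually relevant directions. Weights follow
        # exp(-||q - k||_M^2 / 2), normalized over keys; M = I recovers a
        # plain Euclidean-distance variant of attention.
        diff = Q.unsqueeze(2) - K.unsqueeze(1)                 # (b, q, k, d)
        dist_sq = torch.einsum("bqkd,de,bqke->bqk", diff, M, diff)
        attn = torch.softmax(-0.5 * dist_sq, dim=-1)
        return attn @ V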
Updated: 2024-06-19 18:38:11
标题: 椭圆形注意力
摘要: 成对点积自注意力是transformers成功的关键,这些transformers在语言和视觉等各种应用中取得了最先进的性能。这种点积自注意力使用欧几里得距离计算输入标记之间的注意力权重,这使得模型容易发生表示崩溃并容易受到污染样本的影响。在本文中,我们提出使用马氏距离度量计算注意力权重,以拉伸具有高语境相关性的底层特征空间。具体而言,我们在每个查询周围定义一个超椭圆形邻域,以增加位于具有上下文重要性方向的标记的注意力权重。我们将这种新颖的注意力类称为椭圆形注意力。我们的椭圆形注意力提供两个好处:1)减少表示崩溃,2)增强模型的鲁棒性,因为椭圆形注意力更加关注上下文相关信息,而不是专注于一小部分信息丰富的特征。我们通过实验证明了椭圆形注意力在各种实际任务上的优势,包括对象分类、图像分割和跨不同数据模态的语言建模,优于基线点积注意力和最先进的注意力方法。
更新时间: 2024-06-19 18:38:11
领域: cs.LG,cs.AI,cs.CL,cs.CV,stat.ML
FastPersist: Accelerating Model Checkpointing in Deep Learning
Model checkpoints are critical Deep Learning (DL) artifacts that enable fault tolerance for training and downstream applications, such as inference. However, writing checkpoints to persistent storage, and other I/O aspects of DL training, are mostly ignored by compute-focused optimization efforts for faster training of rapidly growing models and datasets. Towards addressing this imbalance, we propose FastPersist to accelerate checkpoint creation in DL training. FastPersist combines three novel techniques: (i) NVMe optimizations for faster checkpoint writes to SSDs, (ii) efficient write parallelism using the available SSDs in training environments, and (iii) overlapping checkpointing with independent training computations. Our evaluation using real world dense and sparse DL models shows that FastPersist creates checkpoints in persistent storage up to 116x faster than baseline, and enables per-iteration checkpointing with negligible overhead.
Updated: 2024-06-19 18:31:23
标题: FastPersist:加速深度学习模型检查点
摘要: 模型检查点是关键的深度学习(DL)工件,它们使训练和推断等下游应用程序具有容错性。然而,将检查点写入持久存储以及DL训练的其他I/O方面,大多被计算优化努力忽视,以加快不断增长的模型和数据集的训练速度。为了解决这一不平衡,我们提出了FastPersist,以加速DL训练中的检查点创建。FastPersist结合了三种新技术:(i)用于更快将检查点写入SSD的NVMe优化,(ii)在训练环境中利用可用的SSD进行高效的写并行,以及(iii)将独立训练计算与检查点重叠。我们使用真实世界的密集和稀疏DL模型进行评估,结果显示FastPersist在持久存储中创建检查点比基线快高达116倍,并且能够实现每次迭代的检查点,几乎没有额外开销。
更新时间: 2024-06-19 18:31:23
领域: cs.DC,cs.AI,cs.LG,cs.PF
Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models
Can large multimodal models have a human-like ability for emotional and social reasoning, and if so, how does it work? Recent research has discovered emergent theory-of-mind (ToM) reasoning capabilities in large language models (LLMs). LLMs can reason about people's mental states by solving various text-based ToM tasks that ask questions about the actors' ToM (e.g., human belief, desire, intention). However, human reasoning in the wild is often grounded in dynamic scenes across time. Thus, we consider videos a new medium for examining spatio-temporal ToM reasoning ability. Specifically, we ask explicit probing questions about videos with abundant social and emotional reasoning content. We develop a pipeline for multimodal LLM for ToM reasoning using video and text. We also enable explicit ToM reasoning by retrieving key frames for answering a ToM question, which reveals how multimodal LLMs reason about ToM.
Updated: 2024-06-19 18:24:31
标题: 透过心灵之眼:使用多模式视频大型语言模型阅读思维
摘要: 大型多模态模型是否具有类似人类情感和社会推理能力,如果是,它是如何工作的?最近的研究发现大型语言模型(LLMs)具有新兴的心灵理论(TOM)推理能力。LLMs能够通过解决各种基于文本的TOM任务来推理人们的心理状态,这些任务要求回答关于行动者TOM的问题(例如,人类的信念、欲望、意图)。然而,现实中人类的推理通常建立在跨越时间的动态场景之上。因此,我们认为视频是检验时空TOM推理能力的新媒介。具体而言,我们提出明确的关于视频中丰富的社会和情感推理内容的探究性问题。我们开发了一个用于视频和文本的多模态LLM的TOM推理管道。我们还通过检索关键帧来回答TOM问题,从而实现了明确的TOM推理,揭示了多模态LLMs如何推理TOM。
更新时间: 2024-06-19 18:24:31
领域: cs.CV,cs.AI
Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
The remarkable success of transformers in sequence modeling tasks, spanning various applications in natural language processing and computer vision, is attributed to the critical role of self-attention. As with most deep learning models, the construction of these attention mechanisms relies on heuristics and experience. In our work, we derive self-attention from kernel principal component analysis (kernel PCA) and show that self-attention projects its query vectors onto the principal component axes of its key matrix in a feature space. We then formulate the exact formula for the value matrix in self-attention, theoretically and empirically demonstrating that this value matrix captures the eigenvectors of the Gram matrix of the key vectors in self-attention. Leveraging our kernel PCA framework, we propose Attention with Robust Principal Components (RPC-Attention), a novel class of robust attention that is resilient to data contamination. We empirically demonstrate the advantages of RPC-Attention over softmax attention on the ImageNet-1K object classification, WikiText-103 language modeling, and ADE20K image segmentation tasks.
Updated: 2024-06-19 18:22:32
标题: 揭示自注意力的隐藏结构:通过核主成分分析
摘要: transformers在序列建模任务中取得了显著的成功,涵盖了自然语言处理和计算机视觉中的各种应用,这归功于自注意力的关键作用。与大多数深度学习模型的发展类似,这些注意力机制的构建依赖于启发式方法和经验。在我们的工作中,我们从核主成分分析(kernel PCA)推导出自注意力,并展示自注意力将其查询向量投影到其关键矩阵的主成分轴上的特征空间中。然后,我们在自注意力中制定了值矩阵的精确公式,从理论和经验上证明了这个值矩阵捕捉了自注意力中关键向量的Gram矩阵的特征向量。利用我们的核PCA框架,我们提出了一种新颖的鲁棒注意力类别,即具有稳健主成分的注意力(RPC-Attention),它对数据污染具有弹性。我们在ImageNet-1K对象分类、WikiText-103语言建模和ADE20K图像分割任务中经验性地展示了RPC-Attention相对于softmax注意力的优势。
更新时间: 2024-06-19 18:22:32
领域: cs.LG,cs.AI,cs.CL,cs.CV,stat.ML
Exponential time differencing for matrix-valued dynamical systems
Matrix evolution equations occur in many applications, such as dynamical Lyapunov/Sylvester systems or Riccati equations in optimization and stochastic control, machine learning, and data assimilation. In many cases, their tightest stability condition comes from a linear term. Exponential time differencing (ETD) is known to produce highly stable numerical schemes by treating the linear term exactly. In particular, for stiff problems, ETD methods are the method of choice. We propose an extension of the class of ETD algorithms to matrix-valued dynamical equations. This allows us to produce highly efficient and stable integration schemes. We show their efficiency and applicability for a variety of real-world problems, from geophysical applications to dynamical problems in machine learning.
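As a simple member of this family, a first-order integrating-factor (Lawson-Euler) step for a Sylvester-type equation dX/dt = AX + XB + N(X) propagates the stiff linear part exactly through matrix exponentials (a minimal sketch under that assumed form; the paper's higher-order ETD schemes are more elaborate):

    import numpy as np
    from scipy.linalg import expm

    def exponential_euler_step(X, A, B, N, h):
        # Exact propagation of the linear part A X + X B over a step of size h,
        # with the nonlinearity N frozen at the left endpoint.
        Ea, Eb = expm(h * A), expm(h * B)
        return Ea @ (X + h * N(X)) @ Eb

    # Hypothetical usage with a Riccati-like quadratic nonlinearity
    A = np.array([[-1.0, 0.5], [0.0, -2.0]])
    B = A.T
    X = np.eye(2)
    X = exponential_euler_step(X, A, B, lambda X: -X @ X, h=0.01)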
Updated: 2024-06-19 18:22:23
标题: 指数时间差分用于矩阵值动力系统
摘要: 矩阵演化方程出现在许多应用中,例如动态Lyapunov/Sylvester系统或优化和随机控制中的Riccati方程,机器学习或数据同化。在许多情况下,它们最严格的稳定条件来自一个线性项。指数时间差分(ETD)以精确处理线性项而闻名,能够产生高度稳定的数值方案。特别是对于刚性问题,ETD方法是一种选择。我们提出将ETD算法类别扩展到矩阵值动态方程。这使我们能够产生高效稳定的积分方案。我们展示了它们在各种真实世界问题中的效率和适用性,从地球物理应用到机器学习中的动态问题。
更新时间: 2024-06-19 18:22:23
领域: math.NA,cs.LG,cs.NA,math.OC,37M15, 65L04, 65L06, 65L80, 86-10, 68T07
WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions
Language Models (LMs) are being proposed for mental health applications, where the heightened risk of adverse outcomes means predictive performance may not be a sufficient litmus test of a model's utility in clinical practice. A model that can be trusted for practice should have a correspondence between explanation and clinical determination, yet no prior research has examined the attention fidelity of these models and its effect on ground-truth explanations. We introduce an evaluation design that focuses on the robustness and explainability of LMs in identifying Wellness Dimensions (WDs). We focus on two mental health and well-being datasets: (a) the multi-label classification-based MultiWD, and (b) WellXplain, for evaluating the veracity of the attention mechanism against expert-labeled explanations. The labels are based on Halbert Dunn's theory of wellness, which grounds our evaluation. We reveal four surprising results about LMs/LLMs: (1) Despite their human-like capabilities, GPT-3.5/4 lag behind RoBERTa, and MedAlpaca, a fine-tuned LLM, fails to deliver any remarkable improvements in performance or explanations. (2) Re-examining LMs' predictions based on a confidence-oriented loss function reveals a significant performance drop. (3) Across all LMs/LLMs, the alignment between attention and explanations remains low, with LLMs scoring a dismal 0.0. (4) Most mental-health-specific LMs/LLMs overlook domain-specific knowledge and undervalue explanations, causing these discrepancies. This study highlights the need for further research into their consistency and explanations in mental health and well-being.
Updated: 2024-06-19 18:19:39
标题: WellDunn:关于语言模型和大型语言模型在识别健康维度方面的稳健性和可解释性
摘要: 语言模型(LMs)被提议用于心理健康应用,其中不良结果的风险增加意味着预测性能可能不是模型在临床实践中实用性的充分检验标准。一个可以信任的实践模型应该在解释和临床决策之间有对应,然而以往的研究尚未检验这些模型的注意力忠实度及其对地面真相解释的影响。我们引入了一个评估设计,重点关注LMs在识别健康维度(WD)方面的鲁棒性和可解释性。我们关注两个心理健康和福祉数据集:(a)基于多标签分类的MultiWD,以及(b)WellXplain,用于评估注意力机制的真实性与专家标记的解释。标签基于Halbert Dunn的健康理论,这为我们的评估提供了基础。我们揭示了关于LMs/LLMs的四个令人惊讶的结果:(1)尽管具有类似人类的能力,GPT-3.5/4落后于RoBERTa和MedAlpaca,一个经过微调的LLM未能提供任何显着的性能或解释改进。 (2)基于自信度导向损失函数重新审视LMs的预测结果,揭示了显著的性能下降。 (3)在所有LMs/LLMs中,注意力和解释之间的一致性保持较低,LLMs得分为0.0。 (4)大多数针对心理健康的LMs/LLMs忽视了领域特定知识,并低估了解释,导致这些差异。本研究强调了在心理健康和福祉领域进一步研究其一致性和解释的必要性。
更新时间: 2024-06-19 18:19:39
领域: cs.AI,cs.CL
Concept Drift Visualization of SVM with Shifting Window
In machine learning, concept drift is an evolution of information that invalidates the current data model. It happens when the statistical properties of the input data change over time in unforeseen ways. Concept drift detection is crucial when dealing with dynamically changing data. Its visualization can bring valuable insight into the data dynamics, especially for multidimensional data, and is related to visual knowledge discovery. We propose a novel visualization model based on parallel coordinates, denoted as parallel histograms through time. Our model represents histograms of feature distributions for successive time-shifted windows. The drift is shown as variations of these histograms, obtained by connecting the means of the distribution for successive time windows. We show how these diagrams can be used to explain the decision made by the machine learning model in choosing the drift point. By isolating the drift at the edges of successive time windows, there will be no (or reduced) drift within the adjacent windows. We illustrate this concept on both synthetic and real datasets. In our experiments, we use an incremental/decremental SVM with a shifting window, introduced in our previous work. With our proposed technique, in addition to detecting the presence of concept drift, we can also depict it. This information can be further used to explain the change, opening the possibility for further investigations.
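The histogram construction behind the diagrams is straightforward; a minimal sketch for one feature (our own illustration):

    import numpy as np

    def windowed_histograms(x, window, step, bins=10):
        # Histograms of one feature over successive time-shifted windows.
        # Connecting the per-window means traces the drift, as in the
        # parallel-histograms-through-time diagrams described above.
        edges = np.linspace(x.min(), x.max(), bins + 1)
        hists, means = [], []
        for start in range(0, len(x) - window + 1, step):
            chunk = x[start:start + window]
            hists.append(np.histogram(chunk, bins=edges)[0])
            means.append(chunk.mean())
        return np.array(hists), np.array(means)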
Updated: 2024-06-19 18:12:02
标题: 带移动窗口的SVM的概念漂移可视化
摘要: 在机器学习中,概念漂移是信息演变的一种,使当前数据模型失效。当输入数据的统计属性以无法预料的方式随时间变化时,就会发生这种情况。在处理动态变化数据时,概念漂移检测至关重要。其可视化可以为数据动态提供宝贵的见解,特别是对于多维数据,并与视觉知识发现相关。我们提出了一种基于平行坐标的新型可视化模型,称为时间平行直方图。我们的模型表示连续时间偏移窗口的特征分布直方图。漂移显示为这些直方图的变化,通过连接连续时间窗口的分布均值获得。我们展示了如何利用这些图表来解释机器学习模型在选择漂移点时所做的决策。通过将漂移孤立在连续时间窗口的边缘,相邻窗口内将没有(或减少)漂移。我们在合成和真实数据集上说明了这个概念。在我们的实验中,我们使用了我们之前提出的具有移动窗口的增量/减量SVM。通过我们提出的技术,除了检测概念漂移的存在,我们还可以描述它。这些信息可以进一步用于解释改变,开启进一步研究的可能性。
更新时间: 2024-06-19 18:12:02
领域: cs.LG
Low latency optical-based mode tracking with machine learning deployed on FPGAs on a tokamak
Active feedback control in magnetic confinement fusion devices is desirable to mitigate plasma instabilities and enable robust operation. Optical high-speed cameras provide a powerful, non-invasive diagnostic and can be suitable for these applications. In this study, we process fast camera data, at rates exceeding 100kfps, on $\textit{in situ}$ Field Programmable Gate Array (FPGA) hardware to track magnetohydrodynamic (MHD) mode evolution and generate control signals in real-time. Our system utilizes a convolutional neural network (CNN) model which predicts the $n$=1 MHD mode amplitude and phase using camera images with better accuracy than other tested non-deep-learning-based methods. By implementing this model directly within the standard FPGA readout hardware of the high-speed camera diagnostic, our mode tracking system achieves a total trigger-to-output latency of 17.6$\mu$s and a throughput of up to 120kfps. This study at the High Beta Tokamak-Extended Pulse (HBT-EP) experiment demonstrates an FPGA-based high-speed camera data acquisition and processing system, enabling application in real-time machine-learning-based tokamak diagnostic and control as well as potential applications in other scientific domains.
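A hedged PyTorch sketch of the kind of CNN described: camera frames in, n=1 mode amplitude and phase out. The layer sizes and the (cos, sin) phase encoding, which keeps the output continuous across the 2π wrap-around, are our assumptions rather than the deployed architecture.

```python
import torch
import torch.nn as nn

class ModeTracker(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        # Predict amplitude plus (cos, sin) of the phase so the target
        # is continuous across the 2*pi wrap-around.
        self.head = nn.Linear(16 * 4 * 4, 3)

    def forward(self, frame):                 # frame: (B, 1, H, W)
        z = self.features(frame).flatten(1)
        amp, cos_p, sin_p = self.head(z).unbind(dim=1)
        phase = torch.atan2(sin_p, cos_p)
        return amp, phase
```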
Updated: 2024-06-19 18:10:49
标题: 在托卡马克上通过部署于FPGA的机器学习实现低延迟光学模式跟踪
摘要: 磁约束聚变装置中的主动反馈控制是为了缓解等离子体不稳定性并实现稳健运行而必不可少的。光学高速摄像机提供了一种强大的、非侵入式的诊断方法,可以适用于这些应用。在这项研究中,我们利用$\textit{in situ}$可编程门阵列(FPGA)硬件处理高速摄像机数据,速率超过100kfps,以跟踪磁流体动力学(MHD)模式演化并实时生成控制信号。我们的系统采用卷积神经网络(CNN)模型,使用摄像机图像预测$n$=1 MHD模式幅值和相位,比其他经过测试的非深度学习方法具有更好的准确性。通过将该模型直接实现在高速摄像机诊断的标准FPGA读取硬件中,我们的模式跟踪系统实现了总触发到输出延迟为17.6$\mu$s,吞吐量高达120kfps。这项研究在高Beta托卡马克-扩展脉冲(HBT-EP)实验中展示了基于FPGA的高速摄像机数据采集和处理系统,实现了实时机器学习基础托卡马克诊断和控制的应用,以及在其他科学领域的潜在应用。
更新时间: 2024-06-19 18:10:49
领域: physics.plasm-ph,cs.AR,cs.LG,physics.ins-det
Empowering Tuberculosis Screening with Explainable Self-Supervised Deep Neural Networks
Tuberculosis persists as a global health crisis, especially in resource-limited populations and remote regions, with more than 10 million individuals newly infected annually. It stands as a stark symbol of inequity in public health. Tuberculosis impacts roughly a quarter of the global populace, with the majority of cases concentrated in eight countries that account for two-thirds of all tuberculosis infections. Although a severe ailment, tuberculosis is both curable and manageable. However, early detection and screening of at-risk populations are imperative. Chest x-ray stands as the predominant imaging technique utilized in tuberculosis screening efforts. However, x-ray screening necessitates skilled radiologists, a resource often scarce, particularly in remote regions with limited resources. Consequently, there is a pressing need for artificial intelligence (AI)-powered systems to support clinicians and healthcare providers in swift screening. However, training a reliable AI model necessitates large-scale, high-quality data, which can be difficult and costly to acquire. Motivated by these challenges, in this work we introduce an explainable, self-supervised, self-training network tailored for tuberculosis case screening. The network achieves an outstanding overall accuracy of 98.14% and demonstrates high recall and precision rates of 95.72% and 99.44%, respectively, in identifying tuberculosis cases, effectively capturing clinically significant features.
Updated: 2024-06-19 18:10:06
标题: 使用可解释的自监督深度神经网络增强结核病筛查
摘要: 结核病仍然是全球卫生危机,尤其在资源匮乏的人口和偏远地区,每年新感染的人数超过1000万人。它象征着公共卫生领域的不平等现象。结核病影响全球约四分之一的人口,大多数病例集中在八个国家,占所有结核病感染的三分之二。尽管是一种严重的疾病,结核病是可以治愈和控制的。然而,早期检测和筛查处于风险中的人群至关重要。胸部X光是结核病筛查工作中主要使用的影像技术。然而,X光筛查需要熟练的放射科医生,这种资源通常很稀缺,特别是在资源有限的偏远地区。因此,迫切需要人工智能(AI)技术支持医生和医疗保健提供者进行快速筛查。然而,训练可靠的AI模型需要大规模高质量的数据,获取这些数据可能会很困难和昂贵。受到这些挑战的启发,在这项工作中,我们介绍了一种专门用于结核病筛查的可解释的自我监督自训练学习网络。该网络在识别结核病病例方面取得了出色的总体准确率达到98.14%,分别实现了95.72%和99.44%的高召回率和精确率,有效捕捉了临床上重要的特征。
更新时间: 2024-06-19 18:10:06
领域: eess.IV,cs.CV,cs.LG
Every Language Counts: Learn and Unlearn in Multilingual LLMs
This paper investigates the propagation of harmful information in multilingual large language models (LLMs) and evaluates the efficacy of various unlearning methods. We demonstrate that fake information, regardless of the language it is in, once introduced into these models through training data, can spread across different languages, compromising the integrity and reliability of the generated content. Our findings reveal that standard unlearning techniques, which typically focus on English data, are insufficient in mitigating the spread of harmful content in multilingual contexts and could inadvertently reinforce harmful content across languages. We show that only by addressing harmful responses in both English and the original language of the harmful data can we effectively eliminate generations for all languages. This underscores the critical need for comprehensive unlearning strategies that consider the multilingual nature of modern LLMs to enhance their safety and reliability across diverse linguistic landscapes.
Updated: 2024-06-19 18:01:08
标题: 每种语言都重要:多语言大型语言模型中的学习与去学习
摘要: 本文研究了有害信息在多语言大型语言模型(LLMs)中的传播,并评估了各种取消学习方法的有效性。我们证明,无论信息是用什么语言发布的,一旦通过训练数据引入到这些模型中,都会在不同语言之间传播,危及生成内容的完整性和可靠性。我们的研究发现,通常专注于英语数据的标准取消学习技术在减轻多语言环境中有害内容传播方面是不够的,并且可能无意中加强不同语言之间的有害内容。我们表明,只有通过同时解决有害数据的英文和原始语言中的有害响应,我们才能有效地消除所有语言的生成内容。这突显了综合取消学习策略对于考虑现代LLMs的多语言特性以增强其在不同语言环境中的安全性和可靠性的重要性。
更新时间: 2024-06-19 18:01:08
领域: cs.CL,cs.LG
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation
While text-to-visual models now produce photo-realistic images and videos, they struggle with compositional text prompts involving attributes, relationships, and higher-order reasoning such as logic and comparison. In this work, we conduct an extensive human study on GenAI-Bench to evaluate the performance of leading image and video generation models in various aspects of compositional text-to-visual generation. We also compare automated evaluation metrics against our collected human ratings and find that VQAScore -- a metric measuring the likelihood that a VQA model views an image as accurately depicting the prompt -- significantly outperforms previous metrics such as CLIPScore. In addition, VQAScore can improve generation in a black-box manner (without finetuning) via simply ranking a few (3 to 9) candidate images. Ranking by VQAScore is 2x to 3x more effective than other scoring methods like PickScore, HPSv2, and ImageReward at improving human alignment ratings for DALL-E 3 and Stable Diffusion, especially on compositional prompts that require advanced visio-linguistic reasoning. We will release a new GenAI-Rank benchmark with over 40,000 human ratings to evaluate scoring metrics on ranking images generated from the same prompt. Lastly, we discuss promising areas for improvement in VQAScore, such as addressing fine-grained visual details. We will release all human ratings (over 80,000) to facilitate scientific benchmarking of both generative models and automated metrics.
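A schematic of VQAScore-style best-of-n ranking, with `vqa_yes_prob` standing in for any VQA model that returns the probability of answering "Yes"; the question template paraphrases the VQAScore idea and is not the exact prompt used in the paper.

```python
def vqa_score(image, prompt, vqa_yes_prob):
    # Likelihood that the VQA model views the image as depicting the prompt.
    question = f'Does this figure show "{prompt}"?'
    return vqa_yes_prob(image, question)

def rank_candidates(images, prompt, vqa_yes_prob):
    # Black-box improvement: generate a few (e.g., 3 to 9) candidates and
    # keep the one the VQA model judges most faithful to the prompt.
    return max(images, key=lambda im: vqa_score(im, prompt, vqa_yes_prob))
```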
Updated: 2024-06-19 18:00:07
标题: GenAI-Bench:评估和改进组合文本到视觉生成
摘要: 尽管文本到视觉模型现在能够生成逼真的图像和视频,但它们在涉及属性、关系以及逻辑和比较等更高阶推理的组合文本提示方面仍然存在困难。在这项工作中,我们进行了一项广泛的人类研究,使用GenAI-Bench来评估领先的图像和视频生成模型在组合文本到视觉生成各个方面的性能。我们还将自动评估指标与我们收集的人类评分进行比较,并发现VQAScore —— 一种衡量VQA模型将图像视为准确描绘提示的可能性的指标 —— 显著优于以往的指标,如CLIPScore。此外,VQAScore可以通过简单地对少量(3到9个)候选图像进行排名,在不进行微调的情况下以黑盒方式改善生成效果。通过VQAScore进行排名比其他评分方法如PickScore、HPSv2和ImageReward在提高DALL-E 3和Stable Diffusion的人类对齐评分方面更有效,特别是在需要高级视觉语言推理的组合提示上。我们将发布一个新的GenAI-Rank基准,其中包含超过40,000个人类评分,以评估对从相同提示生成的图像进行排名的评分指标。最后,我们讨论了改进VQAScore的有希望的方向,比如解决细粒度的视觉细节。我们将发布所有人类评分(超过80,000个)以促进对生成模型和自动评估指标的科学基准测试。
更新时间: 2024-06-19 18:00:07
领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.MM
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Understanding the semantics of visual scenes is a fundamental challenge in Computer Vision. A key aspect of this challenge is that objects sharing similar semantic meanings or functions can exhibit striking visual differences, making accurate identification and categorization difficult. Recent advancements in text-to-image frameworks have led to models that implicitly capture natural scene statistics. These frameworks account for the visual variability of objects, as well as complex object co-occurrences and sources of noise such as diverse lighting conditions. By leveraging large-scale datasets and cross-attention conditioning, these models generate detailed and contextually rich scene representations. This capability opens new avenues for improving object recognition and scene understanding in varied and challenging environments. Our work presents StableSemantics, a dataset comprising 224 thousand human-curated prompts, processed natural language captions, over 2 million synthetic images, and 10 million attention maps corresponding to individual noun chunks. We explicitly leverage human-generated prompts that correspond to visually interesting stable diffusion generations, provide 10 generations per phrase, and extract cross-attention maps for each image. We explore the semantic distribution of generated images, examine the distribution of objects within images, and benchmark captioning and open vocabulary segmentation methods on our data. To the best of our knowledge, we are the first to release a diffusion dataset with semantic attributions. We expect our proposed dataset to catalyze advances in visual semantic understanding and provide a foundation for developing more sophisticated and effective visual models. Website: https://stablesemantics.github.io/StableSemantics
Updated: 2024-06-19 17:59:40
标题: StableSemantics:自然图像中语义表示的合成语言-视觉数据集
摘要: 理解视觉场景的语义是计算机视觉中的一个基本挑战。这一挑战的关键在于,共享相似语义意义或功能的对象可能呈现出明显的视觉差异,使准确识别和分类变得困难。最近文本到图像框架的进展导致了一些模型,可以隐式地捕获自然场景的统计信息。这些框架考虑了对象的视觉变化性,以及复杂的对象共现和各种噪声来源,例如不同的光照条件。通过利用大规模数据集和交叉注意力调节,这些模型生成了详细且具有上下文丰富性的场景表示。这种能力为改进在各种具有挑战性的环境中的对象识别和场景理解打开了新的途径。我们的工作介绍了StableSemantics,这是一个数据集,包括22.4万个人工策划的提示,处理过的自然语言标题,超过200万个合成图像,以及对应于单个名词块的1000万个注意力地图。我们明确利用人类生成的提示,这些提示对应于视觉上有趣的稳定扩散生成物,每个短语提供10个生成物,并为每个图像提取交叉注意力地图。我们探索生成图像的语义分布,检查图像中对象的分布,并在我们的数据上对标题和开放词汇分割方法进行基准测试。据我们所知,我们是第一个发布带有语义属性的扩散数据集。我们期望我们提出的数据集能够推动视觉语义理解的进步,并为开发更复杂和有效的视觉模型打下基础。网站:https://stablesemantics.github.io/StableSemantics.
更新时间: 2024-06-19 17:59:40
领域: cs.CV,cs.LG
You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling
Pseudo-labeling is a popular semi-supervised learning technique to leverage unlabeled data when labeled samples are scarce. The generation and selection of pseudo-labels heavily rely on labeled data. Existing approaches implicitly assume that the labeled data is gold standard and 'perfect'. However, this can be violated in reality with issues such as mislabeling or ambiguity. We address this overlooked aspect and show the importance of investigating labeled data quality to improve any pseudo-labeling method. Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling. We select useful labeled and pseudo-labeled samples via analysis of learning dynamics. We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world tabular and image datasets. Additionally, DIPS improves data efficiency and reduces the performance distinctions between different pseudo-labelers. Overall, we highlight the significant benefits of a data-centric rethinking of pseudo-labeling in real-world settings.
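A minimal sketch in the spirit of learning-dynamics-based selection: score each sample by the mean and variability of the probability assigned to its label across training checkpoints, and keep confident, stable samples. The thresholds are illustrative assumptions; DIPS's actual characterization may differ.

```python
import numpy as np

def select_samples(probs_over_epochs, labels, conf_min=0.7, var_max=0.05):
    """probs_over_epochs: (n_epochs, n_samples, n_classes) softmax outputs;
    labels: (n_samples,) assigned (true or pseudo) label indices."""
    p_label = probs_over_epochs[:, np.arange(len(labels)), labels]
    confidence = p_label.mean(axis=0)    # high -> consistently learned
    variability = p_label.std(axis=0)    # high -> ambiguous or noisy
    keep = (confidence >= conf_min) & (variability <= var_max)
    return np.where(keep)[0]             # indices of useful samples
```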
Updated: 2024-06-19 17:58:40
标题: 您无法处理(肮脏的)真相:数据中心洞察改善伪标签
摘要: 伪标签是一种流行的半监督学习技术,可以在标记样本稀缺时利用未标记数据。伪标签的产生和选择在很大程度上依赖于标记数据。现有方法隐含地假设标记数据是金标准和“完美”的。然而,在现实中可能存在问题,如错误标记或模糊性。我们解决了这个被忽视的方面,并展示了研究标记数据质量以改善任何伪标签方法的重要性。具体来说,我们引入了一种名为DIPS的新型数据特征化和选择框架,以扩展伪标签。我们通过分析学习动态来选择有用的标记和伪标记样本。我们展示了DIPS在各种伪标签方法在广泛范围的真实世界表格和图像数据集中的适用性和影响。此外,DIPS提高了数据效率,并减少了不同伪标签者之间的性能差异。总的来说,我们强调在真实世界环境中对伪标签进行数据中心重新思考的重要好处。
更新时间: 2024-06-19 17:58:40
领域: cs.LG,cs.AI
Is poisoning a real threat to LLM alignment? Maybe more so than you think
Recent advancements in Reinforcement Learning with Human Feedback (RLHF) have significantly impacted the alignment of Large Language Models (LLMs). The sensitivity of reinforcement learning algorithms such as Proximal Policy Optimization (PPO) has led to a new line of work on Direct Preference Optimization (DPO), which treats RLHF in a supervised learning framework. The increased practical use of these RLHF methods warrants an analysis of their vulnerabilities. In this work, we investigate the vulnerabilities of DPO to poisoning attacks under different scenarios and compare the effectiveness of preference poisoning, a first-of-its-kind attack. We comprehensively analyze DPO's vulnerabilities under different types of attacks, i.e., backdoor and non-backdoor attacks, and different poisoning methods across a wide array of language models, i.e., LLama 7B, Mistral 7B, and Gemma 7B. We find that, unlike PPO-based methods, which require at least 4% of the data to be poisoned to elicit harmful behavior via a backdoor attack, the true vulnerabilities of DPO can be exploited far more simply: poisoning as little as 0.5% of the data suffices. We further investigate the potential reasons behind this vulnerability and how well it translates into backdoor vs. non-backdoor attacks.
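For context, the standard DPO objective treats preference learning as supervised classification of (chosen, rejected) response pairs, so preference poisoning amounts to flipping or planting pairs in the training data. A sketch of the vanilla loss (not the attack itself) follows.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Inputs are summed token log-probs of each full response under the
    trained policy and the frozen reference policy."""
    chosen_rewards = beta * (logp_chosen - ref_logp_chosen)
    rejected_rewards = beta * (logp_rejected - ref_logp_rejected)
    # Poisoned pairs enter here simply as (chosen, rejected) with the roles
    # swapped, or with a backdoor trigger planted in the prompt.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```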
Updated: 2024-06-19 17:56:17
标题: 投毒是否对LLM对齐构成真正威胁?也许比你想象的更严重
摘要: 最近在人类反馈强化学习(RLHF)方面取得的进展显著影响了大型语言模型(LLMs)的对齐。诸如近端策略优化(PPO)等强化学习算法的敏感性催生了关于直接偏好优化(DPO)的新研究方向,该方法将RLHF置于监督学习框架中。这些RLHF方法的实际应用日益增多,需要对其脆弱性进行分析。在本研究中,我们调查了DPO在不同情境下对投毒攻击的脆弱性,并比较了偏好投毒攻击的有效性,这是首次此类研究。我们全面分析了DPO在不同类型攻击(即后门和非后门攻击)以及不同投毒方法下在各种语言模型(如LLama 7B、Mistral 7B和Gemma 7B)中的脆弱性。我们发现,与基于PPO的方法(后门攻击时至少需要毒化4%的数据才能引发有害行为)相比,DPO的真正脆弱性可以被更简单地利用:只需毒化0.5%的数据即可。我们进一步研究了这种脆弱性背后的潜在原因,以及这种脆弱性在后门攻击与非后门攻击中的表现。
更新时间: 2024-06-19 17:56:17
领域: cs.LG,cs.CL,cs.CR
Integrating Fuzzy Logic with Causal Inference: Enhancing the Pearl and Neyman-Rubin Methodologies
In this paper, we generalize the Pearl and Neyman-Rubin methodologies in causal inference by introducing an approach that incorporates fuzzy logic. Specifically, we introduce a fuzzy causal inference approach that considers both the vagueness and imprecision inherent in data, as well as the subjective human perspective characterized by fuzzy terms such as 'high', 'medium', and 'low'. To do so, we introduce two fuzzy causal effect formulas: the Fuzzy Average Treatment Effect (FATE) and the Generalized Fuzzy Average Treatment Effect (GFATE), together with their normalized versions, NFATE and NGFATE. When dealing with a binary treatment variable, our fuzzy causal effect formulas coincide with the classical Average Treatment Effect (ATE) formula, a well-established and popular metric in causal inference. In FATE, all values of the treatment variable are considered equally important. In contrast, GFATE takes into account the rarity and frequency of these values. We show that for linear Structural Equation Models (SEMs), the normalized versions of our formulas, NFATE and NGFATE, are equivalent to ATE. Further, we provide identifiability criteria for these formulas and show their stability with respect to minor variations in the fuzzy subsets and the probability distributions involved. This ensures the robustness of our approach in handling small perturbations in the data. Finally, we provide several experimental examples to empirically validate and demonstrate the practical application of our proposed fuzzy causal inference methods.
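For reference, the classical binary-treatment quantity that FATE and GFATE are stated to recover is the average treatment effect; writing $Y(1)$ and $Y(0)$ for the potential outcomes,

$$\mathrm{ATE} = \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)].$$

The fuzzy generalizations FATE and GFATE themselves are defined in the paper and not reproduced here.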
Updated: 2024-06-19 17:54:31
标题: 将模糊逻辑与因果推断相结合:增强Pearl和Neyman-Rubin方法论
摘要: 在本文中,我们通过引入模糊逻辑,将Pearl和Neyman-Rubin方法论在因果推断中进行了泛化。事实上,我们介绍了一种考虑数据中固有的模糊性和不精确性以及模糊术语(如“高”、“中”和“低”)所特征的主观人类视角的模糊因果推断方法。为此,我们引入了两个模糊因果效应公式:模糊平均处理效应(FATE)和广义模糊平均处理效应(GFATE),以及它们的归一化版本:NFATE和NGFATE。当处理二元处理变量时,我们的模糊因果效应公式与经典的平均处理效应(ATE)公式重合,后者是因果推断中一个被广泛认可和流行的度量。在FATE中,处理变量的所有值被视为同等重要。相反,GFATE考虑了这些值的稀有性和频率。我们表明,在线性结构方程模型(SEMs)中,我们公式的归一化版本NFATE和NGFATE等同于ATE。此外,我们提供了这些公式的可辨识性标准,并展示了它们对模糊子集和涉及的概率分布的微小变化的稳定性。这确保了我们的方法在处理数据中的小扰动时的鲁棒性。最后,我们提供了几个实验示例,以经验验证和展示我们提出的模糊因果推断方法的实际应用。
更新时间: 2024-06-19 17:54:31
领域: cs.AI,cs.LG,cs.LO,62D20,60A86,03E72,93C42,68T37,6008,68T20,68T27,68U99,I.2.3;G.3
Chain-of-Thought Unfaithfulness as Disguised Accuracy
Understanding the extent to which Chain-of-Thought (CoT) generations align with a large language model's (LLM) internal computations is critical for deciding whether to trust an LLM's output. As a proxy for CoT faithfulness, Lanham et al. (2023) propose a metric that measures a model's dependence on its CoT for producing an answer. Within a single family of proprietary models, they find that LLMs exhibit a scaling-then-inverse-scaling relationship between model size and their measure of faithfulness, and that a 13 billion parameter model exhibits increased faithfulness compared to models ranging from 810 million to 175 billion parameters in size. We evaluate whether these results generalize as a property of all LLMs. We replicate their experimental setup with three different families of models and, under specific conditions, successfully reproduce the scaling trends for CoT faithfulness they report. However, we discover that simply changing the order of answer choices in the prompt can reduce the metric by 73 percentage points. The faithfulness metric is also highly correlated ($R^2$ = 0.91) with accuracy, raising doubts about its validity for evaluating faithfulness.
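One simplified reading of the dependence measure discussed here, counting how often the answer changes when the model answers without its chain of thought; Lanham et al.'s protocol additionally uses truncated and corrupted CoTs, which this sketch omits.

```python
def cot_dependence(answers_with_cot, answers_without_cot):
    # Fraction of items whose answer changes when the CoT is removed;
    # higher values are read as greater reliance on the CoT.
    changed = sum(a != b for a, b in zip(answers_with_cot, answers_without_cot))
    return changed / len(answers_with_cot)
```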
Updated: 2024-06-19 17:49:54
标题: 思维链不忠作为伪装的准确性
摘要: 理解思维链(CoT)生成与大型语言模型(LLM)内部计算的一致程度对于决定是否信任LLM的输出至关重要。作为CoT忠实度的代理,Lanham等人(2023年)提出了一种衡量模型依赖其CoT以产生答案的度量标准。在一个专有模型系列中,他们发现LLMs展示了模型大小和它们的忠实度测量之间的缩放-反向缩放关系,一个拥有130亿参数的模型相比于大小范围在8.1亿到1750亿参数的模型表现出更高的忠实度。我们评估这些结果是否作为所有LLMs的一个属性普遍存在。我们使用三个不同家族的模型复制他们的实验设置,在特定条件下,成功复制了他们报告的CoT忠实度的缩放趋势。然而,我们发现简单改变提示中答案选项的顺序可以将度量减少73个百分点。忠实度度量也与准确度高度相关($R^2$ = 0.91),对于评估忠实度的有效性提出了质疑。
更新时间: 2024-06-19 17:49:54
领域: cs.CL,cs.AI,cs.LG
Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking
In a rapidly evolving job market, skill demand forecasting is crucial as it enables policymakers and businesses to anticipate and adapt to changes, ensuring that workforce skills align with market needs, thereby enhancing productivity and competitiveness. Additionally, by identifying emerging skill requirements, it directs individuals towards relevant training and education opportunities, promoting continuous self-learning and development. However, the absence of comprehensive datasets presents a significant challenge, impeding research and the advancement of this field. To bridge this gap, we present Job-SDF, a dataset designed to train and benchmark job-skill demand forecasting models. Based on 10.35 million public job advertisements collected from major online recruitment platforms in China between 2021 and 2023, this dataset encompasses monthly recruitment demand for 2,324 types of skills across 521 companies. Our dataset uniquely enables evaluating skill demand forecasting models at various granularities, including occupation, company, and regional levels. We benchmark a range of models on this dataset, evaluating their performance in standard scenarios, in predictions focused on lower value ranges, and in the presence of structural breaks, providing new insights for further research. Our code and dataset are publicly accessible via the https://github.com/Job-SDF/benchmark.
Updated: 2024-06-19 17:47:25
标题: Job-SDF:一种用于工作技能需求预测和基准测试的多粒度数据集
摘要: 在快速发展的就业市场中,技能需求预测至关重要,因为它使决策者和企业能够预见和适应变化,确保劳动力技能与市场需求对齐,从而提高生产力和竞争力。此外,通过识别新兴的技能需求,它引导个人走向相关的培训和教育机会,促进持续的自我学习和发展。然而,缺乏全面的数据集构成了重大挑战,阻碍了研究和该领域的进展。为了弥合这一差距,我们提出了Job-SDF,这是一个旨在训练和基准测试工作技能需求预测模型的数据集。基于2021年至2023年间收集的来自中国主要在线招聘平台的1035万份公开工作广告,此数据集涵盖了521家公司对2324种技能的月度招聘需求。我们的数据集独特地可以评估各种粒度的技能需求预测模型,包括职业、公司和地区水平。我们在该数据集上对一系列模型进行基准测试,评估它们在标准情况下的表现,针对较低价值范围的预测以及在结构性突变的情况下的表现,为进一步的研究提供新的见解。我们的代码和数据集可以通过https://github.com/Job-SDF/benchmark 公开获取。
更新时间: 2024-06-19 17:47:25
领域: cs.LG,cs.AI
Global Solutions to Master Equations for Continuous Time Heterogeneous Agent Macroeconomic Models
We propose and compare new global solution algorithms for continuous time heterogeneous agent economies with aggregate shocks. First, we approximate the agent distribution so that equilibrium in the economy can be characterized by a high, but finite, dimensional non-linear partial differential equation. We consider different approximations: discretizing the number of agents, discretizing the agent state variables, and projecting the distribution onto a finite set of basis functions. Second, we represent the value function using a neural network and train it to solve the differential equation using deep learning tools. We refer to the solution as an Economic Model Informed Neural Network (EMINN). The main advantage of this technique is that it allows us to find global solutions to high dimensional, non-linear problems. We demonstrate our algorithm by solving important models in the macroeconomics and spatial literatures (e.g. Krusell and Smith (1998), Khan and Thomas (2007), Bilal (2023)).
Updated: 2024-06-19 17:42:53
标题: 连续时间异质代理宏观经济模型主方程的全局解
摘要: 我们提出并比较了求解带总量冲击的连续时间异质代理经济的新全局算法。首先,我们对代理分布进行近似,使得经济均衡可以通过一个高维但有限维的非线性偏微分方程来表征。我们考虑不同的近似方法:离散化代理数量、离散化代理状态变量,并将分布投影到有限一组基函数上。其次,我们使用神经网络表示价值函数,并训练它使用深度学习工具来解决微分方程。我们将这种解决方案称为经济模型信息神经网络(EMINN)。这种技术的主要优点是它能够帮助我们找到高维、非线性问题的全局解。我们通过解决宏观经济学和空间文献中的重要模型(如Krusell和Smith(1998),Khan和Thomas(2007),Bilal(2023))来展示我们的算法。
更新时间: 2024-06-19 17:42:53
领域: math.OC,cs.LG,econ.GN,q-fin.EC
Tree-Sliced Wasserstein Distance on a System of Lines
Sliced Wasserstein (SW) distance in Optimal Transport (OT) is widely used in various applications thanks to its statistical effectiveness and computational efficiency. On the other hand, Tree Wasserstein (TW) and Tree-Sliced Wasserstein (TSW) are instances of OT for probability measures where the ground cost is a tree metric. TSW also has a low computational complexity, i.e., linear in the number of edges in the tree. In particular, TSW is identical to SW when the tree is a chain. While SW is prone to loss of topological information of input measures due to relying on one-dimensional projection, TSW is more flexible and has a higher degree of freedom by choosing a tree rather than a line to alleviate the curse of dimensionality in SW. However, for practical applications, popular tree metric sampling methods are heavily built upon given supports, which limits their capacity to adapt to new supports. In this paper, we propose the Tree-Sliced Wasserstein distance on a System of Lines (TSW-SL), which brings a connection between SW and TSW. Compared to SW and TSW, our TSW-SL benefits from the higher degree of freedom of TSW while being suitable for dynamic settings like SW. In TSW-SL, we use a variant of the Radon Transform to project measures onto a system of lines, resulting in measures on a space with a tree metric, then leverage TW to efficiently compute distances between them. We empirically verify the advantages of TSW-SL over the traditional SW by conducting a variety of experiments on gradient flows, image style transfer, and generative models.
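For orientation, a minimal sketch of the ordinary sliced Wasserstein distance that TSW-SL generalizes: project both point sets onto random lines and average the closed-form 1D distances between sorted projections. Equal sample sizes and uniform weights are assumed for simplicity.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, p=2, seed=0):
    """Monte Carlo estimate of SW_p between empirical measures X, Y: (n, d)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)         # random direction on the sphere
        # 1D optimal transport between sorted projections has a closed form.
        x_proj, y_proj = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean(np.abs(x_proj - y_proj) ** p)
    return (total / n_projections) ** (1 / p)
```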
Updated: 2024-06-19 17:40:11
标题: 线系统上的树切片Wasserstein距离
摘要: 在Optimal Transport(OT)中,切片Wasserstein(SW)距离广泛应用于各种应用中,这得益于其统计有效性和计算效率。另一方面,树Wasserstein(TW)和树切片Wasserstein(TSW)是概率测度的OT实例,其基准成本是树度量。TSW的计算复杂度也很低,即与树中的边的数量成线性关系。特别是,当树是一个链时,TSW与SW相同。虽然SW倾向于因依赖一维投影而丢失输入测度的拓扑信息,但TSW更灵活,并且通过选择树而不是直线来缓解SW中的维度灾难问题,因此具有更高的自由度。然而,对于实际应用,流行的树度量采样方法主要建立在给定支持上,这限制了它们适应新支持的能力。在本文中,我们提出了基于系统线的树切片Wasserstein距离(TSW-SL),它在SW和TSW之间建立了联系。与SW和TSW相比,我们的TSW-SL利用了TSW的更高自由度,同时适用于动态环境,就像SW一样。在TSW-SL中,我们使用Radon变换的变体将测度投影到一组线上,从而得到带有树度量的空间上的测度,然后利用TW有效地计算它们之间的距离。通过在梯度流、图像风格转移和生成模型上进行各种实验,我们在经验上验证了TSW-SL相对于传统SW的优势。
更新时间: 2024-06-19 17:40:11
领域: cs.LG,cs.AI,stat.ML
Heterogeneous Graph Neural Networks with Post-hoc Explanations for Multi-modal and Explainable Land Use Inference
Urban land use inference is a critically important task that aids in city planning and policy-making. Recently, the increased use of sensor and location technologies has facilitated the collection of multi-modal mobility data, offering valuable insights into daily activity patterns. Many studies have adopted advanced data-driven techniques to explore the potential of these multi-modal mobility data in land use inference. However, existing studies often process samples independently, ignoring the spatial correlations among neighbouring objects and heterogeneity among different services. Furthermore, the inherently low interpretability of complex deep learning methods poses a significant barrier in urban planning, where transparency and extrapolability are crucial for making long-term policy decisions. To overcome these challenges, we introduce an explainable framework for inferring land use that synergises heterogeneous graph neural networks (HGNs) with Explainable AI techniques, enhancing both accuracy and explainability. The empirical experiments demonstrate that the proposed HGNs significantly outperform baseline graph neural networks for all six land-use indicators, especially in terms of 'office' and 'sustenance'. As explanations, we consider feature attribution and counterfactual explanations. The analysis of feature attribution explanations shows that the symmetrical nature of the `residence' and 'work' categories predicted by the framework aligns well with the commuter's 'work' and 'recreation' activities in London. The analysis of the counterfactual explanations reveals that variations in node features and types are primarily responsible for the differences observed between the predicted land use distribution and the ideal mixed state. These analyses demonstrate that the proposed HGNs can suitably support urban stakeholders in their urban planning and policy-making.
Updated: 2024-06-19 17:39:10
标题: 多模态和可解释土地利用推断的异质图神经网络与事后解释
摘要: 城市土地利用推断是一项至关重要的任务,有助于城市规划和政策制定。最近,传感器和定位技术的广泛应用促进了多模态移动数据的收集,为日常活动模式提供了宝贵的见解。许多研究采用先进的数据驱动技术来探索这些多模态移动数据在土地利用推断中的潜力。然而,现有研究通常独立处理样本,忽视了邻近对象之间的空间相关性和不同服务之间的异质性。此外,复杂深度学习方法固有的低可解释性在城市规划中构成了重要障碍,透明度和可推广性对于做出长期政策决策至关重要。为了克服这些挑战,我们引入了一个可解释的框架,用于推断土地利用,将异构图神经网络(HGNs)与可解释的AI技术相结合,提高了准确性和可解释性。实证实验表明,所提出的HGNs在所有六个土地利用指标中明显优于基准图神经网络,特别是在“办公”和“食物”方面。作为解释,我们考虑特征归因和反事实解释。特征归因解释的分析显示,框架预测的“居住”和“工作”类别的对称性与伦敦通勤者的“工作”和“娱乐”活动相吻合。反事实解释的分析揭示了节点特征和类型的变化主要导致了预测的土地利用分布与理想混合状态之间观察到的差异。这些分析表明,所提出的HGNs可以适当地支持城市利益相关者在其城市规划和政策制定中。
更新时间: 2024-06-19 17:39:10
领域: cs.AI
Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian
There has been a surge in the development of various Large Language Models (LLMs). However, text generation for languages other than English often faces significant challenges, including poor generation quality and reduced computational performance due to the disproportionate representation of tokens in the model's vocabulary. In this work, we address these issues and introduce Vikhr, a new state-of-the-art open-source instruction-tuned LLM designed specifically for the Russian language. Unlike previous efforts for Russian that utilize computationally inexpensive LoRA adapters on top of English-oriented models, Vikhr features an adapted tokenizer vocabulary and undergoes continued pre-training and instruction tuning of all weights. This approach not only enhances the model's performance but also significantly improves its computational and contextual efficiency. The remarkable performance of Vikhr across various Russian-language benchmarks can also be attributed to our efforts in expanding instruction datasets and corpora for continued pre-training. Vikhr not only sets a new state of the art among open-source LLMs for Russian, but even outperforms some proprietary closed-source models on certain benchmarks. The model weights, instruction sets, and code are publicly available.
Updated: 2024-06-19 17:32:23
标题: Vikhr:面向俄语的开源指令微调大型语言模型家族
摘要: 近年来,各种大型语言模型(LLMs)的发展迅猛。然而,除英语外的其他语言的文本生成往往面临重大挑战,包括生成质量不佳以及由于模型词汇表中令牌比例不均导致的计算性能降低。在这项工作中,我们解决了这些问题,并推出了Vikhr,这是一个专门为俄语设计的最新一代开源指令调整的LLM。与之前为俄语设计的模型使用在英语导向模型之上的计算成本较低的LoRA适配器不同,Vikhr具有经过调整的分词器词汇表,并经过所有权重的持续预训练和指令调整。这种方法不仅提高了模型的性能,还显著提高了其计算和上下文效率。Vikhr在各种俄语基准测试中的出色表现也归功于我们扩大指令数据集和语料库以进行持续预训练的努力。Vikhr不仅在俄语的开源LLMs中创造了新的技术水平,甚至在某些基准测试中超越了一些专有的闭源模型。模型权重、指令集和代码均公开可用。
更新时间: 2024-06-19 17:32:23
领域: cs.CL,cs.AI
Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG
Vulnerability detection is essential for software quality assurance. In recent years, deep learning models (especially large language models) have shown promise in vulnerability detection. In this work, we propose a novel LLM-based vulnerability detection technique, Vul-RAG, which leverages a knowledge-level retrieval-augmented generation (RAG) framework to detect vulnerabilities in given code in three phases. First, Vul-RAG constructs a vulnerability knowledge base by extracting multi-dimension knowledge via LLMs from existing CVE instances; second, for a given code snippet, Vul-RAG retrieves the relevant vulnerability knowledge from the constructed knowledge base based on functional semantics; third, Vul-RAG leverages LLMs to check the vulnerability of the given code snippet by reasoning about the presence of the vulnerability causes and fixing solutions described in the retrieved vulnerability knowledge. Our evaluation of Vul-RAG on our constructed benchmark PairVul shows that Vul-RAG substantially outperforms all baselines, with 12.96%/110% relative improvement in accuracy/pairwise accuracy. In addition, our user study shows that the vulnerability knowledge generated by Vul-RAG can serve as high-quality explanations, improving manual detection accuracy from 0.60 to 0.77.
Updated: 2024-06-19 17:27:06
标题: Vul-RAG:通过知识级RAG增强基于LLM的漏洞检测
摘要: 漏洞检测对软件质量保证至关重要。近年来,深度学习模型(尤其是大型语言模型)在漏洞检测方面表现出了潜力。在这项工作中,我们提出了一种基于LLM的漏洞检测技术Vul-RAG,利用知识级检索增强生成(RAG)框架来检测给定代码中的漏洞,分为三个阶段。首先,Vul-RAG通过从现有CVE实例中利用LLMs提取多维知识构建漏洞知识库;其次,对于给定的代码片段,Vul-RAG根据功能语义从构建的知识库中检索相关的漏洞知识;第三,Vul-RAG利用LLMs通过推理检查给定代码片段的漏洞,评估检测漏洞原因和修复方案。我们在构建的基准测试PairVul上评估了Vul-RAG,结果表明Vul-RAG在准确性/成对准确性上相对提高了12.96\%/110%。此外,我们的用户研究表明,Vul-RAG生成的漏洞知识可以作为高质量的解释,可以将手动检测的准确性从0.60提高到0.77。
更新时间: 2024-06-19 17:27:06
领域: cs.SE,cs.AI
On the Utility of Domain-Adjacent Fine-Tuned Model Ensembles for Few-shot Problems
Large Language Models (LLMs) have been observed to perform well on a wide range of downstream tasks when fine-tuned on domain-specific data. However, such data may not be readily available in many applications, motivating zero-shot or few-shot approaches using domain-adjacent models. While several fine-tuned models for various tasks are available, finding an appropriate domain-adjacent model for a given task is often not straightforward. In this paper, we study DAFT-E, a framework that utilizes an Ensemble of Domain-Adjacent Fine-Tuned Foundation Models for few-shot problems. We show that for zero-shot problems, this ensembling method provides accuracy close to that of the single best model. With few-shot problems, this performance improves further, at which point DAFT-E can outperform any single domain-adjacent model while requiring much less data for domain-specific fine-tuning.
Updated: 2024-06-19 17:24:36
标题: 关于针对少样本问题进行域相邻微调模型集成的实用性
摘要: 大型语言模型(LLMs)在领域特定数据上微调后,已被观察到在广泛的下游任务上表现良好。然而,在许多应用中,这样的数据可能并不容易获得,从而促使使用领域相邻模型的零样本或少样本方法。虽然有多个针对各种任务进行微调的模型可用,但为给定任务找到适当的领域相邻模型通常并不简单。在本文中,我们研究了DAFT-E,一个利用一组领域相邻微调基础模型的框架,用于少样本问题。我们展示了对于零样本问题,这种集成方法提供了接近单一最佳模型的准确性。对于少样本问题,这种性能进一步提高,此时DAFT-E可以超越任何单一领域相邻模型,同时需要更少的数据进行领域特定微调。
更新时间: 2024-06-19 17:24:36
领域: cs.CL,cs.LG
The Male CEO and the Female Assistant: Gender Biases in Text-To-Image Generation of Dual Subjects
Recent large-scale T2I models like DALLE-3 have made progress on improving fairness in single-subject generation, i.e. generating a one-person image. However, we reveal that these improved models still demonstrate considerable biases when simply generating two people. To systematically evaluate T2I models in this challenging generation setting, we propose the Paired Stereotype Test (PST) framework, established as a dual-subject generation task, i.e. generating two people in the same image. The setting in PST is especially challenging, as the two individuals are described with social identities that are male-stereotyped and female-stereotyped, respectively, e.g. "a CEO" and "an Assistant". It is easy for T2I models to unfairly follow gender stereotypes in this contrastive setting. We establish a metric, Stereotype Score (SS), to quantitatively measure the adherence to gender stereotypes in generated images. Using PST, we evaluate two aspects of gender biases in DALLE-3 -- the widely-identified bias in gendered occupation, as well as a novel aspect: bias in organizational power. Results show that despite generating seemingly fair or even anti-stereotype single-person images, DALLE-3 still shows notable biases under PST -- for instance, in experiments on gender-occupational stereotypes, over 74% model generations demonstrate biases. Moreover, compared to single-person settings, DALLE-3 is more likely to perpetuate male-associated stereotypes under PST. Our work pioneers the research on bias in dual-subject generation, and our proposed PST framework can be easily extended for further experiments, establishing a valuable contribution.
Updated: 2024-06-19 17:23:44
标题: 男性CEO和女性助理:双重主体文本到图像生成中的性别偏见
摘要: 最近大规模的T2I模型,如DALLE-3,在改善单一主体生成中取得了进展,即生成一个人的图像。然而,我们揭示了这些改进的模型在简单生成两个人时仍然表现出相当大的偏见。为了系统评估T2I模型在这种具有挑战性的生成设置中的表现,我们提出了配对陈规测试(PST)框架,建立为一个双主体生成任务,即在同一图像中生成两个人。在PST中的设置尤其具有挑战性,因为两个个体分别被描述为具有男性陈规和女性陈规的社会身份,例如“一名CEO”和“一名助理”。在这种对比设置中,T2I模型很容易不公平地遵循性别陈规。我们建立了一个度量标准,陈规分数(SS),用于定量衡量生成图像中对性别陈规的遵守程度。利用PST,我们评估了DALLE-3中性别偏见的两个方面--性别职业中广泛认可的偏见,以及一种新颖的方面:组织权力中的偏见。结果显示,尽管生成看似公平甚至反陈规的单人图像,DALLE-3在PST下仍然展示出明显的偏见--例如,在性别职业陈规实验中,超过74%的模型生成展示了偏见。此外,与单人设置相比,DALLE-3在PST下更有可能持续传播男性相关的陈规。我们的工作开创了对双主体生成中偏见的研究,我们提出的PST框架可以很容易地扩展进行进一步的实验,建立了一个有价值的贡献。
更新时间: 2024-06-19 17:23:44
领域: cs.CV,cs.AI,cs.CY
Supervised low-rank semi-nonnegative matrix factorization with frequency regularization for forecasting spatio-temporal data
We propose a novel methodology for forecasting spatio-temporal data using supervised semi-nonnegative matrix factorization (SSNMF) with frequency regularization. Matrix factorization is employed to decompose spatio-temporal data into spatial and temporal components. To improve clarity in the temporal patterns, we introduce a nonnegativity constraint on the time domain along with regularization in the frequency domain. Specifically, regularization in the frequency domain involves selecting features in the frequency space, making an interpretation in the frequency domain more convenient. We propose two methods in the frequency domain: soft and hard regularizations, and provide convergence guarantees to first-order stationary points of the corresponding constrained optimization problem. While our primary motivation stems from geophysical data analysis based on GRACE (Gravity Recovery and Climate Experiment) data, our methodology has the potential for wider application. Consequently, when applying our methodology to GRACE data, we find that the results with the proposed methodology are comparable to previous research in the field of geophysical sciences but offer clearer interpretability.
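A schematic form of the kind of objective described, in our paraphrase: a factorization with a nonnegative temporal factor $H$ and a penalty $R$ acting on its Fourier transform $\mathcal{F}(H)$. The supervision term and the exact soft/hard frequency regularizers from the paper are omitted here.

$$\min_{W,\,H \ge 0}\; \lVert X - W H \rVert_F^2 \;+\; \lambda\, R\big(\mathcal{F}(H)\big)$$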
Updated: 2024-06-19 17:22:27
标题: 受监督的低秩半非负矩阵分解与频率正则化用于预测时空数据
摘要: 我们提出了一种新的方法来利用监督半非负矩阵分解(SSNMF)和频率正则化来预测时空数据。矩阵分解被用来将时空数据分解为空间和时间组成部分。为了提高时间模式的清晰度,我们在时间域引入了非负约束,并在频率域进行正则化。具体而言,频率域中的正则化涉及在频率空间中选择特征,使频率域中的解释更加便利。我们在频率域提出了两种方法:软正则化和硬正则化,并对相应的受约束优化问题的一阶稳定点提供了收敛保证。尽管我们的主要动机来自基于GRACE(重力恢复与气候实验)数据的地球物理数据分析,但我们的方法具有更广泛的应用潜力。因此,当将我们的方法应用于GRACE数据时,我们发现与地球物理科学领域的先前研究相比,采用所提出的方法得到的结果具有可比性,但提供了更清晰的可解释性。
更新时间: 2024-06-19 17:22:27
领域: stat.ML,cs.LG,65F22,65F55,86A04
Converging Dimensions: Information Extraction and Summarization through Multisource, Multimodal, and Multilingual Fusion
Recent advances in large language models (LLMs) have led to new summarization strategies, offering an extensive toolkit for extracting important information. However, these approaches are frequently limited by their reliance on isolated sources of data. The amount of information that can be gathered is limited and covers a smaller range of themes, which introduces the possibility of falsified content and limited support for multilingual and multimodal data. The paper proposes a novel approach to summarization that tackles such challenges by utilizing the strength of multiple sources to deliver a more exhaustive and informative understanding of intricate topics. The research progresses beyond conventional, unimodal sources such as text documents and integrates a more diverse range of data, including YouTube playlists, pre-prints, and Wikipedia pages. The aforementioned varied sources are then converted into a unified textual representation, enabling a more holistic analysis. This multifaceted approach to summary generation empowers us to extract pertinent information from a wider array of sources. The primary tenet of this approach is to maximize information gain while minimizing information overlap and maintaining a high level of informativeness, which encourages the generation of highly coherent summaries.
Updated: 2024-06-19 17:15:47
标题: 融合维度的趋同:通过多源、多模态和多语言融合进行信息提取与摘要
摘要: 最近的大语言模型(LLMs)的进展引发了新的摘要策略,提供了一个广泛的工具包,用于提取重要信息。然而,这些方法经常受到其依赖于孤立数据源的限制。可收集的信息量有限,涵盖的主题范围较小,这引入了虚假内容的可能性,并对多语言和多模态数据的支持有限。本文提出了一种新颖的摘要方法,通过利用多个数据源的优势,提供对复杂主题更全面和信息丰富的理解,以解决这些挑战。该研究超越了传统的单模态数据源,如文本文档,整合了更多样化的数据,包括YouTube播放列表、预印本和维基百科页面。然后,上述各种来源被转换为统一的文本表示,实现更全面的分析。这种多面向的摘要生成方法使我们能够从更广泛的来源中提取相关信息。该方法的主要原则是最大化信息增益,同时最小化信息重叠,保持高水平的信息量,从而鼓励生成高度连贯的摘要。
更新时间: 2024-06-19 17:15:47
领域: cs.AI,cs.IR
BEACON: Balancing Convenience and Nutrition in Meals With Long-Term Group Recommendations and Reasoning on Multimodal Recipes
A common, recurring decision made by people, whether healthy or living with a health condition, is deciding what to have for meals like breakfast, lunch, and dinner, consisting of a combination of foods for appetizer, main course, side dishes, desserts, and beverages. However, this decision is often seen as a trade-off between nutritious choices (e.g., low salt and sugar) and convenience (e.g., inexpensive, fast to prepare or obtain, better tasting). In this preliminary work, we present a data-driven approach for the novel meal recommendation problem that can explore and balance choices for both considerations while also reasoning about a food's constituents and cooking process. Beyond the problem formulation, our contributions also include a goodness measure, a recipe conversion method from text to the recently introduced multimodal rich recipe representation (R3) format, and learning methods using contextual bandits that show promising results.
Updated: 2024-06-19 17:14:41
标题: "BEACON: 在长期群体推荐和多模式食谱上平衡便利性和营养的餐饮"
摘要: 无论是健康人还是有健康问题的人,都经常需要做出一个普遍的决定:决定早餐、午餐和晚餐等餐饮中要选择什么,包括开胃菜、主菜、配菜、甜点和饮料的组合。然而,通常这个决定被视为在营养选择(例如低盐和低糖)和便利性(例如廉价、快速准备/获得、味道更好)之间的权衡。在这项初步工作中,我们提出了一种基于数据驱动的方法,用于新型餐饮推荐问题,可以在探索和平衡这两种考虑的选择的同时,还可以推理食物的组成成分和烹饪过程。除了问题的描述,我们的贡献还包括一个优度度量、一种将文本转换为最近引入的多模态丰富菜谱表示(R3)格式的食谱转换方法,以及使用上下文多臂赌博机(contextual bandits)的学习方法,展示了令人期待的结果。
更新时间: 2024-06-19 17:14:41
领域: cs.AI,cs.CL
Imagining In-distribution States: How Predictable Robot Behavior Can Enable User Control Over Learned Policies
It is crucial that users are empowered to take advantage of the functionality of a robot and use their understanding of that functionality to perform novel and creative tasks. Given a robot trained with Reinforcement Learning (RL), a user may wish to leverage that autonomy along with their familiarity of how they expect the robot to behave to collaborate with the robot. One technique is for the user to take control of some of the robot's action space through teleoperation, allowing the RL policy to simultaneously control the rest. We formalize this type of shared control as Partitioned Control (PC). However, this may not be possible using an out-of-the-box RL policy. For example, a user's control may bring the robot into a failure state from the policy's perspective, causing it to act unexpectedly and hindering the success of the user's desired task. In this work, we formalize this problem and present Imaginary Out-of-Distribution Actions, IODA, an initial algorithm which empowers users to leverage their expectations of a robot's behavior to accomplish new tasks. We deploy IODA in a user study with a real robot and find that IODA leads to both better task performance and a higher degree of alignment between robot behavior and user expectation. We also show that in PC, there is a strong and significant correlation between task performance and the robot's ability to meet user expectations, highlighting the need for approaches like IODA. Code is available at https://github.com/AABL-Lab/ioda_roman_2024
Updated: 2024-06-19 17:08:28
标题: 想象分布内状态:可预测的机器人行为如何使用户能够控制学习到的策略
摘要: 用户能够利用机器人的功能,并利用对功能的理解执行新颖和创造性的任务至关重要。给定一个经过强化学习(RL)训练的机器人,用户可能希望利用该自主性以及他们对机器人行为的熟悉程度与机器人合作。一种技术是用户通过远程操作控制机器人的某些动作空间,同时允许RL策略控制其余部分。我们将此类型的共享控制形式化为Partitioned Control(PC)。然而,使用现成的RL策略可能不可能实现这一点。例如,用户的控制可能导致机器人从策略的角度进入失败状态,导致其表现出乎意料,从而妨碍用户期望的任务成功。在这项工作中,我们形式化了这一问题,并提出了一种名为Imaginary Out-of-Distribution Actions(IODA)的初始算法,使用户能够利用他们对机器人行为的期望来完成新任务。我们在一个真实机器人的用户研究中部署了IODA,并发现IODA既提高了任务性能,又增加了机器人行为与用户期望之间的一致性。我们还展示了在PC中,任务性能与机器人满足用户期望的能力之间存在强烈且显著的相关性,突出了IODA等方法的必要性。代码可在https://github.com/AABL-Lab/ioda_roman_2024找到。
更新时间: 2024-06-19 17:08:28
领域: cs.RO,cs.AI,cs.HC
Learning Linear Utility Functions From Pairwise Comparison Queries
We study learnability of linear utility functions from pairwise comparison queries. In particular, we consider two learning objectives. The first objective is to predict out-of-sample responses to pairwise comparisons, whereas the second is to approximately recover the true parameters of the utility function. We show that in the passive learning setting, linear utilities are efficiently learnable with respect to the first objective, both when query responses are uncorrupted by noise, and under Tsybakov noise when the distributions are sufficiently "nice". In contrast, we show that utility parameters are not learnable for a large set of data distributions without strong modeling assumptions, even when query responses are noise-free. Next, we proceed to analyze the learning problem in an active learning setting. In this case, we show that even the second objective is efficiently learnable, and present algorithms for both the noise-free and noisy query response settings. Our results thus exhibit a qualitative learnability gap between passive and active learning from pairwise preference queries, demonstrating the value of the ability to select pairwise queries for utility learning.
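A hedged sketch of the passive setting: a linear utility $u(x) = w^\top x$ estimated from comparisons by logistic regression on feature differences, a standard Bradley-Terry-style reduction; the paper's exact estimators and noise model may differ. Out-of-sample responses are then predicted by the sign of $w^\top(a - b)$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_linear_utility(items_a, items_b, prefers_a):
    """items_a, items_b: (n, d) feature vectors of compared items;
    prefers_a: (n,) booleans, True if item a was preferred."""
    diffs = items_a - items_b
    # u(a) > u(b)  iff  w @ (a - b) > 0, so no intercept is needed.
    clf = LogisticRegression(fit_intercept=False).fit(diffs, prefers_a)
    w = clf.coef_.ravel()
    return w / np.linalg.norm(w)   # utilities are identifiable only up to scale

# Predict an out-of-sample comparison: prefer a over b iff w @ (a - b) > 0.
```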
Updated: 2024-06-19 17:08:13
标题: 从成对比较查询中学习线性效用函数
摘要: 我们研究了从成对比较查询中学习线性效用函数的可学习性。特别是,我们考虑了两个学习目标。第一个目标是预测成对比较的样本外响应,而第二个目标是近似恢复效用函数的真实参数。我们表明,在被动学习环境中,线性效用在第一个目标方面是可有效学习的,无论查询响应是否受到噪声干扰,或者在分布足够“好”的情况下受到Tsybakov噪声干扰。相比之下,我们表明,在没有强建模假设的情况下,对于大量数据分布,即使查询响应是无噪声的,效用参数也无法被学习。接下来,我们继续分析主动学习环境中的学习问题。在这种情况下,我们展示了即使第二个目标也是可有效学习的,并提出了适用于无噪声和有噪声查询响应环境的算法。因此,我们的结果展示了被动学习和主动学习之间的成对偏好查询学习的定性可学习性差距,展示了选择成对查询进行效用学习的价值。
更新时间: 2024-06-19 17:08:13
领域: cs.LG,cs.AI,cs.CY,stat.ML
Breaking News: Case Studies of Generative AI's Use in Journalism
Journalists are among the many users of large language models (LLMs). To better understand the journalist-AI interactions, we conduct a study of LLM usage by two news agencies through browsing the WildChat dataset, identifying candidate interactions, and verifying them by matching to online published articles. Our analysis uncovers instances where journalists provide sensitive material such as confidential correspondence with sources or articles from other agencies to the LLM as stimuli and prompt it to generate articles, and publish these machine-generated articles with limited intervention (median output-publication ROUGE-L of 0.62). Based on our findings, we call for further research into what constitutes responsible use of AI, and the establishment of clear guidelines and best practices on using LLMs in a journalistic context.
Updated: 2024-06-19 16:58:32
标题: 突发新闻:生成式人工智能在新闻业中的应用案例研究
摘要: 记者是众多大型语言模型(LLMs)的用户之一。为了更好地了解记者与人工智能的互动,我们通过浏览WildChat数据集,对两家新闻机构使用LLM的情况进行了研究,识别了候选互动,并通过与在线发布的文章进行匹配来验证它们。我们的分析揭示了记者提供敏感材料(如与消息来源的机密通信或其他机构的文章)给LLM作为刺激,并促使其生成文章的情况,以及在有限干预下发布这些机器生成的文章(中位输出-发布ROUGE-L为0.62)。根据我们的发现,我们呼吁进一步研究何为对AI的负责使用,并建立在新闻环境下使用LLMs的明确准则和最佳实践。
更新时间: 2024-06-19 16:58:32
领域: cs.CL,cs.AI,cs.CY
EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy
Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels remains underexplored. To tackle this, we introduce EndoUIC, a WCE unified illumination correction solution using an end-to-end promptable diffusion transformer (DFT) model. In our work, the illumination prompt module shall navigate the model to adapt to different exposure levels and perform targeted image enhancement, in which the Adaptive Prompt Integration (API) and Global Prompt Scanner (GPS) modules shall further boost the concurrent representation learning between the prompt parameters and features. Besides, the U-shaped restoration DFT model shall capture the long-range dependencies and contextual information for unified illumination restoration. Moreover, we present a novel Capsule-endoscopy Exposure Correction (CEC) dataset, including ground-truth and corrupted image pairs annotated by expert photographers. Extensive experiments against a variety of state-of-the-art (SOTA) methods on four datasets showcase the effectiveness of our proposed method and components in WCE illumination restoration, and the additional downstream experiments further demonstrate its utility for clinical diagnosis and surgical assistance.
Updated: 2024-06-19 16:58:28
标题: EndoUIC:用于胶囊内窥镜统一照明校正的可提示扩散变换器
摘要: 无线胶囊内窥镜(WCE)因其非侵入性和无痛的特点而备受重视,尽管其有效性受到硬件约束和复杂的内部动态所影响,导致图像过曝或曝光不足。虽然研究人员已经讨论了WCE低光增强的挑战,但不同曝光水平的校正问题仍未得到充分探讨。为了解决这个问题,我们引入了EndoUIC,一种使用端到端可提示扩散变换器(DFT)模型的WCE统一照明校正解决方案。在我们的工作中,照明提示模块将引导模型适应不同的曝光水平并执行有针对性的图像增强,其中自适应提示集成(API)和全局提示扫描仪(GPS)模块进一步增强了提示参数和特征之间的并行表示学习。此外,U形恢复DFT模型将捕获长距离依赖性和上下文信息,用于统一照明恢复。此外,我们提出了一个新颖的胶囊内窥镜曝光校正(CEC)数据集,包括由专业摄影师注释的地面真实和损坏图像对。对四个数据集进行的广泛实验表明,我们提出的方法和组件在WCE照明恢复方面的有效性,并进一步的下游实验证明了其在临床诊断和手术辅助方面的实用性。
更新时间: 2024-06-19 16:58:28
领域: eess.IV,cs.AI,cs.CV
Listenable Maps for Audio Classifiers
Despite the impressive performance of deep learning models across diverse tasks, their complexity poses challenges for interpretation. This challenge is particularly evident for audio signals, where conveying interpretations becomes inherently difficult. To address this issue, we introduce Listenable Maps for Audio Classifiers (L-MAC), a posthoc interpretation method that generates faithful and listenable interpretations. L-MAC utilizes a decoder on top of a pretrained classifier to generate binary masks that highlight relevant portions of the input audio. We train the decoder with a loss function that maximizes the confidence of the classifier decision on the masked-in portion of the audio while minimizing the probability of model output for the masked-out portion. Quantitative evaluations on both in-domain and out-of-domain data demonstrate that L-MAC consistently produces more faithful interpretations than several gradient and masking-based methodologies. Furthermore, a user study confirms that, on average, users prefer the interpretations generated by the proposed technique.
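A minimal sketch of the training objective as the abstract describes it: keep the classifier confident on the masked-in audio while pushing down its output on the masked-out remainder. The time-frequency masking and the weighting factor `alpha` are assumptions, not L-MAC's exact formulation.

```python
import torch
import torch.nn.functional as F

def lmac_loss(classifier, spec, mask, target_class, alpha=1.0):
    """spec: (B, F, T) input representation; mask: (B, F, T) in [0, 1];
    target_class: (B,) class indices from the classifier's decision."""
    logits_in = classifier(spec * mask)            # masked-in portion
    logits_out = classifier(spec * (1 - mask))     # masked-out portion
    keep_confident = F.cross_entropy(logits_in, target_class)
    # Probability of the decided class on the masked-out audio; minimized.
    suppress_rest = torch.softmax(logits_out, dim=-1)[
        torch.arange(len(target_class)), target_class
    ].mean()
    return keep_confident + alpha * suppress_rest
```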
Updated: 2024-06-19 16:49:14
标题: 用于音频分类器的可听地图
摘要: 尽管深度学习模型在各种任务中表现出色,但其复杂性给解释带来了挑战。这一挑战在音频信号中尤为明显,解释变得困难。为了解决这个问题,我们引入了适用于音频分类器的可听地图(L-MAC),这是一种后置解释方法,可以生成忠实且可听的解释。L-MAC利用一个解码器在预训练分类器之上生成二进制掩码,突出显示输入音频的相关部分。我们使用一个损失函数训练解码器,该函数最大化分类器对音频掩码部分的置信度,同时最小化模型对掩码部分的输出概率。对领域内和领域外数据的定量评估表明,L-MAC始终比几种基于梯度和掩码的方法产生更忠实的解释。此外,一项用户研究证实,平均而言,用户更喜欢由提出的技术生成的解释。
更新时间: 2024-06-19 16:49:14
领域: cs.SD,cs.LG,eess.AS,eess.SP
Multilingual De-Duplication Strategies: Applying scalable similarity search with monolingual & multilingual embedding models
This paper addresses the deduplication of multilingual textual data using advanced NLP tools. We compare a two-step method involving translation to English followed by embedding with mpnet, and a multilingual embedding model (distiluse). The two-step approach achieved a higher F1 score (82% vs. 60%), particularly with less widely used languages, which can be increased up to 89% by leveraging expert rules based on domain knowledge. We also highlight limitations related to token length constraints and computational efficiency. Our methodology suggests improvements for future multilingual deduplication tasks.
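A sketch of the multilingual-embedding route using sentence-transformers; "distiluse-base-multilingual-cased-v2" is one public checkpoint of the distiluse family, and the 0.9 cosine threshold is an illustrative choice. For the scalable search the title refers to, the quadratic loop would be replaced by an approximate nearest-neighbour index.

```python
from sentence_transformers import SentenceTransformer, util

def deduplicate(texts, threshold=0.9):
    model = SentenceTransformer("distiluse-base-multilingual-cased-v2")
    emb = model.encode(texts, convert_to_tensor=True, normalize_embeddings=True)
    keep = []  # indices of texts retained as non-duplicates
    for i in range(len(texts)):
        is_dup = any(util.cos_sim(emb[i], emb[j]).item() >= threshold for j in keep)
        if not is_dup:
            keep.append(i)
    return [texts[i] for i in keep]
```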
Updated: 2024-06-19 16:48:14
标题: 多语言去重策略:应用可扩展的相似度搜索与单语和多语言嵌入模型
摘要: 本文讨论了使用先进的自然语言处理工具对多语言文本数据进行去重。我们比较了一个两步方法,涉及将文本翻译成英语,然后使用mpnet进行嵌入,以及一个多语言嵌入模型(distiluse)。两步方法实现了更高的F1分数(82% vs. 60%),特别是在使用较少的语言时,通过利用基于领域知识的专家规则,可以将其提高到89%。我们还强调了与令牌长度限制和计算效率相关的限制。我们的方法建议针对未来的多语言去重任务进行改进。
更新时间: 2024-06-19 16:48:14
领域: cs.AI,I.2.7
An Embedded Intelligent System for Attendance Monitoring
In this paper, we propose an intelligent embedded system for monitoring class attendance and sending the attendance list to a remote computer. The proposed system consists of two parts: an embedded device (a Raspberry Pi with a Pi camera) for facial recognition and a web application for attendance management. The proposed solution takes into account several challenges: the limited resources of the Raspberry Pi, the need to adapt the facial recognition model, and achieving acceptable performance using images provided by the Raspberry Pi camera.
Updated: 2024-06-19 16:46:19
标题: 一个用于考勤监控的嵌入式智能系统
摘要: 在这篇论文中,我们提出了一种智能嵌入式系统,用于监测课堂出勤情况并将出勤名单发送至远程计算机。该系统由两部分组成:一个嵌入式设备(带有树莓派摄像头)用于人脸识别,以及一个用于出勤管理的网络应用程序。提出的解决方案考虑了不同的挑战:树莓派的有限资源、需要调整人脸识别模型以及使用树莓派摄像头提供的图像实现可接受的性能。
更新时间: 2024-06-19 16:46:19
领域: cs.AI
From Single Agent to Multi-Agent: Improving Traffic Signal Control
Due to accelerating urbanization, the importance of solving the signal control problem is increasing. This paper analyzes various existing methods and suggests options for increasing the number of agents to reduce the average travel time. Experiments were carried out with two datasets. The results show that in some cases, the implementation of multiple agents can improve existing methods. For a fine-tuned large language model approach, there is a small improvement on all metrics.
Updated: 2024-06-19 16:46:15
标题: 从单一代理到多代理:改进交通信号控制
摘要: 由于城市化加速,解决信号控制问题的重要性日益增加。本文分析了各种现有方法,并提出增加代理数量以减少平均旅行时间的选项。实验使用了2个数据集进行。结果显示,在某些情况下,实施多个代理可以改善现有方法。对于经过微调的大型语言模型方法,在所有指标上都有轻微改进。
更新时间: 2024-06-19 16:46:15
领域: cs.AI
Development of a Dual-Input Neural Model for Detecting AI-Generated Imagery
Over the past years, images generated by artificial intelligence have become more prevalent and more realistic. Their advent raises ethical questions relating to misinformation, artistic expression, and identity theft, among others. The crux of many of these moral questions is the difficulty in distinguishing between real and fake images. It is important to develop tools that are able to detect AI-generated images, especially when these images are too realistic-looking for the human eye to identify as fake. This paper proposes a dual-branch neural network architecture that takes both images and their Fourier frequency decomposition as inputs. We use standard CNN-based methods for both branches as described in Stuchi et al. [7], followed by fully-connected layers. Our proposed model achieves an accuracy of 94% on the CIFAKE dataset, which significantly outperforms classic ML methods and CNNs, achieving performance comparable to some state-of-the-art architectures, such as ResNet.
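A hedged PyTorch sketch of the dual-branch idea: one branch sees the RGB image, the other its log-magnitude Fourier spectrum, and the concatenated features feed a fully-connected classifier. Channel widths and depths are illustrative, not the configuration of Stuchi et al. [7].

```python
import torch
import torch.nn as nn

def fourier_branch_input(img):                        # img: (B, 3, H, W)
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    return torch.log1p(spec.abs())                    # log-magnitude spectrum

class DualInputDetector(nn.Module):
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            )
        self.pixel_branch, self.freq_branch = branch(), branch()
        self.classifier = nn.Linear(2 * 32 * 4 * 4, 2)  # real vs. AI-generated

    def forward(self, img):
        z = torch.cat([self.pixel_branch(img),
                       self.freq_branch(fourier_branch_input(img))], dim=1)
        return self.classifier(z)
```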
Updated: 2024-06-19 16:42:04
标题: AI生成图像检测的双输入神经模型的开发
摘要: 在过去的几年中,由人工智能生成的图像变得越来越普遍和逼真。它们的出现引发了与虚假信息、艺术表达和身份盗窃等伦理问题有关的讨论。许多道德问题的核心在于区分真假图像的困难。开发能够检测人工智能生成的图像的工具至关重要,特别是当这些图像对人眼来说看起来太逼真而无法辨别真伪时。本文提出了一种双分支神经网络架构,将图像及其傅立叶频率分解作为输入。我们使用了Stuchi等人描述的标准基于CNN的方法作为两个分支,然后是全连接层。我们提出的模型在CIFAKE数据集上达到了94%的准确率,明显优于经典的机器学习方法和CNN,性能可与一些最先进的架构(如ResNet)相媲美。
更新时间: 2024-06-19 16:42:04
领域: cs.CV,cs.AI
IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning
Image-text contrastive models such as CLIP learn transferable and robust representations for zero-shot transfer to a variety of downstream tasks. However, to obtain strong downstream performances, prompts need to be carefully curated, which can be a tedious engineering task. To address the issue of manual prompt engineering, prompt-tuning is used where a set of contextual vectors are learned by leveraging information from the training data. Despite their effectiveness, existing prompt-tuning frameworks often lack interpretability, thus limiting their ability to understand the compositional nature of images. In this work, we first identify that incorporating compositional attributes (e.g., a "green" tree frog) in the design of manual prompts can significantly enhance image-text alignment scores. Building upon this observation, we propose a novel and interpretable prompt-tuning method named IntCoOp, which learns to jointly align attribute-level inductive biases and class embeddings during prompt-tuning. To assess the effectiveness of our approach, we evaluate IntCoOp across two representative tasks in a few-shot learning setup: generalization to novel classes, and unseen domain shifts. Through extensive experiments across 10 downstream datasets on CLIP, we find that introducing attribute-level inductive biases leads to superior performance against state-of-the-art prompt tuning frameworks. Notably, in a 16-shot setup, IntCoOp improves CoOp by 7.35% in average performance across 10 diverse datasets.
Updated: 2024-06-19 16:37:31
标题: IntCoOp: 可解释性视觉语言提示调整
摘要: 图像文本对比模型,如CLIP学习可转移和稳健的表示,以进行零样本转移到各种下游任务。然而,为了获得强大的下游性能,需要仔细策划提示,这可能是一项繁琐的工程任务。为了解决手动提示工程问题,使用提示调整,通过利用训练数据中的信息学习一组上下文向量。尽管它们的有效性,现有的提示调整框架通常缺乏可解释性,从而限制了它们理解图像的组合性质的能力。在这项工作中,我们首先确定在设计手动提示时纳入组合属性(例如,“绿色”树蛙)可以显着提高图像文本对齐分数。基于这一观察结果,我们提出了一种新颖且可解释的提示调整方法,名为IntCoOp,该方法在提示调整期间学习联合对齐属性级归纳偏差和类别嵌入。为了评估我们的方法的有效性,我们在少量样本学习设置中评估了IntCoOp在两个代表性任务中的表现:对新类别的泛化和看不见的领域转移。通过在CLIP上的10个下游数据集上进行大量实验,我们发现引入属性级归纳偏差会导致优于现有最先进提示调整框架的性能。值得注意的是,在16个样本设置中,IntCoOp在10个不同数据集上平均性能提高了7.35%。
更新时间: 2024-06-19 16:37:31
领域: cs.CV,cs.AI
On the Consistency of Fairness Measurement Methods for Regression Tasks
With growing applications of Machine Learning (ML) techniques in the real world, it is highly important to ensure that these models work in an equitable manner. One main step in ensuring fairness is to effectively measure fairness, and to this end, various metrics have been proposed in the past literature. While the computation of those metrics is straightforward in the classification set-up, it is computationally intractable in the regression domain. To address the challenge of computational intractability, past literature proposed various methods to approximate such metrics. However, they did not verify the extent to which the outputs of such approximation algorithms are consistent with each other. To fill this gap, this paper comprehensively studies the consistency of the outputs of various fairness measurement methods through an extensive set of experiments on various regression tasks. As a result, it finds that while some fairness measurement approaches show strong consistency across various regression tasks, certain methods show relatively poor consistency on certain regression tasks. This, in turn, calls for a more principled approach for measuring fairness in the regression domain.
Updated: 2024-06-19 16:35:23
标题: 关于回归任务中公平性测量方法的一致性
摘要: 随着机器学习(ML)技术在现实世界中的应用不断增长,确保这些模型以公平的方式工作至关重要。确保公平性的一个主要步骤是有效地衡量公平性,为此,过去的文献提出了各种度量标准。虽然在分类设置中这些度量的计算是直接的,但在回归领域中这种计算是难以计算的。为了解决计算难题,过去的文献提出了各种方法来近似这些度量。然而,他们没有验证这些近似算法的输出在多大程度上是一致的。为了填补这一空白,本文通过在各种回归任务上进行广泛的实验,全面研究了各种公平性测量方法的输出一致性。结果发现,尽管一些公平性测量方法在各种回归任务中表现出较强的一致性,但某些方法在某些回归任务中表现出相对较差的一致性。因此,这要求在回归领域采用更具原则性的方法来衡量公平性。
更新时间: 2024-06-19 16:35:23
领域: cs.LG
Prose-to-P4: Leveraging High Level Languages
Languages such as P4 and NPL have enabled a wide and diverse range of networking applications that take advantage of programmable dataplanes. However, software development in these languages is difficult. To address this issue, high-level languages have been designed to offer programmers powerful abstractions that reduce the time, effort and domain-knowledge required for developing networking applications. These languages are then translated by a compiler into P4/NPL code. Inspired by the recent success of Large Language Models (LLMs) in the task of code generation, we propose to raise the level of abstraction even higher, employing LLMs to translate prose into high-level networking code. We analyze the problem, focusing on the motivation and opportunities, as well as the challenges involved and sketch out a roadmap for the development of a system that can generate high-level dataplane code from natural language instructions. We present some promising preliminary results on generating Lucid code from natural language.
Updated: 2024-06-19 16:32:27
Categories: cs.NI,cs.LG
Can LLM-Augmented Autonomous Agents Cooperate? An Evaluation of Their Cooperative Capabilities through Melting Pot
As the field of AI continues to evolve, a significant dimension of this progression is the development of Large Language Models and their potential to enhance multi-agent artificial intelligence systems. This paper explores the cooperative capabilities of Large Language Model-augmented Autonomous Agents (LAAs) using the well-known Melting Pot environments along with reference models such as GPT-4 and GPT-3.5. Preliminary results suggest that while these agents demonstrate a propensity for cooperation, they still struggle with effective collaboration in the given environments, emphasizing the need for more robust architectures. The study's contributions include an abstraction layer to adapt Melting Pot game scenarios for LLMs, the implementation of a reusable architecture for LLM-mediated agent development (including short- and long-term memory and different cognitive modules), and the evaluation of cooperation capabilities using a set of metrics tied to Melting Pot's "Commons Harvest" game. The paper closes by discussing the limitations of the current architectural framework and the potential of a new set of modules to foster better cooperation among LAAs.
Updated: 2024-06-19 16:23:05
Categories: cs.AI,cs.CL
KU-DMIS at EHRSQL 2024: Generating SQL query via question templatization in EHR
Transforming natural language questions into SQL queries is crucial for precise data retrieval from electronic health record (EHR) databases. A significant challenge in this process is detecting and rejecting unanswerable questions that request information beyond the database's scope or exceed the system's capabilities. In this paper, we introduce a novel text-to-SQL framework that robustly handles out-of-domain questions and verifies the generated queries with query execution. Our framework begins by standardizing the structure of questions into a templated format. We use a powerful large language model (LLM), GPT-3.5, fine-tuned with detailed prompts involving the table schemas of the EHR database system. Our experimental results demonstrate the effectiveness of our framework on the EHRSQL-2024 benchmark, a shared task in the ClinicalNLP workshop. Although straightforward fine-tuning of GPT shows promising results on the development set, it struggles with the out-of-domain questions in the test set. With our framework, we improve our system's adaptability and achieve competitive performance on the official leaderboard of the EHRSQL-2024 challenge.
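As an illustration of the execution-based verification step (a minimal sketch under our own assumptions; llm_generate stands in for the fine-tuned GPT-3.5 call, and the templatization shown is a toy placeholder):

import sqlite3
from typing import Optional

def templatize(question: str) -> str:
    # Toy normalization step; the paper's templatization is more elaborate.
    return question.strip().lower().rstrip("?")

def answer_or_abstain(question: str, db_path: str, llm_generate) -> Optional[str]:
    # Generate SQL for a templatized question and verify it by execution;
    # abstain (return None) if the query fails against the EHR schema.
    sql = llm_generate(templatize(question))
    try:
        with sqlite3.connect(db_path) as conn:
            conn.execute(sql)
    except sqlite3.Error:
        return None  # treated as unanswerable / out-of-domain
    return sql

Rejecting queries that fail to execute is one cheap guard against the unanswerable questions the abstract describes.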
Updated: 2024-06-19 16:21:46
Categories: cs.DB,cs.AI,cs.CL,cs.IR
MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation
Medical image segmentation of anatomical structures and pathology is crucial in modern clinical diagnosis, disease study, and treatment planning. To date, great progress has been made in deep learning-based segmentation techniques, but most methods still lack data efficiency, generalizability, and interactability. Consequently, the development of new, precise segmentation methods that demand fewer labeled datasets is of utmost importance in medical image analysis. Recently, the emergence of foundation models, such as CLIP and Segment-Anything-Model (SAM), with comprehensive cross-domain representation opened the door for interactive and universal image segmentation. However, exploration of these models for data-efficient medical image segmentation is still limited, but is highly necessary. In this paper, we propose a novel framework, called MedCLIP-SAM that combines CLIP and SAM models to generate segmentation of clinical scans using text prompts in both zero-shot and weakly supervised settings. To achieve this, we employed a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss to fine-tune the BiomedCLIP model and the recent gScoreCAM to generate prompts to obtain segmentation masks from SAM in a zero-shot setting. Additionally, we explored the use of zero-shot segmentation labels in a weakly supervised paradigm to improve the segmentation quality further. By extensively testing three diverse segmentation tasks and medical image modalities (breast tumor ultrasound, brain tumor MRI, and lung X-ray), our proposed framework has demonstrated excellent accuracy. Code is available at https://github.com/HealthX-Lab/MedCLIP-SAM.
Updated: 2024-06-19 16:21:34
Categories: cs.CV,cs.LG
Improved bounds for calibration via stronger sign preservation games
A set of probabilistic forecasts is calibrated if each prediction of the forecaster closely approximates the empirical distribution of outcomes on the subset of timesteps where that prediction was made. We study the fundamental problem of online calibrated forecasting of binary sequences, which was initially studied by Foster & Vohra (1998). They derived an algorithm with $O(T^{2/3})$ calibration error after $T$ time steps, and showed a lower bound of $\Omega(T^{1/2})$. These bounds remained stagnant for two decades, until Qiao & Valiant (2021) improved the lower bound to $\Omega(T^{0.528})$ by introducing a combinatorial game called sign preservation and showing that lower bounds for this game imply lower bounds for calibration. We introduce a strengthening of Qiao & Valiant's game that we call sign preservation with reuse (SPR). We prove that the relationship between SPR and calibrated forecasting is bidirectional: not only do lower bounds for SPR translate into lower bounds for calibration, but algorithms for SPR also translate into new algorithms for calibrated forecasting. In particular, any strategy that improves the trivial upper bound for the value of the SPR game would imply a forecasting algorithm with calibration error exponent less than 2/3, improving Foster & Vohra's upper bound for the first time. Using similar ideas, we then prove a slightly stronger lower bound than that of Qiao & Valiant, namely $\Omega(T^{0.54389})$. Our lower bound is obtained by an oblivious adversary, marking the first $\omega(T^{1/2})$ calibration lower bound for oblivious adversaries.
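For concreteness, one standard formalization in this literature (conventions vary slightly across papers): if the forecaster predicts values $p_t$ from a finite set and the binary outcomes are $y_t$, the $\ell_1$ calibration error after $T$ steps is $\mathrm{CalErr}(T) = \sum_{p} \big| \sum_{t \le T:\, p_t = p} (y_t - p) \big|$, i.e., for each predicted value $p$, the gap between the number of observed 1s and the prediction, accumulated with multiplicity; the $O(T^{2/3})$ and $\Omega(T^{1/2})$ bounds above refer to this quantity.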
Updated: 2024-06-19 16:19:39
Categories: cs.LG,cs.DS,stat.ML
Challenges in Binary Classification
Binary Classification plays an important role in machine learning. For linear classification, SVM is the optimal binary classification method. For nonlinear classification, the SVM algorithm needs to complete the classification task by using a kernel function. Although the SVM algorithm with a kernel function is very effective, the selection of the kernel function is empirical, which means that the kernel function may not be optimal. Therefore, it is worth studying how to obtain an optimal binary classifier. In this paper, the problem of finding the optimal binary classifier is cast as a variational problem. We design the objective function of this variational problem through the max-min problem of the (Euclidean) distance between the two classes. For linear classification, it can be deduced that SVM is a special case of this variational framework. For the Euclidean distance, it is proved that the proposed variational problem has some limitations for nonlinear classification. Therefore, how to design a more appropriate objective function to find the optimal binary classifier is still an open problem. Further, we discuss some challenges and open problems in finding the optimal classifier.
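As a hedged illustration of the kind of max-min objective described (the notation here is ours, not necessarily the paper's): writing $A$ and $B$ for the two classes and $\Gamma_f$ for the decision boundary of a classifier $f$, one may seek $\max_f \min\{\mathrm{dist}(A, \Gamma_f),\, \mathrm{dist}(B, \Gamma_f)\}$ with $\mathrm{dist}(S, \Gamma) = \min_{x \in S,\, \gamma \in \Gamma} \|x - \gamma\|_2$; for linear $f$ on separable data this reduces to the familiar SVM margin maximization, consistent with the special-case claim above.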
Updated: 2024-06-19 16:11:59
Categories: cs.LG
Root-KGD: A Novel Framework for Root Cause Diagnosis Based on Knowledge Graph and Industrial Data
With the development of intelligent manufacturing and the increasing complexity of industrial production, root cause diagnosis has gradually become an important research direction in the field of industrial fault diagnosis. However, existing research methods struggle to effectively combine domain knowledge and industrial data, failing to provide accurate, online, and reliable root cause diagnosis results for industrial processes. To address these issues, a novel fault root cause diagnosis framework based on knowledge graph and industrial data, called Root-KGD, is proposed. Root-KGD uses the knowledge graph to represent domain knowledge and employs data-driven modeling to extract fault features from industrial data. It then combines the knowledge graph and data features to perform knowledge graph reasoning for root cause identification. The performance of the proposed method is validated using two industrial process cases, Tennessee Eastman Process (TEP) and Multiphase Flow Facility (MFF). Compared to existing methods, Root-KGD not only gives more accurate root cause variable diagnosis results but also provides interpretable fault-related information by locating faults to corresponding physical entities in knowledge graph (such as devices and streams). In addition, combined with its lightweight nature, Root-KGD is more effective in online industrial applications.
Updated: 2024-06-19 16:11:43
Categories: cs.AI
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation
Ensuring the verifiability of model answers is a fundamental challenge for retrieval-augmented generation (RAG) in the question answering (QA) domain. Recently, self-citation prompting was proposed to make large language models (LLMs) generate citations to supporting documents along with their answers. However, self-citing LLMs often struggle to match the required format, refer to non-existent sources, and fail to faithfully reflect LLMs' context usage throughout the generation. In this work, we present MIRAGE (Model Internals-based RAG Explanations), a plug-and-play approach using model internals for faithful answer attribution in RAG applications. MIRAGE detects context-sensitive answer tokens and pairs them with retrieved documents contributing to their prediction via saliency methods. We evaluate our proposed approach on a multilingual extractive QA dataset, finding high agreement with human answer attribution. On open-ended QA, MIRAGE achieves citation quality and efficiency comparable to self-citation while also allowing for a finer-grained control of attribution parameters. Our qualitative evaluation highlights the faithfulness of MIRAGE's attributions and underscores the promising application of model internals for RAG answer attribution.
Updated: 2024-06-19 16:10:26
Categories: cs.CL,cs.AI,cs.LG
Hitchhiker's guide on Energy-Based Models: a comprehensive review on the relation with other generative models, sampling and statistical physics
Energy-Based Models (EBMs) have emerged as a powerful framework in the realm of generative modeling, offering a unique perspective that aligns closely with principles of statistical mechanics. This review aims to provide physicists with a comprehensive understanding of EBMs, delineating their connection to other generative models such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Normalizing Flows. We explore the sampling techniques crucial for EBMs, including Markov Chain Monte Carlo (MCMC) methods, and draw parallels between EBM concepts and statistical mechanics, highlighting the significance of energy functions and partition functions. Furthermore, we delve into state-of-the-art training methodologies for EBMs, covering recent advancements and their implications for enhanced model performance and efficiency. This review is designed to clarify the often complex interconnections between these models, which can be challenging due to the diverse communities working on the topic.
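The two objects the review keeps returning to can be stated compactly. An EBM defines $p_\theta(x) = \exp(-E_\theta(x))/Z_\theta$ with partition function $Z_\theta = \int \exp(-E_\theta(x))\,dx$, and a workhorse MCMC sampler is unadjusted Langevin dynamics, $x_{k+1} = x_k - \tfrac{\eta}{2}\nabla_x E_\theta(x_k) + \sqrt{\eta}\,\epsilon_k$ with $\epsilon_k \sim \mathcal{N}(0, I)$; this is exactly where the statistical-physics analogy (Boltzmann distributions, partition functions) enters.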
Updated: 2024-06-19 16:08:00
Categories: cs.LG,math-ph,math.MP,physics.app-ph,physics.data-an
Variational Schrödinger Diffusion Models
Schr\"odinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models. However, SB requires estimating the intractable forward score functions, inevitably resulting in the costly implicit training loss based on simulated trajectories. To improve the scalability while preserving efficient transportation plans, we leverage variational inference to linearize the forward score functions (variational scores) of SB and restore simulation-free properties in training backward scores. We propose the variational Schr\"odinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport. Theoretically, we use stochastic approximation to prove the convergence of the variational scores and show the convergence of the adaptively generated samples based on the optimal variational scores. Empirically, we test the algorithm in simulated examples and observe that VSDM is efficient in generations of anisotropic shapes and yields straighter sample trajectories compared to the single-variate diffusion. We also verify the scalability of the algorithm in real-world data and achieve competitive unconditional generation performance in CIFAR10 and conditional generation in time series modeling. Notably, VSDM no longer depends on warm-up initializations and has become tuning-friendly in training large-scale experiments.
Updated: 2024-06-19 16:06:23
Categories: cs.LG
Towards Minimal Targeted Updates of Language Models with Targeted Negative Training
Generative models of language exhibit impressive capabilities but still place non-negligible probability mass over undesirable outputs. In this work, we address the task of updating a model to avoid unwanted outputs while minimally changing model behavior otherwise, a challenge we refer to as a minimal targeted update. We first formalize the notion of a minimal targeted update and propose a method to achieve such updates using negative examples from a model's generations. Our proposed Targeted Negative Training (TNT) results in updates that keep the new distribution close to the original, unlike existing losses for negative signal which push down probability but do not control what the updated distribution will be. In experiments, we demonstrate that TNT yields a better trade-off between reducing unwanted behavior and maintaining model generation behavior than baselines, paving the way towards a modeling paradigm based on iterative training updates that constrain models from generating undesirable outputs while preserving their impressive capabilities.
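The paper's exact objective is not reproduced in this digest; the PyTorch sketch below only illustrates the general shape such a loss could take, pairing an unlikelihood-style push on unwanted tokens with a KL anchor to the original model (all names and the weighting are our assumptions):

import torch
import torch.nn.functional as F

def targeted_negative_loss(logits, neg_ids, ref_probs, alpha=1.0):
    # logits: (B, V) from the model being updated; neg_ids: (B,) unwanted
    # token ids; ref_probs: (B, V) probabilities from the original model.
    log_probs = F.log_softmax(logits, dim=-1)
    # Push down the probability of the unwanted tokens.
    p_neg = log_probs.gather(-1, neg_ids.unsqueeze(-1)).squeeze(-1).exp()
    push_down = -torch.log1p(-p_neg + 1e-8).mean()
    # KL(reference || updated) keeps the update minimal elsewhere.
    anchor = F.kl_div(log_probs, ref_probs, reduction="batchmean")
    return push_down + alpha * anchor

The anchor term is what distinguishes a minimal targeted update from plain negative-signal losses, which lower probability without controlling where the freed mass goes.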
Updated: 2024-06-19 16:06:21
Categories: cs.CL,cs.AI
Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health
The rapid advancements in large language models (LLMs) have opened up new opportunities for transforming patient engagement in healthcare through conversational AI. This paper presents an overview of the current landscape of LLMs in healthcare, specifically focusing on their applications in analyzing and generating conversations for improved patient engagement. We showcase the power of LLMs in handling unstructured conversational data through four case studies: (1) analyzing mental health discussions on Reddit, (2) developing a personalized chatbot for cognitive engagement in seniors, (3) summarizing medical conversation datasets, and (4) designing an AI-powered patient engagement system. These case studies demonstrate how LLMs can effectively extract insights and summarizations from unstructured dialogues and engage patients in guided, goal-oriented conversations. Leveraging LLMs for conversational analysis and generation opens new doors for many patient-centered outcomes research opportunities. However, integrating LLMs into healthcare raises important ethical considerations regarding data privacy, bias, transparency, and regulatory compliance. We discuss best practices and guidelines for the responsible development and deployment of LLMs in healthcare settings. Realizing the full potential of LLMs in digital health will require close collaboration between the AI and healthcare professional communities to address technical challenges and ensure these powerful tools' safety, efficacy, and equity.
Updated: 2024-06-19 16:02:04
Categories: cs.AI
Improving GFlowNets with Monte Carlo Tree Search
Generative Flow Networks (GFlowNets) treat sampling from distributions over compositional discrete spaces as a sequential decision-making problem, training a stochastic policy to construct objects step by step. Recent studies have revealed strong connections between GFlowNets and entropy-regularized reinforcement learning. Building on these insights, we propose to enhance planning capabilities of GFlowNets by applying Monte Carlo Tree Search (MCTS). Specifically, we show how the MENTS algorithm (Xiao et al., 2019) can be adapted for GFlowNets and used during both training and inference. Our experiments demonstrate that this approach improves the sample efficiency of GFlowNet training and the generation fidelity of pre-trained GFlowNet models.
Updated: 2024-06-19 15:58:35
Categories: cs.LG,cs.AI
Controlling Forgetting with Test-Time Data in Continual Learning
Foundational vision-language models have shown impressive performance on various downstream tasks. Yet, there is still a pressing need to update these models later as new tasks or domains become available. Ongoing Continual Learning (CL) research provides techniques to overcome catastrophic forgetting of previous information when new knowledge is acquired. To date, CL techniques focus only on the supervised training sessions. This results in significant forgetting, yielding performance inferior even to the prior model's zero-shot performance. In this work, we argue that test-time data holds valuable information that can be leveraged in a self-supervised manner to refresh the model's memory of previously learned tasks, thereby greatly reducing forgetting at no extra labelling cost. We study how unsupervised data can be employed online to improve models' performance on prior tasks upon encountering representative samples. We propose a simple yet effective student-teacher model with gradient-based sparse parameter updates and show significant performance improvements and reduction in forgetting, which could reduce the reliance on an offline episodic memory/experience replay buffer.
Updated: 2024-06-19 15:56:21
Categories: cs.LG
Stability and Generalizability in SDE Diffusion Models with Measure-Preserving Dynamics
Inverse problems describe the process of estimating the causal factors from a set of measurements or data. Mapping of often incomplete or degraded data to parameters is ill-posed, thus data-driven iterative solutions are required, for example when reconstructing clean images from poor signals. Diffusion models have shown promise as potent generative tools for solving inverse problems due to their superior reconstruction quality and their compatibility with iterative solvers. However, most existing approaches are limited to linear inverse problems represented as Stochastic Differential Equations (SDEs). This simplification falls short of addressing the challenging nature of real-world problems, leading to amplified cumulative errors and biases. We provide an explanation for this gap through the lens of measure-preserving dynamics of Random Dynamical Systems (RDS), with which we analyse Temporal Distribution Discrepancy, and thus introduce a theoretical framework based on RDS for SDE diffusion models. We uncover several strategies that inherently enhance the stability and generalizability of diffusion models for inverse problems and introduce a novel score-based diffusion framework, the \textbf{D}ynamics-aware S\textbf{D}E \textbf{D}iffusion \textbf{G}enerative \textbf{M}odel (D$^3$GM). The \textit{measure-preserving property} can return the degraded measurement to the original state despite complex degradation, via the RDS concept of \textit{stability}. Our extensive experimental results corroborate the effectiveness of D$^3$GM across multiple benchmarks, including a prominent application of inverse problems, magnetic resonance imaging. Code and data will be publicly available.
Updated: 2024-06-19 15:55:12
Categories: cs.AI
Transferable Tactile Transformers for Representation Learning Across Diverse Sensors and Tasks
This paper presents T3: Transferable Tactile Transformers, a framework for tactile representation learning that scales across multi-sensors and multi-tasks. T3 is designed to overcome the contemporary issue that camera-based tactile sensing is extremely heterogeneous, i.e. sensors are built into different form factors, and existing datasets were collected for disparate tasks. T3 captures the shared latent information across different sensor-task pairings by constructing a shared trunk transformer with sensor-specific encoders and task-specific decoders. The pre-training of T3 utilizes a novel Foundation Tactile (FoTa) dataset, which is aggregated from several open-sourced datasets and it contains over 3 million data points gathered from 13 sensors and 11 tasks. FoTa is the largest and most diverse dataset in tactile sensing to date and it is made publicly available in a unified format. Across various sensors and tasks, experiments show that T3 pre-trained with FoTa achieved zero-shot transferability in certain sensor-task pairings, can be further fine-tuned with small amounts of domain-specific data, and its performance scales with bigger network sizes. T3 is also effective as a tactile encoder for long horizon contact-rich manipulation. Results from sub-millimeter multi-pin electronics insertion tasks show that T3 achieved a task success rate 25% higher than that of policies trained with tactile encoders trained from scratch, or 53% higher than without tactile sensing. Data, code, and model checkpoints are open-sourced at https://t3.alanz.info.
Updated: 2024-06-19 15:39:27
Categories: cs.RO,cs.CV,cs.LG
Sync+Sync: A Covert Channel Built on fsync with Storage
Scientists have built a variety of covert channels for secretive information transmission with CPU cache and main memory. In this paper, we turn to a lower level in the memory hierarchy, i.e., persistent storage. Most programs store intermediate or eventual results in the form of files, and some of them call fsync to synchronously persist a file to the storage device for orderly persistence. Our quantitative study shows that one program experiences a significantly longer response time for an fsync call if another program is concurrently calling fsync, although they do not share any data. We further find that concurrent fsync calls contend at multiple levels of the storage stack due to shared software structures (e.g., Ext4's journal) and hardware resources (e.g., the disk's I/O dispatch queue). We accordingly build a covert channel named Sync+Sync. Sync+Sync delivers a transmission bandwidth of 20,000 bits per second at an error rate of about 0.40% with an ordinary solid-state drive. Sync+Sync can be conducted in cross-disk-partition, cross-file-system, cross-container, cross-virtual-machine, and even cross-disk-drive fashions, without sharing data between programs. Next, we launch side-channel attacks with Sync+Sync and manage to precisely detect operations of a victim database (e.g., insert/update and B-Tree node split). We also leverage Sync+Sync to distinguish applications and websites with high accuracy by detecting and analyzing their fsync frequencies and flushed data volumes. These attacks are useful to support further fine-grained information leakage.
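The receiver side of such a channel can be prototyped with the standard library alone (a simplified sketch; the file name, payload size, and thresholding are placeholders, and a real attack would calibrate these carefully):

import os
import time

def fsync_latency(path="probe.tmp", payload=b"x" * 4096):
    # Time a single write+fsync; contention from a concurrent sender's
    # fsync calls shows up as elevated latency (the channel's signal).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, payload)
        start = time.perf_counter()
        os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)

samples = [fsync_latency() for _ in range(100)]
threshold = sum(samples) / len(samples)  # placeholder calibration
bits = [int(s > threshold) for s in samples]  # above-threshold latencies decode as 1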
Updated: 2024-06-19 15:37:58
Categories: cs.CR,cs.OS
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models. In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches. We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.
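A simplified reading of the estimator (our interpretation, not the authors' exact algorithm): take one virtual gradient step on a test sample, then score every training point by how much its loss moves, which costs one backward pass plus forward passes only:

import torch

def mirrored_influence(model, loss_fn, test_batch, train_points, lr=1e-3):
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss_fn(model, test_batch), params)
    with torch.no_grad():
        base = [loss_fn(model, z).item() for z in train_points]
        for p, g in zip(params, grads):  # virtual update on the test sample
            p.sub_(lr * g)
        shifted = [loss_fn(model, z).item() for z in train_points]
        for p, g in zip(params, grads):  # roll the update back
            p.add_(lr * g)
    return [b - s for b, s in zip(base, shifted)]  # per-train-point proxy

The asymmetry the abstract mentions is visible here: the expensive gradient is taken once per test sample, while each training point contributes only cheap forward passes.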
Updated: 2024-06-19 15:35:04
Categories: cs.LG,stat.ML
Emergent representations in networks trained with the Forward-Forward algorithm
The Backpropagation algorithm has often been criticised for its lack of biological realism. In an attempt to find a more biologically plausible alternative, the recently introduced Forward-Forward algorithm replaces the forward and backward passes of Backpropagation with two forward passes. In this work, we show that the internal representations obtained by the Forward-Forward algorithm can organise into category-specific ensembles exhibiting high sparsity, composed of a small number of active units. This situation is reminiscent of what has been observed in cortical sensory areas, where neuronal ensembles are suggested to serve as the functional building blocks for perception and action. Interestingly, while this sparse pattern does not typically arise in models trained with standard Backpropagation, it can emerge in networks trained with Backpropagation on the same objective proposed for the Forward-Forward algorithm. These results suggest that the learning procedure proposed by Forward-Forward may be superior to Backpropagation in modelling learning in the cortex, even when a backward pass is used.
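For readers unfamiliar with the algorithm, a minimal PyTorch sketch of one Forward-Forward layer in the spirit of Hinton's formulation (the threshold and loss shape follow the original Forward-Forward paper; treat this as an illustration rather than this study's exact setup):

import torch
import torch.nn.functional as F

class FFLayer(torch.nn.Module):
    # Trained locally so that 'goodness' (sum of squared activations) is
    # high on positive data and low on negative data.
    def __init__(self, d_in, d_out, threshold=2.0):
        super().__init__()
        self.linear = torch.nn.Linear(d_in, d_out)
        self.threshold = threshold

    def forward(self, x):
        # Normalize so goodness from earlier layers cannot leak through.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return F.relu(self.linear(x))

    def local_loss(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        # Push positive goodness above the threshold, negative below it.
        return (F.softplus(self.threshold - g_pos) +
                F.softplus(g_neg - self.threshold)).mean()

The category-specific, sparse ensembles the abstract describes are read off the hidden activations of layers trained this way.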
Updated: 2024-06-19 15:32:54
Categories: cs.NE,cs.LG
Contrast Sets for Evaluating Language-Guided Robot Policies
Robot evaluations in language-guided, real world settings are time-consuming and often sample only a small space of potential instructions across complex scenes. In this work, we introduce contrast sets for robotics as an approach to make small, but specific, perturbations to otherwise independent, identically distributed (i.i.d.) test instances. We investigate the relationship between experimenter effort to carry out an evaluation and the resulting estimated test performance as well as the insights that can be drawn from performance on perturbed instances. We use contrast sets to characterize policies at reduced experimenter effort in both a simulated manipulation task and a physical robot vision-and-language navigation task. We encourage the use of contrast set evaluations as a more informative alternative to small scale, i.i.d. demonstrations on physical robots, and as a scalable alternative to industry-scale real world evaluations.
Updated: 2024-06-19 15:31:21
Categories: cs.RO,cs.LG
Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation
We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. In this paper, we develop two algorithms for the infinite-horizon average reward setting. Our first algorithm \texttt{UCRL2-MNL} applies to the class of communicating MDPs and achieves an $\tilde{\mathcal{O}}(dD\sqrt{T})$ regret, where $d$ is the dimension of feature mapping, $D$ is the diameter of the underlying MDP, and $T$ is the horizon. The second algorithm \texttt{OVIFH-MNL} is computationally more efficient and applies to the more general class of weakly communicating MDPs, for which we show a regret guarantee of $\tilde{\mathcal{O}}(d^{2/5} \mathrm{sp}(v^*)T^{4/5})$ where $\mathrm{sp}(v^*)$ is the span of the associated optimal bias function. We also prove a lower bound of $\Omega(d\sqrt{DT})$ for learning communicating MDPs with MNL transitions of diameter at most $D$. Furthermore, we show a regret lower bound of $\Omega(dH^{3/2}\sqrt{K})$ for learning $H$-horizon episodic MDPs with MNL function approximation where $K$ is the number of episodes, which improves upon the best-known lower bound for the finite-horizon setting.
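In this line of work, the MNL transition model typically takes the form $p(s' \mid s, a) = \exp\big(\varphi(s,a,s')^{\top}\theta^{*}\big) / \sum_{\tilde{s} \in \mathcal{S}_{s,a}} \exp\big(\varphi(s,a,\tilde{s})^{\top}\theta^{*}\big)$, where $\varphi$ is a known $d$-dimensional feature map, $\mathcal{S}_{s,a}$ is the set of reachable next states, and $\theta^{*}$ is the unknown parameter to be learned (stated here as the standard setup, which may differ in detail from this paper's notation).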
Updated: 2024-06-19 15:29:14
Categories: cs.LG,math.OC
On AI-Inspired UI-Design
Graphical User Interface (or simply UI) is a primary means of interaction between users and their devices. In this paper, we discuss three major complementary approaches for using Artificial Intelligence (AI) to help app designers create better, more diverse, and creative UIs for mobile apps. First, designers can prompt a Large Language Model (LLM) like GPT to directly generate and adjust one or multiple UIs. Second, a Vision-Language Model (VLM) enables designers to effectively search a large screenshot dataset, e.g. from apps published in app stores. The third approach is to train a Diffusion Model (DM) specifically designed to generate app UIs as inspirational images. We discuss how AI should be used, in general, to inspire and assist creative app design rather than automate it.
Updated: 2024-06-19 15:28:21
Categories: cs.HC,cs.AI,cs.SE
Implicit Bias of Mirror Flow on Separable Data
We examine the continuous-time counterpart of mirror descent, namely mirror flow, on classification problems which are linearly separable. Such problems are minimised `at infinity' and have many possible solutions; we study which solution is preferred by the algorithm depending on the mirror potential. For exponential tailed losses and under mild assumptions on the potential, we show that the iterates converge in direction towards a $\phi_\infty$-maximum margin classifier. The function $\phi_\infty$ is the $\textit{horizon function}$ of the mirror potential and characterises its shape `at infinity'. When the potential is separable, a simple formula allows to compute this function. We analyse several examples of potentials and provide numerical experiments highlighting our results.
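For orientation: mirror flow with potential $\phi$ on a loss $L$ evolves the iterate $w_t$ through $\frac{\mathrm{d}}{\mathrm{d}t}\nabla\phi(w_t) = -\nabla L(w_t)$; taking $\phi(w) = \tfrac{1}{2}\|w\|_2^2$ recovers plain gradient flow $\dot{w}_t = -\nabla L(w_t)$, so the potential is exactly what shapes the implicit bias studied here.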
Updated: 2024-06-19 15:25:57
Categories: stat.ML,cs.LG,math.OC
InstructRAG: Instructing Retrieval-Augmented Generation with Explicit Denoising
Retrieval-augmented generation (RAG) has shown promising potential to enhance the accuracy and factuality of language models (LMs). However, imperfect retrievers or noisy corpora can introduce misleading or even erroneous information to the retrieved contents, posing a significant challenge to the generation quality. Existing RAG methods typically address this challenge by directly predicting final answers despite potentially noisy inputs, resulting in an implicit denoising process that is difficult to interpret and verify. On the other hand, the acquisition of explicit denoising supervision is often costly, involving significant human efforts. In this work, we propose InstructRAG, where LMs explicitly learn the denoising process through self-synthesized rationales -- First, we instruct the LM to explain how the ground-truth answer is derived from retrieved documents. Then, these rationales can be used either as demonstrations for in-context learning of explicit denoising or as supervised fine-tuning data to train the model. Compared to standard RAG approaches, InstructRAG requires no additional supervision, allows for easier verification of the predicted answers, and effectively improves generation accuracy. Experiments show InstructRAG consistently outperforms existing RAG methods in both training-free and trainable scenarios, achieving a relative improvement of 8.3% over the best baseline method on average across five knowledge-intensive benchmarks. Extensive analysis indicates that InstructRAG scales well with increased numbers of retrieved documents and consistently exhibits robust denoising ability even in out-of-domain datasets, demonstrating strong generalizability.
Updated: 2024-06-19 15:25:29
Categories: cs.CL,cs.LG
Can AI be enabled to dynamical downscaling? Training a Latent Diffusion Model to mimic km-scale COSMO-CLM downscaling of ERA5 over Italy
Downscaling techniques are one of the most prominent applications of Deep Learning (DL) in Earth System Modeling. A robust DL downscaling model can generate high-resolution fields from coarse-scale numerical model simulations, avoiding the time- and resource-intensive application of regional/local models. Additionally, generative DL models have the potential to provide uncertainty information by generating ensemble-like scenario pools, a task that is computationally prohibitive for traditional numerical simulations. In this study, we apply a Latent Diffusion Model (LDM) to downscale ERA5 data over Italy up to a resolution of 2 km. The high-resolution target data consists of results from a high-resolution dynamical downscaling performed with COSMO-CLM. Our goal is to demonstrate that recent advancements in generative modeling enable DL-based models to deliver results comparable to those of numerical dynamical downscaling models, given the same input data (i.e., ERA5 data), preserving the realism of fine-scale features and flow characteristics. The training and testing database consists of hourly data from 2000 to 2020. The target variables of this study are 2-m temperature and 10-m horizontal wind components. A selection of predictors from ERA5 is used as input to the LDM, and a residual approach against a reference UNET is leveraged in applying the LDM. The performance of the generative LDM is compared with reference baselines of increasing complexity: quadratic interpolation of ERA5, a UNET, and a Generative Adversarial Network (GAN) built on the same reference UNET. Results highlight the improvements introduced by the LDM architecture and the residual approach over these baselines. The models are evaluated on a yearly test dataset, assessing the models' performance through deterministic metrics, spatial distribution of errors, and reconstruction of frequency and power spectra distributions.
Updated: 2024-06-19 15:20:28
Categories: cs.LG,physics.ao-ph
Fine-Tuning Gemma-7B for Enhanced Sentiment Analysis of Financial News Headlines
In this study, we explore the application of sentiment analysis on financial news headlines to understand investor sentiment. By leveraging Natural Language Processing (NLP) and Large Language Models (LLM), we analyze sentiment from the perspective of retail investors. The FinancialPhraseBank dataset, which contains categorized sentiments of financial news headlines, serves as the basis for our analysis. We fine-tuned several models, including distilbert-base-uncased, Llama, and gemma-7b, to evaluate their effectiveness in sentiment classification. Our experiments demonstrate that the fine-tuned gemma-7b model outperforms others, achieving the highest precision, recall, and F1 score. Specifically, the gemma-7b model showed significant improvements in accuracy after fine-tuning, indicating its robustness in capturing the nuances of financial sentiment. This model can be instrumental in providing market insights, risk management, and aiding investment decisions by accurately predicting the sentiment of financial news. The results highlight the potential of advanced LLMs in transforming how we analyze and interpret financial information, offering a powerful tool for stakeholders in the financial industry.
Updated: 2024-06-19 15:20:19
Categories: cs.CL,cs.AI
MatchSeg: Towards Better Segmentation via Reference Image Matching
Recently, automated medical image segmentation methods based on deep learning have achieved great success. However, they heavily rely on large annotated datasets, which are costly and time-consuming to acquire. Few-shot learning aims to overcome the need for annotated data by using a small labeled dataset, known as a support set, to guide predicting labels for new, unlabeled images, known as the query set. Inspired by this paradigm, we introduce MatchSeg, a novel framework that enhances medical image segmentation through strategic reference image matching. We leverage contrastive language-image pre-training (CLIP) to select highly relevant samples when defining the support set. Additionally, we design a joint attention module to strengthen the interaction between support and query features, facilitating a more effective knowledge transfer between support and query sets. We validated our method across four public datasets. Experimental results demonstrate superior segmentation performance and powerful domain generalization ability of MatchSeg against existing methods for domain-specific and cross-domain segmentation tasks. Our code is made available at https://github.com/keeplearning-again/MatchSeg
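The support-set selection step can be sketched with the OpenAI clip package (an illustration under our assumptions; MatchSeg's actual pipeline is more involved, and the paths and k below are placeholders):

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def select_support(query_path, candidate_paths, k=5):
    # Pick the k candidates whose CLIP image embeddings are most
    # similar to the query image.
    paths = [query_path] + list(candidate_paths)
    batch = torch.stack([preprocess(Image.open(p)) for p in paths]).to(device)
    with torch.no_grad():
        emb = model.encode_image(batch)
        emb = emb / emb.norm(dim=-1, keepdim=True)
    sims = emb[1:] @ emb[0]
    top = sims.topk(min(k, len(candidate_paths))).indices
    return [candidate_paths[i] for i in top.tolist()]

Choosing support images by embedding similarity rather than at random is what makes the few-shot guidance "strategic" in the sense of the abstract.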
Updated: 2024-06-19 15:20:07
Categories: cs.AI,cs.CV
Enhance the Image: Super Resolution using Artificial Intelligence in MRI
This chapter provides an overview of deep learning techniques for improving the spatial resolution of MRI, ranging from convolutional neural networks, generative adversarial networks, to more advanced models including transformers, diffusion models, and implicit neural representations. Our exploration extends beyond the methodologies to scrutinize the impact of super-resolved images on clinical and neuroscientific assessments. We also cover various practical topics such as network architectures, image evaluation metrics, network loss functions, and training data specifics, including downsampling methods for simulating low-resolution images and dataset selection. Finally, we discuss existing challenges and potential future directions regarding the feasibility and reliability of deep learning-based MRI super-resolution, with the aim to facilitate its wider adoption to benefit various clinical and neuroscientific applications.
Updated: 2024-06-19 15:19:41
Categories: cs.CV,cs.AI,physics.med-ph
Adaptive Hyperparameter Optimization for Continual Learning Scenarios
Hyperparameter selection in continual learning scenarios is a challenging and underexplored aspect, especially in practical non-stationary environments. Traditional approaches, such as grid searches with held-out validation data from all tasks, are unrealistic for building accurate lifelong learning systems. This paper aims to explore the role of hyperparameter selection in continual learning and the necessity of continually and automatically tuning hyperparameters according to the complexity of the task at hand. Hence, we propose leveraging the nature of sequential task learning to improve Hyperparameter Optimization efficiency. Using techniques based on the functional analysis of variance, we identify the most crucial hyperparameters that have an impact on performance. We demonstrate empirically that this approach, agnostic to continual scenarios and strategies, allows us to speed up hyperparameter optimization continually across tasks, and that it exhibits robustness even in the face of varying sequential task orders. We believe that our findings can contribute to the advancement of continual learning methodologies towards more efficient, robust and adaptable models for real-world applications.
Updated: 2024-06-19 15:17:51
Categories: cs.LG
Improving Visual Commonsense in Language Models via Multiple Image Generation
Commonsense reasoning is fundamentally based on multimodal knowledge. However, existing large language models (LLMs) are primarily trained using textual data only, limiting their ability to incorporate essential visual information. In contrast, Visual Language Models, which excel at visually-oriented tasks, often fail at non-visual tasks such as basic commonsense reasoning. This divergence highlights a critical challenge: the integration of robust visual understanding with foundational text-based language reasoning. To this end, we introduce a method aimed at enhancing LLMs' visual commonsense. Specifically, our method generates multiple images based on the input text prompt and integrates these into the model's decision-making process by mixing their prediction probabilities. To facilitate multimodal grounded language modeling, we employ a late-fusion layer that combines the projected visual features with the output of a pre-trained LLM conditioned on text only. This late-fusion layer enables predictions based on comprehensive image-text knowledge as well as text only when this is required. We evaluate our approach using several visual commonsense reasoning tasks together with traditional NLP tasks, including common sense reasoning and reading comprehension. Our experimental results demonstrate significant superiority over existing baselines. When applied to recent state-of-the-art LLMs (e.g., Llama3), we observe improvements not only in visual common sense but also in traditional NLP benchmarks. Code and models are available under https://github.com/guyyariv/vLMIG.
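The probability-mixing step admits a very small sketch (names are hypothetical; vlm_logits stands in for the late-fusion model's next-token logits conditioned on one generated image):

import torch

def mixed_prediction(logits_per_image):
    # Average the answer distributions obtained by conditioning on each
    # of the K generated images; shape is (K, V) after stacking.
    probs = torch.softmax(torch.stack(logits_per_image), dim=-1)
    return probs.mean(dim=0)

# Hypothetical usage:
# mixed = mixed_prediction([vlm_logits(prompt, img) for img in images])

Averaging in probability space means no single generated image can dominate the decision, which is plausibly why multiple generations help.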
Updated: 2024-06-19 15:17:10
Categories: cs.CL,cs.CV,cs.LG
Outline of an Independent Systematic Blackbox Test for ML-based Systems
This article proposes a test procedure that can be used to test ML models and ML-based systems independently of the actual training process. In this way, typical quality statements such as the accuracy and precision of these models and systems can be verified independently, taking into account their black-box character and the inherent stochastic properties of ML models and their training data. The article presents initial results from a set of test experiments and suggests extensions to existing test methods to reflect the stochastic nature of ML models and ML-based systems.
Updated: 2024-06-19 15:16:17
Categories: cs.LG
Generative Modeling by Minimizing the Wasserstein-2 Loss
This paper approaches the unsupervised learning problem by minimizing the second-order Wasserstein loss (the $W_2$ loss). The minimization is characterized by a distribution-dependent ordinary differential equation (ODE), whose dynamics involves the Kantorovich potential between a current estimated distribution and the true data distribution. A main result shows that the time-marginal law of the ODE converges exponentially to the true data distribution. To prove that the ODE has a unique solution, we first construct explicitly a solution to the associated nonlinear Fokker-Planck equation and show that it coincides with the unique gradient flow for the $W_2$ loss. Based on this, a unique solution to the ODE is built from Trevisan's superposition principle and the exponential convergence results. An Euler scheme is proposed for the distribution-dependent ODE and it is shown to correctly recover the gradient flow for the $W_2$ loss in the limit. An algorithm is designed by following the scheme and applying persistent training, which is natural in our gradient-flow framework. In both low- and high-dimensional experiments, our algorithm converges much faster than and outperforms Wasserstein generative adversarial networks, by increasing the level of persistent training appropriately.
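In rough strokes (our paraphrase, not the paper's exact statement): the particle dynamics is $\dot{X}_t = -\nabla\varphi_t(X_t)$, where $\varphi_t$ is the Kantorovich potential of the optimal transport from the current law of $X_t$ to the data distribution, and an Euler step with rate $\eta$ advances $X_{k+1} = X_k - \eta\,\nabla\varphi_k(X_k)$, with the estimate of $\varphi_k$ refreshed at each step; the persistent training mentioned above keeps that potential estimate current as the generated distribution moves.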
Updated: 2024-06-19 15:15:00
标题: 通过最小化Wasserstein-2损失进行生成建模
摘要: 本文通过最小化二阶Wasserstein损失($W_2$损失)来解决无监督学习问题。最小化过程由一个依赖于分布的常微分方程(ODE)来刻画,其动力学涉及当前估计分布与真实数据分布之间的Kantorovich势。一个主要结果表明,该ODE的时间边缘分布以指数速度收敛到真实数据分布。为了证明ODE具有唯一解,我们首先显式构造了相关非线性Fokker-Planck方程的一个解,并证明它与$W_2$损失的唯一梯度流相吻合。基于此,利用Trevisan的叠加原理和指数收敛结果构建了ODE的唯一解。我们针对该分布依赖的ODE提出了一个Euler格式,并证明其在极限情况下正确恢复了$W_2$损失的梯度流。按照该格式并应用持续训练(这在我们的梯度流框架中是自然的)设计了一种算法。在低维和高维实验中,通过适当提高持续训练的水平,我们的算法比Wasserstein生成对抗网络收敛得更快且表现更好。
更新时间: 2024-06-19 15:15:00
领域: stat.ML,cs.LG,34A06, 49Q22, 68T01
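As a toy illustration of the Euler scheme for the distribution-dependent ODE above, the 1D case is convenient: with equal numbers of particles and data samples, the optimal transport map pairs sorted particles with sorted data, so the Kantorovich-potential gradient is available in closed form. This is only a low-dimensional sketch, not the paper's algorithm with learned potentials and persistent training.
```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(3.0, 0.5, size=1000)       # true data distribution
x = rng.normal(0.0, 1.0, size=1000)          # current generated particles

h = 0.3                                       # Euler step size (assumed)
for _ in range(50):
    # In 1D the optimal map pairs sorted particles with sorted data,
    # so the Kantorovich potential gradient is grad(phi)(x) = x - T(x).
    order = np.argsort(x)
    grad_phi = np.empty_like(x)
    grad_phi[order] = x[order] - np.sort(data)
    x = x - h * grad_phi                      # Euler step of the W2 flow

print("mean/std after flow:", round(x.mean(), 3), round(x.std(), 3))
```
The particle cloud converges to the data distribution geometrically in the step count, matching the exponential convergence the abstract states for the time-marginal law.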
Optimizing Psychological Counseling with Instruction-Tuned Large Language Models
The advent of large language models (LLMs) has significantly advanced various fields, including natural language processing and automated dialogue systems. This paper explores the application of LLMs in psychological counseling, addressing the increasing demand for mental health services. We present a method for instruction tuning LLMs with specialized prompts to enhance their performance in providing empathetic, relevant, and supportive responses. Our approach involves developing a comprehensive dataset of counseling-specific prompts, refining them through feedback from professional counselors, and conducting rigorous evaluations using both automatic metrics and human assessments. The results demonstrate that our instruction-tuned model outperforms several baseline LLMs, highlighting its potential as a scalable and accessible tool for mental health support.
Updated: 2024-06-19 15:13:07
标题: 通过指令微调的大型语言模型优化心理咨询
摘要: 大型语言模型(LLMs)的出现显著推进了各个领域,包括自然语言处理和自动对话系统。本文探讨了LLMs在心理咨询中的应用,以应对心理健康服务需求的增加。我们提出了一种使用专门提示对LLMs进行指令微调的方法,以增强其提供富有同情心、相关且支持性回应的能力。我们的方法涉及开发一个全面的咨询专用提示数据集,通过专业咨询师的反馈对其进行优化,并使用自动指标和人工评估进行严格评估。结果表明,我们经过指令微调的模型胜过了几种基线LLMs,突显了其作为心理健康支持的可扩展和可访问工具的潜力。
更新时间: 2024-06-19 15:13:07
领域: cs.CL,cs.AI
Physics-informed Neural Networks with Unknown Measurement Noise
Physics-informed neural networks (PINNs) constitute a flexible approach to both finding solutions and identifying parameters of partial differential equations. Most works on the topic assume noiseless data, or data contaminated with weak Gaussian noise. We show that the standard PINN framework breaks down in case of non-Gaussian noise. We give a way of resolving this fundamental issue and we propose to jointly train an energy-based model (EBM) to learn the correct noise distribution. We illustrate the improved performance of our approach using multiple examples.
Updated: 2024-06-19 15:11:28
标题: 带有未知测量噪声的物理信息神经网络
摘要: 物理信息神经网络(PINNs)是一种灵活的方法,既可用于求解偏微分方程,也可用于辨识其参数。大多数关于这一主题的研究假定数据没有噪声,或者仅受到弱高斯噪声的污染。我们展示了标准PINN框架在非高斯噪声情况下会失效。我们给出了一种解决这一基本问题的途径,并建议联合训练一个基于能量的模型(EBM)来学习正确的噪声分布。我们通过多个示例展示了我们方法的改进性能。
更新时间: 2024-06-19 15:11:28
领域: stat.ML,cs.LG
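A minimal sketch of the PINN setup above, for the toy ODE u'(t) = -u(t) with heavy-tailed observation noise. The paper learns the noise law with a jointly trained EBM; as a simplified stand-in, this sketch replaces the usual Gaussian (MSE) data term with a fixed Student-t negative log-likelihood to show the non-Gaussian idea.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
t_obs = torch.linspace(0, 2, 40).unsqueeze(-1)
u_obs = torch.exp(-t_obs) + 0.1 * torch.distributions.StudentT(3.0).sample(t_obs.shape)

# Heavy-tailed likelihood as a stand-in for the learned EBM noise model.
noise_model = torch.distributions.StudentT(df=3.0, scale=0.1)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    t = 2 * torch.rand(64, 1, requires_grad=True)      # collocation points
    u = net(t)
    du = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    physics = ((du + u) ** 2).mean()                   # ODE residual u' + u = 0
    data_nll = -noise_model.log_prob(u_obs - net(t_obs)).mean()
    (physics + data_nll).backward()
    opt.step()
```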
Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback
Aligning human preference and value is an important requirement for building contemporary foundation models and embodied AI. However, popular approaches such as reinforcement learning with human feedback (RLHF) break down the task into successive stages, such as supervised fine-tuning (SFT), reward modeling (RM), and reinforcement learning (RL), each performing one specific learning task. Such a sequential approach results in serious issues such as significant under-utilization of data and distribution mismatch between the learned reward model and generated policy, which eventually lead to poor alignment performance. We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF), capable of integrating both human preference and demonstration to train reward models and the policy. The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms such as RLHF and Direct Policy Optimization (DPO), and only requires minor changes to the existing alignment pipelines. We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo. We observe that the proposed solutions outperform the existing alignment algorithms such as RLHF and DPO by large margins, especially when the amount of high-quality preference data is relatively limited.
Updated: 2024-06-19 15:04:23
标题: 联合示范与偏好学习改善策略与人类反馈的对齐
摘要: 将模型与人类偏好和价值对齐是构建当代基础模型和具身人工智能的重要要求。然而,流行的方法(如基于人类反馈的强化学习,RLHF)将任务分解为连续的阶段,如监督微调(SFT)、奖励建模(RM)和强化学习(RL),每个阶段执行一个特定的学习任务。这种顺序方法会导致严重问题,如数据利用严重不足,以及学得的奖励模型与生成策略之间的分布不匹配,最终导致对齐性能不佳。我们开发了一种名为整合人类反馈的对齐(AIHF)的单阶段方法,能够同时利用人类偏好和示范来训练奖励模型和策略。所提出的方法支持一系列高效算法,可以轻松退化为并利用流行的对齐算法(如RLHF和直接策略优化(DPO)),且只需对现有对齐流程进行轻微修改。我们通过涉及LLMs对齐问题和MuJoCo机器人控制问题的大量实验展示了所提出方案的效率。我们观察到,所提出的方案大幅优于现有的对齐算法(如RLHF和DPO),尤其是在高质量偏好数据相对有限时。
更新时间: 2024-06-19 15:04:23
领域: cs.AI,cs.HC,cs.RO
How to choose the most appropriate centrality measure? A decision tree approach
Centrality metrics play a crucial role in network analysis, while the choice of specific measures significantly influences the accuracy of conclusions as each measure represents a unique concept of node importance. Among over 400 proposed indices, selecting the most suitable ones for specific applications remains a challenge. Existing approaches -- model-based, data-driven, and axiomatic -- have limitations, requiring association with models, training datasets, or restrictive axioms for each specific application. To address this, we introduce the culling method, which relies on the expert concept of centrality behavior on simple graphs. The culling method involves forming a set of candidate measures, generating a list, as short as possible, of small graphs needed to distinguish the measures from each other, constructing a decision-tree survey, and identifying the measure consistent with the expert's concept. We apply this approach to a diverse set of 40 centralities, including novel kernel-based indices, and combine it with the axiomatic approach. Remarkably, only 13 small 1-trees are sufficient to separate all 40 measures, even for pairs of closely related ones. By adopting simple ordinal axioms like Self-consistency or the Bridge axiom, the set of measures can be drastically reduced, making the culling survey short. Applying the culling method provides insightful findings on some centrality indices, such as PageRank, Bridging, and dissimilarity-based Eigencentrality measures, among others. The proposed approach offers a cost-effective solution in terms of labor and time, complementing existing methods for measure selection, and providing deeper insights into the underlying mechanisms of centrality measures.
Updated: 2024-06-19 15:03:45
标题: 如何选择最合适的中心性度量?一种决策树方法
摘要: 中心度度量在网络分析中起着至关重要的作用,而具体度量的选择显著影响结论的准确性,因为每个度量代表了节点重要性的独特概念。在提出的400多种指数中,选择最适合特定应用的指数仍然是一个挑战。现有的方法--基于模型、数据驱动和公理--存在局限性,需要将其与模型、训练数据集或特定应用的限制性公理关联起来。为了解决这个问题,我们引入了挑选方法,该方法依赖于专家对简单图中心度行为的概念。挑选方法涉及形成一组候选度量,生成尽可能少的小图列表以区分度量之间的差异,构建决策树调查,并确定与专家概念一致的度量。我们将这种方法应用于40种中心度的多样化集合,包括新颖的基于核的指数,并将其与公理方法相结合。值得注意的是,仅有13个小型1树就足以区分所有40种度量,即使是一对密切相关的度量。通过采用自洽性或桥梁公理等简单的序公理,可以大幅减少度量集合,使挑选调查变短。应用挑选方法为一些中心度指数提供了有益的发现,例如PageRank、桥接和基于差异的特征中心度等。所提出的方法在劳动和时间方面提供了一种经济有效的解决方案,为度量选择提供了补充,并深入了解中心度度量的基本机制。
更新时间: 2024-06-19 15:03:45
领域: physics.soc-ph,cs.HC,cs.LG,cs.SI,math.MG,05C50, 05C05, 15A51
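A toy version of the culling idea in the entry above: greedily pick small graphs whose node rankings tell candidate centralities apart. The paper works with 40 measures and 1-trees; the four measures and four small graphs here are illustrative stand-ins.
```python
import networkx as nx
from itertools import combinations

measures = {
    "degree": nx.degree_centrality,
    "closeness": nx.closeness_centrality,
    "betweenness": nx.betweenness_centrality,
    "eigenvector": lambda g: nx.eigenvector_centrality(g, max_iter=1000),
}
graphs = [nx.path_graph(4), nx.star_graph(3),
          nx.cycle_graph(5), nx.lollipop_graph(3, 2)]

def ranking(fn, g):
    c = fn(g)
    return tuple(sorted(c, key=c.get))          # nodes ordered by centrality

def separates(g, a, b):
    return ranking(measures[a], g) != ranking(measures[b], g)

chosen, pairs = [], set(combinations(measures, 2))
while pairs:
    # pick the graph that splits the most still-confused measure pairs
    best = max(graphs, key=lambda g: sum(separates(g, a, b) for a, b in pairs))
    split = {(a, b) for a, b in pairs if separates(best, a, b)}
    if not split:
        break                                    # remaining pairs tie everywhere
    chosen.append(best)
    pairs -= split
print(f"{len(chosen)} small graphs distinguish all separable measure pairs")
```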
Wiretapped Commitment over Binary Channels
We propose the problem of wiretapped commitment, where two parties, say committer Alice and receiver Bob, engage in a commitment protocol using a noisy channel as a resource, in the presence of an eavesdropper, say Eve. Noisy versions of Alice's transmission over the wiretap channel are received at both Bob and Eve. We seek to determine the maximum commitment throughput in the presence of an eavesdropper, i.e., wiretapped commitment capacity, where in addition to the standard security requirements for two-party commitment, one seeks to ensure that Eve doesn't learn about the commit string. A key interest in this work is to explore the effect of collusion (or lack of it) between the eavesdropper Eve and either Alice or Bob. Toward the same, we present results on the wiretapped commitment capacity under the so-called 1-private regime (when Alice or Bob cannot collude with Eve) and the 2-private regime (when Alice or Bob may possibly collude with Eve).
Updated: 2024-06-19 15:02:23
标题: 二元信道上的窃听承诺
摘要: 我们提出了窃听承诺的问题:两个当事人,即承诺者Alice和接收者Bob,在窃听者Eve存在的情况下,利用含噪信道作为资源执行承诺协议。Alice通过窃听信道传输信号的含噪版本被Bob和Eve同时接收。我们试图确定在窃听者存在的情况下的最大承诺吞吐量,即窃听承诺容量:除了两方承诺的标准安全要求外,还需确保Eve无法了解承诺字符串。 本文的一个关键兴趣是探索窃听者Eve与Alice或Bob之间的串通(或缺乏串通)的影响。为此,我们给出了在所谓的1-私有机制(Alice或Bob不能与Eve串通)和2-私有机制(Alice或Bob可能与Eve串通)下的窃听承诺容量的结果。
更新时间: 2024-06-19 15:02:23
领域: cs.IT,cs.CR,math.IT
Learning Collective Variables with Synthetic Data Augmentation through Physics-inspired Geodesic Interpolation
In molecular dynamics simulations, rare events, such as protein folding, are typically studied using enhanced sampling techniques, most of which are based on the definition of a collective variable (CV) along which acceleration occurs. Obtaining an expressive CV is crucial, but often hindered by the lack of information about the particular event, e.g., the transition from unfolded to folded conformation. We propose a simulation-free data augmentation strategy using physics-inspired metrics to generate geodesic interpolations resembling protein folding transitions, thereby improving sampling efficiency without true transition state samples. This new data can be used to improve the accuracy of classifier-based methods. Alternatively, a regression-based learning scheme for CV models can be adopted by leveraging the interpolation progress parameter.
Updated: 2024-06-19 15:01:57
标题: 通过物理启发的测地插值进行合成数据增强以学习集体变量
摘要: 在分子动力学模拟中,罕见事件(比如蛋白质折叠)通常使用增强采样技术进行研究,其中大多数技术基于定义一个集体变量(CV),并沿该变量进行加速。获得一个具有表现力的CV至关重要,但通常受制于关于特定事件(例如从展开到折叠构象的转变)的信息不足。我们提出了一种无需模拟的数据增强策略,使用受物理启发的度量来生成类似蛋白质折叠转变的测地插值,从而在没有真实过渡态样本的情况下提高采样效率。这些新数据可用于提高基于分类器的方法的准确性。另外,还可以利用插值进度参数,采用基于回归的CV模型学习方案。
更新时间: 2024-06-19 15:01:57
领域: physics.chem-ph,cs.LG,q-bio.BM
Instances and Labels: Hierarchy-aware Joint Supervised Contrastive Learning for Hierarchical Multi-Label Text Classification
Hierarchical multi-label text classification (HMTC) aims at utilizing a label hierarchy in multi-label classification. Recent approaches to HMTC deal with the problem of imposing an over-constrained premise on the output space by using contrastive learning on generated samples in a semi-supervised manner to bring text and label embeddings closer. However, the generation of samples tends to introduce noise as it ignores the correlation between similar samples in the same batch. One solution to this issue is supervised contrastive learning, but it remains an underexplored topic in HMTC due to its complex structured labels. To overcome this challenge, we propose $\textbf{HJCL}$, a $\textbf{H}$ierarchy-aware $\textbf{J}$oint Supervised $\textbf{C}$ontrastive $\textbf{L}$earning method that bridges the gap between supervised contrastive learning and HMTC. Specifically, we employ both instance-wise and label-wise contrastive learning techniques and carefully construct batches to fulfill the contrastive learning objective. Extensive experiments on four multi-path HMTC datasets demonstrate that HJCL achieves promising results and the effectiveness of Contrastive Learning on HMTC.
Updated: 2024-06-19 14:59:14
标题: 实例和标签:面向层次多标签文本分类的层次感知联合监督对比学习
摘要: 分层多标签文本分类(HMTC)旨在利用多标签分类中的标签层次结构。最近针对HMTC的方法处理了在输出空间上施加过度约束前提的问题,通过在半监督方式下对生成的样本进行对比学习,使文本和标签嵌入更加接近。然而,样本的生成往往会引入噪音,因为它忽略了同一批次中类似样本之间的相关性。解决此问题的一种方法是监督对比学习,但由于其复杂的结构化标签,它在HMTC中仍然是一个未充分探讨的话题。为了克服这一挑战,我们提出了HJCL,一种层次感知的联合监督对比学习方法,它弥合了监督对比学习和HMTC之间的差距。具体而言,我们采用了实例级和标签级对比学习技术,并精心构建批次以实现对比学习目标。对四个多路径HMTC数据集的广泛实验表明,HJCL取得了令人满意的结果,并证明了对比学习在HMTC上的有效性。
更新时间: 2024-06-19 14:59:14
领域: cs.CL,cs.AI,cs.LG
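For the HJCL entry above, here is the plain supervised contrastive loss (Khosla et al.) as a simplified stand-in for HJCL's instance- and label-wise terms: embeddings that share a label are pulled together, all others pushed apart. HJCL's hierarchy-aware batch construction and multi-label handling are omitted.
```python
import torch
import torch.nn.functional as F

def supcon_loss(feats, labels, tau=0.1):
    feats = F.normalize(feats, dim=-1)
    sim = feats @ feats.T / tau
    sim.fill_diagonal_(float("-inf"))               # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]).float()
    pos.fill_diagonal_(0)                           # positives exclude self
    # average log-probability over each anchor's positives
    per_anchor = log_prob.masked_fill(pos == 0, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor.mean()

loss = supcon_loss(torch.randn(8, 16), torch.tensor([0, 0, 1, 1, 2, 2, 3, 3]))
print(loss)
```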
Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma?
The behavior of Large Language Models (LLMs) as artificial social agents is largely unexplored, and we still lack extensive evidence of how these agents react to simple social stimuli. Testing the behavior of AI agents in classic Game Theory experiments provides a promising theoretical framework for evaluating the norms and values of these agents in archetypal social situations. In this work, we investigate the cooperative behavior of Llama2 when playing the Iterated Prisoner's Dilemma against random adversaries displaying various levels of hostility. We introduce a systematic methodology to evaluate an LLM's comprehension of the game's rules and its capability to parse historical gameplay logs for decision-making. We conducted simulations of games lasting for 100 rounds, and analyzed the LLM's decisions in terms of dimensions defined in behavioral economics literature. We find that Llama2 tends not to initiate defection but it adopts a cautious approach towards cooperation, sharply shifting towards a behavior that is both forgiving and non-retaliatory only when the opponent reduces its rate of defection below 30%. In comparison to prior research on human participants, Llama2 exhibits a greater inclination towards cooperative behavior. Our systematic approach to the study of LLMs in game theoretical scenarios is a step towards using these simulations to inform practices of LLM auditing and alignment.
Updated: 2024-06-19 14:51:14
标题: 比人类更友善:大型语言模型在囚徒困境中表现如何?
摘要: 大型语言模型(LLMs)作为人工社会代理的行为在很大程度上尚未被探索,我们仍然缺乏这些代理对简单社会刺激作何反应的广泛证据。在经典博弈论实验中测试AI代理的行为,为评估这些代理在原型社会情境中的规范和价值提供了一个有前途的理论框架。在这项工作中,我们调查了Llama2在与展示不同敌意水平的随机对手进行迭代囚徒困境博弈时的合作行为。我们引入了一种系统方法来评估LLM对游戏规则的理解以及解析历史游戏日志进行决策的能力。我们进行了持续100轮的游戏模拟,并根据行为经济学文献中定义的维度分析了LLM的决策。我们发现,Llama2倾向于不主动背叛,而是对合作采取谨慎态度;只有当对手将背叛率降至30%以下时,它才会明显转向一种既宽容又不报复的行为。与先前关于人类参与者的研究相比,Llama2表现出更强的合作倾向。我们在博弈论情景中研究LLMs的系统方法,是朝着利用此类模拟来指导LLM审计和对齐实践迈出的一步。
更新时间: 2024-06-19 14:51:14
领域: cs.CY,cs.AI,cs.GT,physics.soc-ph
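The experimental setup above is easy to reproduce in skeleton form. In this sketch, `agent_move` is a heuristic stand-in mimicking the reported 30% threshold, not an actual Llama2 call; the study instead prompts the model with the full game log each round.
```python
import random

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def agent_move(history):
    # Heuristic stand-in for the LLM: cooperate while the opponent's
    # defection rate stays below 30% (the threshold the paper reports).
    if not history:
        return "C"
    defect_rate = sum(opp == "D" for _, opp in history) / len(history)
    return "C" if defect_rate < 0.3 else "D"

def play(hostility, rounds=100, seed=0):
    rng, history, totals = random.Random(seed), [], [0, 0]
    for _ in range(rounds):
        a = agent_move(history)
        b = "D" if rng.random() < hostility else "C"   # random adversary
        pa, pb = PAYOFF[(a, b)]
        totals[0] += pa
        totals[1] += pb
        history.append((a, b))
    return totals

print(play(hostility=0.2), play(hostility=0.7))
```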
Root Cause Localization for Microservice Systems in Cloud-edge Collaborative Environments
With the development of cloud-native technologies, microservice-based software systems face challenges in accurately localizing root causes when failures occur. Additionally, the cloud-edge collaborative environment introduces more difficulties, such as unstable networks and high latency across network segments. Accurately identifying the root cause of microservices in a cloud-edge collaborative environment has thus become an urgent problem. In this paper, we propose MicroCERCL, a novel approach that pinpoints root causes at the kernel and application level in the cloud-edge collaborative environment. Our key insight is that failures propagate through direct invocations and indirect resource-competition dependencies in a cloud-edge collaborative environment characterized by instability and high latency. This will become more complex in the hybrid deployment that simultaneously involves multiple microservice systems. Leveraging this insight, we extract valid contents from kernel-level logs to prioritize localizing the kernel-level root cause. Moreover, we construct a heterogeneous dynamic topology stack and train a graph neural network model to accurately localize the application-level root cause without relying on historical data. Notably, we released the first benchmark hybrid deployment microservice system in a cloud-edge collaborative environment (the largest and most complex within our knowledge). Experiments conducted on the dataset collected from the benchmark show that MicroCERCL can accurately localize the root cause of microservice systems in such environments, significantly outperforming state-of-the-art approaches with an increase of at least 24.1% in top-1 accuracy.
Updated: 2024-06-19 14:49:37
标题: 云边协作环境下微服务系统的根本原因定位
摘要: 随着云原生技术的发展,基于微服务的软件系统在发生故障时面临准确定位根本原因的挑战。此外,云边协作环境引入了更多困难,如网络不稳定和跨网络段的高延迟。在云边协作环境中准确识别微服务的根本原因因此已成为一个紧迫的问题。在本文中,我们提出了MicroCERCL,一种新颖的方法,在云边协作环境中定位内核和应用级别的根本原因。我们的关键见解是,在由不稳定性和高延迟特征的云边协作环境中,故障通过直接调用和间接资源竞争依赖传播。在同时涉及多个微服务系统的混合部署中,这将变得更加复杂。利用这一见解,我们从内核级别日志中提取有效内容,以优先定位内核级别的根本原因。此外,我们构建了一个异构动态拓扑栈,并训练了一个图神经网络模型,以准确定位应用级别的根本原因,而不依赖于历史数据。值得注意的是,我们在云边协作环境中发布了第一个基准混合部署微服务系统(在我们的知识范围内最大且最复杂)。对从基准收集的数据集上进行的实验表明,MicroCERCL能够准确定位这种环境中微服务系统的根本原因,其准确率至少比最先进的方法提高了24.1%。
更新时间: 2024-06-19 14:49:37
领域: cs.SE,cs.AI,cs.PF
Beyond IID: data-driven decision-making in heterogeneous environments
How should one leverage historical data when past observations are not perfectly indicative of the future, e.g., due to the presence of unobserved confounders which one cannot "correct" for? Motivated by this question, we study a data-driven decision-making framework in which historical samples are generated from unknown and different distributions assumed to lie in a heterogeneity ball with known radius and centered around the (also) unknown future (out-of-sample) distribution on which the performance of a decision will be evaluated. This work aims at analyzing the performance of central data-driven policies, as well as near-optimal ones, in these heterogeneous environments and at understanding key drivers of performance. We establish a first result which allows us to upper-bound the asymptotic worst-case regret of a broad class of policies. Leveraging this result, for any integral probability metric, we provide a general analysis of the performance achieved by Sample Average Approximation (SAA) as a function of the radius of the heterogeneity ball. This analysis is centered around the approximation parameter, a notion of complexity we introduce to capture how the interplay between the heterogeneity and the problem structure impacts the performance of SAA. In turn, we illustrate through several widely-studied problems -- e.g., newsvendor, pricing -- how this methodology can be applied and find that the performance of SAA varies considerably depending on the combinations of problem classes and heterogeneity. The failure of SAA for certain instances motivates the design of alternative policies to achieve rate-optimality. We derive problem-dependent policies achieving strong guarantees for the illustrative problems described above and provide initial results towards a principled approach for the design and analysis of general rate-optimal algorithms.
Updated: 2024-06-19 14:49:17
标题: 超越IID:异质环境中的数据驱动决策
摘要: 当过去的观察并不完全代表未来时,人们应该如何利用历史数据,例如由于存在无法“校正”的未观察到的混淆因素?受这个问题的启发,我们研究了一个数据驱动的决策框架,其中假定历史样本是从未知和不同分布中生成的,这些分布被假定位于一个已知半径的异质性球内,其中心位于(同样)未知的未来(样本外)分布,对该决策的性能将进行评估。这项工作旨在分析在这些异质环境中中央数据驱动策略以及接近最优策略的性能,并理解性能的关键因素。我们建立了一个第一结果,允许上限渐近最坏情况后悔的广泛类策略。利用这一结果,对于任何积分概率度量,我们提供了一个关于样本平均近似(SAA)所实现性能的总体分析,作为异质性球半径的函数。该分析围绕近似参数展开,这是我们引入的一个复杂性概念,用于捕捉异质性和问题结构之间的相互作用如何影响SAA的性能。反过来,我们通过几个广泛研究的问题 – 例如,新闻供应商,定价 – 展示了这种方法如何应用,并发现SAA的性能因问题类别和异质性的组合而有很大差异。对于某些情况下SAA的失败促使设计替代策略以实现速率最优性。我们推导了实现强有力保证的问题相关策略,用于描述上述示例问题,并提供了朝着设计和分析一般速率最优算法的原则方法的初步结果。
更新时间: 2024-06-19 14:49:17
领域: cs.LG,math.OC,stat.ML
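For the newsvendor example the entry above mentions, SAA has a closed form: with unit price p and cost c, the SAA decision is the (p - c)/p empirical quantile of pooled historical demand. A minimal sketch, with demand drawn from several shifted distributions to mimic the heterogeneous environments the paper studies:
```python
import numpy as np

rng = np.random.default_rng(1)
p, c = 5.0, 3.0
# historical demand pooled from several (unknown, shifted) distributions
demand = np.concatenate([rng.poisson(lam, 200) for lam in (40, 50, 65)])

q_saa = np.quantile(demand, (p - c) / p)      # maximizes average past profit
profit = p * np.minimum(q_saa, demand) - c * q_saa
print("SAA order quantity:", q_saa, "average in-sample profit:", profit.mean())
```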
Evaluating the Performance of ChatGPT for Spam Email Detection
Email continues to be a pivotal and extensively utilized communication medium within professional and commercial domains. Nonetheless, the prevalence of spam emails poses a significant challenge for users, disrupting their daily routines and diminishing productivity. Consequently, accurately identifying and filtering spam based on content has become crucial for cybersecurity. Recent advancements in natural language processing, particularly with large language models like ChatGPT, have shown remarkable performance in tasks such as question answering and text generation. However, its potential in spam identification remains underexplored. To fill this gap, this study attempts to evaluate ChatGPT's capabilities for spam identification in both English and Chinese email datasets. We employ ChatGPT for spam email detection using in-context learning, which requires a prompt instruction and a few demonstrations. We also investigate how the number of demonstrations in the prompt affects the performance of ChatGPT. For comparison, we also implement five popular benchmark methods, including naive Bayes, support vector machines (SVM), logistic regression (LR), feedforward dense neural networks (DNN), and BERT classifiers. Extensive experiments show that the performance of ChatGPT is significantly worse than that of deep supervised learning methods on the large English dataset, while it presents superior performance on the low-resourced Chinese dataset.
Updated: 2024-06-19 14:49:09
标题: 评估ChatGPT在垃圾邮件检测中的性能
摘要: 电子邮件在专业和商业领域仍然是一个至关重要且广泛使用的沟通媒介。然而,垃圾邮件的普及给用户带来了重大挑战,打乱了他们的日常工作流程并降低了生产力。因此,基于内容准确识别和过滤垃圾邮件已成为网络安全的关键。最近在自然语言处理方面取得的进展,特别是像ChatGPT这样的大型语言模型,在诸如问答和文本生成等任务中表现出卓越性能。然而,它在垃圾邮件识别方面的潜力尚未得到充分挖掘。为填补这一空白,本研究尝试评估ChatGPT在英文和中文电子邮件数据集中用于垃圾邮件识别的能力。我们利用ChatGPT进行垃圾邮件检测,采用上下文学习,需要一个提示指令和一些演示。我们还研究了提示中演示数量对ChatGPT性能的影响。为了比较,我们还实施了五种流行的基准方法,包括朴素贝叶斯、支持向量机(SVM)、逻辑回归(LR)、前馈密集神经网络(DNN)和BERT分类器。通过广泛实验,ChatGPT在大型英文数据集中的性能明显低于深度监督学习方法,而在资源稀缺的中文数据集中表现出优越性能。
更新时间: 2024-06-19 14:49:09
领域: cs.CL,cs.AI,cs.CY,cs.LG
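The in-context-learning setup above reduces to prompt assembly: an instruction, k labeled demonstrations, then the query email. A generic sketch follows; the wording is illustrative, not necessarily the exact prompt used in the study.
```python
def build_prompt(demos, email):
    """Assemble a few-shot spam-classification prompt."""
    lines = ["Classify the following email as 'spam' or 'ham'.", ""]
    for text, label in demos:
        lines += [f"Email: {text}", f"Label: {label}", ""]
    lines += [f"Email: {email}", "Label:"]
    return "\n".join(lines)

demos = [("WIN a FREE prize now!!!", "spam"),
         ("Meeting moved to 3pm, agenda attached.", "ham")]
print(build_prompt(demos, "Congratulations, claim your reward today"))
```
Varying the length of `demos` is exactly the demonstration-count experiment the abstract describes.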
CoDreamer: Communication-Based Decentralised World Models
Sample efficiency is a critical challenge in reinforcement learning. Model-based RL has emerged as a solution, but its application has largely been confined to single-agent scenarios. In this work, we introduce CoDreamer, an extension of the Dreamer algorithm for multi-agent environments. CoDreamer leverages Graph Neural Networks for a two-level communication system to tackle challenges such as partial observability and inter-agent cooperation. Communication is separately utilised within the learned world models and within the learned policies of each agent to enhance modelling and task-solving. We show that CoDreamer offers greater expressive power than a naive application of Dreamer, and we demonstrate its superiority over baseline methods across various multi-agent environments.
Updated: 2024-06-19 14:42:40
标题: CoDreamer:基于通信的去中心化世界模型
摘要: 样本效率是强化学习中的一个关键挑战。基于模型的强化学习已经成为一种解决方案,但其应用主要局限在单智能体场景中。在这项工作中,我们引入了CoDreamer,这是Dreamer算法在多智能体环境中的扩展。CoDreamer利用图神经网络构建了一个两级通信系统,以应对部分可观察性和智能体间的合作等挑战。通信在学习的世界模型和每个智能体的学习策略中分别使用,以增强建模和任务解决能力。我们展示了CoDreamer提供了比Dreamer的简单应用更强的表达能力,并且在各种多智能体环境中展示了其优于基准方法的优越性。
更新时间: 2024-06-19 14:42:40
领域: cs.AI
Defying the Odds: Solana's Unexpected Resilience in Spite of the Security Challenges Faced by Developers
Solana gained considerable attention as one of the most popular blockchain platforms for deploying decentralized applications. Compared to Ethereum, however, we observe a lack of research on how Solana smart contract developers handle security, what challenges they encounter, and how this affects the overall security of the ecosystem. To address this, we conducted the first comprehensive study on the Solana platform consisting of a 90-minute Solana smart contract code review task with 35 participants followed by interviews with a subset of seven participants. Our study shows, quite alarmingly, that none of the participants could detect all important security vulnerabilities in a code review task and that 83% of the participants are likely to release vulnerable smart contracts. Our study also sheds light on the root causes of developers' challenges with Solana smart contract development, suggesting the need for better security guidance and resources. In spite of these challenges, our automated analysis on currently deployed Solana smart contracts surprisingly suggests that the prevalence of vulnerabilities - especially those pointed out as the most challenging in our developer study - is below 0.3%. We explore the causes of this counter-intuitive resilience and show that frameworks, such as Anchor, are aiding Solana developers in deploying secure contracts.
Updated: 2024-06-19 14:42:33
标题: 逆势而行:尽管开发者面临安全挑战,Solana仍意外地展现出韧性
摘要: Solana作为部署去中心化应用最受欢迎的区块链平台之一,引起了相当大的关注。然而,与以太坊相比,我们观察到关于Solana智能合约开发者如何处理安全性、他们遇到哪些挑战、以及这如何影响整个生态系统安全性的研究不足。为了解决这个问题,我们进行了第一次针对Solana平台的全面研究,包括一个有35名参与者的90分钟Solana智能合约代码审查任务,随后对其中七名参与者进行了访谈。我们的研究显示,令人震惊的是,没有一个参与者能够在代码审查任务中检测出所有重要的安全漏洞,并且83%的参与者可能发布易受攻击的智能合约。我们的研究还揭示了Solana智能合约开发者面临挑战的根本原因,表明需要更好的安全指导和资源。尽管存在这些挑战,我们对目前已部署的Solana智能合约进行的自动化分析却出人意料地表明,漏洞的普遍性(尤其是我们的开发者研究中指出的最具挑战性的那些)低于0.3%。我们探讨了这种反直觉的韧性的原因,并表明Anchor等框架正在帮助Solana开发者部署安全合约。
更新时间: 2024-06-19 14:42:33
领域: cs.CR
GraphKAN: Enhancing Feature Extraction with Graph Kolmogorov Arnold Networks
A massive number of applications involve data with underlying relationships embedded in non-Euclidean space. Graph neural networks (GNNs) are utilized to extract features by capturing the dependencies within graphs. Despite groundbreaking performances, we argue that Multi-layer Perceptrons (MLPs) and fixed activation functions impede feature extraction due to information loss. Inspired by Kolmogorov Arnold Networks (KANs), we make the first attempt to combine GNNs with KANs. We discard MLPs and activation functions, and instead use KANs for feature extraction. Experiments demonstrate the effectiveness of GraphKAN, emphasizing the potential of KANs as a powerful tool. Code is available at https://github.com/Ryanfzhang/GraphKan.
Updated: 2024-06-19 14:41:09
标题: GraphKAN:利用图科尔莫哥洛夫·阿诺德网络增强特征提取
摘要: 大量应用涉及嵌入在非欧几里德空间中的具有潜在关联的数据。图神经网络(GNNs)被用于通过捕捉图中的依赖关系来提取特征。尽管表现具有突破性,但我们认为多层感知器(MLPs)和固定激活函数阻碍了特征提取,因为会导致信息丢失。受 Kolmogorov Arnold Networks(KANs)的启发,我们首次尝试了带有KANs的GNNs。我们放弃了MLPs和激活函数,而是使用KANs进行特征提取。实验证明了GraphKAN的有效性,强调了KANs作为强大工具的潜力。代码可在 https://github.com/Ryanfzhang/GraphKan 找到。
更新时间: 2024-06-19 14:41:09
领域: cs.LG,cs.AI
A Survey of Data-Efficient Graph Learning
Graph-structured data, prevalent in domains ranging from social networks to biochemical analysis, serve as the foundation for diverse real-world systems. While graph neural networks demonstrate proficiency in modeling this type of data, their success is often reliant on significant amounts of labeled data, posing a challenge in practical scenarios with limited annotation resources. To tackle this problem, tremendous efforts have been devoted to enhancing graph machine learning performance under low-resource settings by exploring various approaches to minimal supervision. In this paper, we introduce a novel concept of Data-Efficient Graph Learning (DEGL) as a research frontier, and present the first survey that summarizes the current progress of DEGL. We initiate by highlighting the challenges inherent in training models with large labeled data, paving the way for our exploration into DEGL. Next, we systematically review recent advances on this topic from several key aspects, including self-supervised graph learning, semi-supervised graph learning, and few-shot graph learning. Also, we state promising directions for future research, contributing to the evolution of graph machine learning.
Updated: 2024-06-19 14:34:24
标题: 一项关于数据高效图学习的调查
摘要: 图结构化数据普遍存在于从社交网络到生物化学分析的各个领域,是多种现实世界系统的基础。虽然图神经网络展现出对这类数据建模的熟练能力,但其成功往往依赖于大量标记数据,在标注资源有限的实际场景中面临挑战。为了解决这个问题,人们致力于通过探索各种最小监督方法来提升低资源环境下的图机器学习性能。本文介绍了一种名为数据高效图学习(DEGL)的新概念作为研究前沿,并提出了第一份总结DEGL当前进展的调查报告。我们首先强调了使用大量标记数据训练模型所固有的挑战,为我们探索DEGL铺平道路。接下来,我们从自监督图学习、半监督图学习和少样本图学习等几个关键方面系统地回顾了该主题的最新进展。此外,我们指出了未来研究的有前景方向,有助于图机器学习的发展。
更新时间: 2024-06-19 14:34:24
领域: cs.LG,cs.AI,cs.SI
Submodular Participatory Budgeting
Participatory budgeting refers to the practice of allocating public resources by collecting and aggregating individual preferences. Most existing studies in this field often assume an additive utility function, where each individual holds a private utility for each candidate project, and the total utility of a set of funded projects is simply the sum of the utilities of all projects. We argue that this assumption does not always hold in reality. For example, building two playgrounds in the same neighborhood does not necessarily lead to twice the utility of building a single playground. To address this, we extend the existing study by proposing a submodular participatory budgeting problem, assuming that the utility function of each individual is a monotone and submodular function over funded projects. We propose and examine three preference elicitation methods, including \emph{ranking-by-marginal-values}, \emph{ranking-by-values} and \emph{threshold approval votes}, and analyze their performances in terms of distortion. Notably, if the utility function is additive, our aggregation rule designed for threshold approval votes achieves a better distortion than the state-of-the-art approach.
Updated: 2024-06-19 14:22:54
标题: 次模参与式预算
摘要: 参与式预算是指通过收集和汇总个体偏好来分配公共资源的实践。这个领域中大多数现有研究通常假设加性效用函数,即每个个体对每个候选项目都持有私人效用,并且资助项目集的总效用简单地是所有项目效用之和。我们认为这种假设在现实中并不总是成立。例如,在同一个社区建造两个游乐场并不一定会带来建造一个游乐场的两倍效用。 为了解决这个问题,我们通过提出一个次模参与式预算问题来扩展现有研究,假设每个个体的效用函数是关于资助项目的单调次模函数。我们提出并考察了三种偏好引出方法,包括按边际值排名、按值排名和阈值批准投票,并分析了它们在失真方面的表现。值得注意的是,如果效用函数是加性的,我们为阈值批准投票设计的聚合规则实现了比现有最优方法更小的失真。
更新时间: 2024-06-19 14:22:54
领域: cs.GT,cs.AI
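The playground example above is the classic signature of submodularity, and greedy selection by marginal value per cost is the standard baseline for budgeted monotone submodular maximization. A toy sketch with an assumed coverage-style utility (not a method from the paper itself):
```python
# Each project covers some groups at some cost; utility = groups covered.
projects = {"playground_A": ({"kids_north"}, 3),
            "playground_B": ({"kids_north"}, 3),       # redundant coverage
            "library":      ({"readers"}, 4),
            "bike_lane":    ({"commuters", "kids_north"}, 5)}
budget = 9

covered, chosen, spent = set(), [], 0
while True:
    best, best_gain = None, 0.0
    for name, (groups, cost) in projects.items():
        if name in chosen or spent + cost > budget:
            continue
        gain = len(groups - covered) / cost            # marginal value per cost
        if gain > best_gain:
            best, best_gain = name, gain
    if best is None:
        break
    chosen.append(best)
    covered |= projects[best][0]
    spent += projects[best][1]
print(chosen, "covers", covered, "for", spent)
```
Note that the second playground is never funded: its marginal value is zero once the first covers the same group.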
MEV Ecosystem Evolution From Ethereum 1.0
Smart contracts led to the emergence of the decentralized finance (DeFi) marketplace within blockchain ecosystems, where diverse participants engage in financial activities. In traditional finance, there are possibilities to create value, e.g., arbitrage offers to create value from market inefficiencies, while front-running offers to extract value for the participants having privileged roles. Such opportunities are readily discoverable through programmatic searching in DeFi; this is commonly known as Maximal Extractable Value (MEV) in the literature. In this survey, first, we show how lucrative such opportunities can be. Next, we discuss how protocol-following participants trying to capture such opportunities threaten to sabotage blockchain's performance and the core tenets of decentralization, transparency, and trustlessness that blockchains are based on. Then, we explain different attempts by the community in the past to address these issues and the problems introduced by these solutions. Finally, we review the current state of research trying to restore trustlessness and decentralization to provide all DeFi participants with a fair marketplace.
Updated: 2024-06-19 14:22:26
标题: MEV生态系统自以太坊1.0以来的演变
摘要: 智能合约催生了区块链生态系统内的去中心化金融(DeFi)市场,各类参与者在其中进行金融活动。在传统金融领域,存在创造价值的可能性,例如通过套利从市场低效中创造价值,或通过抢先交易(front-running)为具有特权角色的参与者提取价值。这样的机会在DeFi中通过程序化搜索就能轻松找到,这在文献中通常被称为最大可提取价值(MEV)。在这项调查中,我们首先展示这些机会有多么有利可图。接下来,我们讨论遵循协议的参与者试图抓住这些机会时,如何威胁破坏区块链的性能,以及区块链所基于的去中心化、透明和无需信任的核心原则。然后,我们解释社区过去为解决这些问题所做的不同尝试以及这些解决方案引入的问题。最后,我们回顾试图恢复无需信任与去中心化、为所有DeFi参与者提供公平市场的研究现状。
更新时间: 2024-06-19 14:22:26
领域: cs.CR
Explaining time series models using frequency masking
Time series data is fundamentally important for describing many critical domains such as healthcare, finance, and climate, where explainable models are necessary for safe automated decision-making. Developing eXplainable AI (XAI) in these domains therefore implies explaining salient information in the time series. Current methods for obtaining saliency maps assume localized information in the raw input space. In this paper, we argue that the salient information of a number of time series is more likely to be localized in the frequency domain. We propose FreqRISE, which uses masking-based methods to produce explanations in the frequency and time-frequency domains, and which shows the best performance across a number of tasks.
Updated: 2024-06-19 14:19:59
标题: 用频率屏蔽解释时间序列模型
摘要: 时间序列数据在描述许多关键领域(如医疗保健、金融和气候)中具有根本重要性,这些领域需要可解释的模型来进行安全的自动决策。因此,在这些领域开发可解释的人工智能(XAI)意味着解释时间序列中的显著信息。目前获取显著性地图的方法假设原始输入空间中存在局部信息。在本文中,我们认为一些时间序列的显著信息更可能在频域中局部化。我们提出了FreqRISE,它使用基于掩码的方法在频率和时频域中产生解释,这在许多任务中表现出最佳性能。
更新时间: 2024-06-19 14:19:59
领域: cs.LG
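A RISE-style saliency map in the frequency domain, as the FreqRISE abstract above describes, can be sketched in a few lines: randomly mask rFFT coefficients, re-synthesize the signal, and weight each mask by the model's response. This is a sketch of the general idea, not the authors' implementation; the toy "model" simply reads off the 5 Hz component.
```python
import torch

def freq_saliency(model, x, n_masks=500, keep=0.5):
    X = torch.fft.rfft(x)                        # x: (length,)
    importance = torch.zeros_like(X.real)
    total = 0.0
    for _ in range(n_masks):
        m = (torch.rand_like(X.real) < keep).float()   # random frequency mask
        x_masked = torch.fft.irfft(X * m, n=x.shape[-1])
        score = model(x_masked).item()           # scalar class score
        importance += score * m                  # credit kept frequencies
        total += score
    return importance / max(total, 1e-8)

t = torch.arange(256) / 256                      # 1-second toy signal
x = torch.sin(2 * torch.pi * 5 * t) + 0.5 * torch.sin(2 * torch.pi * 40 * t)
model = lambda s: torch.fft.rfft(s)[5].abs()     # stand-in "classifier"
sal = freq_saliency(model, x)
print("most salient frequency bin:", int(sal.argmax()))   # expect 5
```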
Calibrating Neural Networks' parameters through Optimal Contraction in a Prediction Problem
This study introduces a novel approach to ensure the existence and uniqueness of optimal parameters in neural networks. The paper details how a recurrent neural network (RNN) can be transformed into a contraction in a domain where its parameters are linear. It then demonstrates that a prediction problem modeled through an RNN, with a specific regularization term in the loss function, can have its first-order conditions expressed analytically. This system of equations is reduced to two matrix equations involving Sylvester equations, which can be partially solved. We establish that, if certain conditions are met, optimal parameters exist, are unique, and can be found through a straightforward algorithm to any desired precision. Also, as the number of neurons grows, the conditions for convergence become easier to fulfill. Feedforward neural networks (FNNs) are also explored by including linear constraints on parameters. According to our model, incorporating loops (with fixed or variable weights) will produce loss functions that train more easily, because doing so assures the existence of a region where an iterative method converges.
Updated: 2024-06-19 14:16:03
标题: 通过预测问题中的最优收缩来校准神经网络的参数
摘要: 这项研究介绍了一种确保神经网络中最优参数存在且唯一的新方法。本文详细说明了如何在参数呈线性的域中将递归神经网络(RNN)转化为一个收缩映射。随后证明,对于通过RNN建模、且损失函数中带有特定正则化项的预测问题,其一阶条件可以解析地表达。该方程组可化简为涉及Sylvester方程的两个矩阵方程,并可部分求解。我们证明,若满足某些条件,最优参数存在且唯一,并且可以通过一个简单的算法以任意所需精度找到。此外,随着神经元数量的增加,收敛条件变得更容易满足。我们还通过在参数上施加线性约束探索了前馈神经网络(FNNs)。根据我们的模型,引入循环(具有固定或可变权重)将产生更容易训练的损失函数,因为这确保了存在一个使迭代方法收敛的区域。
更新时间: 2024-06-19 14:16:03
领域: stat.ML,cs.LG,math.OC
Trusted Video Inpainting Localization via Deep Attentive Noise Learning
Digital video inpainting techniques have been substantially improved with deep learning in recent years. Although inpainting is originally designed to repair damaged areas, it can also be used as malicious manipulation to remove important objects for creating false scenes and facts. As such it is significant to identify inpainted regions blindly. In this paper, we present a Trusted Video Inpainting Localization network (TruVIL) with excellent robustness and generalization ability. Observing that high-frequency noise can effectively unveil the inpainted regions, we design deep attentive noise learning in multiple stages to capture the inpainting traces. Firstly, a multi-scale noise extraction module based on 3D High Pass (HP3D) layers is used to create the noise modality from input RGB frames. Then the correlation between such two complementary modalities are explored by a cross-modality attentive fusion module to facilitate mutual feature learning. Lastly, spatial details are selectively enhanced by an attentive noise decoding module to boost the localization performance of the network. To prepare enough training samples, we also build a frame-level video object segmentation dataset of 2500 videos with pixel-level annotation for all frames. Extensive experimental results validate the superiority of TruVIL compared with the state-of-the-arts. In particular, both quantitative and qualitative evaluations on various inpainted videos verify the remarkable robustness and generalization ability of our proposed TruVIL. Code and dataset will be available at https://github.com/multimediaFor/TruVIL.
Updated: 2024-06-19 14:08:58
标题: 通过深度关注噪声学习的可信视频修复定位
摘要: 数字视频修复技术近年来在深度学习的帮助下得到了显著改进。尽管修复最初是为了修复受损区域,但也可以被用作恶意操作,以移除重要对象,从而创造虚假的场景和事实。因此,盲目识别修复区域具有重要意义。在本文中,我们提出了一个具有出色鲁棒性和泛化能力的可信视频修复定位网络(TruVIL)。观察到高频噪声可以有效地揭示修复区域,我们设计了深度关注噪声学习,以在多个阶段捕捉修复痕迹。首先,基于3D高通(HP3D)层的多尺度噪声提取模块用于从输入RGB帧创建噪声模态。然后,通过跨模态关注融合模块探索了这两种互补模态之间的相关性,以促进相互特征学习。最后,通过关注噪声解码模块有选择地增强了空间细节,以提升网络的定位性能。为了准备足够的训练样本,我们还建立了一个包含2500个视频的帧级视频对象分割数据集,对所有帧进行了像素级注释。广泛的实验结果验证了与最新技术相比,TruVIL的优越性。特别是,对各种修复视频的定量和定性评估验证了我们提出的TruVIL的显著鲁棒性和泛化能力。代码和数据集将在https://github.com/multimediaFor/TruVIL 上提供。
更新时间: 2024-06-19 14:08:58
领域: cs.CV,cs.CR
Extending Input Contexts of Language Models through Training on Segmented Sequences
Effectively training language models on long inputs poses many technical challenges. As a cost consideration, language models are pretrained on a fixed sequence length before being adapted to longer sequences. We explore various methods for adapting models to longer inputs by training on segmented sequences and an interpolation-based method for extending absolute positional embeddings. We develop a training procedure to extend the input context size of pretrained models with no architectural changes and no additional memory costs beyond training on the original input lengths. By sub-sampling segments from long inputs while maintaining their original positions, the model is able to learn new positional interactions. Our method benefits both models trained with absolute positional embeddings, by extending their input contexts, and popular relative positional embedding methods, showing reduced perplexity on sequences longer than those they were trained on. We demonstrate our method can extend input contexts by a factor of 4x while improving perplexity.
Updated: 2024-06-19 14:00:27
标题: 通过对分段序列进行训练扩展语言模型的输入上下文
摘要: 在长输入上有效地训练语言模型面临许多技术挑战。出于成本考虑,语言模型在适应更长序列之前会在固定序列长度上进行预训练。我们通过在分段序列上训练和基于插值的方法来扩展绝对位置嵌入,探索了适应模型更长输入的各种方法。我们开发了一种训练程序,可以在不进行架构更改和不增加额外内存成本的情况下扩展预训练模型的输入上下文大小。通过从长输入中对段进行子采样,并保持它们的原始位置,模型能够学习新的位置交互。我们的方法不仅有利于使用绝对位置嵌入训练的模型,还能通过扩展输入上下文的方式减少相对位置嵌入方法在比它们训练的序列更长时的困惑度。我们证明了我们的方法可以将输入上下文扩展4倍同时改善困惑度。
更新时间: 2024-06-19 14:00:27
领域: cs.CL,cs.LG
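The core trick in the entry above, sub-sampling segments while keeping original positions, is compact enough to sketch. Segment count and length are assumed values; a real pipeline would feed `tokens` and `pos_ids` into the model's position-aware forward pass.
```python
import torch

def sample_segments(input_ids, n_segments=4, seg_len=64):
    # Sub-sample non-overlapping segments from a long sequence but keep
    # every token's ORIGINAL position id, so the model sees positional
    # offsets larger than its pre-training window.
    n_slots = input_ids.shape[-1] // seg_len
    starts = torch.randperm(n_slots)[:n_segments].sort().values * seg_len
    tokens = torch.cat([input_ids[s:s + seg_len] for s in starts.tolist()])
    pos_ids = torch.cat([torch.arange(s, s + seg_len) for s in starts.tolist()])
    return tokens, pos_ids

ids = torch.arange(4096)                 # stand-in for a tokenized document
tokens, pos_ids = sample_segments(ids)
print(tokens.shape, int(pos_ids.min()), int(pos_ids.max()))
```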
Bayes' capacity as a measure for reconstruction attacks in federated learning
Within the machine learning community, reconstruction attacks are a principal attack of concern and have been identified even in federated learning, which was designed with privacy preservation in mind. In federated learning, it has been shown that an adversary with knowledge of the machine learning architecture is able to infer the exact value of a training element given an observation of the weight updates performed during stochastic gradient descent. In response to these threats, the privacy community recommends the use of differential privacy in the stochastic gradient descent algorithm, termed DP-SGD. However, DP has not yet been formally established as an effective countermeasure against reconstruction attacks. In this paper, we formalise the reconstruction threat model using the information-theoretic framework of quantitative information flow. We show that the Bayes' capacity, related to the Sibson mutual information of order infinity, represents a tight upper bound on the leakage of the DP-SGD algorithm to an adversary interested in performing a reconstruction attack. We provide empirical results demonstrating the effectiveness of this measure for comparing mechanisms against reconstruction threats.
Updated: 2024-06-19 13:58:42
标题: 贝叶斯容量作为联邦学习中重建攻击的度量方式
摘要: 在机器学习社区中,重构攻击是一个主要的关注点,甚至在旨在保护隐私的联邦学习中也已经被发现。在联邦学习中,已有研究表明,一个了解机器学习架构的对手,在观察到随机梯度下降过程中的权重更新后,能够推断出某个训练样本的确切值。作为对这些威胁的回应,隐私社区建议在随机梯度下降算法中使用差分隐私,称为DP-SGD。然而,差分隐私尚未被正式确立为对抗重构攻击的有效对策。在本文中,我们利用定量信息流的信息论框架,形式化了重构威胁模型。我们证明了与无穷阶Sibson互信息相关的贝叶斯容量,是DP-SGD算法向意图实施重构攻击的对手泄露信息量的一个紧上界。我们提供的实证结果证明了该度量在比较各种机制抵御重构威胁方面的有效性。
更新时间: 2024-06-19 13:58:42
领域: cs.LG,cs.AI,cs.CR,cs.IT,math.IT
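For a finite channel represented as a row-stochastic matrix C with entries P(y|x), the multiplicative Bayes capacity used in the entry above is simply the sum of column-wise maxima, and its logarithm is the Sibson mutual information of order infinity. A worked example (the channel values are illustrative):
```python
import numpy as np

C = np.array([[0.7, 0.2, 0.1],       # row x: distribution over outputs y
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
assert np.allclose(C.sum(axis=1), 1.0)

bayes_capacity = C.max(axis=0).sum()            # sum_y max_x C[x, y]
leakage_bits = np.log2(bayes_capacity)
print(f"Bayes capacity = {bayes_capacity:.3f}, leakage = {leakage_bits:.3f} bits")
```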
Trapezoidal Gradient Descent for Effective Reinforcement Learning in Spiking Networks
With the rapid development of artificial intelligence technology, the field of reinforcement learning has continuously achieved breakthroughs in both theory and practice. However, traditional reinforcement learning algorithms often entail high energy consumption during interactions with the environment. Spiking Neural Networks (SNNs), with their low energy consumption characteristics and performance comparable to deep neural networks, have garnered widespread attention. To reduce the energy consumption of practical applications of reinforcement learning, researchers have successively proposed the Pop-SAN and MDC-SAN algorithms. Nonetheless, these algorithms use rectangular functions to approximate the spike network during the training process, resulting in low sensitivity, thus indicating room for improvement in the training effectiveness of SNNs. Based on this, we propose a trapezoidal approximation gradient method to replace the spike network, which not only preserves the original stable learning state but also enhances the model's adaptability and response sensitivity under various signal dynamics. Simulation results show that the improved algorithm, using the trapezoidal approximation gradient to replace the spike network, achieves better convergence speed and performance compared to the original algorithm and demonstrates good training stability.
Updated: 2024-06-19 13:56:22
标题: 用于脉冲网络中有效强化学习的梯形梯度下降
摘要: 随着人工智能技术的快速发展,强化学习领域在理论和实践上不断取得突破。然而,传统的强化学习算法在与环境交互过程中往往会产生高能耗。脉冲神经网络(SNN)以其低能耗特性和与深度神经网络可比的性能而受到广泛关注。为了降低强化学习实际应用的能耗,研究人员先后提出了Pop-SAN和MDC-SAN算法。然而,这些算法在训练过程中使用矩形函数来逼近脉冲网络,导致灵敏度较低,因此表明脉冲神经网络的训练效果有待改进。基于此,我们提出了一种梯形逼近梯度方法来替代脉冲网络,这不仅保留了原始的稳定学习状态,还增强了模型在各种信号动态下的适应性和响应灵敏度。模拟结果显示,使用梯形逼近梯度代替脉冲网络的改进算法,相比于原始算法,实现了更好的收敛速度和性能,并表现出良好的训练稳定性。
更新时间: 2024-06-19 13:56:22
领域: cs.AI
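A trapezoidal surrogate gradient like the one above is easy to express as a custom autograd function: the forward pass is the Heaviside spike, while the backward pass is 1 on a plateau near the threshold and ramps linearly to 0. The plateau half-width a and support b are assumed values, not the paper's tuned constants.
```python
import torch

class TrapezoidSpike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v, a, b):
        ctx.save_for_backward(v)
        ctx.a, ctx.b = a, b
        return (v >= 0).float()                 # spike when potential >= 0

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        x = v.abs()
        # 1 on the plateau |v| <= a, linear ramp down to 0 at |v| = b
        surrogate = torch.clamp((ctx.b - x) / (ctx.b - ctx.a), 0.0, 1.0)
        return grad_out * surrogate, None, None

v = torch.linspace(-2, 2, 9, requires_grad=True)
TrapezoidSpike.apply(v, 0.2, 1.0).sum().backward()
print(v.grad)    # trapezoid-shaped: 1 near the threshold, 0 far away
```
Compared with a rectangular surrogate, the sloped shoulders keep a nonzero (if damped) gradient for potentials moderately far from the threshold, which is the sensitivity gain the abstract argues for.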
Exploring Multi-view Pixel Contrast for General and Robust Image Forgery Localization
Image forgery localization, which aims to segment tampered regions in an image, is a fundamental yet challenging digital forensic task. While some deep learning-based forensic methods have achieved impressive results, they directly learn pixel-to-label mappings without fully exploiting the relationship between pixels in the feature space. To address such deficiency, we propose a Multi-view Pixel-wise Contrastive algorithm (MPC) for image forgery localization. Specifically, we first pre-train the backbone network with the supervised contrastive loss to model pixel relationships from the perspectives of within-image, cross-scale and cross-modality. That is aimed at increasing intra-class compactness and inter-class separability. Then the localization head is fine-tuned using the cross-entropy loss, resulting in a better pixel localizer. The MPC is trained on three different scale training datasets to make a comprehensive and fair comparison with existing image forgery localization algorithms. Extensive experiments on the small, medium and large scale training datasets show that the proposed MPC achieves higher generalization performance and robustness against post-processing than the state-of-the-arts. Code will be available at https://github.com/multimediaFor/MPC.
Updated: 2024-06-19 13:51:52
标题: 探索多视角像素对比用于一般和稳健图像伪造定位
摘要: 图像伪造定位旨在分割图像中的篡改区域,是一项基础但具有挑战性的数字取证任务。虽然一些基于深度学习的取证方法取得了令人印象深刻的结果,但它们直接学习像素到标签的映射,而没有充分利用特征空间中像素之间的关系。为了解决这种不足,我们提出了一种多视角像素对比算法(MPC)用于图像伪造定位。具体来说,我们首先使用有监督对比损失对骨干网络进行预训练,以从图像内部、跨尺度和跨模态的角度建模像素关系。这旨在增加类内紧凑性和类间可分性。然后使用交叉熵损失微调定位头部,从而获得更好的像素定位器。MPC在三个不同尺度的训练数据集上进行训练,以与现有的图像伪造定位算法进行全面公平的比较。对小型、中型和大型训练数据集的广泛实验表明,所提出的MPC比现有技术具有更高的泛化性能和抗后处理鲁棒性。代码将在https://github.com/multimediaFor/MPC 上提供。
更新时间: 2024-06-19 13:51:52
领域: cs.CV,cs.CR
Is AI fun? HumorDB: a curated dataset and benchmark to investigate graphical humor
Despite significant advancements in computer vision, understanding complex scenes, particularly those involving humor, remains a substantial challenge. This paper introduces HumorDB, a novel image-only dataset specifically designed to advance visual humor understanding. HumorDB consists of meticulously curated image pairs with contrasting humor ratings, emphasizing subtle visual cues that trigger humor and mitigating potential biases. The dataset enables evaluation through binary classification (Funny or Not Funny), range regression (funniness on a scale from 1 to 10), and pairwise comparison tasks (Which Image is Funnier?), effectively capturing the subjective nature of humor perception. Initial experiments reveal that while vision-only models struggle, vision-language models, particularly those leveraging large language models, show promising results. HumorDB also shows potential as a valuable zero-shot benchmark for powerful large multimodal models. We open-source both the dataset and code under the CC BY 4.0 license.
Updated: 2024-06-19 13:51:40
标题: AI有趣吗?HumorDB:一个精心筛选的数据集和基准,用于研究图形幽默
摘要: 尽管计算机视觉取得了显著进展,但理解复杂场景,特别是涉及幽默的场景,仍然是一个重大挑战。本文介绍了HumorDB,一个新颖的仅包含图像的数据集,专门设计用于推进视觉幽默理解。HumorDB由精心策划的图像对组成,具有不同的幽默评分,强调触发幽默的微妙视觉线索,减轻潜在的偏见。该数据集允许通过二元分类(有趣或无趣)、范围回归(在1到10的尺度上的有趣程度)和成对比较任务(哪张图更有趣)进行评估,有效捕捉幽默感知的主观性质。初步实验表明,尽管仅视觉模型存在困难,但视觉语言模型,特别是利用大型语言模型的模型,表现出有希望的结果。HumorDB还显示了作为强大的大型多模态模型的零样本基准的潜力。我们以CC BY 4.0许可证开源数据集和代码。
更新时间: 2024-06-19 13:51:40
领域: cs.CV,cs.AI,I.5.4
Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation
Large language models (LLMs) have transformed the field of natural language processing, but they remain susceptible to jailbreaking attacks that exploit their capabilities to generate unintended and potentially harmful content. Existing token-level jailbreaking techniques, while effective, face scalability and efficiency challenges, especially as models undergo frequent updates and incorporate advanced defensive measures. In this paper, we introduce JailMine, an innovative token-level manipulation approach that addresses these limitations effectively. JailMine employs an automated "mining" process to elicit malicious responses from LLMs by strategically selecting affirmative outputs and iteratively reducing the likelihood of rejection. Through rigorous testing across multiple well-known LLMs and datasets, we demonstrate JailMine's effectiveness and efficiency, achieving a significant average reduction of 86% in time consumed while maintaining high success rates averaging 95%, even in the face of evolving defensive strategies. Our work contributes to the ongoing effort to assess and mitigate the vulnerability of LLMs to jailbreaking attacks, underscoring the importance of continued vigilance and proactive measures to enhance the security and reliability of these powerful language models.
Updated: 2024-06-19 13:51:06
标题: 撬锁LLMs:一种基于Logit、利用令牌级操纵的越狱方法
摘要: 大型语言模型(LLMs)已经改变了自然语言处理领域,但它们仍然容易受到越狱攻击的影响,这些攻击利用它们生成意外和潜在有害内容的能力。现有的基于令牌级别的越狱技术,虽然有效,但面临着可扩展性和效率方面的挑战,特别是在模型经常更新并融入高级防御措施时。在本文中,我们介绍了JailMine,这是一种创新的基于令牌级别的操作方法,有效地解决了这些限制。JailMine利用自动化的"挖掘"过程,通过有策略地选择肯定的输出并迭代地降低拒绝的可能性来引发LLMs的恶意响应。通过对多个知名LLMs和数据集进行严格测试,我们展示了JailMine的有效性和效率,实现了平均时间消耗减少86%的显著效果,同时保持95%的高成功率,即使面对不断演进的防御策略也是如此。我们的工作有助于评估和缓解LLMs对越狱攻击的脆弱性的持续努力,强调了继续保持警惕和采取积极措施以增强这些强大语言模型的安全性和可靠性的重要性。
更新时间: 2024-06-19 13:51:06
领域: cs.CR,cs.AI,cs.LG
Solarcast-ML: Per Node GraphCast Extension for Solar Energy Production
This project presents an extension to the GraphCast model, a state-of-the-art graph neural network (GNN) for global weather forecasting, by integrating solar energy production forecasting capabilities. The proposed approach leverages the weather forecasts generated by GraphCast and trains a neural network model to predict the ratio of actual solar output to potential solar output based on various weather conditions. The model architecture consists of an input layer corresponding to weather features (temperature, humidity, dew point, wind speed, rain, barometric pressure, and altitude), two hidden layers with ReLU activations, and an output layer predicting solar radiation. The model is trained using a mean absolute error loss function and Adam optimizer. The results demonstrate the model's effectiveness in accurately predicting solar radiation, with its convergence behavior, decreasing training loss, and accurate prediction of solar radiation patterns suggesting successful learning of the underlying relationships between weather conditions and solar radiation. The integration of solar energy production forecasting with GraphCast offers valuable insights for the renewable energy sector, enabling better planning and decision-making based on expected solar energy production. Future work could explore further model refinements, incorporation of additional weather variables, and extension to other renewable energy sources.
Updated: 2024-06-19 13:47:05
标题: Solarcast-ML:面向太阳能发电的每节点GraphCast扩展
摘要: 这个项目提出了对GraphCast模型的扩展,GraphCast是一种用于全球天气预报的最先进的图神经网络(GNN),通过整合太阳能生产预测能力。提出的方法利用GraphCast生成的天气预报,并训练一个神经网络模型来预测实际太阳输出与潜在太阳输出之间的比率,基于各种天气条件。模型架构包括一个与天气特征(温度、湿度、露点、风速、降雨、气压和海拔)相对应的输入层,两个具有ReLU激活的隐藏层,以及一个预测太阳辐射的输出层。该模型使用平均绝对误差损失函数和Adam优化器进行训练。结果表明,该模型在准确预测太阳辐射方面的有效性,其收敛行为、降低的训练损失以及准确预测太阳辐射模式表明成功学习了天气条件和太阳辐射之间的基础关系。将太阳能生产预测与GraphCast集成,为可再生能源部门提供了宝贵的见解,使其能够根据预期的太阳能生产进行更好的规划和决策。未来的工作可以进一步探索模型的改进、引入额外的天气变量,并将其扩展到其他可再生能源来源。
更新时间: 2024-06-19 13:47:05
领域: cs.LG
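The Solarcast-ML architecture above is concrete enough to sketch directly: seven weather features in, two hidden ReLU layers, one output, trained with MAE and Adam as the abstract states. The hidden width of 64 and the sigmoid squashing of the actual-to-potential ratio into [0, 1] are assumptions, and the data here is placeholder noise.
```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(7, 64), nn.ReLU(),      # temp, humidity, dew point, wind,
    nn.Linear(64, 64), nn.ReLU(),     # rain, pressure, altitude
    nn.Linear(64, 1), nn.Sigmoid(),   # ratio of actual to potential output
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()                 # mean absolute error

weather = torch.rand(32, 7)           # placeholder forecast features
ratio = torch.rand(32, 1)             # placeholder observed output ratios
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(weather), ratio)
    loss.backward()
    opt.step()
print("final MAE on the toy batch:", loss.item())
```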
Enhancing Travel Choice Modeling with Large Language Models: A Prompt-Learning Approach
Travel choice analysis is crucial for understanding individual travel behavior to develop appropriate transport policies and recommendation systems in Intelligent Transportation Systems (ITS). Despite extensive research, this domain faces two critical challenges: a) modeling with limited survey data, and b) simultaneously achieving high model explainability and accuracy. In this paper, we introduce a novel prompt-learning-based Large Language Model (LLM) framework that significantly improves prediction accuracy and provides explicit explanations for individual predictions. This framework involves three main steps: transforming input variables into textual form, building demonstrations similar to the target instance, and applying these to a well-trained LLM. We tested the framework's efficacy using two widely used choice datasets: London Passenger Mode Choice (LPMC) and Optima-Mode, collected in Switzerland. The results indicate that the LLM significantly outperforms state-of-the-art deep learning methods and discrete choice models in predicting people's choices. Additionally, we present a case of explanation illustrating how the LLM framework generates understandable and explicit explanations at the individual level.
Updated: 2024-06-19 13:46:08
标题: 用大型语言模型增强旅行选择建模:一种提示学习方法
摘要: 旅行选择分析对于理解个体出行行为、制定适当的交通政策和在智能交通系统(ITS)中建立推荐系统至关重要。尽管进行了大量研究,但该领域面临两个关键挑战:a)使用有限调查数据进行建模,b)同时实现高模型可解释性和准确性。本文介绍了一种基于新型提示学习的大型语言模型(LLM)框架,显著提高了预测准确性,并为个体预测提供明确解释。该框架包括三个主要步骤:将输入变量转化为文本形式;构建类似于对象的演示,并将其应用于经过良好训练的LLM。我们使用两个广泛使用的选择数据集:伦敦乘客出行方式选择(LPMC)和在瑞士收集的Optima-Mode 来测试该框架的有效性。结果表明,LLM在预测人们的选择方面明显优于最先进的深度学习方法和离散选择模型。此外,我们展示了一个解释案例,说明了LLM框架如何在个体水平上生成可理解和明确的解释。
更新时间: 2024-06-19 13:46:08
领域: cs.AI
BiLD: Bi-directional Logits Difference Loss for Large Language Model Distillation
In recent years, large language models (LLMs) have shown exceptional capabilities across various natural language processing (NLP) tasks. However, such impressive performance often comes with the trade-off of an increased parameter size, posing significant challenges for widespread deployment. Knowledge distillation (KD) provides a solution by transferring knowledge from a large teacher model to a smaller student model. In this paper, we explore the task-specific distillation of LLMs at the logit level. Our investigation reveals that the logits of fine-tuned LLMs exhibit a more extreme long-tail distribution than those from vision models, with hidden "noise" in the long tail affecting distillation performance. Furthermore, existing logits distillation methods often struggle to effectively utilize the internal ranking information from the logits. To address these, we propose the Bi-directional Logits Difference (BiLD) loss. The BiLD loss filters out the long-tail noise by utilizing only top-$k$ teacher and student logits, and leverages the internal logits ranking information by constructing logits differences. To evaluate BiLD loss, we conduct comprehensive experiments on 13 datasets using two types of LLMs. Our results show that the BiLD loss, with only the top-8 logits, outperforms supervised fine-tuning (SFT), vanilla KL loss, and five other distillation methods from both NLP and CV fields.
Updated: 2024-06-19 13:44:56
标题: BiLD:用于大型语言模型蒸馏的双向对数差异损失
摘要: 近年来,大型语言模型(LLMs)在各种自然语言处理(NLP)任务中展现出了出色的能力。然而,这种令人印象深刻的性能往往伴随着参数大小的增加,给广泛部署带来了重大挑战。知识蒸馏(KD)通过将知识从大型教师模型转移至较小的学生模型提供了一种解决方案。在本文中,我们探讨了LLMs在logit级别的任务特定蒸馏。我们的研究表明,经过微调的LLMs的logits呈现出比视觉模型更极端的长尾分布,长尾中的隐藏“噪音”影响了蒸馏性能。此外,现有的logits蒸馏方法往往难以有效利用logits的内部排名信息。为了解决这些问题,我们提出了双向Logits差异(BiLD)损失。BiLD损失通过仅利用前$k$个教师和学生logits来过滤长尾噪音,并通过构建logits差异来利用内部logits排名信息。为了评估BiLD损失,我们在13个数据集上使用两种类型的LLMs进行了全面实验。我们的结果表明,仅使用前8个logits的BiLD损失优于受监督微调(SFT)、基本KL损失和来自NLP和CV领域的其他五种蒸馏方法。
更新时间: 2024-06-19 13:44:56
领域: cs.CL,cs.AI
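A hedged sketch of the BiLD idea described above: restrict attention to the top-k logits, build pairwise logit differences (which encode the internal ranking), and match student to teacher in both directions, over the teacher's top-k positions (t2s) and the student's (s2t). Normalization and weighting details are guesses and may differ from the authors' released implementation.
```python
import torch
import torch.nn.functional as F

def logit_diffs(logits, idx):
    sel = logits.gather(-1, idx)                  # (batch, k) selected logits
    return (sel.unsqueeze(-1) - sel.unsqueeze(-2)).flatten(1)   # pairwise diffs

def bild_loss(student, teacher, k=8, tau=2.0):
    loss = 0.0
    for idx in (teacher.topk(k, -1).indices,      # teacher-led view (t2s)
                student.topk(k, -1).indices):     # student-led view (s2t)
        d_s = F.log_softmax(logit_diffs(student / tau, idx), dim=-1)
        d_t = F.log_softmax(logit_diffs(teacher / tau, idx), dim=-1)
        loss = loss + F.kl_div(d_s, d_t, log_target=True, reduction="batchmean")
    return loss

student, teacher = torch.randn(4, 32000), torch.randn(4, 32000)
print(bild_loss(student, teacher))
```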
Standardness Fogs Meaning: A Position Regarding the Informed Usage of Standard Datasets
Standard datasets are frequently used to train and evaluate Machine Learning models. However, the assumed standardness of these datasets leads to a lack of in-depth discussion on how their labels match the derived categories for the respective use case. In other words, the standardness of the datasets seems to fog coherency and applicability, thus impeding the trust in Machine Learning models. We propose to adopt Grounded Theory and Hypotheses Testing through Visualization as methods to evaluate the match between use case, derived categories, and labels of standard datasets. To showcase the approach, we apply it to the 20 Newsgroups dataset and the MNIST dataset. For the 20 Newsgroups dataset, we demonstrate that the labels are imprecise. Therefore, we argue that neither a Machine Learning model can learn a meaningful abstraction of derived categories nor one can draw conclusions from achieving high accuracy. For the MNIST dataset, we demonstrate how the labels can be confirmed to be defined well. We conclude that a concept of standardness of a dataset implies that there is a match between use case, derived categories, and class labels, as in the case of the MNIST dataset. We argue that this is necessary to learn a meaningful abstraction and, thus, improve trust in the Machine Learning model.
Updated: 2024-06-19 13:39:05
标题: 标准化模糊的含义:关于标准数据集的知情使用的立场
摘要: 标准数据集经常被用来训练和评估机器学习模型。然而,这些数据集的标准性导致缺乏对它们的标签如何与相应用例的衍生类别匹配的深入讨论。换句话说,数据集的标准性似乎模糊了连贯性和适用性,从而阻碍了对机器学习模型的信任。我们提议采用基于理论和通过可视化进行假设测试的方法来评估用例、衍生类别和标准数据集标签之间的匹配。为展示这一方法,我们将其应用于20个新闻组数据集和MNIST数据集。对于20个新闻组数据集,我们展示了标签不精确的情况。因此,我们认为,机器学习模型既不能学习到有意义的衍生类别的抽象,也不能从达到高准确度得出结论。对于MNIST数据集,我们展示了如何确认标签被定义得很好。我们得出结论,数据集的标准性概念意味着用例、衍生类别和类标签之间存在匹配,就像MNIST数据集的情况一样。我们认为这对于学习有意义的抽象是必要的,因此可以提高对机器学习模型的信任。
更新时间: 2024-06-19 13:39:05
领域: cs.LG,cs.HC
Mitigating Social Biases in Language Models through Unlearning
Mitigating bias in language models (LMs) has become a critical problem due to the widespread deployment of LMs. Numerous approaches revolve around data pre-processing and fine-tuning of language models, tasks that can be both time-consuming and computationally demanding. Consequently, there is a growing interest in machine unlearning techniques given their capacity to induce the forgetting of undesired behaviors of the existing pre-trained or fine-tuned models with lower computational cost. In this work, we explore two unlearning methods, (1) Partitioned Contrastive Gradient Unlearning (PCGU) applied on decoder models and (2) Negation via Task Vector, to reduce social biases in state-of-the-art and open-source LMs such as LLaMA-2 and OPT. We also implement distributed PCGU for large models. It is empirically shown, through quantitative and qualitative analyses, that negation via Task Vector outperforms PCGU in debiasing with minimal deterioration in model performance and perplexity. On LLaMA-2 7B, negation via Task Vector reduces the bias score by 11.8%.
Updated: 2024-06-19 13:38:34
标题: 通过遗忘学习减轻语言模型中的社会偏见
摘要: 随着语言模型(LMs)的广泛部署,减轻语言模型中的偏见已经成为一个关键问题。许多方法围绕数据预处理和语言模型的微调展开,这些任务既耗时又需要大量计算资源。因此,人们越来越关注机器遗忘技术,因为它们能够以较低的计算成本诱导现有预训练或微调模型忘记不良行为。在这项工作中,我们探索了两种遗忘方法,一种是应用于解码器模型的分区对比梯度遗忘(PCGU),另一种是通过任务向量进行否定,以减少社会偏见在最先进和开源LMs(如LLaMA-2和OPT)中的影响。我们还为大型模型实现了分布式PCGU。通过定量和定性分析,实验证明通过任务向量进行否定方法在减轻偏见方面优于PCGU,同时对模型的性能和困惑度的损失最小。在LLaMA-27B上,通过任务向量进行否定将偏见分数降低了11.8%。
更新时间: 2024-06-19 13:38:34
领域: cs.CL,cs.AI
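Negation via task vector, the better-performing method in the entry above, reduces to simple parameter arithmetic in the style of Ilharco et al.'s task vectors: subtract the delta that fine-tuning toward the unwanted behavior added. The scaling factor alpha is a knob tuned on held-out data, and the state dicts here are toy stand-ins.
```python
import torch

def negate_task_vector(base_sd, biased_sd, alpha=1.0):
    # theta = theta_base - alpha * (theta_biased - theta_base)
    return {k: base_sd[k] - alpha * (biased_sd[k] - base_sd[k])
            for k in base_sd}

base = {"w": torch.tensor([1.0, 2.0])}        # pre-trained weights (toy)
biased = {"w": torch.tensor([1.5, 1.0])}      # fine-tuned on biased data
print(negate_task_vector(base, biased))       # {'w': tensor([0.5, 3.0])}
```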
CoSD: Collaborative Stance Detection with Contrastive Heterogeneous Topic Graph Learning
Stance detection seeks to identify the viewpoints of individuals either in favor or against a given target or a controversial topic. Current advanced neural models for stance detection typically employ fully parametric softmax classifiers. However, these methods suffer from several limitations, including lack of explainability, insensitivity to the latent data structure, and unimodality, which greatly restrict their performance and applications. To address these challenges, we present a novel collaborative stance detection framework called (CoSD) which leverages contrastive heterogeneous topic graph learning to learn topic-aware semantics and collaborative signals among texts, topics, and stance labels for enhancing stance detection. During training, we construct a heterogeneous graph to structurally organize texts and stances through implicit topics via employing latent Dirichlet allocation. We then perform contrastive graph learning to learn heterogeneous node representations, aggregating informative multi-hop collaborative signals via an elaborate Collaboration Propagation Aggregation (CPA) module. During inference, we introduce a hybrid similarity scoring module to enable the comprehensive incorporation of topic-aware semantics and collaborative signals for stance detection. Extensive experiments on two benchmark datasets demonstrate the state-of-the-art detection performance of CoSD, verifying the effectiveness and explainability of our collaborative framework.
Updated: 2024-06-19 13:34:24
Categories: cs.LG,cs.AI,cs.CL
ModSec-Learn: Boosting ModSecurity with Machine Learning
ModSecurity is widely recognized as the standard open-source Web Application Firewall (WAF), maintained by the OWASP Foundation. It detects malicious requests by matching them against the Core Rule Set (CRS), identifying well-known attack patterns. Each rule is manually assigned a weight based on the severity of the corresponding attack, and a request is blocked if the sum of the weights of matched rules exceeds a given threshold. However, we argue that this strategy is largely ineffective against web attacks, as detection is only based on heuristics and not customized on the application to protect. In this work, we overcome this issue by proposing a machine-learning model that uses the CRS rules as input features. Through training, ModSec-Learn is able to tune the contribution of each CRS rule to predictions, thus adapting the severity level to the web applications to protect. Our experiments show that ModSec-Learn achieves a significantly better trade-off between detection and false positive rates. Finally, we analyze how sparse regularization can reduce the number of rules that are relevant at inference time, by discarding more than 30% of the CRS rules. We release our open-source code and the dataset at https://github.com/pralab/modsec-learn and https://github.com/pralab/http-traffic-dataset, respectively.
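The learning step can be pictured as sparse logistic regression over binary rule-match features. The sketch below is our reading of that setup, with random placeholder data standing in for real CRS matches and labels; the rule count is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 300))  # X[i, j] = 1 if CRS rule j fires on request i
y = rng.integers(0, 2, size=1000)         # 1 = malicious, 0 = benign (placeholder labels)

# L1 penalty induces sparsity: per-rule weights replace hand-assigned severities
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X, y)

kept = np.flatnonzero(clf.coef_[0])       # rules still relevant at inference time
print(f"{len(kept)} of {X.shape[1]} rules kept after sparse regularization")
```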
Updated: 2024-06-19 13:32:47
Categories: cs.LG
One Fits All: Learning Fair Graph Neural Networks for Various Sensitive Attributes
Recent studies have highlighted fairness issues in Graph Neural Networks (GNNs), where they produce discriminatory predictions against specific protected groups categorized by sensitive attributes such as race and age. While various efforts to enhance GNN fairness have made significant progress, these approaches are often tailored to specific sensitive attributes. Consequently, they necessitate retraining the model from scratch to accommodate changes in the sensitive attribute requirement, resulting in high computational costs. To gain deeper insights into this issue, we approach the graph fairness problem from a causal modeling perspective, where we identify the confounding effect induced by the sensitive attribute as the underlying reason. Motivated by this observation, we formulate the fairness problem in graphs from an invariant learning perspective, which aims to learn invariant representations across environments. Accordingly, we propose a graph fairness framework based on invariant learning, namely FairINV, which enables the training of fair GNNs to accommodate various sensitive attributes within a single training session. Specifically, FairINV incorporates sensitive attribute partition and trains fair GNNs by eliminating spurious correlations between the label and various sensitive attributes. Experimental results on several real-world datasets demonstrate that FairINV significantly outperforms state-of-the-art fairness approaches, underscoring its effectiveness. Our code is available via: https://github.com/ZzoomD/FairINV/.
Updated: 2024-06-19 13:30:17
Categories: cs.LG
Towards Cyber Threat Intelligence for the IoT
With the proliferation of digitization and its usage in critical sectors, it is necessary to include information about the occurrence and assessment of cyber threats in an organization's threat mitigation strategy. This Cyber Threat Intelligence (CTI) is becoming increasingly important, or rather necessary, for critical national and industrial infrastructures. Current CTI solutions are rather federated and unsuitable for sharing threat information from low-power IoT devices. This paper presents a taxonomy and analysis of the CTI frameworks and CTI exchange platforms available today. It proposes a new CTI architecture relying on the MISP Threat Intelligence Sharing Platform customized and focusing on IoT environment. The paper also introduces a tailored version of STIX (which we call tinySTIX), one of the most prominent standards adopted for CTI data modeling, optimized for low-power IoT devices using the new lightweight encoding and cryptography solutions. The proposed CTI architecture will be very beneficial for securing IoT networks, especially the ones working in harsh and adversarial environments.
Updated: 2024-06-19 13:30:01
Categories: cs.CR
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
One core capability of large language models (LLMs) is to follow natural language instructions. However, the issue of automatically constructing high-quality training data to enhance the complex instruction-following abilities of LLMs without manual annotation remains unresolved. In this paper, we introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data. AutoIF transforms the validation of instruction-following data quality into code verification, requiring LLMs to generate instructions, the corresponding code to check the correctness of the instruction responses, and unit test samples to verify the code's correctness. Then, execution feedback-based rejection sampling can generate data for Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) training. AutoIF achieves significant improvements across three training algorithms, SFT, Offline DPO, and Online DPO, when applied to the top open-source LLMs, Qwen2 and LLaMA3, in self-alignment and strong-to-weak distillation settings. Our code is publicly available at https://github.com/QwenLM/AutoIF.
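The execution-feedback filter at the heart of this pipeline can be sketched as follows; the toy verifier and candidate responses are illustrative stand-ins for LLM-generated ones.

```python
# Hedged sketch of execution-feedback rejection sampling: a generated checker
# is executed against each candidate response, and failures are rejected.
verifier_src = '''
def check(response):
    # toy constraint: "answer in exactly three words"
    return len(response.split()) == 3
'''

def passes(src: str, response: str) -> bool:
    scope = {}
    try:
        exec(src, scope)                     # compile the generated checker
        return bool(scope["check"](response))
    except Exception:                        # broken generated code -> reject
        return False

candidates = ["the cat sat", "a much longer candidate response"]
kept = [r for r in candidates if passes(verifier_src, r)]
print(kept)  # only responses satisfying the executable constraint survive
```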
Updated: 2024-06-19 13:29:53
Categories: cs.CL,cs.AI,cs.LG
DefSent+: Improving sentence embeddings of language models by projecting definition sentences into a quasi-isotropic or isotropic vector space of unlimited dictionary entries
This paper presents a significant improvement on the previous conference paper known as DefSent. The prior study seeks to improve sentence embeddings of language models by projecting definition sentences into the vector space of dictionary entries. We discover that this approach is not fully explored due to the methodological limitation of using word embeddings of language models to represent dictionary entries. This leads to two hindrances. First, dictionary entries are constrained by the single-word vocabulary, and thus cannot be fully exploited. Second, semantic representations of language models are known to be anisotropic, but pre-processing word embeddings for DefSent is not allowed because its weight is frozen during training and tied to the prediction layer. In this paper, we propose a novel method to progressively build entry embeddings not subject to the limitations. As a result, definition sentences can be projected into a quasi-isotropic or isotropic vector space of unlimited dictionary entries, so that sentence embeddings of noticeably better quality are attainable. We abbreviate our approach as DefSent+ (a plus version of DefSent), involving the following strengths: 1) the task performance on measuring sentence similarities is significantly improved compared to DefSent; 2) when DefSent+ is used to further train data-augmented models like SIMCSE, SNCSE, and SynCSE, state-of-the-art performance on measuring sentence similarities can be achieved among the approaches without using manually labeled datasets; 3) DefSent+ is also competitive in feature-based transfer for NLP downstream tasks.
Updated: 2024-06-19 13:26:53
Categories: cs.CL,cs.AI,cs.LG
DRACO: Decentralized Asynchronous Federated Learning over Continuous Row-Stochastic Network Matrices
Recent developments and emerging use cases, such as smart Internet of Things (IoT) and Edge AI, have sparked considerable interest in the training of neural networks over fully decentralized (serverless) networks. One of the major challenges of decentralized learning is to ensure stable convergence without resorting to strong assumptions applied for each agent regarding data distributions or updating policies. To address these issues, we propose DRACO, a novel method for decentralized asynchronous Stochastic Gradient Descent (SGD) over row-stochastic gossip wireless networks by leveraging continuous communication. Our approach enables edge devices within decentralized networks to perform local training and model exchanging along a continuous timeline, thereby eliminating the necessity for synchronized timing. The algorithm also features a specific technique of decoupling communication and computation schedules, which empowers complete autonomy for all users and manageable instructions for stragglers. Through a comprehensive convergence analysis, we highlight the advantages of asynchronous and autonomous participation in decentralized optimization. Our numerical experiments corroborate the efficacy of the proposed technique.
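For intuition, a synchronous round of gossip SGD over a row-stochastic mixing matrix looks like the numpy sketch below; DRACO's continuous, asynchronous protocol generalizes this, and the quadratic losses and random topology here are toy assumptions.

```python
import numpy as np

n, d = 4, 3
rng = np.random.default_rng(1)
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)          # row-stochastic mixing matrix
targets = rng.standard_normal((n, d))      # agent i minimizes ||x_i - t_i||^2
x = np.zeros((n, d))
lr = 0.1

for _ in range(200):
    grads = 2.0 * (x - targets)            # local gradients
    x = W @ x - lr * grads                 # mix with neighbors, then descend

print(np.abs(x - x.mean(axis=0)).max())    # residual disagreement across agents
```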
Updated: 2024-06-19 13:17:28
Categories: cs.LG,cs.IT,cs.NI,math.IT
MSynFD: Multi-hop Syntax aware Fake News Detection
The proliferation of social media platforms has fueled the rapid dissemination of fake news, posing threats to our real-life society. Existing methods use multimodal data or contextual information to enhance the detection of fake news by analyzing news content and/or its social context. However, these methods often overlook essential textual news content (articles) and heavily rely on sequential modeling and global attention to extract semantic information. These existing methods fail to handle the complex, subtle twists in news articles, such as syntax-semantics mismatches and prior biases, leading to lower performance and potential failure when modalities or social context are missing. To bridge these significant gaps, we propose a novel multi-hop syntax aware fake news detection (MSynFD) method, which incorporates complementary syntax information to deal with subtle twists in fake news. Specifically, we introduce a syntactical dependency graph and design a multi-hop subgraph aggregation mechanism to capture multi-hop syntax. It extends the effect of word perception, leading to effective noise filtering and adjacent relation enhancement. Subsequently, a sequential relative position-aware Transformer is designed to capture the sequential information, together with an elaborate keyword debiasing module to mitigate the prior bias. Extensive experimental results on two public benchmark datasets verify the effectiveness and superior performance of our proposed MSynFD over state-of-the-art detection models.
Updated: 2024-06-19 13:15:34
Categories: cs.CL,cs.AI,cs.IR
Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory
Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), seriously degrade the performance of time synchronization systems. Multi-path schemes are regarded as an effective security countermeasure that reduces the influence of TDA. However, an effective secure combination algorithm for precision time synchronization is still missing. In this paper, a secure combination algorithm based on Dempster-Shafer theory is proposed for the multi-path method. The combination algorithm is specially optimized to address the potential problems caused by untrusted evidence. Theoretical simulation shows that the proposed algorithm performs much better than the Fault-Tolerant Algorithm (FTA) and a single-path attack detection method, and an experimental demonstration proves its feasibility and superiority: under TDA and local clock jumps, time stability of 27.97 ps, 1.57 ps, and 1.12 ps is achieved at averaging times of 1 s, 10 s, and 100 s, respectively. The proposed algorithm can be used to improve the security and resilience of many important synchronization protocols, such as NTP, PTP, and TWFTT.
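As background, Dempster's rule of combination for two mass functions can be sketched as below; the hypotheses and masses are illustrative, and the paper's optimizations for untrusted evidence are not reproduced.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule over frozenset-keyed mass functions."""
    out, conflict = {}, 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            out[inter] = out.get(inter, 0.0) + pa * pb
        else:
            conflict += pa * pb                  # mass assigned to disjoint sets
    return {k: v / (1.0 - conflict) for k, v in out.items()}

OK, ATK = frozenset({"ok"}), frozenset({"attacked"})
ALL = OK | ATK                                   # the ignorance set
m_path1 = {OK: 0.7, ATK: 0.1, ALL: 0.2}          # evidence from time path 1
m_path2 = {OK: 0.6, ATK: 0.2, ALL: 0.2}          # evidence from time path 2
print(combine(m_path1, m_path2))                 # fused belief about the link
```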
Updated: 2024-06-19 13:15:12
Categories: cs.CR
A Tree-of-Thoughts to Broaden Multi-step Reasoning across Languages
Reasoning methods, best exemplified by the well-known Chain-of-Thought (CoT), empower the reasoning abilities of Large Language Models (LLMs) by eliciting them to solve complex tasks in a step-by-step manner. Although they are achieving significant success, the ability to deliver multi-step reasoning remains limited to English because of the imbalance in the distribution of pre-training data, which makes other languages a barrier. In this paper, we propose Cross-lingual Tree-of-Thoughts (Cross-ToT), a method for aligning Cross-lingual CoT reasoning across languages. The proposed method, through a self-consistent cross-lingual prompting mechanism inspired by the Tree-of-Thoughts approach, provides multi-step reasoning paths in different languages that, during the steps, lead to the final solution. Experimental evaluations show that our method significantly outperforms existing prompting methods by reducing the number of interactions and achieving state-of-the-art performance.
Updated: 2024-06-19 13:07:54
Categories: cs.CL,cs.AI
Scalable unsupervised alignment of general metric and non-metric structures
Aligning data from different domains is a fundamental problem in machine learning with broad applications across very different areas, most notably aligning experimental readouts in single-cell multiomics. Mathematically, this problem can be formulated as the minimization of disagreement of pair-wise quantities such as distances and is related to the Gromov-Hausdorff and Gromov-Wasserstein distances. Computationally, it is a quadratic assignment problem (QAP) that is known to be NP-hard. Prior works attempted to solve the QAP directly with entropic or low-rank regularization on the permutation, which is computationally tractable only for modestly-sized inputs, and encode only limited inductive bias related to the domains being aligned. We consider the alignment of metric structures formulated as a discrete Gromov-Wasserstein problem and instead of solving the QAP directly, we propose to learn a related well-scalable linear assignment problem (LAP) whose solution is also a minimizer of the QAP. We also show a flexible extension of the proposed framework to general non-metric dissimilarities through differentiable ranks. We extensively evaluate our approach on synthetic and real datasets from single-cell multiomics and neural latent spaces, achieving state-of-the-art performance while being conceptually and computationally simple.
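Once the linear assignment cost has been learned, the alignment itself reduces to a standard LAP that scales well, e.g. solvable with scipy; the random cost matrix below is a placeholder for the learned one.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

C = np.random.default_rng(2).random((100, 100))  # learned LAP cost (placeholder)
rows, cols = linear_sum_assignment(C)            # optimal hard matching
print(C[rows, cols].sum())                       # total alignment cost
```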
Updated: 2024-06-19 12:54:03
Categories: cs.LG
ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems
Deep Recommender Systems (DRS) are increasingly dependent on a large number of feature fields for more precise recommendations. Effective feature selection methods are consequently becoming critical for further enhancing the accuracy and optimizing storage efficiencies to align with the deployment demands. This research area, particularly in the context of DRS, is nascent and faces three core challenges. Firstly, variant experimental setups across research papers often yield unfair comparisons, obscuring practical insights. Secondly, the existing literature's lack of detailed analysis on selection attributes, based on large-scale datasets and a thorough comparison among selection techniques and DRS backbones, restricts the generalizability of findings and impedes deployment on DRS. Lastly, research often focuses on comparing the peak performance achievable by feature selection methods, an approach that is typically computationally infeasible for identifying the optimal hyperparameters and overlooks evaluating the robustness and stability of these methods. To bridge these gaps, this paper presents ERASE, a comprehensive bEnchmaRk for feAture SElection for DRS. ERASE comprises a thorough evaluation of eleven feature selection methods, covering both traditional and deep learning approaches, across four public datasets, private industrial datasets, and a real-world commercial platform, achieving significant enhancement. Our code is available online for ease of reproduction.
Updated: 2024-06-19 12:48:25
Categories: cs.IR,cs.AI
GraphMU: Repairing Robustness of Graph Neural Networks via Machine Unlearning
Graph Neural Networks (GNNs) have demonstrated significant application potential in various fields. However, GNNs are still vulnerable to adversarial attacks. Numerous adversarial defense methods for GNNs have been proposed to address the problem of adversarial attacks. However, these methods can only serve as a defense before poisoning and cannot repair a poisoned GNN. Therefore, there is an urgent need for a method to repair poisoned GNNs. In this paper, we address this gap by introducing the novel concept of model repair for GNNs. We propose a repair framework, Repairing Robustness of Graph Neural Networks via Machine Unlearning (GraphMU), which aims to fine-tune a poisoned GNN to forget adversarial samples without the need for complete retraining. We also introduce an unlearning validation method to ensure that our approach effectively forgets specified poisoned data. To evaluate the effectiveness of GraphMU, we explore three fine-tuned subgraph construction scenarios based on the available perturbation information: (i) Known Perturbation Ratios, (ii) Complete Knowledge of Perturbations, and (iii) No Knowledge of Perturbations. Our extensive experiments, conducted across four citation datasets and four adversarial attack scenarios, demonstrate that GraphMU can effectively restore the performance of a poisoned GNN.
Updated: 2024-06-19 12:41:15
Categories: cs.SI,cs.LG
Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models
We introduce Mathador-LM, a new benchmark for evaluating the mathematical reasoning on large language models (LLMs), combining ruleset interpretation, planning, and problem-solving. This benchmark is inspired by the Mathador game, where the objective is to reach a target number using basic arithmetic operations on a given set of base numbers, following a simple set of rules. We show that, across leading LLMs, we obtain stable average performance while generating benchmark instances dynamically, following a target difficulty level. Thus, our benchmark alleviates concerns about test-set leakage into training data, an issue that often undermines popular benchmarks. Additionally, we conduct a comprehensive evaluation of both open and closed-source state-of-the-art LLMs on Mathador-LM. Our findings reveal that contemporary models struggle with Mathador-LM, scoring significantly lower than average 3rd graders. This stands in stark contrast to their strong performance on popular mathematical reasoning benchmarks.
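For a feel of the task, the sketch below brute-forces whether a target is reachable from a set of base numbers with the four basic operations, each number used at most once; it is our own illustration, not the benchmark's instance generator.

```python
from itertools import permutations

def reachable(nums, target):
    """Can `target` be formed from `nums` with +, -, *, /, each used at most once?"""
    if target in nums:
        return True
    for a, b in permutations(range(len(nums)), 2):
        x, y = nums[a], nums[b]
        rest = [n for i, n in enumerate(nums) if i not in (a, b)]
        for r in [x + y, x - y, x * y] + ([x / y] if y else []):
            if reachable(rest + [r], target):   # fold result back in, recurse
                return True
    return False

print(reachable([3, 5, 7, 2], 29))  # True: 5 * 7 - 3 * 2 = 29
```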
Updated: 2024-06-19 12:28:10
Categories: cs.CL,cs.AI,cs.LG,I.2.7
In-Context In-Context Learning with Transformer Neural Processes
Neural processes (NPs) are a powerful family of meta-learning models that seek to approximate the posterior predictive map of the ground-truth stochastic process from which each dataset in a meta-dataset is sampled. There are many cases in which practitioners, besides having access to the dataset of interest, may also have access to other datasets that share similarities with it. In this case, integrating these datasets into the NP can improve predictions. We equip NPs with this functionality and describe this paradigm as in-context in-context learning. Standard NP architectures, such as the convolutional conditional NP (ConvCNP) or the family of transformer neural processes (TNPs), are not capable of in-context in-context learning, as they are only able to condition on a single dataset. We address this shortcoming by developing the in-context in-context learning pseudo-token TNP (ICICL-TNP). The ICICL-TNP builds on the family of PT-TNPs, which utilise pseudo-token-based transformer architectures to sidestep the quadratic computational complexity associated with regular transformer architectures. Importantly, the ICICL-TNP is capable of conditioning on both sets of datapoints and sets of datasets, enabling it to perform in-context in-context learning. We demonstrate the importance of in-context in-context learning and the effectiveness of the ICICL-TNP in a number of experiments.
Updated: 2024-06-19 12:26:36
Categories: cs.LG,stat.ML
On the Convergence of Federated Learning Algorithms without Data Similarity
Data similarity assumptions have traditionally been relied upon to understand the convergence behaviors of federated learning methods. Unfortunately, this approach often demands fine-tuning step sizes based on the level of data similarity. When data similarity is low, these small step sizes result in an unacceptably slow convergence speed for federated methods. In this paper, we present a novel and unified framework for analyzing the convergence of federated learning algorithms without the need for data similarity conditions. Our analysis centers on an inequality that captures the influence of step sizes on algorithmic convergence performance. By applying our theorems to well-known federated algorithms, we derive precise expressions for three widely used step size schedules: fixed, diminishing, and step-decay step sizes, which are independent of data similarity conditions. Finally, we conduct comprehensive evaluations of the performance of these federated learning algorithms, employing the proposed step size strategies to train deep neural network models on benchmark datasets under varying data similarity conditions. Our findings demonstrate significant improvements in convergence speed and overall performance, marking a substantial advancement in federated learning research.
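For reference, the three schedules are conventionally written as follows (generic constants; the paper's exact expressions may differ):

```latex
\eta_t = \eta_0 \ \ (\text{fixed}), \qquad
\eta_t = \frac{\eta_0}{\sqrt{t+1}} \ \ (\text{diminishing}), \qquad
\eta_t = \eta_0\,\gamma^{\lfloor t/s \rfloor},\ \gamma\in(0,1) \ \ (\text{step-decay}).
```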
Updated: 2024-06-19 12:21:15
Categories: cs.LG,cs.GT
The Surprising Benefits of Base Rate Neglect in Robust Aggregation
Robust aggregation integrates predictions from multiple experts without knowledge of the experts' information structures. Prior work assumes experts are Bayesian, providing predictions as perfect posteriors based on their signals. However, real-world experts often deviate systematically from Bayesian reasoning. Our work considers experts who tend to ignore the base rate. We find that a certain degree of base rate neglect helps with robust forecast aggregation. Specifically, we consider a forecast aggregation problem with two experts who each predict a binary world state after observing private signals. Unlike previous work, we model experts exhibiting base rate neglect, where they incorporate the base rate information to degree $\lambda\in[0,1]$, with $\lambda=0$ indicating complete ignorance and $\lambda=1$ perfect Bayesian updating. To evaluate aggregators' performance, we adopt Arieli et al. (2018)'s worst-case regret model, which measures the maximum regret across the set of considered information structures compared to an omniscient benchmark. Our results reveal the surprising V-shape of regret as a function of $\lambda$. That is, predictions with an intermediate incorporating degree of base rate $\lambda<1$ can counter-intuitively lead to lower regret than perfect Bayesian posteriors with $\lambda=1$. We additionally propose a new aggregator with low regret robust to unknown $\lambda$. Finally, we conduct an empirical study to test the base rate neglect model and evaluate the performance of various aggregators.
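A standard way to formalize base rate neglect to degree $\lambda$, consistent with the description above though the paper's exact parameterization may differ, is to raise the prior to the power $\lambda$ in the update:

```latex
p_\lambda(\omega \mid s) \;\propto\; p(s \mid \omega)\, p(\omega)^{\lambda},
\qquad \lambda = 0 \ \text{(base rate ignored)}, \quad \lambda = 1 \ \text{(perfect Bayesian)}.
```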
Updated: 2024-06-19 12:20:29
Categories: cs.LG,cs.GT
Approximately Equivariant Neural Processes
Equivariant deep learning architectures exploit symmetries in learning problems to improve the sample efficiency of neural-network-based models and their ability to generalise. However, when modelling real-world data, learning problems are often not exactly equivariant, but only approximately. For example, when estimating the global temperature field from weather station observations, local topographical features like mountains break translation equivariance. In these scenarios, it is desirable to construct architectures that can flexibly depart from exact equivariance in a data-driven way. In this paper, we develop a general approach to achieving this using existing equivariant architectures. Our approach is agnostic to both the choice of symmetry group and model architecture, making it widely applicable. We consider the use of approximately equivariant architectures in neural processes (NPs), a popular family of meta-learning models. We demonstrate the effectiveness of our approach on a number of synthetic and real-world regression experiments, demonstrating that approximately equivariant NP models can outperform both their non-equivariant and strictly equivariant counterparts.
Updated: 2024-06-19 12:17:14
Categories: stat.ML,cs.LG
Multi-objective Differentiable Neural Architecture Search
Pareto front profiling in multi-objective optimization (MOO), i.e. finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives like neural network training. Typically, in MOO neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints into the objective function, but profiling the Pareto front necessitates a computationally expensive search for each constraint. In this work, we propose a novel NAS algorithm that encodes user preferences for the trade-off between performance and hardware metrics, and yields representative and diverse architectures across multiple devices in just one search run. To this end, we parameterize the joint architectural distribution across devices and multiple objectives via a hypernetwork that can be conditioned on hardware features and preference vectors, enabling zero-shot transferability to new devices. Extensive experiments with up to 19 hardware devices and 3 objectives showcase the effectiveness and scalability of our method. Finally, we show that, without extra costs, our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k, an encoder-decoder transformer space for machine translation and a decoder-only transformer space for language modelling.
Updated: 2024-06-19 12:15:20
Categories: cs.LG,cs.CV,stat.ML
An evidential time-to-event prediction model based on Gaussian random fuzzy numbers
We introduce an evidential model for time-to-event prediction with censored data. In this model, uncertainty on event time is quantified by Gaussian random fuzzy numbers, a newly introduced family of random fuzzy subsets of the real line with associated belief functions, generalizing both Gaussian random variables and Gaussian possibility distributions. Our approach makes minimal assumptions about the underlying time-to-event distribution. The model is fit by minimizing a generalized negative log-likelihood function that accounts for both normal and censored data. Comparative experiments on two real-world datasets demonstrate the very good performance of our model as compared to the state-of-the-art.
Updated: 2024-06-19 12:14:45
Categories: cs.LG
Mean-Variance Portfolio Selection in Long-Term Investments with Unknown Distribution: Online Estimation, Risk Aversion under Ambiguity, and Universality of Algorithms
The standard approach for constructing a Mean-Variance portfolio involves estimating parameters for the model using collected samples. However, since the distribution of future data may not resemble that of the training set, the out-of-sample performance of the estimated portfolio is worse than one derived with true parameters, which has prompted several innovations for better estimation. Instead of treating the data without a timing aspect as in the common training-backtest approach, this paper adopts a perspective where data gradually and continuously reveal over time. The original model is recast into an online learning framework, which is free from any statistical assumptions, to propose a dynamic strategy of sequential portfolios such that its empirical utility, Sharpe ratio, and growth rate asymptotically achieve those of the true portfolio, derived with perfect knowledge of the future data. When the distribution of future data has a normal shape, the growth rate of wealth is shown to increase by lifting the portfolio along the efficient frontier through the calibration of risk aversion. Since risk aversion cannot be appropriately predetermined, another proposed algorithm updating this coefficient over time forms a dynamic strategy approaching the optimal empirical Sharpe ratio or growth rate associated with the true coefficient. The performance of these proposed strategies is universally guaranteed under specific stochastic markets. Furthermore, in stationary and ergodic markets, the so-called Bayesian strategy utilizing true conditional distributions, based on observed past market information during investment, almost surely does not perform better than the proposed strategies in terms of empirical utility, Sharpe ratio, or growth rate, which, in contrast, do not rely on conditional distributions.
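As background, the classical single-period mean-variance problem underlying the sequential strategy, with mean $\mu$, covariance $\Sigma$, and risk aversion $\gamma > 0$, is:

```latex
\max_{w}\; w^\top \mu - \frac{\gamma}{2}\, w^\top \Sigma\, w
\quad\Longrightarrow\quad
w^\star = \frac{1}{\gamma}\, \Sigma^{-1} \mu .
```

This is the textbook unconstrained form, not the paper's online formulation; it shows why calibrating $\gamma$ moves the portfolio along the efficient frontier.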
Updated: 2024-06-19 12:11:42
Categories: q-fin.MF,cs.LG,math.PR,q-fin.PM
M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation
One of the most critical aspects of multimodal Reinforcement Learning (RL) is the effective integration of different observation modalities. Having robust and accurate representations derived from these modalities is key to enhancing the robustness and sample efficiency of RL algorithms. However, learning representations in RL settings for visuotactile data poses significant challenges, particularly due to the high dimensionality of the data and the complexity involved in correlating visual and tactile inputs with the dynamic environment and task objectives. To address these challenges, we propose Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL). Our approach employs a novel multimodal self-supervised learning technique that learns efficient representations and contributes to faster convergence of RL algorithms. Our method is agnostic to the RL algorithm, thus enabling its integration with any available RL algorithm. We evaluate M2CURL on the Tactile Gym 2 simulator and we show that it significantly enhances the learning efficiency in different manipulation tasks. This is evidenced by faster convergence rates and higher cumulative rewards per episode, compared to standard RL algorithms without our representation learning approach.
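A typical multimodal contrastive objective of this kind is a symmetric InfoNCE loss between visual and tactile embeddings; the sketch below is our illustration and may differ from M2CURL's exact loss and architecture.

```python
import torch
import torch.nn.functional as F

def multimodal_info_nce(z_vis, z_tac, temperature=0.1):
    """Symmetric InfoNCE: matched visual/tactile pairs sit on the diagonal."""
    z_vis = F.normalize(z_vis, dim=1)
    z_tac = F.normalize(z_tac, dim=1)
    logits = z_vis @ z_tac.t() / temperature      # pairwise similarities
    labels = torch.arange(z_vis.size(0))          # i-th vis matches i-th tac
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

loss = multimodal_info_nce(torch.randn(32, 128), torch.randn(32, 128))
print(loss.item())
```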
Updated: 2024-06-19 12:05:41
Categories: cs.RO,cs.CV,cs.LG
Attention-aware Post-training Quantization without Backpropagation
Quantization is a promising solution for deploying large-scale language models (LLMs) on resource-constrained devices. Existing quantization approaches, however, rely on gradient-based optimization, regardless of it being post-training quantization (PTQ) or quantization-aware training (QAT), which becomes problematic for hyper-scale LLMs with billions of parameters. This overhead can be alleviated via recently proposed backpropagation-free PTQ methods; however, their performance is somewhat limited by their lack of consideration of inter-layer dependencies. In this paper, we thus propose a novel PTQ algorithm that considers inter-layer dependencies without relying on backpropagation. The fundamental concept involved is the development of attention-aware Hessian matrices, which facilitates the consideration of inter-layer dependencies within the attention module. Extensive experiments demonstrate that the proposed algorithm significantly outperforms conventional PTQ methods, particularly for low bit-widths.
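The layer-wise objective that Hessian-based PTQ methods of this kind minimize is usually written as follows (our rendering; here $H$ would be the attention-aware Hessian of the layer output error):

```latex
\min_{\widehat{W}} \;\operatorname{tr}\!\Big[(W - \widehat{W})\, H\, (W - \widehat{W})^{\top}\Big],
```

where $W$ are the original layer weights and $\widehat{W}$ their quantized counterparts.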
Updated: 2024-06-19 11:53:21
Categories: cs.LG,cs.AI
Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks
This paper explores the performance of encoder and decoder language models on multilingual Natural Language Understanding (NLU) tasks, with a broad focus on Germanic languages. Building upon the ScandEval benchmark, which initially was restricted to evaluating encoder models, we extend the evaluation framework to include decoder models. We introduce a method for evaluating decoder models on NLU tasks and apply it to the languages Danish, Swedish, Norwegian, Icelandic, Faroese, German, Dutch, and English. Through a series of experiments and analyses, we address key research questions regarding the comparative performance of encoder and decoder models, the impact of NLU task types, and the variation across language resources. Our findings reveal that decoder models can achieve significantly better NLU performance than encoder models, with nuances observed across different tasks and languages. Additionally, we investigate the correlation between decoders and task performance via a UMAP analysis, shedding light on the unique capabilities of decoder and encoder models. This study contributes to a deeper understanding of language model paradigms in NLU tasks and provides valuable insights for model selection and evaluation in multilingual settings.
Updated: 2024-06-19 11:50:09
Categories: cs.CL,cs.AI,cs.LG,I.2.7
Informatics & dairy industry coalition: AI trends and present challenges
Artificial Intelligence (AI) can potentially transform industry, enhancing the production process and minimizing manual, repetitive tasks. Accordingly, the synergy between high-performance computing and powerful mathematical models enables the application of sophisticated data analysis procedures like Machine Learning. However, challenges remain regarding effective, efficient, and flexible processing to generate valuable knowledge. Consequently, this work comprehensively describes the industrial challenges where AI can be exploited, focusing on the dairy industry. The conclusions presented can help researchers apply novel approaches to cattle monitoring and help farmers by proposing advanced technological solutions to their needs.
Updated: 2024-06-19 11:49:03
Categories: cs.AI,cs.CL
Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression
Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models (MLLMs) which are composed of the LLM base and visual projector (e.g. LLaVA), a significant decline in performance on language benchmarks was observed compared to their single-modality counterparts. To address these challenges, we introduce a novel model-agnostic self-decompression method, Tree Generation (TG), that decompresses knowledge within LLMs into the training corpus. This paper focuses on TG-SFT, which can synthetically generate SFT data for the instruction tuning steps. By incorporating the dumped corpus during SFT for MLLMs, we significantly reduce the forgetting problem.
Updated: 2024-06-19 11:36:30
Categories: cs.CL,cs.AI,cs.CV
Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning
Recent advancements in off-policy Reinforcement Learning (RL) have significantly improved sample efficiency, primarily due to the incorporation of various forms of regularization that enable more gradient update steps than traditional agents. However, many of these techniques have been tested in limited settings, often on tasks from single simulation benchmarks and against well-known algorithms rather than a range of regularization approaches. This limits our understanding of the specific mechanisms driving RL improvements. To address this, we implemented over 60 different off-policy agents, each integrating established regularization techniques from recent state-of-the-art algorithms. We tested these agents across 14 diverse tasks from 2 simulation benchmarks, measuring training metrics related to overestimation, overfitting, and plasticity loss -- issues that motivate the examined regularization techniques. Our findings reveal that while the effectiveness of a specific regularization setup varies with the task, certain combinations consistently demonstrate robust and superior performance. Notably, a simple Soft Actor-Critic agent, appropriately regularized, reliably finds a better-performing policy within the training regime, which previously was achieved mainly through model-based approaches.
Updated: 2024-06-19 11:32:01
Categories: cs.LG
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
Event-based vision has drawn increasing attention due to its unique characteristics, such as high temporal resolution and high dynamic range. It has been used in video super-resolution (VSR) recently to enhance the flow estimation and temporal alignment. Rather than for motion learning, we propose in this paper the first VSR method that utilizes event signals for texture enhancement. Our method, called EvTexture, leverages high-frequency details of events to better recover texture regions in VSR. In our EvTexture, a new texture enhancement branch is presented. We further introduce an iterative texture enhancement module to progressively explore the high-temporal-resolution event information for texture restoration. This allows for gradual refinement of texture regions across multiple iterations, leading to more accurate and rich high-resolution details. Experimental results show that our EvTexture achieves state-of-the-art performance on four datasets. For the Vid4 dataset with rich textures, our method can get up to 4.67dB gain compared with recent event-based methods. Code: https://github.com/DachunKai/EvTexture.
Updated: 2024-06-19 11:27:44
Categories: cs.CV,cs.AI
CONTAIN: A Community-based Algorithm for Network Immunization
Network immunization is an automated task in the field of network analysis that involves protecting a network (modeled as a graph) from being infected by an undesired arbitrary diffusion. In this article, we consider the spread of harmful content in social networks, and we propose CONTAIN, a novel COmmuNiTy-based Algorithm for network ImmuNization. Our solution uses the network information to (1) detect harmful content spreaders, and (2) generate partitions and rank them for immunization using the subgraphs induced by each spreader, i.e., employing CONTAIN. The experimental results obtained on real-world datasets show that CONTAIN outperforms state-of-the-art solutions, i.e., NetShield and SparseShield, by immunizing the network in fewer iterations, thus, converging significantly faster than the state-of-the-art algorithms. We also compared our solution in terms of scalability with the state-of-the-art tree-based mitigation algorithm MCWDST, as well as with NetShield and SparseShield. We can conclude that our solution outperforms MCWDST and NetShield.
Updated: 2024-06-19 11:18:36
Categories: cs.SI,cs.AI
Learning from Emergence: A Study on Proactively Inhibiting the Monosemantic Neurons of Artificial Neural Networks
Recently, emergence has received widespread attention from the research community along with the success of large-scale models. Different from the literature, we hypothesize a key factor that promotes the performance during the increase of scale: the reduction of monosemantic neurons that can only form one-to-one correlations with specific features. Monosemantic neurons tend to be sparser and have negative impacts on the performance in large models. Inspired by this insight, we propose an intuitive idea to identify monosemantic neurons and inhibit them. However, achieving this goal is a non-trivial task as there is no unified quantitative evaluation metric and simply banning monosemantic neurons does not promote polysemanticity in neural networks. Therefore, we first propose a new metric to measure the monosemanticity of neurons with the guarantee of efficiency for online computation, then introduce a theoretically supported method to suppress monosemantic neurons and proactively promote the ratios of polysemantic neurons in training neural networks. We validate our conjecture that monosemanticity brings about performance change at different model scales on a variety of neural networks and benchmark datasets in different areas, including language, image, and physics simulation tasks. Further experiments validate our analysis and theory regarding the inhibition of monosemanticity.
Updated: 2024-06-19 11:18:00
Categories: cs.LG,cs.AI,cs.NE
Federating to Grow Transformers with Constrained Resources without Model Sharing
The high resource consumption of large-scale models discourages resource-constrained users from developing their own customized transformers. To this end, this paper considers a federated framework named Fed-Grow in which multiple participants cooperatively scale up a transformer from their pre-trained small models. Under Fed-Grow, a Dual-LiGO (Dual Linear Growth Operator) architecture is designed to help participants expand their pre-trained small models to a transformer. In Dual-LiGO, the Local-LiGO part is used to address the heterogeneity problem caused by the various pre-trained models, and the Global-LiGO part is shared to exchange the implicit knowledge from the pre-trained models, local data, and training process of participants. Sharing only the Global-LiGO, rather than the models themselves, strengthens the privacy of our approach. Compared with several state-of-the-art methods in simulation, our approach achieves higher accuracy, better precision, and lower computation and communication costs. To the best of our knowledge, most previous model-scaling works are centralized, and ours is the first to cooperatively grow a transformer from multiple pre-trained heterogeneous models while protecting user privacy in terms of local data and models. We hope that our approach can extend transformers to broadly distributed scenarios and encourage more resource-constrained users to enjoy the benefits of large-scale transformers.
Updated: 2024-06-19 11:17:59
Categories: cs.AI
High-probability minimax lower bounds
The minimax risk is often considered as a gold standard against which we can compare specific statistical procedures. Nevertheless, as has been observed recently in robust and heavy-tailed estimation problems, the inherent reduction of the (random) loss to its expectation may entail a significant loss of information regarding its tail behaviour. In an attempt to avoid such a loss, we introduce the notion of a minimax quantile, and seek to articulate its dependence on the quantile level. To this end, we develop high-probability variants of the classical Le Cam and Fano methods, as well as a technique to convert local minimax risk lower bounds to lower bounds on minimax quantiles. To illustrate the power of our framework, we deploy our techniques on several examples, recovering recent results in robust mean estimation and stochastic convex optimisation, as well as obtaining several new results in covariance matrix estimation, sparse linear regression, nonparametric density estimation and isotonic regression. Our overall goal is to argue that minimax quantiles can provide a finer-grained understanding of the difficulty of statistical problems, and that, in wide generality, lower bounds on these quantities can be obtained via user-friendly tools.
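A natural rendering of the central object, the minimax $\alpha$-quantile of the loss $L(\hat{\theta}, \theta)$, is (our notation, which may differ from the paper's):

```latex
q_\alpha^{*} \;=\; \inf_{\hat{\theta}} \, \sup_{\theta \in \Theta} \, Q_\alpha\!\big( L(\hat{\theta}, \theta) \big),
```

where $Q_\alpha(\cdot)$ denotes the $\alpha$-quantile of the loss distribution under $P_\theta$; the minimax risk is recovered by replacing the quantile with the expectation.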
Updated: 2024-06-19 11:15:01
领域: math.ST,cs.IT,cs.LG,math.IT,stat.ML,stat.TH,62C20, 62B10
MISS: A Generative Pretraining and Finetuning Approach for Med-VQA
Medical visual question answering (VQA) is a challenging multimodal task, where Vision-Language Pre-training (VLP) models can effectively improve the generalization performance. However, most methods in the medical field treat VQA as an answer classification task, which is difficult to transfer to practical application scenarios. Additionally, due to the privacy of medical images and the expensive annotation process, large-scale medical image-text pair datasets for pretraining are severely lacking. In this paper, we propose a large-scale MultI-task Self-Supervised learning based framework (MISS) for medical VQA tasks. Unlike existing methods, we treat medical VQA as a generative task. We unify the text encoder and multimodal encoder and align image-text features through multi-task learning. Furthermore, we propose a Transfer-and-Caption method that extends the feature space of single-modal image datasets using Large Language Models (LLMs), enabling data from traditional medical vision tasks to be applied to VLP. Experiments show that our method achieves excellent results with fewer multimodal datasets and demonstrates the advantages of generative VQA models.
Updated: 2024-06-19 11:14:40
标题: MISS:一种用于医学视觉问答的生成式预训练和微调方法
摘要: 医学视觉问答(VQA)是一项具有挑战性的多模态任务,其中视觉语言预训练(VLP)模型可以有效提高泛化性能。然而,医学领域中大多数方法将VQA视为答案分类任务,难以转移到实际应用场景中。另外,由于医学图像的隐私性和昂贵的注释过程,用于预训练的大规模医学图像文本对数据集严重缺乏。在本文中,我们提出了一个基于大规模多任务自监督学习的框架(MISS)用于医学VQA任务。与现有方法不同,我们将医学VQA视为一项生成任务。我们统一了文本编码器和多模态编码器,并通过多任务学习对齐图像-文本特征。此外,我们提出了一种Transfer-and-Caption方法,通过使用大型语言模型(LLMs)扩展单模态图像数据集的特征空间,使传统医学视觉领域任务数据可以应用于VLP。实验证明,我们的方法在使用较少的多模态数据集时取得了出色的结果,并展示了生成式VQA模型的优势。
更新时间: 2024-06-19 11:14:40
领域: cs.CV,cs.AI
Lost in UNet: Improving Infrared Small Target Detection by Underappreciated Local Features
Many targets are very small in infrared images due to the long-distance imaging mechanism. UNet and its variants, as popular detection backbone networks, downsample local features early and cause the irreversible loss of these features, leading to both missed and false detections of small targets in infrared images. We propose HintU, a novel network that recovers the local features lost by various UNet-based methods for effective infrared small target detection. HintU makes two key contributions. First, it introduces the "Hint" mechanism for the first time, i.e., leveraging prior knowledge of target locations to highlight critical local features. Second, it improves the mainstream UNet-based architecture to preserve target pixels even after downsampling. HintU can shift the focus of various networks (e.g., vanilla UNet, UNet++, UIUNet, MiM+, and HCFNet) from irrelevant background pixels to a more restricted area from the beginning. Experimental results on three datasets, NUDT-SIRST, SIRSTv2 and IRSTD1K, demonstrate that HintU enhances the performance of existing methods with only an additional 1.88 ms cost (on an RTX Titan). Additionally, the explicit constraints of HintU enhance the generalization ability of UNet-based methods. Code is available at https://github.com/Wuzhou-Quan/HintU.
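As a hedged sketch of the general idea (not the authors' implementation; the function and parameters are hypothetical), prior target locations can be rendered as a Gaussian heatmap and concatenated to the input, so location evidence is available before any downsampling:

```python
import torch

def make_hint_map(h, w, centers, sigma=2.0):
    """Toy 'hint' channel: a Gaussian bump at each prior target location."""
    ys = torch.arange(h).view(h, 1).float()
    xs = torch.arange(w).view(1, w).float()
    hint = torch.zeros(h, w)
    for cy, cx in centers:
        bump = torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        hint = torch.maximum(hint, bump)
    return hint

frame = torch.rand(1, 1, 256, 256)      # toy infrared image
hint = make_hint_map(256, 256, centers=[(40, 200)]).view(1, 1, 256, 256)
x = torch.cat([frame, hint], dim=1)     # (1, 2, 256, 256) input to a UNet-style net
```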
Updated: 2024-06-19 11:11:38
标题: 在UNet中迷失:通过未被重视的局部特征改善红外小目标检测
摘要: 由于长距离成像机制,许多目标在红外图像中通常非常小。UNet及其变种作为流行的检测骨干网络,早期下采样局部特征并导致这些局部特征的不可逆丢失,导致红外图像中小目标的漏检和误检。我们提出了HintU,一种新颖的网络,用于恢复各种基于UNet的方法丢失的局部特征,以实现有效的红外小目标检测。HintU有两个关键贡献。首先,它首次引入了“Hint”机制,即利用目标位置的先验知识来突出关键的局部特征。其次,它改进了主流的基于UNet的架构,以便在下采样后仍保留目标像素。HintU可以将各种网络(例如原始UNet、UNet++、UIUNet、MiM+和HCFNet)的焦点从无关的背景像素转移到更受限制的区域。对NUDT-SIRST、SIRSTv2和IRSTD1K三个数据集的实验结果表明,HintU在只增加1.88毫秒成本(在RTX Titan上)的情况下提高了现有方法的性能。此外,HintU的显式约束增强了基于UNet的方法的泛化能力。代码可在https://github.com/Wuzhou-Quan/HintU上找到。
更新时间: 2024-06-19 11:11:38
领域: cs.CV,cs.AI
Synthesizing PET images from High-field and Ultra-high-field MR images Using Joint Diffusion Attention Model
MRI and PET are crucial diagnostic tools for brain diseases, as they provide complementary information on brain structure and function. However, PET scanning is costly and involves radioactive exposure, resulting in a scarcity of PET data. Moreover, simultaneous PET and MRI at ultra-high field is currently hardly feasible. Ultra-high-field imaging has unquestionably proven valuable in both clinical and academic settings, especially in the field of cognitive neuroimaging. These considerations motivate us to propose a method for synthesizing PET from high-field MRI and ultra-high-field MRI. From a statistical perspective, the joint probability distribution (JPD) is the most direct and fundamental means of portraying the correlation between PET and MRI. This paper proposes a novel joint diffusion attention model, named JDAM, which combines the joint probability distribution with an attention strategy. JDAM has a diffusion process and a sampling process. The diffusion process gradually diffuses PET to Gaussian noise by adding Gaussian noise, while MRI remains fixed; the JPD of MRI and noise-added PET is learned during this process. The sampling process is a predictor-corrector: PET images are generated from MRI via the JPD of MRI and noise-added PET, where the predictor is a reverse diffusion process and the corrector is Langevin dynamics. Experimental results on the public Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset demonstrate that the proposed method outperforms the state-of-the-art CycleGAN for high-field MRI (3T MRI). Finally, synthesizing PET images from ultra-high-field MRI (5T and 7T) is attempted, suggesting a possibility for ultra-high-field PET-MRI imaging.
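Schematically, the sampling process can be pictured as follows; this is an illustrative sketch, not the paper's code, and it assumes a score function score_fn(x, mri, t) trained on the joint distribution with the MRI held fixed:

```python
import torch

def predictor_corrector_sample(score_fn, mri, steps=1000, snr=0.16):
    """Sketch of conditional predictor-corrector sampling: a reverse-diffusion
    (predictor) step followed by a Langevin dynamics (corrector) step."""
    x = torch.randn_like(mri)              # initialize the PET estimate from noise
    ts = torch.linspace(1.0, 1e-3, steps)
    dt = ts[0] - ts[1]
    for t in ts:
        # Predictor: one Euler step of the reverse diffusion process.
        x = x + score_fn(x, mri, t) * dt + dt.sqrt() * torch.randn_like(x)
        # Corrector: one Langevin step at the same noise level.
        score, z = score_fn(x, mri, t), torch.randn_like(x)
        eps = 2 * (snr * z.norm() / (score.norm() + 1e-12)) ** 2
        x = x + eps * score + (2 * eps).sqrt() * z
    return x
```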
Updated: 2024-06-19 11:09:55
标题: 使用联合扩散注意模型从高场和超高场MR图像合成PET图像
摘要: MRI和PET是大脑疾病的关键诊断工具,因为它们提供了关于大脑结构和功能的互补信息。然而,PET扫描成本高昂且涉及放射性暴露,导致PET的使用受限。此外,目前几乎不可能实现超高场下PET和MRI的同时扫描。超高场成像在临床和学术领域都被证明是有价值的,尤其是在认知神经影像学领域。这些促使我们提出一种从高场MRI和超高场MRI合成PET的方法。从统计学的角度看,联合概率分布(JPD)是描绘PET和MRI之间相关性最直接和基本的手段。本文提出了一个新颖的联合扩散注意模型,该模型具有联合概率分布和注意策略,名为JDAM。JDAM具有扩散过程和采样过程。扩散过程涉及将PET逐渐扩散到高斯噪声中,同时MRI保持不变。在扩散过程中学习了MRI和添加噪声的PET的JPD。采样过程是一个预测器-校正器。通过MRI的JPD和添加噪声的PET生成PET图像。预测器是一个反向扩散过程,校正器是Langevin动力学。对公共阿尔茨海默病神经影像学倡议(ADNI)数据集的实验结果表明,该方法优于高场MRI(3T MRI)的最新CycleGAN。最后,尝试从超高场(5T MRI和7T MRI)合成PET图像,为超高场PET-MRI成像提供了可能性。
更新时间: 2024-06-19 11:09:55
领域: cs.LG
Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective
The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we reformulate RNA secondary structure prediction as a K-Rook problem, thereby simplifying the prediction process into probabilistic matching within a finite solution space. Building on this perspective, we introduce RFold, a simple yet effective method that learns to predict the most matching K-Rook solution for a given sequence. RFold employs a bi-dimensional optimization strategy that decomposes the probabilistic matching problem into row-wise and column-wise components, reducing the matching complexity and simplifying the solving process while guaranteeing the validity of the output. Extensive experiments demonstrate that RFold achieves competitive performance with inference about eight times faster than state-of-the-art approaches. The code and a Colab demo are available at http://github.com/A4Bio/RFold.
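One hedged reading of the bi-dimensional strategy is sketched below (the paper's exact decomposition and symmetrization may differ): soft assignments are computed independently along rows and columns and combined, so a base pair survives only when both directions agree.

```python
import torch

def bidimensional_pairing(scores):
    """Toy row/column decomposition for probabilistic matching."""
    row = torch.softmax(scores, dim=-1)    # row-wise matching distribution
    col = torch.softmax(scores, dim=-2)    # column-wise matching distribution
    pairing = row * col                    # large only under mutual agreement
    return (pairing > 0.5).float()         # crude hard pairing map (illustrative)

scores = torch.randn(64, 64)               # pairwise base-pairing scores
print(bidimensional_pairing(scores).sum())
```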
Updated: 2024-06-19 11:08:23
标题: 解读RNA次级结构预测:基于概率K-车匹配的视角
摘要: 核糖核酸(RNA)的二级结构在细胞中比其三级结构更稳定且更可访问,因此对于功能预测至关重要。尽管深度学习在这一领域显示出了令人期待的结果,但当前方法存在泛化能力差和复杂性高的问题。在这项工作中,我们将RNA二级结构预测重新定义为K-Rook问题,从而将预测过程简化为在有限解空间内的概率匹配。基于这一创新视角,我们引入了RFold,这是一种简单而有效的方法,它学习从给定序列中预测最匹配的K-Rook解决方案。RFold采用了双向优化策略,将概率匹配问题分解为行向和列向组件,以降低匹配复杂性,简化解决过程同时确保输出的有效性。大量实验证明,RFold取得了竞争性的性能,并且比最先进的方法快大约八倍的推断效率。代码和Colab演示可在(http://github.com/A4Bio/RFold)上找到。
更新时间: 2024-06-19 11:08:23
领域: q-bio.BM,cs.AI,cs.LG
Robust Melanoma Thickness Prediction via Deep Transfer Learning enhanced by XAI Techniques
This study focuses on analyzing dermoscopy images to determine the depth of melanomas, a critical factor in diagnosing and treating skin cancer. The Breslow depth, measured from the top of the granular layer to the deepest point of tumor invasion, serves as a crucial parameter for staging melanoma and guiding treatment decisions. This research aims to improve the prediction of melanoma depth through machine learning models, specifically deep learning, while also analyzing whether a gradation exists in image characteristics that correlates with melanoma depth. Various datasets, including ISIC and private collections, were used, comprising a total of 1162 images. The datasets were combined and balanced to ensure robust model training. The study utilized pre-trained Convolutional Neural Networks (CNNs). Results indicated that the models achieved significant improvements over previous methods. Additionally, the study conducted a correlation analysis between the models' predictions and actual melanoma thickness, revealing a moderate correlation that improves with higher thickness values. Explainability methods such as feature visualization through Principal Component Analysis (PCA) demonstrated the capability of deep features to distinguish between different depths of melanoma, providing insight into the data distribution and model behavior. In summary, this research presents a dual contribution: enhancing state-of-the-art classification results through advanced training techniques and offering a detailed analysis of the data and model behavior to better understand the relationship between dermoscopy images and melanoma thickness.
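The PCA-based feature visualization mentioned above can be sketched as follows; the feature matrix and depth labels are random placeholders, and a real analysis would use activations from the trained CNN:

```python
import numpy as np
from sklearn.decomposition import PCA

features = np.random.rand(1162, 512)   # e.g., penultimate-layer CNN activations
depths = np.random.rand(1162)          # Breslow depth per image (mm), placeholder

pca = PCA(n_components=2)
coords = pca.fit_transform(features)   # one 2D point per dermoscopy image
print("explained variance:", pca.explained_variance_ratio_)
# Plotting `coords` colored by `depths` would expose any depth-related gradation.
```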
Updated: 2024-06-19 11:07:55
标题: 深度迁移学习增强的XAI技术在鲁棒性黑色素瘤厚度预测中的应用
摘要: 这项研究侧重于分析皮肤镜图像,以确定黑色素瘤的深度,这是诊断和治疗皮肤癌的关键因素。从颗粒层顶部到肿瘤侵袭最深点的布雷斯洛深度,作为分期黑色素瘤和指导治疗决策的关键参数。本研究旨在通过使用机器学习模型,特别是深度学习,改善对黑色素瘤深度的预测,同时提供与黑色素瘤深度相关的图像特征可能存在的分级分析。使用了包括ISIC和私人收藏在内的各种数据集,共计1162张图像。数据集被合并和平衡,以确保强健的模型训练。研究利用了预训练的卷积神经网络(CNNs)。结果显示,模型在先前方法上取得了显著改进。此外,研究进行了模型预测与实际黑色素瘤厚度之间的相关性分析,揭示了一个中等相关性,随着厚度值的增加而改善。通过主成分分析(PCA)进行的特征可视化等解释方法展示了深度特征区分不同黑色素瘤深度的能力,为了解皮肤镜图像与黑色素瘤厚度之间的关系提供了见解。总之,本研究提出了双重贡献:通过先进的训练技术增强了最先进的分类结果,并提供了对数据和模型行为的详细分析,以更好地理解皮肤镜图像与黑色素瘤厚度之间的关系。
更新时间: 2024-06-19 11:07:55
领域: eess.IV,cs.AI,cs.CV
CV-Attention UNet: Attention-based UNet for 3D Cerebrovascular Segmentation of Enhanced TOF-MRA Images
Due to the lack of automated methods, diagnosing cerebrovascular disease relies on visual assessment of time-of-flight magnetic resonance angiography (TOF-MRA), which is time-consuming. The commonly used encoder-decoder architectures for cerebrovascular segmentation utilize redundant features, eventually leading to the extraction of low-level features multiple times. Additionally, convolutional neural networks (CNNs) suffer from performance degradation when the batch size is small, and deeper networks experience the vanishing gradient problem. Methods: In this paper, we attempt to solve these limitations and propose the 3D cerebrovascular attention UNet method, named CV-AttentionUNet, for precise extraction of brain vessel images. We propose a sequence of preprocessing techniques followed by a deeply supervised UNet to improve the accuracy of segmentation of the brain vessels implicated in stroke. To combine low- and high-level semantics, we apply an attention mechanism that focuses on relevant associations and neglects irrelevant anatomical information. Furthermore, the inclusion of deep supervision incorporates different levels of features that prove beneficial for network convergence. Results: We demonstrate the efficiency of the proposed method by cross-validating on an unlabeled dataset, which we subsequently labeled. We believe that the novelty of this algorithm lies in its ability to perform well on both labeled and unlabeled data with image processing-based enhancement. The results indicate that our method performed better than the existing state-of-the-art methods on the TubeTK dataset. Conclusion: The proposed method will help in accurate segmentation of the cerebrovascular structures implicated in stroke.
Updated: 2024-06-19 10:57:46
标题: CV-Attention UNet: Attention-based UNet用于增强TOF-MRA图像的3D脑血管分割
摘要: 由于缺乏自动化方法,诊断脑血管疾病仍然依赖对时间飞行磁共振血管成像(TOF-MRA)的视觉评估,这使得诊断过程耗时。目前常用的编码器-解码器架构用于脑血管分割,利用了冗余特征,最终导致低级特征被多次提取。此外,卷积神经网络(CNNs)在批处理大小较小时性能下降,并且深层网络存在梯度消失问题。方法:在本文中,我们试图解决这些限制,并提出了3D脑血管关注UNet方法,命名为CV-AttentionUNet,用于精确提取脑血管图像。我们提出了一系列预处理技术,随后使用深度监督UNet来提高与中风相关的脑血管分割的准确性。为了结合低层和高层语义信息,我们应用了关注机制。该机制侧重于相关关联并忽略不相关的解剖信息。此外,深度监督的引入将不同级别的特征结合起来,有助于网络的收敛。结果:我们通过在由我们进一步标注的未标记数据集上进行交叉验证,展示了所提出方法的高效性。我们认为这种算法的新颖之处在于其能够在基于图像处理的增强下,在已标记和未标记数据上均表现出色。结果表明,我们的方法在TubeTK数据集上的表现优于现有的最先进方法。结论:所提出的方法将有助于准确分割与中风相关的脑血管结构。
更新时间: 2024-06-19 10:57:46
领域: eess.IV,cs.CV,cs.LG
ProG: A Graph Prompt Learning Benchmark
Artificial general intelligence on graphs has shown significant advancements across various applications, yet the traditional 'Pre-train & Fine-tune' paradigm faces inefficiencies and negative transfer issues, particularly in complex and few-shot settings. Graph prompt learning emerges as a promising alternative, leveraging lightweight prompts to manipulate data and fill the task gap by reformulating downstream tasks to the pretext. However, several critical challenges still remain: how to unify diverse graph prompt models, how to evaluate the quality of graph prompts, and how to improve their usability for practical comparisons and selection. In response to these challenges, we introduce the first comprehensive benchmark for graph prompt learning. Our benchmark integrates SIX pre-training methods and FIVE state-of-the-art graph prompt techniques, evaluated across FIFTEEN diverse datasets to assess performance, flexibility, and efficiency. We also present 'ProG', an easy-to-use open-source library that streamlines the execution of various graph prompt models, facilitating objective evaluations. Additionally, we propose a unified framework that categorizes existing graph prompt methods into two main approaches: prompts as graphs and prompts as tokens. This framework enhances the applicability and comparison of graph prompt techniques. The code is available at: https://github.com/sheldonresearch/ProG.
Updated: 2024-06-19 10:55:22
标题: ProG:一个图提示学习基准
摘要: 在图上的人工通用智能已经在各种应用中显示出了显著的进展,然而传统的“预训练和微调”范式在复杂和少样本情况下面临效率低下和负迁移问题。图提示学习作为一种有前途的替代方案出现,利用轻量级提示来操作数据,并通过重新构建下游任务到前提来填补任务间隙。然而,仍然存在几个关键挑战:如何统一多样化的图提示模型,如何评估图提示的质量,以及如何提高它们的可用性以进行实际比较和选择。为了应对这些挑战,我们引入了第一个用于图提示学习的全面基准。我们的基准集成了六种预训练方法和五种最先进的图提示技术,在十五个不同的数据集上评估性能、灵活性和效率。我们还提出了“ProG”,一个易于使用的开源库,简化了各种图提示模型的执行,促进客观评估。此外,我们提出了一个统一的框架,将现有的图提示方法分类为两种主要方法:提示作为图和提示作为标记。这个框架增强了图提示技术的适用性和比较。代码可在以下网址找到:https://github.com/sheldonresearch/ProG。
更新时间: 2024-06-19 10:55:22
领域: cs.LG
What's Next? Exploring Utilization, Challenges, and Future Directions of AI-Generated Image Tools in Graphic Design
Recent advancements in artificial intelligence, such as computer vision and deep learning, have led to the emergence of numerous generative AI platforms, particularly for image generation. However, the application of AI-generated image tools in graphic design has not been extensively explored. This study conducted semi-structured interviews with seven designers of varying experience levels to understand their current usage, challenges, and future functional needs for AI-generated image tools in graphic design. As our findings suggest, AI tools serve as creative partners in design, enhancing human creativity, offering strategic insights, and fostering team collaboration and communication. The findings provide guiding recommendations for the future development of AI-generated image tools, aimed at helping engineers optimize these tools to better meet the needs of graphic designers.
Updated: 2024-06-19 10:51:56
标题: 下一步是什么?探索人工智能生成的图像工具在平面设计中的利用、挑战和未来方向
摘要: 人工智能的最新进展,如计算机视觉和深度学习,已经导致了许多生成式人工智能平台的出现,特别是用于图像生成。然而,在平面设计中应用人工智能生成的图像工具尚未得到广泛探讨。本研究通过与七名经验不同的设计师进行半结构化访谈,以了解他们目前对于平面设计中人工智能生成图像工具的使用情况、挑战和未来的功能需求。正如我们的研究结果所表明的,人工智能工具在设计中充当创意合作伙伴,增强人类创造力,提供战略见解,促进团队协作和沟通。研究结果为未来发展人工智能生成图像工具提供了指导建议,旨在帮助工程师优化这些工具,以更好地满足平面设计师的需求。
更新时间: 2024-06-19 10:51:56
领域: cs.HC,cs.AI
Certificates of Differential Privacy and Unlearning for Gradient-Based Training
Proper data stewardship requires that model owners protect the privacy of individuals' data used during training. Whether through anonymization with differential privacy or the use of unlearning in non-anonymized settings, the gold-standard techniques for providing privacy guarantees can come with significant performance penalties or be too weak to provide practical assurances. In part, this is because the guarantee provided by differential privacy represents the worst-case privacy leakage for any individual, while the true privacy leakage of releasing the prediction for a given individual might be substantially smaller or even, as we show, non-existent. This work provides a novel framework based on convex relaxations and bounds propagation that can compute formal guarantees (certificates) that releasing specific predictions satisfies $\epsilon=0$ privacy guarantees or does not depend on data that is subject to an unlearning request. Our framework offers a new verification-centric approach to privacy and unlearning guarantees that can be used to further engender user trust with tighter privacy guarantees, provide formal proofs of robustness to certain membership inference attacks, identify potentially vulnerable records, and enhance current unlearning approaches. We validate the effectiveness of our approach on tasks from financial services, medical imaging, and natural language processing.
Updated: 2024-06-19 10:47:00
标题: 梯度训练的差分隐私和遗忘证书
摘要: 适当的数据管理要求模型所有者保护在训练过程中使用的个人数据的隐私。无论是通过差分隐私实现匿名化,还是在非匿名化设置中使用遗忘(unlearning),提供隐私保证的黄金标准技术可能会带来显著的性能损失,或过于薄弱而无法提供实际保证。部分原因在于差分隐私提供的保证代表了任何个人在最坏情况下的隐私泄漏,而对给定个人发布预测的真实隐私泄漏可能会小得多,甚至如我们所展示的那样根本不存在。这项工作提供了一个基于凸松弛和边界传播的新颖框架,可以计算正式保证(证书),证明发布特定预测满足$\epsilon=0$的隐私保证,或不依赖于受遗忘请求约束的数据。我们的框架提供了一种以验证为中心的隐私和遗忘保证的新方法,可用于以更紧的隐私保证进一步增强用户信任,为抵御某些成员推断攻击的稳健性提供正式证明,识别潜在的易受攻击记录,并增强当前的遗忘方法。我们通过金融服务、医学成像和自然语言处理任务验证了我们方法的有效性。
更新时间: 2024-06-19 10:47:00
领域: cs.LG,cs.AI
Are Logistic Models Really Interpretable?
The demand for open and trustworthy AI models points towards widespread publishing of model weights. Consumers of these model weights must be able to act accordingly with the information provided. That said, one of the simplest AI classification models, Logistic Regression (LR), has an unwieldy interpretation of its model weights, with greater difficulties when extending LR to generalised additive models. In this work, we show via a User Study that skilled participants are unable to reliably reproduce the action of small LR models given the trained parameters. As an antidote to this, we define Linearised Additive Models (LAMs), an optimal piecewise linear approximation that augments any trained additive model equipped with a sigmoid link function, requiring no retraining. We argue that LAMs are more interpretable than logistic models -- survey participants are shown to solve model reasoning tasks with LAMs much more accurately than with LR given the same information. Furthermore, we show that LAMs do not suffer from large performance penalties in terms of ROC-AUC and calibration with respect to their logistic counterparts on a broad suite of public financial modelling data.
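As a hedged illustration of the core idea (the uniform knot placement below is ours for illustration; the paper derives an optimal piecewise-linear scheme), replacing the sigmoid link with a continuous piecewise-linear interpolant makes each weight's effect on the output probability locally linear and easier to read off:

```python
import numpy as np

def linearise_sigmoid(knots):
    """Build a piecewise-linear stand-in for the sigmoid through given knots."""
    xs = np.asarray(knots, dtype=float)
    ys = 1.0 / (1.0 + np.exp(-xs))         # sigmoid values at the knots
    return lambda z: np.interp(z, xs, ys)  # linear between knots, clamped outside

lam = linearise_sigmoid(np.linspace(-6, 6, 9))
logits = np.array([-2.0, 0.0, 2.0])        # additive-model outputs
print(lam(logits))                         # piecewise-linear probabilities
```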
Updated: 2024-06-19 10:36:38
标题: Logistic模型真的可解释吗?
摘要: 对于开放和可信赖的人工智能模型的需求指向了模型权重的广泛发布。这些模型权重的消费者必须能够根据提供的信息采取相应的行动。尽管如此,最简单的人工智能分类模型之一,逻辑回归(Logistic Regression, LR),对其模型权重的解释是棘手的,尤其在将LR扩展到广义可加模型时更加困难。在这项工作中,我们通过用户研究表明,熟练的参与者无法可靠地根据训练参数重现小型LR模型的行为。作为对此的解决方案,我们定义了线性添加模型(Linearised Additive Models, LAMs),这是一种最佳的分段线性逼近,可增强任何配备Sigmoid连接函数的训练后的可加模型,无需重新训练。我们认为LAMs比逻辑模型更可解释--调查参与者显示,使用相同信息,他们用LAMs解决模型推理任务的准确性要比LR高得多。此外,我们展示了在广泛的公共金融建模数据套件上,与逻辑回归模型相比,LAMs在ROC-AUC和校准方面并不遭受性能损失。
更新时间: 2024-06-19 10:36:38
领域: cs.LG
Generation of Asset Administration Shell with Large Language Model Agents: Towards Semantic Interoperability in Digital Twins in the Context of Industry 4.0
This research introduces a novel approach for achieving semantic interoperability in digital twins and assisting the creation of Asset Administration Shell (AAS) as digital twin model within the context of Industry 4.0. The foundational idea of our research is that the communication based on semantics and the generation of meaningful textual data are directly linked, and we posit that these processes are equivalent if the exchanged information can be serialized in text form. Based on this, we construct a "semantic node" data structure in our research to capture the semantic essence of textual data. Then, a system powered by large language models is designed and implemented to process the "semantic node" and generate standardized digital twin models from raw textual data collected from datasheets describing technical assets. Our evaluation demonstrates an effective generation rate of 62-79%, indicating a substantial proportion of the information from the source text can be translated error-free to the target digital twin instance model with the generative capability of large language models. This result has a direct application in the context of Industry 4.0, and the designed system is implemented as a data model generation tool for reducing the manual effort in creating AAS model. In our evaluation, a comparative analysis of different LLMs and an in-depth ablation study of Retrieval-Augmented Generation (RAG) mechanisms provide insights into the effectiveness of LLM systems for interpreting technical concepts and translating data. Our findings emphasize LLMs' capability to automate AAS instance creation and contribute to the broader field of semantic interoperability for digital twins in industrial applications. The prototype implementation and evaluation results are presented on our GitHub Repository: https://github.com/YuchenXia/AASbyLLM.
Updated: 2024-06-19 10:32:21
标题: 利用大型语言模型代理生成资产管理外壳:在工业4.0背景下数字孪生中的语义互操作性
摘要: 这项研究介绍了一种新颖的方法,用于实现数字孪生中的语义互操作性,并在工业4.0的背景下协助创建资产管理外壳(AAS)作为数字孪生模型。我们研究的基本理念是,基于语义的通信和生成有意义的文本数据是直接相关的,我们认为如果交换的信息可以以文本形式序列化,这些过程是等效的。基于此,我们在研究中构建了一个“语义节点”数据结构,以捕捉文本数据的语义本质。然后,设计并实现了一个由大型语言模型驱动的系统,用于处理“语义节点”并从描述技术资产的数据表中收集的原始文本数据生成标准化的数字孪生模型。我们的评估表明,有效生成率为62-79%,表明源文本中的信息的相当大比例可以以无错误地翻译到目标数字孪生实例模型,这取决于大型语言模型的生成能力。这一结果在工业4.0的背景下具有直接应用,并且设计的系统被实现为数据模型生成工具,以减少创建AAS模型的手动工作。在我们的评估中,对不同LLM的比较分析以及对检索增强生成(RAG)机制的深入消融研究提供了关于LLM系统解释技术概念和翻译数据效果的见解。我们的研究结果强调了LLM的能力,自动化AAS实例创建,并为工业应用中数字孪生的语义互操作性领域做出贡献。原型实现和评估结果可在我们的GitHub存储库上找到:https://github.com/YuchenXia/AASbyLLM。
更新时间: 2024-06-19 10:32:21
领域: cs.AI,cs.IR,cs.MA,cs.SE
Coupled Input-Output Dimension Reduction: Application to Goal-oriented Bayesian Experimental Design and Global Sensitivity Analysis
We introduce a new method to jointly reduce the dimension of the input and output space of a high-dimensional function. Choosing a reduced input subspace influences which output subspace is relevant and vice versa. Conventional methods focus on reducing either the input or output space, even though both are often reduced simultaneously in practice. Our coupled approach naturally supports goal-oriented dimension reduction, where either an input or output quantity of interest is prescribed. We consider, in particular, goal-oriented sensor placement and goal-oriented sensitivity analysis, which can be viewed as dimension reduction where the most important output or, respectively, input components are chosen. Both applications present difficult combinatorial optimization problems with expensive objectives such as the expected information gain and Sobol indices. By optimizing gradient-based bounds, we can determine the most informative sensors and most sensitive parameters as the largest diagonal entries of some diagnostic matrices, thus bypassing the combinatorial optimization and objective evaluation.
Updated: 2024-06-19 10:31:57
标题: 耦合输入输出维度缩减:应用于面向目标的贝叶斯实验设计和全局敏感性分析
摘要: 我们介绍了一种新方法,用于联合降低高维函数的输入和输出空间的维度。选择一个降维的输入子空间会影响哪些输出子空间是相关的,反之亦然。传统方法侧重于降低输入或输出空间,尽管在实践中通常同时降低两者。我们的耦合方法自然地支持目标导向的维度缩减,其中输入或输出感兴趣的数量被指定。我们特别考虑目标导向的传感器布置和目标导向的灵敏度分析,可以被视为选择了最重要的输出或输入组件的维度缩减。这两种应用都涉及具有昂贵目标的困难组合优化问题,如期望信息增益和Sobol指数。通过优化基于梯度的界限,我们可以确定最具信息性的传感器和最敏感的参数,作为一些诊断矩阵的最大对角线条目,从而绕过组合优化和目标评估。
更新时间: 2024-06-19 10:31:57
领域: stat.ML,cs.LG,math.ST,stat.TH,65D40, 62F15, 62K05
Towards a multimodal framework for remote sensing image change retrieval and captioning
Recently, there has been increasing interest in multimodal applications that integrate text with other modalities, such as images, audio and video, to facilitate natural language interactions with multimodal AI systems. While applications involving standard modalities have been extensively explored, there is still a lack of investigation into specific data modalities such as remote sensing (RS) data. Despite the numerous potential applications of RS data, including environmental protection, disaster monitoring and land planning, available solutions are predominantly focused on specific tasks like classification, captioning and retrieval. These solutions often overlook the unique characteristics of RS data, such as its capability to systematically provide information on the same geographical areas over time. This ability enables continuous monitoring of changes in the underlying landscape. To address this gap, we propose a novel foundation model for bi-temporal RS image pairs, in the context of change detection analysis, leveraging Contrastive Learning and the LEVIR-CC dataset for both captioning and text-image retrieval. By jointly training a contrastive encoder and captioning decoder, our model adds text-image retrieval capabilities, in the context of bi-temporal change detection, while maintaining captioning performance comparable to the state of the art. We release the source code and pretrained weights at: https://github.com/rogerferrod/RSICRC.
Updated: 2024-06-19 10:30:56
标题: 朝着遥感图像变化检索和字幕生成的多模态框架
摘要: 最近,对于将文本与其他模态,如图像、音频和视频,集成在一起以便促进与多模态人工智能系统的自然语言交互的应用越来越感兴趣。虽然涉及标准模态的应用已经得到广泛探索,但对于特定数据模态,如遥感(RS)数据,仍然缺乏调查。尽管遥感数据具有许多潜在应用,包括环保、灾害监测和土地规划,但现有解决方案主要集中在特定任务,如分类、字幕和检索。这些解决方案通常忽视了遥感数据的独特特征,例如其能够系统地提供同一地理区域随时间变化的信息。这种能力使得连续监测底层景观的变化成为可能。为了填补这一空白,我们提出了一个新颖的基础模型,用于双时遥感图像对的变化检测分析,利用对比学习和LEVIR-CC数据集进行字幕化和文本-图像检索。通过联合训练对比编码器和字幕解码器,我们的模型在双时变化检测的背景下增加了文本-图像检索能力,同时保持了与现有技术水平相当的字幕化性能。我们在https://github.com/rogerferrod/RSICRC发布了源代码和预训练权重。
更新时间: 2024-06-19 10:30:56
领域: cs.CV,cs.LG
Learning with 3D rotations, a hitchhiker's guide to SO(3)
Many settings in machine learning require the selection of a rotation representation. However, choosing a suitable representation from the many available options is challenging. This paper acts as a survey and guide through rotation representations. We walk through their properties that harm or benefit deep learning with gradient-based optimization. By consolidating insights from rotation-based learning, we provide a comprehensive overview of learning functions with rotation representations. We provide guidance on selecting representations based on whether rotations are in the model's input or output and whether the data primarily comprises small angles.
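As one concrete, hedged example of the kind of representation such a guide weighs (this is the widely used 6D Gram-Schmidt construction, shown as an assumed illustration rather than the paper's sole recommendation), a continuous map from R^6 to SO(3) that behaves well under gradient-based optimization is:

```python
import torch

def rotation_6d_to_matrix(x6):
    """Map a 6D vector to a rotation matrix by Gram-Schmidt orthonormalization."""
    a, b = x6[..., :3], x6[..., 3:]
    r1 = a / a.norm(dim=-1, keepdim=True)
    b = b - (r1 * b).sum(dim=-1, keepdim=True) * r1   # remove component along r1
    r2 = b / b.norm(dim=-1, keepdim=True)
    r3 = torch.cross(r1, r2, dim=-1)
    return torch.stack([r1, r2, r3], dim=-2)          # rows form an orthonormal frame

R = rotation_6d_to_matrix(torch.randn(6))
print(R @ R.T)                                        # approximately the identity
```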
Updated: 2024-06-19 10:17:54
标题: 学习三维旋转,SO(3)的搭便车者指南
摘要: 机器学习中的许多设置都需要选择旋转表示。然而,从众多可用选项中选择合适的表示是具有挑战性的。本文充当了旋转表示的调查和指南。我们详细介绍了它们的属性,这些属性会对基于梯度优化的深度学习产生有害或有益的影响。通过整合基于旋转的学习的见解,我们提供了学习具有旋转表示的函数的全面概述。我们根据模型的输入或输出中是否包含旋转以及数据是否主要由小角度组成来提供选择表示的指导。
更新时间: 2024-06-19 10:17:54
领域: cs.LG,cs.CV,cs.RO
Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators
Large Language Models (LLMs) tend to be unreliable in the factuality of their answers. To address this problem, NLP researchers have proposed a range of techniques to estimate LLM's confidence over facts. However, due to the lack of a systematic comparison, it is not clear how the different methods compare to one another. To fill this gap, we present a survey and empirical comparison of estimators of factual confidence. We define an experimental framework allowing for fair comparison, covering both fact-verification and question answering. Our experiments across a series of LLMs indicate that trained hidden-state probes provide the most reliable confidence estimates, albeit at the expense of requiring access to weights and training data. We also conduct a deeper assessment of factual confidence by measuring the consistency of model behavior under meaning-preserving variations in the input. We find that the confidence of LLMs is often unstable across semantically equivalent inputs, suggesting that there is much room for improvement of the stability of models' parametric knowledge. Our code is available at (https://github.com/amazon-science/factual-confidence-of-llms).
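A trained hidden-state probe of the kind found most reliable can be sketched as below; the activations and labels here are random placeholders, since a real probe needs access to the LLM's hidden states and factuality annotations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

hidden_states = np.random.randn(1000, 4096)  # e.g., last-layer activations per statement
labels = np.random.randint(0, 2, size=1000)  # 1 = statement verified factual

probe = LogisticRegression(max_iter=1000).fit(hidden_states, labels)
confidence = probe.predict_proba(hidden_states[:5])[:, 1]
print(confidence)                            # probe's factual-confidence estimates
```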
Updated: 2024-06-19 10:11:37
标题: LLM的事实信心:当前估计器的可靠性和稳健性
摘要: 大型语言模型(LLMs)在其答案的真实性方面往往不可靠。为了解决这个问题,自然语言处理研究人员提出了一系列技术来估计LLM对事实的信心。然而,由于缺乏系统性比较,不清楚不同方法之间的比较如何。为了填补这一空白,我们提出了一个调查和经验比较事实信心估计器。我们定义了一个实验框架,允许公平比较,涵盖事实验证和问题回答。我们的实验跨越一系列LLMs表明,经过训练的隐藏状态探针提供最可靠的信心估计,尽管需要访问权重和训练数据。我们还通过测量模型在输入中保留意义变化时的行为一致性,对事实信心进行了更深入的评估。我们发现LLMs的信心在语义上等价的输入中经常不稳定,表明模型的参数化知识稳定性有很大改进空间。我们的代码可以在(https://github.com/amazon-science/factual-confidence-of-llms)找到。
更新时间: 2024-06-19 10:11:37
领域: cs.CL,cs.LG
Archive-based Single-Objective Evolutionary Algorithms for Submodular Optimization
Constrained submodular optimization problems play a key role in the area of combinatorial optimization as they capture many NP-hard optimization problems. So far, Pareto optimization approaches using multi-objective formulations have been shown to be successful to tackle these problems while single-objective formulations lead to difficulties for algorithms such as the $(1+1)$-EA due to the presence of local optima. We introduce for the first time single-objective algorithms that are provably successful for different classes of constrained submodular maximization problems. Our algorithms are variants of the $(1+\lambda)$-EA and $(1+1)$-EA and increase the feasible region of the search space incrementally in order to deal with the considered submodular problems.
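A toy version of the incremental-feasible-region idea is sketched below; the growth schedule and acceptance rule are illustrative assumptions, not the paper's exact algorithms:

```python
import random

def one_plus_one_ea(f, n, budget, iters=3000):
    """(1+1)-EA whose cardinality bound grows toward the true budget."""
    x = set()
    for t in range(iters):
        bound = min(budget, 1 + t * budget // iters)  # slowly enlarge feasibility
        y = set(x)
        for i in range(n):                            # standard bit-flip mutation
            if random.random() < 1.0 / n:
                y.symmetric_difference_update({i})
        if len(y) <= bound and f(y) >= f(x):          # accept feasible, no-worse offspring
            x = y
    return x

universe = [set(random.sample(range(50), 8)) for _ in range(20)]

def coverage(s):                                      # a submodular set function
    covered = set()
    for i in s:
        covered |= universe[i]
    return len(covered)

print(one_plus_one_ea(coverage, n=20, budget=5))
```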
Updated: 2024-06-19 10:08:12
标题: 基于存档的单目标进化算法用于次模优化
摘要: 受限次模优化问题在组合优化领域中扮演着关键角色,因为它们涵盖了许多NP难的优化问题。到目前为止,使用多目标公式的帕累托优化方法已被证明成功解决这些问题,而单目标公式会导致像$(1+1)$-EA这样的算法出现局部最优的困难。我们首次引入了针对不同类别受限次模最大化问题的可证明成功的单目标算法。我们的算法是$(1+\lambda)$-EA和$(1+1)$-EA的变种,并逐步增加搜索空间的可行区域,以应对所考虑的次模问题。
更新时间: 2024-06-19 10:08:12
领域: cs.NE,cs.AI
CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control
Effective multi-intersection collaboration is pivotal for reinforcement-learning-based traffic signal control to alleviate congestion. Existing work mainly chooses neighboring intersections as collaborators. However, quite an amount of congestion, even some wide-range congestion, is caused by non-neighbors failing to collaborate. To address these issues, we propose to separate the collaborator selection as a second policy to be learned, concurrently being updated with the original signal-controlling policy. Specifically, the selection policy in real-time adaptively selects the best teammates according to phase- and intersection-level features. Empirical results on both synthetic and real-world datasets provide robust validation for the superiority of our approach, offering significant improvements over existing state-of-the-art methods. The code is available at https://github.com/bonaldli/CoSLight.
Updated: 2024-06-19 10:07:02
标题: CoSLight: 协同优化协作者选择和决策以增强交通信号控制
摘要: 有效的多交叉口协作对于基于强化学习的交通信号控制至关重要,以减轻拥堵。现有工作主要选择邻近交叉口作为协作者。然而,相当多的拥堵,甚至一些广泛的拥堵,是由非邻近交叉口未能协作造成的。为解决这些问题,我们提出将协作者选择作为第二个待学习的策略,同时与原始信号控制策略并行更新。具体来说,实时选择策略根据相位和交叉口级特征自适应地选择最佳的队友。对合成和真实世界数据集的实证结果为我们的方法的优越性提供了稳健的验证,相较于现有最先进的方法,提供了显著的改进。代码可在https://github.com/bonaldli/CoSLight中找到。
更新时间: 2024-06-19 10:07:02
领域: cs.MA,cs.AI
Composite Concept Extraction through Backdooring
Learning composite concepts, such as "red car", from individual examples -- like a white car representing the concept of "car" and a red strawberry representing the concept of "red" -- is inherently challenging. This paper introduces a novel method called Composite Concept Extractor (CoCE), which leverages techniques from traditional backdoor attacks to learn these composite concepts in a zero-shot setting, requiring only examples of individual concepts. By repurposing the trigger-based model backdooring mechanism, we create a strategic distortion in the manifold of the target object (e.g., "car") induced by example objects with the target property (e.g., "red") from objects such as "red strawberry", ensuring the distortion selectively affects the target objects with the target property. Contrastive learning is then employed to further refine this distortion, and a method is formulated for detecting objects that are influenced by the distortion. Extensive experiments with in-depth analysis across different datasets demonstrate the utility and applicability of our proposed approach.
Updated: 2024-06-19 10:02:54
标题: 通过后门方式进行复合概念提取
摘要: 学习复合概念,例如“红色汽车”,从个别示例 - 比如一辆白色汽车代表“汽车”概念,一颗红色草莓代表“红色”概念 - 是固有具有挑战性的。本文介绍了一种名为Composite Concept Extractor (CoCE)的新方法,它利用传统后门攻击技术来在零样本设置中学习这些复合概念,只需要个别概念的示例。通过重新利用基于触发器的模型后门机制,我们在目标对象(例如“汽车”)的流形中创建了一种战略性扭曲,由具有目标属性(例如“红色”)的示例对象(例如“红色草莓”)引起,确保扭曲选择性地影响具有目标属性的目标对象。然后采用对比学习进一步完善这种扭曲,并制定了一种检测受扭曲影响的对象的方法。通过对不同数据集的深入分析进行了广泛实验,展示了我们提出的方法的实用性和适用性。
更新时间: 2024-06-19 10:02:54
领域: cs.CV,cs.LG
Structsum Generation for Faster Text Comprehension
We consider the task of generating structured representations of text using large language models (LLMs). We focus on tables and mind maps as representative modalities. Tables are a more organized way of representing data, while mind maps provide a visually dynamic and flexible approach, particularly suitable for sparse content. Despite the effectiveness of LLMs on different tasks, we show that current models struggle with generating structured outputs. In response, we present effective prompting strategies for both of these tasks. We introduce a taxonomy of problems around factuality, global and local structure, common to both modalities, and propose a set of critiques to tackle these issues, resulting in an absolute improvement in accuracy of +37pp (79%) for mind maps and +15pp (78%) for tables. To evaluate the semantic coverage of generated structured representations we propose Auto-QA, and we verify its adequacy using the SQuAD dataset. We further evaluate the usefulness of structured representations via a text comprehension user study. The results show a significant reduction in comprehension time compared to text when using tables (42.9%) and mind maps (31.9%), without loss in accuracy.
Updated: 2024-06-19 09:59:51
标题: 结构摘要生成以加快文本理解速度
摘要: 我们考虑使用大型语言模型(LLMs)生成文本的结构化表示的任务。我们专注于表格和思维导图作为代表性的形式。表格是表示数据的更有组织的方式,而思维导图提供了一种视觉动态和灵活的方法,特别适用于稀疏内容。尽管LLMs在不同任务上的有效性,我们发现当前模型在生成结构化输出方面存在困难。因此,我们提出了针对这两个任务的有效提示策略。我们介绍了围绕事实性、全局和局部结构的问题分类,并提出一系列批评来解决这些问题,从而使思维导图的准确率提高了+37pp(79%),表格提高了+15pp(78%)。为了评估生成的结构化表示的语义覆盖率,我们提出了Auto-QA,并使用SQuAD数据集验证了Auto-QA的适用性。我们进一步通过文本理解用户研究评估了结构化表示的实用性。结果显示,在使用表格(42.9%)和思维导图(31.9%)时,相比于文本,理解时间显著减少,而准确率不受损失。
更新时间: 2024-06-19 09:59:51
领域: cs.CL,cs.AI
In-Context Reinforcement Learning for Variable Action Spaces
Recently, it has been shown that transformers pre-trained on diverse datasets with multi-episode contexts can generalize to new reinforcement learning tasks in-context. A key limitation of previously proposed models is their reliance on a predefined action space size and structure. The introduction of a new action space often requires data re-collection and model re-training, which can be costly for some applications. In our work, we show that it is possible to mitigate this issue by proposing the Headless-AD model that, despite being trained only once, is capable of generalizing to discrete action spaces of variable size, semantic content and order. By experimenting with Bernoulli and contextual bandits, as well as a gridworld environment, we show that Headless-AD exhibits significant capability to generalize to action spaces it has never encountered, even outperforming specialized models trained for a specific set of actions on several environment configurations. Implementation is available at: https://github.com/corl-team/headless-ad.
Updated: 2024-06-19 09:42:56
标题: 上下文强化学习在可变动作空间中的应用
摘要: 最近,已经显示出在多样化数据集上预训练的变压器可以泛化到新的强化学习任务中。先前提出的模型的一个关键限制是它们依赖于预定义的动作空间大小和结构。引入新的动作空间通常需要重新收集数据和重新训练模型,这对一些应用来说可能是昂贵的。在我们的工作中,我们展示了通过提出Headless-AD模型可以缓解这个问题,尽管只训练一次,它能够泛化到具有可变大小、语义内容和顺序的离散动作空间。通过对伯努利和情境赌博机以及一个网格世界环境进行实验,我们展示了Headless-AD表现出对其从未遇到的动作空间的泛化能力,甚至在几个环境配置上胜过为特定一组动作训练的专门模型。实现可在以下链接找到:https://github.com/corl-team/headless-ad。
更新时间: 2024-06-19 09:42:56
领域: cs.LG,cs.AI
VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework
The Large Language Model (LLM) has gained significant popularity and is extensively utilized across various domains. Most LLM deployments occur within cloud data centers, where they encounter substantial response delays and incur high costs, thereby impacting the Quality of Service (QoS) at the network edge. Leveraging vector database caching to store LLM request results at the edge can substantially mitigate the response delays and costs associated with similar requests, which has been overlooked by previous research. Addressing these gaps, this paper introduces a novel Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework. Firstly, we propose the VELO framework, which ingeniously employs a vector database to cache the results of some LLM requests at the edge to reduce the response time of subsequent similar requests. Diverging from direct optimization of the LLM, our VELO framework does not necessitate altering the internal structure of the LLM and is broadly applicable to diverse LLMs. Subsequently, building upon the VELO framework, we formulate the QoS optimization problem as a Markov Decision Process (MDP) and devise an algorithm grounded in Multi-Agent Reinforcement Learning (MARL) to decide whether to request the LLM in the cloud or directly return the results from the vector database at the edge. Moreover, to enhance request feature extraction and expedite training, we refine the policy network of MARL and integrate expert demonstrations. Finally, we implement the proposed algorithm within a real edge system. Experimental findings confirm that our VELO framework substantially enhances user satisfaction by concurrently diminishing delay and resource consumption for edge users utilizing LLMs.
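The caching idea can be sketched as follows; the embedding model, similarity threshold, and cloud-call function are placeholders, and in the paper the serve-from-edge decision is made by a MARL policy rather than a fixed threshold:

```python
import numpy as np

class EdgeVectorCache:
    """Store (embedding, answer) pairs at the edge; serve similar requests locally."""

    def __init__(self, threshold=0.92):
        self.keys, self.values, self.threshold = [], [], threshold

    def lookup(self, emb):
        if not self.keys:
            return None
        sims = np.array(self.keys) @ emb      # cosine similarity (unit-norm embeddings)
        i = int(np.argmax(sims))
        return self.values[i] if sims[i] >= self.threshold else None

    def insert(self, emb, answer):
        self.keys.append(emb)
        self.values.append(answer)

def answer(request_emb, cache, call_cloud_llm):
    cached = cache.lookup(request_emb)
    if cached is not None:
        return cached                         # served from the edge, no cloud delay
    result = call_cloud_llm()                 # fall back to the cloud LLM
    cache.insert(request_emb, result)
    return result
```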
Updated: 2024-06-19 09:41:37
标题: VELO:一种基于矢量数据库辅助的云边协作LLM QoS优化框架
摘要: 大型语言模型(LLM)已经获得了显著的流行度,并被广泛应用于各个领域。大多数LLM部署发生在云数据中心,其中它们遇到了相当大的响应延迟并产生了高昂的成本,从而影响了网络边缘的服务质量(QoS)。利用向量数据库缓存在边缘存储LLM请求结果可以大幅度减轻与类似请求相关的响应延迟和成本,这一点先前的研究所忽视。针对这些差距,本文引入了一种新颖的基于向量数据库辅助云边协同LLM QoS优化(VELO)框架。首先,我们提出了VELO框架,巧妙地利用向量数据库在边缘缓存一些LLM请求的结果,以减少后续类似请求的响应时间。与直接优化LLM不同,我们的VELO框架不需要改变LLM的内部结构,可以广泛应用于各种LLM。随后,在VELO框架的基础上,我们将QoS优化问题建模为马尔科夫决策过程(MDP),并设计了一种基于多智能体强化学习(MARL)的算法,用于决定是在云端请求LLM还是直接在边缘从向量数据库返回结果。此外,为了增强请求特征提取并加快训练速度,我们完善了MARL的策略网络并整合了专家演示。最后,我们在一个真实的边缘系统中实现了所提出的算法。实验结果表明,我们的VELO框架极大地提升了用户满意度,同时减少了边缘用户利用LLM时的延迟和资源消耗。
更新时间: 2024-06-19 09:41:37
领域: cs.AI
Predicting from a Different Perspective: A Re-ranking Model for Inductive Knowledge Graph Completion
Rule-induction models have demonstrated great power in the inductive setting of knowledge graph completion. In this setting, the models are tested on a knowledge graph entirely composed of unseen entities. These models learn relation patterns as rules by utilizing subgraphs. Providing the same inputs with different rules leads to differences in the model's predictions. In this paper, we focus on the behavior of such models. We propose a re-ranking-based model called ReDistLP (Re-ranking with a Distinct Model for Link Prediction). This model enhances the effectiveness of re-ranking by leveraging the difference in the predictions between the initial retriever and the re-ranker. ReDistLP outperforms the state-of-the-art methods in 2 out of 3 benchmarks.
Updated: 2024-06-19 09:37:53
标题: 从不同角度预测:一种归纳知识图完成的重新排序模型
摘要: 规则归纳模型在知识图完成的归纳设置中表现出了极大的能力。在这种设置中,模型被测试在一个完全由未见实体组成的知识图上。这些模型通过利用子图学习关系模式作为规则。提供相同的输入但不同的规则会导致模型预测的差异。在本文中,我们关注这类模型的行为。我们提出了一个基于重新排名的模型称为ReDistLP(重新排名与链接预测的不同模型)。该模型通过利用初始检索器和重新排名器之间的预测差异来增强重新排名的有效性。ReDistLP在3个基准测试中的2个中胜过了当前最先进的方法。
更新时间: 2024-06-19 09:37:53
领域: cs.LG
Unifying Mixed Gas Adsorption in Molecular Sieve Membranes and MOFs using Machine Learning
Recent machine learning models for accurately obtaining gas adsorption isotherms focus on polymers or metal-organic frameworks (MOFs) separately. Creating a unified model that can predict the adsorption trends in both types of adsorbents is challenging, owing to the diversity of their chemical structures. Moreover, models trained only on single-gas adsorption data are incapable of predicting adsorption isotherms for binary gas mixtures. In this work, we address these problems using feature vectors comprising only the physical properties of the gas mixtures and adsorbents. Our model is trained on adsorption isotherms of both single and binary mixed gases inside the carbon molecular sieving membrane (CMSM), together with data available from the CoRE MOF database. The trained models are capable of accurately predicting the adsorption trends in both classes of materials, for both pure and binary components. An ML architecture designed for one class of material is not suitable for predicting the other class, even after proper training, signifying that the model must be trained jointly for proper predictions and transferability. The model is used to predict, with good accuracy, the CO2 uptake inside the CALF-20 framework. This work opens up a new avenue for predicting complex adsorption processes for gas mixtures in a wide range of materials.
Updated: 2024-06-19 09:30:11
标题: 使用机器学习统一分子筛膜和MOFs中的混合气体吸附
摘要: 最近的机器学习模型专注于准确获取气体吸附等温线,主要关注聚合物或金属有机框架(MOFs)。由于它们化学结构的多样性,创建一个能够预测这两种吸附剂吸附趋势的统一模型是具有挑战性的。此外,仅在单一气体吸附数据上训练的模型无法预测二元气体混合物的吸附等温线。在这项工作中,我们使用仅包含气体混合物和吸附剂的物理特性的特征向量来解决这些问题。我们的模型在碳分子筛膜(CMSM)内的单一和二元混合气体吸附等温线以及来自CoRE MOF数据库的可用数据上进行训练。训练后的模型能够准确预测两类材料中的吸附趋势,无论是纯净还是二元组分。为一类材料设计的ML架构经过适当训练后,也不适合预测另一类材料,这表明模型必须进行联合训练以进行正确的预测和可转移性。该模型被用于预测CALF-20框架内CO2的吸收量,预测精度较高。这项工作为在各种材料中预测气体混合物的复杂吸附过程开辟了新途径。
更新时间: 2024-06-19 09:30:11
领域: cond-mat.soft,cond-mat.mtrl-sci,cs.LG,physics.comp-ph
Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing
Audio segmentation is a key task for many speech technologies, most of which are based on neural networks that achieve high performance but are usually considered black boxes. However, in many domains, among which health or forensics, there is a need not only for good performance but also for explanations of the output decision. Explanations derived directly from latent representations need to satisfy "good" properties, such as informativeness, compactness, or modularity, to be interpretable. In this article, we propose an explainable-by-design audio segmentation model based on non-negative matrix factorization (NMF), which is a good candidate for the design of interpretable representations. This paper shows that our model reaches good segmentation performance, and presents deep analyses of the latent representation extracted from the non-negative matrix. The proposed approach opens new perspectives toward the evaluation of interpretable representations according to "good" properties.
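The NMF building block can be sketched as follows; the shapes and rank are illustrative, and the probing of segmentation decisions against the activations is only indicated in comments:

```python
import numpy as np
from sklearn.decomposition import NMF

V = np.random.rand(257, 400)          # magnitude spectrogram: freq bins x frames
model = NMF(n_components=16, init="nndsvda", max_iter=400)
W = model.fit_transform(V)            # spectral templates (257 x 16), interpretable
H = model.components_                 # per-frame activations (16 x 400)
print(W.shape, H.shape)
# Frame-level segmentation outputs can then be probed against rows of H to see
# which interpretable components drive each decision.
```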
Updated: 2024-06-19 09:26:33
标题: 通过非负矩阵分解和探索的可解释性音频分割设计
摘要: 音频分割是许多语音技术的关键任务,大多数技术都是基于神经网络的,通常被视为黑盒子,具有高水平的性能。然而,在许多领域,如健康或法医学,不仅需要良好的性能,还需要关于输出决策的解释。直接从潜在表示中导出的解释需要满足“良好”的属性,如信息量大、紧凑或模块化,以便解释。在本文中,我们提出了一种基于非负矩阵分解(NMF)的可解释设计的音频分割模型,这对于设计可解释的表示来说是一个很好的选择。本文展示了我们的模型达到了良好的分割性能,并对从非负矩阵中提取的潜在表示进行了深入分析。所提出的方法为根据“良好”属性评估可解释表示开辟了新的视角。
更新时间: 2024-06-19 09:26:33
领域: eess.AS,cs.AI,cs.SD
Fredformer: Frequency Debiased Transformer for Time Series Forecasting
The Transformer model has shown leading performance in time series forecasting. Nevertheless, in some complex scenarios, it tends to learn low-frequency features in the data and overlook high-frequency features, showing a frequency bias. This bias prevents the model from accurately capturing important high-frequency data features. In this paper, we undertook empirical analyses to understand this bias and discovered that frequency bias results from the model disproportionately focusing on frequency features with higher energy. Based on our analysis, we formulate this bias and propose Fredformer, a Transformer-based framework designed to mitigate frequency bias by learning features equally across different frequency bands. This approach prevents the model from overlooking lower amplitude features important for accurate forecasting. Extensive experiments show the effectiveness of our proposed approach, which can outperform other baselines in different real-world time-series datasets. Furthermore, we introduce a lightweight variant of the Fredformer with an attention matrix approximation, which achieves comparable performance but with much fewer parameters and lower computation costs. The code is available at: https://github.com/chenzRG/Fredformer
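One hedged way to picture the debiasing intuition (a guess at the mechanism for illustration, not the paper's module) is to split the spectrum into bands and normalize each band's energy so that low- and high-frequency structure contribute comparably:

```python
import torch

def frequency_equalized_features(x, num_bands=4):
    """Per-band energy normalization of a series' spectrum (illustrative)."""
    spec = torch.fft.rfft(x, dim=-1)                    # complex spectrum
    bands = torch.chunk(spec, num_bands, dim=-1)
    eq = [b / (b.abs().mean() + 1e-8) for b in bands]   # equalize band energies
    return torch.cat(eq, dim=-1)

x = torch.randn(8, 96)                                  # batch of series, length 96
print(frequency_equalized_features(x).shape)
```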
Updated: 2024-06-19 09:25:23
标题: Fredformer:频率校正变压器用于时间序列预测
摘要: Transformer模型在时间序列预测中表现出色。然而,在一些复杂的场景中,它倾向于学习数据中的低频特征并忽视高频特征,表现出频率偏差。这种偏差阻碍了模型准确捕捉重要的高频数据特征。本文进行了实证分析以了解这种偏差,并发现频率偏差是由于模型不成比例地关注具有更高能量的频率特征而产生的。根据我们的分析,我们制定了这种偏差,并提出了Fredformer,这是一种基于Transformer的框架,旨在通过在不同频率段之间均衡学习特征来减轻频率偏差。这种方法防止模型忽视对准确预测至关重要的低振幅特征。大量实验证明了我们提出的方法的有效性,可以在不同的真实时间序列数据集中胜过其他基线方法。此外,我们介绍了Fredformer的轻量级变体,使用注意力矩阵近似,能够实现相当的性能,但参数更少,计算成本更低。代码可在以下链接找到:https://github.com/chenzRG/Fredformer
更新时间: 2024-06-19 09:25:23
领域: cs.LG,cs.AI
SQL2Circuits: Estimating Metrics for SQL Queries with a Quantum Natural Language Processing Method
In recent years, advances in quantum computing have led to accelerating research on quantum applications across fields. Here, we introduce a quantum machine learning model as a potential solution to the classical question in database research: the estimation of metrics for SQL queries. This work employs a quantum natural language processing (QNLP)-inspired approach for constructing a quantum machine learning model that can classify SQL queries with respect to their cardinalities, costs, and execution times. The model consists of an encoding mechanism and a training phase, including classical and quantum subroutines. The encoding mechanism encodes SQL queries as parametrized quantum circuits. In the training phase, we utilize classical optimization algorithms, such as SPSA and Adam, to optimize the circuit parameters to make predictions about the query metrics. We conclude that our model reaches an accuracy equivalent to that of the QNLP model in the binary classification tasks. Moreover, we extend the previous work by adding 4-class classification tasks and compare the cardinality estimation results to the state-of-the-art databases. We perform a theoretical analysis of the quantum machine learning model by calculating its expressibility and entangling capabilities. The analysis shows that the model has advantageous properties that make it expressible but also not too complex to be executed on the existing quantum hardware.
Updated: 2024-06-19 09:21:44
标题: SQL2Circuits:利用量子自然语言处理方法估计SQL查询的度量指标
摘要: 近年来,量子计算技术的进步推动了各领域量子应用研究的加速发展。在这里,我们介绍了一种量子机器学习模型,作为传统数据库研究中一个潜在的解决方案:SQL查询指标的估计。这项工作采用了受量子自然语言处理(QNLP)启发的方法来构建一个量子机器学习模型,可以根据SQL查询的基数、成本和执行时间对其进行分类。该模型包括编码机制和训练阶段,其中包括经典和量子子程序。编码机制将SQL查询编码为参数化的量子电路。在训练阶段,我们利用经典优化算法,如SPSA和Adam,来优化电路参数以对查询指标进行预测。我们得出结论,我们的模型在二分类任务中达到了与QNLP模型相当的准确性。此外,我们通过添加4类分类任务扩展了先前的工作,并将基数估计结果与最先进的数据库进行了比较。我们通过计算其表达能力和纠缠能力对量子机器学习模型进行了理论分析。分析表明,该模型具有有利的特性,使其具有表现力,但也不会过于复杂以在现有的量子硬件上执行。
更新时间: 2024-06-19 09:21:44
领域: cs.DB,cs.LG,quant-ph
The Real Price of Bandit Information in Multiclass Classification
We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}\left(\min \left\{|H| + \sqrt{T}, \sqrt{KT \log |H|} \right\} \right) }$, where $H$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|H|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.
Updated: 2024-06-19 09:20:04
标题: 多类别分类中强盗信息的真实价格
摘要: 我们重新审视了具有强盗反馈的多类别分类的经典问题(Kakade,Shalev-Shwartz和Tewari,2008),在这种情况下,每个输入分类为$K$个可能的标签之一,反馈仅限于预测的标签是否正确。我们的主要研究是关于标签数$K$的依赖性,以及在这种情况下$T$步骤遗憾界是否可以超越现有算法所展示的$\smash{\sqrt{KT}}$依赖性。我们的主要贡献在于展示强盗多类别的最小化遗憾实际上更加微妙,并且具有形式$\smash{\widetilde{\Theta}\left(\min \left\{|H| + \sqrt{T}, \sqrt{KT \log |H|} \right\} \right)}$,其中$H$是底层(有限的)假设类。特别是,我们提出了一种新的强盗分类算法,保证遗憾$\smash{\widetilde{O}(|H|+\sqrt{T})}$,对于中等规模的假设类来说,这比传统算法有所改进,并提供一个匹配的下界,建立了所有参数区域内上界的紧密性(最多对数因子)。
更新时间: 2024-06-19 09:20:04
领域: cs.LG,cs.AI,stat.ML
Efficient Offline Reinforcement Learning: The Critic is Critical
Recent work has demonstrated both the benefits and limitations of using supervised approaches (without temporal-difference learning) for offline reinforcement learning. While off-policy reinforcement learning provides a promising approach for improving performance beyond supervised approaches, we observe that training is often inefficient and unstable due to temporal-difference bootstrapping. In this paper we propose a best-of-both approach by first learning the behavior policy and critic with supervised learning, before improving with off-policy reinforcement learning. Specifically, we demonstrate improved efficiency by pre-training with a supervised Monte-Carlo value error, making use of commonly neglected downstream information from the provided offline trajectories. We find that we are able to more than halve the training time of the considered offline algorithms on standard benchmarks, and surprisingly also achieve greater stability. We further build on the importance of having consistent policy and value functions to propose novel hybrid algorithms, TD3+BC+CQL and EDAC+BC, that regularize both the actor and the critic towards the behavior policy. This helps to more reliably improve on the behavior policy when learning from limited human demonstrations. Code is available at https://github.com/AdamJelley/EfficientOfflineRL.
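The supervised pre-training step can be sketched as below; the data layout, critic signature, and hyperparameters are illustrative assumptions. The critic is regressed onto discounted Monte-Carlo returns computed from the offline trajectories before any temporal-difference bootstrapping:

```python
import torch
import torch.nn as nn

def pretrain_critic_mc(critic, trajectories, gamma=0.99, epochs=10, lr=3e-4):
    """Regress Q(s, a) onto Monte-Carlo returns from offline trajectories."""
    opt = torch.optim.Adam(critic.parameters(), lr=lr)
    data = []
    for states, actions, rewards in trajectories:   # tensors plus a reward list
        g, returns = 0.0, []
        for r in reversed(rewards):                 # discounted return-to-go
            g = r + gamma * g
            returns.append(g)
        returns.reverse()
        data.append((states, actions, torch.tensor(returns)))
    for _ in range(epochs):
        for states, actions, returns in data:
            q = critic(torch.cat([states, actions], dim=-1)).squeeze(-1)
            loss = nn.functional.mse_loss(q, returns)   # Monte-Carlo value error
            opt.zero_grad()
            loss.backward()
            opt.step()
    return critic
```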
Updated: 2024-06-19 09:16:38
标题: 高效的离线强化学习:评论家至关重要
摘要: 最近的研究表明,使用有监督方法(不使用时差学习)进行离线强化学习既有好处又有局限性。虽然离线策略学习提供了一个有希望的方法来提高性能,但由于时差引导,我们观察到训练通常是低效且不稳定的。在本文中,我们提出了一种最佳方法,首先通过有监督学习来学习行为策略和评论家,然后再利用离线策略学习进行改进。具体来说,我们通过预训练使用有监督的蒙特卡洛值误差,利用提供的离线轨迹中常被忽略的下游信息,展示了改进的效率。我们发现,我们能够在标准基准测试中将考虑的离线算法的训练时间减少一半以上,并且令人惊讶地也实现了更大的稳定性。我们进一步强调了保持一致的策略和值函数的重要性,提出了新颖的混合算法TD3+BC+CQL和EDAC+BC,对演员和评论家都进行了向行为策略正则化。这有助于更可靠地改进从有限人类示范中学习的行为策略。代码可在https://github.com/AdamJelley/EfficientOfflineRL找到。
更新时间: 2024-06-19 09:16:38
领域: cs.LG
Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation
Current question answering systems leveraging retrieval augmented generation perform well in answering factoid questions but face challenges with non-factoid questions, particularly how-to queries requiring detailed step-by-step instructions and explanations. In this paper, we introduce Thread, a novel data organization paradigm that transforms documents into logic units based on their inter-connectivity. Extensive experiments across open-domain and industrial scenarios demonstrate that Thread outperforms existing data organization paradigms in RAG-based QA systems, significantly improving the handling of how-to questions.
Updated: 2024-06-19 09:14:41
标题: 主题:基于逻辑的数据组织范式用于检索增强生成的How-To问题回答
摘要: 目前的问题回答系统利用检索增强生成在回答事实问题方面表现良好,但在回答非事实问题方面面临挑战,特别是需要详细的逐步说明和解释的操作性查询。在本文中,我们介绍了Thread,这是一种新颖的数据组织范式,它根据文档之间的互连性将文档转化为逻辑单元。在开放领域和工业场景中进行的大量实验表明,Thread在基于RAG的QA系统中优于现有的数据组织范式,显著改善了如何问题的处理能力。
更新时间: 2024-06-19 09:14:41
领域: cs.AI
Identifiable Causal Representation Learning: Unsupervised, Multi-View, and Multi-Environment
Causal models provide rich descriptions of complex systems as sets of mechanisms by which each variable is influenced by its direct causes. They support reasoning about manipulating parts of the system and thus hold promise for addressing some of the open challenges of artificial intelligence (AI), such as planning, transferring knowledge in changing environments, or robustness to distribution shifts. However, a key obstacle to more widespread use of causal models in AI is the requirement that the relevant variables be specified a priori, which is typically not the case for the high-dimensional, unstructured data processed by modern AI systems. At the same time, machine learning (ML) has proven quite successful at automatically extracting useful and compact representations of such complex data. Causal representation learning (CRL) aims to combine the core strengths of ML and causality by learning representations in the form of latent variables endowed with causal model semantics. In this thesis, we study and present new results for different CRL settings. A central theme is the question of identifiability: Given infinite data, when are representations satisfying the same learning objective guaranteed to be equivalent? This is an important prerequisite for CRL, as it formally characterises if and when a learning task is, at least in principle, feasible. Since learning causal models, even without a representation learning component, is notoriously difficult, we require additional assumptions on the model class or rich data beyond the classical i.i.d. setting. By partially characterising identifiability for different settings, this thesis investigates what is possible for CRL without direct supervision, and thus contributes to its theoretical foundations. Ideally, the developed insights can help inform data collection practices or inspire the design of new practical estimation methods.
Updated: 2024-06-19 09:14:40
标题: 可识别的因果关系表示学习:无监督、多视角和多环境
摘要: 因果模型提供了对复杂系统的丰富描述,将其看作一组机制,每个变量受其直接原因的影响。它支持对系统部分的操纵进行推理,因此有望解决人工智能(AI)的一些未解之谜,如规划、在不断变化的环境中转移知识,或对分布转移的鲁棒性。然而,在AI中更广泛使用因果模型的一个关键障碍是要求相关变量事先指定,而这通常不适用于现代AI系统处理的高维度、非结构化数据。同时,机器学习(ML)已经被证明在自动提取这种复杂数据的有用和紧凑表示方面非常成功。因果表征学习(CRL)旨在结合ML和因果之间的核心优势,通过学习具有因果模型语义的潜在变量表示。 在这篇论文中,我们研究和呈现了不同CRL设置的新结果。一个中心主题是可识别性问题:在无限数据的情况下,何时可以保证满足相同学习目标的表示是等效的?这是CRL的一个重要前提条件,因为它正式地表明何时何地学习任务至少在原则上是可行的。由于学习因果模型,即使没有表示学习组件,也是非常困难的,我们需要对模型类别或丰富数据进行额外假设,超越经典的i.i.d.设置。通过部分表征不同设置的可识别性,这篇论文探讨了CRL在没有直接监督的情况下可能做到什么,从而有助于其理论基础。理想情况下,这些洞察力的发展可以帮助指导数据收集实践,或激发新的实际估计方法的设计。
更新时间: 2024-06-19 09:14:40
领域: cs.LG,cs.AI,stat.ML
Effective Edge-wise Representation Learning in Edge-Attributed Bipartite Graphs
Graph representation learning (GRL) is to encode graph elements into informative vector representations, which can be used in downstream tasks for analyzing graph-structured data and has seen extensive applications in various domains. However, the majority of extant studies on GRL are geared towards generating node representations, which cannot be readily employed to perform edge-based analytics tasks in edge-attributed bipartite graphs (EABGs) that pervade the real world, e.g., spam review detection in customer-product reviews and identifying fraudulent transactions in user-merchant networks. Compared to node-wise GRL, learning edge representations (ERL) on such graphs is challenging due to the need to incorporate the structure and attribute semantics from the perspective of edges while considering the separate influence of two heterogeneous node sets U and V in bipartite graphs. To our knowledge, despite its importance, limited research has been devoted to this frontier, and existing workarounds all suffer from sub-par results. Motivated by this, this paper designs EAGLE, an effective ERL method for EABGs. Building on an in-depth and rigorous theoretical analysis, we propose the factorized feature propagation (FFP) scheme for edge representations with adequate incorporation of long-range dependencies of edges/features without incurring tremendous computation overheads. We further ameliorate FFP as a dual-view FFP by taking into account the influences from nodes in U and V severally in ERL. Extensive experiments on 5 real datasets showcase the effectiveness of the proposed EAGLE models in semi-supervised edge classification tasks. In particular, EAGLE can attain a considerable gain of at most 38.11% in AP and 1.86% in AUC when compared to the best baselines.
Updated: 2024-06-19 09:11:03
Categories: cs.LG,cs.SI
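The abstract does not include code, but the dual-view idea can be sketched as follows: propagate node attributes to edges separately from the U side and the V side over several hops, then concatenate the two views with the edge attributes. This is a minimal illustrative Python sketch; the incidence matrices, the damping factor alpha, and all helper names are our assumptions, not EAGLE's actual FFP implementation.

    import numpy as np

    def normalize_rows(m):
        """Row-normalize a matrix, guarding against empty rows."""
        s = m.sum(axis=1, keepdims=True)
        s[s == 0] = 1.0
        return m / s

    def dual_view_edge_features(B_u, B_v, X_u, X_v, X_e, alpha=0.5, hops=3):
        """Illustrative dual-view feature propagation for edge representations.

        B_u: (num_edges, |U|) incidence matrix linking edges to U-side endpoints.
        B_v: (num_edges, |V|) incidence matrix linking edges to V-side endpoints.
        X_u, X_v: node attribute matrices; X_e: edge attribute matrix.
        """
        P_u, P_v = normalize_rows(B_u), normalize_rows(B_v)
        # Propagate node features towards edges over several hops; alpha damps
        # how much long-range information is mixed in at each step.
        H_u, H_v = P_u @ X_u, P_v @ X_v
        for _ in range(hops - 1):
            H_u = alpha * (P_u @ (normalize_rows(B_u.T) @ H_u)) + (1 - alpha) * (P_u @ X_u)
            H_v = alpha * (P_v @ (normalize_rows(B_v.T) @ H_v)) + (1 - alpha) * (P_v @ X_v)
        # Keep the two views separate until the final fusion with edge attributes.
        return np.concatenate([H_u, H_v, X_e], axis=1)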
PPT-GNN: A Practical Pre-Trained Spatio-Temporal Graph Neural Network for Network Security
Recent works have demonstrated the potential of Graph Neural Networks (GNN) for network intrusion detection. Despite their advantages, a significant gap persists between real-world scenarios, where detection speed is critical, and existing proposals, which operate on large graphs representing several hours of traffic. This gap results in unrealistic operational conditions and impractical detection delays. Moreover, existing models do not generalize well across different networks, hampering their deployment in production environments. To address these issues, we introduce PPTGNN, a practical spatio-temporal GNN for intrusion detection. PPTGNN enables near real-time predictions, while better capturing the spatio-temporal dynamics of network attacks. PPTGNN employs self-supervised pre-training for improved performance and reduced dependency on labeled data. We evaluate PPTGNN on three public datasets and show that it significantly outperforms state-of-the-art models, such as E-ResGAT and E-GraphSAGE, with an average accuracy improvement of 10.38%. Finally, we show that a pre-trained PPTGNN can easily be fine-tuned to unseen networks with minimal labeled examples. This highlights the potential of PPTGNN as a general, large-scale pre-trained model that can effectively operate in diverse network environments.
Updated: 2024-06-19 09:09:46
Categories: cs.LG,cs.AI,cs.CR
DCS Chain: A Flexible Private Blockchain System
Blockchain technology has seen tremendous development over the past few years. Despite the emergence of numerous blockchain systems, they all suffer from various limitations, which can all be attributed to the fundamental issue posed by the DCS trilemma. In light of this, this work introduces a novel private blockchain system named DCS Chain. The core idea is to quantify the DCS metrics and dynamically adjust the blockchain's performance across these three dimensions, to achieve theoretically optimal system performance. Overall, our system provides a comprehensive suite of blockchain essentials, including DCS quantification, consensus protocol adjustment, and communication network simulation.
Updated: 2024-06-19 09:09:27
Categories: cs.CR
VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models
Visual Language Models (VLMs) have rapidly progressed with the recent success of large language models. However, there have been few attempts to incorporate efficient linear Recurrent Neural Networks (RNNs) architectures into VLMs. In this study, we introduce VisualRWKV, the first application of a linear RNN model to multimodal learning tasks, leveraging the pre-trained RWKV language model. We propose a data-dependent recurrence and sandwich prompts to enhance our modeling capabilities, along with a 2D image scanning mechanism to enrich the processing of visual sequences. Extensive experiments demonstrate that VisualRWKV achieves competitive performance compared to Transformer-based models like LLaVA-1.5 on various benchmarks. To facilitate further research and analysis, we have made the checkpoints and the associated code publicly accessible at the following GitHub repository: \href{https://github.com/howard-hou/VisualRWKV}{https://github.com/howard-hou/VisualRWKV}.
Updated: 2024-06-19 09:07:31
Categories: cs.CV,cs.CL,cs.LG
Improving Zero-Shot Cross-Lingual Transfer via Progressive Code-Switching
Code-switching is a data augmentation scheme that mixes words from multiple languages into source-language text. It has achieved considerable generalization performance on cross-lingual transfer tasks by aligning cross-lingual contextual word representations. However, uncontrolled and excessive replacement during code-switching adds noisy samples to model training. In other words, excessive code-switching text samples will hurt the models' cross-lingual transferability. To this end, we propose a Progressive Code-Switching (PCS) method to gradually generate moderately difficult code-switching examples for the model to discriminate from easy to hard. The idea is to progressively incorporate the multilingual knowledge learned from easier code-switching data to guide model optimization on subsequent, harder code-switching data. Specifically, we first design a difficulty measurer to measure the impact of replacing each word in a sentence based on the word relevance score. Then a code-switcher generates code-switching data of increasing difficulty via a controllable temperature variable. In addition, a training scheduler decides when to sample harder code-switching data for model training. Experiments show our model achieves state-of-the-art results on three different zero-shot cross-lingual transfer tasks across ten languages.
Updated: 2024-06-19 09:06:24
Categories: cs.CL,cs.LG
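As a rough illustration of the pipeline above, the following Python sketch couples a word-relevance difficulty measurer with a temperature-controlled code-switcher and an epoch-level scheduler. The replacement probability, the toy lexicon, and the annealing schedule are our assumptions; the paper's actual scoring and scheduling rules are not given in the abstract.

    import math
    import random

    def switch_probability(relevance_score, temperature):
        # Higher temperature -> more words replaced -> harder examples.
        return math.exp(-relevance_score / temperature)

    def code_switch(sentence, lexicon, relevance, temperature, rng=random):
        """Replace source words with translations; difficulty set by temperature.

        lexicon: source word -> foreign translation (hypothetical bilingual dict).
        relevance: source word -> importance score from the difficulty measurer.
        """
        out = []
        for word in sentence.split():
            p = switch_probability(relevance.get(word, 1.0), temperature)
            if word in lexicon and rng.random() < p:
                out.append(lexicon[word])
            else:
                out.append(word)
        return " ".join(out)

    # Training scheduler: anneal temperature upward so examples get harder.
    for epoch in range(5):
        temperature = 0.5 + 0.5 * epoch  # easy -> hard curriculum
        example = code_switch(
            "the cat sat on the mat",
            lexicon={"cat": "gato", "mat": "alfombra", "sat": "sentó"},
            relevance={"cat": 0.9, "mat": 0.4, "sat": 0.6},
            temperature=temperature,
        )
        print(epoch, example)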
MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding
The evaluation of Long Video Understanding (LVU) performance poses an important but challenging research problem. Despite previous efforts, the existing video understanding benchmarks are severely constrained by several issues, especially the insufficient lengths of videos, a lack of diversity in video types and evaluation tasks, and their unsuitability for evaluating LVU performance. To address the above problems, we propose a new benchmark, called MLVU (Multi-task Long Video Understanding Benchmark), for the comprehensive and in-depth evaluation of LVU. MLVU presents the following critical values: 1) The substantial and flexible extension of video lengths, which enables the benchmark to evaluate LVU performance across a wide range of durations. 2) The inclusion of various video genres, e.g., movies, surveillance footage, egocentric videos, cartoons, game videos, etc., which reflects the models' LVU performances in different scenarios. 3) The development of diversified evaluation tasks, which enables a comprehensive examination of MLLMs' key abilities in long-video understanding. An empirical study with 20 of the latest MLLMs reveals significant room for improvement in today's techniques, as all existing methods struggle with most of the evaluation tasks and exhibit severe performance degradation when handling longer videos. Additionally, it suggests that factors such as context length, image-understanding quality, and the choice of LLM backbone can play critical roles in future advancements. We anticipate that MLVU will advance the research of long video understanding by providing a comprehensive and in-depth analysis of MLLMs.
Updated: 2024-06-19 09:04:38
Categories: cs.CV,cs.AI,cs.CL
Jogging the Memory of Unlearned Model Through Targeted Relearning Attack
Machine unlearning is a promising approach to mitigate undesirable memorization of training data in ML models. However, in this work we show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks. With access to only a small and potentially loosely related set of data, we find that we can 'jog' the memory of unlearned models to reverse the effects of unlearning. We formalize this unlearning-relearning pipeline, explore the attack across three popular unlearning benchmarks, and discuss future directions and guidelines that result from our study.
Updated: 2024-06-19 09:03:21
Categories: cs.LG
LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs
Knowledge Graph (KG) inductive reasoning, which aims to infer missing facts from new KGs that are not seen during training, has been widely adopted in various applications. One critical challenge of KG inductive reasoning is handling low-resource scenarios with scarcity in both textual and structural aspects. In this paper, we attempt to address this challenge with Large Language Models (LLMs). Particularly, we utilize the state-of-the-art LLMs to generate a graph-structural prompt to enhance the pre-trained Graph Neural Networks (GNNs), which brings us new methodological insights into the KG inductive reasoning methods, as well as high generalizability in practice. On the methodological side, we introduce a novel pretraining and prompting framework ProLINK, designed for low-resource inductive reasoning across arbitrary KGs without requiring additional training. On the practical side, we experimentally evaluate our approach on 36 low-resource KG datasets and find that ProLINK outperforms previous methods in three-shot, one-shot, and zero-shot reasoning tasks, exhibiting average performance improvements by 20%, 45%, and 147%, respectively. Furthermore, ProLINK demonstrates strong robustness for various LLM promptings as well as full-shot scenarios.
Updated: 2024-06-19 09:00:53
Categories: cs.AI,cs.CL,cs.SI
An Integration of policy and reputation based trust mechanisms
Due to the popularization of the internet and e-commerce, more and more people are getting involved in the online shopping market. A large number of companies have moved to the internet, where the number of online customers has increased due to easy access. Online business enables people to transact without knowing each other. E-commerce systems are the combination of commerce behavior and internet technologies. Trust is therefore a positive element in buyer-seller transactions and a potential source of competitiveness in the e-commerce industry. There are two different approaches to handling trust. The first has a solid set of authentication rules, where decisions are made on digital or logical rules; this is called the policy-based trust mechanism. The second is a decentralized approach, where reputation is assembled and shared in a distributed environment; this is called the reputation-based trust mechanism. Objectives: In this thesis, the strengths and weaknesses of policy- and reputation-based trust mechanisms are identified through a systematic literature review and industrial interviews. Furthermore, a process for an integrated trust mechanism is proposed. The integrated trust mechanism is built through a mapping process that pairs the weakness of one mechanism with the strength of the other. The proposed integrated trust mechanism was validated by conducting an experiment with a buyer/seller scenario in an auction system. The analysis of the collected results indicated that the proposed integrated trust mechanism improved buyer trust relative to eBay and Tradera. At the end, we discuss some key points that may affect the trust relationship between seller and buyer. Further validation of the proposed trust mechanism in auction systems and the e-commerce industry is still needed.
Updated: 2024-06-19 08:57:05
Categories: cs.CR,cs.SI
AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
AI agents aim to solve complex tasks by combining text-based reasoning with external tool calls. Unfortunately, AI agents are vulnerable to prompt injection attacks where data returned by external tools hijacks the agent to execute malicious tasks. To measure the adversarial robustness of AI agents, we introduce AgentDojo, an evaluation framework for agents that execute tools over untrusted data. To capture the evolving nature of attacks and defenses, AgentDojo is not a static test suite, but rather an extensible environment for designing and evaluating new agent tasks, defenses, and adaptive attacks. We populate the environment with 97 realistic tasks (e.g., managing an email client, navigating an e-banking website, or making travel bookings), 629 security test cases, and various attack and defense paradigms from the literature. We find that AgentDojo poses a challenge for both attacks and defenses: state-of-the-art LLMs fail at many tasks (even in the absence of attacks), and existing prompt injection attacks break some security properties but not all. We hope that AgentDojo can foster research on new design principles for AI agents that solve common tasks in a reliable and robust manner. We release the code for AgentDojo at https://github.com/ethz-spylab/agentdojo.
Updated: 2024-06-19 08:55:56
Categories: cs.CR,cs.LG
A Resource-Adaptive Approach for Federated Learning under Resource-Constrained Environments
The paper studies a fundamental federated learning (FL) problem involving multiple clients with heterogeneous constrained resources. Compared with the numerous training parameters, the computing and communication resources of clients are insufficient for fast local training and real-time knowledge sharing. Besides, training on clients with heterogeneous resources may result in the straggler problem. To address these issues, we propose Fed-RAA: a Resource-Adaptive Asynchronous Federated learning algorithm. Different from vanilla FL methods, where all parameters are trained by each participating client regardless of resource diversity, Fed-RAA adaptively allocates fragments of the global model to clients based on their computing and communication capabilities. Each client then individually trains its assigned model fragment and asynchronously uploads the updated result. Theoretical analysis confirms the convergence of our approach. Additionally, we design an online greedy-based algorithm for fragment allocation in Fed-RAA, achieving fairness comparable to an offline strategy. We present numerical results on MNIST, CIFAR-10, and CIFAR-100, along with necessary comparisons and ablation studies, demonstrating the advantages of our work. To the best of our knowledge, this paper represents the first resource-adaptive asynchronous method for fragment-based FL with guaranteed theoretical convergence.
Updated: 2024-06-19 08:55:40
Categories: cs.LG,cs.AI,cs.DC
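A minimal sketch of the allocation step, assuming fragment training costs and client capacities are known: greedily hand the most expensive unassigned fragment to the client with the lowest relative load. This is an illustrative stand-in for Fed-RAA's online greedy allocator (the partial training and asynchronous upload steps are omitted), not the paper's exact algorithm.

    import heapq

    def greedy_fragment_allocation(fragment_costs, client_capacities):
        """Illustrative greedy allocation of model fragments to clients.

        Repeatedly hands the most expensive unassigned fragment to the client
        whose relative load (assigned cost / capacity) is currently lowest.
        """
        # Min-heap keyed by relative load; entries are (load, client_id).
        heap = [(0.0, cid) for cid in range(len(client_capacities))]
        heapq.heapify(heap)
        assignment = {cid: [] for cid in range(len(client_capacities))}
        # Largest fragments first so stragglers are avoided where possible.
        for frag_id, cost in sorted(enumerate(fragment_costs), key=lambda x: -x[1]):
            load, cid = heapq.heappop(heap)
            assignment[cid].append(frag_id)
            heapq.heappush(heap, (load + cost / client_capacities[cid], cid))
        return assignment

    # Example: 6 fragments of varying cost, 3 clients with unequal resources.
    print(greedy_fragment_allocation([4, 1, 3, 2, 5, 2], [10, 5, 2]))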
Textual Unlearning Gives a False Sense of Unlearning
Language models (LMs) are susceptible to "memorizing" training data, including a large amount of private or copyright-protected content. To safeguard the right to be forgotten (RTBF), machine unlearning has emerged as a promising method for LMs to efficiently "forget" sensitive training content and mitigate knowledge leakage risks. However, despite its good intentions, could the unlearning mechanism be counterproductive? In this paper, we propose the Textual Unlearning Leakage Attack (TULA), where an adversary can infer information about the unlearned data only by accessing the models before and after unlearning. Furthermore, we present variants of TULA in both black-box and white-box scenarios. Through various experimental results, we critically demonstrate that machine unlearning amplifies the risk of knowledge leakage from LMs. Specifically, TULA can increase an adversary's ability to infer membership information about the unlearned data by more than 20% in black-box scenario. Moreover, TULA can even reconstruct the unlearned data directly with more than 60% accuracy with white-box access. Our work is the first to reveal that machine unlearning in LMs can inversely create greater knowledge risks and inspire the development of more secure unlearning mechanisms.
Updated: 2024-06-19 08:51:54
Categories: cs.CR,cs.AI,cs.CL,cs.LG
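The black-box intuition can be sketched in a few lines: samples whose loss rises sharply between the pre-unlearning and post-unlearning models are likely members of the forget set. The score and thresholding rule below are a simplified assumption for illustration, not TULA's actual attack statistic.

    import numpy as np

    def tula_style_membership_scores(losses_before, losses_after):
        """Illustrative membership score in the spirit of TULA's black-box setting.

        A large loss increase after unlearning suggests the model was explicitly
        made to forget that sample, which itself leaks membership information.
        """
        return np.asarray(losses_after) - np.asarray(losses_before)

    def infer_membership(losses_before, losses_after, threshold=1.0):
        scores = tula_style_membership_scores(losses_before, losses_after)
        return scores > threshold  # True -> predicted member of the unlearned set

    # Toy example: the first two samples show a sharp loss jump after unlearning.
    before = [0.3, 0.5, 2.1, 1.9]
    after = [2.8, 2.4, 2.2, 1.8]
    print(infer_membership(before, after))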
Transferring climate change knowledge
Accurate and precise climate projections are required for climate adaptation and mitigation, but Earth system models still exhibit great uncertainties. Several approaches have been developed to reduce the spread of climate projections and feedbacks, yet those methods cannot capture the non-linear complexity inherent in the climate system. Using a Transfer Learning approach, we show that Machine Learning can be used to optimally leverage and merge the knowledge gained from Earth system models simulations and historical observations to more accurately project global surface air temperature fields in the 21st century. We reach an uncertainty reduction of more than 50% with respect to state-of-the-art approaches. We give evidence that our novel method provides narrower projection uncertainty together with more accurate mean climate projections, urgently required for climate adaptation.
Updated: 2024-06-19 08:50:50
Categories: physics.ao-ph,cs.AI,cs.LG
ZeroDL: Zero-shot Distribution Learning for Text Clustering via Large Language Models
The recent advancements in large language models (LLMs) have brought significant progress in solving NLP tasks. Notably, in-context learning (ICL) is the key enabling mechanism for LLMs to understand specific tasks and grasp nuances. In this paper, we propose a simple yet effective method to contextualize a task toward a specific LLM by (1) observing how a given LLM describes (all or a part of) target datasets, i.e., open-ended zero-shot inference, (2) aggregating the open-ended inference results by the LLM, and (3) finally incorporating the aggregated meta-information into the actual task. We show the effectiveness of this approach in text clustering tasks, and also highlight the importance of the contextualization through examples of the above procedure.
Updated: 2024-06-19 08:48:05
Categories: cs.CL,cs.AI
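A hedged sketch of the three-step recipe, where query_llm is a hypothetical callable wrapping an LLM API and the prompts are paraphrases rather than the paper's:

    from collections import Counter

    def zerodl_contextualize(texts, query_llm, sample_size=50):
        """Illustrative ZeroDL-style pipeline.

        1) Open-ended zero-shot inference: ask the LLM to describe each sample.
        2) Aggregate the open-ended outputs into dataset-level meta-information.
        3) Return a context string to prepend to the actual clustering prompt.
        """
        descriptions = [
            query_llm(f"In a few words, what is this text about?\n\n{t}")
            for t in texts[:sample_size]
        ]
        label_counts = Counter(d.strip().lower() for d in descriptions)
        common = [label for label, _ in label_counts.most_common(10)]
        return (
            "This dataset appears to cover the following themes: "
            + ", ".join(common)
            + ". Use these themes when assigning each text to a cluster."
        )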
Medical Spoken Named Entity Recognition
Spoken Named Entity Recognition (NER) aims to extract named entities from speech and categorize them into types such as person, location, organization, etc. In this work, we present VietMed-NER - the first spoken NER dataset in the medical domain. To the best of our knowledge, our real-world dataset is the largest spoken NER dataset in the world in terms of the number of entity types, featuring 18 distinct types. Secondly, we present baseline results using various state-of-the-art pre-trained models: encoder-only and sequence-to-sequence. We found that the pre-trained multilingual model XLM-R outperformed all monolingual models on both reference text and ASR output. Also, in general, encoders perform better than sequence-to-sequence models on the NER task. By simply translating the transcripts, the dataset is applicable not just to Vietnamese but to other languages as well. All code, data and models are made publicly available here: https://github.com/leduckhai/MultiMed
Updated: 2024-06-19 08:39:09
Categories: eess.AS,cs.CL,cs.LG,cs.SD
Understanding Understanding: A Pragmatic Framework Motivated by Large Language Models
Motivated by the rapid ascent of Large Language Models (LLMs) and debates about the extent to which they possess human-level qualities, we propose a framework for testing whether any agent (be it a machine or a human) understands a subject matter. In Turing-test fashion, the framework is based solely on the agent's performance, and specifically on how well it answers questions. Elements of the framework include circumscribing the set of questions (the "scope of understanding"), requiring general competence ("passing grade"), avoiding "ridiculous answers", but still allowing wrong and "I don't know" answers to some questions. Reaching certainty about these conditions requires exhaustive testing of the questions which is impossible for nontrivial scopes, but we show how high confidence can be achieved via random sampling and the application of probabilistic confidence bounds. We also show that accompanying answers with explanations can improve the sample complexity required to achieve acceptable bounds, because an explanation of an answer implies the ability to answer many similar questions. According to our framework, current LLMs cannot be said to understand nontrivial domains, but as the framework provides a practical recipe for testing understanding, it thus also constitutes a tool for building AI agents that do understand.
Updated: 2024-06-19 08:34:21
Categories: cs.AI,cs.CL,cs.LG
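The abstract's combination of random sampling with probabilistic confidence bounds can be made concrete with, for instance, a one-sided Hoeffding bound: after grading the agent on $n$ questions drawn uniformly from the scope of understanding, with probability at least $1-\delta$ the true pass rate is at least $\hat{p} - \sqrt{\ln(1/\delta)/(2n)}$. The choice of Hoeffding's inequality here is ours for illustration; the paper does not commit to one specific bound.

    import math

    def lower_confidence_bound(num_correct, num_sampled, delta=0.05):
        """One-sided Hoeffding lower bound on the agent's true pass rate,
        estimated from uniformly sampled questions within the scope."""
        p_hat = num_correct / num_sampled
        return p_hat - math.sqrt(math.log(1.0 / delta) / (2.0 * num_sampled))

    # If the agent answers 930 of 1000 sampled questions acceptably, we can
    # assert with 95% confidence that its true pass rate is at least ~0.89.
    print(lower_confidence_bound(930, 1000))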
On rough mereology and VC-dimension in treatment of decision prediction for open world decision systems
Given raw knowledge in the form of a data table/decision system, one faces two possible venues. One is to treat the system as closed, i.e., its universe does not admit new objects; the other, to the contrary, is that its universe is open to the admittance of new objects. In particular, one may obtain new objects whose sets of feature values are new to the system. In this case the problem is to assign a decision value to any such new object. This problem is somehow resolved in rough set theory, e.g., on the basis of similarity of the value set of a new object to value sets of objects already assigned a decision value. It is crucial for online learning, when each new object must have a predicted decision value. There is a vast literature on various methods for decision prediction for new, yet unseen objects. The approach we propose is founded in the theory of rough mereology, and it requires a theory of sets/concepts; we root our theory in the classical set theory of Syllogistic, within which we recall the theory of parts known as Mereology. Then, we recall our theory of Rough Mereology along with the theory of weight assignment to the Tarski algebra of Mereology. This allows us to introduce the notion of a part to a degree. Once we have defined the basics of Mereology and rough Mereology, we recall our theory of weight assignment to elements of the Boolean algebra within Mereology; this allows us to define the relation of parts to a degree, and we apply this notion in a procedure to select a decision for new, yet unseen objects. In selecting a plausible candidate that would pass its decision value to the new object, we employ the notion of Vapnik-Chervonenkis dimension in order to select, at the first stage, the candidate with the largest VC-dimension of the family of its $\varepsilon$-components for some choice of $\varepsilon$.
Updated: 2024-06-19 08:22:51
Categories: cs.LG
On Creativity and Open-Endedness
Artificial Life (ALife) as an interdisciplinary field draws inspiration and influence from a variety of perspectives. Scientific progress crucially depends, then, on concerted efforts to invite cross-disciplinary dialogue. The goal of this paper is to revitalize discussions of potential connections between the fields of Computational Creativity (CC) and ALife, focusing specifically on the concept of Open-Endedness (OE); the primary goal of CC is to endow artificial systems with creativity, and ALife has dedicated much research effort into studying and synthesizing OE and artificial innovation. However, despite the close proximity of these concepts, their use so far remains confined to their respective communities, and their relationship is largely unclear. We provide historical context for research in both domains, and review the limited work connecting research on creativity and OE explicitly. We then highlight specific questions to be considered, with the eventual goals of (i) decreasing conceptual ambiguity by highlighting similarities and differences between the concepts of OE and creativity, (ii) identifying synergy effects of a research agenda that encompasses both concepts, and (iii) establishing a dialogue between ALife and CC research.
Updated: 2024-06-19 08:14:33
Categories: cs.AI
Deep Learning-Based 3D Instance and Semantic Segmentation: A Review
The process of segmenting point cloud data into several homogeneous areas with points in the same region having the same attributes is known as 3D segmentation. Segmentation is challenging with point cloud data due to substantial redundancy, fluctuating sample density and lack of apparent organization. The research area has a wide range of robotics applications, including intelligent vehicles, autonomous mapping and navigation. A number of researchers have introduced various methodologies and algorithms. Deep learning has been successfully applied to a spectrum of 2D vision domains as a prevailing AI method. However, due to the specific problems of processing point clouds with deep neural networks, deep learning on point clouds is still in its initial stages. This study examines many strategies that have been presented for 3D instance and semantic segmentation and gives a complete assessment of current developments in deep learning-based 3D segmentation. The benefits, drawbacks, and design mechanisms of these approaches are studied and addressed. This study evaluates the competitiveness of various segmentation algorithms on various publicly accessible datasets, as well as the most often used pipelines, their advantages and limits, insightful findings, and intriguing future research directions.
Updated: 2024-06-19 07:56:14
Categories: cs.CV,cs.AI
Multi-intention Inverse Q-learning for Interpretable Behavior Representation
In advancing the understanding of natural decision-making processes, inverse reinforcement learning (IRL) methods have proven instrumental in reconstructing animal's intentions underlying complex behaviors. Given the recent development of a continuous-time multi-intention IRL framework, there has been persistent inquiry into inferring discrete time-varying rewards with IRL. To address this challenge, we introduce the class of hierarchical inverse Q-learning (HIQL) algorithms. Through an unsupervised learning process, HIQL divides expert trajectories into multiple intention segments, and solves the IRL problem independently for each. Applying HIQL to simulated experiments and several real animal behavior datasets, our approach outperforms current benchmarks in behavior prediction and produces interpretable reward functions. Our results suggest that the intention transition dynamics underlying complex decision-making behavior is better modeled by a step function instead of a smoothly varying function. This advancement holds promise for neuroscience and cognitive science, contributing to a deeper understanding of decision-making and uncovering underlying brain mechanisms.
Updated: 2024-06-19 07:55:34
Categories: cs.LG,q-bio.NC
Provable Guarantees for Model Performance via Mechanistic Interpretability
In this work, we propose using mechanistic interpretability -- techniques for reverse engineering model weights into human-interpretable algorithms -- to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-$K$ task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless noise as a key challenge for using mechanistic interpretability to generate compact proofs on model performance.
Updated: 2024-06-19 07:52:58
Categories: cs.LG,cs.LO
Multimodal MRI-based Detection of Amyloid Status in Alzheimer's Disease Continuum
Amyloid-$\beta$ (A$\beta$) plaques in conjunction with hyperphosphorylated tau proteins in the form of neurofibrillary tangles are the two neuropathological hallmarks of Alzheimer's disease (AD). In particular, the accumulation of A$\beta$ plaques, as evinced by the A/T/N (amyloid/tau/neurodegeneration) framework, marks the initial stage. Thus, the identification of individuals with A$\beta$ positivity could enable early diagnosis and potentially lead to more effective interventions. Deep learning methods relying mainly on amyloid PET images have been employed to this end. However, PET imaging has some disadvantages, including the need of radiotracers and expensive acquisitions. Hence, in this work, we propose a novel multimodal approach that integrates information from structural, functional, and diffusion MRI data to discriminate A$\beta$ status in the AD continuum. Our method achieved an accuracy of $0.762\pm0.04$. Furthermore, a \textit{post-hoc} explainability analysis (guided backpropagation) was performed to retrieve the brain regions that most influenced the model predictions. This analysis identified some key regions that were common across modalities, some of which were well-established AD-discriminative biomarkers and related to A$\beta$ deposition, such as the hippocampus, thalamus, precuneus, and cingulate gyrus. Hence, our study demonstrates the potential viability of MRI-based characterization of A$\beta$ status, paving the way for further research in this domain.
Updated: 2024-06-19 07:51:21
Categories: cs.AI
Integration of Policy and Reputation based Trust Mechanisms in e-Commerce Industry
E-commerce systems are being approached from both commerce behavior and internet technologies. Therefore, the trust aspect of buyer-seller transactions is a potential element which needs to be addressed in the competitive e-commerce industry. The e-commerce industry currently handles two different trust approaches. The first approach consists of a centralized mechanism where digital credentials/sets of rules are assembled, called policy-based trust mechanisms. The second approach consists of decentralized trust mechanisms where reputation points are assembled and shared, called reputation-based trust mechanisms. The difference between reputation- and policy-based trust mechanisms is analyzed, and recommendations are proposed to increase trust between buyer and seller in the e-commerce industry. The integration of the trust mechanisms is proposed through a mapping process, pairing the strength of one mechanism with the weakness of the other. The proposed model for the integrated mechanism is presented, and it is illustrated how the proposed model would be used in the real-world e-commerce industry.
Updated: 2024-06-19 07:47:48
Categories: cs.AI,cs.CY,cs.MM,cs.SI
Blockchain Bribing Attacks and the Efficacy of Counterincentives
We analyze bribing attacks in Proof-of-Stake distributed ledgers from a game theoretic perspective. In bribing attacks, an adversary offers participants a reward in exchange for instructing them how to behave, with the goal of attacking the protocol's properties. Specifically, our work focuses on adversaries that target blockchain safety. We consider two types of bribing, depending on how the bribes are awarded: i) guided bribing, where the bribe is given as long as the bribed party behaves as instructed; ii) effective bribing, where bribes are conditional on the attack's success, w.r.t. well-defined metrics. We analyze each type of attack in a game theoretic setting and identify relevant equilibria. In guided bribing, we show that the protocol is not an equilibrium and then describe good equilibria, where the attack is unsuccessful, and a negative one, where all parties are bribed such that the attack succeeds. In effective bribing, we show that both the protocol and the "all bribed" setting are equilibria. Using the identified equilibria, we then compute bounds on the Prices of Stability and Anarchy. Our results indicate that additional mitigations are needed for guided bribing, so our analysis concludes with incentive-based mitigation techniques, namely slashing and dilution. Here, we present two positive results, that both render the protocol an equilibrium and achieve maximal welfare for all parties, and a negative result, wherein an attack becomes more plausible if it severely affects the ledger's token's market price.
Updated: 2024-06-19 07:45:38
Categories: cs.GT,cs.CR
Physics-informed machine learning as a kernel method
Physics-informed machine learning combines the expressiveness of data-based approaches with the interpretability of physical models. In this context, we consider a general regression problem where the empirical risk is regularized by a partial differential equation that quantifies the physical inconsistency. We prove that for linear differential priors, the problem can be formulated as a kernel regression task. Taking advantage of kernel theory, we derive convergence rates for the minimizer of the regularized risk and show that it converges at least at the Sobolev minimax rate. However, faster rates can be achieved, depending on the physical error. This principle is illustrated with a one-dimensional example, supporting the claim that regularizing the empirical risk with physical information can be beneficial to the statistical performance of estimators.
Updated: 2024-06-19 07:40:56
Categories: cs.AI,math.ST,stat.TH
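As a concrete reading of the abstract (the notation below is ours, not necessarily the paper's), the physics-regularized estimator can be written as

$$\hat{f} \in \arg\min_{f}\; \frac{1}{n}\sum_{i=1}^{n}\big(f(X_i)-Y_i\big)^2 \;+\; \lambda\,\big\|\mathcal{D}(f)\big\|_{L^2}^2,$$

where $\mathcal{D}$ is a linear differential operator encoding the physical prior (a perfectly consistent $f$ satisfies $\mathcal{D}(f)=0$) and $\lambda>0$ trades data fit against physical consistency. The kernel view follows because, for linear $\mathcal{D}$, the penalty is a quadratic form in $f$, so the regularized minimizer solves a kernel ridge-type problem.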
LightGBM robust optimization algorithm based on topological data analysis
To enhance the robustness of the Light Gradient Boosting Machine (LightGBM) algorithm for image classification, a topological data analysis (TDA)-based robustness optimization algorithm for LightGBM, TDA-LightGBM, is proposed to address the interference of noise on image classification. Initially, the method partitions the feature engineering process into two streams: pixel feature stream and topological feature stream for feature extraction respectively. Subsequently, these pixel and topological features are amalgamated into a comprehensive feature vector, serving as the input for LightGBM in image classification tasks. This fusion of features not only encompasses traditional feature engineering methodologies but also harnesses topological structure information to more accurately encapsulate the intrinsic features of the image. The objective is to surmount challenges related to unstable feature extraction and diminished classification accuracy induced by data noise in conventional image processing. Experimental findings substantiate that TDA-LightGBM achieves a 3% accuracy improvement over LightGBM on the SOCOFing dataset across five classification tasks under noisy conditions. In noise-free scenarios, TDA-LightGBM exhibits a 0.5% accuracy enhancement over LightGBM on two classification tasks, achieving a remarkable accuracy of 99.8%. Furthermore, the method elevates the classification accuracy of the Ultrasound Breast Images for Breast Cancer dataset and the Masked CASIA WebFace dataset by 6% and 15%, respectively, surpassing LightGBM in the presence of noise. These empirical results underscore the efficacy of the TDA-LightGBM approach in fortifying the robustness of LightGBM by integrating topological features, thereby augmenting the performance of image classification tasks amidst data perturbations.
Updated: 2024-06-19 07:40:37
Categories: cs.LG
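A minimal sketch of the two-stream design: extract a pixel stream and a topological stream per image, concatenate them, and train LightGBM on the fused vectors. The extract_topological_features function below is a crude placeholder (the abstract does not specify the TDA extractor), and the hyperparameters are arbitrary.

    import numpy as np
    import lightgbm as lgb

    def extract_topological_features(images):
        """Placeholder for the TDA feature stream (e.g., statistics of
        persistence diagrams computed per image). Here we use crude intensity
        statistics as stand-ins, since the paper's extractor is not specified."""
        flat = images.reshape(len(images), -1)
        return np.stack([flat.mean(1), flat.std(1), flat.max(1), flat.min(1)], axis=1)

    def fit_tda_lightgbm(images, labels):
        pixel_stream = images.reshape(len(images), -1)          # pixel feature stream
        topo_stream = extract_topological_features(images)      # topological stream
        fused = np.concatenate([pixel_stream, topo_stream], 1)  # comprehensive vector
        clf = lgb.LGBMClassifier(n_estimators=200)
        clf.fit(fused, labels)
        return clf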
Empirical Evaluation of Integrated Trust Mechanism to Improve Trust in E-commerce Services
There are broadly two approaches to tackling trust management worldwide: strong and crisp, and soft and social. We analyze the impact of an integrated trust mechanism in three different e-commerce services. The trust aspect is a dormant element between potential users and expert or internet systems under development. We support our integration by presiding over an experiment in a controlled laboratory environment. The model selected for the experiment is a composite of policy- and reputation-based trust mechanisms and is widely acknowledged in the e-commerce industry. The integration between the policy and reputation mechanisms was accomplished through a mapping process, with the weakness of one brought to a close by the strength of the other. Furthermore, the experiment was supervised to validate the effectiveness of the implementation by segregating both the integrated and traditional trust mechanisms in the learning system.
Updated: 2024-06-19 07:38:51
Categories: cs.SI,cs.AI,cs.CY,cs.PF
A Bandit Approach with Evolutionary Operators for Model Selection
This work formulates model selection as an infinite-armed bandit problem, namely, a problem in which a decision maker iteratively selects one of an infinite number of fixed choices (i.e., arms) when the properties of each choice are only partially known at the time of allocation and may become better understood over time via the attainment of rewards. Here, the arms are machine learning models to train, and selecting an arm corresponds to a partial training of the model (resource allocation). The reward is the accuracy of the selected model after its partial training. We aim to identify the best model at the end of a finite number of resource allocations and thus consider the best arm identification setup. We propose the algorithm Mutant-UCB, which incorporates operators from evolutionary algorithms into the UCB-E (Upper Confidence Bound Exploration) bandit algorithm introduced by Audibert et al. Tests carried out on three open source image classification data sets attest to the relevance of this novel combining approach, which outperforms the state-of-the-art for a fixed budget.
Updated: 2024-06-19 07:38:05
Categories: cs.NE,cs.AI,cs.LG,math.OC
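An illustrative rendering of the loop described above: a UCB-E style index allocates partial-training rounds across model configurations, and the incumbent best configuration is periodically mutated into a new arm. The exploration constant, mutation rule, and toy reward function are our assumptions, not the paper's exact algorithm.

    import math
    import random

    def mutant_ucb(arms, partial_train, budget, a=2.0, mutate=None, mutate_every=20):
        """Illustrative Mutant-UCB loop over models ('arms').

        `arms` holds model configurations; `partial_train(config)` performs one
        round of partial training and returns an observed reward (e.g., a
        validation accuracy). Occasionally the leader is mutated into a new arm.
        """
        stats = [{"config": c, "pulls": 0, "mean": 0.0} for c in arms]
        for t in range(1, budget + 1):
            # UCB-E index: empirical mean + exploration bonus sqrt(a / pulls).
            def index(s):
                return float("inf") if s["pulls"] == 0 else s["mean"] + math.sqrt(a / s["pulls"])
            best = max(stats, key=index)
            reward = partial_train(best["config"])
            best["pulls"] += 1
            best["mean"] += (reward - best["mean"]) / best["pulls"]
            if mutate is not None and t % mutate_every == 0:
                leader = max(stats, key=lambda s: s["mean"])
                stats.append({"config": mutate(leader["config"]), "pulls": 0, "mean": 0.0})
        return max(stats, key=lambda s: s["mean"])["config"]

    # Toy usage: arms are learning rates; the reward is a noisy accuracy proxy.
    best_lr = mutant_ucb(
        arms=[0.1, 0.01, 0.001],
        partial_train=lambda lr: 1.0 - abs(math.log10(lr) + 2) * 0.1 + random.gauss(0, 0.02),
        budget=200,
        mutate=lambda lr: lr * random.choice([0.5, 2.0]),
    )
    print(best_lr)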
IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning
Deep graph learning has gained great popularity over the past years due to its versatility and success in representing graph data across a wide range of domains. However, the pervasive issue of imbalanced graph data distributions, where certain parts exhibit disproportionally abundant data while others remain sparse, undermines the efficacy of conventional graph learning algorithms, leading to biased outcomes. To address this challenge, Imbalanced Graph Learning (IGL) has garnered substantial attention, enabling more balanced data distributions and better task performance. Despite the proliferation of IGL algorithms, the absence of consistent experimental protocols and fair performance comparisons pose a significant barrier to comprehending advancements in this field. To bridge this gap, we introduce IGL-Bench, a foundational comprehensive benchmark for imbalanced graph learning, covering 16 diverse graph datasets and 24 distinct IGL algorithms with uniform data processing and splitting strategies. Specifically, IGL-Bench systematically investigates state-of-the-art IGL algorithms in terms of effectiveness, robustness, and efficiency on node-level and graph-level tasks, within the scope of class imbalance and topology imbalance. Extensive experiments demonstrate the potential benefits of IGL algorithms on various imbalanced conditions, offering insights and opportunities in the IGL field. Further, we have developed an open-sourced and unified package to facilitate reproducible evaluation and inspire further innovative research, which is available at https://github.com/RingBDStack/IGL-Bench.
Updated: 2024-06-19 07:34:40
Categories: cs.LG,cs.AI
Media Forensics and Deepfake Systematic Survey
Deepfake is a generative deep learning algorithm that creates or changes facial features in a very realistic way, making it hard to differentiate the real features from the fake. It can be used to make movies look better as well as to spread false information by imitating famous people. In this paper, many different ways to make a Deepfake are explained, analyzed, and separated categorically. Using Deepfake datasets, models are trained and tested for reliability through experiments. Deepfakes are a type of facial manipulation that allow people to change their entire faces, identities, attributes, and expressions. The trends in the available Deepfake datasets are also discussed, with a focus on how they have changed. Using deep learning, a general Deepfake detection model is made. Moreover, the problems in making and detecting Deepfakes are also mentioned. As a result of this survey, it is expected that the development of new Deepfake-based imaging tools will speed up in the future. This survey gives an in-depth review of methods for manipulating images of faces and various techniques to spot altered face images. Four types of facial manipulation are specifically discussed: attribute manipulation, expression swap, entire face synthesis, and identity swap. Across every manipulation category, we yield information on manipulation techniques, significant benchmarks for technical evaluation of counterfeit detection techniques, available public databases, and a summary of the outcomes of all such analyses. From all of the topics in the survey, we focus on the most recent development of Deepfake, showing its advances and obstacles in detecting fake images.
Updated: 2024-06-19 07:33:33
Categories: cs.CV,cs.AI,cs.MM
Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens
Vision-language models (VLMs) seamlessly integrate visual and textual data to perform tasks such as image classification, caption generation, and visual question answering. However, adversarial images often struggle to deceive all prompts effectively in the context of cross-prompt migration attacks, as the probability distribution of the tokens in these images tends to favor the semantics of the original image rather than the target tokens. To address this challenge, we propose a Contextual-Injection Attack (CIA) that employs gradient-based perturbation to inject target tokens into both visual and textual contexts, thereby improving the probability distribution of the target tokens. By shifting the contextual semantics towards the target tokens instead of the original image semantics, CIA enhances the cross-prompt transferability of adversarial images. Extensive experiments on the BLIP2, InstructBLIP, and LLaVA models show that CIA outperforms existing methods in cross-prompt transferability, demonstrating its potential for more effective adversarial strategies in VLMs.
Updated: 2024-06-19 07:32:55
Categories: cs.MM,cs.LG
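The gradient-based perturbation can be sketched as a PGD-style loop that descends on the loss of the target tokens; vlm_target_loss is a hypothetical differentiable callable (e.g., the target tokens' negative log-likelihood averaged over several prompts/contexts), and the step size and budget follow common adversarial-attack conventions rather than the paper's exact objective.

    import torch

    def contextual_injection_attack(image, vlm_target_loss, steps=100,
                                    alpha=1 / 255, eps=8 / 255):
        """Illustrative gradient-based perturbation in the spirit of CIA.

        Lowering the target-token loss shifts the image's contextual semantics
        toward the target tokens instead of the original image semantics.
        """
        delta = torch.zeros_like(image, requires_grad=True)
        for _ in range(steps):
            loss = vlm_target_loss(image + delta)
            loss.backward()
            with torch.no_grad():
                delta -= alpha * delta.grad.sign()                # descend on target loss
                delta.clamp_(-eps, eps)                           # stay in the L-inf ball
                delta.copy_((image + delta).clamp(0, 1) - image)  # keep pixels valid
            delta.grad.zero_()
        return (image + delta).clamp(0, 1).detach()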
An interpretable generative multimodal neuroimaging-genomics framework for decoding Alzheimer's disease
Alzheimer's disease (AD) is the most prevalent form of dementia with a progressive decline in cognitive abilities. The AD continuum encompasses a prodromal stage known as Mild Cognitive Impairment (MCI), where patients may either progress to AD or remain stable. In this study, we leveraged structural and functional MRI to investigate the disease-induced grey matter and functional network connectivity changes. Moreover, considering AD's strong genetic component, we introduce SNPs as a third channel. Given such diverse inputs, missing one or more modalities is a typical concern of multimodal methods. We hence propose a novel deep learning-based classification framework in which a generative module employing Cycle GANs is adopted to impute missing data within the latent space. Additionally, we adopted an Explainable AI method, Integrated Gradients, to extract input feature relevance, enhancing our understanding of the learned representations. Two critical tasks were addressed: AD detection and MCI conversion prediction. Experimental results showed that our model was able to reach the state of the art in the classification of CN/AD, reaching an average test accuracy of $0.926\pm0.02$. For the MCI task, we achieved an average prediction accuracy of $0.711\pm0.01$ using the pre-trained model for CN/AD. The interpretability analysis revealed significant grey matter modulations in cortical and subcortical brain areas well known for their association with AD. Moreover, impairments in sensory-motor and visual resting state network connectivity along the disease continuum, as well as mutations in SNPs defining biological processes linked to amyloid-beta and cholesterol formation clearance and regulation, were identified as contributors to the achieved performance. Overall, our integrative deep learning approach shows promise for AD detection and MCI prediction, while shedding light on important biological insights.
Updated: 2024-06-19 07:31:47
领域: q-bio.QM,cs.AI,eess.IV
Large-Scale Dataset Pruning in Adversarial Training through Data Importance Extrapolation
The vulnerability of deep learning models to small, imperceptible attacks limits their adoption in real-world systems. Adversarial training has proven to be one of the most promising strategies against these attacks, at the expense of a substantial increase in training time. With the ongoing trend of integrating large-scale synthetic data, this is only expected to increase further. Thus, the need arises for data-centric approaches that reduce the number of training samples while maintaining accuracy and robustness. While data pruning and active learning are prominent research topics in deep learning, they remain largely unexplored in the adversarial training literature. We address this gap and propose a new data pruning strategy based on extrapolating data importance scores from a small set of data to a larger set. In an empirical evaluation, we demonstrate that extrapolation-based pruning can efficiently reduce dataset size while maintaining robustness.
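A minimal sketch of the extrapolation idea: expensive importance scores are computed on a small subset only, a regressor extrapolates them over the full set via feature space, and the lowest-scoring fraction is pruned. The k-NN regressor, the random stand-in scores, and the 30% prune ratio are illustrative assumptions, not the paper's actual design.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
features = rng.normal(size=(10_000, 32))   # embeddings of the full training set
small_idx = rng.choice(10_000, size=500, replace=False)
# stand-in for expensive importance scores, computed on the small subset only
small_scores = rng.random(500)

reg = KNeighborsRegressor(n_neighbors=10).fit(features[small_idx], small_scores)
all_scores = reg.predict(features)         # extrapolated importance for every sample

keep = np.argsort(all_scores)[int(0.3 * len(all_scores)):]   # prune the lowest 30%
print(f"kept {len(keep)} of {len(all_scores)} samples")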
Updated: 2024-06-19 07:23:51
标题: 通过数据重要性外推在对抗训练中进行大规模数据集修剪
摘要: 它们对微小、难以察觉的攻击的脆弱性限制了深度学习模型在实际系统中的应用。对抗训练已被证明是针对这些攻击最有前途的策略之一,但以大幅增加训练时间为代价。随着不断集成大规模合成数据的趋势,这种情况只会进一步加剧。因此,需要一种数据为中心的方法,能够在保持准确性和稳健性的同时减少训练样本数量。虽然数据修剪和主动学习是深度学习中突出的研究课题,但在对抗训练文献中至今仍未得到广泛探讨。我们填补了这一空白,并提出了一种基于从少量数据到大量数据推断数据重要性分数的新数据修剪策略。在实证评估中,我们证明了基于推断的修剪可以有效减少数据集大小同时保持稳健性。
更新时间: 2024-06-19 07:23:51
领域: cs.LG
Right on Time: Revising Time Series Models by Constraining their Explanations
The reliability of deep time series models is often compromised by their tendency to rely on confounding factors, which may lead to incorrect outputs. Our newly recorded, naturally confounded dataset named P2S from a real mechanical production line emphasizes this. To avoid "Clever-Hans" moments in time series, i.e., to mitigate confounders, we introduce the method Right on Time (RioT). RioT enables, for the first time, interactions with model explanations across both the time and frequency domain. Feedback on explanations in both domains is then used to constrain the model, steering it away from the annotated confounding factors. The dual-domain interaction strategy is crucial for effectively addressing confounders in time series datasets. We empirically demonstrate that RioT can effectively guide models away from the wrong reasons in P2S as well as popular time series classification and forecasting datasets.
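As a rough illustration of dual-domain feedback, the sketch below penalizes an input-gradient saliency inside annotated confounder masks in both the time domain and, via an FFT, the frequency domain. The masks, weights, and toy classifier are assumptions for illustration; RioT's published loss may differ in detail.

import torch

def right_reason_loss(model, x, y, time_mask, freq_mask, lam_t=1.0, lam_f=1.0):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = torch.nn.functional.cross_entropy(logits, y)
    # saliency: gradient of the task loss w.r.t. the input series
    grad, = torch.autograd.grad(task_loss, x, create_graph=True)
    time_pen = (time_mask * grad).pow(2).sum()                      # time-domain feedback
    freq_pen = (freq_mask * torch.fft.rfft(grad, dim=-1).abs()).pow(2).sum()
    return task_loss + lam_t * time_pen + lam_f * freq_pen

# toy usage: a linear classifier on series of length 128
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(128, 2))
x, y = torch.randn(8, 128), torch.randint(0, 2, (8,))
time_mask = torch.zeros(128)
time_mask[:16] = 1.0      # annotated confounded time steps
freq_mask = torch.zeros(65)
freq_mask[-8:] = 1.0      # annotated confounded frequency bins (rfft of 128 -> 65)
loss = right_reason_loss(model, x, y, time_mask, freq_mask)
loss.backward()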
Updated: 2024-06-19 07:20:54
标题: 按时完成:通过限制解释来修订时间序列模型
摘要: 深度时间序列模型的可靠性经常受到其依赖混淆因素的影响,这可能导致错误的输出。我们新录制的、自然混淆的数据集P2S来自一个真实的机械生产线,强调了这一点。为了避免在时间序列中出现“聪明汉斯”时刻,即减轻混淆因素,我们引入了名为Right on Time(RioT)的方法。RioT首次实现了在时间和频率域中与模型解释的交互。然后,对两个领域的解释的反馈被用来约束模型,使其远离注释的混淆因素。双域交互策略对于有效地处理时间序列数据集中的混淆因素至关重要。我们经验性地证明,RioT可以有效地引导模型远离P2S数据集中的错误原因,以及流行的时间序列分类和预测数据集。
更新时间: 2024-06-19 07:20:54
领域: cs.LG
Can GPT-4 Replicate Empirical Software Engineering Research?
Empirical software engineering research on production systems has brought forth a better understanding of the software engineering process for practitioners and researchers alike. However, only a small subset of production systems is studied, limiting the impact of this research. While software engineering practitioners could benefit from replicating research on their own data, this poses its own set of challenges, since performing replications requires a deep understanding of research methodologies and subtle nuances in software engineering data. Given that large language models (LLMs), such as GPT-4, show promise in tackling both software engineering- and science-related tasks, these models could help replicate and thus democratize empirical software engineering research. In this paper, we examine GPT-4's abilities to perform replications of empirical software engineering research on new data. We study its ability to surface assumptions made in empirical software engineering research methodologies, as well as its ability to plan and generate code for analysis pipelines on seven empirical software engineering papers. We perform a user study with 14 participants with software engineering research expertise, who evaluate GPT-4-generated assumptions and analysis plans (i.e., a list of module specifications) from the papers. We find that GPT-4 is able to surface correct assumptions, but struggles to generate ones that apply common knowledge about software engineering data. In a manual analysis of the generated code, we find that the GPT-4-generated code contains correct high-level logic, given a subset of the methodology. However, the code contains many small implementation-level errors, reflecting a lack of software engineering knowledge. Our findings have implications for leveraging LLMs for software engineering research as well as for practitioner data scientists in software teams.
Updated: 2024-06-19 07:17:28
标题: 《GPT-4能复制实证软件工程研究吗?》
摘要: 生产系统上的经验软件工程研究为从业者和研究人员提供了对软件工程过程的更好理解。然而,只有一小部分生产系统被研究,限制了这项研究的影响。虽然软件工程从业者可以从在他们自己的数据上复制研究中受益,但这也带来了一系列挑战,因为进行复制需要对研究方法和软件工程数据中微妙的细节有深入的了解。鉴于大型语言模型(LLM)如GPT-4显示出在处理软件工程和科学相关任务方面的潜力,这些模型可以帮助复制并因此使经验软件工程研究民主化。 在本文中,我们研究了GPT-4在新数据上执行经验软件工程研究复制的能力。我们研究了它们揭示经验软件工程研究方法中所做假设的能力,以及它们计划和生成代码以用于对七篇经验软件工程论文的分析流程。我们进行了一个用户研究,有14名具有软件工程研究专业知识的参与者,他们评估了GPT-4生成的假设和分析计划(即模块规范列表)来自这些论文。我们发现GPT-4能够揭示正确的假设,但难以生成适用于软件工程数据的常识。在对生成的代码进行手动分析时,我们发现GPT-4生成的代码在给定方法论的子集下包含正确的高层逻辑。然而,该代码包含许多小的实现级错误,反映了对软件工程知识的缺乏。我们的研究结果对于利用LLM进行软件工程研究以及软件团队中的从业者数据科学家具有重要意义。
更新时间: 2024-06-19 07:17:28
领域: cs.SE,cs.AI
Design Optimization of NOMA Aided Multi-STAR-RIS for Indoor Environments: A Convex Approximation Imitated Reinforcement Learning Approach
Sixth-generation (6G) networks leverage simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) to overcome the limitations of traditional RISs. STAR-RISs offer 360-degree full-space coverage and optimized transmission and reflection for enhanced network performance and dynamic control of the indoor propagation environment. However, deploying STAR-RISs indoors presents challenges in interference mitigation, power consumption, and real-time configuration. In this work, a novel network architecture utilizing multiple access points (APs) and STAR-RISs is proposed for indoor communication. An optimization problem encompassing user assignment, access point beamforming, and STAR-RIS phase control for reflection and transmission is formulated. The inherent complexity of the formulated problem necessitates a decomposition approach for an efficient solution. This involves tackling different sub-problems with specialized techniques: a many-to-one matching algorithm is employed to assign users to appropriate access points, optimizing resource allocation. To facilitate efficient resource management, access points are grouped using a correlation-based K-means clustering algorithm. Multi-agent deep reinforcement learning (MADRL) is leveraged to optimize the control of the STAR-RIS. Within the proposed MADRL framework, a novel approach is introduced where each decision variable acts as an independent agent, enabling collaborative learning and decision-making. Additionally, the proposed MADRL approach incorporates convex approximation (CA). This technique utilizes suboptimal solutions from successive convex approximation (SCA) to accelerate policy learning for the agents, thereby leading to faster environment adaptation and convergence. Simulations demonstrate significant network utility improvements compared to baseline approaches.
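One sub-problem is easy to make concrete: grouping access points whose channel observations are correlated, using K-means. The synthetic channel traces and K=3 below are illustrative assumptions, not the paper's system model.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_aps, n_samples = 12, 200
channels = rng.normal(size=(n_aps, n_samples))   # per-AP channel observations
corr = np.corrcoef(channels)                     # (n_aps, n_aps) correlation matrix

# cluster APs whose correlation profiles are similar
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(corr)
for k in range(3):
    print(f"group {k}: APs {np.where(labels == k)[0].tolist()}")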
Updated: 2024-06-19 07:17:04
标题: 室内环境下NOMA辅助多STAR-RIS的设计优化:凸逼近模仿强化学习方法
摘要: 第六代(6G)网络利用同时传输和反射可重构智能表面(STAR-RISs)来克服传统RISs的限制。STAR-RISs提供360度全空间覆盖和优化的传输和反射,以增强网络性能和动态控制室内传播环境。然而,在室内部署STAR-RISs存在干扰消除、功耗和实时配置等挑战。本文提出了一种利用多个接入点(APs)和STAR-RISs的新型室内通信网络架构。建立了一个涵盖用户分配、接入点波束成形和STAR-RIS相位控制的优化问题。所制定问题的固有复杂性需要分解方法来实现高效解决。这涉及使用专门技术解决不同的子问题:采用多对一匹配算法将用户分配给适当的接入点,优化资源分配。为了促进高效的资源管理,使用基于相关性的K均值聚类算法对接入点进行分组。利用多智能体深度强化学习(MADRL)来优化对STAR-RIS的控制。在提出的MADRL框架中,引入了一种新颖的方法,其中每个决策变量充当独立代理,实现协作学习和决策。此外,所提出的MADRL方法还结合了凸逼近(CA)。该技术利用连续凸逼近(SCA)的次优解来加速代理的策略学习,从而实现更快的环境适应和收敛。模拟结果显示,与基线方法相比,网络效用显着提高。
更新时间: 2024-06-19 07:17:04
领域: cs.NI,cs.AI
Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets
In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction. Indeed, the system uses this representation to reason over a database and its domain knowledge to choose its next action. The dialogue course thus depends on the information provided by this semantic representation. While textual datasets provide fine-grained semantic representations, spoken dialogue datasets fall behind. This paper provides insights into automatic enhancement of spoken dialogue datasets' semantic representations. Our contributions are threefold: (1) assess the relevance of Large Language Model fine-tuning, (2) evaluate the knowledge captured by the produced annotations and (3) highlight semi-automatic annotation implications.
Updated: 2024-06-19 06:59:57
标题: 调查低成本LLM标注用于口语对话理解数据集
摘要: 在口语任务导向对话(TOD)系统中,描述用户请求的语义表示的选择对于流畅的交互至关重要。实际上,系统利用这个表示来对数据库和领域知识进行推理,从而选择下一步的动作。对话的进程因此取决于这个语义表示提供的信息。虽然文本数据集提供了细粒度的语义表示,但口语对话数据集落后于此。本文提供了关于自动增强口语对话数据集语义表示的见解。我们的贡献有三个方面:(1)评估大型语言模型微调的相关性,(2)评估产生的注释所捕获的知识,(3)强调半自动注释的影响。
更新时间: 2024-06-19 06:59:57
领域: cs.AI,cs.CL,cs.HC,eess.SP
Understanding User Preferences in Explainable Artificial Intelligence: A Survey and a Mapping Function Proposal
The increasing complexity of AI systems has led to the growth of the field of Explainable Artificial Intelligence (XAI), which aims to provide explanations and justifications for the outputs of AI algorithms. While there is considerable demand for XAI, there remains a scarcity of studies aimed at comprehensively understanding the practical distinctions among different methods, effectively aligning each method with users' individual needs, and, ideally, offering a mapping function that maps each user, with their specific needs, to a suitable explainability method. This study endeavors to bridge this gap by conducting a thorough review of extant research in XAI, with a specific focus on Explainable Machine Learning (XML) and a keen eye on user needs. Our main objective is to offer a classification of XAI methods within the realm of XML, categorizing current works into three distinct domains: philosophy, theory, and practice, and providing a critical review for each category. Moreover, our study seeks to facilitate the connection between XAI users and the methods most suitable for them, and to tailor explanations to their specific needs, by proposing a mapping function that takes into account users and their desired properties and suggests an XAI method to them. This entails an examination of prevalent XAI approaches and an evaluation of their properties. The primary outcome of this study is the formulation of a clear and concise strategy for selecting the optimal XAI method to achieve a given goal, all while delivering personalized explanations tailored to individual users.
Updated: 2024-06-19 06:58:30
标题: 理解可解释人工智能中用户偏好:一项调查和映射函数提案
摘要: 人工智能系统日益复杂,导致了可解释人工智能(XAI)领域的发展,该领域旨在为人工智能算法的输出提供解释和理由。虽然对XAI的需求很大,但仍然缺乏旨在全面理解不同方法之间的实际区别并有效地将每种方法与用户个人需求对齐的研究,理想情况下,提供一个能够将每个用户及其特定需求映射到可解释性方法的映射函数。本研究通过对现有XAI研究进行彻底审查,特别关注可解释机器学习(XML),并关注用户需求,致力于弥补这一差距。我们的主要目标是在XML领域内提供XAI方法的分类,将当前研究划分为三个不同领域:哲学、理论和实践,并对每个类别进行批判性评论。此外,我们的研究旨在促进XAI用户与最适合他们的方法之间的联系,并根据他们的特定需求量身定制解释,提出一个考虑用户及其所需属性的映射函数,并向他们建议XAI方法。这涉及对流行的XAI方法的审查和评估其属性。本研究的主要成果是制定一种清晰简明的策略,选择最佳的XAI方法实现给定目标,同时提供量身定制的解释,以满足个人用户的需求。
更新时间: 2024-06-19 06:58:30
领域: cs.AI
Advancing Parsimonious Deep Learning Weather Prediction using the HEALPix Mesh
We present a parsimonious deep learning weather prediction model to forecast seven atmospheric variables with 3-h time resolution for up to one-year lead times on a 110-km global mesh using the Hierarchical Equal Area isoLatitude Pixelization (HEALPix). In comparison to state-of-the-art (SOTA) machine learning (ML) weather forecast models, such as Pangu-Weather and GraphCast, our DLWP-HPX model uses coarser resolution and far fewer prognostic variables. Yet, at one-week lead times, its skill is only about one day behind both SOTA ML forecast models and the SOTA numerical weather prediction model from the European Centre for Medium-Range Weather Forecasts. We report several improvements in model design, including switching from the cubed sphere to the HEALPix mesh, inverting the channel depth of the U-Net, and introducing gated recurrent units (GRU) on each level of the U-Net hierarchy. The consistent east-west orientation of all cells on the HEALPix mesh facilitates the development of location-invariant convolution kernels that successfully propagate weather patterns across the globe without requiring separate kernels for the polar and equatorial faces of the cube sphere. Without any loss of spectral power after the first two days, the model can be unrolled autoregressively for hundreds of steps into the future to generate realistic states of the atmosphere that respect seasonal trends, as showcased in one-year simulations.
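For readers unfamiliar with HEALPix, the sketch below maps a latitude/longitude point onto the mesh with the healpy package (pip install healpy); NSIDE=64 gives roughly 1-degree, about 110 km, equal-area pixels, in the ballpark of the paper's stated mesh, though the model's exact grid setup is an assumption here.

import numpy as np
import healpy as hp

nside = 64
npix = hp.nside2npix(nside)                      # 12 * nside**2 = 49152 pixels
print(f"NSIDE={nside}: {npix} equal-area pixels, "
      f"~{hp.nside2resol(nside, arcmin=True) / 60:.2f} deg across each pixel")

lat, lon = 47.0, 8.0                             # an example location
theta, phi = np.radians(90.0 - lat), np.radians(lon)   # healpy expects colatitude
pix = hp.ang2pix(nside, theta, phi)
print(f"(lat={lat}, lon={lon}) falls in HEALPix pixel {pix}")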
Updated: 2024-06-19 06:54:30
标题: 推进使用HEALPix网格的简约深度学习天气预测
摘要: 我们提出了一种简约的深度学习天气预测模型,用于在一个110公里的全球网格上使用分层等面积等纬度像素化(HEALPix)预测七个大气变量,时间分辨率为3小时,可提前一年。与最先进的机器学习(ML)天气预测模型(如Pangu-Weather和GraphCast)相比,我们的DLWP-HPX模型分辨率更粗,预测变量更少。然而,在一周的提前时间内,其技能仅比SOTA ML预测模型和欧洲中期天气预报中心的SOTA数值天气预测模型落后一天。我们报告了模型设计的几项改进,包括从立方球面网格切换到HEALPix网格,颠倒U-Net的通道深度,并在U-Net层次的每个级别引入门控循环单元(GRU)。HEALPix网格上所有单元的一致东西朝向有助于发展位置不变的卷积核,成功地在全球范围内传播天气模式,而无需为立方球面的极地和赤道面分别使用卷积核。在前两天后没有任何谱能量损失的情况下,该模型可以自回归地展开数百步到未来,生成尊重季节趋势的大气状态,正如一年期模拟所展示的那样。
更新时间: 2024-06-19 06:54:30
领域: physics.ao-ph,cs.LG
Molecule Graph Networks with Many-body Equivariant Interactions
Message passing neural networks have demonstrated significant efficacy in predicting molecular interactions. Introducing equivariant vectorial representations augments expressivity by capturing geometric data symmetries, thereby improving model accuracy. However, two-body bond vectors in opposition may cancel each other out during message passing, leading to the loss of directional information on their shared node. In this study, we develop Equivariant N-body Interaction Networks (ENINet) that explicitly integrates equivariant many-body interactions to preserve directional information in the message passing scheme. Experiments indicate that integrating many-body equivariant representations enhances prediction accuracy across diverse scalar and tensorial quantum chemical properties. Ablation studies show an average performance improvement of 7.9% across 11 out of 12 properties in QM9, 27.9% in forces in MD17, and 11.3% in polarizabilities (CCSD) in QM7b.
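The motivating failure mode is easy to reproduce: the numpy snippet below shows two opposing bond vectors cancelling in a plain sum, while a simple many-body term (a cross product over a neighbor pair) retains a direction. This is a toy demonstration of the problem, not ENINet's actual update rule.

import numpy as np

center = np.zeros(3)
neighbors = np.array([[1.0, 0.0, 0.0],
                      [-1.0, 0.0, 0.0],    # neighbor directly opposite the first
                      [0.0, 1.0, 0.0]])
bonds = neighbors - center                 # two-body bond vectors at the shared node

print(np.sum(bonds[:2], axis=0))           # [0. 0. 0.]: opposing bonds cancel
print(np.cross(bonds[0], bonds[2]))        # [0. 0. 1.]: a 3-body term keeps direction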
Updated: 2024-06-19 06:53:09
标题: 使用分子图网络进行多体等变相互作用
摘要: 消息传递神经网络在预测分子相互作用方面表现出显著的有效性。引入等变矢量表示增强了表达能力,通过捕获几何数据对称性来提高模型准确性。然而,在消息传递过程中,相对的两体键向量可能会相互抵消,导致在它们共享的节点上丢失方向信息。在本研究中,我们开发了等变N体相互作用网络(ENINet),明确地整合了等变多体相互作用,以保留消息传递方案中的方向信息。实验证明,整合多体等变表示增强了对各种标量和张量量子化学性质的预测准确性。消融研究显示,在QM9的12种性质中的11种上平均性能提高了7.9%,在MD17的力预测上提高了27.9%,在QM7b的极化率(CCSD)预测上提高了11.3%。
更新时间: 2024-06-19 06:53:09
领域: cs.LG,cond-mat.mtrl-sci
Recent advances in text embedding: A Comprehensive Review of Top-Performing Methods on the MTEB Benchmark
Text embedding methods have become increasingly popular in both industrial and academic fields due to their critical role in a variety of natural language processing tasks. The significance of universal text embeddings has been further highlighted with the rise of Large Language Models (LLMs) applications such as Retrieval-Augmented Systems (RAGs). While previous models have attempted to be general-purpose, they often struggle to generalize across tasks and domains. However, recent advancements in training data quantity, quality and diversity; synthetic data generation from LLMs as well as using LLMs as backbones encourage great improvements in pursuing universal text embeddings. In this paper, we provide an overview of the recent advances in universal text embedding models with a focus on the top performing text embeddings on Massive Text Embedding Benchmark (MTEB). Through detailed comparison and analysis, we highlight the key contributions and limitations in this area, and propose potentially inspiring future research directions.
Updated: 2024-06-19 06:52:13
标题: 最近在文本嵌入方面的进展:对MTEB基准测试中表现最佳方法的全面评估
摘要: 文本嵌入方法在工业和学术领域中变得越来越流行,因为它们在各种自然语言处理任务中起着关键作用。随着大型语言模型(LLMs)应用(如检索增强系统(RAGs))的兴起,通用文本嵌入的重要性进一步凸显。虽然先前的模型曾试图成为通用型,但它们往往难以在任务和领域之间进行泛化。然而,最近在训练数据数量、质量和多样性方面的进展,以及从LLMs生成合成数据以及将LLMs作为骨干鼓励在追求通用文本嵌入方面取得极大进展。在本文中,我们概述了通用文本嵌入模型的最新进展,重点关注在大规模文本嵌入基准测试(MTEB)上表现最佳的文本嵌入。通过详细比较和分析,我们突出了该领域的关键贡献和局限性,并提出了潜在的启发未来研究方向。
更新时间: 2024-06-19 06:52:13
领域: cs.IR,cs.AI,cs.CL
Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks
Existing ML benchmarks lack the depth and diversity of annotations needed for evaluating models on business process management (BPM) tasks. BPM is the practice of documenting, measuring, improving, and automating enterprise workflows. However, research has focused almost exclusively on one task - full end-to-end automation using agents based on multimodal foundation models (FMs) like GPT-4. This focus on automation ignores the reality of how most BPM tools are applied today - simply documenting the relevant workflow takes 60% of the time of the typical process optimization project. To address this gap we present WONDERBREAD, the first benchmark for evaluating multimodal FMs on BPM tasks beyond automation. Our contributions are: (1) a dataset containing 2928 documented workflow demonstrations; (2) 6 novel BPM tasks sourced from real-world applications ranging from workflow documentation to knowledge transfer to process improvement; and (3) an automated evaluation harness. Our benchmark shows that while state-of-the-art FMs can automatically generate documentation (e.g. recalling 88% of the steps taken in a video demonstration of a workflow), they struggle to re-apply that knowledge towards finer-grained validation of workflow completion (F1 < 0.3). We hope WONDERBREAD encourages the development of more "human-centered" AI tooling for enterprise applications and furthers the exploration of multimodal FMs for the broader universe of BPM tasks. We publish our dataset and experiments here: https://github.com/HazyResearch/wonderbread
Updated: 2024-06-19 06:50:15
标题: 多模基础模型是否理解企业工作流程?业务流程管理任务的基准测试
摘要: 现有的机器学习基准缺乏评估商业流程管理(BPM)任务模型所需的深度和多样性注释。BPM是记录、衡量、改进和自动化企业工作流程的实践。然而,研究几乎完全集中在一个任务上 - 使用基于多模式基础模型(FMs)如GPT-4的代理进行全面自动化。这种对自动化的关注忽略了大多数BPM工具如何在今天应用的现实情况 - 仅仅记录相关工作流程就占到了典型流程优化项目时间的60%。为了解决这一差距,我们提出了WONDERBREAD,这是评估BPM任务上多模式FMs的第一个基准。我们的贡献包括:(1)包含2928个记录的工作流演示的数据集;(2)从涵盖工作流程文档、知识传递到流程改进的真实应用中获取的6个新型BPM任务;以及(3)一个自动评估工具。我们的基准显示,尽管最先进的FMs可以自动生成文档(例如,在工作流程视频演示中回忆88%的操作步骤),但它们很难将这些知识重新应用于更精细的工作流程完成验证(F1 <0.3)。我们希望WONDERBREAD鼓励开发更多面向人类的AI工具,用于企业应用,并推动对更广泛BPM任务的多模式FMs的探索。我们在此发布我们的数据集和实验:https://github.com/HazyResearch/wonderbread
更新时间: 2024-06-19 06:50:15
领域: cs.AI,cs.LG,cs.SE
Machine Learning Applications of Quantum Computing: A Review
At the intersection of quantum computing and machine learning, this review paper explores the transformative impact these technologies are having on the capabilities of data processing and analysis, far surpassing the bounds of traditional computational methods. Drawing upon an in-depth analysis of 32 seminal papers, this review delves into the interplay between quantum computing and machine learning, focusing on transcending the limitations of classical computing in advanced data processing and applications. This review emphasizes the potential of quantum-enhanced methods in enhancing cybersecurity, a critical sector that stands to benefit significantly from these advancements. The literature review, primarily leveraging Science Direct as an academic database, delves into the transformative effects of quantum technologies on machine learning, drawing insights from a diverse collection of studies and scholarly articles. While the focus is primarily on the growing significance of quantum computing in cybersecurity, the review also acknowledges the promising implications for other sectors as the field matures. Our systematic approach categorizes sources based on quantum machine learning algorithms, applications, challenges, and potential future developments, uncovering that quantum computing is increasingly being implemented in practical machine learning scenarios. The review highlights advancements in quantum-enhanced machine learning algorithms and their potential applications in sectors such as cybersecurity, emphasizing the need for industry-specific solutions while considering ethical and security concerns. By presenting an overview of the current state and projecting future directions, the paper sets a foundation for ongoing research and strategic advancement in quantum machine learning.
Updated: 2024-06-19 06:47:35
标题: 量子计算的机器学习应用:一项综述
摘要: 在量子计算和机器学习的交汇点,本综述探讨了这些技术对数据处理和分析能力的变革影响,远远超越了传统计算方法的界限。通过对32篇开创性论文的深入分析,本综述深入探讨了量子计算和机器学习之间的相互作用,重点关注在先进数据处理和应用方面超越经典计算的限制。本综述强调了量子增强方法在增强网络安全方面的潜力,这是一个可以从这些进步中获益的关键领域。文献综述主要利用Science Direct作为学术数据库,深入探讨了量子技术对机器学习的变革影响,从各种研究和学术文章中获得了见解。虽然焦点主要是量子计算在网络安全领域日益重要,但综述也承认了随着领域的成熟,其他领域的有希望的影响。我们的系统方法根据量子机器学习算法、应用、挑战和潜在未来发展对来源进行分类,发现量子计算越来越多地被应用于实际机器学习场景。综述突出了量子增强机器学习算法的进展及其在网络安全等领域的潜在应用,强调了在考虑伦理和安全问题的同时需要行业特定解决方案。通过对当前状态和未来方向进行概述,本文为量子机器学习的持续研究和战略发展奠定了基础。
更新时间: 2024-06-19 06:47:35
领域: cs.LG,cs.ET
Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering
Recent studies have investigated utilizing Knowledge Graphs (KGs) to enhance Question Answering (QA) performance of Large Language Models (LLMs), yet structured KG verbalization remains challenging. Existing methods, such as triple-form or free-form textual conversion of triple-form facts, encounter several issues. These include reduced evidence density due to duplicated entities or relationships, and reduced evidence clarity due to an inability to emphasize crucial evidence. To address these issues, we propose EFSum, an Evidence-focused Fact Summarization framework for enhanced QA with knowledge-augmented LLMs. We optimize an open-source LLM as a fact summarizer through distillation and preference alignment. Our extensive experiments show that EFSum improves LLMs' zero-shot QA performance, and it is possible to ensure both the helpfulness and faithfulness of the summary.
Updated: 2024-06-19 06:47:32
标题: 基于证据的事实总结,用于知识增强的零样本问答
摘要: 最近的研究调查了利用知识图(KGs)来增强大型语言模型(LLMs)的问答(QA)性能,然而结构化的KG表达仍然具有挑战性。现有的方法,如三元组形式或三元组事实的自由文本转换,遇到了几个问题。这些问题包括由于重复实体或关系而导致的证据密度降低,以及由于无法强调关键证据而导致的证据清晰度降低。为了解决这些问题,我们提出了EFSum,一个用于增强QA的面向证据的事实总结框架,使用了知识增强的LLMs。我们通过蒸馏和偏好对齐来优化一个开源的LLM作为事实总结器。我们的广泛实验证明,EFSum改善了LLM的零样本QA性能,并且可以确保总结的有用性和忠实性。
更新时间: 2024-06-19 06:47:32
领域: cs.CL,cs.AI,cs.LG
BeHonest: Benchmarking Honesty of Large Language Models
Previous works on Large Language Models (LLMs) have mainly focused on evaluating their helpfulness or harmlessness. However, honesty, another crucial alignment criterion, has received relatively less attention. Dishonest behaviors in LLMs, such as spreading misinformation and defrauding users, eroding user trust, and causing real-world harm, present severe risks that intensify as these models approach superintelligence levels. Enhancing honesty in LLMs addresses critical deficiencies and helps uncover latent capabilities that are not readily expressed. This underscores the urgent need for reliable methods and benchmarks to effectively ensure and evaluate the honesty of LLMs. In this paper, we introduce BeHonest, a pioneering benchmark specifically designed to assess honesty in LLMs comprehensively. BeHonest evaluates three essential aspects of honesty: awareness of knowledge boundaries, avoidance of deceit, and consistency in responses. Building on this foundation, we designed 10 scenarios to evaluate and analyze 9 popular LLMs on the market, including both closed-source and open-source models from different model families with varied model sizes. Our findings indicate that there is still significant room for improvement in the honesty of LLMs. We also encourage the AI community to prioritize honesty alignment in LLMs. Our benchmark and code can be found at: \url{https://github.com/GAIR-NLP/BeHonest}.
Updated: 2024-06-19 06:46:59
标题: BeHonest: 大型语言模型诚实度基准测试
摘要: 以前关于大型语言模型(LLMs)的研究主要集中在评估它们的有用性或无害性上。然而,诚实性,另一个至关重要的对齐标准,却受到相对较少的关注。LLMs中的不诚实行为,如传播错误信息和欺骗用户,破坏用户信任,并造成现实世界的危害,带来严重风险,随着这些模型接近超智能水平,风险不断加剧。增强LLMs的诚实性解决了关键缺陷,并帮助揭示不容易表达的潜在能力。这凸显了迫切需要可靠方法和基准来有效确保和评估LLMs的诚实性。 在本文中,我们引入了BeHonest,一个专门设计用于全面评估LLMs诚实性的先驱基准。BeHonest评估诚实性的三个基本方面:知识边界的意识,欺骗的避免,以及回应的一致性。在此基础上,我们设计了10个场景,评估并分析了市场上的9个流行LLMs,包括来自不同模型系列、不同模型大小的闭源和开源模型。我们的研究结果表明,LLMs的诚实性仍有很大的改进空间。我们也鼓励人工智能社区将诚实性对齐作为LLMs的优先事项。我们的基准和代码可以在以下网址找到:\url{https://github.com/GAIR-NLP/BeHonest}。
更新时间: 2024-06-19 06:46:59
领域: cs.CL,cs.AI
Cyber Protection Applications of Quantum Computing: A Review
Quantum computing is a cutting-edge field of information technology that harnesses the principles of quantum mechanics to perform computations. It has major implications for the cyber security industry. Existing cyber protection applications are working well, but there are still challenges and vulnerabilities in computer networks. Sometimes data and privacy are also compromised. These complications lead to research questions asking what kind of cyber protection applications of quantum computing there are and what potential methods or techniques can be used for cyber protection. These questions will reveal how much power quantum computing has and to what extent it can outperform conventional computing systems. This scoping review was conducted by considering 815 papers. It showed the possibilities that can be achieved if quantum technologies are implemented in cyber environments. This scoping review discusses various domains such as algorithms and applications, bioinformatics, cloud and edge computing, the organization of complex systems, application areas focused on security and threats, and the broader quantum computing ecosystem. In each of these areas, there is significant scope for quantum computing to be implemented and to revolutionize the working environment. Numerous quantum computing applications for cyber protection and a number of techniques to protect our data and privacy were identified. The results are not limited to network security but also include data security. This paper also discusses societal aspects, e.g., the applications of quantum computing in the social sciences. This scoping review discusses how to enhance the efficiency and security of quantum computing in various cyber security domains. Additionally, it encourages the reader to think about what kind of techniques and methods can be deployed to secure the cyber world.
Updated: 2024-06-19 06:46:31
标题: 量子计算的网络保护应用:一项综述
摘要: 量子计算是一种利用量子力学原理进行计算的前沿信息技术领域。它对网络安全行业有重大影响。现有的网络安全应用程序运行良好,但计算机网络仍然存在挑战和漏洞。有时数据和隐私也会受到威胁。这些复杂性引发了研究问题,即量子计算的网络安全应用有哪些,以及可以用于网络安全的潜在方法或技术是什么?这些问题将揭示量子计算的能力有多大,以及在多大程度上它可以超越传统计算系统。本文通过考虑815篇论文进行了这项范围审查。它展示了如果量子技术在网络环境中得到应用,可能会实现的可能性。这项范围审查讨论了各种领域,如算法和应用程序、生物信息学、云和边缘计算、复杂系统的组织、侧重于安全和威胁的应用领域,以及更广泛的量子计算生态系统。在每个领域中,量子计算都有很大的实施空间,并且可以彻底改变工作环境。已确定了许多用于网络安全的量子计算应用和大量保护数据和隐私的技术。结果不仅局限于网络安全,还包括数据安全。本文还讨论了社会方面,例如量子计算在社会科学中的应用。这项范围审查讨论了如何增强各种网络安全领域中量子计算的效率和安全性。此外,它鼓励读者考虑可以部署何种技术和方法来保护网络世界。
更新时间: 2024-06-19 06:46:31
领域: cs.CR,cs.ET
Applications of Post-quantum Cryptography
With the constantly advancing capabilities of quantum computers, conventional cryptographic systems relying on complex math problems may encounter unforeseen vulnerabilities. Unlike regular computers, which are often deemed cost-ineffective in cryptographic attacks, quantum computers have a significant advantage in calculation speed. This distinction potentially makes currently used algorithms less secure or even completely vulnerable, compelling the exploration of post-quantum cryptography (PQC) as the most reasonable solution to quantum threats. This review aims to provide current information on applications, benefits, and challenges associated with PQC. The review employs a systematic scoping review with the scope restricted to the years 2022 and 2023; only articles that were published in scientific journals were used in this paper. The review examined the articles on the applications of quantum computing in various spheres. However, the scope of this paper was restricted to the domain of PQC because most of the analyzed articles featured this field. Subsequently, the paper analyzes various PQC algorithms, including lattice-based, hash-based, code-based, multivariate polynomial, and isogeny-based cryptography. Each algorithm is judged on its potential applications, robustness, and challenges. All the analyzed algorithms are promising for the post-quantum era in such applications as digital signatures, communication channels, and IoT. Moreover, some of the algorithms are already implemented in the spheres of banking transactions, communication, and intellectual property. Meanwhile, despite their potential, these algorithms face serious challenges since they lack standardization, require vast amounts of storage and computation power, and might have unknown vulnerabilities that can be discovered only with years of cryptanalysis.
Updated: 2024-06-19 06:45:39
标题: 后量子密码学的应用
摘要: 随着量子计算机能力的不断发展,依赖于复杂数学问题的传统加密系统可能会遇到意想不到的漏洞。与在加密攻击中通常被认为成本效益不佳的常规计算机不同,量子计算机在计算速度上具有显著优势。这种区别可能使当前使用的算法变得不够安全甚至完全容易受到攻击,迫使探索后量子密码学(PQC)作为对量子威胁最合理的解决方案。本综述旨在提供有关PQC应用、优势和挑战的最新信息。综述采用了系统的范围审查,范围限定在2022年和2023年;本文只使用发表在科学期刊上的文章。综述审查了有关量子计算在各个领域应用的文章。然而,本文的范围限定在PQC领域,因为大多数分析的文章都涉及到了这个领域。随后,本文分析了各种PQC算法,包括基于格、基于哈希、基于编码、多变量多项式和基于同源的密码学。每种算法都根据其潜在应用、健壮性和挑战进行评估。所有分析的算法在后量子时代的数字签名、通信渠道和物联网等应用中都很有前景。此外,一些算法已经在银行交易、通信和知识产权领域中得到实施。然而,尽管有潜力,这些算法面临严重挑战,因为它们缺乏标准化,需要大量的存储和计算能力,并可能存在未知的漏洞,只有经过多年的密码分析才能发现。
更新时间: 2024-06-19 06:45:39
领域: cs.CR,cs.ET
Reasoning with trees: interpreting CNNs using hierarchies
Challenges persist in providing interpretable explanations for neural network reasoning in explainable AI (xAI). Existing methods like Integrated Gradients produce noisy maps, and LIME, while intuitive, may deviate from the model's reasoning. We introduce a framework that uses hierarchical segmentation techniques for faithful and interpretable explanations of Convolutional Neural Networks (CNNs). Our method constructs model-based hierarchical segmentations that maintain the model's reasoning fidelity and allows both human-centric and model-centric segmentation. This approach offers multiscale explanations, aiding bias identification and enhancing understanding of neural network decision-making. Experiments show that our framework, xAiTrees, delivers highly interpretable and faithful model explanations, not only surpassing traditional xAI methods but shedding new light on a novel approach to enhancing xAI interpretability. Code at: https://github.com/CarolMazini/reasoning_with_trees .
Updated: 2024-06-19 06:45:19
标题: 用树进行推理:使用层次结构解释CNNs
摘要: 在可解释人工智能(xAI)中,为神经网络推理提供可解释性解释仍然存在挑战。现有的方法如Integrated Gradients产生嘈杂的映射,而LIME虽然直观,但可能偏离模型的推理。我们引入了一个框架,利用分层分割技术对卷积神经网络(CNNs)进行忠实和可解释的解释。我们的方法构建基于模型的分层分割,保持模型的推理忠实性,并允许人类中心和模型中心的分割。这种方法提供了多尺度的解释,有助于偏见识别和增强对神经网络决策制定的理解。实验证明,我们的框架xAiTrees提供了高度可解释和忠实的模型解释,不仅超越了传统的xAI方法,而且为增强xAI可解释性的新方法带来了新的启示。代码链接:https://github.com/CarolMazini/reasoning_with_trees。
更新时间: 2024-06-19 06:45:19
领域: cs.AI,cs.CV,cs.LG
Smart Contracts in the Real World: A Statistical Exploration of External Data Dependencies
Smart contracts are pivotal for implementing various functions due to their interactivity with external data. However, this interactivity also presents challenges in terms of security and reliability. There is a lack of statistical and quantitative research on the interaction between smart contracts and external data. To fill this gap, we thoroughly examine 10,500 actual smart contracts and select 9,356 valid samples, excluding those that are outdated or have compilation errors. Utilizing code parsing techniques, we transform contract code into Abstract Syntax Trees (ASTs) and extract keywords related to external data dependency through code analysis. By comparing the ASTs with the keyword list, we conduct a quantitative analysis of the number and proportion of contracts involving external data interaction. Furthermore, we collect over 3,600 security audit reports and manually filter 249 (approximately 9%) reports related to external data interaction, categorizing the external data dependency in these contracts. We also explore the relationship between the complexity of smart contracts and their dependence on external data.
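The keyword-matching step can be sketched generically: walk a compact-JSON AST (for instance as emitted by solc --ast-compact-json Contract.sol) and flag nodes whose names mention external-data identifiers. The keyword list and file path below are illustrative placeholders, not the study's actual lexicon.

import json

EXTERNAL_DATA_KEYWORDS = {"oracle", "chainlink", "aggregator",
                          "block.timestamp", "blockhash", "getprice"}

def scan(node, hits):
    """Recursively collect AST nodes whose name or type mentions a keyword."""
    if isinstance(node, dict):
        for field in ("name", "memberName", "typeString"):
            value = str(node.get(field, "")).lower()
            if any(k in value for k in EXTERNAL_DATA_KEYWORDS):
                hits.append((node.get("nodeType"), node.get(field)))
        for child in node.values():
            scan(child, hits)
    elif isinstance(node, list):
        for child in node:
            scan(child, hits)
    return hits

with open("contract_ast.json") as f:     # placeholder path to a solc AST dump
    ast = json.load(f)
print(scan(ast, []))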
Updated: 2024-06-19 06:36:23
标题: 智能合约在现实世界中的应用:外部数据依赖性的统计探索
摘要: 智能合约对于实现各种功能至关重要,因为它们与外部数据的交互性。然而,这种互动性也带来了安全性和可靠性方面的挑战。关于智能合约与外部数据交互之间的关系,缺乏统计和定量研究。为填补这一空白,我们彻底检查了10,500个实际智能合约,选取了9,356个有效样本,排除了那些过时或编译错误的合约。利用代码解析技术,研究将合约代码转换为抽象语法树(ASTs)并通过代码分析提取与外部数据依赖相关的关键词。通过将ASTs与关键词列表进行比较,我们对涉及外部数据交互的合约数量和比例进行了定量分析。此外,我们收集了超过3,600份安全审计报告,并手动筛选了249份(约占9%)与外部数据交互相关的报告,对这些合约中的外部数据依赖进行分类。我们还探讨了智能合约的复杂性与其对外部数据的依赖之间的关系。
更新时间: 2024-06-19 06:36:23
领域: cs.CR
Multi-Resolution Diffusion for Privacy-Sensitive Recommender Systems
While recommender systems have become an integral component of the Web experience, their heavy reliance on user data raises privacy and security concerns. Substituting user data with synthetic data can address these concerns, but accurately replicating these real-world datasets has been a notoriously challenging problem. Recent advancements in generative AI have demonstrated the impressive capabilities of diffusion models in generating realistic data across various domains. In this work we introduce a Score-based Diffusion Recommendation Module (SDRM), which captures the intricate patterns of real-world datasets required for training highly accurate recommender systems. SDRM allows for the generation of synthetic data that can replace existing datasets to preserve user privacy, or augment existing datasets to address excessive data sparsity. Our method outperforms competing baselines such as generative adversarial networks, variational autoencoders, and recently proposed diffusion models in synthesizing various datasets to replace or augment the original data by an average improvement of 4.30% in Recall@k and 4.65% in NDCG@k.
Updated: 2024-06-19 06:23:55
标题: 多分辨率扩散用于隐私敏感推荐系统
摘要: 推荐系统已经成为网络体验的一个重要组成部分,但它们对用户数据的重度依赖引起了隐私和安全方面的担忧。用合成数据替代用户数据可以解决这些问题,但准确复制这些真实世界数据集一直是一个极具挑战性的问题。生成式人工智能的最新进展展示了扩散模型在各个领域生成逼真数据的显著能力。在本研究中,我们引入了基于得分的扩散推荐模块(SDRM),它捕捉了用于训练高度准确推荐系统所需的真实世界数据集的复杂模式。SDRM允许生成可以替代现有数据集以保护用户隐私,或者增加现有数据集以解决过度数据稀疏性的合成数据。我们的方法在合成各种数据集以替代或增补原始数据方面优于竞争基线,如生成对抗网络、变分自动编码器和最近提出的扩散模型,平均提高了4.30%的Recall@k和4.65%的NDCG@k。
更新时间: 2024-06-19 06:23:55
领域: cs.IR,cs.AI,cs.CR,cs.LG
Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach
Large language models (LLMs) encode a vast amount of world knowledge acquired from massive text datasets. Recent studies have demonstrated that LLMs can assist an embodied agent in solving complex sequential decision making tasks by providing high-level instructions. However, interactions with LLMs can be time-consuming. In many practical scenarios, LLMs require a significant amount of storage space and can often only be deployed on remote cloud servers. Additionally, using commercial LLMs can be costly since they may charge based on usage frequency. In this paper, we explore how to enable intelligent, cost-effective interactions between a downstream task-oriented agent and an LLM. We find that this problem can be naturally formulated by a Markov decision process (MDP), and propose When2Ask, a reinforcement learning based approach that learns when it is necessary to query LLMs for high-level instructions to accomplish a target task. On one side, When2Ask discourages unnecessary redundant interactions, while on the other side, it enables the agent to identify and follow useful instructions from the LLM. This enables the agent to halt an ongoing plan and transition to a more suitable one based on new environmental observations. Experiments on MiniGrid and Habitat environments that entail planning sub-goals demonstrate that When2Ask learns to solve target tasks with only a few necessary interactions with the LLM, significantly reducing interaction costs in testing environments compared with baseline methods. Our code is available at: https://github.com/ZJLAB-AMMI/LLM4RL.
Updated: 2024-06-19 06:22:25
标题: 实现智能互动:一个强化学习方法在代理和LLM之间
摘要: 大型语言模型(LLMs)编码了从大规模文本数据集中获得的大量世界知识。最近的研究表明,LLMs可以通过提供高级指令,帮助具身代理解决复杂的序贯决策任务。然而,与LLMs的交互可能耗时。在许多实际情况下,LLMs需要大量的存储空间,往往只能部署在远程云服务器上。此外,使用商业LLMs可能成本高昂,因为它们可能基于使用频率收费。在本文中,我们探讨了如何实现下游任务导向代理与LLM之间的智能成本效益交互。我们发现,这个问题可以自然地通过马尔可夫决策过程(MDP)来形式化,并提出了When2Ask,一种基于强化学习的方法,学习何时需要查询LLMs以获取完成目标任务的高级指令。一方面,When2Ask抑制了不必要的冗余交互,另一方面,它使代理能够识别并遵循LLM的有用指令。这使代理能够根据新的环境观察停止正在进行的计划,并转移到更合适的计划。在涉及规划子目标的MiniGrid和Habitat环境中的实验表明,When2Ask学会了仅通过与LLM的少量必要交互来解决目标任务,在测试环境中与基准方法相比显著降低了交互成本。我们的代码可在以下链接找到:https://github.com/ZJLAB-AMMI/LLM4RL。
更新时间: 2024-06-19 06:22:25
领域: cs.AI
LangTopo: Aligning Language Descriptions of Graphs with Tokenized Topological Modeling
Recently, large language models (LLMs) have been widely researched in the field of graph machine learning due to their outstanding abilities in language comprehension and learning. However, the significant gap between natural language tasks and topological structure modeling poses a nonnegligible challenge. Specifically, since natural language descriptions are not sufficient for LLMs to understand and process graph-structured data, fine-tuned LLMs perform even worse than some traditional GNN models on graph tasks, lacking inherent modeling capabilities for graph structures. Existing research overly emphasizes LLMs' understanding of semantic information captured by external models, while inadequately exploring graph topological structure modeling, thereby overlooking the genuine capabilities that LLMs lack. Consequently, in this paper, we introduce a new framework, LangTopo, which aligns graph structure modeling with natural language understanding at the token level. LangTopo quantifies the graph structure modeling capabilities of GNNs and LLMs by constructing a codebook for the graph modality and performs consistency maximization. This process aligns the text description of LLM with the topological modeling of GNN, allowing LLM to learn the ability of GNN to capture graph structures, enabling LLM to handle graph-structured data independently. We demonstrate the effectiveness of our proposed method on multiple datasets.
Updated: 2024-06-19 06:20:22
标题: LangTopo:将图的语言描述与标记化的拓扑建模进行对齐
摘要: 最近,由于其在语言理解和学习方面出色的能力,在图机器学习领域广泛研究了大型语言模型(LLMs)。然而,自然语言任务与拓扑结构建模之间的显著差距构成了一个不可忽视的挑战。具体来说,由于自然语言描述不足以让LLMs理解和处理图结构化数据,经过微调的LLMs在图任务上表现甚至不如一些传统的GNN模型,缺乏对图结构的固有建模能力。现有研究过分强调LLMs对外部模型捕捉的语义信息的理解,而不够探索图拓扑结构建模,从而忽视了LLMs缺乏的真正能力。因此,在本文中,我们介绍了一个新框架LangTopo,该框架将图结构建模与自然语言理解在标记级别上进行了对齐。LangTopo通过为图模态构建一个码书并执行一致性最大化来量化GNNs和LLMs的图结构建模能力。这个过程将LLMs的文本描述与GNN的拓扑建模对齐,使LLMs学习到GNN捕捉图结构的能力,使LLMs能够独立处理图结构化数据。我们在多个数据集上展示了我们提出的方法的有效性。
更新时间: 2024-06-19 06:20:22
领域: cs.AI,cs.CL,cs.LG
R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation
Retrieval augmented generation (RAG) has been applied in many scenarios to augment large language models (LLMs) with external documents provided by retrievers. However, a semantic gap exists between LLMs and retrievers due to differences in their training objectives and architectures. This misalignment forces LLMs to passively accept the documents provided by the retrievers, leading to incomprehension in the generation process, where the LLMs are burdened with the task of distinguishing these documents using their inherent knowledge. This paper proposes R$^2$AG, a novel enhanced RAG framework to fill this gap by incorporating Retrieval information into Retrieval Augmented Generation. Specifically, R$^2$AG utilizes the nuanced features from the retrievers and employs a R$^2$-Former to capture retrieval information. Then, a retrieval-aware prompting strategy is designed to integrate retrieval information into LLMs' generation. Notably, R$^2$AG suits low-source scenarios where LLMs and retrievers are frozen. Extensive experiments across five datasets validate the effectiveness, robustness, and efficiency of R$^2$AG. Our analysis reveals that retrieval information serves as an anchor to aid LLMs in the generation process, thereby filling the semantic gap.
Updated: 2024-06-19 06:19:48
标题: R^2AG: 将检索信息纳入检索增强生成
摘要: 检索增强生成(RAG)已经应用于许多场景,以利用检索器提供的外部文档来增强大型语言模型(LLMs)。然而,由于训练目标和架构的差异,LLMs和检索器之间存在语义差距。这种不匹配迫使LLMs被动接受检索器提供的文档,导致生成过程中的理解障碍,LLMs不得不依靠其固有知识来区分这些文档。本文提出了R$^2$AG,一种新颖的增强RAG框架,通过将检索信息整合到检索增强生成中来填补这一差距。具体而言,R$^2$AG利用来自检索器的细微特征,并采用一个R$^2$-Former来捕捉检索信息。然后,设计了一种检索感知提示策略,将检索信息整合到LLMs的生成中。值得注意的是,R$^2$AG适用于LLMs和检索器被冻结的低资源场景。通过对五个数据集的广泛实验验证了R$^2$AG的有效性、稳健性和效率。我们的分析显示,检索信息作为一个锚点,在生成过程中帮助LLMs,从而填补了语义差距。
更新时间: 2024-06-19 06:19:48
领域: cs.CL,cs.AI,cs.IR
GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs
The ability to understand and reason about spatial relationships between objects in images is an important component of visual reasoning. This skill rests on the ability to recognize and localize objects of interest and determine their spatial relation. Early vision and language models (VLMs) have been shown to struggle to recognize spatial relations. We extend the previously released What'sUp dataset and propose a novel comprehensive evaluation for spatial relationship understanding that highlights the strengths and weaknesses of 27 different models. In addition to the VLMs evaluated in What'sUp, our extensive evaluation encompasses 3 classes of Multimodal LLMs (MLLMs) that vary in their parameter sizes (ranging from 7B to 110B), training/instruction-tuning methods, and visual resolution to benchmark their performances and scrutinize the scaling laws in this task.
Updated: 2024-06-19 06:15:26
标题: GSR-BENCH:通过多模态LLMs评估基于空间推理的基准Benchmark
摘要: 理解和推理图像中对象之间的空间关系的能力是视觉推理的重要组成部分。这种技能依赖于识别和定位感兴趣对象以及确定它们的空间关系的能力。早期的视觉和语言模型(VLMs)已经被证明在识别空间关系方面存在困难。我们扩展了先前发布的What'sUp数据集,并提出了一个新颖的全面评估,用于空间关系理解,突出了27种不同模型的优势和劣势。除了在What'sUp中评估的VLMs之外,我们的广泛评估还包括3类多模态LLMs(MLLMs),这些类别在其参数大小(范围从7B到110B)、训练/指导调整方法和视觉分辨率方面存在差异,以评估它们的性能并审查该任务中的缩放定律。
更新时间: 2024-06-19 06:15:26
领域: cs.CL,cs.CV,cs.LG
Discerning and Resolving Knowledge Conflicts through Adaptive Decoding with Contextual Information-Entropy Constraint
Large language models internalize enormous parametric knowledge during pre-training. Concurrently, realistic applications necessitate external contextual knowledge to aid models on the underlying tasks. This raises a crucial dilemma known as knowledge conflicts, where the contextual knowledge clashes with the model's internal parametric knowledge. However, existing decoding works are specialized in resolving knowledge conflicts and could inadvertently deteriorate performance in absence of conflicts. In this paper, we propose an adaptive decoding method, termed contextual information-entropy constraint decoding (COIECD), to discern whether knowledge conflicts occur and resolve them. It can improve the model's faithfulness to conflicting context, and simultaneously maintain high performance in non-conflicting contexts. Our experiments show that COIECD exhibits strong performance and robustness over knowledge conflicts in realistic datasets. Code is available.
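A toy rendering of the idea: compare next-token distributions with and without the retrieved context, flag a conflict when the contextual distribution's greedy token violates an entropy-based typicality band, and resolve flagged cases contrastively. The specific constraint and the alpha weight below are assumptions, not COIECD's published rule.

import torch

def adaptive_decode(logits_ctx, logits_noctx, alpha=1.0):
    p_ctx = torch.log_softmax(logits_ctx, dim=-1)
    p_noctx = torch.log_softmax(logits_noctx, dim=-1)
    entropy = -(p_ctx.exp() * p_ctx).sum()          # H of the contextual distribution
    greedy = p_ctx.argmax()
    # conflict heuristic: greedy token's surprisal strays far from the entropy band
    conflict = (-p_ctx[greedy] - entropy).abs() > 0.5 * entropy
    if conflict:
        # contrastive resolution: boost tokens the context specifically supports
        return int((p_ctx + alpha * (p_ctx - p_noctx)).argmax())
    return int(greedy)

logits_ctx = torch.tensor([2.0, 1.5, 0.1, -1.0])    # with retrieved context
logits_noctx = torch.tensor([0.1, 2.5, 0.2, -0.5])  # parametric-only
print("chosen token id:", adaptive_decode(logits_ctx, logits_noctx))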
Updated: 2024-06-19 06:07:37
标题: 通过自适应解码和上下文信息熵约束识别和解决知识冲突
摘要: 大型语言模型在预训练过程中内化了巨大的参数知识。同时,现实应用需要外部上下文知识来帮助模型处理基础任务。这引发了一个被称为知识冲突的关键困境,其中上下文知识与现有知识相冲突。然而,现有的解码工作专门解决知识冲突,可能在没有冲突的情况下无意中降低性能。在本文中,我们提出了一种自适应解码方法,称为上下文信息熵约束解码(COIECD),以辨别知识冲突是否发生并解决它们。它可以提高模型对冲突上下文的忠实度,同时在非冲突情况下保持高性能。我们的实验表明,COIECD在现实数据集中表现出强大的性能和稳健性。代码可供使用。
更新时间: 2024-06-19 06:07:37
领域: cs.AI
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
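The KV-compression intuition behind MLA can be sketched in a few lines: cache one low-rank latent per token and reconstruct per-head keys and values from it on demand, instead of caching full K/V. The dimensions below are illustrative, and real MLA additionally handles positional encoding with a decoupled RoPE branch.

import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head, seq = 512, 64, 8, 64, 16

down = nn.Linear(d_model, d_latent, bias=False)        # compress hidden state to latent
up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)
up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)

h = torch.randn(1, seq, d_model)                       # hidden states of cached tokens
latent_cache = down(h)                                 # (1, seq, 64) is all we store
k = up_k(latent_cache).view(1, seq, n_heads, d_head)   # keys rebuilt on demand
v = up_v(latent_cache).view(1, seq, n_heads, d_head)   # values rebuilt on demand

print(f"cache: {seq * d_latent} floats (latent) vs "
      f"{seq * 2 * n_heads * d_head} floats (full K/V)")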
Updated: 2024-06-19 06:04:17
标题: DeepSeek-V2:强大、经济、高效的专家混合语言模型
摘要: 我们提出了DeepSeek-V2,这是一个强大的混合专家(MoE)语言模型,具有经济高效的训练和推理特性。它包含236B总参数,其中每个标记激活21B,并支持128K标记的上下文长度。DeepSeek-V2采用创新的架构,包括多头潜在注意力(MLA)和DeepSeekMoE。MLA通过将关键-值(KV)缓存显著压缩为潜在向量来保证高效推理,而DeepSeekMoE通过稀疏计算实现在经济成本下训练强大模型。与DeepSeek 67B相比,DeepSeek-V2表现更强大,同时节省了42.5%的训练成本,将KV缓存减少了93.3%,并将最大生成吞吐量提高了5.76倍。我们在包含8.1T标记的高质量和多源语料库上预训练DeepSeek-V2,进一步进行监督微调(SFT)和强化学习(RL)以充分释放其潜力。评估结果显示,即使只有21B激活参数,DeepSeek-V2及其聊天版本仍然在开源模型中表现出顶尖性能。
更新时间: 2024-06-19 06:04:17
领域: cs.CL,cs.AI
Data Contamination Can Cross Language Barriers
The opacity in developing large language models (LLMs) is raising growing concerns about the potential contamination of public benchmarks in the pre-training data. Existing contamination detection methods are typically based on the text overlap between training and evaluation data, which can be too superficial to reflect deeper forms of contamination. In this paper, we first present a cross-lingual form of contamination that inflates LLMs' performance while evading current detection methods, deliberately injected by overfitting LLMs on the translated versions of benchmark test sets. Then, we propose generalization-based approaches to unmask such deeply concealed contamination. Specifically, we examine the LLM's performance change after modifying the original benchmark by replacing the false answer choices with correct ones from other questions. Contaminated models can hardly generalize to such easier situations, where the false choices can be \emph{not even wrong}, as all choices are correct in their memorization. Experimental results demonstrate that cross-lingual contamination can easily fool existing detection methods, but not ours. In addition, we discuss the potential utilization of cross-lingual contamination in interpreting LLMs' working mechanisms and in post-training LLMs for enhanced multilingual capabilities. The code and dataset we use can be obtained from \url{https://github.com/ShangDataLab/Deep-Contam}.
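The generalization probe is simple to express in code: rebuild each multiple-choice item so its distractors are correct answers borrowed from other questions, leaving every option "correct" under pure memorization. The item schema below is an assumed toy format.

import random

def rewrite_choices(items, seed=0):
    rng = random.Random(seed)
    answers = [it["choices"][it["answer"]] for it in items]
    probed = []
    for i, it in enumerate(items):
        gold = it["choices"][it["answer"]]
        # distractors = correct answers of *other* questions ("not even wrong")
        others = rng.sample(answers[:i] + answers[i + 1:], 3)
        choices = others + [gold]
        rng.shuffle(choices)
        probed.append({"question": it["question"], "choices": choices,
                       "answer": choices.index(gold)})
    return probed

items = [{"question": f"Q{i}?", "choices": [f"A{i}", f"B{i}", f"C{i}", f"D{i}"],
          "answer": 0} for i in range(10)]
print(rewrite_choices(items)[0])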
Updated: 2024-06-19 05:53:27
标题: 数据污染可以跨越语言障碍
摘要: 开发大型语言模型(LLMs)中的不透明度引起了人们对预训练数据中可能存在的公共基准的潜在污染日益关注。现有的污染检测方法通常基于训练和评估数据之间的文本重叠,这可能太肤浅,无法反映更深层次的污染形式。在本文中,我们首先提出了一种跨语言形式的污染,通过将LLMs过度拟合于基准测试集的翻译版本中故意注入,从而提高LLMs的性能,同时规避当前的检测方法。然后,我们提出了基于泛化的方法来揭示这种深度隐藏的污染。具体地,我们研究了将原始基准中的错误答案选项替换为来自其他问题的正确答案后,LLM的性能变化。受污染的模型几乎无法泛化到这种更容易的情况:其中的错误选项可能"连错都算不上",因为在模型的记忆中所有选项都是正确的。实验结果表明,跨语言污染可以轻易愚弄现有的检测方法,但不能愚弄我们的方法。此外,我们讨论了跨语言污染在解释LLMs的工作机制和在后期训练LLMs以增强多语言能力方面的潜在利用。我们使用的代码和数据集可从\url{https://github.com/ShangDataLab/Deep-Contam}获取。
更新时间: 2024-06-19 05:53:27
领域: cs.CL,cs.AI
Enhancing Collaborative Semantics of Language Model-Driven Recommendations via Graph-Aware Learning
Large Language Models (LLMs) are increasingly prominent in the recommendation systems domain. Existing studies usually utilize in-context learning or supervised fine-tuning on task-specific data to align LLMs into recommendations. However, the substantial bias in semantic spaces between language processing tasks and recommendation tasks poses a nonnegligible challenge. Specifically, without the adequate capturing ability of collaborative information, existing modeling paradigms struggle to capture behavior patterns within community groups, leading to LLMs' ineffectiveness in discerning implicit interaction semantic in recommendation scenarios. To address this, we consider enhancing the learning capability of language model-driven recommendation models for structured data, specifically by utilizing interaction graphs rich in collaborative semantics. We propose a Graph-Aware Learning for Language Model-Driven Recommendations (GAL-Rec). GAL-Rec enhances the understanding of user-item collaborative semantics by imitating the intent of Graph Neural Networks (GNNs) to aggregate multi-hop information, thereby fully exploiting the substantial learning capacity of LLMs to independently address the complex graphs in the recommendation system. Sufficient experimental results on three real-world datasets demonstrate that GAL-Rec significantly enhances the comprehension of collaborative semantics, and improves recommendation performance.
Updated: 2024-06-19 05:50:15
标题: 通过图感知学习增强基于语言模型驱动的推荐的协作语义
摘要: 大型语言模型(LLMs)在推荐系统领域越来越突出。现有研究通常利用上下文学习或在特定任务数据上进行监督微调,以将LLMs与推荐进行对齐。然而,语言处理任务和推荐任务之间语义空间中的实质性偏差构成了一个不可忽视的挑战。具体而言,缺乏协作信息的充分捕获能力,现有的建模范式很难捕捉社区群体内的行为模式,导致LLMs在推荐场景中无法有效识别隐性交互语义。为解决这一问题,我们考虑通过利用充满协作语义的交互图,增强基于语言模型的推荐模型对结构化数据的学习能力。我们提出了一种面向语言模型驱动推荐的图感知学习(GAL-Rec)方法。GAL-Rec通过模拟图神经网络(GNNs)的意图来聚合多跳信息,充分利用LLMs的巨大学习能力,独立地解决推荐系统中的复杂图形。对三个真实世界数据集的充分实验结果表明,GAL-Rec显著增强了协作语义的理解,并提高了推荐性能。
更新时间: 2024-06-19 05:50:15
领域: cs.IR,cs.AI
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
Mixture of experts (MoE) has become the standard for constructing production-level large language models (LLMs) due to its promise to boost model capacity without causing significant overheads. Nevertheless, existing MoE methods usually enforce a constant top-k routing for all tokens, which is arguably restrictive because various tokens (e.g., "<EOS>" vs. "apple") may require various numbers of experts for feature abstraction. Lifting such a constraint can help make the most of limited resources and unleash the potential of the model for downstream tasks. In this sense, we introduce AdaMoE to realize token-adaptive routing for MoE, where different tokens are permitted to select a various number of experts. AdaMoE makes minimal modifications to the vanilla MoE with top-k routing -- it simply introduces a fixed number of null experts, which do not consume any FLOPs, to the expert set and increases the value of k. AdaMoE does not force each token to occupy a fixed number of null experts but ensures the average usage of the null experts with a load-balancing loss, leading to an adaptive number of null/true experts used by each token. AdaMoE exhibits a strong resemblance to MoEs with expert choice routing while allowing for trivial auto-regressive modeling. AdaMoE is easy to implement and can be effectively applied to pre-trained (MoE-)LLMs. Extensive studies show that AdaMoE can reduce average expert load (FLOPs) while achieving superior performance. For example, on the ARC-C dataset, applying our method to fine-tuning Mixtral-8x7B can reduce FLOPs by 14.5% while increasing accuracy by 1.69%.
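A compact sketch of the mechanism: append null slots that output zero (and cost no expert FLOPs) to the router, then take top-k over the expanded logits so each token engages anywhere from 0 to k true experts. Sizes are illustrative assumptions, and the load-balancing loss on null-expert usage is omitted for brevity.

import torch
import torch.nn as nn

n_true, n_null, k, d = 8, 4, 4, 32
router = nn.Linear(d, n_true + n_null)           # logits over true + null experts
experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_true))

x = torch.randn(16, d)                           # 16 tokens
gates = torch.softmax(router(x), dim=-1)
topv, topi = gates.topk(k, dim=-1)               # k routing slots per token

out = torch.zeros_like(x)
for slot in range(k):
    idx, w = topi[:, slot], topv[:, slot:slot + 1]
    true_mask = idx < n_true                     # slots routed to null experts add 0
    for e in idx[true_mask].unique():
        sel = true_mask & (idx == e)
        out[sel] += w[sel] * experts[int(e)](x[sel])

print("avg true experts per token:",
      (topi < n_true).float().sum(dim=1).mean().item())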
Updated: 2024-06-19 05:47:10
标题: AdaMoE: 基于空白专家的令牌自适应路由的专家混合语言模型
摘要: 混合专家(MoE)已成为构建生产级大型语言模型(LLMs)的标准,因为它承诺提高模型容量而不会造成显着的开销。然而,现有的MoE方法通常对所有标记强制执行常数的前k路由,这可能是限制性的,因为各种标记(例如,“<EOS>”与“apple”)可能需要不同数量的专家进行特征抽象。解除这种限制可以帮助充分利用有限资源,并释放模型在下游任务中的潜力。在这方面,我们引入了AdaMoE,以实现MoE的令牌自适应路由,其中不同的令牌允许选择不同数量的专家。AdaMoE对原始MoE进行了最小的修改,只是向专家组引入了一定数量的空专家,不消耗任何FLOPs,并增加了k的值。AdaMoE不强制每个标记占据固定数量的空专家,而是确保空专家的平均使用率通过负载平衡损失,导致每个标记使用的空/真实专家数量是自适应的。AdaMoE与具有专家选择路由的MoE非常相似,同时允许简单的自回归建模。AdaMoE易于实现,并可以有效应用于预训练(MoE-)LLMs。广泛的研究表明,AdaMoE可以减少平均专家负载(FLOPs),同时实现更优越的性能。例如,在ARC-C数据集上,将我们的方法应用于对Mixtral-8x7B的微调可以将FLOPs减少14.5%,同时将准确性提高1.69%。
更新时间: 2024-06-19 05:47:10
领域: cs.AI
Towards Measuring and Modeling "Culture" in LLMs: A Survey
We present a survey of more than 90 recent papers that aim to study cultural representation and inclusion in large language models (LLMs). We observe that none of the studies explicitly define "culture," which is a complex, multifaceted concept; instead, they probe the models on some specially designed datasets which represent certain aspects of "culture". We call these aspects the proxies of culture, and organize them across two dimensions of demographic and semantic proxies. We also categorize the probing methods employed. Our analysis indicates that only certain aspects of "culture," such as values and objectives, have been studied, leaving several other interesting and important facets, especially the multitude of semantic domains (Thompson et al., 2020) and aboutness (Hershcovich et al., 2022), unexplored. Two other crucial gaps are the lack of robustness of probing techniques and situated studies on the impact of cultural mis- and under-representation in LLM-based applications.
Updated: 2024-06-19 05:43:27
标题: 朝向在LLMs中测量和建模“文化”:一项调查
摘要: 我们提出了一个针对大型语言模型(LLMs)中文化表达和包容性研究的90多篇近期论文的调查。我们观察到,这些研究中没有一个明确定义“文化”,而文化是一个复杂多面的概念;相反,它们在一些专门设计的数据集上对模型进行探测,这些数据集代表了“文化”的某些方面。我们称这些方面为文化的代理,并将它们组织在人口统计学代理和语义代理两个维度上。我们还对采用的探测方法进行分类。我们的分析表明,只有文化的某些方面,如价值观和目标,已经被研究,留下了其他一些有趣和重要的方面,特别是语义领域的多样性(Thompson等,2020)和关于性(Hershcovich等,2022),尚未探索。另外两个关键的差距是探测技术的稳健性不足和关于文化不当和不足代表在基于LLM的应用中的影响的定位研究。
更新时间: 2024-06-19 05:43:27
领域: cs.CY,cs.AI,cs.CL
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models
Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora. Recent advances stem from the confluence of several factors, such as large-scale training datasets, deep learning techniques, and the rise of large language models. High-quality datasets are used to train models on realistic scenarios and enable the evaluation of the system on potentially unseen data. Standardized metrics facilitate comparisons between different ODQA systems, allowing researchers to objectively track advancements in the field. Our study presents a thorough examination of the current landscape of ODQA benchmarking by reviewing 52 datasets and 20 evaluation techniques across textual and multimodal modalities. We introduce a novel taxonomy for ODQA datasets that incorporates both the modality and difficulty of the question types. Additionally, we present a structured organization of ODQA evaluation metrics along with a critical analysis of their inherent trade-offs. Our study aims to empower researchers by providing a framework for the robust evaluation of modern question-answering systems. We conclude by identifying the current challenges and outlining promising avenues for future research and development.
Updated: 2024-06-19 05:43:02
标题: 朝向稳健评估:大语言模型时代开放领域问答的数据集和指标全面分类学
摘要: 自然语言处理中的开放领域问答(ODQA)涉及构建利用大规模知识语料库回答事实问题的系统。最近的进展源于几个因素的交汇,如大规模训练数据集、深度学习技术和大型语言模型的兴起。高质量的数据集用于在现实场景下训练模型,并使系统在可能未见过的数据上进行评估。标准化指标促进了不同ODQA系统之间的比较,使研究人员能够客观地跟踪该领域的进展。我们的研究通过审查52个数据集和20种文本和多模态模式的评估技术,全面审视了当前的ODQA基准评估情况。我们引入了一个新颖的ODQA数据集分类法,结合了问题类型的模态和难度。此外,我们提出了一个结构化的ODQA评估指标组织,并对其固有的权衡进行了批判性分析。我们的研究旨在为研究人员提供一个现代问答系统强有力评估的框架。最后,我们确定了当前的挑战,并概述了未来研究和发展的有前途的途径。
更新时间: 2024-06-19 05:43:02
领域: cs.CL,cs.AI,cs.IR,cs.LG
Probing the Emergence of Cross-lingual Alignment during LLM Training
Multilingual Large Language Models (LLMs) achieve remarkable levels of zero-shot cross-lingual transfer performance. We speculate that this is predicated on their ability to align languages without explicit supervision from parallel sentences. While representations of translationally equivalent sentences in different languages are known to be similar after convergence, however, it remains unclear how such cross-lingual alignment emerges during pre-training of LLMs. Our study leverages intrinsic probing techniques, which identify which subsets of neurons encode linguistic features, to correlate the degree of cross-lingual neuron overlap with the zero-shot cross-lingual transfer performance for a given model. In particular, we rely on checkpoints of BLOOM, a multilingual autoregressive LLM, across different training steps and model scales. We observe a high correlation between neuron overlap and downstream performance, which supports our hypothesis on the conditions leading to effective cross-lingual transfer. Interestingly, we also detect a degradation of both implicit alignment and multilingual abilities in certain phases of the pre-training process, providing new insights into the multilingual pretraining dynamics.
Updated: 2024-06-19 05:31:59
标题: 探究LLM训练过程中跨语言对齐的出现
摘要: 多语言大语言模型(LLMs)实现了显著水平的零样本跨语言转移性能。我们推测这是基于它们能够在没有平行句子的明确监督下对齐语言的能力。虽然已知不同语言中翻译等价句子的表示在收敛后相似,但是,目前尚不清楚这种跨语言对齐是如何在LLMs的预训练过程中出现的。我们的研究利用内在探测技术,识别哪些神经元子集编码语言特征,以将跨语言神经元重叠程度与给定模型的零样本跨语言转移性能相关联。特别地,我们依赖于BLOOM的检查点,这是一个多语言自回归LLM,跨不同的训练步骤和模型规模。我们观察到神经元重叠与下游性能之间的高相关性,这支持了我们关于有效跨语言转移条件的假设。有趣的是,我们还发现在预训练过程的某些阶段中,隐式对齐和多语言能力都会出现退化,从而提供了关于多语言预训练动态的新见解。
更新时间: 2024-06-19 05:31:59
Categories: cs.CL,cs.AI,cs.LG
AGSOA: Graph Neural Network Targeted Attack Based on Average Gradient and Structure Optimization
Graph Neural Networks (GNNs) are vulnerable to adversarial attacks that cause performance degradation by adding small perturbations to the graph. Gradient-based attacks are among the most commonly used methods and have achieved good performance in many attack scenarios. However, current gradient attacks face the problems of easily falling into local optima and poor attack invisibility. Specifically, most gradient attacks use greedy strategies to generate perturbations, which tend to fall into local optima, leading to underperformance of the attack. In addition, many attacks only consider the effectiveness of the attack and ignore its invisibility, making the attacks easily exposed and thus prone to failure. To address the above problems, this paper proposes an attack on GNNs, called AGSOA, which consists of an average gradient calculation module and a structure optimization module. In the average gradient calculation module, we compute the average of the gradient information over all moments to guide the attack in generating perturbed edges, which stabilizes the direction of the attack update and avoids undesirable local maxima. In the structure optimization module, we calculate the similarity and homogeneity of the target node with other nodes to adjust the graph structure so as to improve the invisibility and transferability of the attack. Extensive experiments on three commonly used datasets show that AGSOA improves the misclassification rate by 2$\%$-8$\%$ compared to other state-of-the-art models.
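A minimal sketch of the average-gradient idea, assuming gradients of the attack loss with respect to the adjacency matrix have been recorded at every training moment (the structure-optimization module is a separate step):

```python
import numpy as np

def average_gradient_edge_pick(grad_history, adj):
    """Score edge flips by the mean adjacency gradient over all moments,
    instead of the latest gradient alone as in greedy attacks."""
    avg_grad = np.mean(np.stack(grad_history), axis=0)
    # Adding an absent edge wants a positive gradient; removing a present
    # edge wants a negative one, hence the (1 - 2*adj) sign flip.
    score = avg_grad * (1 - 2 * adj)
    np.fill_diagonal(score, -np.inf)        # no self-loops
    return np.unravel_index(np.argmax(score), score.shape)

rng = np.random.default_rng(0)
adj = (rng.random((5, 5)) > 0.7).astype(float)
grads = [rng.normal(size=(5, 5)) for _ in range(8)]  # per-moment loss gradients
print(average_gradient_edge_pick(grads, adj))
```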
Updated: 2024-06-19 05:29:20
Categories: cs.LG,cs.AI,cs.CR
REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark
The ability of CodeLLMs to generate executable and functionally correct code at the repository-level scale remains largely unexplored. We introduce RepoExec, a novel benchmark for evaluating code generation at the repository-level scale. RepoExec focuses on three main aspects: executability, functional correctness through automated test case generation with a high coverage rate, and carefully crafted cross-file contexts for accurately generating code. Our work explores a controlled scenario where developers specify necessary code dependencies, challenging the model to integrate these accurately. Experiments show that while pretrained LLMs outperform instruction-tuned models in correctness, the latter excel in utilizing provided dependencies and demonstrating debugging capabilities. We also introduce a new instruction-tuned dataset that focuses on code dependencies and demonstrate that CodeLLMs fine-tuned on our dataset have a better capability to leverage these dependencies effectively. RepoExec aims to provide a comprehensive evaluation of code functionality and alignment with developer intent, paving the way for more reliable and applicable CodeLLMs in real-world scenarios. The dataset and source code can be found at~\url{https://github.com/FSoft-AI4Code/RepoExec}.
Updated: 2024-06-19 05:27:32
Categories: cs.SE,cs.AI
Tradeoffs between convergence rate and noise amplification for momentum-based accelerated optimization algorithms
We study momentum-based first-order optimization algorithms in which the iterations utilize information from the two previous steps and are subject to an additive white noise. This setup uses noise to account for uncertainty in either gradient evaluation or iteration updates, and it includes Polyak's heavy-ball and Nesterov's accelerated methods as special cases. For strongly convex quadratic problems, we use the steady-state variance of the error in the optimization variable to quantify noise amplification and identify fundamental stochastic performance tradeoffs. Our approach utilizes the Jury stability criterion to provide a novel geometric characterization of conditions for linear convergence, and it reveals the relation between the noise amplification and convergence rate as well as their dependence on the condition number and the constant algorithmic parameters. This geometric insight leads to simple alternative proofs of standard convergence results and allows us to establish an ``uncertainty principle'' of strongly convex optimization: for the two-step momentum method with a linear convergence rate, the lower bound on the product between the settling time and noise amplification scales quadratically with the condition number. Our analysis also identifies a key difference between the gradient and iterate noise models: while the amplification of gradient noise can be made arbitrarily small by sufficiently decelerating the algorithm, the best achievable variance for the iterate noise model increases linearly with the settling time in the decelerating regime. Finally, we introduce two parameterized families of algorithms that strike a balance between noise amplification and settling time while preserving order-wise Pareto optimality for both noise models.
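For concreteness, one standard parameterization of such two-step momentum methods with additive white noise $w_t$ (our notation, not necessarily the paper's) is:

```latex
% Two-step momentum iteration with additive white noise w_t:
%   heavy-ball:  \gamma = 0,    Nesterov:  \gamma = \beta
\begin{align*}
  x_{t+1} &= x_t + \beta\,(x_t - x_{t-1})
           - \alpha \nabla f\!\bigl(x_t + \gamma\,(x_t - x_{t-1})\bigr)
           + \sigma w_t, \\
  J &\;:=\; \lim_{t \to \infty} \mathbb{E}\,\lVert x_t - x^\star \rVert^2
  \quad \text{(steady-state noise amplification)}.
\end{align*}
```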
Updated: 2024-06-19 05:26:12
Categories: math.OC,cs.LG,cs.SY,eess.SY,math.DS
Communication-Efficient Federated Knowledge Graph Embedding with Entity-Wise Top-K Sparsification
Federated Knowledge Graph Embedding learning (FKGE) encounters challenges in communication efficiency, stemming from the considerable size of parameters and extensive communication rounds. However, existing FKGE methods only focus on reducing the number of communication rounds by conducting multiple rounds of local training in each communication round, and ignore reducing the size of parameters transmitted within each communication round. To tackle the problem, we first find that a universal reduction in embedding precision across all entities during compression can significantly impede convergence speed, underscoring the importance of maintaining embedding precision. We then propose the bidirectional communication-efficient FedS, based on an Entity-Wise Top-K Sparsification strategy. During upload, clients dynamically identify and upload only the Top-K entity embeddings with the greatest changes to the server. During download, the server first performs personalized embedding aggregation for each client. It then identifies and transmits the Top-K aggregated embeddings to each client. Besides, an Intermittent Synchronization Mechanism is used by FedS to mitigate the negative effect of embedding inconsistency among the shared entities of clients caused by the heterogeneity of federated knowledge graphs. Extensive experiments across three datasets showcase that FedS significantly enhances communication efficiency with negligible (or even no) performance degradation.
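A minimal sketch of the upload-side Entity-Wise Top-K selection (array shapes are hypothetical; the aggregation, download, and synchronization mechanisms are separate components):

```python
import numpy as np

def topk_changed_entities(emb_new, emb_old, k):
    """Keep only the K entity embeddings that changed most since the last
    synchronization; only these (index, vector) pairs are uploaded."""
    change = np.linalg.norm(emb_new - emb_old, axis=1)
    top = np.argsort(change)[-k:]
    return top, emb_new[top]

rng = np.random.default_rng(0)
old = rng.normal(size=(1000, 64))                  # 1000 entities, dim 64
new = old + rng.normal(scale=0.01, size=old.shape)
new[:20] += rng.normal(scale=1.0, size=(20, 64))   # a few entities change a lot

idx, payload = topk_changed_entities(new, old, k=50)
print(idx.shape, payload.shape)  # (50,) (50, 64): 5% of the full parameter set
```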
Updated: 2024-06-19 05:26:02
Categories: cs.LG,cs.AI,cs.IR
BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM
Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets. While recent studies indicate that existing offline DAP methods can directly benefit from online training samples, we highlight the need to develop specific online DAP algorithms to fully harness the power of online training. Specifically, we identify that the learned LLM should adhere to the proximity of the behavior LLM, which collects the training samples. To this end, we propose online Preference Optimization in proximity to the Behavior LLM (BPO), emphasizing the importance of constructing a proper trust region for LLM alignment. We conduct extensive experiments to validate the effectiveness and applicability of our approach by integrating it with various DAP methods, resulting in significant performance improvements across a wide range of tasks when training with the same amount of preference data. Even when only introducing one additional data collection phase, our online BPO improves its offline DAP baseline from 72.0% to 80.2% on TL;DR and from 82.2% to 89.1% on Anthropic Helpfulness in terms of win rate against human reference text.
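As a schematic illustration of the trust-region idea, here is a DPO-style preference loss in which the behavior LLM (the policy that collected the samples) plays the role of the reference model; the exact BPO objective may differ from this sketch:

```python
import torch
import torch.nn.functional as F

def behavior_anchored_preference_loss(logp_w, logp_l, logp_w_beh, logp_l_beh,
                                      beta: float = 0.1):
    """DPO-style loss with log-ratios taken against the behavior LLM, so the
    learned policy stays in the proximity of the policy that collected the
    (chosen, rejected) pairs."""
    chosen = beta * (logp_w - logp_w_beh)    # log pi/pi_beh on chosen response
    rejected = beta * (logp_l - logp_l_beh)  # log pi/pi_beh on rejected response
    return -F.logsigmoid(chosen - rejected).mean()

# Toy stand-ins for summed token log-probabilities of full responses.
lw, ll = torch.tensor([-12.0]), torch.tensor([-15.0])
lwb, llb = torch.tensor([-13.0]), torch.tensor([-14.0])
print(behavior_anchored_preference_loss(lw, ll, lwb, llb))
```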
Updated: 2024-06-19 05:25:27
Categories: cs.LG,cs.AI,cs.CL
Privacy-Preserving Logistic Regression Training on Large Datasets
Privacy-preserving machine learning is one class of cryptographic methods that aim to analyze private and sensitive data while preserving privacy, such as homomorphic logistic regression training over large encrypted data. In this paper, we propose an efficient algorithm for logistic regression training on large encrypted data using Homomorphic Encryption (HE), which is the mini-batch version of recent methods using a faster gradient variant called the $\texttt{quadratic gradient}$. It is claimed that the $\texttt{quadratic gradient}$ can integrate curve information (the Hessian matrix) into the gradient and therefore can effectively accelerate first-order gradient (descent) algorithms. We also implement the full-batch version of their method for the case where the encrypted dataset is so large that it has to be encrypted in the mini-batch manner. We compare our mini-batch algorithm with our full-batch implementation on real financial data consisting of 422,108 samples with 200 features. Given the inefficiency of HE, our results are encouraging and demonstrate that logistic regression training on large encrypted datasets is practically feasible, marking a significant milestone in our understanding.
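A plaintext (unencrypted) sketch of one reading of the quadratic-gradient idea: NAG whose gradient is rescaled elementwise by a fixed diagonal built from a Hessian bound, using the standard fact that the logistic-loss Hessian is dominated by $X^\top X/4$; all HE-specific machinery is omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nag_quadratic_gradient(X, y, iters=100, eps=1e-8):
    """NAG on the logistic loss with an elementwise-rescaled gradient.
    B_i = 1 / sum_j |H_ij| with H = X^T X / 4 gives a diagonal matrix that
    dominates the Hessian bound, so a unit step size is safe."""
    n, d = X.shape
    H = np.abs(X.T @ X) / 4.0
    B = 1.0 / (eps + H.sum(axis=1))          # fixed diagonal preconditioner
    w, v, t = np.zeros(d), np.zeros(d), 1.0
    for _ in range(iters):
        g = X.T @ (sigmoid(X @ v) - y)       # gradient at the lookahead point
        w_next = v - B * g                   # "quadratic gradient" step
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        v = w_next + ((t - 1) / t_next) * (w_next - w)
        w, t = w_next, t_next
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)
print(nag_quadratic_gradient(X, y).round(2))
```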
Updated: 2024-06-19 05:19:20
Categories: cs.CR
Combining Optimal Transport and Embedding-Based Approaches for More Expressiveness in Unsupervised Graph Alignment
Unsupervised graph alignment finds the one-to-one node correspondence between a pair of attributed graphs by exploiting only the graph structure and node features. One category of existing works first computes the node representations and then matches nodes with close embeddings, which is intuitive but lacks a clear objective tailored for graph alignment in the unsupervised setting. The other category reduces the problem to optimal transport (OT) via Gromov-Wasserstein (GW) learning with a well-defined objective but leaves ample room for exploring the design of the transport cost. We propose a principled approach to combining their advantages, motivated by a theoretical analysis of model expressiveness. Noting the limited discriminative power in separating matched and unmatched node pairs, we improve the cost design of GW learning with a feature transformation that enables feature interaction across dimensions. Besides, we propose a simple yet effective embedding-based heuristic inspired by the Weisfeiler-Lehman test and add its prior knowledge to OT for more expressiveness when handling non-Euclidean data. Moreover, we are the first to guarantee the one-to-one matching constraint by reducing the problem to maximum weight matching. The algorithm design effectively combines our OT and embedding-based predictions via stacking, an ensemble learning strategy. We propose a model framework named \texttt{CombAlign} integrating all the above modules to refine node alignment progressively. Through extensive experiments, we demonstrate significant improvements in alignment accuracy compared to state-of-the-art approaches and validate the effectiveness of the proposed modules.
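The final rounding to a guaranteed one-to-one matching reduces to maximum-weight assignment, which off-the-shelf solvers handle; a minimal sketch with a random stand-in for the fused OT/embedding score matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
scores = rng.random((6, 6))      # scores[i, j]: node i of G1 vs node j of G2

rows, cols = linear_sum_assignment(scores, maximize=True)
print(list(zip(rows.tolist(), cols.tolist())))  # one-to-one correspondence
print("total weight:", scores[rows, cols].sum())
```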
Updated: 2024-06-19 04:57:35
Categories: cs.LG,cs.AI
Neural Residual Diffusion Models for Deep Scalable Vision Generation
The most advanced diffusion models have recently adopted increasingly deep stacked networks (e.g., U-Net or Transformer) to promote the generative emergence capabilities of vision generation models, similar to large language models (LLMs). However, progressively deeper stacked networks cause numerical propagation errors and reduce noise-prediction capability on generative data, which hinders massively deep, scalable training of vision generation models. In this paper, we first show that the ability of neural networks to perform generative denoising effectively lies in the fact that the intrinsic residual unit has dynamics consistent with the input signal's reverse diffusion process, thus supporting excellent generative abilities. Afterwards, we build on two common types of deep stacked networks to propose a unified and massively scalable Neural Residual Diffusion Models framework (Neural-RDM for short), which is a simple yet meaningful change to the common architecture of deep generative networks: it introduces a series of learnable gated residual parameters that conform to the generative dynamics. Experimental results on various generative tasks show that the proposed neural residual models obtain state-of-the-art scores on image and video generation benchmarks. Rigorous theoretical proofs and extensive experiments also demonstrate the advantages of this simple gated residual mechanism, consistent with dynamic modeling, in improving the fidelity and consistency of generated content and in supporting large-scale scalable training. Code is available at https://github.com/Anonymous/Neural-RDM.
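A minimal sketch of a learnable gated residual unit, x + g * F(x); the paper's blocks are full U-Net/Transformer layers rather than this toy MLP:

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Residual unit with a learnable per-channel gate on the branch output."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.gate = nn.Parameter(torch.zeros(dim))  # zero-init: starts as identity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.gate * self.f(x)

block = GatedResidualBlock(128)
x = torch.randn(4, 128)
print(block(x).shape)               # torch.Size([4, 128])
print(torch.allclose(block(x), x))  # True at initialization
```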
Updated: 2024-06-19 04:57:18
Categories: cs.CV,cs.AI
Self-Explainable Temporal Graph Networks based on Graph Information Bottleneck
Temporal Graph Neural Networks (TGNN) have the ability to capture both the graph topology and dynamic dependencies of interactions within a graph over time. There has been a growing need to explain the predictions of TGNN models due to the difficulty in identifying how past events influence their predictions. Since the explanation model for a static graph cannot be readily applied to temporal graphs due to its inability to capture temporal dependencies, recent studies proposed explanation models for temporal graphs. However, existing explanation models for temporal graphs rely on post-hoc explanations, requiring separate models for prediction and explanation, which is limited in two aspects: efficiency and accuracy of explanation. In this work, we propose a novel built-in explanation framework for temporal graphs, called Self-Explainable Temporal Graph Networks based on Graph Information Bottleneck (TGIB). TGIB provides explanations for event occurrences by introducing stochasticity in each temporal event based on the Information Bottleneck theory. Experimental results demonstrate the superiority of TGIB in terms of both the link prediction performance and explainability compared to state-of-the-art methods. This is the first work that simultaneously performs prediction and explanation for temporal graphs in an end-to-end manner.
Updated: 2024-06-19 04:55:34
Categories: cs.LG
Solving Robust MDPs through No-Regret Dynamics
Reinforcement Learning is a powerful framework for training agents to navigate different situations, but it is susceptible to changes in environmental dynamics. However, solving Markov Decision Processes that are robust to such changes is difficult due to nonconvexity and the size of action or state spaces. While most works have analyzed this problem under varying assumptions, a general and efficient theoretical analysis is still missing. We instead propose a simple framework for improving robustness by solving a minimax iterative optimization problem in which a policy player and an environmental-dynamics player play against each other. Leveraging recent results in online nonconvex learning and techniques for improving policy gradient methods, we obtain an algorithm that maximizes the robustness of the value function on the order of $\mathcal{O}\left(\frac{1}{T^{\frac{1}{2}}}\right)$, where $T$ is the number of iterations of the algorithm.
Updated: 2024-06-19 04:53:55
Categories: cs.LG
Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
Retrieval-augmented generation (RAG) enables retrieval of relevant information from an external knowledge source and allows large language models (LLMs) to answer queries over previously unseen document collections. However, it has been demonstrated that traditional RAG applications perform poorly in answering multi-hop questions, which require retrieving and reasoning over multiple elements of supporting evidence. We introduce a new method called Multi-Meta-RAG, which uses database filtering with LLM-extracted metadata to improve the RAG selection of documents relevant to the question from various sources. While database filtering is specific to a set of questions from a particular domain and format, we find that Multi-Meta-RAG greatly improves results on the MultiHop-RAG benchmark. The code is available at https://github.com/mxpoliakov/Multi-Meta-RAG.
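A hypothetical sketch of the two-stage idea; the helper names and metadata schema here are illustrative, not the repository's actual API:

```python
import numpy as np

docs = [  # chunks indexed with LLM-extracted metadata alongside embeddings
    {"text": "...", "source": "TechCrunch", "emb": np.random.rand(8)},
    {"text": "...", "source": "BBC",        "emb": np.random.rand(8)},
    {"text": "...", "source": "TechCrunch", "emb": np.random.rand(8)},
]

def extract_metadata(question: str) -> dict:
    """Stand-in for an LLM call that maps a multi-hop question to a filter,
    e.g., which news sources the question is about."""
    return {"source": "TechCrunch"}

def filtered_topk(question_emb, meta, k=2):
    pool = [d for d in docs if d["source"] == meta["source"]]  # DB-side filter
    pool.sort(key=lambda d: -float(d["emb"] @ question_emb))   # then rank
    return pool[:k]

hits = filtered_topk(np.random.rand(8), extract_metadata("..."))
print([h["source"] for h in hits])  # only matching-source chunks survive
```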
Updated: 2024-06-19 04:53:48
Categories: cs.CL,cs.AI,cs.DB
The Uli Dataset: An Exercise in Experience Led Annotation of oGBV
Online gender based violence has grown concomitantly with adoption of the internet and social media. Its effects are worse in the Global majority where many users use social media in languages other than English. The scale and volume of conversations on the internet has necessitated the need for automated detection of hate speech, and more specifically gendered abuse. There is, however, a lack of language specific and contextual data to build such automated tools. In this paper we present a dataset on gendered abuse in three languages- Hindi, Tamil and Indian English. The dataset comprises of tweets annotated along three questions pertaining to the experience of gender abuse, by experts who identify as women or a member of the LGBTQIA community in South Asia. Through this dataset we demonstrate a participatory approach to creating datasets that drive AI systems.
Updated: 2024-06-19 04:50:25
Categories: cs.CL,cs.AI,cs.SI
SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation
Diffusion models based on permutation-equivariant networks can learn permutation-invariant distributions for graph data. However, in comparison to their non-invariant counterparts, we have found that these invariant models encounter greater learning challenges since 1) their effective target distributions exhibit more modes; 2) their optimal one-step denoising scores are the score functions of Gaussian mixtures with more components. Motivated by this analysis, we propose a non-invariant diffusion model, called $\textit{SwinGNN}$, which employs an efficient edge-to-edge 2-WL message passing network and utilizes shifted window based self-attention inspired by SwinTransformers. Further, through systematic ablations, we identify several critical training and sampling techniques that significantly improve the sample quality of graph generation. At last, we introduce a simple post-processing trick, $\textit{i.e.}$, randomly permuting the generated graphs, which provably converts any graph generative model to a permutation-invariant one. Extensive experiments on synthetic and real-world protein and molecule datasets show that our SwinGNN achieves state-of-the-art performances. Our code is released at https://github.com/qiyan98/SwinGNN.
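The post-processing trick is simple enough to state in a few lines: relabel each generated adjacency matrix with a uniformly random permutation, i.e., return $PAP^\top$:

```python
import numpy as np

def randomly_permute(adj: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Relabel a generated graph with a uniform random permutation P,
    returning P A P^T; applied to any generator's samples, this makes the
    output distribution permutation-invariant."""
    perm = rng.permutation(adj.shape[0])
    return adj[np.ix_(perm, perm)]

rng = np.random.default_rng(0)
A = np.triu((rng.random((5, 5)) > 0.6).astype(int), 1)
A = A + A.T                      # a symmetric sample from some graph generator
print(randomly_permute(A, rng))
```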
Updated: 2024-06-19 04:48:13
Categories: cs.LG,cs.AI
Surgical Triplet Recognition via Diffusion Model
Surgical triplet recognition is an essential building block to enable next-generation context-aware operating rooms. The goal is to identify the combinations of instruments, verbs, and targets presented in surgical video frames. In this paper, we propose DiffTriplet, a new generative framework for surgical triplet recognition employing the diffusion model, which predicts surgical triplets via iterative denoising. To handle the challenge of triplet association, two unique designs are proposed in our diffusion framework, i.e., association learning and association guidance. During training, we optimize the model in the joint space of triplets and individual components to capture the dependencies among them. At inference, we integrate association constraints into each update of the iterative denoising process, which refines the triplet prediction using the information of individual components. Experiments on the CholecT45 and CholecT50 datasets show the superiority of the proposed method in achieving a new state-of-the-art performance for surgical triplet recognition. Our codes will be released.
Updated: 2024-06-19 04:43:41
Categories: cs.CV,cs.AI
PowerPeeler: A Precise and General Dynamic Deobfuscation Method for PowerShell Scripts
PowerShell is a powerful and versatile task automation tool. Unfortunately, it is also widely abused by cyber attackers. To bypass malware detection and hinder threat analysis, attackers often employ diverse techniques to obfuscate malicious PowerShell scripts. Existing deobfuscation tools suffer from the limitation of static analysis, which fails to simulate the real deobfuscation process accurately. In this paper, we propose PowerPeeler. To the best of our knowledge, it is the first dynamic PowerShell script deobfuscation approach at the instruction level. It utilizes expression-related Abstract Syntax Tree (AST) nodes to identify potential obfuscated script pieces. Then, PowerPeeler correlates the AST nodes with their corresponding instructions and monitors the script's entire execution process. Subsequently, PowerPeeler dynamically tracks the execution of these instructions and records their execution results. Finally, PowerPeeler stringifies these results to replace the corresponding obfuscated script pieces and reconstruct the deobfuscated script. To evaluate the effectiveness of PowerPeeler, we collect 1,736,669 real-world malicious PowerShell samples with diversity obfuscation methods. We compare PowerPeeler with five state-of-the-art deobfuscation tools and GPT-4. The evaluation results demonstrate that PowerPeeler can effectively handle all well-known obfuscation methods. Additionally, the deobfuscation correctness rate of PowerPeeler reaches 95%, significantly surpassing that of other tools. PowerPeeler not only recovers the highest amount of sensitive data but also maintains a semantic consistency over 97%, which is also the best. Moreover, PowerPeeler effectively obtains the largest quantity of valid deobfuscated results within a limited time frame. Furthermore, PowerPeeler is extendable and can be used as a helpful tool for other cyber security solutions.
Updated: 2024-06-19 04:42:55
Categories: cs.CR,cs.SE
Toward Structure Fairness in Dynamic Graph Embedding: A Trend-aware Dual Debiasing Approach
Recent studies have successfully learned static graph embeddings that are structurally fair by preventing an effectiveness disparity between high- and low-degree vertex groups in downstream graph mining tasks. However, achieving structure fairness in dynamic graph embedding remains an open problem. Neglecting degree changes in dynamic graphs will significantly impair embedding effectiveness without notably improving structure fairness. This is because the embedding performance of high-degree and low-to-high-degree vertices drops toward the generally poorer embedding performance of the many slightly changed vertices in the long-tail part of the power-law distribution. We first identify biased structural evolutions in a dynamic graph based on the evolving trend of vertex degree and then propose FairDGE, the first structurally Fair Dynamic Graph Embedding algorithm. FairDGE learns biased structural evolutions by jointly embedding the connection changes among vertices and the long-short-term evolutionary trend of vertex degrees. Furthermore, a novel dual debiasing approach is devised to encode fair embeddings contrastively, customizing debiasing strategies for different biased structural evolutions. This innovative debiasing strategy breaks the effectiveness bottleneck of embeddings without notable fairness loss. Extensive experiments demonstrate that FairDGE simultaneously improves the effectiveness and fairness of embeddings.
Updated: 2024-06-19 04:20:12
Categories: cs.LG,cs.SI
RobGC: Towards Robust Graph Condensation
Graph neural networks (GNNs) have attracted widespread attention for their impressive capability of graph representation learning. However, the increasing prevalence of large-scale graphs presents a significant challenge for GNN training due to their computational demands, limiting the applicability of GNNs in various scenarios. In response to this challenge, graph condensation (GC) is proposed as a promising acceleration solution, focusing on generating an informative compact graph that enables efficient training of GNNs while retaining performance. Despite the potential to accelerate GNN training, existing GC methods overlook the quality of large training graphs during both the training and inference stages. They indiscriminately emulate the training graph distributions, making the condensed graphs susceptible to noises within the training graph and significantly impeding the application of GC in intricate real-world scenarios. To address this issue, we propose robust graph condensation (RobGC), a plug-and-play approach for GC to extend the robustness and applicability of condensed graphs in noisy graph structure environments. Specifically, RobGC leverages the condensed graph as a feedback signal to guide the denoising process on the original training graph. A label propagation-based alternating optimization strategy is in place for the condensation and denoising processes, contributing to the mutual purification of the condensed graph and training graph. Additionally, as a GC method designed for inductive graph inference, RobGC facilitates test-time graph denoising by leveraging the noise-free condensed graph to calibrate the structure of the test graph. Extensive experiments show that RobGC is compatible with various GC methods, significantly boosting their robustness under different types and levels of graph structural noises.
Updated: 2024-06-19 04:14:57
Categories: cs.LG
PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes
Multimodal Large Language Models (MLLMs) have seen growing adoption across various scientific disciplines. These advancements encourage the investigation of molecule-text modeling within synthetic chemistry, a field dedicated to designing and conducting chemical reactions to synthesize new compounds with desired properties and applications. Current approaches, however, often neglect the critical role of multiple molecule graph interaction in understanding chemical reactions, leading to suboptimal performance in synthetic chemistry tasks. This study introduces PRESTO(Progressive Pretraining Enhances Synthetic Chemistry Outcomes), a new framework that bridges the molecule-text modality gap by integrating a comprehensive benchmark of pretraining strategies and dataset configurations. It progressively improves multimodal LLMs through cross-modal alignment and multi-graph understanding. Our extensive experiments demonstrate that PRESTO offers competitive results in downstream synthetic chemistry tasks. The code can be found at https://github.com/IDEA-XL/PRESTO.
Updated: 2024-06-19 03:59:46
Categories: cs.LG,cs.AI,cs.CL,physics.chem-ph
DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs
Dynamic text-attributed graphs (DyTAGs) are prevalent in various real-world scenarios, where each node and edge are associated with text descriptions, and both the graph structure and text descriptions evolve over time. Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs, which hinders the potential advancement in many research fields. To address this gap, we introduce Dynamic Text-attributed Graph Benchmark (DTGB), a collection of large-scale, time-evolving graphs from diverse domains, with nodes and edges enriched by dynamically changing text attributes and categories. To facilitate the use of DTGB, we design standardized evaluation procedures based on four real-world use cases: future link prediction, destination node retrieval, edge classification, and textual relation generation. These tasks require models to understand both dynamic graph structures and natural language, highlighting the unique challenges posed by DyTAGs. Moreover, we conduct extensive benchmark experiments on DTGB, evaluating 7 popular dynamic graph learning algorithms and their variants of adapting to text attributes with LLM embeddings, along with 6 powerful large language models (LLMs). Our results show the limitations of existing models in handling DyTAGs. Our analysis also demonstrates the utility of DTGB in investigating the incorporation of structural and textual dynamics. The proposed DTGB fosters research on DyTAGs and their broad applications. It offers a comprehensive benchmark for evaluating and advancing models to handle the interplay between dynamic graph structures and natural language. The dataset and source code are available at https://github.com/zjs123/DTGB.
Updated: 2024-06-19 03:58:35
Categories: cs.AI,cs.LG
Computing in the Life Sciences: From Early Algorithms to Modern AI
Computing in the life sciences has undergone a transformative evolution, from early computational models in the 1950s to the applications of artificial intelligence (AI) and machine learning (ML) seen today. This paper highlights key milestones and technological advancements through the historical development of computing in the life sciences. The discussion includes the inception of computational models for biological processes, the advent of bioinformatics tools, and the integration of AI/ML in modern life sciences research. Attention is given to AI-enabled tools used in the life sciences, such as scientific large language models and bio-AI tools, examining their capabilities, limitations, and impact to biological risk. This paper seeks to clarify and establish essential terminology and concepts to ensure informed decision-making and effective communication across disciplines.
Updated: 2024-06-19 03:54:28
Categories: q-bio.OT,cs.AI
COVID-19 Detection System: A Comparative Analysis of System Performance Based on Acoustic Features of Cough Audio Signals
A wide range of respiratory diseases, such as the common cold, flu, asthma, and COVID-19, affect people's daily lives worldwide. In medical practice, respiratory sounds are widely used to diagnose various respiratory illnesses and lung disorders. The traditional diagnosis of such sounds requires specialized knowledge, which can be costly and reliant on human expertise. Despite this, recent advancements, such as cough audio recordings, have emerged as a means to automate the detection of respiratory conditions. Therefore, this research aims to explore various acoustic features that enhance the performance of machine learning (ML) models in detecting COVID-19 from cough signals. It investigates the efficacy of three feature extraction techniques, including Mel Frequency Cepstral Coefficients (MFCC), Chroma, and Spectral Contrast features, when applied to two machine learning algorithms, Support Vector Machine (SVM) and Multilayer Perceptron (MLP), and on this basis proposes an efficient CovCepNet detection system. The proposed system provides a practical solution and demonstrates state-of-the-art classification performance, with an AUC of 0.843 on the COUGHVID dataset and 0.953 on the Virufy dataset for COVID-19 detection from cough audio signals.
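A minimal sketch of the three feature families with librosa, pooled over time into one vector per recording; file paths and the downstream classifier are placeholders:

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def acoustic_features(path: str) -> np.ndarray:
    """Concatenate time-averaged MFCC, Chroma, and Spectral Contrast features,
    the three families compared in the paper."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    return np.concatenate([f.mean(axis=1) for f in (mfcc, chroma, contrast)])

# Hypothetical usage over a labeled cough-audio collection:
#   X = np.stack([acoustic_features(p) for p in cough_paths])
#   clf = SVC(probability=True).fit(X, labels)
```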
Updated: 2024-06-19 03:51:04
Categories: cs.SD,cs.LG,eess.AS
FinBen: A Holistic Financial Benchmark for Large Language Models
LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading. Our evaluation of 15 representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals several key findings: While LLMs excel in IE and textual analysis, they struggle with advanced reasoning and complex tasks like text generation and forecasting. GPT-4 excels in IE and stock trading, while Gemini is better at text generation and forecasting. Instruction-tuned LLMs improve textual analysis but offer limited benefits for complex tasks such as QA. FinBen has been used to host the first financial LLMs shared task at the FinNLP-AgentScen workshop during IJCAI-2024, attracting 12 teams. Their novel solutions outperformed GPT-4, showcasing FinBen's potential to drive innovation in financial LLMs. All datasets, results, and codes are released for the research community: https://github.com/The-FinAI/PIXIU.
Updated: 2024-06-19 03:38:56
Categories: cs.CL,cs.AI,cs.CE
Synthetic Context Generation for Question Generation
Despite rapid advancements in large language models (LLMs), question generation (QG) remains a challenging problem due to its complicated process, open-ended nature, and the diverse settings in which question generation occurs. A common approach to address these challenges involves fine-tuning smaller, custom models using datasets containing background context, question, and answer. However, obtaining suitable domain-specific datasets with appropriate context is often more difficult than acquiring question-answer pairs. In this paper, we investigate training QG models using synthetic contexts generated by LLMs from readily available question-answer pairs. We conduct a comprehensive study to answer critical research questions related to the performance of models trained on synthetic contexts and their potential impact on QG research and applications. Our empirical results reveal: 1) contexts are essential for QG tasks, even if they are synthetic; 2) fine-tuning smaller language models can achieve better performance than prompting larger language models; and 3) synthetic contexts and real contexts can achieve comparable performance. These findings highlight the effectiveness of synthetic contexts in QG and pave the way for future advancements in the field.
Updated: 2024-06-19 03:37:52
Categories: cs.CL,cs.LG
Boosting Consistency in Dual Training for Long-Tailed Semi-Supervised Learning
While long-tailed semi-supervised learning (LTSSL) has received tremendous attention in many real-world classification problems, existing LTSSL algorithms typically assume that the class distributions of labeled and unlabeled data are almost identical. LTSSL algorithms built upon this assumption can suffer severely when the class distributions of labeled and unlabeled data are mismatched, since they utilize biased pseudo-labels from the model. To alleviate this problem, we propose a new simple method that can effectively utilize unlabeled data from unknown class distributions through Boosting cOnsistency in duAl Training (BOAT). Specifically, we construct the standard and balanced branches to ensure the performance of the head and tail classes, respectively. Throughout the training process, the two branches incrementally converge and interact with each other, eventually resulting in commendable performance across all classes. Despite its simplicity, we show that BOAT achieves state-of-the-art performance on a variety of standard LTSSL benchmarks, e.g., an average 2.7% absolute increase in test accuracy over existing algorithms when the class distributions of labeled and unlabeled data are mismatched. Even when the class distributions are identical, BOAT consistently outperforms many sophisticated LTSSL algorithms. We carry out extensive ablation studies to tease apart the factors that are most important to the success of BOAT. The source code is available at https://github.com/Gank0078/BOAT.
Updated: 2024-06-19 03:35:26
Categories: cs.LG
MAiDE-up: Multilingual Deception Detection of GPT-generated Hotel Reviews
Deceptive reviews are becoming increasingly common, especially given the increase in performance and the prevalence of LLMs. While work to date has addressed the development of models to differentiate between truthful and deceptive human reviews, much less is known about the distinction between real reviews and AI-authored fake reviews. Moreover, most of the research so far has focused primarily on English, with very little work dedicated to other languages. In this paper, we compile and make publicly available the MAiDE-up dataset, consisting of 10,000 real and 10,000 AI-generated fake hotel reviews, balanced across ten languages. Using this dataset, we conduct extensive linguistic analyses to (1) compare the AI fake hotel reviews to real hotel reviews, and (2) identify the factors that influence the deception detection model performance. We explore the effectiveness of several models for deception detection in hotel reviews across three main dimensions: sentiment, location, and language. We find that these dimensions influence how well we can detect AI-generated fake reviews.
Updated: 2024-06-19 03:34:42
Categories: cs.CL,cs.AI
A Federated Learning Approach for Multi-stage Threat Analysis in Advanced Persistent Threat Campaigns
Multi-stage threats like advanced persistent threats (APT) pose severe risks by stealing data and destroying infrastructure, with detection being challenging. APTs use novel attack vectors and evade signature-based detection by obfuscating their network presence, often going unnoticed due to their novelty. Although machine learning models offer high accuracy, they still struggle to identify true APT behavior, overwhelming analysts with excessive data. Effective detection requires training on multiple datasets from various clients, which introduces privacy issues under regulations like GDPR. To address these challenges, this paper proposes a novel 3-phase unsupervised federated learning (FL) framework to detect APTs. It identifies unique log event types, extracts suspicious patterns from related log events, and orders them by complexity and frequency. The framework ensures privacy through a federated approach and enhances security using Paillier's partial homomorphic encryption. Tested on the SoTM 34 dataset, our framework compares favorably against traditional methods, demonstrating efficient pattern extraction and analysis from log files, reducing analyst workload, and maintaining stringent data privacy. This approach addresses significant gaps in current methodologies, offering a robust solution to APT detection in compliance with privacy laws.
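As an illustration of the Paillier aggregation step, a minimal sketch using the python-paillier (`phe`) library with hypothetical per-client statistics; the pattern-extraction and ordering phases of the framework are separate:

```python
from phe import paillier  # pip install phe

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

client_counts = [17, 5, 42]   # e.g., per-client suspicious-pattern counts
ciphertexts = [public_key.encrypt(c) for c in client_counts]

encrypted_total = ciphertexts[0]
for ct in ciphertexts[1:]:
    encrypted_total = encrypted_total + ct   # additive homomorphism

print(private_key.decrypt(encrypted_total))  # 64; individual values stay hidden
```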
Updated: 2024-06-19 03:34:41
Categories: cs.CR
Communication-Efficient and Privacy-Preserving Decentralized Meta-Learning
Distributed learning, which does not require gathering training data in a central location, has become increasingly important in the big-data era. In particular, random-walk-based decentralized algorithms are flexible in that they do not need a central server trusted by all clients and do not require all clients to be active in all iterations. However, existing distributed learning algorithms assume that all learning clients share the same task. In this paper, we consider the more difficult meta-learning setting, in which different clients perform different (but related) tasks with limited training data. To reduce communication cost and allow better privacy protection, we propose LoDMeta (Local Decentralized Meta-learning) with the use of local auxiliary optimization parameters and random perturbations on the model parameter. Theoretical results are provided on both convergence and privacy analysis. Empirical results on a number of few-shot learning data sets demonstrate that LoDMeta has similar meta-learning accuracy as centralized meta-learning algorithms, but does not require gathering data from each client and is able to better protect data privacy for each client.
Updated: 2024-06-19 03:29:51
Categories: cs.LG,cs.CR,cs.DC
TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods
Time series are generated in diverse domains such as economics, traffic, health, and energy, where forecasting of future values has numerous important applications. Not surprisingly, many forecasting methods are being proposed. To ensure progress, it is essential to be able to study and compare such methods empirically in a comprehensive and reliable manner. To achieve this, we propose TFB, an automated benchmark for Time Series Forecasting (TSF) methods. TFB advances the state-of-the-art by addressing shortcomings related to datasets, comparison methods, and evaluation pipelines: 1) insufficient coverage of data domains, 2) stereotype bias against traditional methods, and 3) inconsistent and inflexible pipelines. To achieve better domain coverage, we include datasets from 10 different domains: traffic, electricity, energy, the environment, nature, economics, stock markets, banking, health, and the web. We also provide a time series characterization to ensure that the selected datasets are comprehensive. To remove biases against some methods, we include a diverse range of methods, including statistical learning, machine learning, and deep learning methods, and we also support a variety of evaluation strategies and metrics to ensure a more comprehensive evaluation of different methods. To support the integration of different methods into the benchmark and enable fair comparisons, TFB features a flexible and scalable pipeline that eliminates biases. Next, we employ TFB to perform a thorough evaluation of 21 Univariate Time Series Forecasting (UTSF) methods on 8,068 univariate time series and 14 Multivariate Time Series Forecasting (MTSF) methods on 25 datasets. The benchmark code and data are available at https://github.com/decisionintelligence/TFB.
Updated: 2024-06-19 03:29:46
Categories: cs.LG,cs.AI,cs.CY
Cross-cultural Inspiration Detection and Analysis in Real and LLM-generated Social Media Data
Inspiration is linked to various positive outcomes, such as increased creativity, productivity, and happiness. Although inspiration has great potential, there has been limited effort toward identifying content that is inspiring, as opposed to just engaging or positive. Additionally, most research has concentrated on Western data, with little attention paid to other cultures. This work is the first to study cross-cultural inspiration through machine learning methods. We aim to identify and analyze real and AI-generated cross-cultural inspiring posts. To this end, we compile and make publicly available the InspAIred dataset, which consists of 2,000 real inspiring posts, 2,000 real non-inspiring posts, and 2,000 generated inspiring posts evenly distributed across India and the UK. The real posts are sourced from Reddit, while the generated posts are created using the GPT-4 model. Using this dataset, we conduct extensive computational linguistic analyses to (1) compare inspiring content across cultures, (2) compare AI-generated inspiring posts to real inspiring posts, and (3) determine if detection models can accurately distinguish between inspiring content across cultures and data sources.
Updated: 2024-06-19 03:27:43
Fields: cs.CL,cs.AI
Global-Local Convolution with Spiking Neural Networks for Energy-efficient Keyword Spotting
Thanks to Deep Neural Networks (DNNs), Keyword Spotting (KWS) accuracy has improved substantially. However, as KWS systems are usually implemented on edge devices, energy efficiency becomes a critical requirement besides performance. Here, we take advantage of spiking neural networks' energy efficiency and propose an end-to-end lightweight KWS model. The model consists of two innovative modules: 1) a Global-Local Spiking Convolution (GLSC) module and 2) a Bottleneck-PLIF module. Compared to hand-crafted feature extraction methods, the GLSC module achieves speech feature extraction that is sparser and more energy-efficient, and yields better performance. The Bottleneck-PLIF module further processes the signals from GLSC with the aim of achieving higher accuracy with fewer parameters. Extensive experiments are conducted on the Google Speech Commands Dataset (V1 and V2). The results show that our method achieves competitive performance among SNN-based KWS models with fewer parameters.
Updated: 2024-06-19 03:19:25
Fields: cs.SD,cs.AI,cs.NE,eess.AS
Transferable Watermarking to Self-supervised Pre-trained Graph Encoders by Trigger Embeddings
Recent years have witnessed the rapid development of Graph Self-supervised Learning (GSSL), which enables the pre-training of transferable foundation graph encoders. However, the easy-to-plug-in nature of such encoders makes them vulnerable to copyright infringement. To address this issue, we develop a novel watermarking framework to protect graph encoders in GSSL settings. The key idea is to force the encoder to map a set of specially crafted trigger instances into a unique compact cluster in the output embedding space during model pre-training. Consequently, when the encoder is stolen and concatenated with any downstream classifier, the resulting model inherits the backdoor of the encoder and predicts the trigger instances to be in a single category with high probability regardless of the ground truth. Experimental results show that the embedded watermark can be transferred to various downstream tasks in black-box settings, including node classification, link prediction and community detection, which forms a reliable watermark verification system for GSSL in practice. This approach also shows satisfactory performance in terms of model fidelity, reliability and robustness.
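As a rough illustration of the key idea (hypothetical names, not the paper's code), the watermark can be expressed as an auxiliary loss that pulls the embeddings of the crafted trigger instances into a single compact cluster during pre-training:

    import torch

    def watermark_compactness_loss(trigger_emb, weight=1.0):
        # trigger_emb: (n_triggers, d) encoder outputs for the trigger instances.
        center = trigger_emb.mean(dim=0, keepdim=True)
        # Mean squared distance to the cluster center; minimizing it collapses
        # the trigger embeddings into one compact cluster.
        return weight * ((trigger_emb - center) ** 2).sum(dim=1).mean()

    emb = torch.randn(16, 64, requires_grad=True)   # stand-in encoder outputs
    loss = watermark_compactness_loss(emb)           # added to the GSSL objective
    loss.backward()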
Updated: 2024-06-19 03:16:11
Fields: cs.CR
Sparse High Rank Adapters
Low Rank Adaptation (LoRA) has gained massive attention in the recent generative AI research. One of the main advantages of LoRA is its ability to be fused with pretrained models adding no overhead during inference. However, from a mobile deployment standpoint, we can either avoid inference overhead in the fused mode but lose the ability to switch adapters rapidly, or suffer significant (up to 30% higher) inference latency while enabling rapid switching in the unfused mode. LoRA also exhibits concept-loss when multiple adapters are used concurrently. In this paper, we propose Sparse High Rank Adapters (SHiRA), a new paradigm which incurs no inference overhead, enables rapid switching, and significantly reduces concept-loss. Specifically, SHiRA can be trained by directly tuning only 1-2% of the base model weights while leaving others unchanged. This results in a highly sparse adapter which can be switched directly in the fused mode. We further provide theoretical and empirical insights on how high sparsity in SHiRA can aid multi-adapter fusion by reducing concept loss. Our extensive experiments on LVMs and LLMs demonstrate that finetuning only a small fraction of the parameters in the base model is sufficient for many tasks while enabling both rapid switching and multi-adapter fusion. Finally, we provide a latency- and memory-efficient SHiRA implementation based on Parameter-Efficient Finetuning (PEFT) Library. This implementation trains at nearly the same speed as LoRA while consuming lower peak GPU memory, thus making SHiRA easy to adopt for practical use cases.
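A minimal sketch of the core mechanism, assuming a random choice of trainable entries (the paper may select the sparse set differently): only about 1% of a base weight tensor receives gradient updates, so the resulting adapter is a sparse delta that can be fused or swapped directly.

    import torch

    def make_sparse_trainable(weight, frac=0.01, seed=0):
        g = torch.Generator().manual_seed(seed)
        mask = (torch.rand(weight.shape, generator=g) < frac).float()
        weight.requires_grad_(True)
        # Zero out gradients outside the mask so only ~frac of entries ever move.
        weight.register_hook(lambda grad: grad * mask)
        return mask

    w = torch.randn(512, 512)
    mask = make_sparse_trainable(w)
    loss = (w ** 2).sum()
    loss.backward()
    print("fraction updated:", (w.grad != 0).float().mean().item())  # ~0.01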
Updated: 2024-06-19 03:13:11
Fields: cs.LG,cs.AI
Biomedical Visual Instruction Tuning with Clinician Preference Alignment
Recent advancements in multimodal foundation models have showcased impressive capabilities in understanding and reasoning with visual and textual information. Adapting these foundation models trained for general usage to specialized domains like biomedicine requires large-scale domain-specific instruction datasets. While existing works have explored curating such datasets automatically, the resultant datasets are not explicitly aligned with domain expertise. In this work, we propose a data-centric framework, Biomedical Visual Instruction Tuning with Clinician Preference Alignment (BioMed-VITAL), that incorporates clinician preferences into both stages of generating and selecting instruction data for tuning biomedical multimodal foundation models. First, during the generation stage, we prompt the GPT-4V generator with a diverse set of clinician-selected demonstrations for preference-aligned data candidate generation. Then, during the selection phase, we train a separate selection model, which explicitly distills clinician and policy-guided model preferences into a rating function to select high-quality data for medical instruction tuning. Results show that the model tuned with the instruction-following data from our method demonstrates a significant improvement in open visual chat (18.5% relatively) and medical VQA (win rate up to 81.73%). Our instruction-following data and models are available at BioMed-VITAL.github.io.
Updated: 2024-06-19 03:07:33
Fields: cs.CV,cs.AI,cs.CL,cs.LG,68T50, 68T45, 68T37, 68T05, 68T07, 68T09,I.2.7; I.2.6; I.2.10
Poisoning Prevention in Federated Learning and Differential Privacy via Stateful Proofs of Execution
The rise in IoT-driven distributed data analytics, coupled with increasing privacy concerns, has led to a demand for effective privacy-preserving and federated data collection/model training mechanisms. In response, approaches such as Federated Learning (FL) and Local Differential Privacy (LDP) have been proposed and attracted much attention over the past few years. However, they still share the common limitation of being vulnerable to poisoning attacks wherein adversaries compromising edge devices feed forged (a.k.a. poisoned) data to aggregation back-ends, undermining the integrity of FL/LDP results. In this work, we propose a system-level approach to remedy this issue based on a novel security notion of Proofs of Stateful Execution (PoSX) for IoT/embedded devices' software. To realize the PoSX concept, we design SLAPP: a System-Level Approach for Poisoning Prevention. SLAPP leverages commodity security features of embedded devices - in particular ARM TrustZone-M security extensions - to verifiably bind raw sensed data to their correct usage as part of FL/LDP edge device routines. As a consequence, it offers robust security guarantees against poisoning. Our evaluation, based on real-world prototypes featuring multiple cryptographic primitives and data collection schemes, showcases SLAPP's security and low overhead.
Updated: 2024-06-19 03:01:31
Fields: cs.CR
Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style
Large Language Models (LLMs) inherently use autoregressive decoding, which lacks parallelism in inference and results in significantly slow inference speeds, especially when hardware parallel accelerators and memory bandwidth are not fully utilized. In this work, we propose Amphista, a speculative decoding algorithm that adheres to a non-autoregressive decoding paradigm. Owing to the increased parallelism, our method demonstrates higher efficiency in inference compared to autoregressive methods. Specifically, Amphista models an Auto-embedding Block capable of parallel inference, incorporating bi-directional attention to enable interaction between different drafting heads. Additionally, Amphista implements Staged Adaptation Layers to facilitate the transition of semantic information from the base model's autoregressive inference to the drafting heads' non-autoregressive speculation, thereby achieving paradigm transformation and feature fusion. We conduct a series of experiments on a suite of Vicuna models using MT-Bench and Spec-Bench. For the Vicuna 33B model, Amphista achieves up to 2.75$\times$ and 1.40$\times$ wall-clock acceleration compared to vanilla autoregressive decoding and Medusa, respectively, while preserving lossless generation quality.
Updated: 2024-06-19 02:53:39
Fields: cs.AI,cs.CL
A Survey on Image-text Multimodal Models
With the significant advancements of Large Language Models (LLMs) in the field of Natural Language Processing (NLP), the development of image-text multimodal models has garnered widespread attention. Current surveys on image-text multimodal models mainly focus on representative models or application domains, but lack a review on how general technical models influence the development of domain-specific models, which is crucial for domain researchers. Based on this, this paper first reviews the technological evolution of image-text multimodal models, from early explorations of feature space to visual language encoding structures, and then to the latest large model architectures. Next, from the perspective of technological evolution, we explain how the development of general image-text multimodal technologies promotes the progress of multimodal technologies in the biomedical field, as well as the importance and complexity of specific datasets in the biomedical domain. Then, centered on the tasks of image-text multimodal models, we analyze their common components and challenges. After that, we summarize the architecture, components, and data of general image-text multimodal models, and introduce the applications and improvements of image-text multimodal models in the biomedical field. Finally, we categorize the challenges faced in the development and application of general models into external factors and intrinsic factors, further refining them into 2 external factors and 5 intrinsic factors, and propose targeted solutions, providing guidance for future research directions. For more details and data, please visit our GitHub page: \url{https://github.com/i2vec/A-survey-on-image-text-multimodal-models}.
Updated: 2024-06-19 02:53:38
Fields: cs.CL,cs.AI,cs.MM
DiLA: Enhancing LLM Tool Learning with Differential Logic Layer
Considering the challenges faced by large language models (LLMs) in logical reasoning and planning, prior efforts have sought to augment LLMs with access to external solvers. While progress has been made on simple reasoning problems, solving classical constraint satisfaction problems, such as the Boolean Satisfiability Problem (SAT) and Graph Coloring Problem (GCP), remains difficult for off-the-shelf solvers due to their intricate expressions and exponential search spaces. In this paper, we propose a novel differential logic layer-aided language modeling (DiLA) approach, where logical constraints are integrated into the forward and backward passes of a network layer, to provide another option for LLM tool learning. In DiLA, LLM aims to transform the language description to logic constraints and identify initial solutions of the highest quality, while the differential logic layer focuses on iteratively refining the LLM-prompted solution. Leveraging the logic layer as a bridge, DiLA enhances the logical reasoning ability of LLMs on a range of reasoning problems encoded by Boolean variables, guaranteeing the efficiency and correctness of the solution process. We evaluate the performance of DiLA on two classic reasoning problems and empirically demonstrate its consistent outperformance against existing prompt-based and solver-aided approaches.
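The paper's layer is not reproduced here, but the general idea of making logical constraints differentiable can be sketched with a smooth CNF relaxation (a toy stand-in, not DiLA itself), where gradient descent refines an initial assignment toward satisfying the formula:

    import torch

    clauses = [[1, 2], [-1, 3], [-2, -3]]         # CNF over x1..x3; sign = polarity

    def cnf_loss(p):                               # p[i] ~ probability x_{i+1} is True
        loss = 0.0
        for clause in clauses:
            unsat = 1.0
            for lit in clause:
                v = p[abs(lit) - 1]
                unsat = unsat * ((1 - v) if lit > 0 else v)
            loss = loss + unsat                    # term is 0 iff some literal holds
        return loss

    torch.manual_seed(0)
    logits = (0.1 * torch.randn(3)).requires_grad_()   # e.g. an LLM-proposed start
    opt = torch.optim.Adam([logits], lr=0.2)
    for _ in range(200):
        opt.zero_grad()
        loss = cnf_loss(torch.sigmoid(logits))
        loss.backward()
        opt.step()
    print(loss.item(), torch.sigmoid(logits).round())  # near-zero loss => satisfying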
Updated: 2024-06-19 02:52:00
Fields: cs.CL,cs.AI
SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree
Gradient boosting decision tree (GBDT) is an ensemble machine learning algorithm that is widely used in industry due to its good performance and easy interpretation. Because of data isolation and privacy requirements, many works try to use vertical federated learning to train machine learning models collaboratively, with privacy guarantees, between different data owners. SecureBoost is one of the most popular vertical federated learning algorithms for GBDT. However, in order to achieve privacy preservation, SecureBoost involves complex training procedures and time-consuming cryptography operations. This makes SecureBoost slow to train and unable to scale to large-scale data. In this work, we propose SecureBoost+, a large-scale and high-performance vertical federated gradient boosting decision tree framework. SecureBoost+ is secure in the semi-honest model, the same as SecureBoost, and can be scaled up to tens of millions of data samples easily. SecureBoost+ achieves high performance through several novel optimizations for SecureBoost, including ciphertext operation optimization, the introduction of new training mechanisms, and multi-classification training optimization. The experimental results show that SecureBoost+ is 6-35x faster than SecureBoost with the same accuracy, and can be scaled up to tens of millions of data samples and thousands of feature dimensions.
Updated: 2024-06-19 02:45:59
Fields: cs.LG,cs.AI
Enhancing supply chain security with automated machine learning
This study tackles the complexities of global supply chains, which are increasingly vulnerable to disruptions caused by port congestion, material shortages, and inflation. To address these challenges, we explore the application of machine learning methods, which excel in predicting and optimizing solutions based on large datasets. Our focus is on enhancing supply chain security through fraud detection, maintenance prediction, and material backorder forecasting. We introduce an automated machine learning framework that streamlines data analysis, model construction, and hyperparameter optimization for these tasks. By automating these processes, our framework improves the efficiency and effectiveness of supply chain security measures. Our research identifies key factors that influence machine learning performance, including sampling methods, categorical encoding, feature selection, and hyperparameter optimization. We demonstrate the importance of considering these factors when applying machine learning to supply chain challenges. Traditional mathematical programming models often struggle to cope with the complexity of large-scale supply chain problems. Our study shows that machine learning methods can provide a viable alternative, particularly when dealing with extensive datasets and complex patterns. The automated machine learning framework presented in this study offers a novel approach to supply chain security, contributing to the existing body of knowledge in the field. Its comprehensive automation of machine learning processes makes it a valuable contribution to the domain of supply chain management.
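For flavor, a minimal scikit-learn sketch of the kind of automation involved (illustrative only; the paper's framework also automates sampling, encoding, and feature-selection choices), combining pipeline construction with hyperparameter search for a backorder-style classification task:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Imbalanced synthetic data, loosely mimicking material-backorder labels.
    X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9],
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    pipe = Pipeline([("scale", StandardScaler()),
                     ("clf", RandomForestClassifier(random_state=0))])
    search = GridSearchCV(pipe,
                          {"clf__n_estimators": [100, 300],
                           "clf__max_depth": [None, 10]},
                          scoring="f1", cv=3)
    search.fit(X_tr, y_tr)
    print(search.best_params_, search.score(X_te, y_te))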
Updated: 2024-06-19 02:45:32
Fields: cs.LG,econ.GN,math.OC,q-fin.EC
Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification
Patent classification aims to assign multiple International Patent Classification (IPC) codes to a given patent. Recent methods for automatically classifying patents mainly focus on analyzing the text descriptions of patents. However, apart from the texts, each patent is also associated with some assignees, and the knowledge of their applied patents is often valuable for classification. Furthermore, the hierarchical taxonomy formulated by the IPC system provides important contextual information and enables models to leverage the correlations between IPC codes for more accurate classification. However, existing methods fail to incorporate the above aspects. In this paper, we propose an integrated framework that comprehensively considers the information on patents for patent classification. To be specific, we first present an IPC codes correlations learning module to derive their semantic representations via adaptively passing and aggregating messages within the same level and across different levels along the hierarchical taxonomy. Moreover, we design a historical application patterns learning component to incorporate the corresponding assignee's previous patents by a dual channel aggregation mechanism. Finally, we combine the contextual information of patent texts that contains the semantics of IPC codes, and assignees' sequential preferences to make predictions. Experiments on real-world datasets demonstrate the superiority of our approach over the existing methods. Besides, we present the model's ability to capture the temporal patterns of assignees and the semantic dependencies among IPC codes.
Updated: 2024-06-19 02:45:02
Fields: cs.AI
Recent and Upcoming Developments in Randomized Numerical Linear Algebra for Machine Learning
Large matrices arise in many machine learning and data analysis applications, including as representations of datasets, graphs, model weights, and first and second-order derivatives. Randomized Numerical Linear Algebra (RandNLA) is an area which uses randomness to develop improved algorithms for ubiquitous matrix problems. The area has reached a certain level of maturity; but recent hardware trends, efforts to incorporate RandNLA algorithms into core numerical libraries, and advances in machine learning, statistics, and random matrix theory have led to new theoretical and practical challenges. This article provides a self-contained overview of RandNLA, in light of these developments.
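One classic example of the surveyed techniques, sketch-and-solve least squares, fits in a few lines of NumPy: a random projection compresses a tall matrix before solving, trading a little accuracy for a much smaller problem.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, s = 20000, 50, 500            # tall matrix, sketch size s << m
    A = rng.standard_normal((m, n))
    b = A @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

    S = rng.standard_normal((s, m)) / np.sqrt(s)   # Gaussian sketching operator
    x_sketch = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
    x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
    print("relative error:",
          np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))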
Updated: 2024-06-19 02:43:48
Fields: cs.LG,cs.NA,math.NA,stat.ML
Cardiac Copilot: Automatic Probe Guidance for Echocardiography with World Model
Echocardiography is the only technique capable of real-time imaging of the heart and is vital for diagnosing the majority of cardiac diseases. However, there is a severe shortage of experienced cardiac sonographers, due to the heart's complex structure and significant operational challenges. To mitigate this situation, we present a Cardiac Copilot system capable of providing real-time probe movement guidance to assist less experienced sonographers in conducting freehand echocardiography. This system can enable non-experts, especially in primary departments and medically underserved areas, to perform cardiac ultrasound examinations, potentially improving global healthcare delivery. The core innovation lies in proposing a data-driven world model, named Cardiac Dreamer, for representing cardiac spatial structures. This world model can provide structure features of any cardiac planes around the current probe position in the latent space, serving as an precise navigation map for autonomous plane localization. We train our model with real-world ultrasound data and corresponding probe motion from 110 routine clinical scans with 151K sample pairs by three certified sonographers. Evaluations on three standard planes with 37K sample pairs demonstrate that the world model can reduce navigation errors by up to 33\% and exhibit more stable performance.
Updated: 2024-06-19 02:42:29
Fields: eess.IV,cs.AI,cs.CV,cs.RO
Differentially Private Bias-Term Fine-tuning of Foundation Models
We study the problem of differentially private (DP) fine-tuning of large pre-trained models -- a recent privacy-preserving approach suitable for solving downstream tasks with sensitive data. Existing work has demonstrated that high accuracy is possible under strong privacy constraint, yet requires significant computational overhead or modifications to the network architecture. We propose differentially private bias-term fine-tuning (DP-BiTFiT), which matches the state-of-the-art accuracy for DP algorithms and the efficiency of the standard BiTFiT. DP-BiTFiT is model agnostic (not modifying the network architecture), parameter efficient (only training about 0.1% of the parameters), and computation efficient (almost removing the overhead caused by DP, in both the time and space complexity). On a wide range of tasks, DP-BiTFiT is 2~30X faster and uses 2~8X less memory than DP full fine-tuning, even faster than the standard full fine-tuning. This amazing efficiency enables us to conduct DP fine-tuning on language and vision tasks with long-sequence texts and high-resolution images, which were computationally difficult using existing methods. We open-source our code at FastDP (https://github.com/awslabs/fast-differential-privacy).
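The bias-only part is easy to sketch (a minimal illustration; the DP-SGD side, per-example clipping plus Gaussian noise, is only indicated in comments, and the paper's implementation is far more optimized):

    import torch.nn as nn

    model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))
    for name, p in model.named_parameters():
        p.requires_grad = name.endswith("bias")   # train only the bias terms

    trainable = [n for n, p in model.named_parameters() if p.requires_grad]
    print(trainable)   # ['0.bias', '2.bias'] -- a tiny fraction of parameters
    # For DP on top of this: clip each example's bias gradient to norm C, then
    # add N(0, sigma^2 C^2) noise to the summed gradient before each step.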
Updated: 2024-06-19 02:40:18
Fields: cs.LG,cs.CL,cs.CR,cs.CV
CodeGemma: Open Code Models Based on Gemma
This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open models. CodeGemma 2B is a state-of-the-art code completion model designed for fast code infilling and open-ended generation in latency-sensitive settings.
Updated: 2024-06-19 02:37:50
Fields: cs.CL,cs.AI
LLMatDesign: Autonomous Materials Discovery with Large Language Models
Discovering new materials can have significant scientific and technological implications but remains a challenging problem today due to the enormity of the chemical space. Recent advances in machine learning have enabled data-driven methods to rapidly screen or generate promising materials, but these methods still depend heavily on very large quantities of training data and often lack the flexibility and chemical understanding often desired in materials discovery. We introduce LLMatDesign, a novel language-based framework for interpretable materials design powered by large language models (LLMs). LLMatDesign utilizes LLM agents to translate human instructions, apply modifications to materials, and evaluate outcomes using provided tools. By incorporating self-reflection on its previous decisions, LLMatDesign adapts rapidly to new tasks and conditions in a zero-shot manner. A systematic evaluation of LLMatDesign on several materials design tasks, in silico, validates LLMatDesign's effectiveness in developing new materials with user-defined target properties in the small data regime. Our framework demonstrates the remarkable potential of autonomous LLM-guided materials discovery in the computational setting and towards self-driving laboratories in the future.
Updated: 2024-06-19 02:35:02
Fields: cond-mat.mtrl-sci,cs.AI,cs.CL
AntibodyFlow: Normalizing Flow Model for Designing Antibody Complementarity-Determining Regions
Therapeutic antibodies have been extensively studied in drug discovery and development in the past decades. Antibodies are specialized protective proteins that bind to antigens in a lock-to-key manner. The binding strength/affinity between an antibody and a specific antigen is heavily determined by the complementarity-determining regions (CDRs) on the antibodies. Existing machine learning methods cast in silico development of CDRs as either sequence or 3D graph (with a single chain) generation tasks and have achieved initial success. However, with CDR loops having specific geometry shapes, learning the 3D geometric structures of CDRs remains a challenge. To address this issue, we propose AntibodyFlow, a 3D flow model to design antibody CDR loops. Specifically, AntibodyFlow first constructs the distance matrix, then predicts amino acids conditioned on the distance matrix. Also, AntibodyFlow conducts constraint learning and constrained generation to ensure valid 3D structures. Experimental results indicate that AntibodyFlow outperforms the best baseline consistently with up to 16.0% relative improvement in validity rate and 24.3% relative reduction in geometric graph level error (root mean square deviation, RMSD).
Updated: 2024-06-19 02:31:23
Fields: cs.LG,cs.AI,q-bio.QM
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and thus challenging to implement and maintain. To address this challenge, we propose APPL, A Prompt Programming Language that acts as a bridge between computer programs and LLMs, allowing seamless embedding of prompts into Python functions, and vice versa. APPL provides an intuitive and Python-native syntax, an efficient parallelized runtime with asynchronous semantics, and a tracing module supporting effective failure diagnosis and replaying without extra costs. We demonstrate that APPL programs are intuitive, concise, and efficient through three representative scenarios: Chain-of-Thought with self-consistency (CoT-SC), ReAct tool use agent, and multi-agent chat. Experiments on three parallelizable workflows further show that APPL can effectively parallelize independent LLM calls, with a significant speedup ratio that almost matches the estimation.
Updated: 2024-06-19 02:29:59
Fields: cs.AI,cs.CL,cs.LG,cs.PL
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
Vision transformers (ViTs) have emerged as a significant area of focus, particularly for their capacity to be jointly trained with large language models and to serve as robust vision foundation models. Yet, the development of trustworthy explanation methods for ViTs has lagged, particularly in the context of post-hoc interpretations of ViT predictions. Existing sub-image selection approaches, such as feature-attribution and conceptual models, fall short in this regard. This paper proposes five desiderata for explaining ViTs -- faithfulness, stability, sparsity, multi-level structure, and parsimony -- and demonstrates the inadequacy of current methods in meeting these criteria comprehensively. We introduce a variational Bayesian explanation framework, dubbed ProbAbilistic Concept Explainers (PACE), which models the distributions of patch embeddings to provide trustworthy post-hoc conceptual explanations. Our qualitative analysis reveals the distributions of patch-level concepts, elucidating the effectiveness of ViTs by modeling the joint distribution of patch embeddings and ViT's predictions. Moreover, these patch-level explanations bridge the gap between image-level and dataset-level explanations, thus completing the multi-level structure of PACE. Through extensive experiments on both synthetic and real-world datasets, we demonstrate that PACE surpasses state-of-the-art methods in terms of the defined desiderata.
Updated: 2024-06-19 02:21:09
Fields: cs.LG,cs.AI,cs.CV,stat.ML
Convolutional Kolmogorov-Arnold Networks
In this paper, we introduce the Convolutional Kolmogorov-Arnold Networks (Convolutional KANs), an innovative alternative to the standard Convolutional Neural Networks (CNNs) that have revolutionized the field of computer vision. We integrate the non-linear activation functions presented in Kolmogorov-Arnold Networks (KANs) into convolutions to build a new layer. Throughout the paper, we empirically validate the performance of Convolutional KANs against traditional architectures across MNIST and Fashion-MNIST benchmarks, illustrating that this new approach maintains a similar level of accuracy while using half the amount of parameters. This significant reduction of parameters opens up a new approach to advance the optimization of neural network architectures.
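A toy sketch of the layer's idea, with a Gaussian-RBF expansion standing in for KANs' learnable spline activations and a single output channel for brevity: each kernel position applies its own learnable univariate function before the window is summed.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyKANConv2d(nn.Module):
        """Toy KAN-style convolution: every kernel position has a learnable
        univariate function (an RBF expansion here, B-splines in KANs)."""
        def __init__(self, in_ch, kernel=3, n_basis=6):
            super().__init__()
            self.k = kernel
            self.n = in_ch * kernel * kernel
            self.centers = nn.Parameter(torch.linspace(-2, 2, n_basis))
            self.coef = nn.Parameter(torch.randn(self.n, n_basis) * 0.1)

        def forward(self, x):                      # x: (B, C, H, W)
            B, C, H, W = x.shape
            patches = F.unfold(x, self.k, padding=self.k // 2)   # (B, n, L)
            z = patches.unsqueeze(-1) - self.centers             # (B, n, L, nb)
            phi = torch.exp(-z ** 2)                             # RBF activations
            out = torch.einsum("bnlk,nk->bl", phi, self.coef)    # sum over window
            return out.view(B, 1, H, W)            # one output channel (toy)

    y = ToyKANConv2d(3)(torch.randn(2, 3, 8, 8))
    print(y.shape)   # torch.Size([2, 1, 8, 8])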
Updated: 2024-06-19 02:09:44
Fields: cs.CV,cs.AI
Conditional score-based diffusion models for solving inverse problems in mechanics
We propose a framework to perform Bayesian inference using conditional score-based diffusion models to solve a class of inverse problems in mechanics involving the inference of a specimen's spatially varying material properties from noisy measurements of its mechanical response to loading. Conditional score-based diffusion models are generative models that learn to approximate the score function of a conditional distribution using samples from the joint distribution. More specifically, the score functions corresponding to multiple realizations of the measurement are approximated using a single neural network, the so-called score network, which is subsequently used to sample the posterior distribution using an appropriate Markov chain Monte Carlo scheme based on Langevin dynamics. Training the score network only requires simulating the forward model. Hence, the proposed approach can accommodate black-box forward models and complex measurement noise. Moreover, once the score network has been trained, it can be re-used to solve the inverse problem for different realizations of the measurements. We demonstrate the efficacy of the proposed approach on a suite of high-dimensional inverse problems in mechanics that involve inferring heterogeneous material properties from noisy measurements. Some examples we consider involve synthetic data, while others include data collected from actual elastography experiments. Further, our applications demonstrate that the proposed approach can handle different measurement modalities, complex patterns in the inferred quantities, non-Gaussian and non-additive noise models, and nonlinear black-box forward models. The results show that the proposed framework can solve large-scale physics-based inverse problems efficiently.
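A minimal numerical sketch of the sampling scheme, with a closed-form Gaussian prior score standing in for the trained score network and a linear forward model (the paper handles black-box forward models and non-Gaussian noise): unadjusted Langevin dynamics driven by the sum of prior and likelihood scores.

    import numpy as np

    rng = np.random.default_rng(0)
    d, sigma_y = 2, 0.1
    H = np.array([[1.0, 0.0]])                   # observe only x[0]
    x_true = np.array([0.7, -0.3])
    y = H @ x_true + sigma_y * rng.standard_normal(1)

    def score_prior(x):                          # N(0, I) prior => score = -x
        return -x

    def score_likelihood(x):
        return H.T @ (y - H @ x) / sigma_y ** 2

    x, step = rng.standard_normal(d), 1e-3
    for _ in range(5000):                        # unadjusted Langevin iterations
        drift = score_prior(x) + score_likelihood(x)
        x = x + step * drift + np.sqrt(2 * step) * rng.standard_normal(d)
    print(x)   # x[0] is pinned near y; x[1] remains prior-like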
Updated: 2024-06-19 02:09:15
Fields: stat.ML,cs.AI,cs.LG
von Mises Quasi-Processes for Bayesian Circular Regression
The need for regression models to predict circular values arises in many scientific fields. In this work we explore a family of expressive and interpretable distributions over circle-valued random functions related to Gaussian processes targeting two Euclidean dimensions conditioned on the unit circle. The resulting probability model has connections with continuous spin models in statistical physics. Moreover, its density is very simple and has maximum-entropy, unlike previous Gaussian process-based approaches, which use wrapping or radial marginalization. For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling. We argue that transductive learning in these models favors a Bayesian approach to the parameters. We present experiments applying this model to the prediction of (i) wind directions and (ii) the percentage of the running gait cycle as a function of joint angles.
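A quick numerical check of the standard fact this construction builds on: restricting an isotropic 2D Gaussian with mean mu to the unit circle gives angles distributed as von Mises with concentration ||mu||.

    import numpy as np

    mu = np.array([1.2, 0.5])                     # 2D Euclidean mean
    kappa = np.linalg.norm(mu)                    # concentration
    theta_mu = np.arctan2(mu[1], mu[0])           # circular mean

    theta = np.linspace(-np.pi, np.pi, 5)
    x = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # points on the circle
    log_gauss = -0.5 * ((x - mu) ** 2).sum(axis=1)         # log N(x; mu, I) + const
    log_vm = kappa * np.cos(theta - theta_mu)              # log von Mises + const
    print(np.round(log_gauss - log_vm, 6))        # constant => same density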
Updated: 2024-06-19 01:57:21
Fields: stat.ML,cs.LG,stat.CO
Diffusion-EXR: Controllable Review Generation for Explainable Recommendation via Diffusion Models
Denoising Diffusion Probabilistic Models (DDPMs) have shown great competence in image and audio generation tasks. However, there have been few attempts to employ DDPMs in text generation, especially review generation under recommendation systems. Motivated by the fact that explainable predicted reviews that justify recommendations could help users better understand the recommended items and increase the transparency of the recommendation system, we propose a Diffusion Model-based Review Generation approach towards EXplainable Recommendation, named Diffusion-EXR. Diffusion-EXR corrupts the sequence of review embeddings by incrementally introducing varied levels of Gaussian noise to the sequence of word embeddings and learns to reconstruct the original word representations in the reverse process. The nature of DDPMs enables our lightweight Transformer backbone to perform excellently in the recommendation review generation task. Extensive experimental results demonstrate that Diffusion-EXR can achieve state-of-the-art review generation for recommendation on two publicly available benchmark datasets.
Updated: 2024-06-19 01:53:18
Fields: cs.IR,cs.AI
Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach
Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues. However, most of the existing research efforts assume that all modalities are available during both training and testing, making their algorithms susceptible to the missing modality scenario. In this paper, we propose a novel knowledge-transfer network to translate between different modalities to reconstruct the missing audio modalities. Moreover, we develop a cross-modality attention mechanism to retain the maximal information of the reconstructed and observed modalities for sentiment prediction. Extensive experiments on three publicly available datasets demonstrate significant improvements over baselines and achieve comparable results to the previous methods with complete multi-modality supervision.
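A minimal sketch of the cross-modality attention step (hypothetical shapes, not the paper's exact architecture): text features query the reconstructed audio features so the most informative parts of both are retained for prediction.

    import torch
    import torch.nn as nn

    attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
    text = torch.randn(8, 20, 128)        # (batch, text steps, dim)
    audio_rec = torch.randn(8, 50, 128)   # reconstructed audio features

    fused, weights = attn(query=text, key=audio_rec, value=audio_rec)
    print(fused.shape)                    # torch.Size([8, 20, 128])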
Updated: 2024-06-19 01:52:24
Fields: cs.SD,cs.AI,cs.CL,cs.LG,eess.AS
Constructing and Evaluating Digital Twins: An Intelligent Framework for DT Development
The development of Digital Twins (DTs) represents a transformative advance for simulating and optimizing complex systems in a controlled digital space. Despite their potential, the challenge of constructing DTs that accurately replicate and predict the dynamics of real-world systems remains substantial. This paper introduces an intelligent framework for the construction and evaluation of DTs, specifically designed to enhance the accuracy and utility of DTs in testing algorithmic performance. We propose a novel construction methodology that integrates deep learning-based policy gradient techniques to dynamically tune the DT parameters, ensuring high fidelity in the digital replication of physical systems. Moreover, the Mean STate Error (MSTE) is proposed as a robust metric for evaluating the performance of algorithms within this digital space. The efficacy of our framework is demonstrated through extensive simulations that show our DT not only accurately mirrors the physical reality but also provides a reliable platform for algorithm evaluation. This work lays a foundation for future research into DT technologies, highlighting pathways for both theoretical enhancements and practical implementations in various industries.
Updated: 2024-06-19 01:45:18
Fields: eess.SY,cs.LG,cs.SY
Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication
Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expression. NL's status as the optimal format for LLMs, particularly in single-LLM reasoning and multi-agent communication, has not been thoroughly examined. In this work, we challenge the default use of NL by exploring the utility of non-NL formats in these contexts. We show that allowing LLMs to autonomously select the most suitable format before reasoning or communicating leads to a 3.3 to 5.7\% improvement in reasoning efficiency for different LLMs, and up to a 72.7\% reduction in token usage in multi-agent communication, all while maintaining communicative effectiveness. Our comprehensive analysis further reveals that LLMs can devise a format from limited task instructions and that the devised format is effectively transferable across different LLMs. Intriguingly, the structured communication format decided by LLMs exhibits notable parallels with established agent communication languages, suggesting a natural evolution towards efficient, structured communication in agent communication. Our code is released at \url{https://github.com/thunlp/AutoForm}.
Updated: 2024-06-19 01:42:22
Fields: cs.CL,cs.AI
DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents
Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of conversational agents, making them applicable to various fields (e.g., education). Despite their progress, the evaluation of the agents often overlooks the complexities of real-world conversations, such as real-time interactions, multi-party dialogues, and extended contextual dependencies. To bridge this gap, we introduce DialSim, a real-time dialogue simulator. In this simulator, an agent is assigned the role of a character from popular TV shows, requiring it to respond to spontaneous questions using past dialogue information and to distinguish between known and unknown information. Key features of DialSim include evaluating the agent's ability to respond within a reasonable time limit, handling long-term multi-party dialogues, and managing adversarial settings (e.g., swap character names) to challenge the agent's reliance on pre-trained knowledge. We utilized this simulator to evaluate the latest conversational agents and analyze their limitations. Our experiments highlight both the strengths and weaknesses of these agents, providing valuable insights for future improvements in the field of conversational AI. DialSim is available at https://github.com/jiho283/Simulator.
Updated: 2024-06-19 01:37:10
Fields: cs.CL,cs.AI
Formally Certified Approximate Model Counting
Approximate model counting is the task of approximating the number of solutions to an input Boolean formula. The state-of-the-art approximate model counter for formulas in conjunctive normal form (CNF), ApproxMC, provides a scalable means of obtaining model counts with probably approximately correct (PAC)-style guarantees. Nevertheless, the validity of ApproxMC's approximation relies on a careful theoretical analysis of its randomized algorithm and the correctness of its highly optimized implementation, especially the latter's stateful interactions with an incremental CNF satisfiability solver capable of natively handling parity (XOR) constraints. We present the first certification framework for approximate model counting with formally verified guarantees on the quality of its output approximation. Our approach combines: (i) a static, once-off, formal proof of the algorithm's PAC guarantee in the Isabelle/HOL proof assistant; and (ii) dynamic, per-run, verification of ApproxMC's calls to an external CNF-XOR solver using proof certificates. We detail our general approach to establish a rigorous connection between these two parts of the verification, including our blueprint for turning the formalized, randomized algorithm into a verified proof checker, and our design of proof certificates for both ApproxMC and its internal CNF-XOR solving steps. Experimentally, we show that certificate generation adds little overhead to an approximate counter implementation, and that our certificate checker is able to fully certify $84.7\%$ of instances with generated certificates when given the same time and memory limits as the counter.
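A toy illustration of the XOR-hashing idea underlying ApproxMC, with brute-force enumeration standing in for the CNF-XOR solver and the PAC machinery omitted: each random parity constraint cuts the solution set roughly in half, and the surviving count is scaled back up by 2^m.

    import itertools, random

    def count_with_xors(formula, n, m, rng):
        # Each XOR constraint: the parity of a random subset of variables
        # must equal a random bit.
        xors = [(rng.sample(range(n), k=max(1, n // 2)), rng.random() < 0.5)
                for _ in range(m)]
        total = 0
        for bits in itertools.product([0, 1], repeat=n):
            if formula(bits) and all(
                    sum(bits[i] for i in sub) % 2 == int(b) for sub, b in xors):
                total += 1
        return total * (2 ** m)

    formula = lambda bits: bits[0] or bits[1]   # 3/4 of assignments satisfy...
    n = 10                                      # ...over 10 variables
    rng = random.Random(0)
    exact = 3 * 2 ** (n - 2)
    estimates = [count_with_xors(formula, n, m=3, rng=rng) for _ in range(5)]
    print(exact, estimates)                     # estimates scatter around exact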
Updated: 2024-06-19 01:24:40
Fields: cs.LO,cs.AI
Large Language Models are Biased Because They Are Large Language Models
This paper's primary goal is to provoke thoughtful discussion about the relationship between bias and fundamental properties of large language models. We do this by seeking to convince the reader that harmful biases are an inevitable consequence arising from the design of any large language model as LLMs are currently formulated. To the extent that this is true, it suggests that the problem of harmful bias cannot be properly addressed without a serious reconsideration of AI driven by LLMs, going back to the foundational assumptions underlying their design.
Updated: 2024-06-19 01:08:03
Fields: cs.CL,cs.AI
DOCTOR: A Multi-Disease Detection Continual Learning Framework Based on Wearable Medical Sensors
Modern advances in machine learning (ML) and wearable medical sensors (WMSs) in edge devices have enabled ML-driven disease detection for smart healthcare. Conventional ML-driven methods for disease detection rely on customizing individual models for each disease and its corresponding WMS data. However, such methods lack adaptability to distribution shifts and new task classification classes. In addition, they need to be rearchitected and retrained from scratch for each new disease. Moreover, installing multiple ML models in an edge device consumes excessive memory, drains the battery faster, and complicates the detection process. To address these challenges, we propose DOCTOR, a multi-disease detection continual learning (CL) framework based on WMSs. It employs a multi-headed deep neural network (DNN) and a replay-style CL algorithm. The CL algorithm enables the framework to continually learn new missions where different data distributions, classification classes, and disease detection tasks are introduced sequentially. It counteracts catastrophic forgetting with a data preservation method and a synthetic data generation (SDG) module. The data preservation method preserves the most informative subset of real training data from previous missions for exemplar replay. The SDG module models the probability distribution of the real training data and generates synthetic data for generative replay while retaining data privacy. The multi-headed DNN enables DOCTOR to detect multiple diseases simultaneously based on user WMS data. We demonstrate DOCTOR's efficacy in maintaining high disease classification accuracy with a single DNN model in various CL experiments. In complex scenarios, DOCTOR achieves 1.43 times better average test accuracy, 1.25 times better F1-score, and 0.41 higher backward transfer than the naive fine-tuning framework with a small model size of less than 350KB.
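A minimal replay-style sketch of the mission loop (random exemplar selection and a linear model stand in for the paper's informativeness criterion, synthetic-data generator, and multi-headed DNN):

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    model = SGDClassifier(random_state=0)
    buf_X, buf_y = [], []                      # exemplars kept from past missions

    def learn_mission(X, y, classes, keep=50):
        if buf_X:                              # replay old exemplars with new data
            X = np.vstack([X, np.array(buf_X)])
            y = np.concatenate([y, np.array(buf_y)])
        model.partial_fit(X, y, classes=classes)
        idx = rng.choice(len(X), size=min(keep, len(X)), replace=False)
        buf_X.extend(X[idx]); buf_y.extend(y[idx])

    classes = np.array([0, 1, 2])
    for disease in classes:                    # each mission adds one disease
        X = rng.standard_normal((200, 8)) + 3 * disease
        learn_mission(X, np.full(200, disease), classes)
    print(model.predict(3 * classes[:, None] + rng.standard_normal((3, 8))))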
Updated: 2024-06-19 01:06:15
Fields: cs.LG,cs.HC,eess.SP
Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models
Sharpness-aware minimization (SAM) has received increasing attention in computer vision since it can effectively eliminate sharp local minima from the training trajectory and mitigate generalization degradation. However, SAM requires two sequential gradient computations during the optimization of each step: one to obtain the perturbation gradient and the other to obtain the updating gradient. Compared with the base optimizer (e.g., Adam), SAM doubles the time overhead due to the additional perturbation gradient. By dissecting the theory of SAM and observing the training gradient of the molecular graph transformer, we propose a new algorithm named GraphSAM, which reduces the training cost of SAM and improves the generalization performance of graph transformer models. There are two key factors that contribute to this result: (i) \textit{gradient approximation}: we use the updating gradient of the previous step to approximate the perturbation gradient at the intermediate steps smoothly (\textbf{increases efficiency}); (ii) \textit{loss landscape approximation}: we theoretically prove that the loss landscape of GraphSAM is limited to a small range centered on the expected loss of SAM (\textbf{guarantees generalization performance}). The extensive experiments on six datasets with different tasks demonstrate the superiority of GraphSAM, especially in optimizing the model update process. The code is available at https://github.com/YL-wang/GraphSAM/tree/graphsam
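For reference, a minimal sketch of one vanilla SAM update, which makes the doubled cost visible as two backward passes; GraphSAM's saving comes from replacing the first pass, on most steps, with the previous step's updating gradient.

    import torch

    def sam_step(model, loss_fn, opt, rho=0.05):
        loss_fn(model).backward()                      # pass 1: perturbation grad
        grads = [p.grad.clone() for p in model.parameters()]
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                p.add_(rho * g / (norm + 1e-12))       # climb to the worst case
        opt.zero_grad()
        loss_fn(model).backward()                      # pass 2: updating grad
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                p.sub_(rho * g / (norm + 1e-12))       # undo the perturbation
        opt.step()
        opt.zero_grad()

    model = torch.nn.Linear(4, 1)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(32, 4), torch.randn(32, 1)
    sam_step(model, lambda m: ((m(x) - y) ** 2).mean(), opt)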
Updated: 2024-06-19 01:03:23
Field: cs.LG
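The efficiency claim above comes from replacing one of SAM's two gradient computations with a cached gradient. The sketch below shows that pattern in PyTorch; it illustrates the idea rather than reproducing the released GraphSAM code (see the linked repository), and it assumes `prev_grads` was initialized with one exact gradient computation at the first step and is periodically refreshed.

```python
import torch

def graphsam_step(model, loss_fn, x, y, opt, prev_grads, rho=0.05):
    # Perturb weights along the cached direction eps = rho * g / ||g||,
    # where g is the previous step's updating gradient (SAM would instead
    # spend an extra backward pass computing g at the current weights).
    norm = torch.sqrt(sum((g ** 2).sum() for g in prev_grads)) + 1e-12
    eps = [rho * g / norm for g in prev_grads]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)
    opt.zero_grad()
    loss = loss_fn(model(x), y)       # single backward pass per step
    loss.backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)                 # restore the original weights
    new_grads = [p.grad.detach().clone() for p in model.parameters()]
    opt.step()                        # update at the original point
    return loss.item(), new_grads     # cache as the next perturbation direction
```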
PathoLM: Identifying pathogenicity from the DNA sequence through the Genome Foundation Model
Pathogen identification is pivotal in diagnosing, treating, and preventing diseases, and is crucial for controlling infections and safeguarding public health. Traditional alignment-based methods, though widely used, are computationally intense and reliant on extensive reference databases, often failing to detect novel pathogens due to their low sensitivity and specificity. Similarly, conventional machine learning techniques, while promising, require large annotated datasets and extensive feature engineering and are prone to overfitting. Addressing these challenges, we introduce PathoLM, a cutting-edge pathogen language model optimized for the identification of pathogenicity in bacterial and viral sequences. Leveraging the strengths of pre-trained DNA models such as the Nucleotide Transformer, PathoLM requires minimal data for fine-tuning, thereby enhancing pathogen detection capabilities. It effectively captures a broader genomic context, significantly improving the identification of novel and divergent pathogens. We developed a comprehensive dataset comprising approximately 30 species of viruses and bacteria, including the ESKAPEE pathogens, seven notably virulent bacterial strains resistant to antibiotics. Additionally, we curated a species-classification dataset centered specifically on the ESKAPEE group. In comparative assessments, PathoLM dramatically outperforms existing models like DciPatho, demonstrating robust zero-shot and few-shot capabilities. Furthermore, we extended the model into PathoLM-Sp for ESKAPEE species classification, where it showed superior performance compared to other advanced deep learning methods, despite the complexity of the task.
Updated: 2024-06-19 00:53:48
Field: cs.CL,cs.LG,q-bio.GN
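The fine-tuning recipe the abstract describes, a pre-trained DNA language model with a small classification head, follows a standard pattern. A hedged sketch using Hugging Face `transformers` is below; the checkpoint name, placeholder sequences, and two-label setup are illustrative assumptions, not PathoLM's released configuration.

```python
# Generic pattern: pretrained DNA encoder + sequence-classification head.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

ckpt = "InstaDeepAI/nucleotide-transformer-500m-human-ref"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=2)

seqs = ["ATGCGTACGT", "TTAGGCAATT"]        # placeholder DNA fragments
labels = torch.tensor([1, 0])              # 1 = pathogenic, 0 = non-pathogenic
batch = tok(seqs, padding=True, return_tensors="pt")
out = model(**batch, labels=labels)        # fine-tune via out.loss.backward()
```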
Lazy Data Practices Harm Fairness Research
Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field by highlighting shortcomings and proposing recommendations for improvement. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study protected information encoded in tabular datasets and their usage in 280 experiments across 142 publications. Our analyses identify three main areas of concern: (1) a \textbf{lack of representation for certain protected attributes} in both data and evaluations; (2) the widespread \textbf{exclusion of minorities} during data preprocessing; and (3) \textbf{opaque data processing} threatening the generalization of fairness research. By conducting exemplary analyses on the utilization of prominent datasets, we demonstrate how unreflective data decisions disproportionately affect minority groups, fairness metrics, and resultant model comparisons. Additionally, we identify supplementary factors such as limitations in publicly available data, privacy considerations, and a general lack of awareness, which exacerbate these challenges. To address these issues, we propose a set of recommendations for data usage in fairness research centered on transparency and responsible inclusion. This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
Updated: 2024-06-19 00:52:16
Field: cs.LG,cs.CY,stat.AP,stat.ML
One Model Many Scores: Using Multiverse Analysis to Prevent Fairness Hacking and Evaluate the Influence of Model Design Decisions
A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. The downstream effects of ADM systems critically depend on the decisions made during a system's design, implementation, and evaluation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these decisions are made implicitly, without knowing exactly how they will influence the final system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit decisions during design and evaluation into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible "universes" of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can investigate the variability and robustness of fairness scores and see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand the fairness implications of design and evaluation decisions through an exemplary case study of predicting public health care coverage for vulnerable populations. Our results highlight how decisions regarding the evaluation of a system can lead to vastly different fairness metrics for the same model. This is problematic, as a nefarious actor could optimise or "hack" a fairness metric to portray a discriminatory model as fair merely by changing how it is evaluated. We illustrate how a multiverse analysis can help to address this issue.
Updated: 2024-06-19 00:49:07
Field: stat.ML,cs.LG
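The core mechanic above, a grid of all possible "universes" of decision combinations, amounts to a Cartesian product over analysis choices. A minimal sketch follows; the decision options and `eval_universe` are hypothetical placeholders for the study's actual pipeline.

```python
from itertools import product

decisions = {
    "imputation":      ["drop_rows", "mean"],
    "protected_enc":   ["binary", "multi_group"],
    "threshold":       [0.5, "group_specific"],
    "fairness_metric": ["demographic_parity", "equalized_odds"],
}

def eval_universe(cfg):
    # Train/evaluate the *same* model under this decision combination and
    # return (accuracy, fairness_score); omitted here for brevity.
    raise NotImplementedError

# One dict per universe, pairing each decision with one of its options.
universes = [dict(zip(decisions, combo)) for combo in product(*decisions.values())]
print(len(universes), "universes")   # 2*2*2*2 = 16 evaluation settings
# results = [(cfg, eval_universe(cfg)) for cfg in universes]
```

Scanning the distribution of fairness scores across `universes`, rather than reporting a single one, is what exposes metric "hacking": a model whose score varies widely across reasonable evaluation choices is not robustly fair.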
Advancing Retail Data Science: Comprehensive Evaluation of Synthetic Data
The evaluation of synthetic data generation is crucial, especially in the retail sector where data accuracy is paramount. This paper introduces a comprehensive framework for assessing synthetic retail data, focusing on fidelity, utility, and privacy. Our approach differentiates between continuous and discrete data attributes, providing precise evaluation criteria. Fidelity is measured through stability and generalizability. Stability ensures synthetic data accurately replicates known data distributions, while generalizability confirms its robustness in novel scenarios. Utility is demonstrated through the synthetic data's effectiveness in critical retail tasks such as demand forecasting and dynamic pricing, proving its value in predictive analytics and strategic planning. Privacy is safeguarded using Differential Privacy, ensuring synthetic data maintains a perfect balance between resembling training and holdout datasets without compromising security. Our findings validate that this framework provides reliable and scalable evaluation for synthetic retail data. It ensures high fidelity, utility, and privacy, making it an essential tool for advancing retail data science. This framework meets the evolving needs of the retail industry with precision and confidence, paving the way for future advancements in synthetic data methodologies.
Updated: 2024-06-19 00:47:38
Field: cs.LG,stat.ML
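As one hedged example of the fidelity axis for a continuous attribute, a two-sample Kolmogorov-Smirnov test can compare real and synthetic distributions; the paper's exact metrics may differ, and discrete attributes would need, e.g., a chi-squared test on category frequencies instead.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Stand-in data: skewed "sales" values for real vs. synthetic records.
real_sales = rng.gamma(shape=2.0, scale=50.0, size=5000)
synth_sales = rng.gamma(shape=2.1, scale=48.0, size=5000)

stat, p = ks_2samp(real_sales, synth_sales)
print(f"KS statistic={stat:.3f}, p={p:.3f}")  # small statistic -> high fidelity
```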
M3T: Multi-Modal Medical Transformer to bridge Clinical Context with Visual Insights for Retinal Image Medical Description Generation
Automated retinal image medical description generation is crucial for streamlining medical diagnosis and treatment planning. Existing challenges include the reliance on learned retinal image representations, difficulties in handling multiple imaging modalities, and the lack of clinical context in visual representations. Addressing these issues, we propose the Multi-Modal Medical Transformer (M3T), a novel deep learning architecture that integrates visual representations with diagnostic keywords. Unlike previous studies focusing on specific aspects, our approach efficiently learns contextual information and semantics from both modalities, enabling the generation of precise and coherent medical descriptions for retinal images. Experimental studies on the DeepEyeNet dataset validate the success of M3T in meeting ophthalmologists' standards, demonstrating a substantial 13.5% improvement in BLEU@4 over the best-performing baseline model.
Updated: 2024-06-19 00:46:48
Field: cs.CV,cs.LG
A New Approach for Evaluating and Improving the Performance of Segmentation Algorithms on Hard-to-Detect Blood Vessels
Many studies regarding the vasculature of biological tissues involve the segmentation of the blood vessels in a sample followed by the creation of a graph structure to model the vasculature. The graph is then used to extract relevant vascular properties. Small segmentation errors can lead to largely distinct connectivity patterns and a high degree of variability of the extracted properties. Nevertheless, global metrics such as Dice, precision, and recall are commonly applied for measuring the performance of blood vessel segmentation algorithms. These metrics might conceal important information about the accuracy at specific regions of a sample. To tackle this issue, we propose a local vessel salience (LVS) index to quantify the expected difficulty in segmenting specific blood vessel segments. The LVS index is calculated for each vessel pixel by comparing the local intensity of the vessel with the image background around the pixel. The index is then used for defining a new accuracy metric called low-salience recall (LSRecall), which quantifies the performance of segmentation algorithms on blood vessel segments having low salience. The perspective provided by the LVS index is used to define a data augmentation procedure that can be used to improve the segmentation performance of convolutional neural networks. We show that segmentation algorithms having high Dice and recall values can display very low LSRecall values, which reveals systematic errors of these algorithms for vessels having low salience. The proposed data augmentation procedure is able to improve the LSRecall of some samples by as much as 25%. The developed methodology opens up new possibilities for comparing the performance of segmentation algorithms regarding hard-to-detect blood vessels as well as their capabilities for vascular topology preservation.
Updated: 2024-06-19 00:45:57
Field: cs.CV,cs.LG
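A minimal sketch of the two quantities defined above is given below, assuming a box-filter estimate of the local background and an absolute-contrast salience score; the paper's exact window size and contrast formula may differ.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lvs_index(image, vessel_mask, window=15):
    # Local background mean from a box filter that ignores vessel pixels:
    # ratio of locally-averaged background intensities to locally-averaged
    # background counts.
    bg_vals = np.where(vessel_mask, 0.0, image).astype(float)
    bg_cnt = (~vessel_mask).astype(float)
    bg_mean = uniform_filter(bg_vals, window) / np.maximum(
        uniform_filter(bg_cnt, window), 1e-6)
    return np.abs(image - bg_mean)        # large contrast = salient vessel

def ls_recall(pred_mask, gt_mask, lvs, tau):
    # Recall restricted to the hard, low-salience ground-truth pixels.
    low = gt_mask & (lvs < tau)
    return (pred_mask & low).sum() / max(low.sum(), 1)
```

A segmenter can then score high on global Dice while `ls_recall` stays low, which is exactly the systematic failure mode on faint vessels that the abstract describes.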
Oralytics Reinforcement Learning Algorithm
Dental disease is still one of the most common chronic diseases in the United States. While dental disease is preventable through healthy oral self-care behaviors (OSCB), this basic behavior is not consistently practiced. We have developed Oralytics, an online, reinforcement learning (RL) algorithm that optimizes the delivery of personalized intervention prompts to improve OSCB. In this paper, we offer a full overview of algorithm design decisions made using prior data, domain expertise, and experiments in a simulation test bed. The finalized RL algorithm was deployed in the Oralytics clinical trial, conducted from fall 2023 to summer 2024.
Updated: 2024-06-19 00:44:11
Field: cs.AI
Guided Context Gating: Learning to leverage salient lesions in retinal fundus images
Effectively representing medical images, especially retinal images, presents a considerable challenge due to variations in the appearance, size, and contextual information of pathological signs called lesions. Precise discrimination of these lesions is crucial for diagnosing vision-threatening issues such as diabetic retinopathy. While visual attention-based neural networks have been introduced to learn spatial context and channel correlations from retinal images, they often fall short in capturing localized lesion context. Addressing this limitation, we propose a novel attention mechanism called Guided Context Gating, a unique approach that integrates Context Formulation, Channel Correlation, and Guided Gating to learn global context, spatial correlations, and localized lesion context. Our qualitative evaluation against existing attention mechanisms emphasizes the superiority of Guided Context Gating in terms of explainability. Notably, experiments on the Zenodo-DR-7 dataset reveal a substantial 2.63% accuracy boost over advanced attention mechanisms and an impressive 6.53% improvement over the state-of-the-art Vision Transformer for assessing the severity grade of retinopathy, even with imbalanced and limited training samples for each class.
Updated: 2024-06-19 00:42:35
Field: cs.CV,cs.LG
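As a loose interpretation only, since the paper's exact formulation differs, the three named ingredients can be read as a squeeze-excitation-style channel branch followed by a learned spatial gate over the contextualized features:

```python
import torch
import torch.nn as nn

class GuidedContextGatingSketch(nn.Module):
    """Interpretive sketch, not the paper's architecture."""
    def __init__(self, c):
        super().__init__()
        self.channel = nn.Sequential(       # channel correlation (SE-style)
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c // 4, 1), nn.ReLU(),
            nn.Conv2d(c // 4, c, 1), nn.Sigmoid())
        self.gate = nn.Sequential(          # guided spatial gate
            nn.Conv2d(c, 1, 1), nn.Sigmoid())

    def forward(self, feats):               # feats: (B, C, H, W)
        ctx = feats * self.channel(feats)   # global/channel context
        return ctx * self.gate(ctx)         # emphasize localized lesion regions
```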
A Unified Framework for Combinatorial Optimization Based on Graph Neural Networks
Graph neural networks (GNNs) have emerged as a powerful tool for solving combinatorial optimization problems (COPs), exhibiting state-of-the-art performance in both graph-structured and non-graph-structured domains. However, existing approaches lack a unified framework capable of addressing a wide range of COPs. After presenting a summary of representative COPs and a brief review of recent advancements in GNNs for solving COPs, this paper proposes a unified GNN-based framework for solving COPs, comprising graph representation of COPs, equivalent conversion of non-graph-structured COPs into graph-structured ones, graph decomposition, and graph simplification. The proposed framework leverages the ability of GNNs to effectively capture relational information and extract features from the graph representation of COPs, offering a generic solution that addresses the limitations of the state of the art in solving non-graph-structured and highly complex graph-structured COPs.
Updated: 2024-06-19 00:40:31
Field: cs.AI
ViLCo-Bench: VIdeo Language COntinual learning Benchmark
Video language continual learning involves continuously adapting to information from video and text inputs, enhancing a model's ability to handle new tasks while retaining prior knowledge. This is a relatively under-explored area, and establishing appropriate datasets is crucial for facilitating communication and research in the field. In this study, we present the first dedicated benchmark, ViLCo-Bench, designed to evaluate continual learning models across a range of video-text tasks. The dataset comprises ten-minute-long videos and corresponding language queries collected from publicly available datasets. Additionally, we introduce a novel memory-efficient framework that incorporates self-supervised learning and mimics long-term and short-term memory effects. This framework addresses challenges including memory complexity from long video clips, natural language complexity from open queries, and text-video misalignment. We posit that ViLCo-Bench, with greater complexity compared to existing continual learning benchmarks, would serve as a critical tool for exploring the video-language domain, extending beyond conventional class-incremental tasks, and addressing complex and limited annotation issues. The curated data, evaluations, and our novel method are available at https://github.com/cruiseresearchgroup/ViLCo .
Updated: 2024-06-19 00:38:19
Field: cs.AI,cs.CV
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-end modeling that minimizes cascading errors in complex pipelines, and allows for the application of sophisticated prompting techniques across the entire system. To assess this paradigm shift, we introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks. However, LCLMs still face challenges in areas like compositional reasoning that are required in SQL-like tasks. Notably, prompting strategies significantly influence performance, emphasizing the need for continued research as context lengths grow. Overall, LOFT provides a rigorous testing ground for LCLMs, showcasing their potential to supplant existing paradigms and tackle novel tasks as model capabilities scale.
Updated: 2024-06-19 00:28:58
Field: cs.CL,cs.AI,cs.IR
GbHammer: Malicious Inter-process Page Sharing by Hammering Global Bits in Page Table Entries
RowHammer is a vulnerability inside DRAM chips where an attacker repeatedly accesses a DRAM row to flip bits in the nearby rows without directly accessing them. Several studies have found that flipping bits in the address part inside a page table entry (PTE) leads to serious security risks such as privilege escalation. However, the risk of management bits in a PTE being flipped by RowHammer has not yet been discussed as far as we know. In this paper, we point out a new vulnerability called GbHammer that allows an attacker to maliciously share a physical memory page with a victim by hammering the global bit in a PTE. GbHammer not only creates a shared page but also enables the attacker to (1) make the victim's process execute arbitrary binary and (2) snoop on the victim's secret data through the shared page. We demonstrate the two exploits on a real Linux kernel running on a cycle-accurate CPU simulator. We also discuss possible mitigation measures for GbHammer and the risk of GbHammer in non-x86 ISAs.
Updated: 2024-06-19 00:26:13
Field: cs.CR
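For orientation, on x86-64 the Global (G) flag is bit 8 of a page table entry, so a single RowHammer-induced flip can toggle whether a translation is treated as global. The snippet below only illustrates the standard low flag bits of a PTE and the effect of one bit flip; the actual attack mechanics and exploit conditions are in the paper.

```python
# x86-64 PTE flag bits (standard layout): P=0, RW=1, US=2, A=5, D=6, G=8.
PTE_P, PTE_RW, PTE_US, PTE_A, PTE_D, PTE_G = (
    1 << 0, 1 << 1, 1 << 2, 1 << 5, 1 << 6, 1 << 8)

pte = (0x1234 << 12) | PTE_P | PTE_RW | PTE_US   # user page, frame 0x1234
hammered = pte ^ PTE_G                           # single RowHammer bit flip
print(f"before: {pte:#x}  global={bool(pte & PTE_G)}")
print(f"after:  {hammered:#x}  global={bool(hammered & PTE_G)}")
```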
Efficient Training of Probabilistic Neural Networks for Survival Analysis
Variational Inference (VI) is a commonly used technique for approximate Bayesian inference and uncertainty estimation in deep learning models, yet it comes at a computational cost, as it doubles the number of trainable parameters to represent uncertainty. This rapidly becomes challenging in high-dimensional settings and motivates the use of alternative inference techniques, such as Monte Carlo Dropout (MCD) or the Spectral-normalized Neural Gaussian Process (SNGP). However, such methods have seen little adoption in survival analysis, and VI remains the prevalent approach for training probabilistic neural networks. In this paper, we investigate how to train deep probabilistic survival models on large datasets without introducing additional overhead in model complexity. To achieve this, we adopt three probabilistic approaches, namely VI, MCD, and SNGP, and evaluate them in terms of their prediction performance, calibration performance, and model complexity. In the context of probabilistic survival analysis, we investigate whether non-VI techniques can offer comparable or possibly improved prediction performance and uncertainty calibration compared to VI. On the MIMIC-IV dataset, we find that MCD aligns with VI in terms of the concordance index (0.748 vs. 0.743) and mean absolute error (254.9 vs. 254.7) using hinge loss, while providing C-calibrated uncertainty estimates. Moreover, our SNGP implementation provides D-calibrated survival functions in all datasets compared to VI (4/4 vs. 2/4, respectively). Our work encourages the use of alternatives to VI for survival analysis in high-dimensional datasets, where computational efficiency and overhead are of concern.
Updated: 2024-06-19 00:21:50
Field: cs.LG
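Of the three approaches, MCD is the simplest to sketch: dropout stays active at inference and several stochastic forward passes are averaged, adding no extra parameters (unlike VI). A minimal, generic example follows, not the paper's survival models.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(64, 1))

def mc_dropout_predict(model, x, n_samples=50):
    model.train()                       # keeps Dropout stochastic at inference
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(0), preds.std(0)  # predictive mean and uncertainty

mean, std = mc_dropout_predict(net, torch.randn(8, 16))
```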
ElicitationGPT: Text Elicitation Mechanisms via Language Models
Scoring rules evaluate probabilistic forecasts of an unknown state against the realized state and are a fundamental building block in the incentivized elicitation of information and the training of machine learning models. This paper develops mechanisms for scoring elicited text against ground-truth text using domain-knowledge-free queries to a large language model (specifically ChatGPT) and empirically evaluates their alignment with human preferences. The empirical evaluation is conducted on peer reviews from a peer-grading dataset, comparing the mechanisms' scores against manual instructor scores for the same reviews.
Updated: 2024-06-19 00:12:35
Field: cs.AI,cs.GT,cs.LG
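For readers unfamiliar with the building block: a classic strictly proper scoring rule for probabilistic forecasts is the Brier score, whose expectation is minimized by reporting one's true belief. The paper's contribution is extending this kind of scoring to free-form text via LLM queries; the snippet below is only the classical numeric case.

```python
import numpy as np

def brier_score(forecast, outcome):
    """forecast: probability vector over states; outcome: realized state index."""
    target = np.zeros_like(forecast)
    target[outcome] = 1.0
    return float(np.sum((forecast - target) ** 2))   # lower is better

print(brier_score(np.array([0.8, 0.2]), 0))   # 0.08 - confident and right
print(brier_score(np.array([0.8, 0.2]), 1))   # 1.28 - confident and wrong
```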
State-of-the-Art Review: The Use of Digital Twins to Support Artificial Intelligence-Guided Predictive Maintenance
In recent years, predictive maintenance (PMx) has gained prominence for its potential to enhance efficiency, automation, accuracy, and cost-effectiveness while reducing human involvement. Importantly, PMx has evolved in tandem with digital advancements, such as Big Data and the Internet of Things (IoT). These technological strides have enabled Artificial Intelligence (AI) to revolutionize PMx processes, with increasing capacities for real-time automation of monitoring, analysis, and prediction tasks. However, PMx still faces challenges such as poor explainability and sample inefficiency in data-driven methods and high complexity in physics-based models, hindering broader adoption. This paper posits that Digital Twins (DTs) can be integrated into PMx to overcome these challenges, paving the way for more automated PMx applications across various stakeholders. Despite their potential, current DTs have not fully matured to bridge existing gaps. Our paper provides a comprehensive roadmap for DT evolution, addressing current limitations to foster large-scale automated PMx progression. We structure our approach in three stages: First, we reference prior work in which we identified and defined the Information Requirements (IRs) and Functional Requirements (FRs) for PMx, forming the blueprint for a unified framework. Second, we conduct a literature review to assess current DT applications integrating these IRs and FRs, revealing standardized DT models and tools that support automated PMx. Lastly, we highlight gaps in current DT implementations, particularly those IRs and FRs not fully supported, and outline the components necessary for a comprehensive, automated PMx system. Our paper concludes with research directions aimed at seamlessly integrating DTs into the PMx paradigm to achieve this ambitious vision.
Updated: 2024-06-19 00:10:57
Field: cs.AI
Signatures Meet Dynamic Programming: Generalizing Bellman Equations for Trajectory Following
Path signatures have been proposed as a powerful representation of paths that efficiently captures a path's analytic and geometric characteristics, with useful algebraic properties including fast concatenation of paths through tensor products. Signatures have recently been widely adopted in machine learning problems for time series analysis. In this work we establish connections between the value functions typically used in optimal control and intriguing properties of path signatures. These connections motivate our novel control framework with signature transforms that efficiently generalizes the Bellman equation to the space of trajectories. We analyze the properties and advantages of the framework, termed signature control. In particular, we demonstrate that (i) it can naturally deal with varying/adaptive time steps; (ii) it propagates higher-level information more efficiently than value function updates; (iii) it is robust to dynamical-system misspecification over long rollouts. As a specific case of our framework, we devise a model predictive control method for path tracking. This method generalizes integral control and is suitable for problems with unknown disturbances. The proposed algorithms are tested in simulation with differentiable physics models, on typical control and robotics tasks such as point-mass control, curve following for an ant model, and a robotic manipulator.
Updated: 2024-06-19 00:07:53
Field: eess.SY,cs.LG,cs.RO,cs.SY
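To ground the central object, the sketch below computes the depth-2 signature of a piecewise-linear path by accumulating segments with Chen's relation; the fast-concatenation property mentioned above is exactly this update.

```python
import numpy as np

def signature_depth2(path):
    """path: (T, d) array of points; returns level-1 and level-2 terms.

    s1[i]    = integral of dx_i over the path (total increment)
    s2[i, j] = iterated integral of dx_i dx_j over s < t
    """
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for a, b in zip(path[:-1], path[1:]):
        dx = b - a
        # Chen's relation for appending one linear segment.
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return s1, s2

s1, s2 = signature_depth2(np.array([[0., 0.], [1., 0.], [1., 1.]]))
# s2[0,1] - s2[1,0] equals twice the signed (Levy) area between the path
# and its chord: here 1.0, i.e., an enclosed area of 0.5.
```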
SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations
Despite their remarkable successes, state-of-the-art large language models (LLMs), including vision-and-language models (VLMs) and unimodal language models (ULMs), fail to understand precise semantics. For example, semantically equivalent sentences expressed using different lexical compositions elicit diverging representations. The degree of this divergence and its impact on encoded semantics is not well understood. In this paper, we introduce the SUGARCREPE++ dataset to analyze the sensitivity of VLMs and ULMs to lexical and semantic alterations. Each sample in the SUGARCREPE++ dataset consists of an image and a corresponding triplet of captions: a pair of semantically equivalent but lexically different positive captions and one hard negative caption. This poses a three-way semantic (in)equivalence problem to the language models. We comprehensively evaluate VLMs and ULMs that differ in architecture, pre-training objectives, and datasets to benchmark performance on the SUGARCREPE++ dataset. Experimental results highlight the difficulties of VLMs in distinguishing between lexical and semantic variations, particularly in object attributes and spatial relations. Although VLMs with larger pre-training datasets, model sizes, and multiple pre-training objectives achieve better performance on SUGARCREPE++, there remains significant room for improvement. We show that models which achieve better performance on compositionality datasets need not perform equally well on SUGARCREPE++, signifying that compositionality alone may not be sufficient for understanding semantic and lexical alterations. Given the importance of the property that the SUGARCREPE++ dataset targets, it serves as a new challenge to the vision-and-language community.
Updated: 2024-06-19 00:03:42
Field: cs.CV,cs.CL,cs.LG,68T45, 68T50,I.2.7; I.2.10
Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation
Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive. Knowledge distillation (KD) is a promising solution, enabling the transfer of capabilities from larger teacher LLMs to more compact student models. Particularly, sequence-level KD, which distills rationale-based reasoning processes instead of merely final outcomes, shows great potential in enhancing students' reasoning capabilities. However, current methods struggle with sequence level KD under long-tailed data distributions, adversely affecting generalization on sparsely represented domains. We introduce the Multi-Stage Balanced Distillation (BalDistill) framework, which iteratively balances training data within a fixed computational budget. By dynamically selecting representative head domain examples and synthesizing tail domain examples, BalDistill achieves state-of-the-art performance across diverse long-tailed datasets, enhancing both the efficiency and efficacy of the distilled models.
Updated: 2024-06-19 00:01:14
Field: cs.CL,cs.AI
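The balancing step can be sketched as follows, under assumptions: `synthesize` stands in for querying the teacher LLM for new rationales, and truncating each pool stands in for the paper's representative head-domain selection.

```python
def balance_budget(examples_by_domain, budget, synthesize):
    # examples_by_domain: {domain: [examples]}; budget: total example count.
    per_domain = budget // len(examples_by_domain)
    balanced = []
    for domain, pool in examples_by_domain.items():
        if len(pool) >= per_domain:        # head domain: keep a subset
            balanced += pool[:per_domain]
        else:                              # tail domain: synthesize the gap
            balanced += pool
            balanced += synthesize(domain, per_domain - len(pool))
    return balanced
```

The point of the equal per-domain quota is that the student sees tail domains as often as head domains within the fixed budget, instead of inheriting the teacher data's long-tailed frequencies.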
CU-Net: a U-Net architecture for efficient brain-tumor segmentation on BraTS 2019 dataset
Accurately segmenting brain tumors from MRI scans is important for developing effective treatment plans and improving patient outcomes. This study introduces a new implementation of the Columbia-University-Net (CU-Net) architecture for brain tumor segmentation using the BraTS 2019 dataset. The CU-Net model has a symmetrical U-shaped structure and uses convolutional layers, max pooling, and upsampling operations to achieve high-resolution segmentation. Our CU-Net model achieved a Dice score of 82.41%, surpassing two other state-of-the-art models. This improvement in segmentation accuracy highlights the robustness and effectiveness of the model, helping to accurately delineate tumor boundaries, which is crucial for surgical planning and radiation therapy and ultimately has the potential to improve patient outcomes.
Updated: 2024-06-19 00:01:01
Field: cs.CV,cs.AI,q-bio.NC
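For reference, the headline metric is the Dice coefficient between predicted and ground-truth binary masks; a minimal implementation:

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    # Dice = 2|P intersect G| / (|P| + |G|); eps guards empty masks.
    pred, gt = pred.astype(bool), gt.astype(bool)
    return (2.0 * np.logical_and(pred, gt).sum() + eps) / (
        pred.sum() + gt.sum() + eps)

p = np.array([[1, 1, 0], [0, 1, 0]])
g = np.array([[1, 1, 0], [0, 0, 1]])
print(f"Dice = {dice(p, g):.3f}")   # 2*2/(3+3) ~= 0.667
```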