Arxiv Day: Article

VeraCT Scan: Retrieval-Augmented Fake News Detection with Justifiable Reasoning

The proliferation of fake news poses a significant threat, not only by disseminating misleading information but also by undermining the very foundations of democracy. Recent advances in generative artificial intelligence have further exacerbated the challenge of distinguishing genuine news from fabricated stories. In response to this challenge, we introduce VeraCT Scan, a novel retrieval-augmented system for fake news detection. The system extracts the core facts from a given piece of news and then conducts an internet-wide search to identify corroborating or conflicting reports. The credibility of the sources is then leveraged to verify the information. Besides determining the veracity of news, the system provides transparent evidence and reasoning to support its conclusions, improving the interpretability of, and trust in, the results. In addition to GPT-4 Turbo, Llama-2 13B is also fine-tuned for news content understanding, information verification, and reasoning. Both implementations have demonstrated state-of-the-art accuracy in fake news detection.

Updated: 2024-06-24 23:53:05

Categories: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2406.10289v2
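The verification step weighs retrieved reports by how credible their sources are. A minimal sketch of one way such a step could work, assuming each report has already been labeled with a stance toward the claim and a credibility score in [0, 1]. The `Evidence` fields, the threshold, and the verdict strings are hypothetical illustrations, not VeraCT Scan's actual scoring:

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str         # hypothetical source identifier, e.g. a domain name
    stance: int         # +1 corroborates the claim, -1 conflicts with it, 0 neutral
    credibility: float  # assumed credibility score of the source, in [0, 1]


def verify_claim(evidence, threshold=0.2):
    """Return (verdict, score) from credibility-weighted stances of retrieved reports."""
    weight = sum(e.credibility for e in evidence if e.stance != 0)
    if weight == 0:
        return "unverified", 0.0
    # Score in [-1, 1]: positive when credible sources mostly corroborate the claim.
    score = sum(e.stance * e.credibility for e in evidence) / weight
    if score >= threshold:
        return "likely true", score
    if score <= -threshold:
        return "likely fake", score
    return "uncertain", score


reports = [
    Evidence("wire-service.example", +1, 0.9),
    Evidence("anonymous-blog.example", -1, 0.2),
]
print(verify_claim(reports))
```

Here a single low-credibility contradiction does not override a highly credible corroboration. The system described above also returns the evidence and reasoning behind each verdict, which this sketch omits.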

Minimax Optimality in Contextual Dynamic Pricing with General Valuation Models

Dynamic pricing, the practice of adjusting prices based on contextual factors, has gained significant attention due to its impact on revenue maximization. In this paper, we address the contextual dynamic pricing problem, in which pricing decisions are based on observable product features and customer characteristics. We propose a novel algorithm that achieves improved regret bounds while minimizing assumptions about the problem. Our algorithm discretizes the unknown noise distribution and combines upper confidence bounds with a layered data partitioning technique to regulate regret in each episode. Together, these techniques control the regret associated with pricing decisions and lead to minimax optimality. Specifically, our algorithm achieves a regret upper bound of $\tilde{\mathcal{O}}(\rho_{\mathcal{V}}^{\frac{1}{3}}(\delta) T^{\frac{2}{3}})$, where $\rho_{\mathcal{V}}(\delta)$ represents the estimation error of the valuation function. Importantly, this bound matches the lower bound up to logarithmic terms, demonstrating the minimax optimality of our approach. Furthermore, our method extends beyond the linear valuation models commonly used in dynamic pricing by considering general function spaces. We simplify the estimation process by reducing it to general offline regression oracles, which makes implementation more straightforward.

Updated: 2024-06-24 23:43:56

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.17184v1
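Two ingredients of the algorithm are discretization and upper confidence bounds. To illustrate only those two, here is a minimal non-contextual sketch: UCB over a fixed grid of candidate prices, where a sale happens if and only if the buyer's valuation is at least the posted price. It omits product features, the estimated valuation function, and the layered data partitioning. The uniform valuations and the 0.1 grid spacing are assumptions made for the example:

```python
import math
import random


def ucb_posted_price(sample_valuation, prices, T, seed=0):
    """UCB over a fixed grid of candidate prices; a sale occurs iff valuation >= price."""
    rng = random.Random(seed)
    K = len(prices)
    pulls = [0] * K        # how often each price was posted
    revenue = [0.0] * K    # cumulative revenue earned at each price
    scale = max(prices)    # per-round revenue lies in [0, max(prices)]
    total = 0.0
    for t in range(1, T + 1):
        if t <= K:
            k = t - 1      # post every price once to initialize the estimates
        else:
            k = max(range(K), key=lambda i: revenue[i] / pulls[i]
                    + scale * math.sqrt(2 * math.log(t) / pulls[i]))
        price = prices[k]
        r = price if sample_valuation(rng) >= price else 0.0
        pulls[k] += 1
        revenue[k] += r
        total += r
    return total, pulls


grid = [i / 10 for i in range(1, 10)]  # discretized prices 0.1, ..., 0.9
total, pulls = ucb_posted_price(lambda rng: rng.random(), grid, T=5000)
print(total / 5000, grid[pulls.index(max(pulls))])
```

The grid resolution is a trade-off. A finer grid reduces the revenue lost to rounding the price, but it adds more arms to explore.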

Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback?

Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for capturing human intent, alleviating the challenges of hand-crafting reward values. Despite the increasing interest in RLHF, most works learn black-box reward functions that, while expressive, are difficult to interpret and often require running the entire costly RL process before one can even determine whether these frameworks are actually aligned with human preferences. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs). Our experiments across several domains, including CartPole, Visual Gridworld environments, and Atari games, provide evidence that the tree structure of our learned reward function is useful in determining the extent to which the reward function is aligned with human preferences. We also provide experimental evidence that reward DDTs can often achieve competitive RL performance compared with higher-capacity deep neural network reward functions, and we demonstrate the diagnostic utility of our framework in checking the alignment of learned reward functions. We also observe that the choice between soft and hard (argmax) outputs of a reward DDT reveals a tension between wanting highly shaped rewards to ensure good RL performance and wanting simpler, more interpretable rewards. Videos and code are available at: https://sites.google.com/view/ddt-rlhf

Updated: 2024-06-24 23:43:30

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2306.13004v4
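The soft/hard tension can be seen in a minimal sketch of a reward tree. The example uses a depth-2 soft decision tree with sigmoid routing and a Bradley-Terry preference loss over trajectory returns. The tree depth, routing form, and loss are assumptions chosen for illustration, not the authors' exact architecture. Only the forward pass is shown; gradients would come from an autodiff framework. The soft output is an expectation over leaves and varies smoothly (shaped). The hard output returns the single most likely leaf, which is easier to read but piecewise constant.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SoftRewardTree:
    """Depth-2 soft decision tree: root (node 0), children (nodes 1, 2), four reward leaves."""

    def __init__(self, weights, biases, leaf_rewards):
        self.weights = weights            # one weight vector per internal node
        self.biases = biases              # one bias per internal node
        self.leaf_rewards = leaf_rewards  # four scalar rewards, ordered left to right

    def leaf_probs(self, x):
        # p[i] is the probability of routing right at internal node i.
        p = [sigmoid(sum(w * xi for w, xi in zip(wv, x)) + b)
             for wv, b in zip(self.weights, self.biases)]
        return [(1 - p[0]) * (1 - p[1]), (1 - p[0]) * p[1],
                p[0] * (1 - p[2]), p[0] * p[2]]

    def reward(self, x, hard=False):
        probs = self.leaf_probs(x)
        if hard:  # argmax leaf: interpretable but piecewise constant
            return self.leaf_rewards[max(range(4), key=probs.__getitem__)]
        return sum(pr * r for pr, r in zip(probs, self.leaf_rewards))  # soft: smooth


def preference_loss(tree, traj_a, traj_b, a_preferred=True):
    """Bradley-Terry negative log-likelihood of a pairwise preference over two trajectories."""
    diff = sum(tree.reward(s) for s in traj_a) - sum(tree.reward(s) for s in traj_b)
    p_a = sigmoid(diff)
    return -math.log(p_a if a_preferred else 1.0 - p_a)
```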

Debiased Recommendation with Noisy Feedback

In recommender systems, a user's ratings of most items are usually missing not at random (MNAR), largely because users are free to choose which items to rate. To achieve unbiased learning of the prediction model under MNAR data, three typical solutions have been proposed: error-imputation-based (EIB), inverse-propensity-scoring (IPS), and doubly robust (DR) methods. However, these methods ignore an alternative form of bias caused by the inconsistency between the observed ratings and the users' true preferences, also known as noisy feedback or outcome measurement errors (OME), which can arise, e.g., from public opinion or a low-quality data collection process. In this work, we study the intersecting threats that MNAR data and OME in the collected data pose to unbiased learning of the prediction model. First, we design OME-EIB, OME-IPS, and OME-DR estimators, which extend the existing estimators to combat OME in real-world recommendation scenarios. Next, we theoretically prove the unbiasedness and generalization bounds of the proposed estimators. We further propose an alternate denoising training approach to achieve unbiased learning of the prediction model under MNAR data with OME. Extensive experiments on three real-world datasets and one semi-synthetic dataset show the effectiveness of our proposed approaches. The code is available at https://github.com/haoxuanli-pku/KDD24-OME-DR.

Updated: 2024-06-24 23:42:18

Categories: cs.IR,cs.LG

Download: http://arxiv.org/abs/2406.17182v1
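A simplified, linear case shows how MNAR and OME corrections can be combined. Suppose binary ratings are flipped with known rates $\rho_{01} = P(\tilde y = 1 \mid y = 0)$ and $\rho_{10} = P(\tilde y = 0 \mid y = 1)$. Then $E[\tilde y \mid y] = \rho_{01} + (1 - \rho_{01} - \rho_{10})\,y$, so $(\tilde y - \rho_{01})/(1 - \rho_{01} - \rho_{10})$ is unbiased for the true rating $y$. Weighting that corrected label by the inverse observation propensity then handles MNAR.

The sketch below uses this to estimate the average true preference over all user-item pairs. It assumes the flip rates and propensities are known, and it is not the paper's OME-IPS estimator, which targets the prediction loss.

```python
def debiased_ips_mean(observed, propensities, rho01, rho10, n_total):
    """
    IPS estimate of the mean true binary rating over all n_total user-item pairs.
    observed: dict (user, item) -> noisy binary rating (0/1), only for observed pairs
    propensities: dict (user, item) -> probability that the pair was observed
    rho01: P(noisy = 1 | true = 0); rho10: P(noisy = 0 | true = 1)  (assumed known)
    """
    denom = 1.0 - rho01 - rho10
    if denom <= 0:
        raise ValueError("flip rates too high: need rho01 + rho10 < 1")
    total = 0.0
    for pair, y_noisy in observed.items():
        y_corrected = (y_noisy - rho01) / denom    # unbiased for the true label
        total += y_corrected / propensities[pair]  # inverse-propensity weighting
    return total / n_total


observed = {(0, 0): 1, (0, 1): 0}
propensities = {(0, 0): 0.5, (0, 1): 0.5}
print(debiased_ips_mean(observed, propensities, rho01=0.1, rho10=0.1, n_total=4))
```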

Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time

Concept drift is a phenomenon in which the underlying data distribution and statistical properties of a target domain change over time, degrading the model's performance. Consequently, models deployed in production require continuous monitoring through drift detection techniques. Most drift detection methods to date are supervised, i.e., they rely on ground-truth labels. However, true labels are often unavailable in real-world scenarios. Although recent efforts have been made to develop unsupervised methods, they often lack the required accuracy, are too complex for real-time implementation in production environments, or are unable to effectively characterize drift. To address these challenges, we propose DriftLens, an unsupervised real-time concept drift detection framework. It works on unstructured data by exploiting the distribution distances of deep learning representations. DriftLens can also characterize drift by analyzing each label separately. We present a comprehensive experimental evaluation with multiple deep learning classifiers for text, image, and speech. Results show that (i) DriftLens performs better than previous methods in detecting drift in $11/13$ use cases; (ii) it runs at least 5 times faster; (iii) its detected drift value is highly consistent with the amount of drift (correlation $\geq 0.85$); and (iv) it is robust to parameter changes.

Updated: 2024-06-24 23:41:46

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.17813v1
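Distribution-distance drift detection can be sketched over embedding vectors as follows. The sketch compares a reference window with a new window, both overall and per label. Because true labels are assumed unavailable, grouping uses the model's *predicted* labels. The distance is a Fréchet-style distance between Gaussian fits, simplified here to diagonal covariances so it needs only the standard library. Both the distance choice and the threshold are assumptions for illustration, not DriftLens's exact procedure.

```python
import math
from statistics import fmean, pvariance


def diag_frechet(a, b):
    """Frechet distance between diagonal-Gaussian fits of two sets of embedding vectors."""
    dist = 0.0
    for d in range(len(a[0])):
        xa = [v[d] for v in a]
        xb = [v[d] for v in b]
        dist += (fmean(xa) - fmean(xb)) ** 2                               # mean shift
        dist += (math.sqrt(pvariance(xa)) - math.sqrt(pvariance(xb))) ** 2  # spread change
    return dist


def drift_report(base_emb, base_pred, win_emb, win_pred, threshold):
    """Compare a reference window with a new window, overall and per predicted label."""
    overall = diag_frechet(base_emb, win_emb)
    per_label = {}
    for label in set(base_pred) & set(win_pred):
        a = [e for e, p in zip(base_emb, base_pred) if p == label]
        b = [e for e, p in zip(win_emb, win_pred) if p == label]
        per_label[label] = diag_frechet(a, b)
    return overall > threshold, overall, per_label


base = [[0, 0], [1, 1], [0, 1], [1, 0]]
shifted = [[x + 5, y + 5] for x, y in base]
print(drift_report(base, [0, 0, 1, 1], shifted, [0, 0, 1, 1], threshold=1.0))
```

The per-label distances indicate which classes drifted. This is the kind of drift characterization the abstract describes.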

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Image retrieval, i.e., finding desired images given a reference image, inherently encompasses rich, multi-faceted search intents that are difficult to capture solely using image-based measures. Recent works leverage text instructions to allow users to express their search intents more freely. However, they primarily focus on image pairs that are visually similar and/or can be characterized by a small set of predefined relations. The core thesis of this paper is that text instructions can enable retrieving images with richer relations beyond visual similarity. To show this, we introduce MagicLens, a series of self-supervised image retrieval models that support open-ended instructions. MagicLens is built on a key novel insight: image pairs that naturally occur on the same web pages contain a wide range of implicit relations (e.g., "inside view of"), and we can make those implicit relations explicit by synthesizing instructions via foundation models. Trained on 36.7M (query image, instruction, target image) triplets with rich semantic relations mined from the web, MagicLens achieves results comparable to or better than the prior best on eight benchmarks of various image retrieval tasks, while maintaining high parameter efficiency with a significantly smaller model size. Additional human analyses on an unseen corpus of 1.4M images further demonstrate the diversity of search intents supported by MagicLens. Code and models are publicly available at https://open-vision-language.github.io/MagicLens/.

Updated: 2024-06-24 23:41:29

Categories: cs.CV,cs.AI,cs.CL,cs.IR,cs.MM

Download: http://arxiv.org/abs/2403.19651v2
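At query time, instruction-conditioned retrieval ranks corpus images against a single query embedding that depends on both the reference image and the instruction. A minimal sketch, assuming precomputed embeddings. The `fuse_query` function (averaging the normalized image and text vectors) is a hypothetical stand-in for a learned multimodal encoder, not MagicLens's architecture.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]  # assumes a non-zero vector


def fuse_query(image_emb, text_emb):
    # Hypothetical stand-in for a learned query encoder: average of the
    # L2-normalized reference-image and instruction embeddings.
    a, b = normalize(image_emb), normalize(text_emb)
    return normalize([x + y for x, y in zip(a, b)])


def retrieve(query_emb, corpus, k=3):
    """Return ids of the k corpus images with the highest cosine similarity."""
    q = normalize(query_emb)
    scored = [(sum(x * y for x, y in zip(q, normalize(e))), img_id)
              for img_id, e in corpus.items()]
    scored.sort(reverse=True)
    return [img_id for _, img_id in scored[:k]]


corpus = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
query = fuse_query(image_emb=[1.0, 0.0], text_emb=[0.0, 1.0])
print(retrieve(query, corpus, k=1))
```

In this toy setup, the instruction pulls the query away from the image that looks most like the reference ("a") toward the image that satisfies both the reference and the instruction ("c").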

Latent Diffusion Model-Enabled Real-Time Semantic Communication Considering Semantic Ambiguities and Channel Noises

Semantic communication (SemCom) has emerged as a new paradigm for 6G communication, with deep learning (DL) models being one of the key drivers of the shift from bit/symbol accuracy to the semantics and pragmatics of data. Nevertheless, DL-based SemCom systems often face performance bottlenecks due to overfitting, poor generalization, and sensitivity to outliers. Furthermore, the varying fading gains and noise with uncertain signal-to-noise ratios (SNRs) commonly present in wireless channels usually limit the accuracy of semantic information transmission. Consequently, this paper constructs a latent diffusion model-enabled SemCom system and proposes three improvements over existing works: i) To handle potential outliers in the source data, semantic errors obtained by projected gradient descent, based on the vulnerabilities of DL models, are used to update the parameters and obtain an outlier-robust encoder. ii) A lightweight single-layer latent space transformation adapter completes one-shot learning at the transmitter and is placed before the decoder at the receiver, enabling adaptation to out-of-distribution data and enhancing human-perceptual quality. iii) An end-to-end consistency distillation (EECD) strategy is used to distill the diffusion models trained in latent space, enabling deterministic single- or few-step real-time denoising in various noisy channels while maintaining high semantic quality. Extensive numerical experiments across different datasets demonstrate the superiority of the proposed SemCom system, consistently showing its robustness to outliers, its capability to transmit data with unknown distributions, and its ability to perform real-time channel denoising while preserving high human-perceptual quality, outperforming existing denoising approaches on semantic metrics.

Updated: 2024-06-24 23:41:23

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.06644v2
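Improvement (i) relies on perturbations found by projected gradient descent (PGD). The sketch below shows the standard L-infinity variant on a toy differentiable model: signed-gradient ascent steps on the loss, each followed by projection back onto an epsilon-ball around the clean input. The toy linear model, the step size, and the radius are assumptions. In the paper's setting, such perturbed inputs would then be used to update the encoder's parameters.

```python
def pgd_linf(x, grad_fn, eps, alpha, steps):
    """Projected gradient ascent on a loss, constrained to an L-infinity ball of radius eps."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        # Signed-gradient step that increases the loss ...
        x_adv = [xa + alpha * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        # ... followed by projection back onto the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Toy differentiable model: prediction w . x with squared error against target y.
W, Y = [1.0, -1.0], -1.0


def loss(v):
    return (sum(wi * vi for wi, vi in zip(W, v)) - Y) ** 2


def grad(v):
    residual = sum(wi * vi for wi, vi in zip(W, v)) - Y
    return [2 * residual * wi for wi in W]


x_clean = [1.0, 1.0]
x_adv = pgd_linf(x_clean, grad, eps=0.25, alpha=0.1, steps=5)
print(x_adv, loss(x_clean), loss(x_adv))
```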

Diff3Dformer: Leveraging Slice Sequence Diffusion for Enhanced 3D CT Classification with Transformer Networks

The manifestation of symptoms associated with lung diseases can vary across depths for individual patients, highlighting the significance of 3D information in CT scans for medical image classification. While Vision Transformers (ViTs) have shown superior performance over convolutional neural networks in image classification tasks, their effectiveness is often demonstrated on sufficiently large 2D datasets, and they easily overfit on small medical image datasets. To address this limitation, we propose a Diffusion-based 3D Vision Transformer (Diff3Dformer), which uses the latent space of the diffusion model to form the slice sequence for 3D analysis and incorporates clustering attention into ViT to aggregate repetitive information within 3D CT scans, thereby harnessing the power of advanced transformers in 3D classification tasks on small datasets. Our method exhibits improved performance on small 3D lung CT datasets of two different scales, surpassing state-of-the-art 3D methods and other transformer-based approaches that emerged during the COVID-19 pandemic, and demonstrating robust and superior performance across different data scales. Experimental results underscore the superiority of our proposed method, indicating its potential for enhancing medical image classification in real-world scenarios.

Updated: 2024-06-24 23:23:18

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.17173v1

Robust Zero Trust Architecture: Joint Blockchain based Federated learning and Anomaly Detection based Framework

This paper introduces a robust zero-trust architecture (ZTA) tailored for decentralized systems that enables efficient remote work and collaboration within IoT networks. Using blockchain-based federated learning principles, our proposed framework includes a robust aggregation mechanism designed to counteract malicious updates from compromised clients, enhancing the security of the global learning process. Moreover, secure and reliable trust computation is essential for remote work and collaboration. The robust ZTA framework integrates anomaly detection and trust computation, ensuring secure and reliable device collaboration in a decentralized fashion. We introduce an adaptive algorithm that dynamically adjusts to varying user contexts, using unsupervised clustering to detect novel anomalies such as zero-day attacks. To ensure reliable and scalable trust computation, we develop an algorithm that dynamically adapts to varying user contexts by employing incremental anomaly detection and clustering techniques to identify and share local and global anomalies between nodes. Future directions include scalability improvements, Dirichlet processes for advanced anomaly detection, privacy-preserving techniques, and the integration of post-quantum cryptographic methods to safeguard against emerging quantum threats.

Updated: 2024-06-24 23:15:19

Categories: cs.CR,cs.DC,cs.LG

Download: http://arxiv.org/abs/2406.17172v1
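The abstract does not specify the robust aggregation rule. As a generic illustration of why robust aggregation resists malicious updates, the sketch compares plain averaging with coordinate-wise median aggregation, a well-known robust rule chosen here as an assumption rather than the paper's mechanism. A single compromised client can drag the mean arbitrarily far. The median stays close to the honest updates as long as honest clients form a majority.

```python
from statistics import fmean, median


def coordinate_median(updates):
    """Robust aggregation: median of each coordinate across client update vectors."""
    return [median(coords) for coords in zip(*updates)]


def coordinate_mean(updates):
    """Plain federated averaging, shown for comparison."""
    return [fmean(coords) for coords in zip(*updates)]


honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
malicious = [[100.0, -100.0]]  # a compromised client's poisoned update
updates = honest + malicious
print(coordinate_mean(updates))    # dragged far from the honest updates
print(coordinate_median(updates))  # stays close to the honest updates
```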

Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models

As Large Language Models (LLMs) continue to exhibit remarkable performance in natural language understanding tasks, there is a crucial need to measure their ability to perform human-like multi-step logical reasoning. Existing logical reasoning evaluation benchmarks often focus primarily on simplistic single-step or multi-step reasoning with a limited set of inference rules. Furthermore, the lack of datasets for evaluating non-monotonic reasoning represents a crucial gap, since non-monotonic reasoning aligns more closely with human-like reasoning. To address these limitations, we propose Multi-LogiEval, a comprehensive evaluation dataset encompassing multi-step logical reasoning with various inference rules and depths. Multi-LogiEval covers three logic types (propositional, first-order, and non-monotonic), comprising more than 30 inference rules and more than 60 of their combinations at various depths. Leveraging this dataset, we evaluate a range of LLMs, including GPT-4, ChatGPT, Gemini-Pro, Yi, Orca, and Mistral, using zero-shot chain-of-thought prompting. Experimental results show a significant drop in LLM performance as the number of reasoning steps (depth) increases (average accuracy of ~68% at depth 1 versus ~43% at depth 5). We further conduct a thorough investigation of the reasoning chains generated by LLMs, which reveals several important findings. We believe that Multi-LogiEval will facilitate future research on evaluating and enhancing the logical reasoning ability of LLMs. Data is available at https://github.com/Mihir3009/Multi-LogiEval.

Updated: 2024-06-24 23:02:56

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.17169v1
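"Depth" counts how many inference steps must be chained to reach the conclusion. A minimal sketch of that idea for propositional logic, assuming the simplest possible chain: repeated modus ponens over rules `p_i -> p_{i+1}`. The fact and rule encoding is hypothetical and not Multi-LogiEval's data format. The dataset mixes many different rules; this chain uses only one.

```python
def make_chain(depth):
    """Build a depth-k modus ponens chain: fact p0 and rules p_i -> p_(i+1)."""
    facts = {"p0"}
    rules = [(f"p{i}", f"p{i + 1}") for i in range(depth)]
    return facts, rules, f"p{depth}"


def forward_chain(facts, rules):
    """Apply modus ponens until no new facts appear; return known facts and steps used."""
    known = set(facts)
    steps = 0
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)  # one inference step
                steps += 1
                changed = True
    return known, steps


facts, rules, goal = make_chain(5)
known, steps = forward_chain(facts, rules)
print(goal in known, steps)
```

A depth-5 item requires five correct steps in a row. A single early mistake invalidates the conclusion, which is consistent with accuracy falling as depth grows.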

Reinforcement Learning via Auxiliary Task Distillation

We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill), a new method that enables reinforcement learning (RL) to solve long-horizon robot control problems by distilling behaviors from auxiliary RL tasks. AuxDistill achieves this by concurrently performing multi-task RL with auxiliary tasks, which are easier to learn and relevant to the main task. A weighted distillation loss transfers behaviors from these auxiliary tasks to solve the main task. We demonstrate that AuxDistill can learn a pixels-to-actions policy for a challenging multi-stage embodied object rearrangement task from the environment reward, without demonstrations, a learning curriculum, or pre-trained skills. AuxDistill achieves $2.3 \times$ higher success than the previous state-of-the-art baseline on the Habitat Object Rearrangement benchmark and outperforms methods that use pre-trained skills and expert demonstrations.

Updated: 2024-06-24 23:02:18

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2406.17168v1
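A weighted distillation loss over auxiliary policies can take the following shape at a single state. The sketch sums, across auxiliary tasks, a per-task weight times the KL divergence from the auxiliary-task action distribution to the main-task one. The KL direction and the fixed scalar weights are assumptions for illustration. The abstract does not say how AuxDistill computes its weights.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions over the same actions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def weighted_distillation_loss(main_probs, aux_probs_list, weights):
    """Weighted sum over auxiliary tasks of KL(aux policy || main policy) at one state."""
    return sum(w * kl_divergence(aux, main_probs)
               for w, aux in zip(weights, aux_probs_list))


main = [0.5, 0.5]               # main-task policy over two actions
aux = [[1.0, 0.0], [0.5, 0.5]]  # two auxiliary-task policies at the same state
weights = [0.5, 0.5]            # hypothetical per-task weights
print(weighted_distillation_loss(main, aux, weights))
```

Minimizing this term with respect to the main policy pulls it toward the auxiliary behaviors, in proportion to each task's weight. It would be added to the ordinary RL objective.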

Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis

Efficient training and inference algorithms, such as low-rank adaptation and model pruning, have shown impressive performance for learning Transformer-based large foundation models. However, due to the technical challenges of the non-convex optimization caused by the complicated architecture of Transformers, a theoretical understanding of why these methods can be applied to learn Transformers remains largely elusive. To the best of our knowledge, this paper presents the first theoretical analysis of the low-rank and sparsity properties of one-layer Transformers, characterizing the model trained to convergence with stochastic gradient descent. By focusing on a data model based on label-relevant and label-irrelevant patterns, we show that the gradient updates of the trainable parameters are low-rank, with a rank that depends on the number of label-relevant patterns. We also analyze how model pruning affects generalization while improving computational efficiency, and conclude that proper magnitude-based pruning has only a slight effect on testing performance. We conduct numerical experiments to support our findings.

Updated: 2024-06-24 23:00:58

Categories: cs.LG

Download: http://arxiv.org/abs/2406.17167v1
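The pruning analyzed in the paper is magnitude-based. A minimal sketch of unstructured magnitude pruning on a weight matrix stored as nested lists: the requested fraction of entries with the smallest absolute values is zeroed. Entries are ranked individually, so ties never remove more than the requested count. The matrix and the sparsity level are illustrative.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest absolute value."""
    entries = [(abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)]
    k = int(sparsity * len(entries))  # number of entries to remove
    to_zero = {(i, j) for _, i, j in sorted(entries)[:k]}
    return [[0.0 if (i, j) in to_zero else w for j, w in enumerate(row)]
            for i, row in enumerate(weights)]


W = [[0.1, -2.0],
     [0.5, -0.05]]
print(magnitude_prune(W, sparsity=0.5))
```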

Laplacian Convolutional Representation for Traffic Time Series Imputation

Spatiotemporal traffic data imputation is of great significance in intelligent transportation systems and data-driven decision-making processes. To perform efficient learning and accurate reconstruction from partially observed traffic data, we assert the importance of characterizing both global and local trends in time series. In the literature, substantial work has demonstrated the effectiveness of exploiting the low-rank property of traffic data with matrix/tensor completion models. In this study, we first introduce a Laplacian kernel for temporal regularization to characterize local trends in traffic time series, which can be formulated as a circular convolution. Then, we develop a low-rank Laplacian convolutional representation (LCR) model by combining the circulant matrix nuclear norm with the Laplacian kernelized temporal regularization, and prove that it fits a unified framework with a fast Fourier transform (FFT) solution of log-linear time complexity. Through extensive experiments on several traffic datasets, we demonstrate the superiority of LCR over several baseline models for imputing traffic time series with various behaviors (e.g., data noise and strong/weak periodicity) and for reconstructing sparse speed fields of vehicular traffic flow. The proposed LCR model is also a more efficient solution than existing imputation models for large-scale traffic data imputation.

Updated: 2024-06-24 22:52:28

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2212.01529v3
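The FFT shortcut is easiest to see on the temporal-regularization part by itself. Take a circular Laplacian kernel $\ell$ that connects each time step to its `tau` neighbours on each side. The smoothing problem $\min_x \|x - y\|_2^2 + \lambda \|\ell \star x\|_2^2$ then diagonalizes in the Fourier domain, because circular convolution becomes element-wise multiplication. The per-frequency solution is $\hat x = \hat y / (1 + \lambda |\hat\ell|^2)$, computable in $O(T \log T)$.

This sketch covers only that subproblem. It assumes a fully observed series, omits the circulant nuclear norm and the missing-data handling of the full LCR model, and the exact kernel form is an assumption.

```python
import numpy as np


def laplacian_kernel(T, tau=1):
    """Circular Laplacian kernel of length T linking each point to tau neighbours per side."""
    ell = np.zeros(T)
    ell[0] = 2 * tau
    ell[1:tau + 1] = -1
    ell[T - tau:] = -1
    return ell


def laplacian_smooth(y, lam, tau=1):
    """Solve min_x ||x - y||^2 + lam * ||ell (*) x||^2 exactly via the FFT."""
    ell_hat = np.fft.fft(laplacian_kernel(len(y), tau))
    x_hat = np.fft.fft(y) / (1.0 + lam * np.abs(ell_hat) ** 2)
    return np.real(np.fft.ifft(x_hat))


y = np.array([1.0, -1.0] * 4)  # the roughest possible signal of length 8
print(laplacian_smooth(y, lam=1.0))
```

The kernel entries sum to zero, so $\hat\ell_0 = 0$ and the series mean is preserved. High-frequency oscillations are damped the most.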

Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors

Large language models (LLMs) have achieved remarkable success in natural language generation, but less attention has been given to their applicability in decision-making tasks such as classification. We show that LLMs like LLaMA can achieve high performance on large multi-class classification tasks but still make classification errors and, worse, generate out-of-vocabulary class labels. To address these critical issues, we introduce the Paraphrase and AGgregate (PAG)-LLM approach, wherein an LLM generates multiple paraphrases of the input query (parallel queries), performs multi-class classification for the original query and each paraphrase, and finally aggregates all the classification labels based on their confidence scores. We evaluate PAG-LLM on two large multi-class classification datasets, CLINC and Banking, and show error reductions of 22.7% and 15.1%, respectively. We show that PAG-LLM is especially effective for hard examples where the LLM is uncertain, and that it reduces critical misclassification and hallucinated label generation errors.

Updated: 2024-06-24 22:30:26

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.17163v1
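The final aggregation step combines the (label, confidence) pairs produced for the original query and its paraphrases. A minimal sketch, assuming one simple rule: discard labels outside the known label set (hallucinated, out-of-vocabulary labels), sum the confidences per remaining label, and return the highest-scoring label. The exact aggregation rule and the example labels are assumptions, not taken from the paper.

```python
from collections import defaultdict


def aggregate_predictions(predictions, valid_labels):
    """predictions: (label, confidence) pairs from the original query and its paraphrases."""
    scores = defaultdict(float)
    for label, conf in predictions:
        if label in valid_labels:  # drop hallucinated / out-of-vocabulary labels
            scores[label] += conf
    if not scores:
        return None                # every prediction was out of vocabulary
    return max(scores, key=scores.get)


preds = [("transfer", 0.6), ("balance", 0.7), ("transfer", 0.5), ("refund_crypto_magic", 0.9)]
print(aggregate_predictions(preds, valid_labels={"transfer", "balance"}))
```

In this example, the original query alone might have been misclassified as "balance". Agreement across paraphrases outweighs that single higher-confidence vote.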

Virtual Mines -- Component-level recycling of printed circuit boards using deep learning

This contribution gives an overview of an ongoing project that uses machine learning and computer vision components to improve the electronic waste recycling process. In a circular economy, the "virtual mines" concept refers to production cycles in which valuable raw materials are reclaimed efficiently and cost-effectively from end-of-life items. In particular, the growth of e-waste, driven by the increasingly short life cycle of high-tech goods, is a global problem. In this paper, we describe a pipeline based on a deep learning model to recycle printed circuit boards at the component level. A pre-trained YOLOv5 model is used to analyze the results on the locally developed dataset. Despite the uneven distribution of class instances, YOLOv5 achieved satisfactory precision and recall, with the ability to optimize for large component instances.

Updated: 2024-06-24 22:29:30

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.17162v1

Supervised learning of spatial features with STDP and homeostasis using Spiking Neural Networks on SpiNNaker

Artificial Neural Networks (ANNs) have gained significant popularity thanks to their ability to learn using the well-known backpropagation algorithm. Conversely, Spiking Neural Networks (SNNs), despite having broader capabilities than ANNs, have always posed challenges in the training phase. This paper presents a new method for supervised learning in SNNs using Spike Timing Dependent Plasticity (STDP) and homeostasis, aimed at training the network to identify spatial patterns. Spatial patterns are spike patterns without a time component, in which all spike events occur simultaneously. The method is tested on the SpiNNaker digital architecture. An SNN is trained to recognise one or multiple patterns, and performance metrics are extracted to measure the performance of the network. The results show that, with a single trained pattern, the network behaves as an ideal detector, detecting the trained pattern with 100% accuracy. However, as the number of patterns trained on a single network increases, identification accuracy depends on the similarities between these patterns. This method of training an SNN to detect spatial patterns may be applied to pattern recognition in static images or to traffic analysis in computer networks, where each network packet represents a spatial pattern. We posit that the homeostatic factor may enable the network to detect patterns with some degree of similarity, rather than only perfectly matching patterns. The principles outlined in this article serve as the fundamental building blocks for more complex systems that utilise both spatial and temporal patterns by converting specific features of input signals into spikes. One example of such a system is a computer network packet classifier, tasked with real-time identification of packet streams based on features within the packet content.

Updated: 2024-06-24 22:15:57

Categories: cs.NE,cs.AI

Download: http://arxiv.org/abs/2312.02659v2

Sparse Expansion and Neuronal Disentanglement

We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the original weights, one-shot pruned for a specific cluster of input values. We call this approach $\textit{Sparse Expansion}$. We show that, for models such as Llama 2 70B, as the number of sparse experts increases, Sparse Expansion outperforms all other one-shot sparsification approaches for the same inference FLOP budget per token, and that this gap grows as sparsity increases, leading to inference speedups. But why? To answer this, we provide strong evidence that the mixture of sparse experts effectively $\textit{disentangles}$ the input-output relationship of every individual neuron across clusters of inputs. Specifically, sparse experts approximate the dense neuron output distribution with fewer weights by decomposing it into a collection of simpler distributions, each covered by a separate sparse dot product. Interestingly, we show that the Wasserstein distance between a neuron's output distribution and a Gaussian distribution indicates its entanglement level and its contribution to model accuracy. Every layer of an LLM has a fraction of highly entangled Wasserstein neurons, and model performance suffers more when these are sparsified than when other neurons are. The code for Sparse Expansion is available at https://github.com/Shavit-Lab/Sparse-Expansion.
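
The abstract does not spell out how the Wasserstein distance is computed, so the following is only a stdlib sketch under stated assumptions: each neuron's outputs are standardised (our assumption, so the score reflects shape rather than scale) and compared with a standard Gaussian by pairing sorted samples with Gaussian quantiles. A higher score suggests a less Gaussian, for example multimodal, output distribution. The function name is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def wasserstein_to_gaussian(outputs: list[float]) -> float:
    """Approximate the 1-D Wasserstein-1 distance between the standardised
    empirical distribution of a neuron's outputs and a standard Gaussian.

    Sorted standardised samples are paired with Gaussian quantiles taken at
    the midpoints (i + 0.5) / n, and the mean absolute gap is returned."""
    n = len(outputs)
    mu, sigma = fmean(outputs), pstdev(outputs)
    if sigma == 0:
        return 0.0  # degenerate case: no spread to compare, return 0
    std_normal = NormalDist()
    z = sorted((x - mu) / sigma for x in outputs)
    return sum(abs(zi - std_normal.inv_cdf((i + 0.5) / n)) for i, zi in enumerate(z)) / n


# Synthetic examples: roughly Gaussian outputs vs. strongly bimodal outputs.
gaussian_like = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
bimodal = [-1.0] * 100 + [1.0] * 100
print(wasserstein_to_gaussian(gaussian_like), wasserstein_to_gaussian(bimodal))
```

In this reading, neurons with a high score would be the "Wasserstein neurons" that the abstract reports as most harmful to sparsify.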

Updated: 2024-06-24 22:14:42

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.15756v2

Hessian Aware Low-Rank Perturbation for Order-Robust Continual Learning

Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from previous ones. In this work, we propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning. By modeling the parameter transitions across sequential tasks as weight matrix transformations, we apply low-rank approximation to the task-adaptive parameters in each layer of the neural network. Specifically, we theoretically demonstrate the quantitative relationship between the Hessian and the proposed low-rank approximation. The approximation ranks are then determined globally according to the marginal increase in empirical loss, estimated from the layer-specific gradient and the low-rank approximation error. Furthermore, we control model capacity by pruning less important parameters to limit parameter growth. We conduct extensive experiments on various benchmarks, including a dataset with large-scale tasks, and compare our method against recent state-of-the-art methods to demonstrate its effectiveness and scalability. Empirical results show that our method performs better across benchmarks, especially in achieving task-order robustness and mitigating forgetting. The source code is at https://github.com/lijiaqi/HALRP.

Updated: 2024-06-24 22:07:55

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2311.15161v3

Demystifying the Compression of Mixture-of-Experts Through a Unified Framework

Scaling large language models has revolutionized performance across diverse domains, yet the continual growth in model size poses significant challenges for real-world deployment. The Mixture of Experts (MoE) approach addresses this by dynamically selecting and activating only a subset of experts, significantly reducing computational costs while maintaining high performance. However, MoE introduces potential redundancy (e.g., in parameters) and extra costs (e.g., communication overhead). Although numerous compression techniques have been developed to mitigate redundancy in dense models, the compression of MoE remains under-explored. We first bridge this gap with a cutting-edge unified framework that not only seamlessly integrates mainstream compression methods but also helps systematically understand MoE compression. This framework approaches compression from two perspectives: Expert Slimming, which compresses individual experts, and Expert Trimming, which removes structured modules. Within this framework, we explore the optimization space left unexplored by existing methods, and further introduce aggressive Expert Trimming techniques, i.e., Layer Drop and Block Drop, to eliminate redundancy at larger scales. Based on these insights, we present a comprehensive recipe to guide practitioners in compressing MoE effectively. Extensive experimental results demonstrate the effectiveness of the compression methods under our framework and of the proposed recipe, achieving a 6.05x speedup and only 20.0 GB of memory usage while maintaining over 92% of performance on Mixtral-8x7B. Code is released at https://github.com/DaizeDong/Unified-MoE-Compression.

Updated: 2024-06-24 21:51:23

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.02500v2

Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction

Mixture of experts is a prediction aggregation method in machine learning that aggregates the predictions of specialized experts. This method often outperforms Bayesian methods, despite the latter having stronger inductive guarantees. We argue that this is due to the greater functional capacity of mixture of experts. We prove that, in a limiting case, mixture of experts has greater capacity than equivalent Bayesian methods, and we corroborate this through experiments on non-limiting cases. Finally, we conclude that mixture of experts is a type of abductive reasoning in the Peircean sense of hypothesis construction.
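
The capacity argument can be made concrete with a toy example, which is our own illustration and not the paper's proof. With a fixed weighting, as in Bayesian model averaging, combining two constant experts can only produce a constant, whereas an input-dependent gate can switch between the same experts and represent a step function. The experts, gate, and posterior values below are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def bma_predict(x, experts, posterior):
    """Bayesian-model-averaging style: the weights are fixed and do not depend on x."""
    return sum(w * f(x) for w, f in zip(posterior, experts))


def moe_predict(x, experts, gate):
    """Mixture of experts: the weights come from an input-dependent gate."""
    return sum(w * f(x) for w, f in zip(softmax(gate(x)), experts))


def expert_zero(x):
    return 0.0


def expert_one(x):
    return 1.0


def gate(x):
    # Hypothetical hand-set gate: favours expert_zero for x < 0, expert_one for x > 0.
    return [-10.0 * x, 10.0 * x]


experts = [expert_zero, expert_one]
posterior = [0.5, 0.5]  # hypothetical fixed posterior weights

print(moe_predict(-1.0, experts, gate), moe_predict(1.0, experts, gate))
print(bma_predict(-1.0, experts, posterior), bma_predict(1.0, experts, posterior))
```

The mixture approximates the step target (near 0 at x = -1, near 1 at x = 1), while the fixed-weight average returns 0.5 everywhere.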

Updated: 2024-06-24 21:44:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.17150v1

Quantifying Heterogeneous Ecosystem Services With Multi-Label Soft Classification

Understanding and quantifying ecosystem services is crucial for sustainable environmental management, conservation efforts, and policy-making. Advances in remote sensing technology and machine learning techniques have greatly facilitated this process. Yet ground truth labels, such as biodiversity, are very difficult and expensive to measure. In addition, more easily obtainable proxy labels, such as land use, often fail to capture the complex heterogeneity of the ecosystem. In this paper, we demonstrate how land use proxy labels can be combined with a soft, multi-label classifier to predict ecosystem services with complex heterogeneity.
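
One common way to train a soft multi-label classifier, sketched here as an assumption rather than the paper's exact setup, is to treat each tile's land-use class fractions as soft targets and minimise binary cross-entropy per label. The land-use classes and fractions below are hypothetical.

```python
import math


def soft_bce(pred_probs: list[float], soft_targets: list[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted per-label probabilities and
    soft targets in [0, 1] (e.g. the fraction of a tile covered by each class)."""
    total = 0.0
    for p, t in zip(pred_probs, soft_targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred_probs)


# Hypothetical tile: 60% forest, 30% cropland, 10% urban.
targets = [0.6, 0.3, 0.1]
print(soft_bce(targets, targets))             # prediction matches the mix
print(soft_bce([0.95, 0.03, 0.02], targets))  # near one-hot "forest" prediction
```

For each label the loss is minimised when the predicted probability equals the target fraction, so the classifier is pushed to reproduce the mixture of land uses in a tile rather than a single hard class.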

Updated: 2024-06-24 21:38:13

Categories: cs.LG,cs.AI,q-bio.QM

Download: http://arxiv.org/abs/2406.17147v1

Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity and Performance Restoration

Vision-Language Models (VLMs) integrate information from multiple modalities and have shown remarkable success across various tasks. However, deploying large-scale VLMs in resource-constrained scenarios is challenging. Pruning followed by finetuning offers a potential solution but remains underexplored for VLMs. This study addresses two key questions: how to distribute sparsity across the different modality-specific models, and how to restore the performance of pruned sparse VLMs. Our preliminary studies identified two effective pruning settings: applying the same sparsity to both the vision and language models, and pruning only the language models. Although LoRA finetuning can be used to restore sparse models, it is incompatible with them: merging the dense low-rank update back into the weights disrupts the pruned sparsity. To overcome this issue, we propose SparseLoRA, which applies sparsity directly to the LoRA weights. Our experimental results demonstrate significant improvements, including an 11.3% boost under 2:4 sparsity and a 47.6% enhancement under unstructured 70% sparsity. Code is released at https://github.com/Shwai-He/VLM-Compression.
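
A minimal pure-Python sketch of the two ideas above, under stated assumptions. The first function builds a 2:4 mask (in every group of four consecutive weights, keep the two of largest magnitude). The second shows one plausible reading of "applying sparsity to the LoRA weights": masking the low-rank update `B @ A` with the pruned model's mask before merging, so the merged weight keeps exactly the same zero pattern. The paper's exact formulation may differ, and all names and values here are hypothetical.

```python
def two_four_mask(row: list[float]) -> list[int]:
    """2:4 semi-structured sparsity: in each group of 4 consecutive weights,
    keep the 2 with the largest magnitude (1 = keep, 0 = prune)."""
    mask = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]), reverse=True)[:2]
        mask.extend(1 if j in keep else 0 for j in range(len(group)))
    return mask


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_masked_lora(w, mask, lora_b, lora_a, scale=1.0):
    """Merge W + scale * (B @ A) only at unpruned positions, so the merged
    weight has the same sparsity pattern as W."""
    delta = matmul(lora_b, lora_a)
    return [[(w[i][j] + scale * delta[i][j]) * mask[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


w_dense = [[0.9, -0.1, 0.05, -0.7], [0.2, 0.3, -0.8, 0.01]]
mask = [two_four_mask(row) for row in w_dense]
w_sparse = [[x * m for x, m in zip(row, mrow)] for row, mrow in zip(w_dense, mask)]
lora_b = [[1.0], [0.5]]           # shape (2, r=1), hypothetical trained values
lora_a = [[0.1, 0.1, 0.1, 0.1]]   # shape (r=1, 4)
merged = merge_masked_lora(w_sparse, mask, lora_b, lora_a)
print(mask)  # [[1, 0, 0, 1], [0, 1, 1, 0]]
print(merged)
```

Without the mask, the rank-1 update here is nonzero everywhere, so a plain LoRA merge would turn the 2:4-sparse matrix back into a dense one.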

Updated: 2024-06-24 21:37:45

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2404.02424v2

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Deep neural networks (DNNs) continue to grow rapidly in size, making it infeasible to train them on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipelined fashion. However, existing pipeline-parallel approaches only consider sequential pipeline stages and thus ignore the topology of a DNN, missing model-parallel opportunities. This paper presents graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are captured by a directed acyclic graph. GPP generalizes existing sequential pipeline parallelism and preserves the inherent topology of a DNN to enable concurrent execution of computationally independent operators, resulting in reduced memory requirements and improved GPU performance. In addition, we develop GraphPipe, a distributed system that exploits GPP strategies to enable performant and scalable DNN training. GraphPipe partitions a DNN into a graph of stages, optimizes micro-batch schedules for these stages, and parallelizes DNN training using the discovered GPP strategies. Evaluation on a variety of DNNs shows that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6x. GraphPipe also reduces search time by 9-21x compared to PipeDream and Piper.
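
The core DAG idea can be sketched with Python's standard `graphlib` module. This is not GraphPipe's partitioner or scheduler: it only groups stages into "waves" whose members depend solely on earlier waves, which is what allows topologically independent stages to run concurrently. The stage names form a hypothetical two-branch model.

```python
from graphlib import TopologicalSorter


def concurrent_stage_groups(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group pipeline stages into waves: every stage in a wave depends only on
    stages in earlier waves, so stages within one wave can run concurrently."""
    sorter = TopologicalSorter(deps)  # maps each stage to its predecessors
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical multi-branch model: two branches share an embedding and are fused later.
deps = {
    "embed": set(),
    "text_branch": {"embed"},
    "image_branch": {"embed"},
    "fusion": {"text_branch", "image_branch"},
    "head": {"fusion"},
}
waves = concurrent_stage_groups(deps)
print(waves)
```

A purely sequential pipeline would chain all five stages, whereas the DAG view exposes that the two branches can execute in the same wave.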

Updated: 2024-06-24 21:32:51

Categories: cs.DC,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.17145v1

LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models

Today's large language models (LLMs) typically train on short text segments (e.g., <4K tokens) due to the quadratic complexity of their Transformer architectures. As a result, their performance suffers drastically on inputs longer than those encountered during training, substantially limiting their applications in real-world tasks involving long contexts, such as encoding scientific articles, code repositories, or long dialogues. Through theoretical analysis and empirical investigation, this work identifies three major factors contributing to this length generalization failure. Our theoretical analysis further reveals that commonly used techniques, such as truncating the attention window or relative positional encodings, are inadequate to address them. To meet these challenges, we propose LM-Infinite, a simple and effective method for enhancing LLMs' ability to handle long contexts. LM-Infinite is highly flexible and can be used off the shelf with most modern LLMs. Without any parameter updates, it allows LLMs pre-trained on 2K- or 4K-token segments to generalize to inputs of up to 200M tokens while retaining perplexity. It also improves performance on downstream tasks such as Passkey Retrieval and Qasper in the zero-shot setting. LM-Infinite brings substantial efficiency improvements: it achieves a 2.7x decoding speedup and a 7.5x memory saving over the original model. Our code is released at https://github.com/Glaciohound/LM-Infinite.
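
The abstract does not describe LM-Infinite's mechanism. As we understand the paper, one of its components is a Λ-shaped attention mask, in which each token attends only to a few starting tokens plus a local window of recent tokens; the paper also bounds the relative distance seen by the positional encoding, which this sketch omits. The window sizes and function name below are hypothetical.

```python
def lambda_mask(seq_len: int, n_global: int, n_local: int) -> list[list[bool]]:
    """Causal Λ-shaped mask: query i may attend to key j (j <= i) if j is one of
    the first n_global tokens or lies within the last n_local positions."""
    return [[j <= i and (j < n_global or i - j < n_local) for j in range(seq_len)]
            for i in range(seq_len)]


mask = lambda_mask(seq_len=8, n_global=2, n_local=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row allows at most `n_global + n_local` keys, so attention cost grows linearly with sequence length rather than quadratically, which is consistent with the efficiency gains reported above.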

Updated: 2024-06-24 21:22:00

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2308.16137v7

Multi-task learning for molecular electronic structure approaching coupled-cluster accuracy

Machine learning (ML) plays an important role in quantum chemistry, providing fast-to-evaluate predictive models for various properties of molecules. However, most existing ML models for molecular electronic properties use density functional theory (DFT) databases as ground truth in training, so their prediction accuracy cannot surpass that of DFT. In this work, we develop a unified ML method for the electronic structures of organic molecules using gold-standard CCSD(T) calculations as training data. Tested on hydrocarbon molecules, our model outperforms DFT with widely used hybrid and double-hybrid functionals in both computational cost and prediction accuracy across various quantum chemical properties. As case studies, we apply the model to ground- and excited-state properties of aromatic compounds and semiconducting polymers, demonstrating its accuracy and its ability to generalize to complex systems that are hard to calculate with CCSD(T)-level methods.

Updated: 2024-06-24 21:16:36

Categories: physics.chem-ph,cond-mat.mtrl-sci,cs.AI,cs.CE,physics.comp-ph

Download: http://arxiv.org/abs/2405.12229v2

Bayesian temporal biclustering with applications to multi-subject neuroscience studies

We consider the problem of analyzing multivariate time series collected on multiple subjects, with the goal of identifying groups of subjects exhibiting similar trends in their recorded measurements over time, as well as time-varying groups of associated measurements. To this end, we propose a Bayesian model for temporal biclustering featuring nested partitions, where a time-invariant partition of subjects induces a time-varying partition of measurements. Our approach allows for data-driven determination of the number of subject and measurement clusters, as well as estimation of the number and location of changepoints in measurement partitions. To efficiently perform model fitting and posterior estimation with Markov chain Monte Carlo, we derive a blocked update of the measurements' cluster-assignment sequences. We illustrate the performance of our model in two applications: to functional magnetic resonance imaging (fMRI) data and to an electroencephalogram (EEG) dataset. The results indicate that the proposed model can combine information from potentially many subjects to discover a set of interpretable, dynamic patterns. Experiments on simulated data compare the estimation performance of the proposed model against ground-truth values and other statistical methods, showing that it performs well at identifying ground-truth subject and measurement clusters even when no subject or time dependence is present.

Updated: 2024-06-24 20:41:37

Categories: stat.ME,cs.LG,stat.AP

Download: http://arxiv.org/abs/2406.17131v1

Attention-based Dynamic Multilayer Graph Neural Networks for Loan Default Prediction

Whereas traditional credit scoring tends to employ only individual borrower- or loan-level predictors, it has been acknowledged for some time that connections between borrowers may cause default risk to propagate over a network. In this paper, we present a credit risk assessment model that leverages a dynamic multilayer network built from a Graph Neural Network and a Recurrent Neural Network, with each layer reflecting a different source of network connection. We test our methodology in a behavioural credit scoring context using a dataset provided by the U.S. mortgage financier Freddie Mac, in which different types of connections arise from the geographical location of the borrower and their choice of mortgage provider. The proposed model considers both types of connections and the evolution of these connections over time. We enhance the model with a custom attention mechanism that weights the different time snapshots according to their importance. After testing multiple configurations, we found that a model combining GAT, LSTM, and the attention mechanism provides the best results. Empirical results demonstrate that, for predicting borrowers' probability of default, our proposed model both outperforms traditional methods and provides novel insights into the importance of connections and timestamps.
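
The abstract describes a custom attention mechanism that weights time snapshots by importance but does not give its form. The sketch below assumes simple dot-product attention: each snapshot embedding (which in the described pipeline would come from the GAT and LSTM layers) is scored against a query vector, and the softmax-normalised scores weight the snapshots. The query, embeddings, and function name are hypothetical.

```python
import math


def attend_over_snapshots(snapshots: list[list[float]], query: list[float]):
    """Weight per-snapshot embeddings by softmax(dot(snapshot, query)) and
    return (weights, pooled embedding)."""
    scores = [sum(h * q for h, q in zip(snap, query)) for snap in snapshots]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(snapshots[0])
    pooled = [sum(w * snap[d] for w, snap in zip(weights, snapshots)) for d in range(dim)]
    return weights, pooled


# Hypothetical borrower embeddings for three monthly snapshots (oldest first).
snapshots = [[0.1, 0.0], [0.5, 0.2], [1.0, 0.4]]
query = [1.0, 1.0]  # learned in a real model; fixed here for illustration
weights, pooled = attend_over_snapshots(snapshots, query)
print([round(w, 3) for w in weights])
```

The learned weights double as an interpretability signal: inspecting them shows which snapshots the model relied on, in line with the abstract's claim of insights into the importance of timestamps.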

Updated: 2024-06-24 20:32:18

Categories: q-fin.GN,cs.LG

Download: http://arxiv.org/abs/2402.00299v2

Scalable Artificial Intelligence for Science: Perspectives, Methods and Exemplars

In a post-ChatGPT world, this paper explores the potential of leveraging scalable artificial intelligence for scientific discovery. We propose that scaling up artificial intelligence on high-performance computing platforms is essential for addressing complex scientific problems. This perspective focuses on scientific use cases such as cognitive simulations, large language models for scientific inquiry, medical image analysis, and physics-informed approaches. The study outlines the methodologies needed to address such challenges at scale on supercomputers or the cloud, and provides exemplars of these approaches applied to a variety of scientific problems.

Updated: 2024-06-24 20:29:29

Categories: cs.LG,cs.AI,cs.DC

Download: http://arxiv.org/abs/2406.17812v1

MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs

Spurious bias, a tendency to use spurious correlations between non-essential input attributes and target variables for predictions, has revealed a severe robustness pitfall in deep learning models trained on single-modality data. Multimodal Large Language Models (MLLMs), which integrate both vision and language models, have demonstrated strong capability in joint vision-language understanding. However, whether spurious biases are prevalent in MLLMs remains under-explored. We address this gap by analyzing spurious biases in a multimodal setting, uncovering the specific test data patterns that can manifest this problem when biases in the vision model cascade into the alignment between visual and text tokens in MLLMs. To better understand this problem, we introduce MM-SpuBench, a comprehensive visual question-answering (VQA) benchmark designed to evaluate MLLMs' reliance on nine distinct categories of spurious correlations drawn from five open-source image datasets. The VQA dataset is built from human-understandable concept information (attributes). Leveraging this benchmark, we conduct a thorough evaluation of current state-of-the-art MLLMs. Our findings illuminate the persistent reliance of these models on spurious correlations and underscore the urgent need for new methodologies to mitigate spurious biases. To support MLLM robustness research, we release our VQA benchmark at https://huggingface.co/datasets/mmbench/MM-SpuBench.

Updated: 2024-06-24 20:29:16

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.17126v1

A Wiener process perspective on local intrinsic dimension estimation methods

Local intrinsic dimension (LID) estimation methods have received a lot of attention in recent years thanks to progress in deep neural networks and generative modeling. In contrast to older non-parametric methods, new methods use generative models to approximate the density of the diffused dataset, which lets them scale to high-dimensional datasets such as images. In this paper, we investigate recent state-of-the-art parametric LID estimation methods from the perspective of the Wiener process. We explore how these methods behave when their assumptions are not met. We give an extended mathematical description of these methods and of their error as a function of the probability density of the data.

Updated: 2024-06-24 20:27:13

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2406.17125v1

SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning

Large Language Models (LLMs) have highlighted the need for effective unlearning mechanisms to comply with data regulations and ethical AI practices. LLM unlearning aims to remove undesired data influences and associated model capabilities without compromising utility beyond the scope of unlearning. While interest in studying LLM unlearning is growing, the impact of optimizer choice on LLM unlearning remains unexplored. In this work, we shed light on the significance of optimizer selection in LLM unlearning for the first time, establishing a clear connection between second-order optimization and influence unlearning (a classical approach that uses influence functions to update the model and remove data influence). This insight leads us to develop a second-order optimization-based LLM unlearning framework, termed Second-Order UnLearning (SOUL), which extends the static, one-shot model update of influence unlearning to a dynamic, iterative unlearning process. Our extensive experiments show that SOUL consistently outperforms conventional first-order methods across various unlearning tasks, models, and metrics, indicating that second-order optimization offers an effective and broadly applicable solution for LLM unlearning. Code is available at https://github.com/OPTML-Group/SOUL.

Updated: 2024-06-24 20:24:53

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2404.18239v4

Investigating Confidence Estimation Measures for Speaker Diarization

Speaker diarization systems segment a conversation recording based on the speakers' identity. Such systems can misclassify the speaker of a portion of audio due to a variety of factors, such as speech pattern variation, background noise, and overlapping speech. These errors propagate to, and can adversely affect, downstream systems that rely on the speaker's identity, such as speaker-adapted speech recognition. One of the ways to mitigate these errors is to provide segment-level diarization confidence scores to downstream systems. In this work, we investigate multiple methods for generating diarization confidence scores, including those derived from the original diarization system and those derived from an external model. Our experiments across multiple datasets and diarization systems demonstrate that the most competitive confidence score methods can isolate ~30% of the diarization errors within segments with the lowest ~10% of confidence scores.
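
The headline metric above can be expressed as a short function. This sketch assumes each segment is simply labelled correct or erroneous and that the lowest-confidence fraction is counted by number of segments rather than by duration; the abstract does not say which is used. The function name and data are hypothetical.

```python
def errors_isolated(confidences: list[float], is_error: list[bool], fraction: float = 0.10) -> float:
    """Share of all erroneous segments that fall within the `fraction` of
    segments with the lowest confidence scores (counted by segment, not duration)."""
    total_errors = sum(is_error)
    if total_errors == 0:
        return 0.0
    n_flagged = max(1, round(fraction * len(confidences)))
    lowest_first = sorted(range(len(confidences)), key=lambda i: confidences[i])
    flagged = lowest_first[:n_flagged]
    return sum(is_error[i] for i in flagged) / total_errors


# Hypothetical run: 20 segments, errors in segments 0, 5 and 12.
confidences = [i / 20 for i in range(20)]
is_error = [i in (0, 5, 12) for i in range(20)]
print(errors_isolated(confidences, is_error))  # 1 of 3 errors lies in the 2 lowest-confidence segments
```

Under these assumptions, the reported result corresponds to this function returning roughly 0.3 with `fraction=0.10`.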

Updated: 2024-06-24 20:21:38

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2406.17124v1

CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization

Bayesian optimization is a powerful method for automating compiler tuning. The complex landscape of autotuning poses a myriad of rarely considered structural challenges for black-box optimizers, and the lack of standardized benchmarks has limited the study of Bayesian optimization within the domain. To address this, we present CATBench, a comprehensive benchmarking suite that captures the complexities of compiler autotuning, ranging from discrete, conditional, and permutation parameter types to known and unknown binary constraints, as well as both multi-fidelity and multi-objective evaluations. The benchmarks in CATBench span a range of machine-learning-oriented computations, from tensor algebra to image processing and clustering, and use state-of-the-art compilers such as TACO and RISE/ELEVATE. CATBench offers a unified interface for evaluating Bayesian optimization algorithms, promoting reproducibility and innovation through an easy-to-use, fully containerized setup of both surrogate and real-world compiler optimization tasks. We validate CATBench on several state-of-the-art algorithms, revealing their strengths and weaknesses and demonstrating the suite's potential for advancing both Bayesian optimization and compiler autotuning research.

Updated: 2024-06-24 20:15:04

Categories: cs.LG,cs.AI,cs.NE

Download: http://arxiv.org/abs/2406.17811v1

Accelerating Phase Field Simulations Through a Hybrid Adaptive Fourier Neural Operator with U-Net Backbone

Prolonged contact between a corrosive liquid and metal alloys can cause progressive dealloying. Phase field models have been developed for this liquid-metal dealloying (LMD) process. However, the governing equations often involve coupled non-linear partial differential equations (PDEs), which are challenging to solve numerically. In particular, stiffness in the PDEs requires extremely small time steps (e.g., $10^{-12}$ or smaller). This computational bottleneck is especially problematic when LMD simulations must be run out to a late time horizon. This motivates the development of surrogate models capable of leaping forward in time by skipping several consecutive time steps at once. In this paper, we propose U-Shaped Adaptive Fourier Neural Operators (U-AFNO), a machine learning (ML) model inspired by recent advances in neural operator learning. U-AFNO employs U-Nets to extract and reconstruct local features within the physical fields, and passes the latent representation through a vision transformer (ViT) implemented in Fourier space (AFNO). We use U-AFNOs to learn the dynamics that map the field at the current time step to a later time step. We also identify global quantities of interest (QoIs) describing the corrosion process (e.g., the deformation of the liquid-metal interface) and show that our proposed U-AFNO model accurately predicts the field dynamics, despite the chaotic nature of LMD. Our model reproduces the key microstructure statistics and QoIs with accuracy on par with the high-fidelity numerical solver. We also investigate hybrid simulations, in which forward leaps in time using the U-AFNO alternate with high-fidelity time stepping. We demonstrate that, while hybrid schemes are advantageous for some surrogate model design choices, our proposed U-AFNO model in a fully auto-regressive setting consistently outperforms them.
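
The hybrid scheme described above can be sketched as a rollout loop that alternates surrogate leaps with solver steps. The ordering (one leap, then a fixed number of solver steps) and all names are our assumptions; the toy model replaces the phase field PDE with the ODE dx/dt = -x, and the "surrogate" simply reproduces 1000 Euler steps exactly.

```python
from typing import Callable, List

State = List[float]


def hybrid_rollout(state: State,
                   surrogate_leap: Callable[[State], State],
                   solver_step: Callable[[State], State],
                   n_cycles: int,
                   solver_steps_per_cycle: int) -> State:
    """Alternate one surrogate leap (standing in for many small solver steps)
    with a few high-fidelity solver steps. solver_steps_per_cycle=0 gives a
    fully auto-regressive surrogate rollout."""
    for _ in range(n_cycles):
        state = surrogate_leap(state)
        for _ in range(solver_steps_per_cycle):
            state = solver_step(state)
    return state


DT = 1e-3
LEAP = 1000  # number of solver steps replaced by one surrogate call (hypothetical)


def solver_step(s: State) -> State:
    # Explicit Euler step for the toy ODE dx/dt = -x.
    return [x * (1.0 - DT) for x in s]


def surrogate_leap(s: State) -> State:
    # Stand-in for a trained surrogate: reproduces LEAP Euler steps exactly.
    return [x * (1.0 - DT) ** LEAP for x in s]


print(hybrid_rollout([1.0], surrogate_leap, solver_step, n_cycles=2, solver_steps_per_cycle=0))
```

With a real learned surrogate the leap is only approximate, so the interesting trade-off is how quickly its errors accumulate in the fully auto-regressive setting compared with interleaving solver steps.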

Updated: 2024-06-24 20:13:23

Categories: cs.CE,cs.CV,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2406.17119v1

Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models

Despite the rapid progress and outstanding performance of Large Vision-Language Models (LVLMs) in recent years, LVLMs have been plagued by the issue of hallucination, i.e., LVLMs tend to generate responses that are inconsistent with the corresponding visual inputs. To evaluate the degree of hallucination in LVLMs, previous works have proposed a series of benchmarks featuring different types of tasks and evaluation metrics. However, we find that the quality of the existing hallucination benchmarks varies, with some suffering from problems such as inconsistent evaluation results under repeated tests and misalignment with human evaluation. To this end, we propose a Hallucination benchmark Quality Measurement framework (HQM), which leverages various indicators to assess the reliability and validity of existing hallucination benchmarks separately. Specifically, for reliability we explore test-retest reliability and parallel-forms reliability, while for validity we examine criterion validity and coverage of hallucination types. Furthermore, based on the results of our quality measurement, we construct a High-Quality Hallucination Benchmark (HQH) for LVLMs. We conduct an extensive evaluation of over 10 representative LVLMs, including GPT-4o and Gemini-Vision-Pro, to provide an in-depth analysis of the hallucination issues in existing models. Our benchmark is publicly available at https://github.com/HQHBench/HQHBench.

Updated: 2024-06-24 20:08:07

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.17115v1

Inception: Efficiently Computable Misinformation Attacks on Markov Games

We study security threats to Markov games due to information asymmetry and misinformation. We consider an attacker player who can spread misinformation about its reward function to influence the robust victim player's behavior. Given a fixed fake reward function, we derive the victim's policy under worst-case rationality and present polynomial-time algorithms to compute the attacker's optimal worst-case policy based on linear programming and backward induction. Then, we provide an efficient inception ("planting an idea in someone's mind") attack algorithm to find the optimal fake reward function within a restricted set of reward functions with dominant strategies. Importantly, our methods exploit the universal assumption of rationality to compute attacks efficiently. Thus, our work exposes a security vulnerability arising from standard game assumptions under misinformation.
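
The two-step logic can be illustrated in a single-state matrix game:

1. A robust victim predicts the attacker from a fake reward.
2. The attacker then exploits that prediction with its true reward.

The following is a minimal sketch, not the paper's LP and backward-induction algorithm for Markov games. It assumes pure strategies, payoff matrices indexed `[attacker action, victim action]`, and "rational" meaning "not strictly dominated". All function names are hypothetical.

```python
import numpy as np

def undominated_actions(R: np.ndarray) -> list[int]:
    """Attacker actions (rows of R) not strictly dominated by another row."""
    n = R.shape[0]
    return [a for a in range(n)
            if not any(np.all(R[o] > R[a]) for o in range(n) if o != a)]

def victim_worst_case_policy(believed_R: np.ndarray, V: np.ndarray) -> int:
    """Victim assumes the attacker plays any action rational under believed_R,
    and picks the action with the best worst-case victim payoff V[a, b]."""
    rational = undominated_actions(believed_R)
    worst = V[rational, :].min(axis=0)  # worst case over rational attacker actions
    return int(np.argmax(worst))

def attacker_best_response(true_R: np.ndarray, b: int) -> int:
    """Attacker, knowing the victim's action b, maximizes its true payoff."""
    return int(np.argmax(true_R[:, b]))
```

Consider `true_R = [[3, 0], [5, 1]]` and `V = [[1, 0], [0, 1]]`. With honest rewards, the victim plays 1 and the attacker earns 1. Announcing `fake_R = [[2, 2], [0, 0]]` (action 0 dominant) leads the victim to play 0, and the attacker then earns 5 by playing 1.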

Updated: 2024-06-24 20:01:43

Categories: cs.LG,cs.CR,cs.GT

Download: http://arxiv.org/abs/2406.17114v1

Fine-Grained Detection of Solidarity for Women and Migrants in 155 Years of German Parliamentary Debates

Solidarity is a crucial concept for understanding social relations in societies. In this paper, we explore fine-grained solidarity frames to study solidarity towards women and migrants in German parliamentary debates between 1867 and 2022. Using 2,864 manually annotated text snippets (at a cost exceeding 18k Euro), we evaluate large language models (LLMs) such as Llama 3, GPT-3.5, and GPT-4. We find that GPT-4 outperforms other LLMs, approaching human annotation quality. Using GPT-4, we automatically annotate more than 18k further instances (at a cost of around 500 Euro) across 155 years and find that solidarity with migrants outweighs anti-solidarity but that frequencies and solidarity types shift over time. Most importantly, group-based notions of (anti-)solidarity fade in favor of compassionate solidarity, focusing on the vulnerability of migrant groups, and exchange-based anti-solidarity, focusing on the lack of (economic) contribution. Our study highlights the interplay of historical events, socio-economic needs, and political ideologies in shaping migration discourse and social cohesion. We also show that powerful LLMs, if carefully prompted, can be cost-effective alternatives to human annotation for hard social scientific tasks.
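
The reported figures imply a large per-instance cost gap. The following rough estimate uses the rounded totals. The manual cost is stated as "exceeding" 18k Euro, so the first ratio is a floor. The GPT-4 instance count is "more than" 18k, so the second ratio is a ceiling.

$$
\frac{18{,}000\ \text{EUR}}{2{,}864\ \text{snippets}} \approx 6.3\ \text{EUR per snippet},
\qquad
\frac{500\ \text{EUR}}{18{,}000\ \text{instances}} \approx 0.028\ \text{EUR per instance}
$$

That puts automatic annotation at roughly 200 times cheaper per instance or better, before accounting for prompt-engineering effort.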

Updated: 2024-06-24 20:01:19

Categories: cs.CL,cs.LG,cs.SI

Download: http://arxiv.org/abs/2210.04359v2

MambaTab: A Plug-and-Play Model for Learning Tabular Data

Despite the prevalence of images and texts in machine learning, tabular data remains widely used across various domains. Existing deep learning models, such as convolutional neural networks and transformers, perform well but demand extensive preprocessing and tuning, which limits their accessibility and scalability. This work introduces MambaTab, an innovative approach for tabular data based on a structured state-space model (SSM). SSMs have strong capabilities for efficiently extracting effective representations from data with long-range dependencies. MambaTab leverages Mamba, an emerging SSM variant, for end-to-end supervised learning on tables. Compared to state-of-the-art baselines, MambaTab delivers superior performance while requiring significantly fewer parameters, as empirically validated on diverse benchmark datasets. MambaTab's efficiency, scalability, generalizability, and predictive gains make it a lightweight, "plug-and-play" solution for diverse tabular data, with promise for enabling wider practical applications.
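
The core of Mamba is a "selective" state-space recurrence, in which the step size Δ and the matrices B and C depend on the current input. The following is a minimal single-layer NumPy sketch with no gating, convolution, or normalization. It uses the zero-order-hold discretization for A and the simplified Δ·B discretization commonly used for B. The projection matrices `W_delta`, `W_B`, `W_C` are hypothetical parameters. How MambaTab arranges table features into a sequence is not specified here, so `L` is just an arbitrary sequence length.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def selective_scan(x, A, W_delta, W_B, W_C):
    """Toy Mamba-style selective SSM.

    x: (L, D) input sequence; A: (D, N) negative entries (stable diagonal state matrix per channel).
    W_delta: (D, D), W_B: (D, N), W_C: (D, N) make Delta_t, B_t, C_t input-dependent.
    """
    L, D = x.shape
    h = np.zeros((D, A.shape[1]))
    y = np.empty((L, D))
    for t in range(L):
        xt = x[t]
        delta = softplus(xt @ W_delta)         # (D,) positive per-channel step sizes
        B_t = xt @ W_B                         # (N,)
        C_t = xt @ W_C                         # (N,)
        A_bar = np.exp(delta[:, None] * A)     # zero-order-hold discretization of A
        B_bar = delta[:, None] * B_t[None, :]  # simplified discretization of B
        h = A_bar * h + B_bar * xt[:, None]    # input-dependent linear recurrence
        y[t] = h @ C_t                         # (D,) readout
    return y
```

The loop is linear in sequence length. Because Δ, B, and C vary per step, the model can decide per input how much state to keep or overwrite.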

Updated: 2024-06-24 19:58:06

Categories: cs.LG

Download: http://arxiv.org/abs/2401.08867v2

Integrating Generative AI with Network Digital Twins for Enhanced Network Operations

As telecommunications networks become increasingly complex, the integration of advanced technologies such as network digital twins and generative artificial intelligence (AI) emerges as a pivotal solution to enhance network operations and resilience. This paper explores the synergy between network digital twins, which provide a dynamic virtual representation of physical networks, and generative AI, particularly focusing on Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). We propose a novel architectural framework that incorporates these technologies to significantly improve predictive maintenance, network scenario simulation, and real-time data-driven decision-making. Through extensive simulations, we demonstrate how generative AI can enhance the accuracy and operational efficiency of network digital twins, effectively handling real-world complexities such as unpredictable traffic loads and network failures. The findings suggest that this integration not only boosts the capability of digital twins in scenario forecasting and anomaly detection but also facilitates a more adaptive and intelligent network management system.
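
Of the two generative models named, the VAE's core training quantities are compact enough to show. The following is a minimal NumPy sketch of the reparameterization trick and the closed-form KL term, assuming a diagonal-Gaussian encoder. How the paper wires a VAE into the digital twin is not specified here, including which network-state features it encodes. Using reconstruction error as an anomaly signal is likewise an assumption.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); keeps z differentiable in (mu, log_var)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)
```

The VAE loss is reconstruction error plus this KL term. In a twin, a sample whose reconstruction error is far above its usual range is one common way to flag anomalous traffic.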

Updated: 2024-06-24 19:54:58

Categories: cs.LG,cs.GR,cs.NI

Download: http://arxiv.org/abs/2406.17112v1

Sphere Neural-Networks for Rational Reasoning

The success of Large Language Models (LLMs), e.g., ChatGPT, is evidenced by their worldwide popularity, their capability for human-like communication, and their steadily improving reasoning performance. However, it remains unclear whether LLMs reason. It is an open problem how traditional neural networks can be qualitatively extended to go beyond the statistical paradigm and achieve high-level cognition. Here, we present a novel qualitative extension by generalising computational building blocks from vectors to spheres. We propose Sphere Neural Networks (SphNNs) for human-like reasoning through model construction and inspection, and develop SphNN for syllogistic reasoning, a microcosm of human rationality. SphNN is a hierarchical neuro-symbolic Kolmogorov-Arnold geometric GNN, and uses a neuro-symbolic transition map of neighbourhood spatial relations to transform the current sphere configuration towards the target. SphNN is the first neural model that can determine the validity of long-chained syllogistic reasoning in one epoch without training data, with a worst-case computational complexity of O(N). SphNN can evolve into various types of reasoning, such as spatio-temporal reasoning, logical reasoning with negation and disjunction, event reasoning, neuro-symbolic unification, and humour understanding (the highest level of cognition). All these suggest a new kind of Herbert A. Simon's scissors with two neural blades. SphNNs will tremendously enhance interdisciplinary collaborations to develop the two neural blades, realise deterministic neural reasoning and human-bounded rationality, and elevate LLMs to reliable psychological AI. This work suggests that the non-zero radii of spheres are the missing components that prevent traditional deep-learning systems from reaching the realm of rational reasoning and cause LLMs to be trapped in the swamp of hallucination.
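
Representing concepts as spheres (balls with non-zero radius) lets qualitative relations between them carry logical meaning. The following is a minimal sketch that classifies the relation between two balls from their centers and radii. The relation names and the syllogistic readings in the comments are illustrative assumptions, not SphNN's transition map.

```python
import numpy as np

def sphere_relation(c1, r1, c2, r2) -> str:
    """Qualitative relation of ball 1 = (c1, r1) to ball 2 = (c2, r2)."""
    d = float(np.linalg.norm(np.asarray(c1, float) - np.asarray(c2, float)))
    if d + r1 <= r2:
        return "inside"     # ball 1 contained in ball 2, read as "all X are Y"
    if d + r2 <= r1:
        return "contains"   # ball 2 contained in ball 1
    if d >= r1 + r2:
        return "disjoint"   # no shared point interior, read as "no X are Y"
    return "overlap"        # partial overlap, read as "some X are Y"
```

Containment is transitive by the triangle inequality. If A is inside B and B is inside C, then A is inside C, which is one geometric reading of the syllogism "all A are B, all B are C, therefore all A are C". With zero radii, these relations collapse to point equality.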

Updated: 2024-06-24 19:45:42

Categories: cs.AI

Download: http://arxiv.org/abs/2403.15297v3

Enabling Accelerators for Graph Computing

The advent of Graph Neural Networks (GNNs) has revolutionized the field of machine learning, offering a novel paradigm for learning on graph-structured data. Unlike traditional neural networks, GNNs are capable of capturing complex relationships and dependencies inherent in graph data, making them particularly suited for a wide range of applications including social network analysis, molecular chemistry, and network security. GNNs, with their unique structure and operation, present new computational challenges compared to conventional neural networks. This requires comprehensive benchmarking and a thorough characterization of GNNs to obtain insight into their computational requirements and to identify potential performance bottlenecks. In this thesis, we aim to develop a better understanding of how GNNs interact with the underlying hardware and will leverage this knowledge as we design specialized accelerators and develop new optimizations, leading to more efficient and faster GNN computations. A pivotal component within GNNs is the Sparse General Matrix-Matrix Multiplication (SpGEMM) kernel, known for its computational intensity and irregular memory access patterns. In this thesis, we address the challenges posed by SpGEMM by implementing a highly optimized hashing-based SpGEMM kernel tailored for a custom accelerator. Synthesizing these insights and optimizations, we design state-of-the-art hardware accelerators capable of efficiently handling various GNN workloads. Our accelerator architectures are built on our characterization of GNN computational demands, providing clear motivation for our approaches. This exploration into novel models underlines our comprehensive approach, as we strive to enable accelerators that are not just performant, but also versatile, able to adapt to the evolving landscape of graph computing.
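
Hashing-based SpGEMM typically follows Gustavson's row-by-row formulation: each output row is the sum of B's rows scaled by the nonzeros of the matching A row. A hash table keyed by column index accumulates the partial products. The following is a minimal pure-Python sketch over CSR arrays that uses a `dict` as the hash accumulator. The thesis's accelerator kernel is hardware-specific and is not reproduced; the function name is hypothetical.

```python
def spgemm_hash(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """C = A @ B for CSR inputs; returns CSR arrays (c_ptr, c_idx, c_val)."""
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # hash accumulator: column j -> partial sum of C[i, j]
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):  # scan row k of B
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        for j in sorted(acc):  # sorted columns give canonical CSR output
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

The irregular access pattern is visible here. Which rows of B are read depends on A's sparsity, and the accumulator's size per row is unknown in advance. These properties are what make the kernel hard to accelerate.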

Updated: 2024-06-24 19:43:21

Categories: cs.AR,cs.AI,cs.DC,cs.LG

Download: http://arxiv.org/abs/2312.10561v3

Maximum Likelihood Estimation of the Direction of Sound In A Reverberant Noisy Environment

We describe a new method for estimating the direction of sound in a reverberant environment from basic principles of sound propagation. The method uses SNR-adaptive features derived from the time delay and energy of the directional components, obtained by acoustic wave decomposition of the observed sound field, to estimate the line-of-sight direction under noisy and reverberant conditions. The effectiveness of the approach is established with real data from different microphone array configurations under various usage scenarios.
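
For intuition about how a time delay maps to a direction, the following is the textbook far-field estimate for a single microphone pair. It is explicitly swapped in for illustration and is not the paper's maximum-likelihood method: it ignores reverberation, SNR adaptation, and wave decomposition. The microphone spacing, sample rate, and speed of sound (343 m/s) are assumptions.

```python
import numpy as np

def estimate_delay_samples(x_ref, x_delayed):
    """Lag (in samples) by which x_delayed trails x_ref, from the cross-correlation peak."""
    corr = np.correlate(x_delayed, x_ref, mode="full")
    lags = np.arange(-(len(x_ref) - 1), len(x_delayed))  # lag of each 'full' output index
    return int(lags[np.argmax(corr)])

def far_field_angle_deg(delay_samples, fs, mic_spacing_m, c=343.0):
    """Angle from broadside for a two-mic pair: sin(theta) = c * tau / d."""
    s = c * (delay_samples / fs) / mic_spacing_m
    return float(np.degrees(np.arcsin(np.clip(s, -1.0, 1.0))))
```

With `fs = 16000` and 0.2 m spacing, a 5-sample delay corresponds to about 32.4 degrees. In a reverberant room, reflections add competing correlation peaks, which is the difficulty the paper's line-of-sight estimation targets.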

Updated: 2024-06-24 19:42:22

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2406.17103v1

Achieving Fairness Across Local and Global Models in Federated Learning

Achieving fairness across diverse clients in Federated Learning (FL) remains a significant challenge due to the heterogeneity of the data and the inaccessibility of sensitive attributes from clients' private datasets. This study addresses this issue by introducing \texttt{EquiFL}, a novel approach designed to enhance both local and global fairness in federated learning environments. \texttt{EquiFL} incorporates a fairness term into the local optimization objective, effectively balancing local performance and fairness. The proposed coordination mechanism also prevents bias from propagating across clients during the collaboration phase. Through extensive experiments across multiple benchmarks, we demonstrate that \texttt{EquiFL} not only strikes a better balance between accuracy and fairness locally at each client but also achieves global fairness. The results also indicate that \texttt{EquiFL} ensures uniform performance distribution among clients, thus contributing to performance fairness. Furthermore, we showcase the benefits of \texttt{EquiFL} on a real-world distributed dataset from a healthcare application, specifically in predicting the effects of treatments on patients across various hospital locations.
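
A local objective with a fairness term can be sketched as a task loss plus a weighted group-disparity penalty. The following is a minimal NumPy version under these assumptions:

- the task is binary classification, with binary cross-entropy as the task loss;
- the demographic-parity gap serves as the fairness term;
- `lam` is a hypothetical weight.

EquiFL's actual fairness term and coordination mechanism are not reproduced. In this sketch, group labels are used only on the client that owns them.

```python
import numpy as np

def local_fair_objective(probs, labels, groups, lam=1.0, eps=1e-12):
    """Return (BCE + lam * parity_gap, parity_gap) for one client's batch.

    parity_gap = |mean prediction in group 0 - mean prediction in group 1|.
    """
    probs = np.clip(np.asarray(probs, float), eps, 1 - eps)
    labels = np.asarray(labels, float)
    groups = np.asarray(groups)
    bce = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    gap = abs(probs[groups == 0].mean() - probs[groups == 1].mean())
    return float(bce + lam * gap), float(gap)
```

Raising `lam` trades local accuracy for smaller disparity. This is the local accuracy-fairness balance the abstract describes, here in its simplest form.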

Updated: 2024-06-24 19:42:16

Categories: cs.LG,cs.CY

Download: http://arxiv.org/abs/2406.17102v1

Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making

Temporal distances lie at the heart of many algorithms for planning, control, and reinforcement learning that involve reaching goals, allowing one to estimate the transit time between two states. However, prior attempts to define such temporal distances in stochastic settings have been stymied by an important limitation: these prior approaches do not satisfy the triangle inequality. This is not merely a definitional concern, but translates to an inability to generalize and find shortest paths. In this paper, we build on prior work in contrastive learning and quasimetrics to show how successor features learned by contrastive learning (after a change of variables) form a temporal distance that does satisfy the triangle inequality, even in stochastic settings. Importantly, this temporal distance is computationally efficient to estimate, even in high-dimensional and stochastic settings. Experiments in controlled settings and benchmark suites demonstrate that an RL algorithm based on these new temporal distances exhibits combinatorial generalization (i.e., "stitching") and can sometimes learn more quickly than prior methods, including those based on quasimetrics.
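
Why the triangle inequality matters for shortest paths (and for "stitching" partial trajectories) can be seen on a small distance matrix. The following is a minimal sketch that illustrates the property, not the paper's contrastive estimator. It assumes distances are given as a dense, possibly asymmetric (quasimetric) matrix. The sketch counts triangle-inequality violations and shows that closing the matrix under shortest paths (Floyd-Warshall) removes them.

```python
import numpy as np

def triangle_violations(D, tol=1e-9):
    """Count (i, j, k) with D[i, k] > D[i, j] + D[j, k]; D may be asymmetric."""
    via = D[:, :, None] + D[None, :, :]          # via[i, j, k] = D[i, j] + D[j, k]
    return int(np.sum(D[:, None, :] > via + tol))

def shortest_path_closure(D):
    """Floyd-Warshall: each entry becomes the shortest path length through intermediate states."""
    D = D.copy()
    for j in range(D.shape[0]):
        D = np.minimum(D, D[:, j:j + 1] + D[j:j + 1, :])
    return D
```

A distance that violates the inequality overestimates how far state 0 is from state 2 when a route through state 1 is shorter. A planner trusting it would miss the stitched path.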

Updated: 2024-06-24 19:36:45

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.17098v1

Model-Free Robust Reinforcement Learning with Sample Complexity Analysis

Distributionally Robust Reinforcement Learning (DR-RL) aims to derive a policy optimizing the worst-case performance within a predefined uncertainty set. Despite extensive research, previous DR-RL algorithms have predominantly favored model-based approaches, with few model-free methods offering convergence guarantees or sample complexities. This paper proposes a model-free DR-RL algorithm leveraging the Multi-level Monte Carlo (MLMC) technique to close this gap. Our innovative approach integrates a threshold mechanism that ensures finite sample requirements for algorithmic implementation, a significant improvement over previous model-free algorithms. We develop algorithms for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence, and provide finite sample analyses for all three cases. Remarkably, our algorithms represent the first model-free DR-RL approach featuring finite sample complexity for total variation and Chi-square divergence uncertainty sets, while also offering an improved sample complexity and broader applicability compared to existing model-free DR-RL algorithms for the KL divergence model. The sample complexities of our method are the tightest known results for all three uncertainty models in model-free DR-RL, underscoring the effectiveness and efficiency of our algorithm and highlighting its potential for practical applications.
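
In the KL case, the robust target involves the nonlinear functional $-\lambda \log \mathbb{E}[e^{-V/\lambda}]$ of the next-state value (for a fixed dual variable $\lambda$). Plugging in a sample average is biased, and MLMC is a standard way to reduce that bias. The following is a minimal sketch of a randomized MLMC estimator, under these assumptions:

- the level is drawn from a geometric distribution and truncated at `n_max`, which caps the sample budget in the spirit of a threshold;
- `sample_next_values` is a hypothetical next-state sampler.

The paper's exact estimator, level distribution, and threshold may differ.

```python
import numpy as np

def kl_functional(values, lam):
    """-lam * log(mean(exp(-V / lam))), computed stably with a log-sum-exp shift."""
    v = -np.asarray(values, dtype=float) / lam
    m = v.max()
    return float(-lam * (m + np.log(np.mean(np.exp(v - m)))))

def mlmc_estimate(sample_next_values, lam, p=0.5, n_max=10, rng=None):
    """Randomized MLMC estimate of kl_functional under the next-state distribution.

    The level n ~ Geometric(p) is truncated at n_max, so at most 2**(n_max + 1) samples are drawn;
    the estimator's mean then equals the plug-in value on 2**(n_max + 1) samples (small bias).
    """
    if rng is None:
        rng = np.random.default_rng()
    n = min(int(rng.geometric(p)) - 1, n_max)                 # level in {0, ..., n_max}
    prob_n = p * (1 - p) ** n if n < n_max else (1 - p) ** n_max
    samples = np.asarray(sample_next_values(2 ** (n + 1)), dtype=float)
    delta = kl_functional(samples, lam) - 0.5 * (
        kl_functional(samples[0::2], lam) + kl_functional(samples[1::2], lam))
    return kl_functional(samples[:1], lam) + delta / prob_n  # base term + reweighted correction
```

Without truncation, the expected sample count can be infinite for some choices of `p`. The cap makes the per-update cost finite, at the price of a bias that shrinks as `n_max` grows.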

Updated: 2024-06-24 19:35:26

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2406.17096v1

ammBoost: State Growth Control for AMMs

Automated market makers (AMMs) are a form of decentralized cryptocurrency exchange and are considered a prime example of Decentralized Finance (DeFi) applications. Their popularity and high trading activity have resulted in millions of on-chain transactions, leading to serious scalability issues. In this paper, we address the on-chain storage overhead problem of AMMs by utilizing a new sidechain architecture as a layer 2 solution, building a system called ammBoost. Our system reduces the number of on-chain transactions, boosts throughput, and supports blockchain pruning. We devise several techniques to enable layer 2 processing for AMMs while preserving the correctness and security of the underlying AMM. We also build a proof-of-concept of ammBoost for a Uniswap-inspired use case to empirically evaluate its performance. Our experiments show that ammBoost decreases the gas cost by 94.53% and the chain growth by at least 80%, and that it can support up to 500x the daily traffic volume observed for Uniswap in practice.
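
For readers new to AMMs, the pricing rule in a Uniswap-style pool is the constant-product invariant x · y = k. The following is a minimal sketch of the swap-output formula with a 0.3% fee on the input, expressed as `fee_bps=30` and equivalent to Uniswap v2's 997/1000 factor. The function name is hypothetical. ammBoost's sidechain processing, and how it batches such swaps, are not shown.

```python
def get_amount_out(amount_in: int, reserve_in: int, reserve_out: int, fee_bps: int = 30) -> int:
    """Constant-product (x * y = k) swap output in integer token units, fee taken on the input."""
    if amount_in <= 0 or reserve_in <= 0 or reserve_out <= 0:
        raise ValueError("amount and reserves must be positive")
    amount_in_with_fee = amount_in * (10_000 - fee_bps)
    numerator = amount_in_with_fee * reserve_out
    denominator = reserve_in * 10_000 + amount_in_with_fee
    return numerator // denominator  # rounding down keeps the pool's product from decreasing
```

Swapping 100 tokens into a 1000/1000 pool returns 90, and the new reserves (1100, 910) keep the product at or above the original 1,000,000. Each such swap is an on-chain transaction in a plain AMM, which is the load ammBoost moves to layer 2.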

Updated: 2024-06-24 19:34:05

Categories: cs.CR

Download: http://arxiv.org/abs/2406.17094v1

BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions. The high dimensionality of potential triggers in the token space and the diverse range of malicious behaviors make this a critical challenge. We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relatively uniform drifts in the model's embedding space. Our bi-level optimization method identifies universal embedding perturbations that elicit unwanted behaviors and adjusts the model parameters to reinforce safe behaviors against these perturbations. Experiments show that BEEAR reduces the success rate of RLHF-time backdoor attacks from >95% to <1%, and from 47% to 0% for instruction-tuning-time backdoors targeting malicious code generation, without compromising model utility. Requiring only defender-defined safe and unwanted behaviors, BEEAR represents a step towards practical defenses against safety backdoors in LLMs, providing a foundation for further advancements in AI safety and security.
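
The bi-level structure alternates two loops:

- an inner loop that finds one universal embedding perturbation pushing the model toward unwanted behavior;
- an outer loop that updates parameters so the model stays safe under that perturbation.

The following is a deliberately toy NumPy sketch of that alternation. A two-output linear head stands in for the LLM, fixed vectors `E` stand in for hidden embeddings, and cross-entropy toward a "safe" or "unwanted" output stands in for the defender-defined behaviors. BEEAR itself operates on a real model's embedding space, and its losses differ.

```python
import numpy as np

SAFE, UNWANTED = 0, 1

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def bilevel_toy(E, eps=0.5, rounds=20, inner_steps=20, outer_steps=20, lr=0.5):
    """Alternate a universal perturbation delta (||delta|| <= eps) with safety updates of W."""
    n, d = E.shape
    W = np.zeros((2, d))
    delta = np.zeros(d)
    onehot = np.eye(2)
    for _ in range(rounds):
        for _ in range(inner_steps):  # inner: push all embeddings toward the unwanted output
            P = softmax((E + delta) @ W.T)
            delta = delta - lr * ((P - onehot[UNWANTED]) @ W).mean(axis=0)
            norm = np.linalg.norm(delta)
            if norm > eps:
                delta *= eps / norm   # project back onto the eps-ball
        for _ in range(outer_steps):  # outer: reinforce the safe output under the perturbation
            X = E + delta
            P = softmax(X @ W.T)
            W = W - lr * (P - onehot[SAFE]).T @ X / n
    return W, delta

def worst_case_margin(W, E, eps):
    """Smallest safe-minus-unwanted logit over all ||delta|| <= eps (closed form for a linear head)."""
    u = W[SAFE] - W[UNWANTED]
    return E @ u - eps * np.linalg.norm(u)
```

A positive worst-case margin for every embedding means no perturbation within the ball flips the toy head to the unwanted output.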

Updated: 2024-06-24 19:29:47

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2406.17092v1

Improving the Explain-Any-Concept by Introducing Nonlinearity to the Trainable Surrogate Model

In the evolving field of Explainable AI (XAI), interpreting the decisions of deep neural networks (DNNs) in computer vision tasks is an important process. While pixel-based XAI methods focus on identifying significant pixels, existing concept-based XAI methods use pre-defined or human-annotated concepts. The recently proposed Segment Anything Model (SAM) took a significant step toward automatically preparing concept sets via comprehensive instance segmentation. Building upon this, the Explain Any Concept (EAC) model emerged as a flexible method for explaining DNN decisions. The EAC model uses a surrogate model with one trainable linear layer to simulate the target model. In this paper, we show that introducing an additional nonlinear layer to the original surrogate model improves the performance of the EAC model. We compare our proposed approach to the original EAC model and report improvements on both the ImageNet and MS COCO datasets.
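
The change amounts to swapping a linear surrogate for one with a hidden nonlinear layer. The following is a minimal NumPy sketch under these assumptions:

- `masks` is a `(num_samples, num_concepts)` 0/1 matrix marking which concepts are kept;
- the output approximates the target model's class score;
- the hidden size and the ReLU activation are illustrative choices, not necessarily the paper's.

```python
import numpy as np

def linear_surrogate(masks, w, b):
    """Original-style surrogate: one linear layer over concept-presence masks."""
    return masks @ w + b

def nonlinear_surrogate(masks, W1, b1, w2, b2):
    """Surrogate with an extra ReLU hidden layer, able to model concept interactions."""
    hidden = np.maximum(0.0, masks @ W1 + b1)
    return hidden @ w2 + b2
```

One reason a nonlinear layer can help is interactions between concepts. Suppose the target score is high when exactly one of two concepts is present (an XOR pattern). No linear surrogate fits that; its best least-squares fit predicts 0.5 everywhere. A two-unit ReLU layer with weights `W1 = [[1, 1], [1, 1]]`, `b1 = [0, -1]`, `w2 = [1, -2]`, `b2 = 0` reproduces it exactly.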

Updated: 2024-06-24 19:28:08

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.11837v2

Exploring Biomarker Relationships in Both Type 1 and Type 2 Diabetes Mellitus Through a Bayesian Network Analysis Approach

Understanding the complex relationships of biomarkers in diabetes is pivotal for advancing treatment strategies, a pressing need in diabetes research. This study applies Bayesian network structure learning to analyze the Shanghai Type 1 and Type 2 diabetes mellitus datasets, revealing complex relationships among key diabetes-related biomarkers. The constructed Bayesian network presented notable predictive accuracy, particularly for Type 2 diabetes mellitus, with a root mean squared error (RMSE) of 18.23 mg/dL, as validated through leave-one-domain experiments and Clarke error grid analysis. This study not only elucidates the intricate dynamics of diabetes through a deeper understanding of biomarker interplay but also underscores the significant potential of integrating data-driven and knowledge-driven methodologies in the realm of personalized diabetes management. Such an approach paves the way for more customized and effective treatment strategies, marking a notable advancement in the field.

Updated: 2024-06-24 19:27:34

Categories: q-bio.QM,cs.AI,cs.CE,cs.LG

Download: http://arxiv.org/abs/2406.17090v1

Lomas: A Platform for Confidential Analysis of Private Data

Public services collect massive volumes of data to fulfill their missions. These data fuel the generation of regional, national, and international statistics across various sectors. However, their immense potential remains largely untapped due to strict and legitimate privacy regulations. In this context, Lomas is a novel open-source platform designed to realize the full potential of the data held by public administrations. It enables authorized users, such as approved researchers and government analysts, to execute algorithms on confidential datasets without directly accessing the data. The Lomas platform is designed to operate within a trusted computing environment, such as governmental IT infrastructure. Authorized users access the platform remotely to submit their algorithms for execution on private datasets. Lomas executes these algorithms without revealing the data to the user and returns the results protected by Differential Privacy, a framework that introduces controlled noise to the results, rendering any attempt to extract identifiable information unreliable. Differential Privacy allows for the mathematical quantification and control of the risk of disclosure while allowing complete transparency regarding how data is protected and utilized. The contributions of this project will significantly transform how data held by public services are used, unlocking valuable insights from previously inaccessible data. Lomas empowers research, informs policy development (e.g., public health interventions), and drives innovation across sectors, all while upholding the highest data confidentiality standards.
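
The "controlled noise" in Differential Privacy can be illustrated with the textbook Laplace mechanism. The following is a minimal sketch for a counting query. It assumes add/remove-one-record neighbouring datasets, so the count's sensitivity is 1, and a privacy parameter `epsilon`. Lomas's own interface and the mechanisms it actually uses are not shown, and the function names are hypothetical.

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float, rng=None) -> float:
    """Release true_answer + Laplace(0, sensitivity / epsilon) noise (epsilon-DP)."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    if rng is None:
        rng = np.random.default_rng()
    return float(true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon))

def dp_count(records, predicate, epsilon, rng=None) -> float:
    """Counting query: adding or removing one record changes the count by at most 1."""
    true_count = sum(1 for r in records if predicate(r))
    return laplace_mechanism(true_count, sensitivity=1.0, epsilon=epsilon, rng=rng)
```

Smaller `epsilon` means more noise and stronger protection. This quantifiable trade-off is what lets disclosure risk be "mathematically quantified and controlled" as the abstract puts it.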

Updated: 2024-06-24 19:16:58

Categories: cs.CR

Download: http://arxiv.org/abs/2406.17087v1

BrainMAE: A Region-aware Self-supervised Learning Framework for Brain Signals

The human brain is a complex, dynamic network, which is commonly studied using functional magnetic resonance imaging (fMRI) and modeled as a network of regions of interest (ROIs) for understanding various brain functions. Recent studies use deep learning approaches to learn brain network representations based on the functional connectivity (FC) profile; these approaches broadly fall into two main categories. The Fixed-FC approaches, which use an FC profile representing the linear temporal relations within the brain network, are limited because they fail to capture informative brain temporal dynamics. On the other hand, the Dynamic-FC approaches, which model the evolving FC profile over time, often exhibit less satisfactory performance due to challenges in handling the inherently noisy nature of fMRI data. To address these challenges, we propose Brain Masked Auto-Encoder (BrainMAE) for learning representations directly from fMRI time-series data. Our approach incorporates two essential components: a region-aware graph attention mechanism designed to capture the relationships between different brain ROIs, and a novel self-supervised masked autoencoding framework for effective model pre-training. These components enable the model to capture rich temporal dynamics of brain activity while maintaining resilience to inherent noise in fMRI data. Our experiments demonstrate that BrainMAE consistently outperforms established baseline methods by significant margins in four distinct downstream tasks. Finally, leveraging the model's inherent interpretability, our analysis of model-generated representations reveals findings that resonate with ongoing research in the field of neuroscience.
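
Masked autoencoding hides part of the input and trains the model to reconstruct it, with the loss computed only on the hidden part. The following is a minimal NumPy sketch of the masking and the masked loss for an `(n_rois, n_timepoints)` fMRI series. It assumes BrainMAE masks contiguous time segments shared across ROIs with a fixed ratio. Whether it actually masks time segments, ROIs, or both, and at what ratio, is not stated above. The region-aware graph attention encoder is not shown.

```python
import numpy as np

def mask_time_segments(ts, segment_len, mask_ratio, rng):
    """Split (n_rois, n_time) into time segments and zero out a random subset.

    Returns the visible (masked) series, trimmed to whole segments, and a boolean mask over segments.
    """
    n_rois, n_time = ts.shape
    n_seg = n_time // segment_len
    segs = ts[:, : n_seg * segment_len].reshape(n_rois, n_seg, segment_len)
    masked = np.zeros(n_seg, dtype=bool)
    masked[rng.choice(n_seg, size=int(round(mask_ratio * n_seg)), replace=False)] = True
    visible = segs.copy()
    visible[:, masked, :] = 0.0
    return visible.reshape(n_rois, -1), masked

def masked_mse(pred, target, masked, segment_len):
    """Reconstruction loss restricted to the masked segments."""
    n_rois, n_seg = target.shape[0], masked.size
    p = pred[:, : n_seg * segment_len].reshape(n_rois, n_seg, segment_len)
    t = target[:, : n_seg * segment_len].reshape(n_rois, n_seg, segment_len)
    return float(np.mean((p[:, masked, :] - t[:, masked, :]) ** 2))
```

Scoring only masked segments forces the encoder to infer hidden activity from context, which is the source of the temporal structure the learned representations are meant to capture.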

Updated: 2024-06-24 19:16:24

领域: q-bio.QM,cs.LG,q-bio.NC,92-08 (Primary) 68T07, 68T05 (Secondary),J.3; I.5.4

下载: http://arxiv.org/abs/2406.17086v1

Hunting for Polluted White Dwarfs and Other Treasures with Gaia XP Spectra and Unsupervised Machine Learning

White dwarfs (WDs) polluted by exoplanetary material provide an unprecedented opportunity to directly observe the interiors of exoplanets. However, spectroscopic surveys are often limited by brightness constraints, and WDs tend to be very faint, which makes detecting large populations of polluted WDs difficult. In this paper, we aim to considerably increase the number of WDs with multiple metals in their atmospheres. Using 96,134 WDs with Gaia DR3 BP/RP (XP) spectra, we constructed a 2D map with an unsupervised machine learning technique called Uniform Manifold Approximation and Projection (UMAP) to organize the WDs into identifiable spectral regions. The polluted WDs are among the distinct spectral groups identified in our map. We show that this selection method could potentially increase the number of known WDs with five or more metal species in their atmospheres by an order of magnitude. Such systems are essential for characterizing exoplanet diversity and geology.

Updated: 2024-06-24 19:11:57

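Title: Hunting for Polluted White Dwarfs and Other Treasures with Gaia XP Spectra and Unsupervised Machine Learning

Abstract: White dwarfs (WDs) polluted by exoplanetary material offer an unprecedented opportunity to directly observe the interiors of exoplanets. However, spectroscopic surveys are often limited by brightness, and white dwarfs tend to be very faint, which makes it difficult to detect large numbers of polluted white dwarfs. In this paper, we aim to greatly increase the number of white dwarfs with multiple metals in their atmospheres. Using 96,134 white dwarfs with Gaia DR3 BP/RP (XP) spectra, we built a 2D map with an unsupervised machine learning technique called Uniform Manifold Approximation and Projection (UMAP) to organize the white dwarfs into identifiable spectral regions. The polluted white dwarfs are among the distinct spectral groups identified in our map. We show that this selection method could increase the number of known white dwarfs with five or more metal species in their atmospheres by an order of magnitude. Such systems are essential for characterizing exoplanet diversity and geology.

Categories: astro-ph.SR,astro-ph.EP,cs.LG

Download: http://arxiv.org/abs/2405.17667v2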

Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning

Recent studies show that large language models (LLMs) with safety alignment can be jailbroken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jailbreak effect can be mitigated by separating the fine-tuning stage into two states, one that optimizes on the alignment dataset and one that optimizes on the user dataset. Unfortunately, our subsequent study shows that this simple Bi-State Optimization (BSO) solution suffers from convergence instability when the number of steps invested in its alignment state is too small, which degrades alignment performance. Through statistical analysis, we show that excess drift towards consensus is a probable cause of the instability. To remedy this issue, we propose Lazy(i) safety alignment (Lisa), which introduces a proximal term to constrain the drift of each state. Theoretically, the benefit of the proximal term is supported by a convergence analysis, in which we show that a sufficiently large proximal factor is necessary to guarantee Lisa's convergence. Empirically, our results on four downstream fine-tuning tasks show that Lisa with a proximal term can significantly increase alignment performance while maintaining the LLM's accuracy on the user tasks. Code is available at https://github.com/git-disl/Lisa.
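The drift problem and the proximal fix can be seen in a scalar toy problem. The sketch assumes quadratic stand-ins for the alignment loss (minimized at 0) and the user loss (minimized at 1). In each state, the proximal term is anchored at the checkpoint handed over by the other state. This is our reading of "constrain the drift of each state", and the paper's exact formulation may differ. `bi_state_optimize` is a hypothetical name.

```python
from typing import List, Tuple

def grad_quadratic(w: float, target: float) -> float:
    """Gradient of the toy loss 0.5 * (w - target) ** 2."""
    return w - target

def bi_state_optimize(align_target: float, user_target: float, rounds: int,
                      align_steps: int, user_steps: int, lr: float, rho: float,
                      w0: float = 0.0) -> Tuple[float, List[float]]:
    """Alternate between an alignment state and a user state.

    With rho > 0, each state adds the proximal term 0.5 * rho * (w - anchor) ** 2,
    where anchor is the checkpoint handed over by the previous state, which limits drift.
    rho = 0 recovers plain bi-state optimization.
    """
    w = w0
    history = []
    for _ in range(rounds):
        for steps, target in ((align_steps, align_target), (user_steps, user_target)):
            anchor = w  # checkpoint handed over from the other state
            for _ in range(steps):
                w -= lr * (grad_quadratic(w, target) + rho * (w - anchor))
        history.append(w)  # parameter value at the end of each round
    return w, history

# Few alignment steps (1) versus many user steps (10): the user state dominates.
w_plain, _ = bi_state_optimize(0.0, 1.0, rounds=50, align_steps=1, user_steps=10, lr=0.1, rho=0.0)
w_prox, _ = bi_state_optimize(0.0, 1.0, rounds=50, align_steps=1, user_steps=10, lr=0.1, rho=5.0)
```

Without the proximal term, the parameter settles near 0.95, almost entirely at the user optimum. With `rho=5.0`, it settles near 0.67, so the single alignment step is not completely undone by the user state.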

Updated: 2024-06-24 18:59:50

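Title: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning

Abstract: Recent studies show that large language models (LLMs) with safety alignment can be jailbroken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that separating states during the fine-tuning stage to optimize over the alignment and user datasets can mitigate the jailbreak effect. Unfortunately, our subsequent study shows that when too few steps are invested in the alignment state, this simple Bi-State Optimization (BSO) solution suffers from convergence instability, which degrades alignment performance. Through statistical analysis, we show that "excess drift" toward consensus is a probable cause of this instability. To address the problem, we propose Lazy safety alignment (Lisa), which introduces a proximal term to constrain the drift of each state. Theoretically, the benefit of the proximal term is supported by a convergence analysis, in which we show that a sufficiently large proximal factor is necessary to guarantee Lisa's convergence. Empirically, our results on four downstream fine-tuning tasks show that Lisa with a proximal term can significantly improve alignment performance while maintaining the LLM's accuracy on user tasks. Code is available at https://github.com/git-disl/Lisa.

Categories: cs.LG

Download: http://arxiv.org/abs/2405.18641v3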

Meta-GCN: A Dynamically Weighted Loss Minimization Method for Dealing with the Data Imbalance in Graph Neural Networks

Although many real-world applications, such as disease prediction and fault detection, suffer from class imbalance, most existing graph-based classification methods ignore the skewness of the class distribution and therefore tend to be biased towards the majority class(es). Conventional methods typically tackle this problem by assigning each class sample a weight computed as a function of its loss, which can lead to overfitting on outliers. In this paper, we propose a meta-learning algorithm, named Meta-GCN, that adaptively learns example weights by minimizing the loss on a small unbiased meta-data set while simultaneously optimizing the model weights. Through experiments, we show that Meta-GCN outperforms state-of-the-art frameworks and other baselines in terms of accuracy, the area under the receiver operating characteristic curve (AUC-ROC), and macro F1-score for classification tasks on two different datasets.
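Meta-learned example weights can be illustrated with a one-step meta-gradient in the style of "learning to reweight". The sketch uses a linear logistic model instead of a GCN, and the weighting rule is a standard stand-in, not necessarily Meta-GCN's exact update. The function names are hypothetical. The sketch shows how a small unbiased meta set can drive a mislabeled outlier's weight to zero.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, binary label)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logistic_grad(w: Sequence[float], x: Sequence[float], y: int) -> List[float]:
    """Gradient of the binary cross-entropy of the linear model sigmoid(w . x) on one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def example_weights(w: Sequence[float], train: List[Example], meta: List[Example]) -> List[float]:
    """One-step meta-gradient example weights (learning-to-reweight style).

    Taking a step w - lr * sum_i eps_i * g_i and differentiating the meta loss with
    respect to eps_i at eps = 0 gives -lr * (g_meta . g_i). Examples whose gradient
    agrees with the meta-set gradient get positive weight; the rest are clipped to 0.
    """
    g_meta = [0.0] * len(w)
    for x, y in meta:
        g_meta = [a + b / len(meta) for a, b in zip(g_meta, logistic_grad(w, x, y))]
    raw = [max(0.0, sum(a * b for a, b in zip(g_meta, logistic_grad(w, x, y)))) for x, y in train]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else [1.0 / len(train)] * len(train)

# Features are [bias, signal]; the last training example is a mislabeled outlier.
meta = [([1.0, 1.0], 1), ([1.0, -1.0], 0)]  # small, class-balanced meta set
train = [([1.0, -1.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1), ([1.0, 1.0], 0)]
weights = example_weights([0.0, 0.0], train, meta)
```

The three consistent examples share the weight equally, and the outlier's gradient opposes the meta gradient, so it receives zero weight.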

Updated: 2024-06-24 18:59:24

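Title: Meta-GCN: A Dynamically Weighted Loss Minimization Method for Dealing with the Data Imbalance in Graph Neural Networks

Abstract: Although many real-world applications, such as disease prediction and fault detection, suffer from class imbalance, most existing graph-based classification methods ignore the skewness of the class distribution and therefore tend to be biased toward the majority class(es). Conventional methods usually address this problem by assigning a weight to each class sample, computed as a function of its loss, which can lead to overfitting on outliers. In this paper, we propose a meta-learning algorithm, Meta-GCN, that adaptively learns example weights by minimizing the loss on an unbiased meta-data set while optimizing the model weights using a small unbiased meta-data set. Experiments show that, on classification tasks on two different datasets, Meta-GCN outperforms state-of-the-art frameworks and other baselines in accuracy, area under the receiver operating characteristic curve (AUC-ROC), and macro F1-score.

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.17073v1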

Aligner: Efficient Alignment by Learning to Correct

With the rapid development of large language models (LLMs) and ever-evolving practical requirements, finding an efficient and effective alignment method has never been more critical. However, the tension between the complexity of current alignment methods and the need for rapid iteration in deployment scenarios calls for a model-agnostic alignment approach that can operate under these constraints. In this paper, we introduce Aligner, a novel and simple alignment paradigm that uses a small model to learn the correctional residuals between preferred and dispreferred answers. Designed as a model-agnostic, plug-and-play module, Aligner can be applied directly to various open-source and API-based models with only one-off training, making it suitable for rapid iteration. Notably, Aligner can be applied to any powerful, large-scale upstream model. Moreover, it can even iteratively bootstrap the upstream models by using corrected responses as synthetic human preference data, breaking through the model's performance ceiling. Our experiments show performance improvements when the same Aligner model is deployed across 11 different LLMs, evaluated on the 3H dimensions (helpfulness, harmlessness, and honesty). Specifically, Aligner-7B achieved an average improvement of 68.9% in helpfulness and 23.8% in harmlessness across the tested LLMs while also effectively reducing hallucination. On the Alpaca-Eval leaderboard, stacking Aligner-2B on GPT-4 Turbo improved its LC Win Rate from 55.0% to 58.3%, surpassing GPT-4 Omni's 57.5% Win Rate (community report).
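The plug-and-play idea and the bootstrapping idea can both be expressed as function composition. Here, stacking means the upstream model answers first and the corrector rewrites that answer. Bootstrapping means the (corrected, original) pairs become preference data. This is a minimal sketch: `stack`, `preference_pairs`, and both toy models are hypothetical stand-ins, not Aligner's code or API. A real corrector would be a trained seq2seq model.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]           # upstream model: query -> answer
Corrector = Callable[[str, str], str]  # aligner: (query, answer) -> corrected answer

def stack(upstream: Model, aligner: Corrector) -> Model:
    """Wrap any upstream model; its weights are never touched, only its outputs are corrected."""
    def aligned(query: str) -> str:
        return aligner(query, upstream(query))
    return aligned

def preference_pairs(queries: List[str], upstream: Model,
                     aligner: Corrector) -> List[Tuple[str, str, str]]:
    """Collect (query, chosen, rejected) triples, preferring corrected answers over originals."""
    pairs = []
    for q in queries:
        original = upstream(q)
        corrected = aligner(q, original)
        if corrected != original:  # keep only queries where a correction happened
            pairs.append((q, corrected, original))
    return pairs

# Toy stand-ins: a "model" that echoes, and a rule-based "aligner" that adds a caveat.
upstream = lambda q: f"Answer: {q}"
aligner = lambda q, a: a + " (Please consult a professional.)" if "medical" in q else a
aligned_model = stack(upstream, aligner)
pairs = preference_pairs(["medical dose?", "capital of France?"], upstream, aligner)
```

Because `stack` only needs a callable, the same corrector can sit behind an open-source model or an API client without changing either one.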

Updated: 2024-06-24 18:55:16

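Title: Aligner: Efficient Alignment by Learning to Correct

Abstract: With the rapid development of large language models (LLMs) and ever-changing practical requirements, finding an efficient and effective alignment method has become more critical than ever. However, the tension between the complexity of current alignment methods and the need for rapid iteration in deployment scenarios calls for a model-agnostic alignment method that can operate under these constraints. This paper introduces Aligner, a novel and simple alignment paradigm that uses a small model to learn the correctional residuals between preferred and dispreferred answers. As a model-agnostic, plug-and-play module, Aligner can be applied directly to various open-source and API-based models with only one-off training, which makes it suitable for rapid iteration. Notably, Aligner can be applied to any powerful, large-scale upstream model. Moreover, it can even iteratively bootstrap upstream models by using corrected responses as synthetic human preference data, breaking through the model's performance ceiling. Our experiments show that deploying the same Aligner model across 11 different LLMs improves performance along the 3H dimensions (helpfulness, harmlessness, and honesty). Specifically, Aligner-7B improved helpfulness by an average of 68.9% and harmlessness by 23.8% across the tested LLMs, while also effectively reducing hallucination. On the Alpaca-Eval leaderboard, stacking Aligner-2B on GPT-4 Turbo raised its LC Win Rate from 55.0% to 58.3%, surpassing GPT-4 Omni's 57.5% Win Rate (community report).

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.02416v4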

Tolerance of Reinforcement Learning Controllers against Deviations in Cyber Physical Systems

Cyber-physical systems (CPS) with reinforcement learning (RL)-based controllers are increasingly being deployed in complex physical environments such as autonomous vehicles, the Internet of Things (IoT), and smart cities. An important property of a CPS is tolerance, i.e., its ability to function safely under possible disturbances and uncertainties in actual operation. In this paper, we introduce a new, expressive notion of tolerance that describes how well a controller can satisfy a desired system requirement, specified using Signal Temporal Logic (STL), under possible deviations in the system. Based on this definition, we propose a novel analysis problem, called the tolerance falsification problem, which involves finding small deviations that result in a violation of the given requirement. We present a novel two-layer simulation-based analysis framework and a novel search heuristic for finding small tolerance violations. To evaluate our approach, we construct a set of benchmark problems in which system parameters can be configured to represent different types of uncertainties and disturbances in the system. Our evaluation shows that our falsification approach and heuristic can effectively find small tolerance violations.
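Tolerance falsification can be illustrated by combining quantitative STL semantics with a search over deviation magnitudes. The sketch makes these assumptions:

- The system is a toy closed loop with a proportional controller.
- The deviation is a single constant disturbance.
- The requirement is G (x < bound).
- Robustness is monotone in the disturbance, so bisection applies.

The paper uses a two-layer simulation-based framework with its own search heuristic, and this single-layer version is only an illustration. All names are hypothetical.

```python
from typing import List, Optional

def simulate(deviation: float, steps: int = 50, gain: float = 0.5,
             x0: float = 0.0, setpoint: float = 1.0) -> List[float]:
    """Toy closed loop: a proportional controller tracks a setpoint under a constant disturbance."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        u = gain * (setpoint - x)
        x = x + u + deviation
        xs.append(x)
    return xs

def robustness_always_below(trace: List[float], bound: float) -> float:
    """Quantitative STL semantics of G (x < bound): the worst-case margin over the trace."""
    return min(bound - x for x in trace)

def falsify(bound: float, lo: float = 0.0, hi: float = 5.0, iters: int = 40) -> Optional[float]:
    """Bisection for the smallest non-negative disturbance that violates the requirement."""
    if robustness_always_below(simulate(hi), bound) >= 0:
        return None  # no violation within the search range
    for _ in range(iters):
        mid = (lo + hi) / 2
        if robustness_always_below(simulate(mid), bound) < 0:
            hi = mid  # violation found: try a smaller deviation
        else:
            lo = mid
    return hi

d_min = falsify(bound=2.0)
```

With these dynamics, the steady state is `1 + 2 * deviation`. The requirement therefore starts to fail once the disturbance exceeds roughly 0.5, and the search recovers that value.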

Updated: 2024-06-24 18:33:45

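Title: Tolerance of Reinforcement Learning Controllers against Deviations in Cyber-Physical Systems

Abstract: Cyber-physical systems (CPS) with reinforcement learning (RL)-based controllers are increasingly deployed in complex physical environments such as autonomous vehicles, the Internet of Things (IoT), and smart cities. An important property of a CPS is tolerance, i.e., its ability to operate safely under the disturbances and uncertainties that may arise in actual operation. In this paper, we introduce a new, expressive notion of tolerance that describes how well a controller can satisfy a desired system requirement, specified in Signal Temporal Logic (STL), when the system may deviate. Based on this definition, we propose a novel analysis problem, called the tolerance falsification problem, which involves finding small deviations that lead to a violation of the given requirement. We present a novel two-layer simulation-based analysis framework and a novel search heuristic for finding small tolerance violations. To evaluate our approach, we construct a set of benchmark problems in which system parameters can be configured to represent different types of uncertainties and disturbances in the system. Our evaluation shows that our falsification approach and heuristic can effectively find small tolerance violations.

Categories: eess.SY,cs.AI,cs.LO,cs.RO,cs.SY

Download: http://arxiv.org/abs/2406.17066v1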

Perceiver-based CDF Modeling for Time Series Forecasting

Transformers have demonstrated remarkable efficacy in forecasting time series data. However, their extensive dependence on self-attention mechanisms demands significant computational resources, thereby limiting their practical applicability across diverse tasks, especially in multimodal problems. In this work, we propose a new architecture, called perceiver-CDF, for modeling cumulative distribution functions (CDF) of time series data. Our approach combines the perceiver architecture with a copula-based attention mechanism tailored for multimodal time series prediction. By leveraging the perceiver, our model efficiently transforms high-dimensional and multimodal data into a compact latent space, thereby significantly reducing computational demands. Subsequently, we implement a copula-based attention mechanism to construct the joint distribution of missing data for prediction. Further, we propose an output variance testing mechanism to effectively mitigate error propagation during prediction. To enhance efficiency and reduce complexity, we introduce midpoint inference for the local attention mechanism. This enables the model to efficiently capture dependencies within nearby imputed samples without considering all previous samples. The experiments on the unimodal and multimodal benchmarks consistently demonstrate a 20% improvement over state-of-the-art methods while utilizing less than half of the computational resources.
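The perceiver's compression step can be sketched as cross-attention from a small set of latent vectors to the full set of input tokens. Its cost grows with the number of latents times the number of inputs, rather than quadratically in the number of inputs. This minimal sketch assumes a single head with identity query, key and value projections to keep it short. A real model learns these projections, and the copula-based attention and midpoint inference are not shown. `cross_attention` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x k) and a (k x n) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(latents: Matrix, inputs: Matrix) -> Matrix:
    """Single-head cross-attention: M latent queries attend over N input tokens.

    Cost is O(M * N * d) instead of the O(N^2 * d) of self-attention over the inputs.
    """
    d = len(inputs[0])
    keys_t = [list(col) for col in zip(*inputs)]  # d x N (transposed keys)
    scores = matmul(latents, keys_t)              # M x N
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, inputs)                # M x d: the compact latent summary

latents = [[1.0, 0.0], [0.0, 1.0]]                     # M = 2 latent queries
inputs = [[float(i), float(i % 3)] for i in range(6)]  # N = 6 input tokens, d = 2
out = cross_attention(latents, inputs)
```

Each latent row is a convex combination of the input tokens, so six tokens are summarized by two vectors. Later layers can then work in this smaller latent space.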

Updated: 2024-06-24 18:31:38

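Title: Perceiver-based CDF Modeling for Time Series Forecasting

Abstract: Transformers have shown remarkable effectiveness in forecasting time series data. However, their heavy reliance on self-attention requires substantial computational resources, which limits their practical applicability across diverse tasks, especially multimodal problems. In this work, we propose a new architecture, perceiver-CDF, for modeling the cumulative distribution functions (CDFs) of time series data. Our method combines the perceiver architecture with a copula-based attention mechanism designed for multimodal time series prediction. Using the perceiver, our model efficiently transforms high-dimensional, multimodal data into a compact latent space, significantly reducing computational requirements. We then implement a copula-based attention mechanism to construct the joint distribution of missing data for prediction. In addition, we propose an output variance testing mechanism to effectively mitigate error propagation during prediction. To improve efficiency and reduce complexity, we introduce midpoint inference for the local attention mechanism. This allows the model to efficiently capture dependencies among nearby imputed samples without considering all previous samples. Experiments on unimodal and multimodal benchmarks consistently show a 20% improvement over state-of-the-art methods while using less than half of the computational resources.

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2310.01720v2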

Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction

This paper addresses task planning problems for language-instructed robot teams. Tasks are expressed in natural language (NL) and require the robots to apply their capabilities at various locations and on semantic objects. Several recent works have addressed similar planning problems by leveraging pre-trained large language models (LLMs) to design effective multi-robot plans. However, these approaches lack mission completion guarantees. To address this challenge, we introduce a new decentralized LLM-based planner, called S-ATLAS for Safe plAnning for Teams of Language-instructed AgentS, that is capable of achieving user-defined mission success rates. This is accomplished by leveraging conformal prediction (CP), a distribution-free uncertainty quantification tool for black-box models. CP allows the proposed multi-robot planner to reason about its inherent uncertainty in a decentralized fashion, enabling robots to make individual decisions when they are sufficiently certain and to seek help otherwise. We show, both theoretically and empirically, that the proposed planner can achieve user-specified task success rates while minimizing the overall number of help requests. We provide comparative experiments against related works showing that our method is significantly more computationally efficient and achieves lower help rates. The advantage of our algorithm over baselines becomes more pronounced as the robot team size increases.
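The "act when certain, ask for help otherwise" rule can be illustrated with standard split conformal prediction over a planner's candidate actions. The sketch makes these assumptions:

- The nonconformity score is one minus the probability the planner assigned to the correct option on held-out calibration tasks.
- Decisions are single-robot and single-step.
- The calibration scores are illustrative numbers, not results from the paper.

S-ATLAS itself is decentralized and multi-step. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every option stays in the set
    return sorted(cal_scores)[k - 1]

def prediction_set(option_probs: Dict[str, float], qhat: float) -> List[str]:
    """Keep every option whose nonconformity score 1 - p is within the threshold."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= qhat]

def decide(option_probs: Dict[str, float], qhat: float) -> str:
    """Act alone when the prediction set is a singleton; otherwise ask for help."""
    s = prediction_set(option_probs, qhat)
    return s[0] if len(s) == 1 else "ASK_FOR_HELP"

# Illustrative calibration scores: 1 - probability of the correct option on past tasks.
cal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75]
qhat = conformal_threshold(cal, alpha=0.2)  # target roughly 80% coverage
confident = decide({"a": 0.95, "b": 0.03, "c": 0.02}, qhat)
ambiguous = decide({"a": 0.5, "b": 0.4, "c": 0.1}, qhat)
```

Under exchangeability, the prediction set contains the correct option with probability at least 1 - alpha. Acting only on singleton sets therefore ties the success rate to a user-chosen target, and every larger set triggers a help request.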

Updated: 2024-06-24 18:27:35

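Title: Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction

Abstract: This paper studies task planning for language-instructed robot teams. Tasks are expressed in natural language (NL) and require the robots to apply their capabilities at various locations and on semantic objects. Several recent works have addressed similar planning problems by using pre-trained large language models (LLMs) to design effective multi-robot plans. However, these approaches lack mission-completion guarantees. To address this challenge, we introduce a new decentralized LLM-based planner, S-ATLAS (Safe plAnning for Teams of Language-instructed AgentS), that can achieve user-defined mission success rates. This is accomplished with conformal prediction (CP), a distribution-free uncertainty quantification tool for black-box models. CP allows the proposed multi-robot planner to reason about its inherent uncertainty in a decentralized way, so that robots make individual decisions when they are sufficiently certain and seek help otherwise. We show theoretically and empirically that the proposed planner achieves user-specified task success rates while minimizing the total number of help requests. Comparative experiments against related work show that our method is significantly more computationally efficient and achieves lower help rates. The advantage of our algorithm over the baselines becomes more pronounced as the robot team grows.

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2402.15368v2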

Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning

Binary code analysis is the foundation of crucial tasks in the security domain, so building effective binary analysis techniques is more important than ever. Although large language models (LLMs) have brought impressive improvements to source code tasks, they do not directly generalize to assembly code because of the unique challenges of assembly: (1) the low information density of assembly and (2) the diverse optimizations in assembly code. To overcome these challenges, this work proposes a hierarchical attention mechanism that builds attention summaries to capture semantics more effectively, and designs contrastive learning objectives that train LLMs to learn assembly optimization. Equipped with these techniques, this work develops Nova, a generative LLM for assembly code. Nova outperforms existing techniques on binary code decompilation by up to 146.54% and outperforms the latest binary code similarity detection techniques by up to 6.17%, showing promising abilities on both assembly generation and understanding tasks.
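A contrastive objective for optimization-invariant assembly representations can be sketched with the generic InfoNCE loss. Embeddings of the same function under two optimization levels form a positive pair, and other functions in the batch serve as negatives. This is a minimal sketch with two assumptions: the embeddings are given, and the -O0/-O3 pairing is illustrative. It is not necessarily the exact objective Nova uses, and the hierarchical attention mechanism is not shown.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchors: List[List[float]], positives: List[List[float]],
             temperature: float = 0.1) -> float:
    """Average InfoNCE loss over the batch.

    anchors[i] and positives[i] embed the same function under two different compiler
    optimization levels (e.g. -O0 and -O3); every other positives[j] is a negative.
    """
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_sum_exp - logits[i]  # -log softmax probability of the matching pair
    return loss / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]     # positives that sit near their anchors
mismatched = [[0.1, 0.9], [0.9, 0.1]]  # same vectors with the pairing swapped
loss_matched = info_nce(anchors, matched)
loss_mismatched = info_nce(anchors, mismatched)
```

Minimizing this loss pulls differently optimized versions of the same code together in embedding space and pushes unrelated functions apart.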

Updated: 2024-06-24 18:18:59

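Title: Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning

Abstract: Binary code analysis underpins crucial tasks in the security domain, so building effective binary analysis techniques is more important than ever. Although large language models (LLMs) have brought impressive improvements to source code tasks, they do not directly generalize to assembly code because of its unique challenges: (1) the low information density of assembly and (2) the diverse optimizations in assembly code. To overcome these challenges, this work proposes a hierarchical attention mechanism that builds attention summaries to capture semantics more effectively, and designs contrastive learning objectives that train LLMs to learn assembly optimization. Equipped with these techniques, this work develops Nova, a generative LLM for assembly code. Nova outperforms existing techniques on binary code decompilation by up to 146.54% and outperforms the latest binary code similarity detection techniques by up to 6.17%, showing promising ability on both assembly generation and understanding tasks.

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2311.13721v4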

Bayesian Deep ICE

Deep Independent Component Estimation (DICE) has many applications in modern machine learning as a feature extraction method. We provide a novel latent variable representation of independent component analysis that enables both point estimates via expectation-maximization (EM) and full posterior sampling via Markov chain Monte Carlo (MCMC) algorithms. Our methodology also applies to flow-based methods for nonlinear feature extraction. We discuss how to implement conditional posteriors and envelope-based methods for optimization. Through this representation hierarchy, we unify a number of hitherto disjoint estimation procedures. We illustrate our methodology and algorithms on a numerical example and conclude with directions for future research.

Updated: 2024-06-24 18:18:58

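Title: Bayesian Deep ICE

Abstract: Deep Independent Component Estimation (DICE) has many applications in modern machine learning as a feature extraction method. We provide a novel latent variable representation of independent component analysis that enables both point estimation via expectation-maximization (EM) and full posterior sampling via Markov chain Monte Carlo (MCMC) algorithms. Our methodology also applies to flow-based methods for nonlinear feature extraction. We discuss how to implement conditional posteriors and envelope-based optimization methods. Through this representation hierarchy, we unify a number of hitherto disjoint estimation procedures. We illustrate our methodology and algorithms on a numerical example and conclude with directions for future research.

Categories: stat.ME,cs.LG,62F15, 62H25, 68T07

Download: http://arxiv.org/abs/2406.17058v1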

At First Sight: Zero-Shot Classification of Astronomical Images with Large Multimodal Models

Vision-language multimodal models (VLMs) offer the possibility of zero-shot classification in astronomy, i.e., classification via natural language prompts with no training. We investigate two models, GPT-4o and LLaVA-NeXT, for zero-shot classification of low-surface-brightness galaxies and artifacts, as well as morphological classification of galaxies. We show that, with natural language prompts, these models achieve significant accuracy (typically above 80 percent) without additional training or fine-tuning. We discuss areas that require improvement, especially for LLaVA-NeXT, which is an open-source model. Our findings aim to motivate the astronomical community to consider VLMs as a powerful tool for both research and pedagogy, with the prospect that future custom-built or fine-tuned models could perform better.

Updated: 2024-06-24 18:17:54

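Title: At First Sight: Zero-Shot Classification of Astronomical Images with Large Multimodal Models

Abstract: Vision-language multimodal models (VLMs) make zero-shot classification possible in astronomy, i.e., classification via natural language prompts with no training. We study two models, GPT-4o and LLaVA-NeXT, for zero-shot classification of low-surface-brightness galaxies and artifacts, as well as morphological classification of galaxies. We show that, with natural language prompts, these models achieve significant accuracy (typically above 80%) without additional training or fine-tuning. We discuss areas that need improvement, particularly for the open-source LLaVA-NeXT. Our findings aim to encourage the astronomical community to consider VLMs as a powerful tool for both research and teaching, with the prospect that future custom-built or fine-tuned models could perform better.

Categories: astro-ph.IM,astro-ph.GA,cs.AI

Download: http://arxiv.org/abs/2406.17057v1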

PIC2O-Sim: A Physics-Inspired Causality-Aware Dynamic Convolutional Neural Operator for Ultra-Fast Photonic Device FDTD Simulation

The finite-difference time-domain (FDTD) method, which is important in the photonic hardware design flow, is widely adopted to solve the time-domain Maxwell equations. However, FDTD is known for its prohibitive runtime cost, taking minutes to hours to simulate a single device. Recently, AI has been applied to achieve orders-of-magnitude speedups in partial differential equation (PDE) solving. However, AI-based FDTD solvers for photonic devices have not been clearly formulated. Directly applying off-the-shelf models to predict optical field dynamics yields unsatisfactory fidelity and efficiency, because the model primitives are agnostic to the unique physical properties of the Maxwell equations and lack algorithmic customization. In this work, we thoroughly investigate the synergy between neural operator designs and the physical properties of the Maxwell equations and introduce PIC2O-Sim, a physics-inspired AI-based FDTD prediction framework. Its backbone is a causality-aware dynamic convolutional neural operator that honors space-time causality constraints through careful receptive field configuration and explicitly captures permittivity-dependent light propagation through an efficient dynamic convolution operator. Meanwhile, we explore the trade-offs among prediction scalability, fidelity, and efficiency via a multi-stage partitioned time-bundling technique in autoregressive prediction. Several key techniques are introduced to mitigate iterative error accumulation while maintaining efficiency advantages during autoregressive field prediction. Extensive evaluations on three challenging photonic device simulation tasks show the superiority of PIC2O-Sim: it achieves 51.2% lower roll-out prediction error with 23.5 times fewer parameters than state-of-the-art neural operators, and it simulates 300-600x faster than an open-source FDTD numerical solver.
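The causality constraint the abstract refers to can be seen in a textbook 1D Yee-grid FDTD update. This is the numerical scheme, not PIC2O-Sim's neural operator. In normalized units with a Courant number of 0.5, a disturbance spreads at most one cell per time step. A model that predicts k steps ahead therefore needs a receptive field that covers at least k cells on each side. Two parts of the sketch are our own assumptions: the permittivity scaling of the E-field update and the stride-1 convolution receptive-field formula. The names are hypothetical.

```python
from typing import List

def fdtd_1d(eps_r: List[float], steps: int, source_cell: int, courant: float = 0.5) -> List[float]:
    """1D Yee-grid FDTD in normalized units (E_z and H_y), starting from a unit E-field impulse.

    eps_r holds the relative permittivity per cell; the E update is scaled by 1 / eps_r,
    so light slows down in high-permittivity regions. The end cells stay at E = 0 (PEC walls).
    """
    n = len(eps_r)
    ez = [0.0] * n
    hy = [0.0] * n
    ez[source_cell] = 1.0
    for _ in range(steps):
        for i in range(n - 1):     # H update from the spatial difference of E
            hy[i] += courant * (ez[i + 1] - ez[i])
        for i in range(1, n - 1):  # E update from the spatial difference of H
            ez[i] += courant / eps_r[i] * (hy[i] - hy[i - 1])
    return ez

def receptive_field(num_layers: int, kernel_size: int) -> int:
    """Receptive field (in cells) of a stack of stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

ez = fdtd_1d([1.0] * 41, steps=5, source_cell=20)
```

After 5 steps, the field is nonzero only within 5 cells of the source. Five stacked kernel-size-3 convolutions span exactly those 11 cells, so a 5-step prediction built this way cannot respond to anything physically unreachable.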

Updated: 2024-06-24 18:15:36

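Title: PIC2O-Sim: A Physics-Inspired Causality-Aware Dynamic Convolutional Neural Operator for Ultra-Fast Photonic Device FDTD Simulation

Abstract: The finite-difference time-domain (FDTD) method is important in the photonic hardware design flow and is widely used to solve the time-domain Maxwell equations. However, FDTD is known for its prohibitive runtime cost, taking minutes to hours to simulate a single device. Recently, AI has been applied to achieve orders-of-magnitude speedups in partial differential equation (PDE) solving, but AI-based FDTD solvers for photonic devices have not been clearly formulated. Directly applying off-the-shelf models to predict optical field dynamics yields unsatisfactory fidelity and efficiency, because the model primitives are agnostic to the unique physical properties of the Maxwell equations and lack algorithmic customization. In this work, we thoroughly investigate the synergy between neural operator design and the physical properties of the Maxwell equations and introduce PIC2O-Sim, a physics-inspired AI-based FDTD prediction framework. Its backbone is a causality-aware dynamic convolutional neural operator that honors space-time causality constraints through careful receptive field configuration and explicitly captures permittivity-dependent light propagation through an efficient dynamic convolution operator. We also explore the trade-offs among prediction scalability, fidelity, and efficiency with a multi-stage partitioned time-bundling technique for autoregressive prediction. Several key techniques are introduced to mitigate iterative error accumulation while preserving efficiency advantages during autoregressive field prediction. Extensive evaluations on three challenging photonic device simulation tasks show the superiority of PIC2O-Sim. It achieves 51.2% lower roll-out prediction error and 23.5 times fewer parameters than state-of-the-art neural operators, and it simulates 300-600x faster than an open-source FDTD numerical solver.

Categories: physics.comp-ph,cs.AI,physics.optics

Download: http://arxiv.org/abs/2406.17810v1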

Large Language Models Assume People are More Rational than We Really are

In order for AI systems to communicate effectively with people, they must understand how we make decisions. However, people's decisions are not always rational, so the implicit internal models of human decision-making in Large Language Models (LLMs) must account for this. Previous empirical evidence seems to suggest that these implicit models are accurate -- LLMs offer believable proxies of human behavior, acting how we expect humans would in everyday interactions. However, by comparing LLM behavior and predictions to a large dataset of human decisions, we find that this is actually not the case: when both simulating and predicting people's choices, a suite of cutting-edge LLMs (GPT-4o & 4-Turbo, Llama-3-8B & 70B, Claude 3 Opus) assume that people are more rational than we really are. Specifically, these models deviate from human behavior and align more closely with a classic model of rational choice -- expected value theory. Interestingly, people also tend to assume that other people are rational when interpreting their behavior. As a consequence, when we compare the inferences that LLMs and people draw from the decisions of others using another psychological dataset, we find that these inferences are highly correlated. Thus, the implicit decision-making models of LLMs appear to be aligned with the human expectation that other people will act rationally, rather than with how people actually act.
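Expected value theory, the rational-choice benchmark named above, reduces to a one-line formula: EV = Σ p·x over a gamble's outcomes. An idealized chooser simply picks the higher-EV option. This is a minimal sketch, and the gambles are illustrative rather than drawn from the paper's datasets. A risk-averse person might reasonably take the sure $45 below, while a pure EV maximizer always takes the coin flip.

```python
from typing import List, Tuple

Gamble = List[Tuple[float, float]]  # (probability, payoff) pairs

def expected_value(gamble: Gamble) -> float:
    """EV = sum of probability * payoff over all outcomes."""
    return sum(p * x for p, x in gamble)

def ev_choice(a: Gamble, b: Gamble) -> str:
    """Choice of an idealized expected-value maximizer (ties go to A)."""
    return "A" if expected_value(a) >= expected_value(b) else "B"

risky = [(0.5, 100.0), (0.5, 0.0)]  # coin flip for $100
safe = [(1.0, 45.0)]                # sure $45
choice = ev_choice(risky, safe)
```

Comparing a model's simulated choices with `ev_choice` on many such pairs is one way to see whether it is closer to this benchmark than to observed human behavior.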

Updated: 2024-06-24 18:15:27

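Title: Large Language Models Assume People are More Rational than We Really are

Abstract: For AI systems to communicate effectively with people, they must understand how we make decisions. However, people's decisions are not always rational, so the implicit internal models of human decision-making in large language models (LLMs) must account for this. Previous empirical evidence seems to suggest that these implicit models are accurate: LLMs provide believable proxies of human behavior, acting as we would expect humans to act in everyday interactions. However, by comparing LLM behavior and predictions with a large dataset of human decisions, we find that this is not the case. When both simulating and predicting people's choices, a suite of cutting-edge LLMs (GPT-4o and 4-Turbo, Llama-3-8B and 70B, Claude 3 Opus) assume that people are more rational than we really are. Specifically, these models deviate from human behavior and align more closely with a classic model of rational choice, expected value theory. Interestingly, people also tend to assume that others are rational when interpreting their behavior. Consequently, when we use another psychological dataset to compare the inferences that LLMs and people draw from others' decisions, we find that these inferences are highly correlated. Thus, the implicit decision-making models of LLMs appear to align with the human expectation that others will act rationally, rather than with how people actually act.

Categories: cs.CL,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2406.17055v1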

Leveraging Knowledge Distillation for Lightweight Skin Cancer Classification: Balancing Accuracy and Computational Efficiency

Skin cancer is a major public health concern, accounting for one-third of reported cancers. If not detected early, it can have severe consequences. Recognizing the critical need for effective skin cancer classification, we address a limitation of existing models, which are often too large to deploy in areas with limited computational resources. In response, we present a knowledge-distillation-based approach for creating a lightweight yet high-performing classifier. The proposed solution fuses three models, namely ResNet152V2, ConvNeXtBase, and ViT Base, into an effective teacher model. The teacher model then guides a lightweight student model of size 2.03 MB. This student model is further compressed to 469.77 KB using 16-bit quantization, enabling smooth deployment on edge devices. With six-stage image preprocessing, data augmentation, and a rigorous ablation study, the model achieves an accuracy of 98.75% on the HAM10000 dataset and 98.94% on the Kaggle dataset in classifying benign and malignant skin cancers. With its high accuracy and compact size, our model appears to be a promising choice for accurate skin cancer classification, particularly in resource-constrained settings.
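The teacher-student setup can be sketched with the standard Hinton-style distillation loss. The loss combines two terms: a temperature-softened KL divergence between teacher and student, scaled by T², and cross-entropy on the true label. The paper only says the three backbones are "fused". This sketch assumes the fusion is an average of the teachers' softened probabilities, and the temperature and mixing weight are illustrative defaults. The function names and teacher logits are hypothetical.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_teacher(teacher_logits: List[Sequence[float]], temperature: float) -> List[float]:
    """Fuse several teachers by averaging their temperature-softened probabilities (assumption)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    return [sum(col) / len(probs) for col in zip(*probs)]

def distillation_loss(student_logits: Sequence[float], teacher_logits: List[Sequence[float]],
                      label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher || student) on softened outputs + (1 - alpha) * cross-entropy."""
    p_t = ensemble_teacher(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

# Hypothetical logits for classes [benign, malignant] from three teacher networks.
teachers = [[2.0, -1.0], [1.5, -0.5], [3.0, -2.0]]
loss_agree = distillation_loss([2.0, -1.0], teachers, label=0)
loss_disagree = distillation_loss([-1.0, 2.0], teachers, label=0)
```

The T² factor keeps the soft-target gradients on a scale comparable to the hard-label term as the temperature changes.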

Updated: 2024-06-24 18:13:09

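Title: Leveraging Knowledge Distillation for Lightweight Skin Cancer Classification: Balancing Accuracy and Computational Efficiency

Abstract: Skin cancer is a major public health concern, accounting for one-third of reported cancers. If not detected early, it can lead to severe consequences. Recognizing the critical need for effective skin cancer classification, we address a limitation of existing models: they are often too large to deploy in areas with limited computational resources. In response, we propose a knowledge-distillation-based approach for building a lightweight yet high-performing classifier. The proposed solution fuses three models, namely ResNet152V2, ConvNeXtBase, and ViT Base, into an effective teacher model. The teacher model then guides a lightweight student model of 2.03 MB. The student model is further compressed to 469.77 KB with 16-bit quantization, enabling smooth integration into edge devices. With six-stage image preprocessing, data augmentation, and a rigorous ablation study, the model achieves 98.75% accuracy on the HAM10000 dataset and 98.94% on the Kaggle dataset in classifying benign and malignant skin cancers. Given its high accuracy and compact size, our model appears to be a promising choice for accurate skin cancer classification, particularly in resource-constrained settings.

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.17051v1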

StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field that has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, which conflicts with the deterministic nature of the Image2Normal task, and with a costly ensembling step, which slows down the estimation process. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, thus producing "Stable-and-Sharp" normal estimates without any additional ensembling process. StableNormal works robustly under challenging imaging conditions, such as extreme lighting, blurring, and low quality. It is also robust to transparent and reflective surfaces, as well as to cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy. It starts with a one-step normal estimator (YOSO) that derives an initial normal guess, which is relatively coarse but reliable. A semantic-guided refinement process (SG-DRN) then refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance on standard datasets such as DIODE-indoor, iBims, ScannetV2, and NYUv2, and in various downstream tasks, such as surface reconstruction and normal enhancement. These results show that StableNormal retains both the "stability" and "sharpness" needed for accurate normal estimation. StableNormal represents an initial attempt to repurpose diffusion priors for deterministic estimation. To democratize this, code and models are publicly available at hf.co/Stable-X.

Updated: 2024-06-24 17:59:58

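Title: StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

Abstract: This work addresses high-quality surface normal estimation from monocular color inputs (i.e., images and videos), a field recently revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, which conflicts with the deterministic nature of the Image2Normal task, and with a costly ensembling step that slows down estimation. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, producing "stable-and-sharp" normal estimates without any additional ensembling. StableNormal is robust under challenging imaging conditions such as extreme lighting, blur, and low quality. It is also robust to transparent and reflective surfaces and to cluttered scenes with many objects. Specifically, StableNormal uses a coarse-to-fine strategy. A one-step normal estimator (YOSO) first derives an initial normal guess that is relatively coarse but reliable, and a semantic-guided refinement process (SG-DRN) then refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated by competitive performance on standard datasets such as DIODE-indoor, iBims, ScannetV2, and NYUv2, as well as on downstream tasks such as surface reconstruction and normal enhancement. These results show that StableNormal retains both the "stability" and "sharpness" needed for accurate normal estimation. StableNormal is an initial attempt to repurpose diffusion priors for deterministic estimation. To democratize this technique, code and models are publicly available at hf.co/Stable-X.

Categories: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2406.16864v1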

Meta-learning and Data Augmentation for Stress Testing Forecasting Models

The effectiveness of univariate forecasting models is often hampered by conditions that put them under stress. A model is considered to be under stress if it shows a negative behaviour, such as higher-than-usual errors or increased uncertainty. Understanding the factors that cause stress to forecasting models is important to improve their reliability, transparency, and utility. This paper addresses this problem by contributing a novel framework called MAST (Meta-learning and data Augmentation for Stress Testing). The proposed approach aims to model and characterize stress in univariate time series forecasting models, focusing on conditions where they exhibit large errors. In particular, MAST is a meta-learning approach that predicts the probability that a given model will perform poorly on a given time series, based on a set of statistical time series features. MAST also encompasses a novel data augmentation technique based on oversampling to improve the metadata concerning stress. We conducted experiments using three benchmark datasets that contain a total of 49,794 time series to validate the performance of MAST. The results suggest that the proposed approach is able to identify conditions that lead to large errors. The method and experiments are publicly available in a repository.
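The two ingredients named above can be sketched on their own. The first turns each series into a small statistical feature vector for the meta-learner. The second adds synthetic "stressed" examples by oversampling. This minimal sketch assumes a few simple features (mean, standard deviation, lag-1 autocorrelation) and SMOTE-style interpolation between random pairs of minority examples. Classic SMOTE uses nearest neighbours, and MAST's actual augmentation may differ. The function names are hypothetical, and the meta-learning classifier itself is not shown.

```python
import random
from typing import List, Sequence

def series_features(y: Sequence[float]) -> List[float]:
    """A few simple statistical features: mean, standard deviation, lag-1 autocorrelation."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    if var == 0:
        return [mean, 0.0, 0.0]  # constant series: no spread, autocorrelation undefined
    acf1 = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, n)) / (n * var)
    return [mean, var ** 0.5, acf1]

def smote_like(minority: List[List[float]], n_new: int, rng: random.Random) -> List[List[float]]:
    """Create synthetic minority ("stressed") feature vectors by interpolating random pairs."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        lam = rng.random()  # position on the segment between a and b
        synthetic.append([x + lam * (z - x) for x, z in zip(a, b)])
    return synthetic

# Hypothetical feature vectors of series on which a model produced large errors.
stressed = [[0.0, 1.0, 0.5], [1.0, 3.0, 0.9], [0.5, 2.0, 0.1]]
augmented = smote_like(stressed, n_new=10, rng=random.Random(1))
```

Balancing the rare "large error" cases this way gives the meta-learner more positive examples when it learns to map features to the probability of poor performance.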

Updated: 2024-06-24 17:59:33

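Title: Meta-learning and Data Augmentation for Stress Testing Forecasting Models

Abstract: The effectiveness of univariate forecasting models is often hampered by conditions that put them under stress. A model is considered to be under stress if it shows negative behavior, such as higher-than-usual errors or increased uncertainty. Understanding the factors that cause stress in forecasting models is important for improving their reliability, transparency, and utility. This paper addresses the problem by proposing a novel framework called MAST (Meta-learning and data Augmentation for Stress Testing). The proposed approach aims to model and characterize stress in univariate time series forecasting models, focusing on the conditions under which they exhibit large errors. In particular, MAST is a meta-learning approach that predicts, from a set of statistical time series features, the probability that a given model will perform poorly on a given time series. MAST also includes a novel oversampling-based data augmentation technique to improve the metadata concerning stress. We validated MAST's performance in experiments on three benchmark datasets containing a total of 49,794 time series. The results suggest that the proposed approach can identify the conditions that lead to large errors. The method and experiments are publicly available in a repository.

Categories: cs.LG

Download: http://arxiv.org/abs/2406.17008v1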

EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees

Inference with modern large language models (LLMs) is expensive and time-consuming, and speculative sampling has proven to be an effective solution. Most speculative sampling methods, such as EAGLE, use a static draft tree, implicitly assuming that the acceptance rate of draft tokens depends only on their position. Interestingly, we found that the acceptance rate of draft tokens is also context-dependent. In this paper, building upon EAGLE, we propose EAGLE-2, which introduces a new technique of context-aware dynamic draft trees into draft modeling. This improvement leverages the fact that the draft model of EAGLE is well-calibrated: the confidence scores from the draft model approximate acceptance rates with small errors. We conducted extensive evaluations on three series of LLMs and six tasks, with EAGLE-2 achieving speedup ratios of 3.05x-4.26x, which is 20%-40% faster than EAGLE-1. EAGLE-2 also ensures that the distribution of the generated text remains unchanged, making it a lossless acceleration algorithm.
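The dynamic draft tree can be sketched as a value-guided expansion followed by reranking. Each node's value is the product of the draft model's confidences along its path, which serves as a proxy for the probability that the whole path is accepted. Only the top-k new nodes are expanded at each depth, and the best nodes overall are kept for verification. This is a minimal sketch under these assumptions:

- A toy draft function stands in for EAGLE's draft head.
- Confidences are below 1, so a child never outranks its parent.
- Verification by the target model and tree attention are not shown.

The names are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]
DraftFn = Callable[[Path], List[Tuple[str, float]]]  # context -> [(token, confidence)]

def build_dynamic_tree(draft_fn: DraftFn, depth: int, top_k: int, total_tokens: int) -> List[Path]:
    """Expand a draft tree by estimated acceptance value, then rerank all nodes."""
    values: Dict[Path, float] = {}
    frontier: List[Tuple[float, Path]] = [(1.0, ())]  # root = already-accepted context
    for _ in range(depth):
        children = []
        for parent_value, path in frontier:
            for token, conf in draft_fn(path):
                child = path + (token,)
                values[child] = parent_value * conf  # estimated joint acceptance of the path
                children.append((values[child], child))
        frontier = heapq.nlargest(top_k, children)  # expansion: grow only the best nodes
    # Reranking: keep the total_tokens most valuable nodes across all depths.
    return [path for _, path in heapq.nlargest(total_tokens, ((v, p) for p, v in values.items()))]

def toy_draft(path: Path) -> List[Tuple[str, float]]:
    """Context-dependent confidences: after "the", the draft model is nearly sure of "cat"."""
    if path and path[-1] == "the":
        return [("cat", 0.9), ("dog", 0.05)]
    return [("the", 0.6), ("a", 0.3)]

tree = build_dynamic_tree(toy_draft, depth=3, top_k=2, total_tokens=4)
```

The tree's shape follows the context. The confident branch "the cat the" is drafted three tokens deep, while the uncertain alternatives are cut early.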

Updated: 2024-06-24 17:59:11

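Title: EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees

Abstract: Inference with modern large language models (LLMs) is expensive and time-consuming, and speculative sampling has proven to be an effective solution. Most speculative sampling methods, such as EAGLE, use a static draft tree, implicitly assuming that the acceptance rate of draft tokens depends only on their position. Interestingly, we found that the acceptance rate of draft tokens is also context-dependent. In this paper, building on EAGLE, we propose EAGLE-2, which introduces a new context-aware dynamic draft tree technique into draft modeling. This improvement exploits the fact that EAGLE's draft model is well calibrated: its confidence scores approximate acceptance rates with small errors. We conducted extensive evaluations on three series of LLMs and six tasks. EAGLE-2 achieves speedup ratios of 3.05x-4.26x, which is 20%-40% faster than EAGLE-1. EAGLE-2 also ensures that the distribution of the generated text remains unchanged, making it a lossless acceleration algorithm.

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.16858v1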

GeoMFormer: A General Architecture for Geometric Molecular Representation Learning

Molecular modeling, a central topic in quantum mechanics, aims to accurately calculate the properties and simulate the behaviors of molecular systems. The molecular model is governed by physical laws, which impose geometric constraints such as invariance and equivariance to coordinate rotation and translation. While numerous deep learning approaches have been developed to learn molecular representations under these constraints, most of them are built upon heuristic and costly modules. We argue that there is a strong need for a general and flexible framework for learning both invariant and equivariant features. In this work, we introduce a novel Transformer-based molecular model called GeoMFormer to achieve this goal. Using standard Transformer modules, two separate streams are developed to maintain and learn invariant and equivariant representations. Carefully designed cross-attention modules bridge the two streams, allowing information fusion and enhancing geometric modeling in each stream. Because GeoMFormer is a general and flexible architecture, we show that many previous architectures can be viewed as special instantiations of it. Extensive experiments demonstrate the power of GeoMFormer: all empirical results show that it achieves strong performance on both invariant and equivariant tasks of different types and scales. Code and models will be made publicly available at https://github.com/c-tl/GeoMFormer.
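The two kinds of features the invariant and equivariant streams maintain can be checked numerically. This is not GeoMFormer's architecture, only the symmetry properties its two streams are designed around. Pairwise distances are invariant: a rigid motion of the molecule leaves them unchanged. Relative position vectors are rotation-equivariant: they rotate with the molecule and ignore translations. For brevity, the sketch uses a rotation about the z-axis as the example rotation. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

def rotate_z(p: Vec, angle: float) -> Vec:
    """Rotate a 3D point about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

def transform(coords: List[Vec], angle: float, shift: Vec) -> List[Vec]:
    """Rotate about the z-axis, then translate: a rigid motion of the whole molecule."""
    return [tuple(r + t for r, t in zip(rotate_z(p, angle), shift)) for p in coords]

def pairwise_distances(coords: List[Vec]) -> List[List[float]]:
    """Invariant feature: unchanged by any rotation or translation."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def relative_vectors(coords: List[Vec]) -> List[List[Vec]]:
    """Equivariant feature: rotates with the molecule but ignores translations."""
    return [[tuple(bi - ai for ai, bi in zip(a, b)) for b in coords] for a in coords]

coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)]
moved = transform(coords, angle=0.7, shift=(1.0, -2.0, 3.0))
```

An invariant target such as energy should depend only on features like the distances. An equivariant target such as forces must transform like the relative vectors.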

Updated: 2024-06-24 17:58:13

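Title: GeoMFormer: A General Architecture for Geometric Molecular Representation Learning

Abstract: Molecular modeling is a central topic in quantum mechanics that aims to accurately calculate the properties of molecular systems and simulate their behavior. Molecular models are governed by physical laws, which impose geometric constraints such as invariance and equivariance to coordinate rotation and translation. Although many deep learning methods have been developed to learn molecular representations under these constraints, most are built on heuristic and costly modules. We argue that a general and flexible framework for learning both invariant and equivariant features is needed. In this work, we introduce GeoMFormer, a Transformer-based molecular model, to achieve this goal. Using standard Transformer modules, two separate streams are developed to maintain and learn invariant and equivariant representations. Carefully designed cross-attention modules connect the two streams, enabling information fusion and enhancing geometric modeling in each stream. Because GeoMFormer is a general and flexible architecture, we show that many previous architectures can be viewed as special instances of it. Extensive experiments demonstrate the strong performance of GeoMFormer. All empirical results show that GeoMFormer performs well on invariant and equivariant tasks of different types and scales. Code and models will be made publicly available at https://github.com/c-tl/GeoMFormer.

Updated: 2024-06-24 17:58:13

Categories: cs.LG,cond-mat.mtrl-sci,cs.AI,q-bio.BM

Download: http://arxiv.org/abs/2406.16853v1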

Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

We present LoCoVQA, a dynamic benchmark generator for evaluating long-context extractive reasoning in vision language models (VLMs). LoCoVQA augments test examples for mathematical reasoning, VQA, and character recognition tasks with increasingly long visual contexts composed of both in-distribution and out-of-distribution distractor images. Across these tasks, a diverse set of VLMs rapidly lose performance as the visual context length grows, often exhibiting a striking exponential decay trend. This test assesses how well VLMs can ignore irrelevant information when answering queries -- a task that is quite easy for language models (LMs) in the text domain -- demonstrating that current state-of-the-art VLMs lack this essential capability for many long-context applications.
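
The generator described above can be sketched as follows: each test example keeps its target image and gains `length - 1` distractors drawn from an in-distribution pool and an out-of-distribution pool. This is a minimal sketch under stated assumptions, not the LoCoVQA code: images are represented by hypothetical string ids, and the `ood_ratio` mixing parameter and uniform random placement of the target are our own choices.

```python
import random
from typing import List, Sequence


def build_visual_context(
    target: str,
    in_dist: Sequence[str],
    out_dist: Sequence[str],
    length: int,
    ood_ratio: float,
    seed: int = 0,
) -> List[str]:
    """Return `length` image ids: the target plus `length - 1` distractors.

    Distractor pools are assumed not to contain the target image.
    """
    rng = random.Random(seed)
    n_distract = length - 1
    n_ood = round(n_distract * ood_ratio)
    context = rng.sample(list(out_dist), n_ood) + rng.sample(
        list(in_dist), n_distract - n_ood
    )
    rng.shuffle(context)
    context.insert(rng.randrange(length), target)  # hide the "needle"
    return context


in_pool = [f"vqa_{i}" for i in range(50)]    # hypothetical same-task images
ood_pool = [f"misc_{i}" for i in range(50)]  # hypothetical unrelated images
for length in (1, 2, 4, 8, 16):
    ctx = build_visual_context("target.png", in_pool, ood_pool, length, ood_ratio=0.5)
    print(length, ctx.index("target.png"))
```

Sweeping `length` in this way is what lets accuracy be plotted against visual context length.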

Updated: 2024-06-24 17:58:03

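Title: Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

Abstract: We present LoCoVQA, a dynamic benchmark generator for evaluating long-context extractive reasoning in vision language models (VLMs). LoCoVQA augments test examples for mathematical reasoning, VQA, and character recognition tasks with increasingly long visual contexts composed of in-distribution and out-of-distribution distractor images. Across these tasks, a diverse set of VLMs rapidly lose performance as the visual context length grows, often exhibiting a pronounced exponential decay trend. This test assesses how well VLMs can ignore irrelevant information when answering queries, a task that is quite easy for language models (LMs) in the text domain, and shows that current state-of-the-art VLMs lack this essential capability for many long-context applications.

Updated: 2024-06-24 17:58:03

Categories: cs.CL,cs.AI,cs.CV

Download: http://arxiv.org/abs/2406.16851v1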

Improving physics-informed DeepONets with hard constraints

Current physics-informed (standard or deep operator) neural networks still rely on accurately learning the initial and/or boundary conditions of the system of differential equations they are solving. In contrast, standard numerical methods incorporate such conditions directly into their computations without needing to learn them. In this study, we propose improving current physics-informed deep learning strategies so that initial and/or boundary conditions do not need to be learned and are represented exactly in the predicted solution. Moreover, this method guarantees that when a deep operator network is applied multiple times to time-step a solution of an initial value problem, the resulting function is at least continuous.
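
A common way to represent initial and boundary conditions exactly is to build them into the form of the predicted solution, so that the network output is multiplied by factors that vanish where the conditions apply. The sketch below shows this for a 1-D problem on $x \in [0, 1]$ with zero Dirichlet boundaries; it is a generic hard-constraint ansatz under our own assumptions, not necessarily the construction used in the paper, and `net` is a hypothetical stand-in with random weights rather than a trained DeepONet (whose output would also depend on the input function through a branch network).

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 16)), rng.normal(size=16)
W2 = rng.normal(size=(16, 1))


def net(x, t):
    """Stand-in for a trained network N(x, t); the weights here are random."""
    z = np.tanh(np.stack([x, t], axis=-1) @ W1 + b1)
    return (z @ W2)[..., 0]


def constrained_solution(g, x, t):
    """u(x, t) = g(x) + t * x * (1 - x) * N(x, t) on the domain x in [0, 1].

    The factor t gives u(x, 0) = g(x) exactly, and x * (1 - x) keeps the
    boundary values at g(0) and g(1) for every t, whatever N outputs.
    """
    x, t = np.broadcast_arrays(np.asarray(x, float), np.asarray(t, float))
    return g(x) + t * x * (1 - x) * net(x, t)


g = lambda x: np.sin(np.pi * x)  # initial condition with g(0) = g(1) = 0
x = np.linspace(0.0, 1.0, 11)
print(np.allclose(constrained_solution(g, x, 0.0), g(x)))  # True
print(constrained_solution(g, np.array([0.0, 1.0]), 0.7))   # boundary values ~0
```

When such a model is applied repeatedly for time-stepping, using the previous window's final state as the next `g` makes each window start exactly where the previous one ended, which is one way to see why the stitched solution is continuous.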

Updated: 2024-06-24 17:54:58

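Title: Improving physics-informed DeepONets with hard constraints

Abstract: Current physics-informed (standard or deep operator) neural networks still rely on accurately learning the initial and/or boundary conditions of the system of differential equations they solve. In contrast, standard numerical methods incorporate these conditions into the computation without needing to learn them. In this study, we propose improving current physics-informed deep learning strategies so that initial and/or boundary conditions do not need to be learned and are represented exactly in the predicted solution. Moreover, the method guarantees that when a deep operator network is applied multiple times to time-step the solution of an initial value problem, the resulting function is at least continuous.

Updated: 2024-06-24 17:54:58

Categories: cs.LG,cs.NA,math.NA,physics.comp-ph

Download: http://arxiv.org/abs/2309.07899v2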

Robust Distribution Learning with Local and Global Adversarial Corruptions

We consider learning in an adversarial environment, where an $\varepsilon$-fraction of samples from a distribution $P$ are arbitrarily modified (global corruptions) and the remaining perturbations have average magnitude bounded by $\rho$ (local corruptions). Given access to $n$ such corrupted samples, we seek a computationally efficient estimator $\hat{P}_n$ that minimizes the Wasserstein distance $\mathsf{W}_1(\hat{P}_n,P)$. In fact, we attack the fine-grained task of minimizing $\mathsf{W}_1(\Pi_\# \hat{P}_n, \Pi_\# P)$ for all orthogonal projections $\Pi \in \mathbb{R}^{d \times d}$, with performance scaling with $\mathrm{rank}(\Pi) = k$. This allows us to account simultaneously for mean estimation ($k=1$), distribution estimation ($k=d$), as well as the settings interpolating between these two extremes. We characterize the optimal population-limit risk for this task and then develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}n^{-1/(k \lor 2)})$ when $P$ has bounded covariance. This guarantee holds uniformly in $k$ and is minimax optimal up to the sub-optimality of the plug-in estimator when $\rho = \varepsilon = 0$. Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization, and, in the process, uncover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.
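
The objects in this abstract can be made concrete for the simplest case $k=1$: a toy version of the corruption model and the naive plug-in estimate of $\mathsf{W}_1$ along one projection. This sketch only illustrates the setting, not the paper's robust estimator (which relies on a trace norm approximation). The `corrupt` helper is a hypothetical construction: it moves every point by at most $\rho$ in Euclidean norm and then overwrites an $\varepsilon$-fraction of points with one far-away outlier.

```python
import numpy as np


def w1_1d(a, b):
    """W1 between two 1-D empirical distributions with the same sample size.

    In one dimension the optimal coupling matches sorted samples.
    """
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    if a.shape != b.shape:
        raise ValueError("expected equal sample sizes")
    return float(np.mean(np.abs(a - b)))


def projected_w1(X, Y, theta):
    """W1 between the rank-1 projections of two (n, d) samples onto theta."""
    theta = np.asarray(theta, float) / np.linalg.norm(theta)
    return w1_1d(X @ theta, Y @ theta)


def corrupt(X, eps, rho, rng, outlier=100.0):
    """Toy corruption: move every point by at most rho (local), then overwrite
    an eps-fraction of rows with an arbitrary far-away point (global)."""
    Y = X + rng.uniform(-rho, rho, size=X.shape) / np.sqrt(X.shape[1])
    idx = rng.choice(len(X), size=int(eps * len(X)), replace=False)
    Y[idx] = outlier
    return Y


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
theta = np.ones(5)
print(projected_w1(X, corrupt(X, 0.0, 0.1, rng), theta))   # at most 0.1
print(projected_w1(X, corrupt(X, 0.05, 0.1, rng), theta))  # dominated by the outliers
```

Local corruptions alone shift $\mathsf{W}_1$ by at most $\rho$, while even a 5% global corruption makes the plug-in estimate arbitrarily bad, which is why a robust estimator is needed.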

Updated: 2024-06-24 17:53:33

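Title: Robust Distribution Learning with Local and Global Adversarial Corruptions

Abstract: We consider learning in an adversarial environment, where an $\varepsilon$-fraction of samples from a distribution $P$ are arbitrarily modified (global corruptions) and the remaining perturbations have average magnitude bounded by $\rho$ (local corruptions). Given $n$ such corrupted samples, we seek a computationally efficient estimator $\hat{P}_n$ that minimizes the Wasserstein distance $\mathsf{W}_1(\hat{P}_n,P)$. In fact, we tackle the fine-grained task of minimizing $\mathsf{W}_1(\Pi_\# \hat{P}_n, \Pi_\# P)$ for all orthogonal projections $\Pi \in \mathbb{R}^{d \times d}$, with performance scaling with $\mathrm{rank}(\Pi) = k$. This lets us simultaneously cover mean estimation ($k=1$), distribution estimation ($k=d$), and the settings between these two extremes. We characterize the optimal population-limit risk for this task and develop an efficient finite-sample algorithm whose error is bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}n^{-1/(k \lor 2)})$ when $P$ has bounded covariance. This guarantee holds uniformly in $k$ and is minimax optimal up to the sub-optimality of the plug-in estimator when $\rho = \varepsilon = 0$. Our efficient procedure relies on a trace norm approximation of an ideal but intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization and, in the process, discover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.

Updated: 2024-06-24 17:53:33

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.06509v2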

Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection

Machine learning models can fail on subgroups that are underrepresented during training. While techniques such as dataset balancing can improve performance on underperforming groups, they require access to training group annotations and can end up removing large portions of the dataset. In this paper, we introduce Data Debiasing with Datamodels (D3M), a debiasing approach which isolates and removes specific training examples that drive the model's failures on minority groups. Our approach enables us to efficiently train debiased classifiers while removing only a small number of examples, and does not require training group annotations or additional hyperparameter tuning.
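
The selection step can be sketched as follows: given estimates of how each training example affects the model's outputs on held-out examples, rank training examples by their average effect on the worst-performing group and drop the most harmful ones. This is a simplified illustration, not D3M's exact scoring rule. The attribution matrix `A` holds made-up numbers, and the mean-over-worst-group score is our own assumption. Group labels are needed only for the held-out examples, not for the training set.

```python
import numpy as np


def select_examples_to_remove(attributions, test_groups, worst_group, k):
    """Rank training examples by how much they hurt the worst-performing group.

    attributions[j, i] estimates the effect of training example i on the
    model's output for test example j (positive = helps, negative = hurts).
    Returns the indices of the k most harmful training examples.
    """
    mask = np.asarray(test_groups) == worst_group
    scores = attributions[mask].mean(axis=0)  # average effect on that group
    return np.argsort(scores, kind="stable")[:k]


A = np.array([[0.1, 0.2, -0.9, 0.0],
              [0.0, 0.1, -0.8, 0.3],
              [0.5, -0.5, 0.0, 0.0]])
groups = np.array([0, 0, 1])
print(select_examples_to_remove(A, groups, worst_group=0, k=1))  # [2]
```

The model would then be retrained on the remaining examples; because only the top-ranked harmful examples are dropped, most of the dataset is kept.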

Updated: 2024-06-24 17:51:01

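Title: Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection

Abstract: Machine learning models can fail on groups that are underrepresented during training. Although techniques such as dataset balancing can improve performance on underperforming groups, they require access to training group annotations and may remove large portions of the dataset. In this paper, we introduce Data Debiasing with Datamodels (D3M), a debiasing method that isolates and removes the specific training examples that cause the model to fail on minority groups. Our method lets us efficiently train debiased classifiers while removing only a small number of examples, and it requires neither training group annotations nor additional hyperparameter tuning.

Updated: 2024-06-24 17:51:01

Categories: cs.LG,cs.CY,stat.ML

Download: http://arxiv.org/abs/2406.16846v1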

Scaling and renormalization in high-dimensional regression

This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generalization errors are obtained in a few lines of algebra directly from the properties of the $S$-transform of free probability. This allows for a straightforward identification of the sources of power-law scaling in model performance. We compute the generalization error of a broad class of random feature models. We find that in all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. These novel results allow us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.
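
For orientation, the classical generalized cross-validation (GCV) estimator for ridge regression divides the training error by $(1 - \mathrm{df}/n)^2$, where $\mathrm{df} = \operatorname{tr} S$ is the trace of the hat matrix. The abstract says the $S$-transform yields an analogue of this estimator; the sketch below is only the textbook GCV, not the paper's free-probability formula. The objective normalization $(1/n)\lVert y - Xw\rVert^2 + \lambda \lVert w\rVert^2$ and the synthetic data are assumptions.

```python
import numpy as np


def ridge_train_and_gcv(X, y, lam):
    """Training MSE and the GCV estimate of test error for ridge regression.

    Objective: (1/n) * ||y - X w||^2 + lam * ||w||^2.
    """
    n, p = X.shape
    # Hat matrix S maps targets to fitted values: y_hat = S @ y.
    S = X @ np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T)
    resid = y - S @ y
    train_mse = float(resid @ resid) / n
    df = float(np.trace(S))  # effective degrees of freedom, 0 <= df < n
    gcv = train_mse / (1.0 - df / n) ** 2
    return train_mse, gcv, df


rng = np.random.default_rng(0)
n, p = 200, 400                      # overparameterized: p > n
X = rng.normal(size=(n, p)) / np.sqrt(p)
w_star = rng.normal(size=p)
y = X @ w_star + 0.1 * rng.normal(size=n)
for lam in (1e-3, 1e-2, 1e-1):
    train_mse, gcv, df = ridge_train_and_gcv(X, y, lam)
    print(f"lam={lam:g}  train={train_mse:.4f}  gcv={gcv:.4f}  df={df:.1f}")
```

The ratio between the GCV estimate and the training error plays the role of the train-test gap that the abstract attributes to the $S$-transform.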

Updated: 2024-06-24 17:47:41

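Title: Scaling and renormalization in high-dimensional regression

Abstract: This paper uses basic tools from random matrix theory and free probability to give a concise derivation of the training and generalization performance of various high-dimensional ridge regression models. We introduce and review recent results on these topics for readers with backgrounds in physics and deep learning. Analytic formulas for the training and generalization errors are obtained in a few lines of algebra directly from the properties of the $S$-transform of free probability. This makes it straightforward to identify the sources of power-law scaling in model performance. We compute the generalization error of a broad class of random feature models. We find that in all models the $S$-transform corresponds to the train-test generalization gap and yields an analogue of the generalized cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. These new results allow us to discover a scaling regime of random feature models in which the variance due to the features limits performance in the overparameterized setting. We also show how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend, and provide a unifying perspective on, earlier models of neural scaling laws.

Updated: 2024-06-24 17:47:41

Categories: stat.ML,cond-mat.dis-nn,cs.LG

Download: http://arxiv.org/abs/2405.00592v2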

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve the speed of generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.
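
Two of the families in this survey can be illustrated with a few lines each: a token-level algorithm (nucleus, or top-p, sampling from next-token logits) and a simple meta-generation algorithm (best-of-N, which draws full candidates and reranks them with a scorer). This is a minimal sketch, not code from the survey; `generate` and `score` are hypothetical callables standing in for a language model and a verifier or reward model.

```python
import math
import random
from typing import Callable, List, Sequence


def nucleus_sample(logits: Sequence[float], top_p: float, temperature: float,
                   rng: random.Random) -> int:
    """Token-level: sample from the smallest set of tokens with mass >= top_p."""
    m = max(logits)
    weights = [math.exp((l - m) / temperature) for l in logits]  # stable softmax
    z = sum(weights)
    probs = [w / z for w in weights]
    kept: List[int] = []
    mass = 0.0
    for i in sorted(range(len(probs)), key=probs.__getitem__, reverse=True):
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def best_of_n(generate: Callable[[random.Random], str],
              score: Callable[[str], float], n: int, seed: int = 0) -> str:
    """Meta-generation: draw n full candidates and return the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)


print(nucleus_sample([0.0, 5.0, 1.0], top_p=0.5, temperature=1.0,
                     rng=random.Random(0)))  # 1

# Hypothetical generator and scorer standing in for an LLM and a verifier.
toy_generate = lambda r: "".join(r.choice("ab") for _ in range(r.randint(1, 5)))
toy_score = lambda s: s.count("a")
print(best_of_n(toy_generate, toy_score, n=8))
```

Best-of-N treats the token-level sampler as a black box, which is what distinguishes meta-generation from decoding in the survey's taxonomy.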

Updated: 2024-06-24 17:45:59

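Title: From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

Abstract: One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, the benefits of scaling compute during inference have received less attention. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling one token at a time or by constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms operate on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and increase generation speed. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.

Updated: 2024-06-24 17:45:59

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.16838v1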

Concentration Inequalities for $(f,\Gamma)$-GANs

Generative adversarial networks (GANs) are unsupervised learning methods for training a generator distribution to produce samples that approximate those drawn from a target distribution. Many such methods can be formulated as minimization of a metric or divergence. Recent works have proven the statistical consistency of GANs that are based on integral probability metrics (IPMs), e.g., WGAN which is based on the 1-Wasserstein metric. IPMs are defined by optimizing a linear functional (difference of expectations) over a space of discriminators. A much larger class of GANs, which allow for the use of nonlinear objective functionals, can be constructed using $(f,\Gamma)$-divergences; these generalize and interpolate between IPMs and $f$-divergences (e.g., KL or $\alpha$-divergences). Instances of $(f,\Gamma)$-GANs have been shown to exhibit improved performance in a number of applications. In this work we study the statistical consistency of $(f,\Gamma)$-GANs for general $f$ and $\Gamma$. Specifically, we derive finite-sample concentration inequalities. These derivations require novel arguments due to nonlinearity of the objective functional. We demonstrate that our new results reduce to the known results for IPM-GANs in the appropriate limit while also significantly extending the domain of applicability of this theory.
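
For readers who want the objects behind this abstract written out, below is the variational form of an IPM and of the $(f,\Gamma)$-divergence as we recall it from the literature that introduced these divergences. The notation ($\Lambda_f^Q$, the Legendre transform $f^*$) is ours, and it should be checked against the paper before being relied on.

```latex
% Integral probability metric over a discriminator class \Gamma
% (WGAN: \Gamma = 1-Lipschitz functions)
W^{\Gamma}(P, Q) = \sup_{g \in \Gamma} \big\{ \mathbb{E}_P[g] - \mathbb{E}_Q[g] \big\}

% (f,\Gamma)-divergence; f^* denotes the Legendre transform of f
D_f^{\Gamma}(P \,\|\, Q) = \sup_{g \in \Gamma} \big\{ \mathbb{E}_P[g] - \Lambda_f^Q[g] \big\},
\qquad
\Lambda_f^Q[g] = \inf_{\nu \in \mathbb{R}} \big\{ \nu + \mathbb{E}_Q\big[ f^*(g - \nu) \big] \big\}
```

Replacing $\Lambda_f^Q[g]$ by the linear term $\mathbb{E}_Q[g]$ gives back the IPM. The infimum over $\nu$ composed with $f^*$ is what makes the objective nonlinear in $g$, which is the nonlinearity the abstract says requires new concentration arguments.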

Updated: 2024-06-24 17:42:03

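Title: Concentration Inequalities for $(f,\Gamma)$-GANs

Abstract: Generative adversarial networks (GANs) are unsupervised learning methods for training a generator distribution to produce samples that approximate those drawn from a target distribution. Many such methods can be formulated as the minimization of a metric or divergence. Recent work has proven the statistical consistency of GANs based on integral probability metrics (IPMs), such as WGAN, which is based on the 1-Wasserstein metric. IPMs are defined by optimizing a linear functional (a difference of expectations) over a space of discriminators. A much larger class of GANs, which allows nonlinear objective functionals, can be constructed using $(f,\Gamma)$-divergences; these generalize and interpolate between IPMs and $f$-divergences (e.g., KL or $\alpha$-divergences). Instances of $(f,\Gamma)$-GANs have been shown to deliver improved performance in a number of applications. In this work, we study the statistical consistency of $(f,\Gamma)$-GANs for general $f$ and $\Gamma$. Specifically, we derive finite-sample concentration inequalities. These derivations require new arguments because of the nonlinearity of the objective functional. We show that our new results reduce to the known results for IPM-GANs in the appropriate limit, while significantly extending the domain of applicability of the theory.

Updated: 2024-06-24 17:42:03

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2406.16834v1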

USDC: A Dataset of User Stance and Dogmatism in Long Conversations

Identifying users' opinions and stances in long conversation threads on various topics can be critical for enhanced personalization, market research, political campaigns, customer service, conflict resolution, targeted advertising, and content moderation. Hence, training language models to automate this task is important. However, gathering manual annotations to train such models poses multiple challenges: 1) it is time-consuming and costly; 2) conversation threads can be very long, increasing the chance of noisy annotations; and 3) interpreting instances where a user changes their opinion within a conversation is difficult because such transitions are often subtle and not expressed explicitly. Inspired by the recent success of large language models (LLMs) on complex natural language processing (NLP) tasks, we leverage Mistral Large and GPT-4 to automate the human annotation process on the following two tasks while also providing reasoning: i) User Stance classification, which involves labeling a user's stance in a post of a conversation on a five-point scale; ii) User Dogmatism classification, which involves labeling a user's overall opinion in the conversation on a four-point scale. Majority voting over zero-shot, one-shot, and few-shot annotations from these two LLMs on 764 multi-user Reddit conversations lets us curate the USDC dataset. USDC is then used to finetune and instruction-tune multiple deployable small language models for the 5-class stance and 4-class dogmatism classification tasks. We make the code and dataset publicly available [https://anonymous.4open.science/r/USDC-0F7F].
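
The curation step aggregates six annotations per item (two LLMs, each prompted zero-shot, one-shot, and few-shot) by majority vote. A minimal sketch of that aggregation follows; the label strings are hypothetical, and returning `None` on a tie is our own assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(annotations: Sequence[str]) -> Optional[str]:
    """Label chosen most often across annotators; None if the top count is tied."""
    ranked = Counter(annotations).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: resolve by another rule or manual review
    return ranked[0][0]


# Hypothetical stance labels for one post: {Mistral Large, GPT-4} x {zero, one, few}-shot.
votes = ["somewhat_against", "somewhat_against", "neutral",
         "somewhat_against", "strongly_against", "neutral"]
print(majority_label(votes))  # somewhat_against
```

With an even number of annotators, ties are possible, so some explicit tie rule is needed in practice.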

Updated: 2024-06-24 17:41:53

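Title: USDC: A Dataset of User Stance and Dogmatism in Long Conversations

Abstract: Identifying users' opinions and stances in long conversation threads on various topics is critical for enhanced personalization, market research, political campaigns, customer service, conflict resolution, targeted advertising, and content moderation. Training language models to automate this task is therefore important. However, collecting manual annotations to train such models poses several challenges: 1) it is time-consuming and costly; 2) conversation threads can be very long, increasing the likelihood of noisy annotations; and 3) interpreting cases where a user changes their opinion within a conversation is difficult, because such transitions are usually subtle and not stated explicitly. Inspired by the recent success of large language models (LLMs) on complex natural language processing (NLP) tasks, we use Mistral Large and GPT-4 to automate the human annotation process, while also providing reasoning, for the following two tasks: i) user stance classification, which labels the stance of a user's post in a conversation on a five-point scale; ii) user dogmatism classification, which labels a user's overall opinion in the conversation on a four-point scale. Majority voting over the zero-shot, one-shot, and few-shot annotations produced by these two LLMs on 764 multi-user Reddit conversations helps us curate the USDC dataset. USDC is then used to fine-tune and instruction-tune several deployable small language models for the 5-class stance and 4-class dogmatism classification tasks. We make the code and dataset publicly available.

Updated: 2024-06-24 17:41:53

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.16833v1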

Understanding and Mitigating Tokenization Bias in Language Models

State-of-the-art language models are autoregressive and operate on subword units known as tokens. Specifically, one must encode the conditioning string into a list of tokens before passing it to the language model for next-token prediction. We show that, for encoding schemes such as maximum prefix matching, tokenization induces a sampling bias that cannot be mitigated with more training or data. To counter this universal problem, we propose a novel algorithm to obtain unbiased estimates from a model that was trained on tokenized data. Our method does not require finetuning the model, and its complexity, defined as the number of model runs, scales linearly with the sequence length. As a consequence, we show that one can simulate token-free behavior from a tokenized language model. We empirically verify the correctness of our method through a Markov-chain setup, where it accurately recovers the transition probabilities, in contrast to the conventional method of directly prompting tokens into the language model.
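
The bias can be reproduced in a few lines with a two-state Markov chain over characters `A` and `B` and a maximum-prefix-matching tokenizer whose vocabulary is `A`, `B`, `AA`. This sketch only demonstrates the problem, not the paper's correction algorithm; the vocabulary and `p_stay=0.6` are hypothetical choices. Because `AA` is always preferred when it matches, a lone `A` token is never followed by a token starting with `A`. A model trained on these token sequences therefore learns that the token `A` is always followed by `B`, although the true probability of `A` after `A` is 0.6.

```python
import random
from collections import Counter
from typing import List, Sequence


def max_prefix_tokenize(text: str, vocab: Sequence[str]) -> List[str]:
    """Greedy maximum prefix matching; assumes every single character is in vocab."""
    by_length = sorted(vocab, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        tok = next(v for v in by_length if text.startswith(v, i))
        tokens.append(tok)
        i += len(tok)
    return tokens


def sample_chain(n: int, p_stay: float, rng: random.Random) -> str:
    """Two-state Markov chain over the characters 'A' and 'B'."""
    state, out = "A", []
    for _ in range(n):
        out.append(state)
        if rng.random() >= p_stay:
            state = "B" if state == "A" else "A"
    return "".join(out)


text = sample_chain(10_000, p_stay=0.6, rng=random.Random(0))
tokens = max_prefix_tokenize(text, ["A", "B", "AA"])
pairs = Counter(zip(tokens, tokens[1:]))
print("AA" in text, pairs[("A", "A")], pairs[("A", "AA")])  # True 0 0
```

Prompting such a model with a string that ends in a lone `A` token therefore yields a wrong next-character distribution no matter how much data it sees, which is the bias the abstract describes.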

Updated: 2024-06-24 17:38:02

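Title: Understanding and Mitigating Tokenization Bias in Language Models

Abstract: State-of-the-art language models are autoregressive and operate on subword units called tokens. Specifically, the conditioning string must be encoded into a list of tokens before it is passed to the language model for next-token prediction. We show that, for encoding schemes such as maximum prefix matching, tokenization induces a sampling bias that cannot be mitigated by more training or data. To address this universal problem, we propose a novel algorithm for obtaining unbiased estimates from a model trained on tokenized data. Our method does not require fine-tuning the model, and its complexity, defined as the number of model runs, is linear in the sequence length. As a result, we show that token-free behavior can be simulated from a tokenized language model. We empirically verify the correctness of our method in a Markov-chain setup, where it accurately recovers the transition probabilities, in contrast to the conventional approach of directly feeding tokens into the language model.

Updated: 2024-06-24 17:38:02

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.16829v1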

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

Did you try out the new Bing Search? Or maybe you fiddled around with Google AI Overviews? These might sound familiar because the modern-day search stack has recently evolved to include retrieval-augmented generation (RAG) systems. They allow searching and incorporating real-time data into large language models (LLMs) to provide a well-informed, attributed, concise summary, in contrast to the traditional search paradigm that relies on displaying a ranked list of documents. Therefore, given these recent advancements, it is crucial to have an arena to build, test, visualize, and systematically evaluate RAG-based search systems. With this in mind, we propose the TREC 2024 RAG Track to foster innovation in evaluating RAG systems. In our work, we lay out the steps we have taken towards making this track a reality: we describe the details of our reusable framework, Ragnarök, explain the curation of the new MS MARCO V2.1 collection, release the development topics for the track, and standardize the I/O definitions that assist the end user. Next, using Ragnarök, we identify and provide key industrial baselines such as OpenAI's GPT-4o and Cohere's Command R+. Further, we introduce a web-based user interface for an interactive arena that enables pairwise benchmarking of RAG systems via crowdsourcing. We open-source our Ragnarök framework and baselines to achieve a unified standard for future RAG systems.

Updated: 2024-06-24 17:37:52

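Title: Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

Abstract: Have you tried the new Bing Search? Or perhaps you have played with Google AI Overviews? These may sound familiar because the modern search stack has recently evolved to include retrieval-augmented generation (RAG) systems. They allow real-time data to be searched and incorporated into large language models (LLMs) to provide a well-informed, attributed, concise summary, in contrast to the traditional search paradigm that relies on displaying a ranked list of documents. Given these recent advances, it is therefore crucial to have an arena for building, testing, visualizing, and systematically evaluating RAG-based search systems. With this in mind, we propose the TREC 2024 RAG Track to foster innovation in evaluating RAG systems. In our work, we describe the steps we have taken to make this track a reality: we describe the details of our reusable framework, Ragnarök, explain the curation of the new MS MARCO V2.1 collection, release the development topics for the track, and standardize the I/O definitions that assist the end user. Next, using Ragnarök, we identify and provide key industrial baselines such as OpenAI's GPT-4o and Cohere's Command R+. In addition, we introduce a web-based user interface for an interactive arena that allows pairwise benchmarking of RAG systems through crowdsourcing. We open-source our Ragnarök framework and baselines to achieve a unified standard for future RAG systems.

Updated: 2024-06-24 17:37:52

Categories: cs.IR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.16828v1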

A Multi-Party, Multi-Blockchain Atomic Swap Protocol with Universal Adaptor Secret

The increasing complexity of digital asset transactions across multiple blockchains necessitates a robust atomic swap protocol that can securely handle more than two participants. Traditional atomic swap protocols, including those based on adaptor signatures, are vulnerable to malicious dropout attacks, which break atomicity and compromise the security of the transaction. This paper presents a novel multi-party atomic swap protocol that operates almost entirely off-chain, requiring only a single on-chain transaction for finalization. Our protocol leverages Schnorr-like signature verification and a universal adaptor secret to ensure atomicity and scalability across any number of participants and blockchains without the need for smart contracts or trusted third parties. By addressing key challenges such as collusion attacks and malicious dropouts, our protocol significantly enhances the security and efficiency of multi-party atomic swaps. Our contributions include the first scalable, fully off-chain protocol for atomic swaps involving any number of participants, adding zero overhead to native blockchains, and providing a practical and cost-effective solution for decentralized asset exchanges.
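
The building block named in this abstract, an adaptor signature in which one secret completes every leg, can be illustrated with textbook Schnorr adaptor signatures. The sketch below is not the paper's protocol, which additionally handles malicious dropouts, collusion, and an off-chain multi-party flow. It uses a deliberately tiny prime-order subgroup (p = 2039, q = 1019, g = 4) instead of an elliptic curve, so it is insecure and only shows the algebra. Each leg is pre-signed against the same adaptor point T = g^t; publishing any completed signature reveals t to holders of the matching pre-signature, who can then complete the other legs.

```python
import hashlib
import secrets

# Toy group: p = 2q + 1 is a safe prime and g = 4 generates the subgroup of
# prime order q. Far too small to be secure; for illustrating the algebra only.
P, Q, G = 2039, 1019, 4


def challenge(R: int, pub: int, msg: bytes) -> int:
    data = f"{R}|{pub}|".encode() + msg
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def verify(pub: int, msg: bytes, R: int, s: int) -> bool:
    """Ordinary Schnorr check: g^s == R * pub^e."""
    return pow(G, s, P) == R * pow(pub, challenge(R, pub, msg), P) % P


def pre_sign(x: int, pub: int, msg: bytes, T: int):
    """Adaptor pre-signature: the nonce commitment R already includes T = g^t."""
    k = secrets.randbelow(Q - 1) + 1
    R = pow(G, k, P) * T % P
    return R, (k + challenge(R, pub, msg) * x) % Q


def pre_verify(pub: int, msg: bytes, R: int, s_pre: int, T: int) -> bool:
    """Check g^s_pre == (R / T) * pub^e without knowing t."""
    R_over_T = R * pow(T, -1, P) % P
    return pow(G, s_pre, P) == R_over_T * pow(pub, challenge(R, pub, msg), P) % P


def adapt(s_pre: int, t: int) -> int:
    return (s_pre + t) % Q


def extract(s: int, s_pre: int) -> int:
    return (s - s_pre) % Q


# One adaptor secret t shared by every leg of a two-party, two-chain swap.
t = secrets.randbelow(Q - 1) + 1
T = pow(G, t, P)
alice_x, alice_pub = keygen()
bob_x, bob_pub = keygen()
msg1, msg2 = b"leg on chain 1", b"leg on chain 2"
R1, s1_pre = pre_sign(alice_x, alice_pub, msg1, T)
R2, s2_pre = pre_sign(bob_x, bob_pub, msg2, T)
assert pre_verify(alice_pub, msg1, R1, s1_pre, T) and pre_verify(bob_pub, msg2, R2, s2_pre, T)

s1 = adapt(s1_pre, t)              # completing (publishing) one leg ...
t_learned = extract(s1, s1_pre)    # ... reveals t to anyone holding s1_pre ...
s2 = adapt(s2_pre, t_learned)      # ... which completes every other leg
print(verify(alice_pub, msg1, R1, s1), verify(bob_pub, msg2, R2, s2))  # True True
```

A pre-signature on its own does not pass the ordinary check, so no leg can settle until t is revealed, and once it is revealed every leg can settle. This all-or-nothing behavior is what atomicity asks for.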

Updated: 2024-06-24 17:33:03

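Title: A Multi-Party, Multi-Blockchain Atomic Swap Protocol with Universal Adaptor Secret

Abstract: The growing complexity of digital asset transactions across multiple blockchains requires a robust atomic swap protocol that can securely handle more than two participants. Traditional atomic swap protocols, including those based on adaptor signatures, are vulnerable to malicious dropout attacks, which break atomicity and compromise the security of the transaction. This paper presents a novel multi-party atomic swap protocol that operates almost entirely off-chain and requires only a single on-chain transaction for finalization. Our protocol uses Schnorr-like signature verification and a universal adaptor secret to ensure atomicity and scalability across any number of participants and blockchains, without the need for smart contracts or trusted third parties. By addressing key challenges such as collusion attacks and malicious dropouts, our protocol significantly improves the security and efficiency of multi-party atomic swaps. Our contributions include the first scalable, fully off-chain atomic swap protocol for any number of participants, which adds zero overhead to native blockchains and provides a practical and cost-effective solution for decentralized asset exchanges.

Updated: 2024-06-24 17:33:03

Categories: cs.CR,cs.DC

Download: http://arxiv.org/abs/2406.16822v1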

General Binding Affinity Guidance for Diffusion Models in Structure-Based Drug Design

Structure-Based Drug Design (SBDD) focuses on generating valid ligands that strongly and specifically bind to a designated protein pocket. Several methods use machine learning for SBDD to generate these ligands in 3D space, conditioned on the structure of a desired protein pocket. Recently, diffusion models have shown success here by modeling the underlying distributions of atomic positions and types. While these methods are effective in considering the structural details of the protein pocket, they often fail to explicitly consider the binding affinity. Binding affinity characterizes how tightly the ligand binds to the protein pocket, and is measured by the change in free energy associated with the binding process. It is one of the most crucial metrics for benchmarking the effectiveness of the interaction between a ligand and protein pocket. To address this, we propose BADGER: Binding Affinity Diffusion Guidance with Enhanced Refinement. BADGER is a general guidance method to steer the diffusion sampling process towards improved protein-ligand binding, allowing us to adjust the distribution of the binding affinity between ligands and proteins. Our method is enabled by using a neural network (NN) to model the energy function, which is commonly approximated by AutoDock Vina (ADV). ADV's energy function is non-differentiable, and estimates the affinity based on the interactions between a ligand and target protein receptor. By using a NN as a differentiable energy function proxy, we utilize the gradient of our learned energy function as a guidance method on top of any trained diffusion model. We show that our method improves the binding affinity of generated ligands to their protein receptors by up to 60%, significantly surpassing previous machine learning methods. We also show that our guidance method is flexible and can be easily applied to other diffusion-based SBDD frameworks.
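
The guidance mechanism (adding the gradient of a differentiable energy proxy to the sampler's update) can be sketched in a toy setting. In this sketch the "trained diffusion model" is replaced by the known score of a standard normal, the learned neural-network energy is replaced by a quadratic pulling samples toward a hypothetical `target`, and the sampler is unadjusted Langevin dynamics rather than a reverse diffusion schedule. It shows the shape of gradient guidance, not BADGER's implementation.

```python
import numpy as np


def energy(x, target):
    """Stand-in for a learned, differentiable binding-energy proxy (lower = better)."""
    return np.sum((x - target) ** 2, axis=-1)


def energy_grad(x, target):
    return 2.0 * (x - target)


def sample(n, dim, steps, step_size, guidance_scale, target, seed=0):
    """Unadjusted Langevin sampling from a standard-normal "prior" whose score
    is -x, optionally steered down the gradient of the energy proxy."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, dim))
    for _ in range(steps):
        drift = -x - guidance_scale * energy_grad(x, target)
        x = x + step_size * drift + np.sqrt(2.0 * step_size) * rng.normal(size=x.shape)
    return x


target = np.full(3, 3.0)
plain = sample(2000, 3, 500, 0.01, guidance_scale=0.0, target=target)
guided = sample(2000, 3, 500, 0.01, guidance_scale=1.0, target=target)
print(energy(plain, target).mean(), energy(guided, target).mean())
```

`guidance_scale` trades fidelity to the learned distribution against lower energy. This is the knob the abstract refers to when it says the binding-affinity distribution can be adjusted.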

Updated: 2024-06-24 17:31:41

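Title: General Binding Affinity Guidance for Diffusion Models in Structure-Based Drug Design

Abstract: Structure-Based Drug Design (SBDD) focuses on generating valid ligands that bind strongly and specifically to a designated protein pocket. Several methods use machine learning for SBDD to generate these ligands in 3D space, conditioned on the structure of the desired protein pocket. Recently, diffusion models have succeeded here by modeling the underlying distributions of atomic positions and types. Although these methods are effective at considering the structural details of the protein pocket, they often fail to explicitly consider binding affinity. Binding affinity characterizes how tightly a ligand binds to the protein pocket and is measured by the change in free energy associated with the binding process. It is one of the most crucial metrics for assessing the effectiveness of the interaction between a ligand and a protein pocket. To address this, we propose BADGER: Binding Affinity Diffusion Guidance with Enhanced Refinement. BADGER is a general guidance method that steers the diffusion sampling process toward improved protein-ligand binding, allowing us to adjust the distribution of binding affinity between ligands and proteins. Our method uses a neural network (NN) to model the energy function, which is commonly approximated by AutoDock Vina (ADV). ADV's energy function is non-differentiable and estimates affinity based on the interactions between a ligand and the target protein receptor. By using an NN as a differentiable energy function proxy, we use the gradient of the learned energy function as a guidance method on top of any trained diffusion model. We show that our method improves the binding affinity of generated ligands to their protein receptors by up to 60%, significantly surpassing previous machine learning methods. We also show that our guidance method is flexible and can easily be applied to other diffusion-based SBDD frameworks.

Updated: 2024-06-24 17:31:41

Categories: cs.LG,cs.AI,physics.bio-ph,physics.chem-ph,q-bio.BM

Download: http://arxiv.org/abs/2406.16821v1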

PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs

Recently, machine unlearning, which seeks to erase specific data stored in pre-trained or fine-tuned models, has emerged as a crucial protective measure for LLMs. However, the LLM unlearning approaches considered thus far have focused on removing independent data points and have not taken into account that the stored facts are logically connected to one another and form an implicit knowledge graph. To facilitate the development of structural unlearning methods, which are essential for the practical application of unlearning, we propose PISTOL, a pipeline for compiling multi-scenario datasets for benchmarking structural LLM unlearning. Additionally, using sample datasets synthesized with PISTOL, we benchmarked four distinct unlearning methods on both Llama2-7B and Mistral-7B models. This analysis helps to illustrate the prevailing challenges in effectively and robustly removing highly interconnected data, batched data, or data skewed towards a specific domain. It also highlights that the choice of pre-trained model can affect unlearning performance. This work not only advances our understanding of the limitations of current LLM unlearning methods and proposes future research directions, but also provides a replicable framework for ongoing exploration and validation in the field.
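
Structural unlearning cares about where the forgotten facts sit in the knowledge graph. A simple way to see this is to split retained facts into those that share an entity with a forgotten fact and those that do not, so that collateral damage can be measured separately for each. The sketch below is a hypothetical illustration of that split, not PISTOL's pipeline; the edge list stands in for facts (for example, contracts) between entities.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # e.g. a contract between two entities


def split_by_structure(edges: List[Edge], forget: Set[Edge]):
    """Partition a knowledge graph's edges into the forget set, retained edges
    that share an entity with a forgotten edge, and all other retained edges."""
    forget_entities = {entity for edge in forget for entity in edge}
    adjacent, distant = [], []
    for edge in edges:
        if edge in forget:
            continue
        (adjacent if forget_entities & set(edge) else distant).append(edge)
    return sorted(forget), adjacent, distant


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
print(split_by_structure(edges, {("B", "C")}))
# ([('B', 'C')], [('A', 'B'), ('C', 'D')], [('D', 'E')])
```

Comparing an unlearned model's accuracy on the adjacent and distant retained sets is one way to make "highly interconnected data" measurable.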

Updated: 2024-06-24 17:22:36

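Title: PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs

Abstract: Recently, machine unlearning, which aims to erase specific data stored in pre-trained or fine-tuned models, has become an important protective measure for LLMs. However, the LLM unlearning methods considered so far have focused mainly on removing independent data points and have not taken into account that stored facts are logically connected to one another, forming an implicit knowledge graph. To facilitate the development of structural unlearning methods, which are essential for the practical application of unlearning, we propose PISTOL, a pipeline for compiling multi-scenario datasets for benchmarking structural LLM unlearning. In addition, using sample datasets synthesized with PISTOL, we benchmarked four different unlearning methods on the Llama2-7B and Mistral-7B models. This analysis helps illustrate the challenges of effectively and robustly removing highly interconnected data, batched data, or data skewed toward a specific domain, and it also highlights how the choice of pre-trained model affects unlearning performance. This work not only advances our understanding of the limitations of current LLM unlearning methods and proposes future research directions, but also provides a replicable framework for ongoing exploration and validation in the field.

Updated: 2024-06-24 17:22:36

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.16810v1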

Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback, which captures nuanced distinctions in image quality and prompt alignment, compared to traditional coarse-grained feedback (for example, thumbs up/down or ranking between a set of options). While fine-grained feedback holds promise, particularly for systems catering to diverse societal preferences, we show that demonstrating its superiority to coarse-grained feedback is not automatic. Through experiments on real and synthetic preference data, we surface the complexities of building effective models due to the interplay of model choice, feedback type, and the alignment between human judgment and computational interpretation. We identify key challenges in eliciting and utilizing fine-grained feedback, prompting a reassessment of its assumed benefits and practicality. Our findings (e.g., that fine-grained feedback can lead to worse models for a fixed budget in some settings, whereas in controlled settings with known attributes fine-grained rewards can indeed be more helpful) call for careful consideration of feedback attributes and potentially beckon novel modeling approaches to appropriately unlock the potential value of fine-grained feedback in the wild.

Updated: 2024-06-24 17:19:34

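Title: Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

Abstract: Human feedback plays a crucial role in learning and refining reward models for text-to-image generation, but the optimal form that feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback, which captures subtle distinctions in image quality and prompt alignment, compared with traditional coarse-grained feedback (for example, thumbs up/down or ranking among a set of options). Although fine-grained feedback holds promise, especially for systems serving diverse societal preferences, we show that its superiority over coarse-grained feedback is not automatic. Through experiments on real and synthetic preference data, we reveal the complexity of building effective models that arises from the interplay of model choice, feedback type, and the alignment between human judgment and computational interpretation. We identify key challenges in eliciting and using fine-grained feedback, prompting a reassessment of its assumed benefits and practicality. Our findings (for example, that for a fixed budget fine-grained feedback can lead to worse models in some settings, whereas in controlled settings with known attributes fine-grained rewards can indeed be more helpful) call for careful consideration of feedback attributes and may call for new modeling approaches to appropriately unlock the potential value of fine-grained feedback in practice.

Updated: 2024-06-24 17:19:34

Categories: cs.LG,cs.CL,cs.CV

Download: http://arxiv.org/abs/2406.16807v1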

Comment on Chen et al.'s Authentication Protocol for Internet of Health Things

The Internet of Medical Things has revolutionized the healthcare industry, enabling the seamless integration of connected medical devices and wearable sensors to enhance patient care and optimize healthcare services. However, the rapid adoption of the Internet of Medical Things also introduces significant security challenges that must be effectively addressed to preserve patient privacy, protect sensitive medical data, and ensure the overall reliability and safety of Internet of Medical Things systems. In this context, a key agreement protocol is used to securely establish shared cryptographic keys between interconnected medical devices and the central system, ensuring confidential and authenticated communication. Recently, Chen et al. proposed a lightweight authentication and key agreement protocol for the Internet of Health Things. In this article, we provide a descriptive analysis of their proposed scheme and prove that Chen et al.'s scheme is vulnerable to known session-specific temporary information attacks and stolen verifier attacks.

Updated: 2024-06-24 17:16:29

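Title: Comment on Chen et al.'s Authentication Protocol for Internet of Health Things

Abstract: The Internet of Medical Things has revolutionized the healthcare industry, enabling the seamless integration of connected medical devices and wearable sensors to enhance patient care and optimize healthcare services. However, the rapid adoption of the Internet of Medical Things also introduces significant security challenges that must be addressed effectively to protect patient privacy, safeguard sensitive medical data, and ensure the overall reliability and safety of Internet of Medical Things systems. In this context, a key agreement protocol is used to securely establish shared cryptographic keys between connected medical devices and the central system, ensuring confidential and authenticated communication. Recently, Chen et al. proposed a lightweight authentication and key agreement protocol for the Internet of Health Things. In this article, we provide a descriptive analysis of their proposed scheme and prove that Chen et al.'s scheme is vulnerable to known session-specific temporary information attacks and stolen verifier attacks.

Updated: 2024-06-24 17:16:29

Categories: cs.CR

Download: http://arxiv.org/abs/2406.16804v1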

Improved Regret Bounds for Bandits with Expert Advice

In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
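
For context on the problem setting (not on the note's new bounds), the classical algorithm for bandits with expert advice is EXP4. It keeps exponential weights over the $N$ experts, plays from the weight-averaged advice mixed with uniform exploration, and updates each expert with an importance-weighted estimate of its reward. Its known worst-case regret is of order $\sqrt{K T \ln N}$. The sketch below is a standard textbook variant with exploration rate `gamma`; the toy advice and reward arrays are hypothetical.

```python
import numpy as np


def exp4(advice, rewards, gamma, seed=0):
    """EXP4 for bandits with expert advice (a classical baseline).

    advice:  (T, N, K); advice[t, i] is expert i's distribution over K actions.
    rewards: (T, K) in [0, 1]; only the entry of the played action is observed.
    Returns the final log-weights over the N experts.
    """
    T, N, K = advice.shape
    rng = np.random.default_rng(seed)
    log_w = np.zeros(N)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        q = w / w.sum()                                  # distribution over experts
        p = (1 - gamma) * (q @ advice[t]) + gamma / K    # mix in uniform exploration
        a = rng.choice(K, p=p)
        r_hat = np.zeros(K)
        r_hat[a] = rewards[t, a] / p[a]                  # importance-weighted estimate
        log_w += gamma * (advice[t] @ r_hat) / K         # each expert's estimated reward
    return log_w


T, N, K = 2000, 4, 3
advice = np.zeros((T, N, K))
advice[:, 0, 0] = advice[:, 1, 1] = advice[:, 2, 2] = 1.0  # experts 0-2 pick one action
advice[:, 3, :] = 1.0 / K                                  # expert 3 is uniform
rewards = np.zeros((T, K))
rewards[:, 0] = 1.0                                        # only action 0 pays off
print(np.argmax(exp4(advice, rewards, gamma=0.1)))         # 0
```

Working in log-weights avoids overflow over long horizons. The restricted feedback model studied in the note changes what the learner observes, not this basic update structure.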

Updated: 2024-06-24 17:14:31

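Title: Improved Regret Bounds for Bandits with Expert Advice

Abstract: In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ on the worst-case regret, where $K$ is the number of actions, $N>K$ is the number of experts, and $T$ is the time horizon. This matches a previously known upper bound of the same order and improves on the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement over prior results.

Updated: 2024-06-24 17:14:31

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.16802v1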

Generative Fractional Diffusion Models

We introduce the first continuous-time score-based generative model that leverages fractional diffusion processes for its underlying dynamics. Although diffusion models have excelled at capturing data distributions, they still suffer from various limitations such as slow convergence, mode-collapse on imbalanced data, and lack of diversity. These issues are partially linked to the use of light-tailed Brownian motion (BM) with independent increments. In this paper, we replace BM with an approximation of its non-Markovian counterpart, fractional Brownian motion (fBM), characterized by correlated increments and Hurst index $H \in (0,1)$, where $H=1/2$ recovers the classical BM. To ensure tractable inference and learning, we employ a recently popularized Markov approximation of fBM (MA-fBM) and derive its reverse time model, resulting in generative fractional diffusion models (GFDMs). We characterize the forward dynamics using a continuous reparameterization trick and propose an augmented score matching loss to efficiently learn the score-function, which is partly known in closed form, at minimal added cost. The ability to drive our diffusion model via fBM provides flexibility and control. $H \leq 1/2$ enters the regime of rough paths whereas $H>1/2$ regularizes diffusion paths and invokes long-term memory as well as a heavy-tailed behaviour (super-diffusion). The Markov approximation allows added control by varying the number of Markov processes linearly combined to approximate fBM. Our evaluations on real image datasets demonstrate that GFDM achieves greater pixel-wise diversity and enhanced image quality, as indicated by a lower FID, offering a promising alternative to traditional diffusion models.
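
The Markov approximation of fBM mentioned above represents the driving noise as a weighted sum of Ornstein-Uhlenbeck processes that all share one Brownian motion: $Y_t = \sum_k \omega_k Y^k_t$ with $dY^k_t = -\gamma_k Y^k_t\,dt + dW_t$. The sketch below simulates such a sum with an Euler scheme. It is an illustration under stated assumptions: the speeds `gammas` and weights `omegas` are arbitrary values chosen for the demo, whereas GFDM chooses them so that the sum approximates fBM with a given Hurst index $H$, and that fitting step is not reproduced here.

```python
import numpy as np


def ma_fbm_path(gammas, omegas, T=1.0, steps=1000, seed=0):
    """Euler simulation of a Markov-approximated fBM path.

    Y_t = sum_k omega_k * Y^k_t, where dY^k_t = -gamma_k * Y^k_t dt + dW_t and
    all K Ornstein-Uhlenbeck processes share the same Brownian increments dW.
    """
    gammas = np.asarray(gammas, float)
    omegas = np.asarray(omegas, float)
    dt = T / steps
    rng = np.random.default_rng(seed)
    dW = rng.normal(scale=np.sqrt(dt), size=steps)
    Yk = np.zeros_like(gammas)
    path = np.zeros(steps + 1)
    for i in range(steps):
        Yk = Yk - gammas * Yk * dt + dW[i]   # one Euler step for every OU process
        path[i + 1] = omegas @ Yk
    return path, dW


# gamma = 0 and omega = 1 recover plain Brownian motion.
bm, dW = ma_fbm_path([0.0], [1.0])
print(np.allclose(bm[1:], np.cumsum(dW)))  # True

# Hypothetical speeds and weights, not values fitted to any Hurst index.
approx, _ = ma_fbm_path(gammas=[0.1, 1.0, 10.0], omegas=[0.5, 0.3, 0.2])
```

Because each component is Markov, the augmented state $(Y^1_t, \dots, Y^K_t)$ evolves as an ordinary SDE. This is what makes a reverse-time model and score matching tractable, and the number of components $K$ is the extra control knob the abstract mentions.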

Updated: 2024-06-24 17:00:44

Categories: cs.LG,stat.ML,I.2.4; F.4.1; G.3

Download: http://arxiv.org/abs/2310.17638v2

Low-Resource Multi-Granularity Academic Function Recognition Based on Multiple Prompt Knowledge

Fine-tuning pre-trained language models (PLMs), e.g., SciBERT, generally requires large amounts of annotated data to achieve state-of-the-art performance on a range of NLP tasks in the scientific domain. However, obtaining fine-tuning data for scientific NLP tasks remains challenging and expensive. Inspired by recent advances in prompt learning, in this paper we propose Mix Prompt Tuning (MPT), a semi-supervised method that alleviates the dependence on annotated data and improves the performance of multi-granularity academic function recognition tasks with a small number of labeled examples. Specifically, the proposed method provides multi-perspective representations by combining manual prompt templates with automatically learned continuous prompt templates, helping the given academic function recognition task take full advantage of the knowledge in PLMs. Based on these prompt templates and the fine-tuned PLM, a large number of pseudo labels are assigned to the unlabeled examples. Finally, we fine-tune the PLM on the pseudo-labeled training set. We evaluate our method on three academic function recognition tasks of different granularity: citation function, abstract sentence function, and keyword function. The evaluation uses datasets from the computer science and biomedical domains. Extensive experiments demonstrate the effectiveness of our method, with statistically significant improvements over strong baselines. In particular, under low-resource settings it achieves an average increase of 5% in Macro-F1 score compared with fine-tuning, and of 6% compared with other semi-supervised methods. In addition, MPT is a general method that can be easily applied to other low-resource scientific classification tasks.
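
The pseudo-labeling step can be sketched as follows. The example assumes that each prompt template (manual or continuous) yields class probabilities for every unlabeled example. The templates are mixed by simple averaging, and only predictions above a confidence threshold become pseudo labels. The averaging rule, the threshold, and the function name are assumptions for illustration, not the exact MPT procedure.

```python
from typing import List, Sequence, Tuple


def ensemble_pseudo_labels(
    template_probs: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.9,
) -> List[Tuple[int, int]]:
    """Average class probabilities over prompt templates and keep confident predictions.

    template_probs[t][i][c] is the probability of class c for unlabeled example i
    under prompt template t. Returns (example_index, pseudo_label) pairs whose
    averaged confidence is at least `threshold`.
    """
    n_templates = len(template_probs)
    n_examples = len(template_probs[0])
    n_classes = len(template_probs[0][0])
    selected = []
    for i in range(n_examples):
        # Mix the views of all templates by simple averaging (an assumption).
        avg = [
            sum(template_probs[t][i][c] for t in range(n_templates)) / n_templates
            for c in range(n_classes)
        ]
        label = max(range(n_classes), key=lambda c: avg[c])
        if avg[label] >= threshold:
            selected.append((i, label))
    return selected


# Two hypothetical templates scoring two unlabeled examples over two classes.
manual = [[0.95, 0.05], [0.60, 0.40]]
continuous = [[0.90, 0.10], [0.30, 0.70]]
print(ensemble_pseudo_labels([manual, continuous], threshold=0.9))
```

The two templates disagree on the second example, so its averaged confidence stays low and it is excluded from the pseudo-labeled training set.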

Updated: 2024-06-24 17:00:43

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2305.03287v2

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

Existing methods for adapting large language models (LLMs) to new tasks are not suited to multi-task adaptation because they modify all the model weights -- causing destructive interference between tasks. The resulting effects, such as catastrophic forgetting of earlier tasks, make it challenging to obtain good performance on multiple tasks at the same time. To mitigate this, we propose Lottery Ticket Adaptation (LoTA), a sparse adaptation method that identifies and optimizes only a sparse subnetwork of the model. We evaluate LoTA on a wide range of challenging tasks such as instruction following, reasoning, math, and summarization. LoTA obtains better performance than full fine-tuning and low-rank adaptation (LoRA), and maintains good performance even after training on other tasks -- thus, avoiding catastrophic forgetting. By extracting and fine-tuning over \emph{lottery tickets} (or \emph{sparse task vectors}), LoTA also enables model merging over highly dissimilar tasks.
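
The sketch below illustrates the idea of identifying a sparse subnetwork and then optimizing only that subnetwork. It assumes the mask keeps the largest-magnitude entries of a task vector, obtained by briefly fine-tuning ("calibrating") the model. Training then restarts from the base weights and updates only the masked entries. The magnitude criterion, the calibration step, and the function names are assumptions for illustration.

```python
from typing import List


def lottery_mask(base: List[float], calibrated: List[float], sparsity: float) -> List[bool]:
    """Keep the (1 - sparsity) fraction of weights with the largest task-vector magnitude."""
    task_vector = [abs(c - b) for b, c in zip(base, calibrated)]
    k = max(1, round(len(task_vector) * (1 - sparsity)))
    top = sorted(range(len(task_vector)), key=lambda i: task_vector[i], reverse=True)[:k]
    keep = set(top)
    return [i in keep for i in range(len(task_vector))]


def masked_sgd_step(weights: List[float], grads: List[float], mask: List[bool], lr: float) -> List[float]:
    """One SGD step that only updates weights inside the sparse mask; all others stay frozen."""
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]


base = [0.0, 0.0, 0.0, 0.0]
calibrated = [0.1, -0.5, 0.05, 0.3]  # weights after a short, hypothetical calibration run
mask = lottery_mask(base, calibrated, sparsity=0.5)
print(mask)
print(masked_sgd_step(base, [1.0, 1.0, 1.0, 1.0], mask, lr=0.1))
```

Because untouched weights keep their base values, sparse task vectors from different tasks overlap less, which is the intuition behind reduced interference and easier merging.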

Updated: 2024-06-24 16:58:23

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.16797v1

Adam-mini: Use Fewer Learning Rates To Gain More

We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with a 45% to 50% smaller memory footprint. Adam-mini reduces memory by cutting down the number of learning rates in Adam. Instead of assigning an individual learning rate to each parameter using $1/\sqrt{v}$, Adam-mini uses the average of $v$ within a pre-defined parameter block as the learning rate for that block. This design is inspired by two empirical findings. First, the Hessian of Transformers exhibits a near-block-diagonal structure with dense sub-blocks of different sizes. Second, for each of these dense sub-blocks, there exists a single high-quality learning rate that can outperform Adam, provided that sufficient resources are available to search for it. Adam-mini provides one cost-effective way to find these good learning rates and manages to cut down $\geq 90\%$ of $v$ in Adam. Empirically, we verify that Adam-mini performs on par with or better than AdamW for pre-training, supervised fine-tuning, and RLHF. The tested language models range from 125M to 7B parameters. The reduced memory footprint of Adam-mini also alleviates communication overheads among GPUs and CPUs, thereby increasing throughput. For instance, Adam-mini achieves 49.6% higher throughput than AdamW when pre-training Llama2-7B on 2x A800-80GB GPUs, which saves 33% of the wall-clock time for pre-training.
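
The update below is a minimal sketch of the block-wise rule described above, written on flat Python lists. Each block keeps a single second-moment scalar: an exponential moving average of the mean squared gradient in that block. The first moment stays per-parameter as in Adam. Weight decay is omitted, and the block partition, hyperparameters, and function name are assumptions for illustration.

```python
import math
from typing import List


def adam_mini_step(
    params: List[float],
    grads: List[float],
    blocks: List[List[int]],
    m: List[float],
    v: List[float],
    step: int,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> List[float]:
    """One Adam-style update in which each parameter block shares one second-moment scalar.

    m holds one first-moment entry per parameter (as in Adam); v holds one entry per
    block, which is where the memory saving comes from. m and v are updated in place.
    """
    new_params = list(params)
    for b, idx in enumerate(blocks):
        # Block-wise second moment: EMA of the mean squared gradient over the block.
        mean_sq = sum(grads[i] ** 2 for i in idx) / len(idx)
        v[b] = beta2 * v[b] + (1 - beta2) * mean_sq
        v_hat = v[b] / (1 - beta2 ** step)
        for i in idx:
            m[i] = beta1 * m[i] + (1 - beta1) * grads[i]
            m_hat = m[i] / (1 - beta1 ** step)
            new_params[i] = params[i] - lr * m_hat / (math.sqrt(v_hat) + eps)
    return new_params


params = [1.0, 2.0, 3.0]
grads = [0.1, 0.1, 0.2]
blocks = [[0, 1], [2]]  # hypothetical partition, e.g. two different weight matrices
m, v = [0.0, 0.0, 0.0], [0.0, 0.0]
updated = adam_mini_step(params, grads, blocks, m, v, step=1)
print(updated)
```

With three parameters and two blocks, `v` needs two entries instead of three. For real models with millions of parameters per block, this is the source of the large reduction in optimizer state.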

Updated: 2024-06-24 16:56:41

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16793v1

Deep Learning and Chaos: A combined Approach To Image Encryption and Decryption

In this paper, we introduce a novel image encryption and decryption algorithm. It combines hyperchaotic signals from a novel 3D hyperchaotic map and a 2D memristor map with a Convolutional Neural Network (CNN), and is evaluated with key sensitivity analysis, to achieve robust security and high efficiency. Encryption begins by scrambling grayscale images with complex sequences generated by the 3D hyperchaotic map, which disrupt the pixel values. A CNN then learns the intricate patterns and adds a further security layer, reinforcing the robustness of this initial encryption. Key sensitivity analysis, i.e., measuring the average sensitivity of the algorithm to the key elements, demonstrates the robustness of the encryption. Even slight variations in the key alter the decryption procedure, so unauthorized decryption attempts fail to reconstruct the image. Statistical analyses, including entropy, correlation, and histogram analyses, together with other security analyses such as anomaly detection, all confirm the high security and effectiveness of the proposed encryption method. The algorithm is also tested under various noise conditions to assess its robustness to Gaussian noise. Differential-analysis metrics, namely NPCR (Number of Pixels Change Rate) and UACI (Unified Average Changing Intensity), are used to measure the strength of the encryption. Empirical validation on several test images shows that the proposed encryption technique is practically applicable and robust to noise. Simulation results and comparative analyses show that our encryption scheme offers excellent visual security, decryption quality, and computational efficiency, making it suitable for secure image transmission and storage in big data applications.
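
The two differential-analysis metrics named above have standard definitions, sketched below for 8-bit grayscale images. NPCR is the percentage of pixel positions that differ between two cipher images. UACI is the mean absolute intensity difference normalized by 255. In practice the two cipher images come from plaintexts that differ in a single pixel, and commonly cited ideal values for 8-bit images are roughly 99.61% (NPCR) and 33.46% (UACI). The tiny images below are hypothetical inputs chosen only to exercise the formulas.

```python
from typing import List, Tuple


def npcr_uaci(c1: List[List[int]], c2: List[List[int]], max_value: int = 255) -> Tuple[float, float]:
    """NPCR and UACI (in percent) between two equally sized grayscale cipher images."""
    height, width = len(c1), len(c1[0])
    total = height * width
    # NPCR: share of pixel positions whose values differ.
    changed = sum(1 for i in range(height) for j in range(width) if c1[i][j] != c2[i][j])
    # UACI: mean absolute intensity difference, normalized by the largest pixel value.
    intensity = sum(abs(c1[i][j] - c2[i][j]) for i in range(height) for j in range(width))
    npcr = 100.0 * changed / total
    uaci = 100.0 * intensity / (max_value * total)
    return npcr, uaci


cipher_a = [[0, 255], [10, 20]]
cipher_b = [[0, 0], [10, 21]]
print(npcr_uaci(cipher_a, cipher_b))
```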

Updated: 2024-06-24 16:56:22

Categories: cs.CR,nlin.CD

Download: http://arxiv.org/abs/2406.16792v1

Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments

In this white paper, I present my community effort to automatically co-design cheaper, faster and more energy-efficient software and hardware for AI, ML and other popular workloads. The effort relies on the Collective Mind framework (CM), virtualized MLOps, MLPerf benchmarks and reproducible optimization tournaments. I developed CM to modularize, automate and virtualize the tedious process of building, running, profiling and optimizing complex applications across rapidly evolving open-source and proprietary AI/ML models, datasets, software and hardware. I achieved this with portable, reusable and technology-agnostic automation recipes (ResearchOps) for MLOps and DevOps (CM4MLOps). These recipes were identified in close collaboration with academia and industry while reproducing more than 150 research papers and organizing the first mass-scale community benchmarking of ML and AI systems using CM and MLPerf. I donated CM and CM4MLOps to MLCommons to help academia and industry learn together how to build and run AI and other emerging workloads in the most efficient and cost-effective way. The goal is a common, technology-agnostic automation, virtualization and reproducibility framework that unifies knowledge exchange, protects everyone's intellectual property, enables portable skills, and accelerates the transfer of state-of-the-art research to production. My long-term vision is to make AI accessible to everyone by turning it into a commodity. It would be produced automatically from the most suitable open-source and proprietary components from different vendors, based on user demand, requirements and constraints such as cost, latency, throughput, accuracy, energy, size and other important characteristics.

Updated: 2024-06-24 16:55:03

Categories: cs.LG,cs.ET,cs.PF

Download: http://arxiv.org/abs/2406.16791v1

The Progression of Transformers from Language to Vision to MOT: A Literature Review on Multi-Object Tracking with Transformers

The transformer neural network architecture enables autoregressive sequence-to-sequence modeling through attention layers. It was originally developed for machine translation and has since revolutionized natural language processing. Recently, transformers have also been applied to a wide variety of pattern recognition tasks, particularly in computer vision. In this literature review, we describe major advances in computer vision that use transformers. We then focus specifically on Multi-Object Tracking (MOT) and discuss how transformers are becoming increasingly competitive in state-of-the-art MOT works, yet still lag behind traditional deep learning methods.
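
The attention layers mentioned above compute $\mathrm{softmax}(QK^\top/\sqrt{d})\,V$. The sketch below is a minimal single-head version without masking or learned projections, written on plain lists. It shows the standard formulation, not any particular MOT model, and the helper names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(k[0])
    out = []
    for query in q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in k]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * value[j] for w, value in zip(weights, v)) for j in range(len(v[0]))])
    return out


print(attention([[5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 4.0]]))
```

With two identical keys, both values receive equal weight, so the output is their average.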

Updated: 2024-06-24 16:45:28

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.16784v1

M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models

Instruction finetuning (IFT) is critical for aligning Large Language Models (LLMs) to follow instructions. Numerous effective IFT datasets have been proposed recently, but most focus on high-resource languages such as English. In this work, we propose M2Lingual, a fully synthetic multilingual, multi-turn instruction finetuning dataset guided by a novel taxonomy (Evol), to better align LLMs on a diverse set of languages and tasks. M2Lingual contains a total of 182K IFT pairs built upon diverse seeds, covering 70 languages, 17 NLP tasks, and general instruction-response pairs. LLMs finetuned with M2Lingual substantially outperform LLMs finetuned on the majority of existing multilingual IFT datasets. Importantly, LLMs trained with M2Lingual consistently achieve competitive results across a wide variety of evaluation benchmarks compared to existing multilingual IFT datasets. Specifically, LLMs finetuned with M2Lingual achieve strong performance on our translated multilingual, multi-turn evaluation benchmark as well as on a wide variety of multilingual tasks. Thus, we contribute M2Lingual and the two-step Evol taxonomy used for its creation. M2Lingual repository: https://huggingface.co/datasets/ServiceNow-AI/M2Lingual

Updated: 2024-06-24 16:45:13

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.16783v1

Confidence Aware Inverse Constrained Reinforcement Learning

When devising solutions to real-world problems, humans implicitly adhere to constraints that are too numerous and complex to be specified completely. However, reinforcement learning (RL) agents need these constraints to learn the correct optimal policy in such settings. The field of Inverse Constrained Reinforcement Learning (ICRL) addresses this problem with algorithms that estimate the constraints from expert demonstrations collected offline. Before deciding to use the estimated constraints, practitioners prefer to know a measure of confidence in them, so that they can use only constraints that meet a desired confidence level. However, prior works do not allow users to specify a desired confidence level for the inferred constraints. This work provides a principled ICRL method that takes a confidence level together with a set of expert demonstrations. It outputs a constraint that, with the desired level of confidence, is at least as constraining as the true underlying constraint. Further, unlike previous methods, this method lets a user know whether the number of expert trajectories is insufficient to learn a constraint with the desired level of confidence. The user can then collect more expert trajectories as required to learn both constraints with the desired level of confidence and a policy that achieves the desired level of performance.
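
The abstract does not describe how the method decides that the number of expert trajectories is insufficient. As a generic stand-in only, Hoeffding's inequality shows how a required sample size grows with the requested confidence. For i.i.d. samples bounded in $[0,1]$, the empirical mean is within $\epsilon$ of its expectation with probability at least $1-\delta$ once $n \ge \ln(2/\delta)/(2\epsilon^2)$. The function names and the idea of applying this bound to trajectories are hypothetical.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta (i.i.d. samples bounded in [0, 1])."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


def enough_trajectories(n_trajectories: int, epsilon: float, delta: float) -> bool:
    """Hypothetical check: is the demonstration set large enough for confidence 1 - delta?"""
    return n_trajectories >= hoeffding_sample_size(epsilon, delta)


print(hoeffding_sample_size(0.1, 0.05))
print(enough_trajectories(100, 0.1, 0.05))
```

Tightening the confidence (smaller `delta`) or the accuracy (smaller `epsilon`) raises the required count. This mirrors the trade-off the abstract describes between confidence level and the number of demonstrations to collect.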

Updated: 2024-06-24 16:44:45

Categories: cs.LG

Download: http://arxiv.org/abs/2406.16782v1

Why Transformers Need Adam: A Hessian Perspective

SGD performs significantly worse than Adam on Transformers, but the reason remains unclear. In this work, we provide an explanation through the lens of the Hessian. (i) Transformers are "heterogeneous": the Hessian spectrum varies dramatically across parameter blocks, a phenomenon we call "block heterogeneity". (ii) Heterogeneity hampers SGD: SGD performs worse than Adam on problems with block heterogeneity. To validate (i) and (ii), we examine various Transformers, CNNs, MLPs, and quadratic problems. We find that SGD can perform on par with Adam on problems without block heterogeneity but performs worse than Adam when the heterogeneity exists. Our initial theoretical analysis indicates that SGD performs worse because it applies a single learning rate to all blocks, which cannot handle the heterogeneity among blocks. This limitation could be ameliorated by using coordinate-wise learning rates, as designed in Adam.
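
A toy diagonal quadratic shows why one learning rate struggles under block heterogeneity. The problem is $f(x)=\tfrac12\sum_i h_i x_i^2$ with curvatures 1 and 100. A shared learning rate must satisfy $\eta < 2/100$ to stay stable, which makes progress on the low-curvature coordinate slow. Per-coordinate learning rates matched to each curvature converge immediately. The per-coordinate rates here are an idealized stand-in for Adam's coordinate-wise adaptation, not Adam itself.

```python
from typing import List


def gd_on_quadratic(h: List[float], x0: List[float], lrs: List[float], steps: int) -> List[float]:
    """Gradient descent on f(x) = 0.5 * sum(h_i * x_i**2) with one learning rate per coordinate."""
    x = list(x0)
    for _ in range(steps):
        # Gradient of coordinate i is h_i * x_i.
        x = [xi - lr * hi * xi for xi, hi, lr in zip(x, h, lrs)]
    return x


h = [1.0, 100.0]  # two "blocks" with very different curvature
x0 = [1.0, 1.0]
# A single learning rate must respect the largest curvature (lr < 2/100) to stay stable...
shared = gd_on_quadratic(h, x0, [0.01, 0.01], steps=10)
# ...whereas per-block learning rates can be matched to each block's curvature.
per_block = gd_on_quadratic(h, x0, [1.0, 0.01], steps=1)
print(shared, per_block)
```

After ten steps with the shared rate, the low-curvature coordinate has only shrunk by a factor of $0.99^{10}$. Per-block rates solve both coordinates in a single step.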

Updated: 2024-06-24 16:41:30

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.16788v3

Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024

Large Language Models (LLMs) are currently being explored for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST). In this paper, we present KIT's offline submission to the constrained + LLM track, which incorporates recently proposed techniques that can be added to any cascaded speech translation system. Specifically, we integrate Mistral-7B (mistralai/Mistral-7B-Instruct-v0.1) into our system to enhance it in two ways. First, we refine the ASR outputs by using the N-best lists generated by our system and fine-tuning the LLM to predict the transcript accurately. Second, we refine the MT outputs at the document level by fine-tuning the LLM, leveraging both ASR and MT predictions to improve translation quality. We find that integrating the LLM into the ASR and MT systems yields an absolute improvement of $0.3\%$ in Word Error Rate and $0.65\%$ in COMET on the tst2019 test set. On challenging test sets with overlapping speakers and background noise, integrating the LLM is not beneficial because ASR performance is poor. For these sets, we use ASR with chunked long-form decoding to make use of context that may be unavailable when transcribing with Voice Activity Detection segmentation alone.
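
The Word Error Rate quoted above is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The $0.3\%$ figure is an absolute difference in this quantity. A minimal dynamic-programming sketch, with whitespace tokenization and no text normalization (both simplifying assumptions), follows.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Here one substitution ("sat" to "sit") and one deletion ("the") over six reference words give a WER of 2/6.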

Updated: 2024-06-24 16:38:17

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.16777v1

OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

In this report, we pose the following question: Who is the most intelligent AI model to date, as measured by OlympicArena (an Olympic-level, multi-discipline, multi-modal benchmark for superintelligent AI)? We focus specifically on the most recently released models: Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o. For the first time, we propose using an Olympic medal table approach to rank AI models based on their comprehensive performance across various disciplines. Empirical results reveal the following. (1) Claude-3.5-Sonnet is highly competitive with GPT-4o overall, even surpassing GPT-4o on a few subjects (namely Physics, Chemistry, and Biology). (2) Gemini-1.5-Pro and GPT-4V rank consecutively just behind GPT-4o and Claude-3.5-Sonnet, but with a clear performance gap between them. (3) The performance of AI models from the open-source community lags significantly behind these proprietary models. (4) The performance of these models on this benchmark remains less than satisfactory, indicating that we still have a long way to go before achieving superintelligence. We remain committed to continuously tracking and evaluating the performance of the latest powerful models on this benchmark (available at https://github.com/GAIR-NLP/OlympicArena).
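
A medal-table ranking can be sketched as follows. In each discipline the top three models receive gold, silver, and bronze, and models are then ordered by gold count, then silver, then bronze. The tie-breaking by name, the function name, and the scores below are hypothetical placeholders, not results from the report.

```python
from typing import Dict, List


def medal_ranking(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank models by gold, then silver, then bronze counts across disciplines."""
    medals = {model: [0, 0, 0] for per_discipline in scores.values() for model in per_discipline}
    for per_discipline in scores.values():
        # The top three scores in each discipline win gold, silver and bronze.
        podium = sorted(per_discipline, key=per_discipline.get, reverse=True)[:3]
        for place, model in enumerate(podium):
            medals[model][place] += 1
    # Lexicographic order on (gold, silver, bronze); ties broken by name for determinism.
    return sorted(medals, key=lambda m: (-medals[m][0], -medals[m][1], -medals[m][2], m))


hypothetical_scores = {
    "Math": {"A": 0.50, "B": 0.40, "C": 0.30, "D": 0.10},
    "Physics": {"A": 0.20, "B": 0.60, "C": 0.50, "D": 0.40},
    "Biology": {"A": 0.70, "B": 0.65, "C": 0.10, "D": 0.66},
}
print(medal_ranking(hypothetical_scores))
```

Unlike averaging raw scores, this ordering rewards winning disciplines outright. Model "A" ranks first with two golds even though "B" medals in every discipline.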

Updated: 2024-06-24 16:31:12

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.16772v1

WARP: On the Benefits of Weight Averaged Rewarded Policies

Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs) by encouraging their generations to have high rewards, using a reward model trained on human preferences. To prevent the forgetting of pre-trained knowledge, RLHF usually incorporates a KL regularization; this forces the policy to remain close to its supervised fine-tuned initialization, though it hinders the reward optimization. To tackle the trade-off between KL and reward, in this paper we introduce a novel alignment strategy named Weight Averaged Rewarded Policies (WARP). WARP merges policies in the weight space at three distinct stages. First, it uses the exponential moving average of the policy as a dynamic anchor in the KL regularization. Second, it applies spherical interpolation to merge independently fine-tuned policies into a new enhanced one. Third, it linearly interpolates between this merged model and the initialization, to recover features from pre-training. This procedure is then applied iteratively, with each iteration's final model used as an advanced initialization for the next, progressively refining the KL-reward Pareto front, achieving superior rewards at fixed KL. Experiments with GEMMA policies validate that WARP improves their quality and alignment, outperforming other open-source LLMs.
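
The three weight-space operations named above are sketched below on flat parameter lists:

- an exponential moving average (EMA) used as the KL anchor;
- spherical interpolation (SLERP) of task vectors measured relative to the initialization;
- linear interpolation back towards the initialization.

Treating the SLERP merge as acting on task vectors relative to the shared initialization is our reading of the abstract. The coefficients `mu`, `lam`, and `eta` and all function names are hypothetical, and real policies would be large tensors rather than Python lists.

```python
import math
from typing import List

Vec = List[float]


def ema_anchor(anchor: Vec, policy: Vec, mu: float) -> Vec:
    """Exponential moving average of the policy, used as the KL anchor."""
    return [(1 - mu) * a + mu * p for a, p in zip(anchor, policy)]


def slerp_merge(init: Vec, theta1: Vec, theta2: Vec, lam: float) -> Vec:
    """Spherical interpolation of two non-zero task vectors (theta - init), added back onto init."""
    d1 = [a - i for a, i in zip(theta1, init)]
    d2 = [b - i for b, i in zip(theta2, init)]
    n1 = math.sqrt(sum(x * x for x in d1))
    n2 = math.sqrt(sum(x * x for x in d2))
    cos_omega = max(-1.0, min(1.0, sum(x * y for x, y in zip(d1, d2)) / (n1 * n2)))
    omega = math.acos(cos_omega)
    if omega < 1e-8:  # nearly parallel task vectors: fall back to linear interpolation
        c1, c2 = 1 - lam, lam
    else:
        c1 = math.sin((1 - lam) * omega) / math.sin(omega)
        c2 = math.sin(lam * omega) / math.sin(omega)
    return [i + c1 * x + c2 * y for i, x, y in zip(init, d1, d2)]


def interpolate_towards_init(init: Vec, merged: Vec, eta: float) -> Vec:
    """Linear interpolation between the initialization and the merged model."""
    return [(1 - eta) * i + eta * m for i, m in zip(init, merged)]


init = [0.0, 0.0]
merged = slerp_merge(init, [1.0, 0.0], [0.0, 1.0], lam=0.5)
print(merged, interpolate_towards_init(init, merged, eta=0.5))
```

For two orthogonal unit task vectors, SLERP keeps the merged task vector at unit norm, whereas plain averaging would shrink it to norm $\sqrt{0.5}$. The final interpolation then trades some of that reward-driven change for closeness to the initialization.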

Updated: 2024-06-24 16:24:34

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16768v1

Deep Reinforcement Learning: A Convex Optimization Approach

In this paper, we consider reinforcement learning of nonlinear systems with continuous state and action spaces. We present an episodic learning algorithm in which, for each episode, convex optimization is used to find a two-layer neural network approximation of the optimal $Q$-function. The convex optimization approach guarantees that the weights calculated at each episode are optimal with respect to the sampled states and actions of the current episode. For stable nonlinear systems, we show that the algorithm converges and that the limiting parameters of the trained neural network can be made arbitrarily close to the optimal neural network parameters. In particular, if the regularization parameter in the training phase is $\rho$, then the parameters of the trained neural network converge to $w$, where the distance between $w$ and the optimal parameters $w^\star$ is bounded by $\mathcal{O}(\rho)$. That is, as the number of episodes goes to infinity, there exists a constant $C$ such that $\|w-w^\star\| \le C\rho$. Hence, our algorithm converges arbitrarily close to the optimal neural network parameters as the regularization parameter goes to zero. Moreover, the algorithm converges fast owing to the polynomial-time convergence of convex optimization algorithms.
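
The bound $\|w-w^\star\| \le C\rho$ has a simple analogue in regularized least squares. The sketch below uses one-dimensional ridge regression as a stand-in, not the paper's two-layer network formulation. The regularized minimizer drifts from the unregularized optimum by an amount proportional to $\rho$. The data points and function name are illustrative assumptions.

```python
from typing import List


def ridge_1d(xs: List[float], ys: List[float], rho: float) -> float:
    """Minimizer of sum((y - w*x)**2) + rho * w**2 for a scalar weight w (a convex problem)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + rho)


xs, ys = [1.0, 2.0], [2.0, 4.0]
w_star = ridge_1d(xs, ys, 0.0)  # unregularized optimum, here exactly 2.0
for rho in (1.0, 0.1, 0.01):
    w = ridge_1d(xs, ys, rho)
    # Closed form: |w - w_star| = 2*rho / (5 + rho) <= 0.4 * rho, i.e. O(rho).
    print(rho, abs(w - w_star))
```

Here $C = 0.4$ works for every $\rho \ge 0$, and the gap vanishes as $\rho \to 0$. This is the same qualitative behaviour the abstract states for the trained network parameters.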

Updated: 2024-06-24 16:23:42

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2402.19212v6

Conformal time series decomposition with component-wise exchangeability

Conformal prediction offers a practical framework for distribution-free uncertainty quantification, providing finite-sample coverage guarantees under relatively mild assumptions on data exchangeability. However, these assumptions cease to hold for time series due to their temporally correlated nature. In this work, we present a novel use of conformal prediction for time series forecasting that incorporates time series decomposition. This approach allows us to model different temporal components individually. By applying specific conformal algorithms to each component and then merging the obtained prediction intervals, we customize our methods to account for the different exchangeability regimes underlying each component. Our decomposition-based approach is thoroughly discussed and empirically evaluated on synthetic and real-world data. We find that the method provides promising results on well-structured time series, but can be limited by factors such as the decomposition step for more complex data.
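
The sketch below shows the two steps described above under simple assumptions. First, a split-conformal half-width is computed for one component from absolute calibration residuals, using the usual finite-sample quantile index $\lceil (n+1)(1-\alpha) \rceil$. Second, per-component intervals are merged by adding their endpoints, assuming the forecast is the sum of its components. If each of $K$ component intervals covers with probability at least $1-\alpha$, a union bound gives the summed interval coverage of at least $1-K\alpha$. The paper's specific per-component algorithms and merging rule may differ, and the residuals below are hypothetical.

```python
import math
from typing import List, Tuple


def conformal_halfwidth(residuals: List[float], alpha: float) -> float:
    """Split-conformal half-width: the ceil((n+1)(1-alpha))-th smallest absolute residual."""
    n = len(residuals)
    scores = sorted(abs(r) for r in residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this alpha
        return math.inf
    return scores[k - 1]


def merge_intervals(intervals: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine per-component intervals by adding endpoints (forecast = sum of components)."""
    return sum(lo for lo, _ in intervals), sum(hi for _, hi in intervals)


trend_residuals = [float(r) for r in range(1, 20)]  # hypothetical calibration residuals 1..19
print(conformal_halfwidth(trend_residuals, alpha=0.1))
print(merge_intervals([(1.0, 3.0), (0.5, 1.5)]))
```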

Updated: 2024-06-24 16:23:30

Categories: stat.ML,cs.LG,stat.AP

Download: http://arxiv.org/abs/2406.16766v1

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Recent years have witnessed great advances in video generation. However, the development of automatic video metrics lags significantly behind. None of the existing metrics can provide reliable scores for generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores for 37.6K synthesized videos from 11 existing video generative models. We train VideoScore (initialized from Mantis) on VideoFeedback to enable automatic video quality assessment. Experiments show that the Spearman correlation between VideoScore and humans reaches 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points. Further results on the held-out EvalCrafter, GenAI-Bench, and VBench benchmarks show that VideoScore consistently correlates much more strongly with human judges than other metrics. Given these results, we believe VideoScore can serve as a good proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning with Human Feedback (RLHF) to improve current video generation models.
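
The agreement numbers above are Spearman rank correlations; the reported 77.1 presumably means $\rho \times 100$. Spearman correlation is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. A minimal sketch with hypothetical metric and human scores follows.

```python
import math
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


metric_scores = [3.1, 2.0, 4.5, 1.2]  # hypothetical automatic scores for four videos
human_scores = [3.0, 2.5, 4.0, 1.0]   # hypothetical human ratings for the same videos
print(spearman(metric_scores, human_scores))
```

Only the ordering matters: the metric and the humans rank the four videos identically, so the correlation is 1 even though the raw scores differ.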

Updated: 2024-06-24 16:22:55

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.15252v2

Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models

In this paper we develop state-of-the-art privacy attacks against Large Language Models (LLMs), where an adversary with some access to the model tries to learn something about the underlying training data. Our headline results are new membership inference attacks (MIAs) against pretrained LLMs that perform hundreds of times better than baseline attacks, and a pipeline showing that over 50% (!) of the fine-tuning dataset can be extracted from a fine-tuned LLM in natural settings. We consider varying degrees of access to the underlying model, pretraining and fine-tuning data, and both MIAs and training data extraction. For pretraining data, we propose two new MIAs: a supervised neural network classifier that predicts training data membership on the basis of (dimensionality-reduced) model gradients, as well as a variant of this attack that only requires logit access to the model by leveraging recent model-stealing work on LLMs. To our knowledge this is the first MIA that explicitly incorporates model-stealing information. Both attacks outperform existing black-box baselines, and our supervised attack closes the gap between MIA attack success against LLMs and the strongest known attacks for other machine learning models. In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models is able to achieve near-perfect MIA performance; we then leverage our MIA to extract a large fraction of the fine-tuning dataset from fine-tuned Pythia and Llama models. Our code is available at github.com/safr-ai-lab/pandora-llm.
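
The fine-tuning attack described above scores each candidate example by the ratio of its loss under the fine-tuned model to its loss under the base model. The sketch assumes that examples whose loss dropped sharply after fine-tuning, i.e. ratios below a chosen threshold, are flagged as likely members. The per-example losses, the threshold, and the function names are hypothetical; in practice the losses would come from evaluating both models on each candidate text.

```python
from typing import List


def loss_ratio_scores(finetuned_losses: List[float], base_losses: List[float]) -> List[float]:
    """Per-example ratio of fine-tuned-model loss to base-model loss."""
    return [ft / base for ft, base in zip(finetuned_losses, base_losses)]


def predict_membership(scores: List[float], threshold: float) -> List[bool]:
    """Flag examples whose loss dropped sharply after fine-tuning as likely training members."""
    return [s < threshold for s in scores]


scores = loss_ratio_scores([0.2, 1.0], [2.0, 1.1])
print(scores, predict_membership(scores, threshold=0.5))
```

Normalizing by the base model's loss controls for examples that are simply easy for any language model, which is why a ratio can separate members far better than the fine-tuned loss alone.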

Updated: 2024-06-24 16:18:45

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.17012v3

An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification

Predictive models may generate biased predictions when classifying imbalanced datasets. This happens when the model favors the majority class, leading to poor performance in predicting the minority class. To address this issue, balancing or resampling methods are critical pre-processing steps in the modeling process. However, the usefulness of these methods has been debated and questioned in recent years. In particular, in model selection many candidate models may exhibit very similar predictive performance, a phenomenon called the Rashomon effect. Predictive multiplicity is the case in which such models yield conflicting predictions for the same sample. Selecting one of them without considering predictive multiplicity may forgo what another model would have offered. In this study, in addition to the existing debates, we examine the impact of balancing methods on predictive multiplicity through the Rashomon effect. This matters because blindly selecting a model from a set of approximately equally accurate models is risky and may lead to serious problems in model selection, validation, and explanation. To tackle this, we conducted experiments on real datasets to observe the impact of balancing methods on predictive multiplicity through the Rashomon effect. Our findings show that balancing methods inflate predictive multiplicity and yield varying results. To conduct the modeling process responsibly, we propose using the extended performance-gain plot for the Rashomon effect to monitor the trade-off between performance and predictive multiplicity.
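
One common way to quantify predictive multiplicity is ambiguity, sketched below. The Rashomon set contains the models whose accuracy is within $\epsilon$ of the best model. Ambiguity is the fraction of samples on which at least two of those models disagree. The paper's own multiplicity measures and its extended performance-gain plot may differ. The accuracies, predictions, and function names below are hypothetical.

```python
from typing import List


def rashomon_set(accuracies: List[float], epsilon: float) -> List[int]:
    """Indices of models whose accuracy is within epsilon of the best model."""
    best = max(accuracies)
    return [i for i, acc in enumerate(accuracies) if acc >= best - epsilon]


def ambiguity(predictions: List[List[int]], members: List[int]) -> float:
    """Fraction of samples on which models in the Rashomon set disagree."""
    n_samples = len(predictions[members[0]])
    conflicts = sum(1 for j in range(n_samples) if len({predictions[m][j] for m in members}) > 1)
    return conflicts / n_samples


accuracies = [0.90, 0.89, 0.80]
predictions = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 1]]
members = rashomon_set(accuracies, epsilon=0.02)
print(members, ambiguity(predictions, members))
```

The first two models are nearly equally accurate yet disagree on one sample in four. Picking either one blindly hides that disagreement, which is the risk described above.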

Updated: 2024-06-24 16:08:51

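Title: An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification

Abstract: Predictive models may produce biased predictions when classifying imbalanced datasets. When a model favors the majority class, its performance in accurately predicting the minority class is low. To address this problem, balancing or resampling methods are key pre-processing steps in the modeling process. However, the usefulness of these methods has been debated and questioned in recent years. In particular, in model selection many candidate models may show very similar predictive performance, which is called the Rashomon effect. Choosing one of them without considering predictive multiplicity may mean missing the opportunity to use another model. Beyond the existing debates, this study examines the impact of balancing methods on predictive multiplicity through the Rashomon effect. This is important because blindly choosing a model from a set of almost equally accurate models is risky. It may lead to serious problems in model selection, validation, and explanation. To address this problem, we ran experiments on real datasets and observed the impact of balancing methods on predictive multiplicity. Our results show that balancing methods increase predictive multiplicity and produce varying results. To carry out the modeling process responsibly, we propose using the extended performance-gain plot to monitor the trade-off between performance and predictive multiplicity under the Rashomon effect.

Updated: 2024-06-24 16:08:51

Fields: cs.LG

Download: http://arxiv.org/abs/2405.01557v2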
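The abstract does not say how predictive multiplicity is measured. As an assumed illustration, the sketch below computes two measures commonly used in the predictive-multiplicity literature, ambiguity and discrepancy. It works from the predictions of a reference model and of competing models that the caller has already judged to be within some accuracy tolerance of it. All names and data are hypothetical.

```python
def predictive_multiplicity(reference, rashomon_set):
    """Ambiguity and discrepancy of a set of near-equally accurate models.

    reference:     predicted labels of the selected (reference) model
    rashomon_set:  list of label lists from competing models, assumed to be
                   within an accuracy tolerance of the reference model
    """
    n = len(reference)
    # Ambiguity: fraction of samples on which at least one competing model
    # disagrees with the reference model.
    ambiguous = sum(
        any(model[i] != reference[i] for model in rashomon_set) for i in range(n)
    )
    # Discrepancy: the largest fraction of predictions that any single
    # competing model flips relative to the reference model.
    discrepancy = max(
        (sum(model[i] != reference[i] for i in range(n)) / n for model in rashomon_set),
        default=0.0,
    )
    return ambiguous / n, discrepancy


reference = [0, 1, 1, 0, 1, 0]
rashomon_set = [
    [0, 1, 1, 0, 1, 1],  # flips sample 5
    [1, 1, 1, 0, 1, 0],  # flips sample 0
    [0, 1, 0, 0, 1, 0],  # flips sample 2
]
ambiguity, discrepancy = predictive_multiplicity(reference, rashomon_set)
```

Each competing model changes only one prediction, so discrepancy is 1/6. Together they disagree with the reference on half of the samples, so ambiguity is 0.5. Both numbers would be tracked alongside accuracy when comparing balancing methods.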

Positive concave deep equilibrium models

Deep equilibrium (DEQ) models are widely recognized as a memory-efficient alternative to standard neural networks, achieving state-of-the-art performance in language modeling and computer vision tasks. These models solve a fixed-point equation instead of explicitly computing the output, which sets them apart from standard neural networks. However, existing DEQ models often lack formal guarantees of the existence and uniqueness of the fixed point, and the convergence of the numerical scheme used for computing the fixed point is not formally established. As a result, DEQ models are potentially unstable in practice. To address these drawbacks, we introduce a novel class of DEQ models called positive concave deep equilibrium (pcDEQ) models. Our approach, which is based on nonlinear Perron-Frobenius theory, enforces nonnegative weights and activation functions that are concave on the positive orthant. By imposing these constraints, we can easily ensure the existence and uniqueness of the fixed point without relying on additional complex assumptions commonly found in the DEQ literature, such as those based on monotone operator theory in convex analysis. Furthermore, the fixed point can be computed with the standard fixed-point algorithm, and we provide theoretical guarantees of its geometric convergence, which, in particular, simplifies the training process. Experiments demonstrate the competitiveness of our pcDEQ models against other implicit models.

Updated: 2024-06-24 16:08:46

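Title: Positive concave deep equilibrium models

Abstract: Deep equilibrium (DEQ) models are widely recognized as a memory-efficient alternative to standard neural networks, achieving state-of-the-art performance in language modeling and computer vision tasks. These models solve a fixed-point equation instead of explicitly computing the output, which distinguishes them from standard neural networks. However, existing DEQ models often lack formal guarantees of the existence and uniqueness of the fixed point, and the convergence of the numerical scheme used to compute the fixed point has not been formally established. As a result, DEQ models may be unstable in practice. To address these shortcomings, we introduce a new class of DEQ models called positive concave deep equilibrium (pcDEQ) models. Our approach is based on nonlinear Perron-Frobenius theory and enforces nonnegative weights and activation functions that are concave on the positive orthant. By imposing these constraints, we can easily ensure the existence and uniqueness of the fixed point without relying on the additional complex assumptions common in the DEQ literature, such as those based on monotone operator theory in convex analysis. Furthermore, the fixed point can be computed with the standard fixed-point algorithm, and we provide theoretical guarantees of its geometric convergence, which in particular simplifies the training process. Experiments show that our pcDEQ models are competitive with other implicit models.

Updated: 2024-06-24 16:08:46

Fields: cs.LG

Download: http://arxiv.org/abs/2402.04029v2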
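The abstract states that pcDEQ fixed points can be computed with the standard fixed-point algorithm. Below is a minimal sketch of that iteration for a single hypothetical layer, z = tanh(Wz + Ux + b). It assumes nonnegative W and U, a nonnegative input x and a positive bias b, so every pre-activation lies where tanh is positive, increasing and concave. The exact pcDEQ parameterisation in the paper may differ. In this toy the rows of W also sum to less than 1. That makes the map a sup-norm contraction and guarantees convergence here, independently of the paper's Perron-Frobenius argument.

```python
import math
import random


def pcdeq_layer_fixed_point(W, U, b, x, tol=1e-10, max_iter=1000):
    """Solve z = tanh(W z + U x + b) by plain fixed-point iteration.

    Hypothetical toy setup: W, U >= 0 entrywise, x >= 0 and b > 0, so all
    pre-activations are positive, where tanh is concave and increasing.
    """
    n = len(W)
    z = [0.0] * n
    for it in range(max_iter):
        z_new = []
        for i in range(n):
            pre = sum(W[i][j] * z[j] for j in range(n))
            pre += sum(U[i][k] * x[k] for k in range(len(x))) + b[i]
            z_new.append(math.tanh(pre))
        # Stop once successive iterates agree in the sup norm.
        if max(abs(p - q) for p, q in zip(z_new, z)) < tol:
            return z_new, it + 1
        z = z_new
    raise RuntimeError("fixed-point iteration did not converge")


random.seed(0)
n, d = 4, 3
W = [[random.uniform(0.0, 0.2) for _ in range(n)] for _ in range(n)]  # row sums < 1
U = [[random.uniform(0.0, 1.0) for _ in range(d)] for _ in range(n)]
b = [0.1] * n
x = [0.5, 1.0, 0.2]
z_star, iterations = pcdeq_layer_fixed_point(W, U, b, x)
```

Because the iteration only ever calls the layer itself, training code can treat the solver as a black box. Gradients at the fixed point are then usually obtained by implicit differentiation rather than by backpropagating through every iteration.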

Who Plays First? Optimizing the Order of Play in Stackelberg Games with Many Robots

We consider the multi-agent spatial navigation problem of computing the socially optimal order of play, i.e., the sequence in which the agents commit to their decisions, and its associated equilibrium in an N-player Stackelberg trajectory game. We model this problem as a mixed-integer optimization problem over the space of all possible Stackelberg games associated with the order of play's permutations. To solve the problem, we introduce Branch and Play (B&P), an efficient and exact algorithm that provably converges to a socially optimal order of play and its Stackelberg equilibrium. As a subroutine for B&P, we employ and extend sequential trajectory planning, i.e., a popular multi-agent control approach, to scalably compute valid local Stackelberg equilibria for any given order of play. We demonstrate the practical utility of B&P to coordinate air traffic control, swarm formation, and delivery vehicle fleets. We find that B&P consistently outperforms various baselines, and computes the socially optimal equilibrium.

Updated: 2024-06-24 16:06:50

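Title: Who Plays First? Optimizing the Order of Play in Stackelberg Games with Many Robots

Abstract: We consider the multi-agent spatial navigation problem of computing the socially optimal order of play, i.e., the sequence in which the agents commit to their decisions, and its associated equilibrium in an N-player Stackelberg trajectory game. We model this problem as a mixed-integer optimization problem over the space of all possible Stackelberg games associated with permutations of the order of play. To solve it, we introduce Branch and Play (B&P), an efficient and exact algorithm that provably converges to a socially optimal order of play and its Stackelberg equilibrium. As a subroutine of B&P, we employ and extend sequential trajectory planning, a popular multi-agent control approach, to scalably compute valid local Stackelberg equilibria for any given order of play. We demonstrate the practical utility of B&P in coordinating air traffic control, swarm formation, and delivery vehicle fleets. We find that B&P consistently outperforms various baselines and computes the socially optimal equilibrium.

Updated: 2024-06-24 16:06:50

Fields: cs.RO,cs.AI,cs.SY,eess.SY,math.OC

Download: http://arxiv.org/abs/2402.09246v3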
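B&P searches over orders of play in Stackelberg trajectory games. The sketch below illustrates only the branch-and-bound idea, on a hypothetical toy cost model. Each agent pays a base cost plus a nonnegative "burden" for every leader it must respond to. Partial orders are pruned with an admissible lower bound. The cost model, bound and function names are assumptions for illustration, not the paper's formulation.

```python
import random
from itertools import permutations


def step_cost(agent, leaders, base, burden):
    # Hypothetical toy cost: the agent's own base cost plus a nonnegative
    # burden for every leader it has to respond to.
    return base[agent] + sum(burden[agent][j] for j in leaders)


def order_cost(order, base, burden):
    return sum(step_cost(i, order[:pos], base, burden) for pos, i in enumerate(order))


def branch_and_bound_order(base, burden):
    n = len(base)
    best = {"cost": float("inf"), "order": None}

    def expand(prefix, prefix_cost):
        if len(prefix) == n:
            if prefix_cost < best["cost"]:
                best["cost"], best["order"] = prefix_cost, tuple(prefix)
            return
        remaining = [i for i in range(n) if i not in prefix]
        # Admissible bound: every remaining agent will follow at least the
        # whole prefix, and burdens are nonnegative, so this never overestimates.
        bound = prefix_cost + sum(step_cost(i, prefix, base, burden) for i in remaining)
        if bound >= best["cost"]:
            return  # prune: no completion of this prefix can beat the incumbent
        for i in remaining:
            expand(prefix + [i], prefix_cost + step_cost(i, prefix, base, burden))

    expand([], 0.0)
    return best["order"], best["cost"]


random.seed(1)
n_agents = 5
base = [random.uniform(0.0, 1.0) for _ in range(n_agents)]
burden = [[0.0 if i == j else random.uniform(0.0, 1.0) for j in range(n_agents)]
          for i in range(n_agents)]
best_order, best_cost = branch_and_bound_order(base, burden)
```

For five agents an exhaustive check over all 120 permutations is cheap and should agree with the branch-and-bound result. Pruning only pays off as the number of agents grows.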

Addressing Polarization and Unfairness in Performative Prediction

When machine learning (ML) models are used in applications that involve humans (e.g., online recommendation, school admission, hiring, lending), the model itself may trigger changes in the distribution of targeted data it aims to predict. Performative prediction (PP) is a framework that explicitly considers such model-dependent distribution shifts when learning ML models. While significant efforts have been devoted to finding performatively stable (PS) solutions in PP for system robustness, their societal implications are less explored and it is unclear whether PS solutions are aligned with social norms such as fairness. In this paper, we set out to examine the fairness property of PS solutions in performative prediction. We first show that PS solutions can incur severe polarization effects and group-wise loss disparity. Although existing fairness mechanisms commonly used in literature can help mitigate unfairness, they may fail and disrupt the stability under model-dependent distribution shifts. We thus propose novel fairness intervention mechanisms that can simultaneously achieve both stability and fairness in PP settings. Both theoretical analysis and experiments are provided to validate the proposed method.

Updated: 2024-06-24 16:03:57

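Title: Addressing Polarization and Unfairness in Performative Prediction

Abstract: When machine learning (ML) models are used in applications involving humans (e.g., online recommendation, school admission, hiring, lending), the model itself may trigger changes in the distribution of the target data it aims to predict. Performative prediction (PP) is a framework that explicitly accounts for such model-dependent distribution shifts when learning ML models. While significant effort has gone into finding performatively stable (PS) solutions in PP to ensure system robustness, their societal implications have not been explored in depth, and it is unclear whether PS solutions are aligned with social norms such as fairness. In this paper, we study the fairness properties of PS solutions in performative prediction. We first show that PS solutions can cause severe polarization effects and group-wise loss disparity. Although existing fairness mechanisms commonly used in the literature can help mitigate unfairness, they may fail and disrupt stability under model-dependent distribution shifts. We therefore propose novel fairness intervention mechanisms that can simultaneously achieve both stability and fairness in PP settings. The proposed method is validated by both theoretical analysis and experiments.

Updated: 2024-06-24 16:03:57

Fields: cs.LG,cs.AI,cs.CY

Download: http://arxiv.org/abs/2406.16756v1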
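A performatively stable solution is a model that remains optimal for the data distribution it induces. The sketch below uses a hypothetical one-dimensional toy rather than the paper's setting: mean estimation with squared loss, where deploying θ shifts the data mean to μ + εθ. Repeated retraining then follows θ ← μ + εθ. When |ε| < 1 it converges to the stable point θ_PS = μ / (1 − ε). The toy does not model the groups, polarization or fairness interventions studied in the paper.

```python
def repeated_risk_minimization(mu, eps, theta0=0.0, steps=100):
    """Toy performative mean estimation with squared loss.

    Hypothetical model: deploying theta shifts the data distribution so that
    its mean becomes mu + eps * theta. Minimising E[(z - theta')^2] under that
    distribution gives theta' = mu + eps * theta.
    """
    theta = theta0
    history = [theta]
    for _ in range(steps):
        theta = mu + eps * theta  # retrain on the data induced by the current model
        history.append(theta)
    return theta, history


theta_ps, trajectory = repeated_risk_minimization(mu=2.0, eps=0.5)
```

With μ = 2 and ε = 0.5, the iterates approach 2 / (1 − 0.5) = 4. At that point retraining on the induced data returns the same model, which is exactly the stability property.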

The MRI Scanner as a Diagnostic: Image-less Active Sampling

Despite the high diagnostic accuracy of Magnetic Resonance Imaging (MRI), using MRI as a Point-of-Care (POC) disease identification tool poses significant accessibility challenges due to the use of high magnetic field strength and lengthy acquisition times. We ask a simple question: Can we dynamically optimise acquired samples, at the patient level, according to an (automated) downstream decision task, while discounting image reconstruction? We propose an ML-based framework that learns an active sampling strategy, via reinforcement learning, at the patient level to directly infer disease from undersampled k-space. We validate our approach by inferring meniscus tears in undersampled knee MRI data, where we achieve diagnostic performance comparable to ML-based diagnosis using fully sampled k-space data. We analyse task-specific sampling policies, showcasing the adaptability of our active sampling approach. The introduced frugal sampling strategies have the potential to reduce high field strength requirements, which in turn strengthens the viability of MRI-based POC disease identification and associated preliminary screening tools.

Updated: 2024-06-24 16:00:20

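Title: The MRI Scanner as a Diagnostic: Image-less Active Sampling

Abstract: Despite the high diagnostic accuracy of Magnetic Resonance Imaging (MRI), using MRI as a Point-of-Care (POC) disease identification tool faces significant accessibility challenges due to the use of high magnetic field strengths and long acquisition times. We ask a simple question: can we dynamically optimise the acquired samples at the patient level according to an (automated) downstream decision task, while discounting image reconstruction? We propose an ML-based framework that learns an active sampling strategy via reinforcement learning to infer disease directly from undersampled k-space at the patient level. We validate our approach by inferring meniscus tears in undersampled knee MRI data, achieving diagnostic performance comparable to ML-based diagnosis that uses fully sampled k-space data. We analyse task-specific sampling policies, showing the adaptability of our active sampling approach. The introduced frugal sampling strategies have the potential to reduce high field strength requirements, which in turn strengthens the viability of MRI-based POC disease identification and associated preliminary screening tools.

Updated: 2024-06-24 16:00:20

Fields: cs.LG,cs.CV,eess.IV

Download: http://arxiv.org/abs/2406.16754v1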

Inferring stochastic low-rank recurrent neural networks from neural data

A central aim in computational neuroscience is to relate the activity of large populations of neurons to an underlying dynamical system. Models of these neural dynamics should ideally be both interpretable and fit the observed data well. Low-rank recurrent neural networks (RNNs) exhibit such interpretability by having tractable dynamics. However, it is unclear how to best fit low-rank RNNs to data consisting of noisy observations of an underlying stochastic system. Here, we propose to fit stochastic low-rank RNNs with variational sequential Monte Carlo methods. We validate our method on several datasets consisting of both continuous and spiking neural data, where we obtain lower-dimensional latent dynamics than current state-of-the-art methods. Additionally, for low-rank models with piecewise-linear nonlinearities, we show how to efficiently identify all fixed points at a cost polynomial rather than exponential in the number of units, making analysis of the inferred dynamics tractable for large RNNs. Our method both elucidates the dynamical systems underlying experimental recordings and provides a generative model whose trajectories match observed trial-to-trial variability.

Updated: 2024-06-24 15:57:49

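Title: Inferring stochastic low-rank recurrent neural networks from neural data

Abstract: A central aim in computational neuroscience is to relate the activity of large populations of neurons to an underlying dynamical system. Models of these neural dynamics should ideally be both interpretable and fit the observed data well. Low-rank recurrent neural networks (RNNs) offer such interpretability by having tractable dynamics. However, it is unclear how best to fit low-rank RNNs to data consisting of noisy observations of an underlying stochastic system. Here, we propose fitting stochastic low-rank RNNs with variational sequential Monte Carlo methods. We validate our method on several datasets, including both continuous and spiking neural data, where we obtain lower-dimensional latent dynamics than current state-of-the-art methods. In addition, for low-rank models with piecewise-linear nonlinearities, we show how to efficiently identify all fixed points at a cost polynomial rather than exponential in the number of units, making the analysis of the inferred dynamics tractable for large RNNs. Our method both elucidates the dynamical systems underlying experimental recordings and provides a generative model whose trajectories match the observed trial-to-trial variability.

Updated: 2024-06-24 15:57:49

Fields: cs.LG,q-bio.NC,stat.ML

Download: http://arxiv.org/abs/2406.16749v1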
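The polynomial-cost fixed-point search can be illustrated in its simplest case. This is an assumption-laden toy, not the paper's method. It takes a deterministic rank-one RNN dx/dt = −x + (1/N) m nᵀ relu(x) + h, with no noise and no inference. Every fixed point has the form x = mκ + h for a scalar κ whose self-consistency equation is piecewise linear. Solving one linear equation per interval between breakpoints therefore finds all fixed points in O(N²) work instead of checking 2^N activation patterns.

```python
def rank_one_relu_fixed_points(m, n, h):
    """All fixed points of dx/dt = -x + (1/N) m n^T relu(x) + h.

    Fixed points satisfy x = m * kappa + h with
    kappa = (1/N) * sum_i n_i * relu(m_i * kappa + h_i),
    which is piecewise linear in kappa with breakpoints at -h_i / m_i.
    """
    N = len(m)
    inf = float("inf")
    breaks = sorted({-h[i] / m[i] for i in range(N) if m[i] != 0.0})
    edges = [-inf] + breaks + [inf]
    kappas = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Any interior point tells us which units are active on this interval.
        if lo == -inf and hi == inf:
            probe = 0.0
        elif lo == -inf:
            probe = hi - 1.0
        elif hi == inf:
            probe = lo + 1.0
        else:
            probe = 0.5 * (lo + hi)
        active = [i for i in range(N) if m[i] * probe + h[i] > 0.0]
        # On this interval the self-consistency equation is kappa = a * kappa + c.
        a = sum(n[i] * m[i] for i in active) / N
        c = sum(n[i] * h[i] for i in active) / N
        if abs(1.0 - a) < 1e-12:
            continue  # degenerate slope (none or infinitely many); skipped here
        kappa = c / (1.0 - a)
        inside = lo - 1e-9 <= kappa <= hi + 1e-9
        if inside and all(abs(kappa - k) > 1e-9 for k in kappas):
            kappas.append(kappa)
    return [[m[i] * k + h[i] for i in range(N)] for k in kappas]


def residual(x, m, n, h):
    # Right-hand side of dx/dt = -x + (1/N) m n^T relu(x) + h.
    N = len(x)
    kappa = sum(n[i] * max(x[i], 0.0) for i in range(N)) / N
    return [-x[i] + m[i] * kappa + h[i] for i in range(N)]


# A hypothetical bistable two-unit example.
m, n, h = [1.0, 1.0], [3.0, 3.0], [-0.5, -0.5]
fixed_points = rank_one_relu_fixed_points(m, n, h)
```

The example has one fixed point with both units silent, x = (−0.5, −0.5), and one with both active, x = (0.25, 0.25). Higher-rank models replace the scalar κ by a vector, which is where the paper's more general construction is needed.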

OCALM: Object-Centric Assessment with Language Models

Properly defining a reward signal to efficiently train a reinforcement learning (RL) agent is a challenging task. Designing balanced objective functions from which a desired behavior can emerge requires expert knowledge, especially for complex environments. Learning rewards from human feedback or using large language models (LLMs) to directly provide rewards are promising alternatives, allowing non-experts to specify goals for the agent. However, black-box reward models make it difficult to debug the reward. In this work, we propose Object-Centric Assessment with Language Models (OCALM) to derive inherently interpretable reward functions for RL agents from natural language task descriptions. OCALM uses the extensive world-knowledge of LLMs while leveraging the object-centric nature common to many environments to derive reward functions focused on relational concepts, providing RL agents with the ability to derive policies from task descriptions.

Updated: 2024-06-24 15:57:48

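Title: OCALM: Object-Centric Assessment with Language Models

Abstract: Properly defining a reward signal to efficiently train a reinforcement learning (RL) agent is a challenging task. Designing balanced objective functions from which the desired behavior can emerge requires expert knowledge, especially for complex environments. Learning rewards from human feedback or using large language models (LLMs) to provide rewards directly are promising alternatives that allow non-experts to specify goals for the agent. However, black-box reward models make the reward difficult to debug. In this work, we propose Object-Centric Assessment with Language Models (OCALM) to derive interpretable reward functions for RL agents from natural-language task descriptions. OCALM uses the extensive world knowledge of LLMs while leveraging the object-centric nature common to many environments to derive reward functions focused on relational concepts, giving RL agents the ability to derive policies from task descriptions.

Updated: 2024-06-24 15:57:48

Fields: cs.LG,cs.CL

Download: http://arxiv.org/abs/2406.16748v1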

Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers

Accommodating long sequences efficiently in autoregressive Transformers, especially within an extended context window, poses significant challenges due to the quadratic computational complexity and substantial KV memory requirements inherent in self-attention mechanisms. In this work, we introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome these computational and memory obstacles while maintaining performance. Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query, thereby enabling gradient-based optimization. As a result, SPARSEK Attention offers linear time complexity and constant memory footprint during generation. Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods and provides significant speed improvements during both training and inference, particularly in language modeling and downstream tasks. Furthermore, our method can be seamlessly integrated into pre-trained Large Language Models (LLMs) with minimal fine-tuning, offering a practical solution for effectively managing long-range dependencies in diverse applications.

Updated: 2024-06-24 15:55:59

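Title: Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers

Abstract: Efficiently accommodating long sequences in autoregressive Transformers, especially within an extended context window, poses significant challenges because of the quadratic computational complexity and large KV memory requirements inherent in self-attention mechanisms. In this work, we introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome these computational and memory obstacles while maintaining performance. Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query, thereby enabling gradient-based optimization. As a result, SPARSEK Attention has linear time complexity and a constant memory footprint during generation. Experimental results show that SPARSEK Attention outperforms previous sparse attention methods and delivers significant speed improvements during both training and inference, particularly in language modeling and downstream tasks. Furthermore, our method can be seamlessly integrated into pre-trained large language models (LLMs) with minimal fine-tuning, offering a practical solution for effectively handling long-range dependencies in a variety of applications.

Updated: 2024-06-24 15:55:59

Fields: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.16747v1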
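SPARSEK keeps a constant number k of KV pairs per query by means of a differentiable top-k mask. The abstract does not define the operator. The sketch below therefore uses one standard differentiable relaxation of top-k as an assumed stand-in: the Euclidean projection of the scores onto the capped simplex {m : 0 ≤ m_i ≤ 1, Σ m_i = k}. The scores would come from the scoring network; here they are fixed hypothetical numbers.

```python
def capped_simplex_projection(scores, k, iters=100):
    """Euclidean projection of `scores` onto {m : 0 <= m_i <= 1, sum(m) = k}.

    The solution has the form m_i = clip(s_i - tau, 0, 1). Since the total
    mass is nonincreasing in tau, tau can be found by bisection.
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must lie between 0 and len(scores)")

    def mass(tau):
        return sum(min(max(s - tau, 0.0), 1.0) for s in scores)

    # mass(lo) = len(scores) >= k and mass(hi) = 0 <= k bracket the root.
    lo, hi = min(scores) - 1.0, max(scores)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) > k:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return [min(max(s - tau, 0.0), 1.0) for s in scores]


# Well-separated scores give a (near) hard top-2 selection.
mask = capped_simplex_projection([5.0, 1.0, 4.0, 0.0, 3.0], k=2)
# Close scores give a soft mask that still sums to k.
soft_mask = capped_simplex_projection([0.5, 0.4, 0.1], k=1)
```

Entries of the mask that are exactly zero mark KV pairs that can be dropped from the attention computation. Because the mask is a piecewise-linear function of the scores, gradients can flow back to the scoring network wherever entries are strictly between 0 and 1.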

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

Foundation model development attracts a rapidly expanding body of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet: a growing collection of 250+ tools and resources spanning text, vision, and speech modalities. We draw on a large body of prior work to survey resources (e.g. software, documentation, frameworks, guides, and practical tools) that support informed data selection, processing, and understanding, precise and limitation-aware artifact documentation, efficient model training, advance awareness of the environmental impact from training, careful model evaluation of capabilities, risks, and claims, as well as responsible model release, licensing and deployment practices. We hope this curated collection of resources helps guide more responsible development. Curating this list enabled us to review the AI development ecosystem, revealing which tools are critically missing, misused, or over-used in existing practices. We find that (i) tools for data sourcing, model evaluation, and monitoring are critically under-serving ethical and real-world needs, (ii) evaluations for model safety, capabilities, and environmental impact all lack reproducibility and transparency, (iii) text and particularly English-centric analyses continue to dominate over multilingual and multi-modal analyses, and (iv) evaluation of systems, rather than just models, is needed so that capabilities and impact are assessed in context.

Updated: 2024-06-24 15:55:49

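Title: The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

Abstract: Foundation model development attracts a growing number of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet: a growing collection of 250+ tools and resources spanning text, vision, and speech modalities. Drawing on a large body of prior work, we survey resources (e.g., software, documentation, frameworks, guides, and practical tools) that support informed data selection, processing, and understanding; precise and limitation-aware artifact documentation; efficient model training; awareness of the environmental impact of training; careful model evaluation of capabilities, risks, and claims; and responsible model release, licensing, and deployment practices. We hope this curated collection of resources helps guide more responsible development. Curating this list allowed us to review the AI development ecosystem and revealed which tools are missing, misused, or over-used in existing practices. We find that (i) tools for data sourcing, model evaluation, and monitoring critically under-serve ethical and real-world needs, (ii) evaluations of model safety, capabilities, and environmental impact all lack reproducibility and transparency, (iii) text-based and especially English-centric analyses continue to dominate over multilingual and multi-modal analyses, and (iv) systems, rather than just models, need to be evaluated so that capabilities and impact are assessed in context.

Updated: 2024-06-24 15:55:49

Fields: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.16746v1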

Feature learning as alignment: a structural property of gradient descent in non-linear neural networks

Understanding the mechanisms through which neural networks extract statistics from input-label pairs through feature learning is one of the most important unsolved problems in supervised learning. Prior works demonstrated that the gram matrices of the weights (the neural feature matrices, NFM) and the average gradient outer products (AGOP) become correlated during training, in a statement known as the neural feature ansatz (NFA). Through the NFA, the authors introduce mapping with the AGOP as a general mechanism for neural feature learning. However, these works do not provide a theoretical explanation for this correlation or its origins. In this work, we further clarify the nature of this correlation, and explain its emergence. We show that this correlation is equivalent to alignment between the left singular structure of the weight matrices and the newly defined pre-activation tangent features at each layer. We further establish that the alignment is driven by the interaction of weight changes induced by SGD with the pre-activation features, and analyze the resulting dynamics analytically at early times in terms of simple statistics of the inputs and labels. Finally, motivated by the observation that the NFA is driven by this centered correlation, we introduce a simple optimization rule that dramatically increases the NFA correlations at any given layer and improves the quality of features learned.

Updated: 2024-06-24 15:55:34

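Title: Feature learning as alignment: a structural property of gradient descent in non-linear neural networks

Abstract: Understanding the mechanisms by which neural networks extract statistics from input-label pairs through feature learning is one of the most important unsolved problems in supervised learning. Prior work has shown that the Gram matrices of the weights (the neural feature matrices, NFM) and the average gradient outer products (AGOP) become correlated during training, a statement known as the neural feature ansatz (NFA). Through the NFA, the authors introduced mapping with the AGOP as a general mechanism for neural feature learning. However, these works do not provide a theoretical explanation for this correlation or its origins. In this work, we further clarify the nature of this correlation and explain its emergence. We show that this correlation is equivalent to alignment between the left singular structure of the weight matrices and the newly defined pre-activation tangent features at each layer. We further establish that this alignment is driven by the interaction of SGD-induced weight changes with the pre-activation features, and we analyze the resulting dynamics at early times in terms of simple statistics of the inputs and labels. Finally, motivated by the observation that the NFA is driven by this centered correlation, we introduce a simple optimization rule that dramatically increases the NFA correlations at any given layer and improves the quality of the learned features.

Updated: 2024-06-24 15:55:34

Fields: stat.ML,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.05271v3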
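The NFA compares the Gram matrix of a layer's weights with the AGOP. The sketch below computes both quantities for a single random, untrained network, assuming a one-hidden-layer ReLU network f(x) = Σ_r a_r relu(w_r · x). Agreement is measured with the cosine similarity of the two matrices, one common choice; some works compare against a matrix power of the AGOP instead. It does not reproduce the paper's tangent-feature analysis or its proposed optimization rule.

```python
import random


def nfm_and_agop(W, a, inputs):
    """NFM (W^T W) and AGOP for a hypothetical one-hidden-layer network
    f(x) = sum_r a_r * relu(w_r . x), where w_r is row r of W."""
    hidden, d = len(W), len(W[0])
    nfm = [[sum(W[r][i] * W[r][j] for r in range(hidden)) for j in range(d)]
           for i in range(d)]
    agop = [[0.0] * d for _ in range(d)]
    for x in inputs:
        # Input gradient: grad_x f = sum_r a_r * relu'(w_r . x) * w_r.
        active = [1.0 if sum(W[r][c] * x[c] for c in range(d)) > 0.0 else 0.0
                  for r in range(hidden)]
        g = [sum(a[r] * active[r] * W[r][c] for r in range(hidden)) for c in range(d)]
        for i in range(d):
            for j in range(d):
                agop[i][j] += g[i] * g[j] / len(inputs)
    return nfm, agop


def matrix_cosine(A, B):
    # Frobenius inner product divided by the product of Frobenius norms.
    dot = sum(x * y for ra, rb in zip(A, B) for x, y in zip(ra, rb))
    norm_a = sum(x * x for row in A for x in row) ** 0.5
    norm_b = sum(y * y for row in B for y in row) ** 0.5
    return dot / (norm_a * norm_b)


random.seed(0)
d, hidden = 3, 8
W = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(hidden)]
a = [random.gauss(0.0, 1.0) for _ in range(hidden)]
inputs = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(50)]
nfm, agop = nfm_and_agop(W, a, inputs)
nfa_correlation = matrix_cosine(nfm, agop)
```

Both matrices are positive semidefinite, so their cosine similarity lies in [0, 1]. Tracking this number over training is how NFA-style correlations are typically reported.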

Bandits with Preference Feedback: A Stackelberg Game Perspective

Bandits with preference feedback present a powerful tool for optimizing unknown target functions when only pairwise comparisons are allowed instead of direct value queries. This model allows for incorporating human feedback into online inference and optimization and has been employed in systems for fine-tuning large language models. The problem is well understood in simplified settings with linear target functions or over finite small domains that limit practical interest. Taking the next step, we consider infinite domains and nonlinear (kernelized) rewards. In this setting, selecting a pair of actions is quite challenging and requires balancing exploration and exploitation at two levels: within the pair, and along the iterations of the algorithm. We propose MAXMINLCB, which emulates this trade-off as a zero-sum Stackelberg game, and chooses action pairs that are informative and yield favorable rewards. MAXMINLCB consistently outperforms existing algorithms and satisfies an anytime-valid rate-optimal regret guarantee. This is due to our novel preference-based confidence sequences for kernelized logistic estimators.

Updated: 2024-06-24 15:53:11

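Title: Bandits with Preference Feedback: A Stackelberg Game Perspective

Abstract: Bandits with preference feedback are a powerful tool for optimizing unknown target functions when only pairwise comparisons, rather than direct value queries, are allowed. This model allows human feedback to be incorporated into online inference and optimization, and it has been used in systems for fine-tuning large language models. The problem is well understood in simplified settings with linear target functions or over small finite domains, which limits its practical interest. Taking the next step, we consider infinite domains and nonlinear (kernelized) rewards. In this setting, selecting a pair of actions is quite challenging and requires balancing exploration and exploitation at two levels: within the pair, and across the iterations of the algorithm. We propose the MAXMINLCB algorithm, which models this trade-off as a zero-sum Stackelberg game and chooses action pairs that are informative and yield favorable rewards. MAXMINLCB consistently outperforms existing algorithms and satisfies an anytime-valid, rate-optimal regret guarantee. This is due to our novel preference-based confidence sequences for kernelized logistic estimators.

Updated: 2024-06-24 15:53:11

Fields: cs.LG,cs.AI,cs.GT,stat.ML

Download: http://arxiv.org/abs/2406.16745v1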

EGTR: Extracting Graph from Transformer for Scene Graph Generation

Scene Graph Generation (SGG) is a challenging task of detecting objects and predicting relationships between objects. After DETR was developed, one-stage SGG models based on a one-stage object detector have been actively studied. However, complex modeling is used to predict the relationship between objects, and the inherent relationship between object queries learned in the multi-head self-attention of the object detector has been neglected. We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder. By fully utilizing the self-attention by-products, the relation graph can be extracted effectively with a shallow relation extraction head. Considering the dependency of the relation extraction task on the object detection task, we propose a novel relation smoothing technique that adjusts the relation label adaptively according to the quality of the detected objects. By the relation smoothing, the model is trained according to the continuous curriculum that focuses on object detection task at the beginning of training and performs multi-task learning as the object detection performance gradually improves. Furthermore, we propose a connectivity prediction task that predicts whether a relation exists between object pairs as an auxiliary task of the relation extraction. We demonstrate the effectiveness and efficiency of our method for the Visual Genome and Open Image V6 datasets. Our code is publicly available at https://github.com/naver-ai/egtr.

Updated: 2024-06-24 15:52:57

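Title: EGTR: Extracting Graph from Transformer for Scene Graph Generation

Abstract: Scene graph generation (SGG) is a challenging task that involves detecting objects and predicting the relationships between them. Since DETR was developed, one-stage SGG models based on one-stage object detectors have been actively studied. However, complex modeling is used to predict the relationships between objects, while the inherent relationships between object queries learned in the multi-head self-attention of the object detector have been neglected. We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder. By fully utilizing the by-products of self-attention, the relation graph can be extracted effectively with a shallow relation extraction head. Considering the dependency of the relation extraction task on the object detection task, we propose a novel relation smoothing technique that adaptively adjusts the relation label according to the quality of the detected objects. With relation smoothing, the model is trained according to a continuous curriculum that focuses on the object detection task at the beginning of training and performs multi-task learning as object detection performance gradually improves. Furthermore, we propose a connectivity prediction task, as an auxiliary task of relation extraction, that predicts whether a relation exists between object pairs. We demonstrate the effectiveness and efficiency of our method on the Visual Genome and Open Image V6 datasets. Our code is publicly available at https://github.com/naver-ai/egtr.

Updated: 2024-06-24 15:52:57

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.02072v5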

Attribute Diversity Determines the Systematicity Gap in VQA

The degree to which neural networks can generalize to new combinations of familiar concepts, and the conditions under which they are able to do so, has long been an open question. In this work, we study the systematicity gap in visual question answering: the performance difference between reasoning on previously seen and unseen combinations of object attributes. To test, we introduce a novel diagnostic dataset, CLEVR-HOPE. We find that while increased quantity of training data does not reduce the systematicity gap, increased training data diversity of the attributes in the unseen combination does. In all, our experiments suggest that the more distinct attribute type combinations are seen during training, the more systematic we can expect the resulting model to be.

Updated: 2024-06-24 15:51:13

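Title: Attribute Diversity Determines the Systematicity Gap in VQA

Abstract: The degree to which neural networks can generalize to new combinations of familiar concepts, and the conditions under which they can do so, has long been an open question. In this work, we study the systematicity gap in visual question answering: the performance difference between reasoning on previously seen and unseen combinations of object attributes. To test this, we introduce a new diagnostic dataset, CLEVR-HOPE. We find that while increasing the quantity of training data does not reduce the systematicity gap, increasing the diversity of the training data for the attributes in the unseen combination does. Overall, our experiments suggest that the more distinct attribute-type combinations are seen during training, the more systematic we can expect the resulting model to be.

Updated: 2024-06-24 15:51:13

Fields: cs.LG,cs.CL,cs.CV

Download: http://arxiv.org/abs/2311.08695v2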

Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking

Divergent thinking, the cognitive process of generating diverse solutions, is a hallmark of human creativity and problem-solving. For machines, sampling diverse solution trajectories in complex reasoning problems is crucial for robust outcomes, data augmentation, and enhanced model generalization. Large language models (LLMs) often struggle with generating high-quality, diverse reasoning. While supervised fine-tuning helps with quality, it requires extensive supervision data to capture the full diversity of solutions. Alternatively, reinforcement learning methods like PPO aim to find a limited set of highest-reward solutions while neglecting solution diversity, akin to convergent thinking. To address these limitations, we propose Flow of Reasoning (FoR), an efficient LLM training approach enabling diverse reasoning with minimal data. FoR formulates multi-step LLM reasoning as a Markovian flow from an initial state to terminal states. This formulation allows principled GFlowNet approaches to be adapted to train the LLM as a policy that can sample multiple reasoning paths with probabilities proportional to the unnormalized reward. Empirical results show that, with limited training data (e.g., 15 examples), FoR can discover diverse, high-quality solutions that substantially outperform current state-of-the-art methods across three tasks: embodied reasoning (BlocksWorld), math puzzle solving (Game24), and logical reasoning (PrOntoQA). Code is available at https://github.com/Yu-Fangxu/FoR.

Updated: 2024-06-24 15:49:09

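Title: Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking

Abstract: Divergent thinking, the cognitive process of generating diverse solutions, is a hallmark of human creativity and problem-solving. For machines, sampling diverse solution paths in complex reasoning problems is crucial for robust outcomes, data augmentation, and better model generalization. Large language models (LLMs) often struggle to generate high-quality, diverse reasoning. Although supervised fine-tuning helps improve quality, it requires a large amount of supervision data to capture the full diversity of solutions. Alternatively, reinforcement learning methods such as PPO aim to find a limited set of highest-reward solutions while neglecting solution diversity, similar to convergent thinking. To address these limitations, we propose Flow of Reasoning (FoR), an efficient LLM training approach that enables diverse reasoning with minimal data. FoR formulates multi-step LLM reasoning as a Markovian flow from an initial state to terminal states. This formulation allows GFlowNet-based methods to be adapted to train the LLM as a policy that can sample multiple reasoning paths with probabilities proportional to the unnormalized reward. Empirical results show that with limited training data (e.g., 15 examples), FoR can discover diverse, high-quality solutions that clearly outperform current state-of-the-art methods on three tasks: embodied reasoning (BlocksWorld), math puzzle solving (Game24), and logical reasoning (PrOntoQA). Code is available at https://github.com/Yu-Fangxu/FoR.

Updated: 2024-06-24 15:49:09

Fields: cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.05673v2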
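FoR trains the LLM policy with GFlowNet methods so that complete reasoning paths are sampled with probability proportional to their unnormalized reward. The abstract does not say which GFlowNet loss is used. The sketch below shows trajectory balance, one standard GFlowNet objective, as an assumed example on a hypothetical two-step "reasoning tree". When each state is the whole sequence of steps so far, as in autoregressive generation, every state has a single parent. The backward policy is then deterministic and log P_B = 0.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, log_reward):
    """Squared trajectory-balance residual for one complete trajectory:
    (log Z + sum_t log P_F(s_t+1 | s_t) - log R(x) - sum_t log P_B(s_t | s_t+1))^2
    """
    residual = log_z + sum(log_pf_steps) - log_reward - sum(log_pb_steps)
    return residual ** 2


# Hypothetical rewards for the four leaves of a two-step tree.
rewards = {"aa": 1.0, "ab": 3.0, "ba": 2.0, "bb": 2.0}
z = sum(rewards.values())  # partition function Z = 8
# A forward policy under which each leaf is sampled with probability R(x) / Z.
first = {"a": 4.0 / 8.0, "b": 4.0 / 8.0}
second = {"aa": 1.0 / 4.0, "ab": 3.0 / 4.0, "ba": 2.0 / 4.0, "bb": 2.0 / 4.0}

losses = {
    leaf: trajectory_balance_loss(
        math.log(z),
        [math.log(first[leaf[0]]), math.log(second[leaf])],
        [0.0, 0.0],  # tree-structured states: one parent each, so log P_B = 0
        math.log(r),
    )
    for leaf, r in rewards.items()
}
```

Because this policy already samples every leaf in proportion to its reward, the loss is (numerically) zero on every trajectory. During training, log Z is a learned scalar and the log-probabilities come from the LLM, so minimising the loss pushes the policy toward that proportional sampling.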

Extracting thin film structures of energy materials using transformers

Neutron-Transformer Reflectometry and Advanced Computation Engine (N-TRACE), a neural network model using transformer architecture, is introduced for neutron reflectometry data analysis. It offers fast, accurate initial parameter estimations and efficient refinements, improving efficiency and precision for real-time data analysis of lithium-mediated nitrogen reduction for electrochemical ammonia synthesis, with relevance to other chemical transformations and batteries. Despite limitations in generalizing across systems, it shows promise for the use of transformers as the basis for models that could replace trial-and-error approaches to modeling reflectometry data.

Updated: 2024-06-24 15:48:19

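Title: Extracting thin film structures of energy materials using transformers

Abstract: The Neutron-Transformer Reflectometry and Advanced Computation Engine (N-TRACE) is a neural network model that uses the transformer architecture for neutron reflectometry data analysis. It provides fast, accurate initial parameter estimates and efficient refinements, improving the efficiency and precision of real-time data analysis of lithium-mediated nitrogen reduction, with relevance to other chemical transformations and batteries. Despite limitations in generalizing across systems, it shows the potential of transformers as the basis for models that could replace trial-and-error approaches to modeling reflectometry data.

Updated: 2024-06-24 15:48:19

Fields: physics.comp-ph,cs.AI

Download: http://arxiv.org/abs/2406.16741v1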
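N-TRACE produces initial parameter estimates that are then refined against a reflectivity model. For context, the sketch below shows the textbook forward model usually used in that refinement: Parratt's recursion for an ideal layer stack without interfacial roughness. This is a standard physics formula rather than part of N-TRACE. The units (Å⁻¹ for q, Å⁻² for scattering length density) and the layer ordering are assumptions of the sketch. Real analyses also include roughness, resolution smearing and background.

```python
import cmath
import math


def parratt_reflectivity(q, slds, thicknesses):
    """Specular reflectivity R(q) of an ideal (roughness-free) layer stack.

    q            momentum transfer in 1/Angstrom
    slds         scattering length densities in 1/Angstrom^2, ordered from the
                 incident medium (first) to the substrate (last)
    thicknesses  thicknesses in Angstrom of the inner layers only
                 (len(thicknesses) == len(slds) - 2)
    """
    # Perpendicular wave vector in each medium, relative to the incident medium.
    kz = [cmath.sqrt((q / 2.0) ** 2 - 4.0 * math.pi * (rho - slds[0])) for rho in slds]
    d = [0.0] + list(thicknesses) + [0.0]
    x = 0.0 + 0.0j  # nothing is reflected from below the substrate
    # Recurse upward from the substrate interface to the top interface.
    for j in range(len(slds) - 2, -1, -1):
        r = (kz[j] - kz[j + 1]) / (kz[j] + kz[j + 1])
        phase = cmath.exp(2j * kz[j + 1] * d[j + 1])
        x = (r + x * phase) / (1.0 + r * x * phase)
    return abs(x) ** 2


si = 2.07e-6  # approximate SLD of silicon in 1/Angstrom^2
q_c = 4.0 * math.sqrt(math.pi * si)  # critical edge of a bare substrate
r_below = parratt_reflectivity(0.5 * q_c, [0.0, si], [])
r_above = parratt_reflectivity(5.0 * q_c, [0.0, si], [])
```

Below the critical edge the bare substrate reflects totally (R = 1). Well above it, R falls off roughly as q⁻⁴. Fitting a measured curve means adjusting the SLDs and thicknesses until this model matches the data, and that search is where good initial estimates help.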

Learning the boundary-to-domain mapping using Lifting Product Fourier Neural Operators for partial differential equations

Neural operators such as the Fourier Neural Operator (FNO) have been shown to provide resolution-independent deep learning models that can learn mappings between function spaces. For example, an initial condition can be mapped to the solution of a partial differential equation (PDE) at a future time-step using a neural operator. Despite the popularity of neural operators, their use to predict solution functions over a domain given only data over the boundary (such as a spatially varying Dirichlet boundary condition) remains unexplored. In this paper, we refer to such problems as boundary-to-domain problems; they have a wide range of applications in areas such as fluid mechanics, solid mechanics, heat transfer etc. We present a novel FNO-based architecture, named Lifting Product FNO (or LP-FNO) which can map arbitrary boundary functions defined on the lower-dimensional boundary to a solution in the entire domain. Specifically, two FNOs defined on the lower-dimensional boundary are lifted into the higher dimensional domain using our proposed lifting product layer. We demonstrate the efficacy and resolution independence of the proposed LP-FNO for the 2D Poisson equation.

Updated: 2024-06-24 15:45:37

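Title: Learning the boundary-to-domain mapping using Lifting Product Fourier Neural Operators for partial differential equations

Abstract: Neural operators such as the Fourier Neural Operator (FNO) have been shown to provide resolution-independent deep learning models that can learn mappings between function spaces. For example, a neural operator can map an initial condition to the solution of a partial differential equation (PDE) at a future time step. Despite their popularity, the use of neural operators to predict solution functions over a domain given only data on the boundary (such as a spatially varying Dirichlet boundary condition) remains unexplored. In this paper, we refer to such problems as boundary-to-domain problems; they have a wide range of applications in fields such as fluid mechanics, solid mechanics, and heat transfer. We present a novel FNO-based architecture, called Lifting Product FNO (LP-FNO), which can map arbitrary boundary functions defined on the lower-dimensional boundary to a solution in the entire domain. Specifically, two FNOs defined on the lower-dimensional boundary are lifted into the higher-dimensional domain using our proposed lifting product layer. We demonstrate the efficacy and resolution independence of the proposed LP-FNO on the 2D Poisson equation.

Updated: 2024-06-24 15:45:37

Fields: cs.LG,cs.NA,math.NA,65N99, 68T07,I.2.1; J.2

Download: http://arxiv.org/abs/2406.16740v1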
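LP-FNO builds on the Fourier layer of the FNO. The lifting product layer is not specified in the abstract, so the sketch below shows only the standard 1D spectral convolution at the core of an FNO layer. It assumes a single channel. A naive O(n²) DFT keeps it dependency-free, where practical code would use an FFT.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_conv_1d(x, weights, modes):
    """Core of a 1D Fourier layer for a single channel: keep the lowest
    `modes` frequencies, multiply each by a (learnable) complex weight,
    zero out the rest, and transform back."""
    n = len(x)
    if modes > n // 2:
        raise ValueError("modes must not exceed len(x) // 2")
    X = dft(x)
    Y = [0j] * n
    for k in range(modes):
        Y[k] = weights[k] * X[k]
        if k > 0:
            # Mirror the weight onto the negative frequency so that a real
            # input produces a real output.
            Y[n - k] = complex(weights[k]).conjugate() * X[n - k]
    return [v.real for v in idft(Y)]


n_points = 16
signal = [math.sin(2 * math.pi * t / n_points) + 0.5 * math.cos(2 * math.pi * 6 * t / n_points)
          for t in range(n_points)]
# Unit weights on the 4 lowest modes: the frequency-6 component is filtered out.
filtered = spectral_conv_1d(signal, [1.0, 1.0, 1.0, 1.0], modes=4)
```

The learnable parameters act on a fixed number of Fourier modes rather than on grid points. That is why the same weights can be applied to inputs sampled at a different resolution, the resolution independence the abstract refers to.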

Inducing Group Fairness in LLM-Based Decisions

Prompting Large Language Models (LLMs) has created new and interesting means for classifying textual data. While evaluating and remediating group fairness is a well-studied problem in classifier fairness literature, some classical approaches (e.g., regularization) do not carry over, and some new opportunities arise (e.g., prompt-based remediation). We measure fairness of LLM-based classifiers on a toxicity classification task, and empirically show that prompt-based classifiers may lead to unfair decisions. We introduce several remediation techniques and benchmark their fairness and performance trade-offs. We hope our work encourages more research on group fairness in LLM-based classifiers.

Updated: 2024-06-24 15:45:20

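Title: Inducing Group Fairness in LLM-Based Decisions

Abstract: Prompting large language models (LLMs) has created new and interesting ways of classifying textual data. While evaluating and remediating group fairness is a well-studied problem in the classifier fairness literature, some classical approaches (e.g., regularization) do not carry over, while some new opportunities arise (e.g., prompt-based remediation). We measure the fairness of LLM-based classifiers on a toxicity classification task and show empirically that prompt-based classifiers may lead to unfair decisions. We introduce several remediation techniques and benchmark their fairness and performance trade-offs. We hope our work encourages more research on group fairness in LLM-based classifiers.

Updated: 2024-06-24 15:45:20

Fields: cs.LG,cs.AI,cs.CY

Download: http://arxiv.org/abs/2406.16738v1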
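The abstract does not name the fairness metrics it reports. As an assumed illustration, the sketch below computes two common group-fairness gaps for a binary toxicity classifier. The first is the difference in positive-prediction rates (demographic parity). The second is the difference in false-positive rates (one half of equalized odds). The labels, predictions and group names are hypothetical.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group positive-prediction rate and false-positive rate
    (labels and predictions are 0/1, with 1 = toxic)."""
    stats = {}
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats.setdefault(g, {"n": 0, "pos": 0, "neg": 0, "fp": 0})
        s["n"] += 1
        s["pos"] += p
        if t == 0:
            s["neg"] += 1
            s["fp"] += p
    return {
        g: {
            "positive_rate": s["pos"] / s["n"],
            "fpr": s["fp"] / s["neg"] if s["neg"] else float("nan"),
        }
        for g, s in stats.items()
    }


def max_gap(rates, key):
    # Largest difference in the chosen rate between any two groups.
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


# Hypothetical outputs of a prompt-based toxicity classifier.
y_true = [0, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = group_rates(y_true, y_pred, groups)
```

Here group A has a false-positive rate of 1/3 and group B has 0. That gap is the kind of disparity a remediation technique would try to shrink without hurting overall accuracy too much.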

A practical existence theorem for reduced order models based on convolutional autoencoders

In recent years, deep learning has gained increasing popularity in the fields of Partial Differential Equations (PDEs) and Reduced Order Modeling (ROM), providing domain practitioners with new powerful data-driven techniques such as Physics-Informed Neural Networks (PINNs), Neural Operators, Deep Operator Networks (DeepONets) and Deep-Learning based ROMs (DL-ROMs). In this context, deep autoencoders based on Convolutional Neural Networks (CNNs) have proven extremely effective, outperforming established techniques, such as the reduced basis method, when dealing with complex nonlinear problems. However, despite the empirical success of CNN-based autoencoders, there are only a few theoretical results supporting these architectures, usually stated in the form of universal approximation theorems. In particular, although the existing literature provides users with guidelines for designing convolutional autoencoders, the subsequent challenge of learning the latent features has been barely investigated. Furthermore, many practical questions remain unanswered, e.g., the number of snapshots needed for convergence or the neural network training strategy. In this work, using recent techniques from sparse high-dimensional function approximation, we fill some of these gaps by providing a new practical existence theorem for CNN-based autoencoders when the parameter-to-solution map is holomorphic. This regularity assumption arises in many relevant classes of parametric PDEs, such as the parametric diffusion equation, for which we discuss an explicit application of our general theory.

Updated: 2024-06-24 15:42:52

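Title: A practical existence theorem for reduced order models based on convolutional autoencoders

Abstract: In recent years, deep learning has become increasingly popular in the fields of partial differential equations (PDEs) and reduced order modeling (ROM), providing domain practitioners with powerful new data-driven techniques such as Physics-Informed Neural Networks (PINNs), Neural Operators, Deep Operator Networks (DeepONets), and Deep-Learning-based ROMs (DL-ROMs). In this context, deep autoencoders based on Convolutional Neural Networks (CNNs) have proven extremely effective, outperforming established techniques such as the reduced basis method when dealing with complex nonlinear problems. However, despite the empirical success of CNN-based autoencoders, only a few theoretical results support these architectures, usually stated in the form of universal approximation theorems. In particular, although the existing literature provides guidelines for designing convolutional autoencoders, the subsequent challenge of learning the latent features has barely been investigated. Furthermore, many practical questions remain unanswered, e.g., the number of snapshots needed for convergence or the neural network training strategy. In this work, using recent techniques from sparse high-dimensional function approximation, we fill some of these gaps by providing a new practical existence theorem for CNN-based autoencoders when the parameter-to-solution map is holomorphic. This regularity assumption arises in many relevant classes of parametric PDEs, such as the parametric diffusion equation, for which we discuss an explicit application of our general theory.

Updated: 2024-06-24 15:42:52

Fields: math.NA,cs.AI,cs.LG,cs.NA

Download: http://arxiv.org/abs/2402.00435v2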

Fusion of Movement and Naive Predictions for Point Forecasting in Univariate Random Walks

Traditional methods for point forecasting in univariate random walks often fail to surpass naive benchmarks due to data unpredictability. This study introduces a novel forecasting method that fuses movement prediction (binary classification) with naive forecasts for accurate one-step-ahead point forecasting. The method's efficacy is demonstrated through theoretical analysis, simulations, and real-world data experiments. It reliably exceeds naive forecasts with movement prediction accuracies as low as 0.55, outperforming baseline models like ARIMA, linear regression, MLP, and LSTM networks in forecasting the S&P 500 index and Bitcoin prices. This method is particularly advantageous when accurate point predictions are challenging but accurate movement predictions are attainable, translating movement predictions into point forecasts in random walk contexts.

Updated: 2024-06-24 15:40:40

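Title: Fusion of Movement and Naive Predictions for Point Forecasting in Univariate Random Walks

Abstract: Traditional point forecasting methods for univariate random walks often fail to beat naive benchmarks because of the unpredictability of the data. This study introduces a novel forecasting method that fuses movement prediction (binary classification) with naive forecasts to achieve accurate one-step-ahead point forecasts. The method's effectiveness is demonstrated through theoretical analysis, simulations, and real-world data experiments. It reliably outperforms naive forecasts with movement prediction accuracies as low as 0.55, and it outperforms baseline models such as ARIMA, linear regression, MLP, and LSTM networks in forecasting the S&P 500 index and Bitcoin prices. The method is particularly advantageous when accurate point predictions are difficult but accurate movement predictions are attainable, translating movement predictions into point forecasts in random-walk settings.

Updated: 2024-06-24 15:40:40

Fields: cs.CE,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.14469v2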

Versatile Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers

Deep neural networks (DNNs) can be manipulated to exhibit specific behaviors when exposed to specific trigger patterns without affecting their performance on benign samples; this is known as a backdoor attack. Currently, implementing backdoor attacks in physical scenarios still faces significant challenges. Physical attacks are labor-intensive and time-consuming, and their triggers are selected manually and heuristically. Moreover, extending digital attacks to physical scenarios is difficult because digital triggers are sensitive to visual distortions and lack counterparts in the real world. To address these challenges, we define a novel trigger called the Visible, Semantic, Sample-Specific, and Compatible (VSSC) trigger, which is simultaneously effective, stealthy, and robust, and which can also be deployed effectively in physical scenarios using corresponding objects. To implement the VSSC trigger, we propose an automated pipeline comprising three modules: a trigger selection module that systematically identifies suitable triggers by leveraging large language models, a trigger insertion module that employs generative models to seamlessly integrate triggers into images, and a quality assessment module that uses vision-language models to ensure that triggers are inserted naturally and successfully. Extensive experimental results and analysis validate the effectiveness, stealthiness, and robustness of the VSSC trigger. It not only maintains robustness under visual distortions but also demonstrates strong practicality in physical scenarios. We hope that the proposed VSSC trigger and implementation approach will inspire future studies on designing more practical triggers for backdoor attacks.
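The three-module pipeline can be read as a generate-and-verify loop. The sketch below shows only that control flow, under stated assumptions: the stages are passed in as hypothetical callables (`select_trigger`, `insert_trigger`, `assess_quality`) standing in for the LLM, the generative model, and the vision-language model, and the retry limit `max_attempts` is an illustrative parameter not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    trigger: str
    image: object  # whatever the insertion stage returns (e.g. an edited image)
    attempts: int


def run_trigger_pipeline(
    image: object,
    select_trigger: Callable[[object], str],          # stage 1, e.g. backed by an LLM
    insert_trigger: Callable[[object, str], object],  # stage 2, e.g. a generative editor
    assess_quality: Callable[[object, str], bool],    # stage 3, e.g. a VLM check
    max_attempts: int = 3,
) -> Optional[PipelineResult]:
    """Select a trigger, then insert and re-check until the quality stage accepts it."""
    trigger = select_trigger(image)
    for attempt in range(1, max_attempts + 1):
        edited = insert_trigger(image, trigger)
        if assess_quality(edited, trigger):
            return PipelineResult(trigger, edited, attempt)
    return None  # give up on this sample after max_attempts rejections


# Usage with stub stages: the quality check rejects the first insertion.
checks = {"n": 0}


def stub_assess(edited: object, trigger: str) -> bool:
    checks["n"] += 1
    return checks["n"] >= 2


result = run_trigger_pipeline(
    "img-001", lambda img: "red flower", lambda img, trig: f"{img}+{trig}", stub_assess
)
print(result)  # PipelineResult(trigger='red flower', image='img-001+red flower', attempts=2)
```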

Updated: 2024-06-24 15:40:01

标题: 具有可见、语义、样本特定和兼容触发器的多功能后门攻击

摘要: 深度神经网络（DNNs）可以被操纵，在暴露于特定触发模式时表现出特定行为，而不影响它们在良性样本上的表现，这被称为后门攻击。目前，在物理场景中实施后门攻击仍然面临重大挑战。物理攻击需要大量的人力和时间，并且触发器是以手动和启发式方式选择的。此外，将数字攻击扩展到物理场景面临许多挑战，因为它们对视觉扭曲的敏感性以及在现实世界中的对应物的缺失。为了解决这些挑战，我们定义了一种称为Visible、Semantic、Sample-Specific和Compatible（VSSC）触发器的新型触发器，以同时实现有效、隐秘和稳健，还可以通过相应的对象有效部署在物理场景中。为了实现VSSC触发器，我们提出了一个自动化流程，包括三个模块：一个触发器选择模块，通过利用大型语言模型系统地识别合适的触发器；一个触发器插入模块，利用生成模型将触发器无缝集成到图像中；以及一个质量评估模块，通过视觉语言模型确保触发器的自然和成功插入。广泛的实验结果和分析验证了VSSC触发器的有效性、隐秘性和稳健性。它不仅可以在视觉扭曲下保持稳健性，还展现了在物理场景中的强实用性。我们希望所提出的VSSC触发器和实施方法可以激发未来研究设计更实用触发器的后门攻击。

更新时间: 2024-06-24 15:40:01

领域: cs.CV,cs.CR

下载: http://arxiv.org/abs/2306.00816v4

FairytaleQA Translated: Enabling Educational Question and Answer Generation in Less-Resourced Languages

Question Answering (QA) datasets are crucial for assessing the reading comprehension skills of both machines and humans. While numerous datasets have been developed in English for this purpose, there is a noticeable gap in less-resourced languages. To help fill this gap, our paper introduces machine-translated versions of FairytaleQA, a well-known QA dataset designed to assess and enhance narrative comprehension skills in young children. Using fine-tuned, modest-scale models, we establish benchmarks for both Question Generation (QG) and QA tasks on the translated datasets. In addition, we present a case study proposing a model for generating question-answer pairs, with an evaluation incorporating quality metrics such as question well-formedness, answerability, relevance, and suitability for children. Our evaluation prioritizes quantifying and describing error cases and provides directions for future work. This paper contributes to the advancement of QA and QG research in less-resourced languages, promoting accessibility and inclusivity in the development of these models for reading comprehension. The code and data are publicly available at github.com/bernardoleite/fairytaleqa-translated.

Updated: 2024-06-24 15:39:17

标题: FairytaleQA翻译：在资源较少的语言中实现教育问答生成

摘要: 问答（QA）数据集对于评估机器和人类的阅读理解能力至关重要。尽管已经为此目的开发了许多英语数据集，但在资源较少的语言中存在明显的空白。为了弥补这一差距，我们的论文介绍了FairytaleQA的机器翻译版本，这是一个旨在评估和提高年幼儿童叙述理解能力的知名QA数据集。通过使用经过微调的适度规模模型，我们在翻译后的数据集中建立了问题生成（QG）和QA任务的基准。此外，我们提出了一个案例研究，提出了一个用于生成问题-答案对的模型，并评估包括问题形式良好性、可回答性、相关性和适合儿童的质量指标。我们的评估重点是量化和描述错误案例，同时为未来工作提供方向。这篇论文有助于推动资源较少语言中QA和QG研究的进展，促进这些模型的阅读理解发展的可访问性和包容性。代码和数据可在github.com/bernardoleite/fairytaleqa-translated上公开获取。

更新时间: 2024-06-24 15:39:17

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.04233v2

Convolutional neural network for Lyman break galaxies classification and redshift regression in DESI (Dark Energy Spectroscopic Instrument)

DESI is a groundbreaking international project to observe more than 40 million quasars and galaxies over a 5-year period to create a 3D map of the sky. This map will enable us to probe multiple aspects of cosmology, from dark energy to neutrino mass. We focus here on one type of object observed by DESI, the Lyman Break Galaxies (LBGs). The goal is to use their spectra to determine whether they are indeed LBGs and, if so, to determine their distance from Earth through their redshift. This will allow us to place these galaxies on the DESI 3D map. We therefore develop a convolutional neural network (CNN) inspired by QuasarNET (see arXiv:1808.09955) that simultaneously performs a classification task (LBG or not) and a regression task (estimating the redshift of the LBGs). First, data augmentation techniques such as shifting the spectra in wavelength, adding noise to the spectra, and adding synthetic spectra were used to enlarge the training dataset from 3,019 spectra to over 66,000. Second, modifications to the QuasarNET architecture, notably transfer learning and hyperparameter tuning with Bayesian optimization, boosted model performance. Gains of up to 26% were achieved on the Purity/Efficiency curve, which is used to evaluate model performance, particularly in the redshift ranges of interest at low (around 2) and high (around 4) redshift. The best model obtained an average score of 94%, compared with 75% for the initial model.
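Two details in the paragraph above can be made concrete: redshift follows z = λ_obs / λ_rest - 1, and wavelength shifting and noise injection are simple array operations. A minimal sketch, assuming the Lyman-alpha line (rest wavelength 1215.67 Å) as the reference feature, edge-value padding when a spectrum is shifted, and a log10-wavelength grid with a hypothetical spacing `dloglam` for relating a pixel shift to an updated redshift label; none of these choices are stated in the abstract.

```python
import random

LYA_REST_ANGSTROM = 1215.67  # rest-frame Lyman-alpha wavelength in angstroms


def redshift_from_line(observed: float, rest: float = LYA_REST_ANGSTROM) -> float:
    """z = lambda_obs / lambda_rest - 1."""
    return observed / rest - 1.0


def shift_spectrum(flux: list[float], pixels: int) -> list[float]:
    """Shift flux redward (pixels > 0) or blueward (pixels < 0), padding with the edge value."""
    n = len(flux)
    k = max(-n, min(n, pixels))
    if n == 0 or k == 0:
        return list(flux)
    if k > 0:
        return [flux[0]] * k + flux[: n - k]
    return flux[-k:] + [flux[-1]] * (-k)


def shifted_redshift(z: float, pixels: int, dloglam: float) -> float:
    """On a log10-wavelength grid, shifting by `pixels` bins multiplies (1 + z) by 10**(pixels * dloglam)."""
    return (1.0 + z) * 10 ** (pixels * dloglam) - 1.0


def add_noise(flux: list[float], sigma: float, rng: random.Random) -> list[float]:
    """Add independent Gaussian noise with standard deviation `sigma` to every pixel."""
    return [f + rng.gauss(0.0, sigma) for f in flux]


print(round(redshift_from_line(4862.68), 6))  # 3.0
print(shift_spectrum([1.0, 2.0, 3.0, 4.0], 1))  # [1.0, 1.0, 2.0, 3.0]
```

If shifted spectra are used as extra training examples for the regression head, their redshift labels presumably need to move consistently with the shift, which is what `shifted_redshift` expresses under the log-grid assumption.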

Updated: 2024-06-24 15:35:51

标题: DESI（暗能量光谱仪）中用于Lyman突破星系分类和红移回归的卷积神经网络

摘要: DESI是一个开创性的国际项目，将在5年时间内观测超过4000万个类星体和星系，以创建天空的3D地图。这张地图将使我们能够探测宇宙学的多个方面，从暗能量到中微子质量。我们在这里关注DESI观测到的一种对象，即莱曼断裂星系（LBGs）。旨在利用它们的光谱来确定它们是否确实是LBGs，并且如果是的话，利用一个称为红移的现象来确定它们与地球的距离。这将使我们能够将这些星系放在DESI的3D地图上。因此，旨在开发一个受QuasarNET启发的卷积神经网络（CNN）（参见arXiv：1808.09955），同时执行分类（LBG类型或非LBG）和回归任务（确定LBGs的红移）。最初，采用数据增强技术，如在波长上移动光谱，向光谱添加噪音，或添加合成光谱，将模型训练数据集从3,019个增加到超过66,000个。在第二阶段，通过转移学习和使用贝叶斯优化调整超参数对QuasarNET架构进行修改，提高了模型性能。在Purity/Efficiency曲线上获得了高达26%的增益，该曲线用于评估模型性能，特别是在有趣的红移区域，低（约2）和高（约4）红移。最佳模型的平均得分为94％，而初始模型为75％。

更新时间: 2024-06-24 15:35:51

领域: astro-ph.CO,cs.AI

下载: http://arxiv.org/abs/2406.16730v1

DeepReShape: Redesigning Neural Networks for Efficient Private Inference

Prior work on Private Inference (PI), i.e., inference performed directly on encrypted input, has focused on minimizing a network's ReLUs, which were assumed to dominate PI latency while FLOPs were treated as negligible. Recent work has shown that FLOPs can no longer be ignored in PI and incur high latency penalties. In this paper, we develop DeepReShape, a technique that optimizes neural network architectures under PI's constraints, optimizing for both ReLUs and FLOPs for the first time. The key insight is that strategically allocating channels to position the network's ReLUs in order of their criticality to network accuracy simultaneously optimizes ReLU and FLOPs efficiency. DeepReShape automates network development with an efficient process, and we call the generated networks HybReNets. We evaluate DeepReShape using standard PI benchmarks and demonstrate a 2.1% accuracy gain with a 5.2$\times$ runtime improvement at iso-ReLU on CIFAR-100, and an 8.7$\times$ runtime improvement at iso-accuracy on TinyImageNet. Furthermore, we investigate the significance of network selection in prior ReLU optimizations and shed light on the key network attributes for superior PI performance.
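To see why channel allocation couples ReLU count and FLOPs, it helps to count both for a stack of convolutions. A minimal sketch using the standard dense-convolution formulas (one ReLU per output activation; H·W·C_out·C_in·k² multiply-accumulates per layer), ignoring biases, normalization, and pooling. The two hypothetical networks have the same ReLU count (iso-ReLU) but different compute; this is an illustration, not DeepReShape's allocation procedure.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    in_channels: int
    out_channels: int
    kernel: int        # square kernel size k
    out_hw: int        # output feature map is out_hw x out_hw
    relu: bool = True  # whether a ReLU follows this layer


def relu_count(layers: list[ConvLayer]) -> int:
    """One ReLU per output activation: C_out * H * W for each layer followed by a ReLU."""
    return sum(layer.out_channels * layer.out_hw ** 2 for layer in layers if layer.relu)


def conv_macs(layers: list[ConvLayer]) -> int:
    """Multiply-accumulates of dense convolutions: H * W * C_out * C_in * k^2 per layer."""
    return sum(
        layer.out_hw ** 2 * layer.out_channels * layer.in_channels * layer.kernel ** 2
        for layer in layers
    )


# Two hypothetical two-stage networks with the same ReLU budget (iso-ReLU).
net_a = [ConvLayer(3, 64, 3, 32), ConvLayer(64, 128, 3, 16)]
net_b = [ConvLayer(3, 32, 3, 32), ConvLayer(32, 256, 3, 16)]
for name, net in [("A", net_a), ("B", net_b)]:
    print(name, relu_count(net), conv_macs(net))
# A 98304 20643840
# B 98304 19759104
```

FLOPs are often reported as roughly twice the MAC count; the ranking between the two allocations is the same either way.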

Updated: 2024-06-24 15:34:40

标题: DeepReShape: 为高效私密推理重新设计神经网络

摘要: 私密推理（PI）即直接对加密输入进行的推理。先前关于PI的工作集中在最小化网络的ReLU数量上，因为人们认为主导PI延迟的是ReLU而不是FLOPs。最近的工作表明，PI中的FLOPs不能再被忽视，并且会带来很高的延迟代价。在本文中，我们开发了DeepReShape，一种在PI约束下优化神经网络架构的技术，首次同时针对ReLU和FLOPs进行优化。关键洞察在于：通过战略性地分配通道，使网络中的ReLU按其对网络准确率的关键程度排列，可以同时优化ReLU和FLOPs的效率。DeepReShape通过一个高效的过程自动化网络开发，我们称生成的网络为HybReNets。我们使用标准的PI基准测试评估DeepReShape，结果表明：在CIFAR-100上，在相同ReLU数量（iso-ReLU）下获得了2.1%的准确率增益和5.2倍的运行时改进；在TinyImageNet上，在相同准确率（iso-accuracy）下获得了8.7倍的运行时改进。此外，我们研究了网络选择在先前ReLU优化工作中的重要性，并揭示了实现卓越PI性能的关键网络属性。

更新时间: 2024-06-24 15:34:40

领域: cs.CR

下载: http://arxiv.org/abs/2304.10593v4

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

Large language models (LLMs) have recently experienced tremendous popularity and are widely used for tasks ranging from casual conversation to AI-driven programming. However, despite their considerable success, LLMs are not entirely reliable and can give detailed guidance on how to conduct harmful or illegal activities. While safety measures can reduce the risk of such outputs, adversarial jailbreak attacks can still exploit LLMs to produce harmful content. These jailbreak templates are typically crafted manually, which makes large-scale testing challenging. In this paper, we introduce GPTFuzz, a novel black-box jailbreak fuzzing framework inspired by the AFL fuzzing framework. Instead of relying on manual engineering, GPTFuzz automates the generation of jailbreak templates for red-teaming LLMs. At its core, GPTFuzz starts with human-written templates as initial seeds, then mutates them to produce new templates. We detail three key components of GPTFuzz: a seed selection strategy for balancing efficiency and variability, mutation operators for creating semantically equivalent or similar sentences, and a judgment model to assess the success of a jailbreak attack. We evaluate GPTFuzz against various commercial and open-source LLMs, including ChatGPT, Llama-2, and Vicuna, under diverse attack scenarios. Our results indicate that GPTFuzz consistently produces jailbreak templates with a high success rate, surpassing human-crafted templates. Remarkably, GPTFuzz achieves attack success rates above 90% against ChatGPT and Llama-2 models, even with suboptimal initial seed templates. We anticipate that GPTFuzz will be instrumental for researchers and practitioners in examining LLM robustness and will encourage further exploration into enhancing LLM safety.
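The abstract does not say how GPTFuzz's seed selection balances efficiency and variability. As an illustration only, the sketch below swaps in UCB1, a standard multi-armed bandit rule: seeds whose mutants have scored well are favored, but rarely tried seeds receive an exploration bonus. The reward signal (for example, the judge model's success rate for a mutant) and the class name `UCBSeedSelector` are assumptions, not GPTFuzz's API.

```python
import math


class UCBSeedSelector:
    """UCB1 over a growing pool of seed templates (illustrative stand-in, not GPTFuzz's rule)."""

    def __init__(self, exploration: float = math.sqrt(2)):
        self.exploration = exploration
        self.pulls: list[int] = []
        self.reward_sums: list[float] = []

    def add_seed(self) -> int:
        """Register a new seed (e.g. a successful mutant) and return its index."""
        self.pulls.append(0)
        self.reward_sums.append(0.0)
        return len(self.pulls) - 1

    def select(self) -> int:
        if not self.pulls:
            raise ValueError("no seeds registered")
        for i, n in enumerate(self.pulls):
            if n == 0:  # try every seed once before trusting its statistics
                return i
        total = sum(self.pulls)

        def ucb(i: int) -> float:
            mean = self.reward_sums[i] / self.pulls[i]
            return mean + self.exploration * math.sqrt(math.log(total) / self.pulls[i])

        return max(range(len(self.pulls)), key=ucb)

    def update(self, seed_id: int, reward: float) -> None:
        """Record a reward in [0, 1] for a mutant derived from `seed_id`."""
        self.pulls[seed_id] += 1
        self.reward_sums[seed_id] += reward


selector = UCBSeedSelector()
a, b = selector.add_seed(), selector.add_seed()
selector.update(selector.select(), 0.0)  # first call returns a
selector.update(selector.select(), 1.0)  # second call returns b
print(selector.select() == b)  # True: b has the higher mean and the same bonus
```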

Updated: 2024-06-24 15:34:17

标题: GPTFUZZER：使用自动生成的越狱提示对大型语言模型进行红队测试

摘要: 大型语言模型（LLMs）最近经历了巨大的流行，被广泛应用于从日常对话到人工智能驱动的编程等各个领域。然而，尽管取得了显著的成功，LLMs并不完全可靠，可能会提供详细的指导，教导如何进行有害或非法活动。虽然安全措施可以减少这类输出的风险，但对抗性越狱攻击仍然可以利用LLMs生成有害内容。这些越狱模板通常是手工制作的，使得大规模测试具有挑战性。在本文中，我们介绍了GPTFuzz，这是一种受AFL模糊测试框架启发的新型黑盒越狱模糊测试框架。与手工工程不同，GPTFuzz自动化生成越狱模板，用于对抗LLMs的红队操作。在核心部分，GPTFuzz从人类编写的模板作为初始种子开始，然后对其进行变异以生成新的模板。我们详细介绍了GPTFuzz的三个关键组成部分：一种用于平衡效率和变异性的种子选择策略，用于创建语义等效或相似句子的变异操作符，以及用于评估越狱攻击成功程度的评判模型。我们对各种商业和开源LLMs，包括ChatGPT、LLaMa-2和Vicuna，以不同的攻击场景进行了GPTFuzz的评估。我们的结果表明，GPTFuzz始终能够产生成功率高的越狱模板，超过了人工制作的模板。值得注意的是，即使初始种子模板不够优化，GPTFuzz在对ChatGPT和Llama-2模型的攻击成功率超过90%。我们预计GPTFuzz将成为研究人员和从业者检验LLM鲁棒性的重要工具，并鼓励进一步探索增强LLM安全性的方向。

更新时间: 2024-06-24 15:34:17

领域: cs.AI

下载: http://arxiv.org/abs/2309.10253v3

CausalMMM: Learning Causal Structure for Marketing Mix Modeling

In online advertising, marketing mix modeling (MMM) is employed to predict the gross merchandise volume (GMV) of brand shops and to help decision-makers adjust the budget allocation across advertising channels. Traditional MMM methods based on regression techniques can fail to handle the complexity of marketing. Although some efforts try to encode causal structures for better prediction, they impose the strict restriction that the causal structures are known a priori and unchangeable. In this paper, we define a new causal MMM problem that automatically discovers interpretable causal structures from data and yields better GMV predictions. To achieve causal MMM, two essential challenges must be addressed: (1) Causal heterogeneity: the causal structures of different kinds of shops vary considerably. (2) Marketing response patterns: various marketing response patterns, namely the carryover effect and the shape effect, have been validated in practice. We argue that causal MMM needs to dynamically discover specific causal structures for different shops and that its predictions should comply with the known marketing response patterns. Thus, we propose CausalMMM, which integrates Granger causality into a variational inference framework to measure the causal relationships between different channels and predicts GMV with regularization from both temporal and saturation marketing response patterns. Extensive experiments show that CausalMMM not only achieves superior causal structure learning performance on synthetic datasets, with improvements of 5.7% to 7.1%, but also improves GMV prediction on a representative e-commerce platform.
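The two response patterns named above are commonly parameterized in the MMM literature as geometric adstock (carryover) and a Hill curve (shape or saturation). A minimal sketch of both, assuming those standard forms; the abstract does not state which parameterization CausalMMM uses for its regularization, and the spend values are made up.

```python
def geometric_adstock(spend: list[float], decay: float) -> list[float]:
    """Carryover effect: effect_t = spend_t + decay * effect_{t-1}, with 0 <= decay < 1."""
    effects, carry = [], 0.0
    for x in spend:
        carry = x + decay * carry
        effects.append(carry)
    return effects


def hill_saturation(x: float, half_saturation: float, shape: float) -> float:
    """Shape (saturation) effect: x^s / (x^s + K^s); equals 0.5 when x == K."""
    if x <= 0:
        return 0.0
    return x ** shape / (x ** shape + half_saturation ** shape)


weekly_spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(weekly_spend, decay=0.5)
print(adstocked)  # [100.0, 50.0, 25.0, 62.5]
print([round(hill_saturation(a, half_saturation=50.0, shape=2.0), 3) for a in adstocked])
```

Applying adstock before saturation is one common ordering; the reverse is also used in practice, and which one fits better is an empirical question.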

Updated: 2024-06-24 15:33:47

标题: CausalMMM：学习市场营销组合建模的因果结构

摘要: 在线广告中，营销组合建模（MMM）被用于预测品牌商店的总商品交易量（GMV），并帮助决策者调整各种广告渠道的预算分配。传统的利用回归技术的MMM方法在处理营销复杂性方面可能会失败。尽管一些努力试图为更好的预测编码因果结构，但它们有严格的限制，即因果结构是事先已知且不可改变的。在本文中，我们定义了一个新的因果MMM问题，该问题可以自动从数据中发现可解释的因果结构，并产生更好的GMV预测结果。要实现因果MMM，需要解决两个关键挑战：（1）因果异质性。不同类型商店的因果结构差异很大。（2）营销响应模式。各种营销响应模式，如延续（carryover）效应和形状效应，在实践中得到验证。我们认为因果MMM需要动态地为不同商店发现特定的因果结构，并且预测结果应符合事先已知的营销响应模式。因此，我们提出了CausalMMM，该方法在变分推断框架中集成了Granger因果关系，以衡量不同渠道之间的因果关系，并通过时间和饱和营销响应模式的正则化来预测GMV。广泛的实验表明，CausalMMM不仅可以在合成数据集上实现因果结构学习的优越表现，改进了5.7%至7.1%，还可以提高代表性电子商务平台上的GMV预测结果。

更新时间: 2024-06-24 15:33:47

领域: cs.AI

下载: http://arxiv.org/abs/2406.16728v1

LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multi-modal Foundation Models

Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces LatentExplainer, a framework for automatically generating semantically meaningful explanations of latent variables in deep generative models. LatentExplainer tackles three main challenges: inferring the meaning of latent variables, aligning explanations with inductive biases, and handling varying degrees of explainability. By perturbing latent variables and interpreting changes in generated data, the framework provides a systematic approach to understanding and controlling the data generation process, enhancing the transparency and interpretability of deep generative models. We evaluate our proposed method on several real-world and synthetic datasets, and the results demonstrate superior performance in generating high-quality explanations of latent variables.
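Perturbing a latent variable and observing the change in the output is usually called a latent traversal. A minimal sketch, assuming a decoder that maps a list of floats to a list of floats; `toy_decode` is a hypothetical linear decoder for illustration. LatentExplainer then has a multimodal foundation model describe such traversals in words, which is outside the scope of this sketch.

```python
from typing import Callable, List, Sequence, Tuple

Decoder = Callable[[Sequence[float]], Sequence[float]]


def latent_traversal(decode: Decoder, z: Sequence[float], dim: int,
                     deltas: Sequence[float]) -> List[Tuple[float, List[float]]]:
    """Decode copies of z in which only coordinate `dim` is shifted by each delta."""
    results = []
    for d in deltas:
        z_mod = list(z)
        z_mod[dim] += d
        results.append((d, list(decode(z_mod))))
    return results


def sensitivity(decode: Decoder, z: Sequence[float], dim: int, delta: float = 1.0) -> float:
    """Mean absolute output change per unit change of latent coordinate `dim`."""
    base = list(decode(list(z)))
    _, moved = latent_traversal(decode, z, dim, [delta])[0]
    return sum(abs(a - b) for a, b in zip(moved, base)) / (len(base) * abs(delta))


def toy_decode(z: Sequence[float]) -> List[float]:
    """Hypothetical linear decoder: output 0 depends on z0, output 1 on both, output 2 on neither."""
    return [2.0 * z[0], z[0] + z[1], 0.0]


print(sensitivity(toy_decode, [0.0, 0.0], dim=0))  # 1.0
print(latent_traversal(toy_decode, [0.0, 0.0], dim=1, deltas=[-1.0, 1.0]))
```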

Updated: 2024-06-24 15:30:34

标题: 潜在解释器：使用多模态基础模型解释深度生成模型中的潜在表示

摘要: 深度生成模型如VAEs和扩散模型通过利用潜变量来学习数据分布并生成高质量样本，推动了各种生成任务的发展。尽管可解释AI领域在解释机器学习模型方面取得了进展，但理解生成模型中的潜变量仍具有挑战性。本文介绍了LatentExplainer，这是一个用于自动生成深度生成模型中潜变量的语义解释的框架。LatentExplainer解决了三个主要挑战：推断潜变量的含义，将解释与归纳偏差对齐，以及处理不同程度的可解释性。通过扰动潜变量并解释生成数据的变化，该框架提供了一种系统化方法来理解和控制数据生成过程，增强了深度生成模型的透明度和可解释性。我们在几个真实和合成数据集上评估了我们提出的方法，结果表明在生成潜变量的高质量解释方面表现出卓越的性能。

更新时间: 2024-06-24 15:30:34

领域: cs.LG,cs.CL,cs.CV

下载: http://arxiv.org/abs/2406.14862v2

FT-AED: Benchmark Dataset for Early Freeway Traffic Anomalous Event Detection

Early and accurate detection of anomalous events on the freeway, such as accidents, can improve emergency response and clearance. However, existing delays and errors in event identification and reporting make it a difficult problem to solve. Current large-scale freeway traffic datasets are not designed for anomaly detection and ignore these challenges. In this paper, we introduce the first large-scale lane-level freeway traffic dataset for anomaly detection. Our dataset consists of a month of weekday radar detection sensor data collected in 4 lanes along an 18-mile stretch of Interstate 24 heading toward Nashville, TN, comprising over 3.7 million sensor measurements. We also collect official crash reports from the Nashville Traffic Management Center and manually label all other potential anomalies in the dataset. To show the potential for our dataset to be used in future machine learning and traffic research, we benchmark numerous deep learning anomaly detection models on our dataset. We find that unsupervised graph neural network autoencoders are a promising solution for this problem and that ignoring spatial relationships leads to decreased performance. We demonstrate that our methods can reduce reporting delays by over 10 minutes on average while detecting 75% of crashes. Our dataset and all preprocessing code needed to get started are publicly released at https://vu.edu/ft-aed/ to facilitate future research.
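Autoencoder-based detectors typically flag time steps whose reconstruction error exceeds a threshold calibrated on normal data. A minimal sketch of that scoring step, assuming the reconstructions come from some already trained model (for instance a graph autoencoder over the lane-level sensors) and that the threshold is an empirical quantile of errors on anomaly-free training windows; the quantile and all numbers are illustrative, not taken from the FT-AED benchmark.

```python
import math
from typing import List, Sequence


def reconstruction_errors(observed: Sequence[Sequence[float]],
                          reconstructed: Sequence[Sequence[float]]) -> List[float]:
    """Mean squared error across sensors/lanes for each time step."""
    return [sum((o - r) ** 2 for o, r in zip(obs, rec)) / len(obs)
            for obs, rec in zip(observed, reconstructed)]


def quantile_threshold(train_errors: Sequence[float], q: float) -> float:
    """Nearest-rank empirical q-quantile of errors on anomaly-free training data."""
    ranked = sorted(train_errors)
    idx = min(len(ranked) - 1, max(0, math.ceil(q * len(ranked)) - 1))
    return ranked[idx]


def flag_anomalies(errors: Sequence[float], threshold: float) -> List[int]:
    """Indices of time steps whose error exceeds the threshold."""
    return [t for t, e in enumerate(errors) if e > threshold]


# Speeds (mph) at two detectors; the model fails to reconstruct a sudden drop at t = 1.
observed = [[60.0, 62.0], [61.0, 20.0]]
reconstructed = [[60.0, 61.0], [61.0, 59.0]]
errors = reconstruction_errors(observed, reconstructed)
threshold = quantile_threshold([0.5, 1.0, 0.8, 2.0, 1.5], q=0.8)
print(errors, threshold, flag_anomalies(errors, threshold))  # [0.5, 760.5] 1.5 [1]
```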

Updated: 2024-06-24 15:24:15

标题: FT-AED：早期高速公路交通异常事件检测的基准数据集

摘要: 高速公路上异常事件的早期和准确检测，如事故，可以改善应急响应和清理。然而，现有的事件识别和报告中存在的延迟和错误使其成为一个难题。目前的大规模高速公路交通数据集并未设计用于异常检测，并忽视了这些挑战。在本文中，我们介绍了用于异常检测的第一个大规模车道级高速公路交通数据集。我们的数据集包括在18英里长的24号州际公路上收集到的4个车道上一个月的工作日雷达检测传感器数据，包括超过370万个传感器测量值。我们还收集了来自纳什维尔交通管理中心的官方碰撞报告，并手动标记了数据集中的所有其他潜在异常。为了展示我们的数据集在未来机器学习和交通研究中的潜力，我们在数据集上对许多深度学习异常检测模型进行了基准测试。我们发现，无监督的图神经网络自动编码器是这个问题的一个有前途的解决方案，并且忽视空间关系会导致性能下降。我们证明了我们的方法可以平均减少超过10分钟的报告延迟，同时检测到75%的碰撞。我们的数据集以及所有必要的预处理代码已经公开发布在https://vu.edu/ft-aed/上，以促进未来研究。

更新时间: 2024-06-24 15:24:15

领域: cs.LG

下载: http://arxiv.org/abs/2406.15283v2

One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection

As speech synthesis systems have made remarkable advances in recent years, robust deepfake detection systems that generalize to unseen synthesis systems have become increasingly important. In this paper, we propose a novel adaptive centroid shift (ACS) method that continually updates the centroid representation as a weighted average of bonafide representations. Our approach uses only bonafide samples to define the centroid, yielding a centroid specialized for one-class learning. Integrating ACS with one-class learning gathers bonafide representations into a single cluster, forming well-separated embeddings that are robust to unseen spoofing attacks. Our proposed method achieves an equal error rate (EER) of 2.19% on the ASVspoof 2021 deepfake dataset, outperforming all existing systems. Furthermore, t-SNE visualization shows that our method effectively maps the bonafide embeddings into a single cluster and successfully disentangles the bonafide and spoof classes.
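A minimal sketch of the centroid update and scoring, assuming that the "weighted average" is an exponential moving average with a hypothetical `momentum` parameter and that embeddings are compared by cosine similarity. Only bonafide embeddings update the centroid, as the abstract describes; the one-class training loss itself is omitted.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Unit-length copy of v (returned unchanged if v is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def shift_centroid(centroid: Sequence[float], bonafide_batch: Sequence[Sequence[float]],
                   momentum: float = 0.9) -> List[float]:
    """EMA-style update: momentum * old centroid + (1 - momentum) * mean of the bonafide batch."""
    dim = len(centroid)
    batch_mean = [sum(e[i] for e in bonafide_batch) / len(bonafide_batch) for i in range(dim)]
    return [momentum * c + (1.0 - momentum) * m for c, m in zip(centroid, batch_mean)]


def bonafide_score(embedding: Sequence[float], centroid: Sequence[float]) -> float:
    """Cosine similarity to the bonafide centroid; higher means more likely bonafide."""
    return sum(a * b for a, b in zip(normalize(embedding), normalize(centroid)))


centroid = [0.0, 0.0]
centroid = shift_centroid(centroid, [[1.0, 1.0], [3.0, 3.0]], momentum=0.5)
print(centroid)  # [1.0, 1.0]
print(round(bonafide_score([2.0, 2.1], centroid), 3), round(bonafide_score([-1.0, 1.0], centroid), 3))
```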

Updated: 2024-06-24 15:21:50

标题: 一类学习中自适应质心漂移用于音频深度伪造检测

摘要: 随着语音合成系统在最近几年取得了显著进展，对于在未知系统中表现良好的稳健深度伪造检测系统的重要性日益增长。在本文中，我们提出了一种新颖的自适应质心偏移（ACS）方法，通过不断地将质心表示更新为真实表示的加权平均值来进行偏移。我们的方法仅使用真实样本来定义它们的质心，这可以产生用于单类学习的专门质心。将我们的ACS与单类学习相结合，将真实表示聚集到一个单一的簇中，形成对未知欺诈攻击具有良好分离的嵌入。我们提出的方法在ASVspoof 2021深度伪造数据集上实现了2.19％的等误差率（EER），优于所有现有系统。此外，t-SNE可视化表明我们的方法有效地将真实嵌入映射到一个单一簇，并成功地解开了真实和伪造类别。

更新时间: 2024-06-24 15:21:50

领域: eess.AS,cs.CR,cs.SD

下载: http://arxiv.org/abs/2406.16716v1

State Representation Learning Using an Unbalanced Atlas

The manifold hypothesis posits that high-dimensional data often lies on a lower-dimensional manifold and that utilizing this manifold as the target space yields more efficient representations. While numerous traditional manifold-based techniques exist for dimensionality reduction, their application in self-supervised learning has progressed slowly. The recent MSimCLR method combines manifold encoding with SimCLR but requires extremely low target encoding dimensions to outperform SimCLR, limiting its applicability. This paper introduces a novel learning paradigm using an unbalanced atlas (UA) that is capable of surpassing state-of-the-art self-supervised learning approaches. We investigated and engineered the DeepInfomax with an unbalanced atlas (DIM-UA) method by adapting the Spatiotemporal DeepInfomax (ST-DIM) framework to our proposed UA paradigm. The efficacy of DIM-UA is demonstrated through training and evaluation on the Atari Annotated RAM Interface (AtariARI) benchmark, a modified version of the Atari 2600 framework that produces annotated image samples for representation learning. The UA paradigm improves existing algorithms significantly as the number of target encoding dimensions grows. For instance, with 16384 hidden units, the mean F1 score averaged over categories is ~75% for DIM-UA, compared with ~70% for ST-DIM.

Updated: 2024-06-24 15:19:44

标题: 使用不平衡图谱学习状态表示

摘要: 流形假设认为，高维数据通常位于较低维度的流形上，而将该流形作为目标空间可以得到更高效的表示。虽然存在许多传统的基于流形的降维技术，但它们在自监督学习中的应用进展缓慢。最近的MSimCLR方法将流形编码与SimCLR相结合，但需要极低的目标编码维度才能胜过SimCLR，从而限制了其适用性。本文介绍了一种使用不平衡图谱（UA）的新型学习范式，能够超越最先进的自监督学习方法。我们通过调整时空DeepInfomax（ST-DIM）框架来适应我们提出的UA范式，研究并设计了一种名为DeepInfomax with an unbalanced atlas（DIM-UA）的方法。DIM-UA的有效性通过在Atari Annotated RAM Interface（AtariARI）基准上的训练和评估得以证明，这是Atari 2600框架的修改版本，用于生成用于表示学习的标注图像样本。随着目标编码维度的增加，UA范式显著改进了现有算法。例如，当使用16384个隐藏单元时，DIM-UA在各类别上的平均F1分数约为75%，而ST-DIM为70%左右。

更新时间: 2024-06-24 15:19:44

领域: cs.LG

下载: http://arxiv.org/abs/2305.10267v3

GC-Bench: A Benchmark Framework for Graph Condensation with New Insights

Graph condensation (GC) is an emerging technique designed to learn a significantly smaller graph that retains the essential information of the original graph. This condensed graph has shown promise in accelerating graph neural networks while preserving performance comparable to that achieved with the original, larger graph. Additionally, this technique facilitates downstream applications such as neural architecture search and enhances our understanding of redundancy in large graphs. Despite the rapid development of GC methods, a systematic evaluation framework, which is needed to clarify which design choices matter for particular evaluation aspects, is still absent. Furthermore, several meaningful questions have not been investigated, such as whether GC inherently preserves certain graph properties and offers robustness even without targeted design efforts. In this paper, we introduce GC-Bench, a comprehensive framework to evaluate recent GC methods across multiple dimensions and to generate new insights. Our experimental findings provide deeper insight into the GC process and the characteristics of condensed graphs, guiding future efforts in enhancing performance and exploring new applications. Our code is available at https://github.com/Emory-Melody/GraphSlim/tree/main/benchmark.

Updated: 2024-06-24 15:17:49

标题: GC-Bench：一种用于图结构压缩的基准测试框架，具有新的见解

摘要: 图压缩（GC）是一种新兴技术，旨在学习一个保留原始图的基本信息的显著较小的图。这种压缩图已经显示出在加速图神经网络的同时保持与使用原始、较大图相当的性能的潜力。此外，这种技术有助于下游应用，如神经架构搜索，并增强我们对大图中冗余的理解。尽管GC方法迅速发展，但缺乏系统评估框架，这对于澄清特定评估方面的关键设计是必要的。此外，还有一些有意义的问题尚未得到研究，比如GC是否本质上保留了某些图属性，并且即使没有特定设计努力也能提供鲁棒性。在本文中，我们介绍了GC-Bench，一个从多个维度综合评估近期GC方法并产生新见解的框架。我们的实验结果提供了对GC过程和压缩图特性的更深入了解，指导未来在提升性能和探索新应用方面的努力。我们的代码可以在 https://github.com/Emory-Melody/GraphSlim/tree/main/benchmark 找到。

更新时间: 2024-06-24 15:17:49

领域: cs.LG

下载: http://arxiv.org/abs/2406.16715v1

AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models

Although Large Language Models (LLMs) are becoming increasingly powerful, they still exhibit significant but subtle weaknesses, such as mistakes in instruction-following or coding tasks. As these unexpected errors could lead to severe consequences in practical deployments, it is crucial to investigate the limitations within LLMs systematically. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies, while manual inspections are costly and not scalable. In this paper, we introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks. Inspired by the educational assessment process that measures students' learning outcomes, AutoDetect consists of three LLM-powered agents: Examiner, Questioner, and Assessor. The collaboration among these three agents is designed to realize comprehensive and in-depth weakness identification. Our framework demonstrates significant success in uncovering flaws, with an identification success rate exceeding 30% in prominent models such as ChatGPT and Claude. More importantly, these identified weaknesses can guide specific model improvements, proving more effective than untargeted data augmentation methods like Self-Instruct. Our approach has led to substantial enhancements in popular LLMs, including the Llama series and Mistral-7b, boosting their performance by over 10% across several benchmarks. Code and data are publicly available at https://github.com/thu-coai/AutoDetect.
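The Examiner, Questioner, and Assessor roles can be wired together as a simple loop. A single-pass sketch, assuming each agent is a callable (the LLM prompts behind them are out of scope), that the Assessor returns a score in [0, 1], and that a question counts as exposing a weakness when its score falls below a hypothetical `threshold`; AutoDetect's actual iteration and scoring may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Finding:
    test_point: str
    question: str
    answer: str
    score: float


def detect_weaknesses(
    task: str,
    examiner: Callable[[str], List[str]],    # task -> test points to probe
    questioner: Callable[[str], List[str]],  # test point -> questions
    target_model: Callable[[str], str],      # model under test
    assessor: Callable[[str, str], float],   # (question, answer) -> score in [0, 1]
    threshold: float = 0.5,
) -> Tuple[List[Finding], float]:
    """Single pass: return low-scoring findings and the fraction of questions that exposed a weakness."""
    findings: List[Finding] = []
    asked = 0
    for point in examiner(task):
        for question in questioner(point):
            asked += 1
            answer = target_model(question)
            score = assessor(question, answer)
            if score < threshold:
                findings.append(Finding(point, question, answer, score))
    return findings, (len(findings) / asked if asked else 0.0)


# Stub agents: the "model" fails every arithmetic question.
findings, rate = detect_weaknesses(
    "math word problems",
    examiner=lambda task: ["arithmetic", "units"],
    questioner=lambda point: [f"{point} q1", f"{point} q2"],
    target_model=lambda q: "wrong" if q.startswith("arithmetic") else "ok",
    assessor=lambda q, a: 0.0 if a == "wrong" else 1.0,
)
print(len(findings), rate)  # 2 0.5
```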

Updated: 2024-06-24 15:16:45

标题: AutoDetect：朝向大型语言模型自动弱点检测的统一框架

摘要: 尽管大型语言模型（LLMs）变得越来越强大，但它们仍然表现出明显但微妙的弱点，例如在遵循指令或编码任务中出现错误。由于这些意外错误可能导致实际部署中的严重后果，系统地调查LLMs内部的限制至关重要。传统的基准测试方法无法全面指出特定模型的缺陷，而手动检查成本高且不可扩展。在本文中，我们介绍了一个统一的框架AutoDetect，用于自动暴露LLMs在各种任务中的弱点。受教育评估过程的启发，该过程测量学生的学习成果，AutoDetect由三个LLM驱动的代理组成：考官，提问者和评估者。这三个代理之间的协作旨在实现全面和深入的弱点识别。我们的框架在发现缺陷方面取得了显著成功，对于ChatGPT和Claude等知名模型，识别成功率超过30％。更重要的是，这些确定的弱点可以指导特定模型的改进，证明比像Self-Instruct这样的非定向数据增强方法更有效。我们的方法已经在流行的LLMs中取得了显著的增强，包括Llama系列和Mistral-7b，将它们在几个基准测试中的性能提高了超过10％。代码和数据可在https://github.com/thu-coai/AutoDetect上公开获得。

更新时间: 2024-06-24 15:16:45

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.16714v1

Improving the Diversity of Bootstrapped DQN by Replacing Priors With Noise

Q-learning is one of the best-known reinforcement learning algorithms, and there have been tremendous efforts to extend it with neural networks. The Bootstrapped Deep Q-Learning Network is one of them: it uses multiple neural network heads to introduce diversity into Q-learning. Diversity can sometimes be viewed as the number of reasonable moves an agent can take in a given state, analogous to the exploration ratio in RL. Thus, the performance of the Bootstrapped Deep Q-Learning Network is closely connected to the level of diversity within the algorithm. The original research pointed out that a random prior could improve the model's performance. In this article, we further explore replacing the priors with noise sampled from a Gaussian distribution to introduce more diversity into the algorithm. We conduct experiments on the Atari benchmark and compare our algorithm with both the original and other related algorithms. The results show that our modification of the Bootstrapped Deep Q-Learning algorithm achieves significantly higher evaluation scores across different types of Atari games. We therefore conclude that replacing priors with noise can improve the performance of Bootstrapped Deep Q-Learning by preserving diversity among its heads.
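In Bootstrapped DQN, one head is sampled per episode and the agent acts greedily with respect to that head. A minimal sketch of acting with a noise-perturbed head, assuming the noise is Gaussian with a hypothetical standard deviation `noise_sigma` and is resampled at every action selection; the abstract does not say where or how often the noise is drawn, nor how it enters the training targets.

```python
import random
from typing import List, Sequence


def noisy_q_values(head_q: Sequence[float], noise_sigma: float, rng: random.Random) -> List[float]:
    """One head's Q-values plus independent Gaussian noise (stand-in for a fixed random prior)."""
    return [q + rng.gauss(0.0, noise_sigma) for q in head_q]


def select_action(base_q: Sequence[Sequence[float]], head: int, noise_sigma: float,
                  rng: random.Random) -> int:
    """Greedy action for the head sampled for this episode, after adding noise."""
    qs = noisy_q_values(base_q[head], noise_sigma, rng)
    return max(range(len(qs)), key=qs.__getitem__)


rng = random.Random(0)
# base_q[k][a]: head k's learned estimate for action a (made-up numbers, three actions).
base_q = [[1.0, 3.0, 2.0], [5.0, 0.0, 0.0], [0.0, 0.0, 4.0]]
head = rng.randrange(len(base_q))  # Bootstrapped DQN: one head per episode
action = select_action(base_q, head, noise_sigma=0.5, rng=rng)
print(head, action)
```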

Updated: 2024-06-24 15:09:35

标题: 用噪声替换先验知识改进引导式深度 Q 网络的多样性

摘要: Q学习是最知名的强化学习算法之一。已经进行了大量工作，使用神经网络开发这种算法。自举深度Q学习网络就是其中之一。它利用多个神经网络头部为Q学习引入多样性。多样性有时可以被视为代理在给定状态下可以采取的合理动作数量，类似于RL中探索比率的定义。因此，自举深度Q学习网络的性能与算法内的多样性水平密切相关。在原始研究中指出，随机先验可以提高模型的性能。在本文中，我们进一步探讨了用噪声替换先验的可能性，并从高斯分布中对噪声进行采样，以增加该算法的多样性。我们在Atari基准上进行实验，并将我们的算法与原始算法和其他相关算法进行比较。结果显示，我们对自举深度Q学习算法的修改在不同类型的Atari游戏中实现了显著更高的评估分数。因此，我们得出结论，用噪声替换先验可以通过保持多样性来提高自举深度Q学习的性能。

更新时间: 2024-06-24 15:09:35

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2203.01004v3

CausalFormer: An Interpretable Transformer for Temporal Causal Discovery

Temporal causal discovery is a crucial task aimed at uncovering the causal relations within time series data. The latest temporal causal discovery methods usually train deep learning models on prediction tasks to uncover the causality between time series. They capture causal relations by analyzing the parameters of some components of the trained models, e.g., attention weights and convolution weights. However, this is an incomplete mapping from model parameters to causality: it ignores other components, e.g., fully connected layers and activation functions, that are also significant for causal discovery. To make use of the entire deep learning model in temporal causal discovery, we propose an interpretable transformer-based causal discovery model termed CausalFormer, which consists of a causality-aware transformer and a decomposition-based causality detector. The causality-aware transformer learns the causal representation of time series data through a prediction task, using the designed multi-kernel causal convolution, which aggregates each input time series along the temporal dimension under the temporal priority constraint. Then, the decomposition-based causality detector interprets the global structure of the trained causality-aware transformer with the proposed regression relevance propagation to identify potential causal relations and finally construct the causal graph. Experiments on synthetic, simulated, and real datasets demonstrate the state-of-the-art performance of CausalFormer in discovering temporal causality. Our code is available at https://github.com/lingbai-kong/CausalFormer.
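A causal convolution enforces temporal priority by letting the output at time t depend only on inputs at times up to t. A minimal sketch, assuming one kernel per (target, source) pair as a reading of "multi-kernel", zero-padding for missing history, and an optional flag that also excludes the current time step; the exact constraint used by CausalFormer may differ.

```python
from typing import List, Sequence


def causal_conv1d(series: Sequence[float], kernel: Sequence[float],
                  exclude_current: bool = False) -> List[float]:
    """y[t] = sum_j kernel[j] * x[t - j - offset]; never reads x at times after t.

    kernel[j] weights lag j + offset, where offset is 1 if exclude_current else 0.
    Missing history before t = 0 is treated as zero.
    """
    offset = 1 if exclude_current else 0
    out = []
    for t in range(len(series)):
        acc = 0.0
        for j, w in enumerate(kernel):
            s = t - j - offset
            if s >= 0:
                acc += w * series[s]
        out.append(acc)
    return out


def multi_kernel_causal_conv(inputs: Sequence[Sequence[float]],
                             kernels: Sequence[Sequence[Sequence[float]]],
                             exclude_current: bool = False) -> List[List[List[float]]]:
    """kernels[i][k] filters source series k for target i; per-source outputs are kept separate."""
    return [[causal_conv1d(inputs[k], kernels[i][k], exclude_current) for k in range(len(inputs))]
            for i in range(len(kernels))]


print(causal_conv1d([1.0, 2.0, 3.0], [1.0, 1.0]))                        # [1.0, 3.0, 5.0]
print(causal_conv1d([1.0, 2.0, 3.0], [1.0, 1.0], exclude_current=True))  # [0.0, 1.0, 3.0]
```

Keeping each source's contribution separate, rather than summing immediately, is what makes it possible to inspect which source drives which target afterwards.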

Updated: 2024-06-24 15:09:29

标题: CausalFormer：一种用于时间因果发现的可解释Transformer

摘要: 时间因果发现是一个关键任务，旨在揭示时间序列数据中的因果关系。最新的时间因果发现方法通常在预测任务上训练深度学习模型，以揭示时间序列之间的因果关系。它们通过分析训练模型中的某些组件的参数（例如注意力权重和卷积权重）来捕捉因果关系。然而，这是一个从模型参数到因果关系的不完全映射过程，未能调查其他组件（例如全连接层和激活函数），这些组件对因果发现也很重要。为了促进在时间因果发现中利用整个深度学习模型，我们提出了一种可解释的基于Transformer的因果发现模型，称为CausalFormer，它由具有因果意识的Transformer和基于分解的因果检测器组成。因果感知Transformer使用设计的多核因果卷积在时间优先约束下沿时间维度聚合每个输入时间序列，学习时间序列数据的因果表示。然后，基于分解的因果检测器利用提出的回归相关传播解释训练的因果感知Transformer的全局结构，识别潜在的因果关系，并最终构建因果图。在合成、模拟和真实数据集上进行的实验表明CausalFormer在发现时间因果方面具有最先进的性能。我们的代码可在https://github.com/lingbai-kong/CausalFormer 上找到。

更新时间: 2024-06-24 15:09:29

领域: cs.LG,stat.ME

下载: http://arxiv.org/abs/2406.16708v1

Probabilistic Subgoal Representations for Hierarchical Reinforcement learning

In goal-conditioned hierarchical reinforcement learning (HRL), a high-level policy specifies a subgoal for the low-level policy to reach. Effective HRL hinges on a suitable subgoal representation function that abstracts the state space into a latent subgoal space and induces varied low-level behaviors. Existing methods adopt a subgoal representation that provides a deterministic mapping from the state space to the latent subgoal space. Instead, this paper uses Gaussian Processes (GPs) to obtain the first probabilistic subgoal representation. Our method places a GP prior on the latent subgoal space to learn a posterior distribution over subgoal representation functions, while exploiting long-range correlations in the state space through learnable kernels. This yields an adaptive memory that integrates long-range subgoal information from prior planning steps, allowing the agent to cope with stochastic uncertainties. Furthermore, we propose a novel learning objective that enables probabilistic subgoal representations and policies to be learned simultaneously within a unified framework. In experiments, our approach outperforms state-of-the-art baselines not only on standard benchmarks but also in environments with stochastic elements and under diverse reward conditions. Additionally, our model shows promising capabilities in transferring low-level policies across different tasks.
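The probabilistic element is a GP posterior: instead of one deterministic value, each query point gets a mean and a variance. A minimal pure-Python sketch of GP regression on a 1-D input with an RBF kernel, assuming fixed `lengthscale`, `variance`, and `noise` hyperparameters; in the paper the kernel is learnable and the GP is placed on the latent subgoal space, which this sketch does not model.

```python
import math
from typing import List, Sequence, Tuple


def rbf(a: float, b: float, lengthscale: float, variance: float) -> float:
    """Squared-exponential kernel k(a, b) = variance * exp(-(a - b)^2 / (2 * lengthscale^2))."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = m (m must be symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / L[j][j]
    return L


def forward_sub(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L x = b for lower-triangular L."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward_sub_transposed(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L^T x = b using the lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(xs: Sequence[float], ys: Sequence[float], x_star: float,
                 lengthscale: float = 1.0, variance: float = 1.0,
                 noise: float = 1e-6) -> Tuple[float, float]:
    """Posterior mean and variance of f(x_star) under a zero-mean GP prior."""
    K = [[rbf(a, b, lengthscale, variance) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward_sub_transposed(L, forward_sub(L, list(ys)))  # K^{-1} y
    k_star = [rbf(x_star, a, lengthscale, variance) for a in xs]
    mean = sum(k * al for k, al in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = rbf(x_star, x_star, lengthscale, variance) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)


xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
for x in (1.0, 1.5, 10.0):
    m, s2 = gp_posterior(xs, ys, x)
    print(f"x={x}: mean={m:.3f}, var={s2:.3f}")  # variance grows away from the data
```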

Updated: 2024-06-24 15:09:22

标题: Hierarchical Reinforcement Learning的概率子目标表示

摘要: 在目标条件层次强化学习（HRL）中，高层策略指定一个子目标供低层策略达到。有效的HRL取决于合适的子目标表示函数，将状态空间抽象为潜在的子目标空间，并诱导出不同的低层行为。现有方法采用提供从状态空间到潜在子目标空间的确定性映射的子目标表示。相反，本文首次利用高斯过程（GPs）进行概率子目标表示。我们的方法在潜在子目标空间上采用GP先验来学习子目标表示函数的后验分布，同时利用可学习的核函数在状态空间中利用长程相关性。这使得具有自适应记忆，能够整合来自先前规划步骤的长程子目标信息，从而能够应对随机不确定性。此外，我们提出了一种新颖的学习目标，以促进在统一框架内同时学习概率子目标表示和策略。在实验中，我们的方法在标准基准测试中表现优于最先进的基线，同时在具有随机元素和不同奖励条件的环境中也具有优势。此外，我们的模型在不同任务之间转移低层策略方面展现了有希望的能力。

更新时间: 2024-06-24 15:09:22

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.16707v1

Incorporating temporal dynamics of mutations to enhance the prediction capability of antiretroviral therapy's outcome for HIV-1

Motivation: In predicting HIV therapy outcomes, a critical clinical question is whether using historical information can enhance predictive capability compared with analyzing only current or latest available data. This study analyzes whether historical knowledge, which includes the viral mutations detected in all genotypic tests before therapy, their temporal occurrence, and concomitant viral load measurements, can bring improvements. We introduce a method to weigh mutations that considers these factors and the Stanford reference mutation-drug resistance tables. We compare a model that incorporates history (H) with one that does not (NH). Results: The H-model demonstrates superior discriminative ability, with a higher ROC-AUC score (76.34%) than the NH-model (74.98%). Significant Wilcoxon test results confirm that incorporating historical information consistently improves predictive accuracy for treatment outcomes. The better performance of the H-model might be attributed to its accounting for latent HIV reservoirs, which historical information probably helps capture. The findings emphasize the importance of the temporal dynamics of mutations, offering insights into the complexities of HIV infection. However, our results also show that prediction accuracy remains relatively high even when no historical information is available. Supplementary information: Supplementary material is available.
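The comparison above (76.34% vs 74.98%) is in terms of ROC-AUC, which equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that definition, with ties counted as one half; the example scores are made up and do not come from the study, and the pairwise loop is O(n²), so a rank-based implementation is preferable for large datasets.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


# Made-up model scores for four cases; label 1 marks the positive outcome class.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```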

Updated: 2024-06-24 15:07:56

Subjects: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2311.04846v2

A Systematic Review of Few-Shot Learning in Medical Imaging

The lack of annotated medical images limits the performance of deep learning models, which usually need large-scale labelled datasets. Few-shot learning techniques can reduce data scarcity issues and enhance medical image analysis, especially with meta-learning. This systematic review gives a comprehensive overview of few-shot learning in medical imaging. We searched the literature systematically and selected 80 relevant articles published from 2018 to 2023. We clustered the articles based on three criteria:
- medical outcomes, such as tumour segmentation, disease classification, and image registration;
- the anatomical structure investigated (e.g. heart, lung);
- the meta-learning method used.

For each cluster, we examined the papers' distributions and the results reported by the state of the art. In addition, we identified a generic pipeline shared among all the studies. The review shows that few-shot learning can overcome data scarcity in most outcomes. Meta-learning is a popular choice for few-shot learning because it can adapt to new tasks with few labelled samples. After meta-learning, supervised and semi-supervised learning stand out as the predominant techniques for tackling few-shot learning challenges in medical imaging, and they are also the best performing. Lastly, we observed that the primary application areas are predominantly cardiac, pulmonary, and abdominal. This systematic review aims to inspire further research to improve medical image analysis and patient care.

Updated: 2024-06-24 15:03:52

Subjects: cs.CV,cs.AI,I.2.6; I.4; I.5; J.3

Download: http://arxiv.org/abs/2309.11433v2

Learning Interpretable Fair Representations

Numerous approaches have been recently proposed for learning fair representations that mitigate unfair outcomes in prediction tasks. A key motivation for these methods is that the representations can be used by third parties with unknown objectives. However, current fair representations are generally not interpretable. As a result, a third party cannot use them for exploration or to obtain additional insights beyond the pre-contracted prediction tasks. Thus, to increase data utility beyond prediction tasks, we argue that the representations need to be fair, yet interpretable. We propose a general framework for learning interpretable fair representations by introducing an interpretable "prior knowledge" during the representation learning process. We implement this idea and conduct experiments with the ColorMNIST and Dsprite datasets. The results indicate that our representations are interpretable and also attain slightly higher accuracy and fairer outcomes in a downstream classification task than state-of-the-art fair representations.

Updated: 2024-06-24 15:01:05

Subjects: cs.LG,cs.CY

Download: http://arxiv.org/abs/2406.16698v1

Expected Runtime Comparisons Between Breadth-First Search and Constant-Depth Restarting Random Walks

When greedy search algorithms encounter a local minimum or plateau, the search typically devolves into a breadth-first search (BrFS), or a local search technique is used in an attempt to find a way out. In this work, we formally analyze the performance of BrFS and constant-depth restarting random walks (RRW), two methods often used for finding exits from plateaus and local minima, to better understand when each is best suited. In particular, we formally derive the expected runtime for BrFS in the case of a uniformly distributed set of goals at a given goal depth. We then prove that RRW will be faster than BrFS on trees if there are enough goals at that goal depth. We refer to this threshold as the crossover point. Our bound shows that the crossover point grows linearly with the branching factor of the tree, the goal depth, and the error in the random walk depth. In contrast, the size of the tree grows exponentially in the branching factor and goal depth. Finally, we discuss the practical implications and applicability of this bound.
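
As a rough illustration of the crossover idea, the sketch below uses a deliberately simplified model. This model is an assumption and not the paper's derivation:
- a uniform tree with branching factor `b`;
- `g` goals placed uniformly at random among the `b**d` nodes at goal depth `d`;
- a BrFS cost that counts every shallower node plus the expected position of the first goal at depth `d`;
- random walks that always stop exactly at depth `d` (no depth error) and cost `d` steps each.

The helper names are hypothetical.

```python
from fractions import Fraction
from typing import Optional


def brfs_expected_cost(b: int, d: int, g: int) -> Fraction:
    """Expected nodes BrFS examines before reaching a goal (simplified model).

    Every node at depths 0..d-1 is examined first; at depth d the first of the
    g uniformly placed goals among n = b**d candidates sits at expected
    position (n + 1) / (g + 1).
    """
    assert b >= 2 and d >= 1 and 1 <= g <= b ** d
    n = b ** d
    shallower = (n - 1) // (b - 1)  # 1 + b + ... + b**(d-1)
    return Fraction(shallower) + Fraction(n + 1, g + 1)


def rrw_expected_cost(b: int, d: int, g: int) -> Fraction:
    """Expected steps for restarting random walks that always stop at depth d.

    Each walk ends at a uniformly random depth-d node, so it succeeds with
    probability g / n; the number of walks is geometric with mean n / g and
    each walk costs d steps.
    """
    assert b >= 2 and d >= 1 and 1 <= g <= b ** d
    n = b ** d
    return Fraction(d * n, g)


def crossover_point(b: int, d: int) -> Optional[int]:
    """Smallest goal count g at which RRW is expected to beat BrFS, if any."""
    for g in range(1, b ** d + 1):
        if rrw_expected_cost(b, d, g) < brfs_expected_cost(b, d, g):
            return g
    return None


for b, d in [(2, 3), (2, 10), (4, 6)]:
    print(f"b={b}, d={d}, leaves={b ** d}, crossover g={crossover_point(b, d)}")
```

Under this simplified model, comparing d*n/g with roughly n/(b-1) suggests a crossover near g = d*(b-1) for large trees. That value is linear in both b and d, while the number of leaves b**d grows exponentially. The paper's bound also accounts for error in the walk depth, which this sketch omits.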

Updated: 2024-06-24 15:00:59

Subjects: cs.AI

Download: http://arxiv.org/abs/2406.16697v1

Public Constitutional AI

We are increasingly subjected to the power of AI authorities. As AI decisions become inescapable, entering domains such as healthcare, education, and law, we must confront a vital question: how can we ensure AI systems have the legitimacy necessary for effective governance? This essay argues that to secure AI legitimacy, we need methods that engage the public in designing and constraining AI systems, ensuring these technologies reflect the community's shared values. Constitutional AI, proposed by Anthropic, represents a step towards this goal, offering a model for democratic control of AI. However, while Constitutional AI's commitment to hardcoding explicit principles into AI models enhances transparency and accountability, it falls short in two crucial aspects: addressing the opacity of individual AI decisions and fostering genuine democratic legitimacy. To overcome these limitations, this essay proposes "Public Constitutional AI." This approach envisions a participatory process where diverse stakeholders, including ordinary citizens, deliberate on the principles guiding AI development. The resulting "AI Constitution" would carry the legitimacy of popular authorship, grounding AI governance in the public will. Furthermore, the essay proposes "AI Courts" to develop "AI case law," providing concrete examples for operationalizing constitutional principles in AI training. This evolving combination of constitutional principles and case law aims to make AI governance more responsive to public values. By grounding AI governance in deliberative democratic processes, Public Constitutional AI offers a path to imbue automated authorities with genuine democratic legitimacy, addressing the unique challenges posed by increasingly powerful AI systems while ensuring their alignment with the public interest.

Updated: 2024-06-24 15:00:01

Subjects: cs.CY,cs.AI

Download: http://arxiv.org/abs/2406.16696v1

GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

As large language models (LLMs) continue to develop and gain widespread application, the ability of LLMs to exhibit empathy towards diverse group identities and understand their perspectives is increasingly recognized as critical. Most existing benchmarks for empathy evaluation of LLMs focus primarily on universal human emotions, such as sadness and pain, often overlooking the context of individuals' group identities. To address this gap, we introduce GIEBench, a comprehensive benchmark that includes 11 identity dimensions, covering 97 group identities with a total of 999 single-choice questions related to specific group identities. GIEBench is designed to evaluate the empathy of LLMs when presented with specific group identities such as gender, age, occupation, and race, emphasizing their ability to respond from the standpoint of the identified group. This supports the ongoing development of empathetic LLM applications tailored to users with different identities. Our evaluation of 23 LLMs revealed that while these LLMs understand different identity standpoints, they fail to consistently exhibit equal empathy across these identities without explicit instructions to adopt those perspectives. This highlights the need for improved alignment of LLMs with diverse values to better accommodate the multifaceted nature of human identities. Our datasets are available at https://github.com/GIEBench/GIEBench.

Updated: 2024-06-24 14:57:18

Subjects: cs.AI

Download: http://arxiv.org/abs/2406.14903v2

Deep Reinforcement Learning with Swin Transformers

Transformers are neural network models that utilize multiple layers of self-attention heads and have exhibited enormous potential in natural language processing tasks. Meanwhile, there have been efforts to adapt transformers to visual tasks in machine learning, including Vision Transformers and Swin Transformers. Although some researchers use Vision Transformers for reinforcement learning tasks, their experiments remain at a small scale due to the high computational cost. This article presents the first online reinforcement learning scheme based on Swin Transformers: Swin DQN. In contrast to existing research, our novel approach demonstrates superior performance in experiments on 49 games in the Arcade Learning Environment. The results show that our approach achieves significantly higher maximal evaluation scores than the baseline method in 45 of the 49 games (92%). It also achieves higher mean evaluation scores than the baseline method in 40 of the 49 games (82%).

Updated: 2024-06-24 14:54:37

Subjects: cs.LG

Download: http://arxiv.org/abs/2206.15269v4

Coding schemes in neural networks learning classification tasks

Neural networks possess the crucial ability to generate meaningful representations of task-dependent features. Indeed, with appropriate scaling, supervised learning in neural networks can result in strong, task-dependent feature learning. However, the nature of the emergent representations, which we call the 'coding scheme', is still unclear. To understand the emergent coding scheme, we investigate fully-connected, wide neural networks learning classification tasks. We use the Bayesian framework, in which learning shapes the posterior distribution of the network weights. Consistent with previous findings, our analysis of the feature learning regime (also known as the 'non-lazy', 'rich', or 'mean-field' regime) shows that the networks acquire strong, data-dependent features. Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity. In linear networks, an analog coding scheme of the task emerges. Despite the strong representations, the mean predictor is identical to the lazy case. In nonlinear networks, spontaneous symmetry breaking leads to either redundant or sparse coding schemes. Our findings highlight how network properties such as scaling of weights and neuronal nonlinearity can profoundly influence the emergent representations.

Updated: 2024-06-24 14:50:05

Subjects: cs.LG,cond-mat.dis-nn,cond-mat.stat-mech,stat.ML

Download: http://arxiv.org/abs/2406.16689v1

Watermark Stealing in Large Language Models

LLM watermarking has attracted attention as a promising way to detect AI-generated content, with some works suggesting that current schemes may already be fit for deployment. In this work we dispute this claim, identifying watermark stealing (WS) as a fundamental vulnerability of these schemes. An attacker can query the API of the watermarked LLM to approximately reverse-engineer the watermark. We show that this enables practical spoofing attacks, as hypothesized in prior work. It also greatly boosts scrubbing attacks, which had previously gone unnoticed. We are the first to propose an automated WS algorithm and use it in the first comprehensive study of spoofing and scrubbing in realistic settings. We show that for under $50 an attacker can both spoof and scrub state-of-the-art schemes previously considered safe, with an average success rate of over 80%. Our findings challenge common beliefs about LLM watermarking, stressing the need for more robust schemes. We make all our code and additional examples available at https://watermark-stealing.org.
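
The following toy makes the idea of approximately reverse-engineering a watermark concrete. It is deliberately simplified and is not the paper's WS algorithm. It assumes a watermark that boosts a hidden set of "green" tokens. An attacker who collects watermarked API outputs and some non-watermarked reference text can then flag tokens that appear markedly more often in the watermarked outputs. The function name, the add-one smoothing and the threshold are illustrative assumptions. Real schemes typically make the green set depend on the preceding context, which this sketch ignores.

```python
from collections import Counter
from typing import Iterable, Set


def estimate_green_tokens(
    watermarked: Iterable[str],
    reference: Iterable[str],
    ratio_threshold: float = 2.0,
) -> Set[str]:
    """Guess which tokens a watermark favours from relative-frequency ratios.

    Hypothetical helper: compares each token's smoothed relative frequency in
    watermarked outputs against a non-watermarked reference corpus. Add-one
    smoothing keeps tokens unseen in one corpus from causing division by zero.
    """
    wm_counts = Counter(watermarked)
    ref_counts = Counter(reference)
    vocab = set(wm_counts) | set(ref_counts)
    wm_total = sum(wm_counts.values()) + len(vocab)
    ref_total = sum(ref_counts.values()) + len(vocab)
    green = set()
    for token in vocab:
        wm_freq = (wm_counts[token] + 1) / wm_total
        ref_freq = (ref_counts[token] + 1) / ref_total
        if wm_freq / ref_freq >= ratio_threshold:
            green.add(token)
    return green


print(estimate_green_tokens(["apple"] * 6 + ["the"] * 4, ["apple"] + ["the"] * 9))
# {'apple'}
```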

Updated: 2024-06-24 14:48:29

Subjects: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2402.19361v2

Link Prediction with Untrained Message Passing Layers

Message passing neural networks (MPNNs) operate on graphs by exchanging information between neighbouring nodes. MPNNs have been successfully applied to various node-, edge-, and graph-level tasks in areas like molecular science, computer vision, natural language processing, and combinatorial optimization. However, most MPNNs require training on large amounts of labeled data, which can be costly and time-consuming. In this work, we explore the use of various untrained message passing layers in graph neural networks. These are variants of popular message passing architectures from which we remove all trainable parameters used to transform node features in the message passing step. Focusing on link prediction, we find that untrained message passing layers can lead to competitive and even superior performance compared to fully trained MPNNs, especially in the presence of high-dimensional features. We provide a theoretical analysis of untrained message passing by relating the inner products of features implicitly produced by untrained message passing layers to path-based topological node similarity measures. As such, untrained message passing architectures can be viewed as a highly efficient and interpretable approach to link prediction.
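
Here is a minimal sketch of the intuition behind relating untrained message passing to path-based similarity. It assumes plain sum aggregation over an unnormalised, symmetric adjacency matrix and one-hot (identity) input features; the layer variants studied in the paper may normalise differently. With identity features, one parameter-free layer gives each node its adjacency row. The inner product of two nodes' features then counts their common neighbours, i.e. paths of length 2 between them. The function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of two row-major matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def untrained_message_passing(adj: Matrix, features: Matrix, layers: int) -> Matrix:
    """Propagate features with no trainable weights: H <- A @ H per layer."""
    h = features
    for _ in range(layers):
        h = matmul(adj, h)  # each node sums its neighbours' features
    return h


def link_score(h: Matrix, u: int, v: int) -> float:
    """Score a candidate edge (u, v) by the inner product of node features."""
    return sum(x * y for x, y in zip(h[u], h[v]))


# A 4-cycle with edges 0-1, 0-2, 1-3, 2-3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
h = untrained_message_passing(adj, identity, layers=1)
print(link_score(h, 0, 3))  # common neighbours of 0 and 3 are nodes 1 and 2 -> 2.0
```

More generally, for a symmetric A and identity features, k such layers give features A^k. Their inner products are then the entries of A^(2k), i.e. counts of walks of length 2k between the two nodes.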

Updated: 2024-06-24 14:46:34

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16687v1

Repulsive Score Distillation for Diverse Sampling of Diffusion Models

Score distillation sampling has been pivotal for integrating diffusion models into the generation of complex visuals. Despite impressive results, it suffers from mode collapse and lack of diversity. To cope with this challenge, we leverage the gradient flow interpretation of score distillation to propose Repulsive Score Distillation (RSD). In particular, we propose a variational framework based on the repulsion of an ensemble of particles, which promotes diversity. We use a variational approximation that incorporates a coupling among particles. Under this approximation, the repulsion appears as a simple regularization that lets particles interact based on their relative pairwise similarity, measured, e.g., via radial basis function kernels. We design RSD for both unconstrained and constrained sampling scenarios. For constrained sampling, we focus on inverse problems in the latent space. This leads to an augmented variational formulation that strikes a good balance between compute, quality and diversity. Our extensive experiments on text-to-image generation and inverse problems demonstrate that RSD achieves a superior trade-off between diversity and quality compared with state-of-the-art alternatives.
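
The repulsion term can be illustrated with a toy particle system. This is a simplification and not the paper's RSD objective:
- A standard-normal score, -x, stands in for the diffusion-model score-distillation gradient.
- The particles are scalars.
- Repulsion is gradient descent on the sum of pairwise RBF-kernel similarities k(x_i, x_j) = exp(-(x_i - x_j)^2 / (2h^2)).

The bandwidth `h`, the weight `lam` and the step size are assumed values. Without repulsion the particles collapse onto the mode; with it they settle at distinct points.

```python
import math
from typing import List, Sequence


def rbf_repulsion(particles: Sequence[float], h: float) -> List[float]:
    """Negative gradient of sum_{i<j} exp(-(x_i - x_j)^2 / (2 h^2)) w.r.t. each x_i.

    Moving particles along this direction lowers their pairwise similarity,
    i.e. pushes them apart.
    """
    forces = []
    for i, xi in enumerate(particles):
        f = 0.0
        for j, xj in enumerate(particles):
            if i != j:
                diff = xi - xj
                f += diff / h ** 2 * math.exp(-diff ** 2 / (2 * h ** 2))
        forces.append(f)
    return forces


def update(
    particles: Sequence[float],
    lam: float,
    h: float = 1.0,
    step: float = 0.1,
    iters: int = 200,
) -> List[float]:
    """Run score-following updates plus lam-weighted kernel repulsion."""
    xs = list(particles)
    for _ in range(iters):
        rep = rbf_repulsion(xs, h) if lam > 0 else [0.0] * len(xs)
        # Toy target: standard normal, whose score is -x (mode at 0).
        xs = [x + step * (-x + lam * r) for x, r in zip(xs, rep)]
    return xs


plain = update([-0.1, 0.1], lam=0.0)
repulsive = update([-0.1, 0.1], lam=1.0)
print(plain, repulsive)
```

For two symmetric particles at +a and -a with h = 1 and lam = 1, the fixed point satisfies 1 = 2 exp(-2a^2), so a = sqrt(ln(2) / 2), which is about 0.589. Without repulsion both particles shrink towards 0.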

Updated: 2024-06-24 14:43:02

Subjects: cs.LG,cs.CV

Download: http://arxiv.org/abs/2406.16683v1

A Comprehensive Review of Emerging Approaches in Machine Learning for De Novo PROTAC Design

Targeted protein degradation (TPD) is a rapidly growing field in modern drug discovery that aims to regulate the intracellular levels of proteins. It does so by harnessing the cell's innate degradation pathways to selectively target and degrade disease-related proteins. This strategy creates new opportunities for therapeutic intervention in cases where occupancy-based inhibitors have not been successful. Proteolysis-targeting chimeras (PROTACs) are at the heart of TPD strategies; they leverage the ubiquitin-proteasome system for the selective targeting and proteasomal degradation of pathogenic proteins. As the field evolves, it becomes increasingly apparent that the traditional methodologies for designing such complex molecules have limitations. This has led to the use of machine learning (ML) and generative modeling to improve and accelerate the development process. In this review, we explore the impact of ML on de novo PROTAC design. Despite its significance, this aspect of molecular design has not yet been comprehensively reviewed. We delve into the distinct characteristics of PROTAC linker design, underscoring the complexities of creating effective bifunctional molecules capable of TPD. We then examine how ML in the context of fragment-based drug design (FBDD), honed in the realm of small-molecule drug discovery, is paving the way for PROTAC linker design. Our review provides a critical evaluation of the limitations inherent in applying this method to the complex field of PROTAC development. Moreover, we review existing ML works applied to PROTAC design, highlighting pioneering efforts and, importantly, the limitations these studies face. By offering insights into the current state of PROTAC development and the integral role of ML in PROTAC design, we aim to provide valuable perspectives for researchers in their pursuit of better design strategies for this new modality.

Updated: 2024-06-24 14:42:27

Subjects: q-bio.BM,cs.LG

Download: http://arxiv.org/abs/2406.16681v1

Benchmarking mortality risk prediction from electrocardiograms

Several recent high-impact studies leverage large hospital-owned electrocardiographic (ECG) databases to model and predict patient mortality. MIMIC-IV, released in September 2023, is the first comparable public dataset and includes 800,000 ECGs from a U.S. hospital system. Previously, the largest public ECG dataset was Code-15, containing 345,000 ECGs collected during routine care in Brazil. These datasets now provide an excellent resource for a broader audience to explore ECG survival modeling. Here, we benchmark survival model performance on Code-15 and MIMIC-IV in three ways. First, we use two neural network architectures. Second, we compare four deep survival modeling approaches to Cox regressions trained on classifier outputs. Third, we evaluate performance at horizons of one to ten years. Our results yield AUROC and concordance scores comparable to past work (circa 0.8). They also yield reasonable AUPRC scores (MIMIC-IV: 0.4-0.5, Code-15: 0.05-0.13), considering the fraction of ECG samples linked to a death (MIMIC-IV: 27%, Code-15: 4%). When models are evaluated on the opposite dataset, AUROC and concordance values drop by 0.1-0.15, which may be due to cohort differences. All code and results are made public.
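
For readers unfamiliar with the concordance metric reported above, here is a minimal sketch of Harrell's C-index for right-censored survival data. The paper may use a different concordance variant or library implementation. A pair is comparable when the subject with the shorter follow-up time had the event. It is concordant when that subject also received the higher predicted risk, and tied risks count as one half. For context on the AUPRC numbers, a random predictor's expected AUPRC is roughly the positive fraction. The abstract gives that fraction as about 27% for MIMIC-IV and 4% for Code-15.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's concordance index: higher risk should mean earlier event."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # subject i had the event first
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1.0, 2.0, 3.0]
events = [1, 1, 0]  # the last subject is censored
print(harrell_c_index(times, events, [3.0, 2.0, 1.0]))  # 1.0: perfect ranking
```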

Updated: 2024-06-24 14:37:17

Subjects: eess.SP,cs.LG,cs.NE,stat.AP

Download: http://arxiv.org/abs/2406.17002v1

Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation

Segmenting text into sentences plays an early and crucial role in many NLP systems. This is commonly achieved by using rule-based or statistical methods relying on lexical features such as punctuation. Although some recent works no longer exclusively rely on punctuation, we find that no prior method achieves all of (i) robustness to missing punctuation, (ii) effective adaptability to new domains, and (iii) high efficiency. We introduce a new model - Segment any Text (SaT) - to solve this problem. To enhance robustness, we propose a new pretraining scheme that ensures less reliance on punctuation. To address adaptability, we introduce an extra stage of parameter-efficient fine-tuning, establishing state-of-the-art performance in distinct domains such as verses from lyrics and legal documents. Along the way, we introduce architectural modifications that result in a threefold gain in speed over the previous state of the art and solve spurious reliance on context far in the future. Finally, we introduce a variant of our model with fine-tuning on a diverse, multilingual mixture of sentence-segmented data, acting as a drop-in replacement and enhancement for existing segmentation tools. Overall, our contributions provide a universal approach for segmenting any text. Our method outperforms all baselines - including strong LLMs - across 8 corpora spanning diverse domains and languages, especially in practically relevant situations where text is poorly formatted. Our models and code, including documentation, are available at https://huggingface.co/segment-any-text under the MIT license.

Updated: 2024-06-24 14:36:11

Subjects: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.16678v1

Provable Adaptivity of Adam under Non-uniform Smoothness

Adam is widely adopted in practical applications due to its fast convergence. However, its theoretical analysis is still far from satisfactory. Existing convergence analyses for Adam rely on the bounded smoothness assumption, referred to as the \emph{L-smooth condition}. Unfortunately, this assumption does not hold for many deep learning tasks. Moreover, we believe that this assumption obscures the true benefit of Adam, as the algorithm can adapt its update magnitude according to local smoothness. This important feature of Adam becomes irrelevant when assuming globally bounded smoothness. This paper studies the convergence of randomly reshuffled Adam (RR Adam) with a diminishing learning rate, which is the version of Adam most commonly used in deep learning tasks. We present the first convergence analysis of RR Adam without the bounded smoothness assumption. We demonstrate that RR Adam can maintain its convergence properties when smoothness is linearly bounded by the gradient norm, referred to as the \emph{$(L_0, L_1)$-smooth condition}. We further compare Adam to SGD when both methods use a diminishing learning rate. We refine the existing lower bound of SGD and show that SGD can be slower than Adam. To our knowledge, this is the first time that Adam and SGD are rigorously compared in the same setting and the advantage of Adam is revealed.
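
The algorithm analysed here can be sketched under stated assumptions:
- standard Adam updates with bias correction (Kingma and Ba);
- a finite-sum objective whose component gradients are visited in a fresh random order every epoch;
- a learning rate decaying as eta0 / sqrt(epoch).

The paper's exact RR Adam variant and hyperparameters may differ. The $(L_0, L_1)$-smooth condition is commonly written as $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$, which lets curvature grow with the gradient norm instead of being globally bounded. The toy objective below, the mean of (x - a_i)^2, is an assumption chosen for illustration.

```python
import math
import random
from typing import Callable, Sequence


def rr_adam(
    grad_fns: Sequence[Callable[[float], float]],
    x0: float,
    epochs: int,
    eta0: float = 0.1,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
    seed: int = 0,
) -> float:
    """Adam with random reshuffling and a 1/sqrt(epoch) learning rate (1-D sketch)."""
    rng = random.Random(seed)
    x, m, v, t = x0, 0.0, 0.0, 0
    order = list(range(len(grad_fns)))
    for epoch in range(1, epochs + 1):
        lr = eta0 / math.sqrt(epoch)  # diminishing learning rate
        rng.shuffle(order)  # random reshuffling: each component once per epoch
        for i in order:
            t += 1
            g = grad_fns[i](x)
            m = beta1 * m + (1 - beta1) * g  # first-moment estimate
            v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
            m_hat = m / (1 - beta1 ** t)  # bias corrections
            v_hat = v / (1 - beta2 ** t)
            x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


targets = [1.0, 2.0, 3.0, 4.0]
# Component i is f_i(x) = (x - a_i)^2 with gradient 2 (x - a_i).
grads = [lambda x, a=a: 2.0 * (x - a) for a in targets]
x_final = rr_adam(grads, x0=0.0, epochs=500)
print(x_final)  # expected to land near the minimiser, the mean 2.5
```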

Updated: 2024-06-24 14:33:17

Subjects: cs.LG,math.OC

Download: http://arxiv.org/abs/2208.09900v2

Towards a Science Exocortex

Artificial intelligence (AI) methods are poised to revolutionize intellectual work, with generative AI enabling automation of text analysis, text generation, and simple decision making or reasoning. The impact on science is only just beginning, but the opportunity is significant, since scientific research relies fundamentally on extended chains of cognitive work. Here, we review the state of the art in agentic AI systems and discuss how these methods could be extended to have even greater impact on science. We propose the development of an exocortex, a synthetic extension of a person's cognition. A science exocortex could be designed as a swarm of AI agents, with each agent individually streamlining specific researcher tasks. The agents' inter-communication would lead to emergent behavior that greatly extends the researcher's cognition and volition.

Updated: 2024-06-24 14:32:32

Subjects: cs.AI

Download: http://arxiv.org/abs/2406.17809v1

CAVE: Controllable Authorship Verification Explanations

Authorship Verification (AV) (do two documents have the same author?) is essential for many sensitive real-life applications. AV is often used in proprietary domains that require a private, offline model, making SOTA online models like ChatGPT undesirable. Other SOTA systems use methods, e.g. Siamese Networks, that are uninterpretable, and hence cannot be trusted in high-stakes applications. In this work, we take the first step to address the above challenges with our model CAVE (Controllable Authorship Verification Explanations): CAVE generates free-text AV explanations that are controlled to be 1) structured (can be decomposed into sub-explanations with respect to relevant linguistic features), and 2) easily verified for explanation-label consistency (via intermediate labels in sub-explanations). In this work, we train a Llama-3-8B as CAVE; since there are no human-written corpora for AV explanations, we sample silver-standard explanations from GPT-4-TURBO and distill them into a pretrained Llama-3-8B. Results on three difficult AV datasets IMdB2, Blog-Auth, and FanFiction show that CAVE generates high quality explanations (as measured by automatic and human evaluation) as well as competitive task accuracies.

Updated: 2024-06-24 14:27:54

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.16672v1

Never Gonna Give You Up: Exploring Deprecated NULL Ciphers in Commercial VoWiFi Deployments

In today's cellular network generations, such as 4G and 5G, the IMS (IP Multimedia Subsystem) is a crucial component for managing voice calls and handling short messages. Besides accessing the IMS over the traditional radio layer, many operators offer Voice over Wi-Fi (VoWiFi). VoWiFi allows customers to dial into the operator's core network over the public Internet using an (insecure) Wi-Fi connection. To protect against malicious actors in the Wi-Fi or Internet domain, the traffic is sent over a series of IPsec tunnels, ensuring confidentiality and integrity. Similar to other encrypted protocols (e.g. TLS), the client and server use a handshake protocol (i.e., IKEv2). Through it they communicate their supported security configurations and agree on the parameters (e.g., keys or an encryption algorithm) used for the ongoing session. However, this negotiation opens the door to security vulnerabilities introduced by misconfiguration. We aim to analyze security configurations within commercial VoWiFi deployments, on both the client and server side, spotting deprecated configurations that undermine communication security.
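
A configuration audit of the kind described can be sketched as a check of each offered IKEv2 proposal against a deny-list of deprecated transforms. The transform labels and the deny-list below are illustrative assumptions. They are loosely modelled on the IANA IKEv2 transform names and on the guidance in RFC 8247 (IKEv2 algorithm requirements) and RFC 8221 (ESP/AH algorithm requirements). `MODP_768` and `MODP_1024` are shorthand for Diffie-Hellman groups 1 and 2. A real audit should consult those RFCs directly; capturing and parsing live IKE_SA_INIT traffic is out of scope here.

```python
from typing import Dict, List, Sequence

# Illustrative deny-list (assumption): transforms widely treated as deprecated
# or insecure for IKEv2/ESP. Check RFC 8247 and RFC 8221 before relying on it.
DEPRECATED_TRANSFORMS = {
    "ENCR_NULL",         # no confidentiality at all
    "ENCR_DES",          # 56-bit key, brute-forceable
    "AUTH_HMAC_MD5_96",  # MD5-based integrity
    "MODP_768",          # Diffie-Hellman group 1
    "MODP_1024",         # Diffie-Hellman group 2
}


def audit_proposals(proposals: Sequence[Sequence[str]]) -> Dict[int, List[str]]:
    """Map proposal index -> deprecated transforms it offers.

    Proposals without findings are omitted from the result.
    """
    findings: Dict[int, List[str]] = {}
    for idx, transforms in enumerate(proposals):
        flagged = [t for t in transforms if t in DEPRECATED_TRANSFORMS]
        if flagged:
            findings[idx] = flagged
    return findings


offered = [
    ["ENCR_AES_CBC", "AUTH_HMAC_SHA2_256_128", "MODP_2048"],
    ["ENCR_NULL", "AUTH_HMAC_SHA1_96", "MODP_1024"],
]
print(audit_proposals(offered))  # {1: ['ENCR_NULL', 'MODP_1024']}
```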

Updated: 2024-06-24 14:24:15

标题: 永不放弃你：探索商用VoWiFi部署中废弃的NULL加密算法

摘要: 在当今的蜂窝网络演进中，如4G和5G，IMS（IP多媒体子系统）作为管理语音通话和处理短消息的关键组件。除了通过传统的无线电层访问IMS外，许多运营商还使用Wi-Fi语音（VoWiFi），允许客户通过公共互联网使用（不安全的）Wi-Fi连接拨入其核心网络。为了防止WiFi或互联网领域的恶意行为者，流量通过一系列IPsec隧道发送，确保机密性和完整性。类似于其他加密协议（例如TLS），客户端和服务器使用握手协议（即IKEv2）来通信其支持的安全配置，并就正在进行的会话使用的参数（例如密钥或加密算法）达成一致。然而，这也为由于错误配置引入的安全漏洞打开了大门。我们想要分析商业VoWiFi部署中的安全配置，无论是在客户端还是服务器端，发现削弱通信安全性的废弃配置。

更新时间: 2024-06-24 14:24:15

领域: cs.NI,cs.CR

下载: http://arxiv.org/abs/2406.12348v2

Towards Theoretical Understandings of Self-Consuming Generative Models

This paper tackles the emerging challenge of training generative models within a self-consuming loop, wherein successive generations of models are recursively trained on mixtures of real and synthetic data from previous generations. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models, including parametric and non-parametric models. Specifically, for diffusion models with a one-hidden-layer neural network score function, we derive bounds on the total variation (TV) distance between the synthetic data distributions produced by future models and the original real data distribution under various mixed training scenarios. Our analysis demonstrates that this distance can be effectively controlled when the mixed training dataset sizes or the proportions of real data are large enough. Interestingly, we further uncover a phase transition as the amount of synthetic data grows, proving theoretically that the TV distance first increases and then declines beyond a threshold point. Finally, we present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
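As a toy illustration of the self-consuming loop, the sketch below replaces the paper's diffusion model with a one-dimensional Gaussian fitted by maximum likelihood. This substitution is an assumption made only to keep the example small. Each generation is refit on a fixed real dataset plus samples from the previous generation's model, and the TV distance to the true distribution N(0, 1) is computed numerically on a grid.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def tv_distance(mu1, s1, mu2, s2):
    """Numerical TV distance between two univariate Gaussians (Riemann sum on a grid)."""
    grid = np.linspace(-10.0, 10.0, 20_001)
    dx = grid[1] - grid[0]
    diff = np.abs(gaussian_pdf(grid, mu1, s1) - gaussian_pdf(grid, mu2, s2))
    return 0.5 * np.sum(diff) * dx

def self_consuming_loop(n_real, n_synth, generations, seed=0):
    rng = np.random.default_rng(seed)
    real = rng.normal(0.0, 1.0, size=n_real)       # fixed real dataset drawn from N(0, 1)
    mu, sigma = real.mean(), real.std()            # generation 0: fit on real data only
    tvs = [tv_distance(mu, sigma, 0.0, 1.0)]
    for _ in range(generations):
        synth = rng.normal(mu, sigma, size=n_synth)  # sample from the previous generation
        mixed = np.concatenate([real, synth])        # mix real and synthetic data
        mu, sigma = mixed.mean(), mixed.std()        # refit the next generation
        tvs.append(tv_distance(mu, sigma, 0.0, 1.0))
    return tvs
```

With many real samples per generation the distance stays small, matching the qualitative claim that a large real-data share keeps the loop under control; shrinking `n_real` relative to `n_synth` lets estimation errors compound instead.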

Updated: 2024-06-24 14:23:30

标题: 朝向自我消耗生成模型的理论理解

摘要: 本文解决了在自我消耗循环中训练生成模型所面临的新挑战，即连续的模型代代训练，使用来自先前代的真实和合成数据的混合。我们构建了一个理论框架，严格评估了这种训练过程对未来模型学习的数据分布的影响，包括参数化和非参数化模型。具体来说，我们推导了扩散模型的一层隐藏层神经网络评分函数在各种混合训练场景下产生的合成数据分布与原始真实数据分布之间的总变异（TV）距离的界限。我们的分析表明，在混合训练数据集的大小或真实数据比例足够大的条件下，这种距离可以有效控制。有趣的是，我们进一步揭示了由扩展合成数据量引起的相变，理论证明了虽然TV距离表现出初始上升，但超过一个阈值点后将下降。最后，我们提供了核密度估计的结果，提供了关于混合数据训练对误差传播的影响的微妙见解。

更新时间: 2024-06-24 14:23:30

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.11778v2

Cubic regularized subspace Newton for non-convex optimization

This paper addresses the problem of minimizing non-convex continuous functions, which is relevant to high-dimensional machine learning applications characterized by over-parametrization. We analyze a randomized coordinate second-order method named SSCN, which can be interpreted as applying cubic regularization in random subspaces. This approach reduces the computational cost of using second-order information, making it applicable in higher-dimensional settings. Theoretically, we establish convergence guarantees for non-convex functions, with rates that interpolate across arbitrary subspace sizes and that allow inexact curvature estimation. As the subspace size increases, our complexity matches the $\mathcal{O}(\epsilon^{-3/2})$ rate of cubic regularization (CR). Additionally, we propose an adaptive sampling scheme that ensures an exact convergence rate of $\mathcal{O}(\epsilon^{-3/2}, \epsilon^{-3})$ to a second-order stationary point, even without sampling all coordinates. Experimental results demonstrate that SSCN achieves substantial speed-ups over conventional first-order methods.
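A minimal sketch of one SSCN-style iteration, assuming a fixed regularization constant `M` and a fixed block size `tau`: sample a random coordinate block, restrict the gradient and Hessian to it, and take a cubic-regularized Newton step in that subspace. The cubic subproblem is solved by bisection on the step norm, which ignores the degenerate "hard case". For brevity the full gradient and Hessian are computed and then sliced; a practical implementation would compute only the sampled block. The adaptive sampling scheme from the abstract is not reproduced.

```python
import numpy as np

def solve_cubic_subproblem(g, H, M, n_bisect=100):
    """Minimize g@h + 0.5*h@H@h + (M/6)*||h||^3 via ||h(r)|| = r (hard case ignored)."""
    eye = np.eye(len(g))
    lam_min = np.linalg.eigvalsh(H)[0]
    r_lo = max(0.0, -2.0 * lam_min / M) + 1e-12   # keeps H + (M r / 2) I positive definite

    def h_of(r):
        return -np.linalg.solve(H + 0.5 * M * r * eye, g)

    r_hi = r_lo + 1.0
    while np.linalg.norm(h_of(r_hi)) > r_hi:        # grow the bracket until ||h(r)|| <= r
        r_hi *= 2.0
    for _ in range(n_bisect):                      # bisection on the fixed point ||h(r)|| = r
        r_mid = 0.5 * (r_lo + r_hi)
        if np.linalg.norm(h_of(r_mid)) > r_mid:
            r_lo = r_mid
        else:
            r_hi = r_mid
    return h_of(r_hi)

def sscn(grad, hess, x0, tau, M, iters, seed=0):
    """Cubic-regularized Newton steps restricted to random coordinate subspaces of size tau."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        S = rng.choice(x.size, size=tau, replace=False)  # random coordinate block
        g = grad(x)[S]
        H = hess(x)[np.ix_(S, S)]
        x[S] += solve_cubic_subproblem(g, H, M)
    return x
```

Usage on a separable double-well function, starting in a region of negative curvature where a plain Newton step would head toward the maximum at 0:

```python
grad = lambda x: x**3 - x
hess = lambda x: np.diag(3 * x**2 - 1)
x = sscn(grad, hess, np.full(10, 0.5), tau=3, M=10.0, iters=300)
```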

Updated: 2024-06-24 14:20:02

标题: 立方正则化子空间牛顿法用于非凸优化

摘要: 本文讨论了最小化非凸连续函数的优化问题，该问题在以过参数化为特征的高维机器学习应用中尤为相关。我们分析了一种名为SSCN的随机坐标二阶方法，可以解释为在随机子空间中应用三次正则化。这种方法有效地降低了利用二阶信息所带来的计算复杂性，使其适用于更高维的场景。在理论上，我们为非凸函数建立了收敛保证，其速率可在任意子空间大小之间插值，并允许不精确的曲率估计。当增加子空间大小时，我们的复杂度匹配了三次正则化（CR）的O(ε^(-3/2))速率。此外，我们提出了一种自适应抽样方案，即使不抽样所有坐标，也能确保以O(ε^(-3/2), ε^(-3))的精确收敛速度收敛到二阶稳定点。实验结果表明，与传统的一阶方法相比，SSCN实现了显著的加速。

更新时间: 2024-06-24 14:20:02

领域: cs.LG,cs.NA,math.NA,math.OC

下载: http://arxiv.org/abs/2406.16666v1

Deep Learning for Prediction and Classifying the Dynamical behaviour of Piecewise Smooth Maps

This paper explores predicting the dynamics of piecewise smooth maps with various deep learning models and presents several novel ways of doing so. Moreover, we use machine learning models such as Decision Tree Classifier, Logistic Regression, K-Nearest Neighbor, Random Forest, and Support Vector Machine to predict the border collision bifurcation in the 1D normal form map and the 1D tent map. Further, we classify the regular and chaotic behaviour of the 1D tent map and the 2D Lozi map using deep learning models such as Convolutional Neural Network (CNN), ResNet50, and ConvLSTM applied to cobweb diagrams and phase portraits. We also classify the chaotic and hyperchaotic behaviour of the 3D piecewise smooth map using deep learning models such as the Feed Forward Neural Network (FNN), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN). Finally, LSTM and RNN models are used to reconstruct the two-parameter charts of the 2D border collision bifurcation normal form map.
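A classifier for regular versus chaotic behaviour needs ground-truth labels. One common choice, assumed here and not necessarily the one used in the paper, is the sign of the largest Lyapunov exponent. The sketch estimates it for the Lozi map $x_{n+1} = 1 - a|x_n| + y_n$, $y_{n+1} = b x_n$ by propagating a tangent vector through the Jacobian and renormalizing at each step. The threshold in `label` is a hypothetical choice.

```python
import numpy as np

def lozi_lle(a, b, n_iter=20_000, n_transient=1_000, x0=(0.0, 0.0)):
    """Largest Lyapunov exponent of the Lozi map via tangent-vector renormalization."""
    x, y = x0
    v = np.array([1.0, 0.0])
    total = 0.0
    for n in range(n_transient + n_iter):
        J = np.array([[-a * np.sign(x), 1.0], [b, 0.0]])  # Jacobian at the current point
        x, y = 1.0 - a * abs(x) + y, b * x                 # advance the orbit
        v = J @ v
        norm = np.linalg.norm(v)
        v /= norm
        if n >= n_transient:                               # discard the transient
            total += np.log(norm)
    return total / n_iter

def label(a, b, threshold=1e-3):
    """Hypothetical labelling rule: positive exponent means chaotic."""
    return "chaotic" if lozi_lle(a, b) > threshold else "regular"
```

For the classical parameters a = 1.7, b = 0.5 the exponent is positive (chaotic attractor), whereas for a = b = 0.3 the orbit settles on a stable fixed point and the exponent is negative. Labels like these could then be attached to cobweb or phase-portrait images for a CNN.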

Updated: 2024-06-24 14:12:03

标题: 深度学习用于预测和分类分段光滑映射的动态行为

摘要: 本文探讨了使用各种深度学习模型预测分段光滑映射的动态。我们展示了使用深度学习模型预测分段光滑映射动态的各种新颖方法。此外，我们使用决策树分类器、逻辑回归、K-最近邻、随机森林和支持向量机等机器学习模型，预测1D正规形式映射和1D帐篷映射中的边界碰撞分叉。此外，我们使用卷积神经网络（CNN）、ResNet50和ConvLSTM通过蛛网图和相位图对1D帐篷映射和2D Lozi映射的规则和混沌行为进行分类。我们还使用前馈神经网络（FNN）、长短期记忆（LSTM）和循环神经网络（RNN）等深度学习模型对3D分段光滑映射的混沌和超混沌行为进行分类。最后，我们使用长短期记忆（LSTM）和循环神经网络（RNN）模型重建2D边界碰撞分叉正规形式映射的两个参数图。

更新时间: 2024-06-24 14:12:03

领域: cs.LG,nlin.CD

下载: http://arxiv.org/abs/2406.17001v1

Data-driven Modeling in Metrology -- A Short Introduction, Current Developments and Future Perspectives

Mathematical models are vital to metrology: informed by an understanding of the measurement process, they play a key role in deriving measurement results and calculating uncertainties from measurement data. These models generally represent the relationship between the quantity being measured and all other pertinent quantities. Such relationships are used to construct measurement systems that interpret measurement data to draw conclusions and make predictions about the measurement system itself. Classic models are typically analytical, built on fundamental physical principles. However, the rise of digital technology, expansive sensor networks, and high-performance computing hardware has led to a growing shift toward data-driven methodologies. This trend is especially prominent for large, intricate networked sensor systems operating in frequently changing real-world contexts where expert understanding is limited. Here, we demonstrate the variety of opportunities that data-driven modeling presents and how they have already been implemented in various real-world applications.
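To illustrate how a measurement model turns input uncertainties into an uncertainty for the result, the sketch compares first-order (GUM-style) linear propagation with Monte Carlo propagation in the spirit of JCGM 101. It uses a classic analytical model, electrical power $P = V^2 / R$, with made-up input values. In a data-driven setting, `power` would be replaced by a fitted regressor and only the Monte Carlo path would still apply unchanged.

```python
import numpy as np

def power(V, R):
    """Illustrative analytical measurement model: P = V^2 / R."""
    return V**2 / R

def mc_uncertainty(V, uV, R, uR, n=200_000, seed=0):
    """Monte Carlo propagation with independent Gaussian inputs."""
    rng = np.random.default_rng(seed)
    samples = power(rng.normal(V, uV, n), rng.normal(R, uR, n))
    return samples.mean(), samples.std(ddof=1)

def linear_uncertainty(V, uV, R, uR):
    """First-order propagation with sensitivity coefficients dP/dV and dP/dR."""
    cV, cR = 2 * V / R, -(V**2) / R**2
    return np.hypot(cV * uV, cR * uR)
```

For V = 10 V ± 0.1 V and R = 100 Ω ± 1 Ω, both methods give about 0.022 W; they agree here because the relative input uncertainties are small enough for the model to be nearly linear.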

Updated: 2024-06-24 14:09:45

标题: 计量学中的数据驱动建模——简介、当前发展和未来展望

摘要: 数学模型对计量领域至关重要，在从测量数据中推导出测量结果和计算不确定性方面起着关键作用，这些数据是根据对测量过程的理解得出的。这些模型通常代表被测量数量与所有其他相关数量之间的关联。这些关系用于构建测量系统，可以解释测量数据以生成关于测量系统本身的结论和预测。经典模型通常是基于基本物理原理构建的分析模型。然而，数字技术的崛起、广泛的传感器网络和高性能计算硬件导致了向以数据驱动的方法学转变的趋势增长。在处理大型复杂网络传感器系统的情况下，其中对频繁变化的现实世界背景了解有限时，这种趋势尤为显著。在这里，我们展示了数据驱动建模所提供的各种机会，以及它们如何已经在各种实际应用中实现。

更新时间: 2024-06-24 14:09:45

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2406.16659v1

Adaptively Perturbed Mirror Descent for Learning in Games

This paper proposes a payoff perturbation technique for the Mirror Descent (MD) algorithm in games where the gradient of the payoff functions is monotone in the strategy profile space and the gradient feedback may contain additive noise. The optimistic family of learning algorithms, exemplified by optimistic MD, achieves last-iterate convergence in noise-free settings, driving the dynamics to a Nash equilibrium. A recently re-emerging line of work highlights the promise of the perturbation approach, in which payoff functions are perturbed based on the distance from an anchoring, or slingshot, strategy. Building on this, we propose Adaptively Perturbed MD (APMD), which adjusts the magnitude of the perturbation by repeatedly updating the slingshot strategy at a predefined interval. This allows us to find a Nash equilibrium of the underlying game with guaranteed rates. Empirical results show that our algorithm converges significantly faster.
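A minimal sketch of the slingshot idea on a two-player zero-sum matrix game, assuming entropic MD (multiplicative weights) and a KL-divergence perturbation toward the slingshot strategy. The distance function, step size `eta`, perturbation strength `mu`, and update interval are illustrative choices, not the paper's settings. Every `update_every` steps the slingshot is reset to the current strategy, which is the adaptive part.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def apmd_zero_sum(A, x0, y0, eta=0.1, mu=0.1, update_every=500, steps=20_000):
    """Entropic MD where x maximizes x@A@y and y minimizes it, with payoffs perturbed
    by -mu * KL(pi || slingshot) and the slingshot reset at a fixed interval."""
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    sx, sy = x.copy(), y.copy()                    # slingshot (anchor) strategies
    for t in range(1, steps + 1):
        gx = A @ y                                 # payoff gradient for the maximizer
        gy = -A.T @ x                              # payoff gradient for the minimizer
        # The gradient of -mu*KL(pi||sigma) is -mu*(log pi - log sigma), up to a constant
        # that the softmax normalization removes.
        x_next = softmax(np.log(x) + eta * (gx - mu * (np.log(x) - np.log(sx))))
        y_next = softmax(np.log(y) + eta * (gy - mu * (np.log(y) - np.log(sy))))
        x, y = x_next, y_next
        if t % update_every == 0:                  # move the slingshot to the current iterate
            sx, sy = x.copy(), y.copy()
    return x, y
```

On matching pennies, `A = np.array([[1.0, -1.0], [-1.0, 1.0]])`, the last iterates approach the unique Nash equilibrium (0.5, 0.5) for both players. Unperturbed simultaneous MD on the same game tends to spiral away from it instead.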

Updated: 2024-06-24 14:06:29

标题: 自适应扰动镜像下降算法在博弈学习中的应用

摘要: 本文提出了一种用于镜像下降（MD）算法的收益扰动技术，适用于收益函数的梯度在策略概要空间中是单调的情况，可能包含附加噪声。乐观的学习算法家族，例如乐观MD，在缺乏噪声的情况下成功地实现了“最后迭代”收敛，导致动态达到纳什均衡。最近重新出现的趋势强调了扰动方法的前景，其中基于与锚定策略或“弹弓”策略的距离扰动收益函数。作为回应，我们提出了“自适应扰动MD”（APMD），通过在预定义的间隔内重复更新弹弓策略来调整扰动的大小。这一创新使我们能够以保证速率找到基础游戏的纳什均衡。实证证明了我们的算法展示出明显加速的收敛速度。

更新时间: 2024-06-24 14:06:29

领域: cs.GT,cs.LG

下载: http://arxiv.org/abs/2305.16610v5

Hyperbolic Random Forests

Hyperbolic space is becoming a popular choice for representing data because many real-world datasets have a hierarchical structure, whether implicit or explicit. This creates a need for algorithms that can solve fundamental tasks, such as classification, in hyperbolic space. Recently, multiple papers have investigated hyperbolic alternatives to hyperplane-based classifiers such as logistic regression and SVMs. While effective, these approaches struggle with more complex hierarchical data. We therefore propose to generalize the well-known random forests to hyperbolic space. We do this by redefining the notion of a split using horospheres. Since finding the globally optimal split is computationally intractable, we find candidate horospheres through a large-margin classifier. To make hyperbolic random forests work on multi-class data and in imbalanced settings, we further outline a new method for combining classes based on their lowest common ancestor, as well as a class-balanced version of the large-margin loss. Experiments on standard and new benchmarks show that our approach outperforms both conventional random forest algorithms and recent hyperbolic classifiers.
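In the Poincaré ball, horospheres centered at an ideal point $\omega$ (with $\|\omega\| = 1$) are level sets of the Busemann function $B_\omega(x) = \log\big(\|\omega - x\|^2 / (1 - \|x\|^2)\big)$, so a horosphere split sends a point left when $B_\omega(x) \le b$. The sketch below scores such splits for a fixed $\omega$ with an exhaustive threshold search and weighted Gini impurity. This is a simplification: the paper instead proposes candidate horospheres with a large-margin classifier.

```python
import numpy as np

def busemann(X, omega):
    """Busemann function in the Poincare ball for an ideal point omega (||omega|| = 1)."""
    sq_dist = np.sum((omega - X) ** 2, axis=1)
    return np.log(sq_dist / (1.0 - np.sum(X**2, axis=1)))

def gini(labels):
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p**2)

def best_horosphere_split(X, y, omega):
    """Best threshold b for the split {x : B_omega(x) <= b}, scored by weighted Gini."""
    b = busemann(X, omega)
    order = np.argsort(b)
    b_sorted, y_sorted = b[order], y[order]
    best_impurity, best_threshold = np.inf, None
    for i in range(1, len(b_sorted)):
        if b_sorted[i] == b_sorted[i - 1]:
            continue  # no threshold separates equal values
        left, right = y_sorted[:i], y_sorted[i:]
        impurity = (left.size * gini(left) + right.size * gini(right)) / y.size
        if impurity < best_impurity:
            best_impurity, best_threshold = impurity, 0.5 * (b_sorted[i - 1] + b_sorted[i])
    return best_impurity, best_threshold
```

Points near the ideal point have very negative Busemann values, so a horosphere acts like a "ball touching the boundary". This lets a split isolate a whole subtree of a hierarchy embedded near one region of the boundary.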

Updated: 2024-06-24 13:57:01

标题: 双曲型随机森林

摘要: 由于许多现实世界数据集具有分层结构（无论是隐式还是显式），双曲空间正成为表示数据的热门选择。随之而来的是对能够在双曲空间中解决分类等基本任务的算法的需求。最近，多篇论文研究了基于超平面的分类器（如逻辑回归和支持向量机）的双曲替代方法。虽然有效，但这些方法在处理更复杂的分层数据时会遇到困难。因此，我们建议将众所周知的随机森林推广到双曲空间。我们通过使用极限球面（horosphere）重新定义分割的概念来实现这一点。由于找到全局最优分割在计算上难以处理，我们通过大间隔分类器寻找候选极限球面。为了使双曲随机森林适用于多类数据和不平衡实验，我们进一步概述了一种基于类别最低公共祖先的类别合并新方法，以及大间隔损失的类平衡版本。在标准和新的基准测试上的实验表明，我们的方法优于传统的随机森林算法和最近的双曲分类器。

更新时间: 2024-06-24 13:57:01

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2308.13279v2

CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training

Selecting high-quality data for pre-training is crucial in shaping the downstream task performance of language models. A major challenge lies in identifying the optimal subset, a problem generally considered intractable, which necessitates scalable and effective heuristics. In this work, we propose a data selection method, CoLoR-Filter (Conditional Loss Reduction Filtering), which leverages an empirical Bayes-inspired approach to derive a simple and computationally efficient selection criterion based on the relative loss values of two auxiliary models. In addition to the modeling rationale, we evaluate CoLoR-Filter empirically on two language modeling tasks: (1) selecting data from C4 for domain adaptation, evaluated on Books, and (2) selecting data from C4 for a suite of downstream multiple-choice question answering tasks. We demonstrate favorable scaling both as we subselect more aggressively and as we use small auxiliary models to select data for large target models. As one headline result, CoLoR-Filter data selected using a pair of 150m parameter auxiliary models can train a 1.2b parameter target model to match a 1.2b parameter model trained on 25b randomly selected tokens, using 25x less data for Books and 11x less data for the downstream tasks. Code: https://github.com/davidbrandfonbrener/color-filter-olmo Filtered data: https://huggingface.co/datasets/davidbrandfonbrener/color-filtered-c4
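A minimal sketch of the selection step, assuming the per-sequence losses have already been computed. `loss_prior` holds losses under an auxiliary "prior" model, and `loss_conditional` holds losses under a second auxiliary model that has additionally been trained on downstream data. Candidates whose loss drops most under the conditional model are kept. The function name and the exact scoring convention are illustrative; see the linked repository for the authors' implementation.

```python
import numpy as np

def color_filter(loss_prior, loss_conditional, k):
    """Return indices of the k candidates with the largest conditional loss reduction."""
    score = np.asarray(loss_prior, dtype=float) - np.asarray(loss_conditional, dtype=float)
    return np.argsort(-score, kind="stable")[:k]
```

Since the score needs only two forward passes per candidate with small auxiliary models, it can be computed once offline and reused to select data for a much larger target model.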

Updated: 2024-06-24 13:52:37

标题: CoLoR-Filter: 有针对性语言模型预训练的条件损失减少过滤器

摘要: 选择高质量的数据进行预训练，对于塑造语言模型的下游任务性能至关重要。一个主要挑战在于确定这个最优子集，这个问题通常被认为难以求解，因此需要可扩展且有效的启发式方法。在这项工作中，我们提出了一种数据选择方法CoLoR-Filter（条件损失减少过滤），它利用一种受经验贝叶斯启发的方法，推导出一个基于两个辅助模型相对损失值的简单且计算高效的选择标准。除了建模原理，我们还在两个语言建模任务上对CoLoR-Filter进行了实证评估：（1）从C4中选择数据进行领域自适应，并在Books上评估；（2）从C4中选择数据，用于一系列下游多项选择问答任务。我们展示了在更激进地进行子选择时，以及使用小型辅助模型为大型目标模型选择数据时的良好扩展性。作为一个主要结果，使用一对150m参数辅助模型选出的CoLoR-Filter数据训练的1.2b参数目标模型，可以达到使用25b随机选择token训练的1.2b参数模型的水平，而在Books上所需数据减少25倍，在下游任务上减少11倍。代码：https://github.com/davidbrandfonbrener/color-filter-olmo 过滤后的数据：https://huggingface.co/datasets/davidbrandfonbrener/color-filtered-c4

更新时间: 2024-06-24 13:52:37

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.10670v2

Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment

Recently, textual prompt tuning has shown promising performance in adapting Contrastive Language-Image Pre-training (CLIP) models to natural image quality assessment. However, such uni-modal prompt learning only tunes the language branch of CLIP models. This is insufficient for adapting CLIP models to AI-generated image quality assessment (AGIQA), since AI-generated images (AGIs) differ visually from natural images. In addition, the consistency between AGIs and the user's input text prompts, which correlates with the perceptual quality of AGIs, has not been exploited to guide AGIQA. In this letter, we propose vision-language consistency guided multi-modal prompt learning for blind AGIQA, dubbed CLIP-AGIQA. Specifically, we introduce learnable textual and visual prompts in the language and vision branches of CLIP models, respectively. Moreover, we design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts. Experimental results on two public AGIQA datasets demonstrate that the proposed method outperforms state-of-the-art quality assessment models. The source code is available at https://github.com/JunFu1995/CLIP-AGIQA.

Updated: 2024-06-24 13:45:31

标题: 视觉-语言一致性引导的多模态提示学习用于无参考（盲）AI生成图像质量评估

摘要: 最近，文本提示调整在将对比语言-图像预训练（CLIP）模型适配到自然图像质量评估方面表现出了鼓舞人心的性能。然而，这种单模态提示学习方法仅调整了CLIP模型的语言分支。这对于将CLIP模型适配到AI生成图像质量评估（AGIQA）是不够的，因为AGI在视觉上与自然图像有所不同。此外，AGI与用户输入文本提示之间的一致性与AGI的感知质量相关，但尚未被用于指导AGIQA。在这篇快报中，我们提出了视觉-语言一致性引导的多模态提示学习方法，用于无参考（盲）AGIQA，称为CLIP-AGIQA。具体来说，我们分别在CLIP模型的语言和视觉分支中引入可学习的文本和视觉提示。此外，我们设计了一个文本到图像对齐质量预测任务，其学到的视觉-语言一致性知识被用来指导上述多模态提示的优化。在两个公共AGIQA数据集上的实验结果表明，所提出的方法优于最先进的质量评估模型。源代码可在https://github.com/JunFu1995/CLIP-AGIQA 上找到。

更新时间: 2024-06-24 13:45:31

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.16641v1

Feature Fusion for Human Activity Recognition using Parameter-Optimized Multi-Stage Graph Convolutional Network and Transformer Models

Human activity recognition (HAR) is a crucial area of research that involves understanding human movements using computer and machine vision technology. Deep learning has emerged as a powerful tool for this task, with models such as Convolutional Neural Networks (CNNs) and Transformers employed to capture various aspects of human motion. A key contribution of this work is demonstrating that feature fusion, by capturing both spatial and temporal features, improves HAR accuracy, which has important implications for developing more accurate and robust activity recognition systems. The study uses sensor data from the HuGaDB, PKU-MMD, LARa, and TUG datasets. Two models, PO-MS-GCN and a Transformer, were trained and evaluated, with PO-MS-GCN outperforming state-of-the-art models. HuGaDB and TUG achieved high accuracies and F1-scores, while LARa and PKU-MMD had lower scores. Feature fusion improved results across datasets.

Updated: 2024-06-24 13:44:06

标题: 特征融合用于人类活动识别的参数优化多阶段图卷积网络和Transformer模型

摘要: 人类活动识别（HAR）是一个关键的研究领域，涉及利用计算机和机器视觉技术理解人类动作。深度学习已经成为这项任务的强大工具，使用卷积神经网络（CNNs）和Transformer等模型来捕捉人类运动的各个方面。这项工作的一个关键贡献是展示了特征融合在改善HAR准确性方面的有效性，通过捕捉空间和时间特征，这对于开发更准确和稳健的活动识别系统具有重要意义。该研究使用了来自HuGaDB、PKU-MMD、LARa和TUG数据集的感知数据。训练和评估了两个模型，PO-MS-GCN和Transformer，其中PO-MS-GCN表现优于最先进的模型。HuGaDB和TUG实现了高准确性和f1分数，而LARa和PKU-MMD的分数较低。特征融合改善了各个数据集的结果。

更新时间: 2024-06-24 13:44:06

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.16638v1

Validation of ML-UQ calibration statistics using simulated reference values: a sensitivity analysis

Some popular Machine Learning Uncertainty Quantification (ML-UQ) calibration statistics have no predefined reference values and are mostly used in comparative studies. Consequently, calibration is almost never validated, and the diagnosis is left to the reader's judgment. Simulated reference values, based on synthetic calibrated datasets derived from actual uncertainties, have been proposed to alleviate this problem. Because the generative probability distribution used to simulate synthetic errors is often unconstrained, the sensitivity of simulated reference values to the choice of generative distribution can be problematic, casting doubt on the calibration diagnostic. This study explores various facets of this problem and shows that, when the generative distribution is unknown, some statistics are excessively sensitive to the choice of distribution used for validation. This is the case, for instance, for the correlation coefficient between absolute errors and uncertainties (CC) and for the expected normalized calibration error (ENCE). A robust validation workflow for dealing with simulated reference values is proposed.
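The sketch below shows how simulated reference values can be generated and why they depend on the generative distribution. Synthetic, perfectly calibrated errors are drawn as $E = u \cdot Z$, where $Z$ has unit variance. Two choices for $Z$, normal and uniform, are assumed here purely for illustration. CC and ENCE are then computed on each synthetic dataset. The binning scheme for ENCE (equal-count bins sorted by uncertainty) is a common convention and is also an assumption of this sketch.

```python
import numpy as np

def ence(errors, uncert, n_bins=10):
    """Expected normalized calibration error with equal-count bins sorted by uncertainty."""
    bins = np.array_split(np.argsort(uncert), n_bins)
    terms = []
    for idx in bins:
        rmv = np.sqrt(np.mean(uncert[idx] ** 2))    # root mean variance in the bin
        rmse = np.sqrt(np.mean(errors[idx] ** 2))   # root mean squared error in the bin
        terms.append(abs(rmv - rmse) / rmv)
    return float(np.mean(terms))

def cc(errors, uncert):
    """Correlation coefficient between absolute errors and uncertainties."""
    return float(np.corrcoef(np.abs(errors), uncert)[0, 1])

# Unit-variance generative distributions for the synthetic errors E = u * Z.
GENERATORS = {
    "normal": lambda rng, n: rng.standard_normal(n),
    "uniform": lambda rng, n: rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n),
}

def simulated_reference(stat, uncert, generator, n_rep=500, seed=0):
    """Mean and 95% interval of `stat` over synthetic, perfectly calibrated datasets."""
    rng = np.random.default_rng(seed)
    vals = [stat(uncert * generator(rng, uncert.size), uncert) for _ in range(n_rep)]
    return float(np.mean(vals)), float(np.percentile(vals, 2.5)), float(np.percentile(vals, 97.5))
```

For uncertainties spread uniformly over [0.5, 2], the CC reference is roughly 0.40 under normal errors and roughly 0.49 under uniform errors, even though both datasets are perfectly calibrated. This is the kind of sensitivity that makes the generative choice matter for validation.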

Updated: 2024-06-24 13:43:10

标题: 使用模拟参考值验证ML-UQ校准统计数据：敏感性分析

摘要: 一些流行的机器学习不确定性量化（ML-UQ）校准统计数据没有预定义的参考值，主要用于比较研究。因此，校准几乎从不被验证，诊断留给读者自行判断。为了缓解这个问题，已经提出了基于实际不确定性导出的合成校准数据集的模拟参考值。由于用于模拟合成误差的概率分布通常没有限制，模拟参考值对生成分布的选择敏感性可能会成为问题，对校准诊断产生疑问。本研究探讨了这个问题的各个方面，并显示一些统计数据对于用于验证的生成分布的选择过于敏感，尤其是当生成分布未知时。例如，绝对误差与不确定性之间的相关系数（CC）和预期归一化校准误差（ENCE）就是这种情况。提出了一个稳健的验证工作流程来处理模拟参考值。

更新时间: 2024-06-24 13:43:10

领域: stat.ML,cs.LG,physics.chem-ph

下载: http://arxiv.org/abs/2403.00423v2

ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models

The high power consumption and latency-sensitive deployments of large language models (LLMs) have motivated techniques like quantization and sparsity. Contextual sparsity, where the sparsity pattern is input-dependent, is crucial in LLMs because permanently removing attention heads or neurons can significantly degrade accuracy. Prior work has modeled contextual sparsity using neural networks trained to predict activation magnitudes, which can then be used to dynamically prune structures with low predicted activation magnitude. In this paper, we look beyond magnitude-based pruning criteria to assess attention head and neuron importance in LLMs. We develop a novel predictor called ShadowLLM, which can shadow the LLM's behavior and enforce better sparsity patterns, improving end-to-end accuracy by over 15% compared to previous methods without increasing latency. ShadowLLM achieves up to a 20% speed-up over the state-of-the-art DejaVu framework. These enhancements are validated on models with up to 30 billion parameters. Our code is available at https://github.com/abdelfattah-lab/shadow_llm/.
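A minimal sketch of predictor-based contextual sparsity, assuming a hypothetical linear predictor that maps a mean-pooled input embedding to one importance score per (layer, head). The top-scoring heads in each layer are kept for that input only. ShadowLLM's actual predictor architecture and its importance criteria are described in the paper and are not reproduced here; `W` would be trained offline against whatever importance target is chosen.

```python
import numpy as np

def predict_head_importance(x_embed, W):
    """Hypothetical predictor: mean-pooled token embeddings (T, d) -> scores (L*H,)."""
    return x_embed.mean(axis=0) @ W

def contextual_head_mask(scores, n_layers, n_heads, keep_frac=0.5):
    """Boolean (L, H) mask that keeps the top-k predicted heads in every layer."""
    s = np.asarray(scores, dtype=float).reshape(n_layers, n_heads)
    k = max(1, int(round(keep_frac * n_heads)))
    top = np.argsort(-s, axis=1)[:, :k]
    mask = np.zeros_like(s, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask
```

Because the mask is recomputed per input, heads that matter for one prompt can be skipped for another, unlike static pruning. Running a single predictor up front (rather than one per layer) keeps its latency overhead small.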

Updated: 2024-06-24 13:41:08

标题: ShadowLLM：基于预测的大型语言模型上下文稀疏性

摘要: 大型语言模型（LLMs）的高能耗和延迟敏感的部署推动了量化和稀疏等技术的发展。上下文稀疏，其中稀疏模式取决于输入，对LLMs至关重要，因为从LLMs中永久删除注意力头或神经元可能会显著降低准确性。先前的工作尝试使用经过训练以预测激活量级的神经网络来建模上下文稀疏，这可以用于动态修剪预测激活量级低的结构。在本文中，我们超越基于量级的修剪标准，评估LLMs中注意力头和神经元的重要性。我们开发了一个称为ShadowLLM的新型预测器，可以模仿LLM的行为并强制执行更好的稀疏模式，与之前的方法相比，在不增加延迟的情况下将端到端准确性提高了15％以上。与最先进的DejaVu框架相比，ShadowLLM实现了高达20％的加速。这些改进已在具有高达300亿参数的模型上得到验证。我们的代码可在https://github.com/abdelfattah-lab/shadow_llm/ 上找到。

更新时间: 2024-06-24 13:41:08

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.16635v1

Detach-ROCKET: Sequential feature selection for time series classification with random convolutional kernels

Time Series Classification (TSC) is essential in fields like medicine, environmental science, and finance, enabling tasks such as disease diagnosis, anomaly detection, and stock price analysis. While machine learning models like Recurrent Neural Networks and InceptionTime are successful in numerous applications, they can face scalability issues due to computational requirements. Recently, ROCKET has emerged as an efficient alternative, achieving state-of-the-art performance and simplifying training by utilizing a large number of randomly generated features from the time series data. However, many of these features are redundant or non-informative, increasing computational load and compromising generalization. Here we introduce Sequential Feature Detachment (SFD) to identify and prune non-essential features in ROCKET-based models, such as ROCKET, MiniRocket, and MultiRocket. SFD estimates feature importance using model coefficients and can handle large feature sets without complex hyperparameter tuning. Testing on the UCR archive shows that SFD can produce models with better test accuracy using only 10% of the original features. We named these pruned models Detach-ROCKET. We also present an end-to-end procedure for determining an optimal balance between the number of features and model accuracy. On the largest binary UCR dataset, Detach-ROCKET improves test accuracy by 0.6% while reducing features by 98.9%. By enabling a significant reduction in model size without sacrificing accuracy, our methodology improves computational efficiency and contributes to model interpretability. We believe that Detach-ROCKET will be a valuable tool for researchers and practitioners working with time series data, who can find a user-friendly implementation of the model at https://github.com/gon-uri/detach_rocket.
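A minimal sketch of Sequential Feature Detachment on a generic feature matrix (for example, ROCKET features), under stated assumptions. A closed-form ridge regression on ±1 targets stands in for the ridge classifier typically used with ROCKET. Features are standardized so coefficient magnitudes are comparable. At each step, the fraction `drop_frac` of the remaining features with the smallest |coefficient| is detached. The parameter names and the fixed stopping size `n_keep` are illustrative; the paper also describes an end-to-end procedure for choosing how many features to keep.

```python
import numpy as np

def ridge_coef(X, y, alpha=1.0):
    """Closed-form ridge regression coefficients (no intercept)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def sequential_feature_detachment(X, y, n_keep, drop_frac=0.2, alpha=1.0):
    """Repeatedly refit and drop the features with the smallest |coefficient|."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize so coefficients are comparable
    target = np.where(y > 0, 1.0, -1.0)          # binary labels as +/-1 regression targets
    keep = np.arange(X.shape[1])
    while len(keep) > n_keep:
        w = ridge_coef(X[:, keep], target, alpha)
        n_drop = min(max(1, int(drop_frac * len(keep))), len(keep) - n_keep)
        keep = keep[np.sort(np.argsort(np.abs(w))[n_drop:])]  # keep the largest |w|
    return keep
```

Refitting after every detachment step, rather than ranking once, lets the remaining coefficients absorb the signal of dropped correlated features, which matters for highly redundant random-kernel features.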

Updated: 2024-06-24 13:36:56

标题: Detach-ROCKET：使用随机卷积核的时间序列分类的序贯特征选择

摘要: 时间序列分类（TSC）在医学、环境科学和金融等领域至关重要，可以实现诸如疾病诊断、异常检测和股票价格分析等任务。虽然诸如循环神经网络和InceptionTime等机器学习模型在许多应用中取得成功，但由于计算需求，它们可能面临可扩展性问题。最近，ROCKET作为一种高效的替代方案出现，通过利用从时间序列数据中随机生成的大量特征，实现了最新性能并简化了训练。然而，许多这些特征是冗余的或非信息性的，增加了计算负担并损害了泛化能力。在这里，我们介绍了顺序特征分离（SFD），用于识别和修剪基于ROCKET的模型中的非必要特征，例如ROCKET、MiniRocket和MultiRocket。SFD使用模型系数估计特征重要性，并且可以处理大量特征集合，而无需复杂的超参数调整。在UCR存档上的测试表明，SFD可以仅使用原始特征的10%产生具有更好测试准确性的模型。我们将这些修剪过的模型称为Detach-ROCKET。我们还提出了一个端到端程序，用于确定特征数量和模型准确性之间的最佳平衡。在最大的二分类UCR数据集上，Detach-ROCKET将测试准确性提高了0.6%，同时减少了98.9%的特征。通过在不牺牲准确性的情况下显著减小模型大小，我们的方法提高了计算效率并有助于模型可解释性。我们相信Detach-ROCKET将成为处理时间序列数据的研究人员和从业者的有价值工具，他们可以在https://github.com/gon-uri/detach_rocket 找到该模型的用户友好实现。

更新时间: 2024-06-24 13:36:56

领域: cs.LG

下载: http://arxiv.org/abs/2309.14518v3

A Comprehensive Survey on Relation Extraction: Recent Advances and New Frontiers

Relation extraction (RE) involves identifying the relations between entities in the underlying content. RE serves as the foundation for many natural language processing (NLP) and information retrieval applications, such as knowledge graph completion and question answering. In recent years, deep neural networks have dominated the field of RE and made notable progress, and large pre-trained language models have since taken state-of-the-art RE to a new level. This survey provides a comprehensive review of existing deep learning techniques for RE. First, we introduce RE resources, including datasets and evaluation metrics. Second, we propose a new taxonomy that categorizes existing works from three perspectives: text representation, context encoding, and triplet prediction. Third, we discuss several important challenges faced by RE and summarize potential techniques to tackle them. Finally, we outline some promising future directions and prospects in this field. This survey is expected to facilitate researchers' collaborative efforts to address the challenges of real-world RE systems.

Updated: 2024-06-24 13:26:47

标题: 关系抽取的综合调查：最新进展和新的前沿

摘要: 关系抽取（RE）涉及从基础内容中识别实体之间的关系。RE是许多自然语言处理（NLP）和信息检索应用的基础，例如知识图谱完成和问题回答。近年来，深度神经网络在RE领域占据主导地位并取得了显著进展。随后，大型预训练语言模型将最先进的RE推向了一个新水平。本调查对现有深度学习技术进行了全面审查。首先，我们介绍了RE资源，包括数据集和评估指标。其次，我们提出了一个新的分类法来从文本表示、上下文编码和三元组预测三个角度对现有工作进行分类。第三，我们讨论了RE面临的几个重要挑战，并总结了应对这些挑战的潜在技术。最后，我们概述了这一领域的一些有前途的未来方向和前景。这项调查有望促进研究人员共同努力解决现实世界RE系统的挑战。

更新时间: 2024-06-24 13:26:47

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2306.02051v3

Make Large Language Model a Better Ranker

Large Language Models (LLMs) demonstrate robust capabilities across various fields, leading to a paradigm shift in LLM-enhanced Recommender System (RS). Research to date focuses on point-wise and pair-wise recommendation paradigms, which are inefficient for LLM-based recommenders due to high computational costs. However, existing list-wise approaches also fall short in ranking tasks due to misalignment between ranking objectives and next-token prediction. Moreover, these LLM-based methods struggle to effectively address the order relation among candidates, particularly given the scale of ratings. To address these challenges, this paper introduces the large language model framework with Aligned Listwise Ranking Objectives (ALRO). ALRO is designed to bridge the gap between the capabilities of LLMs and the nuanced requirements of ranking tasks. Specifically, ALRO employs explicit feedback in a listwise manner by introducing soft lambda loss, a customized adaptation of lambda loss designed for optimizing order relations. This mechanism provides more accurate optimization goals, enhancing the ranking process. Additionally, ALRO incorporates a permutation-sensitive learning mechanism that addresses position bias, a prevalent issue in generative models, without imposing additional computational burdens during inference. Our evaluative studies reveal that ALRO outperforms both existing embedding-based recommendation methods and LLM-based recommendation baselines.
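ALRO's soft lambda loss is described as a customized adaptation of lambda loss. The sketch below shows only the standard LambdaRank-style starting point, not ALRO's soft variant. For every pair where item i is more relevant than item j, a logistic pairwise loss on the score gap is weighted by |ΔNDCG|, the change in NDCG if the two items swapped positions under the current ranking.

```python
import numpy as np

def lambda_weighted_pairwise_loss(scores, rels):
    """Logistic pairwise loss weighted by |delta NDCG| of swapping each mis-orderable pair."""
    scores = np.asarray(scores, dtype=float)
    rels = np.asarray(rels, dtype=float)
    n = scores.size
    gains = 2.0**rels - 1.0
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(-scores)] = np.arange(1, n + 1)   # 1-based rank under current scores
    discounts = 1.0 / np.log2(ranks + 1.0)
    ideal_gains = np.sort(gains)[::-1]
    idcg = np.sum(ideal_gains / np.log2(np.arange(2, n + 2)))
    if idcg == 0.0:
        return 0.0                                      # no relevant items: nothing to rank
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if rels[i] > rels[j]:                       # i should be ranked above j
                delta_ndcg = abs((gains[i] - gains[j]) * (discounts[i] - discounts[j])) / idcg
                loss += delta_ndcg * np.log1p(np.exp(-(scores[i] - scores[j])))
    return float(loss)
```

The |ΔNDCG| weight is what ties the pairwise objective to list order: swaps near the top of the list, where discounts differ most, are penalized more heavily than swaps near the bottom.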

Updated: 2024-06-24 13:22:22

标题: 使大型语言模型成为更好的排名器

摘要: 大型语言模型（LLMs）在各个领域展示出强大的能力，引领了LLM增强型推荐系统（RS）的范式转变。迄今为止的研究集中在逐点和成对推荐范式上，这对于基于LLM的推荐系统来说效率低下，因为计算成本高昂。然而，现有的列表式方法在排名任务中也存在不足，因为排名目标与下一个标记预测之间存在错位。此外，这些基于LLM的方法难以有效地处理候选项之间的顺序关系，尤其是在评分规模上。为了解决这些挑战，本文引入了具有对齐列表式排名目标（ALRO）的大型语言模型框架。ALRO旨在弥合LLMs的能力与排名任务的微妙要求之间的差距。具体而言，ALRO以列表方式使用显式反馈，引入了软λ损失，这是为优化顺序关系而定制的λ损失变体。该机制提供了更准确的优化目标，增强了排名过程。此外，ALRO还融入了一种对排列敏感的学习机制，解决了生成模型中普遍存在的位置偏见问题，而不会在推理过程中施加额外的计算负担。我们的评估研究表明，ALRO胜过了现有的基于嵌入的推荐方法和基于LLM的推荐基线。

更新时间: 2024-06-24 13:22:22

领域: cs.IR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2403.19181v2

Learning Action-based Representations Using Invariance

Robust reinforcement learning agents using high-dimensional observations must be able to identify relevant state features amidst many exogenous distractors. A representation that captures controllability identifies these state elements by determining what affects agent control. While methods such as inverse dynamics and mutual information capture controllability for a limited number of timesteps, capturing long-horizon elements remains a challenging problem. Myopic controllability can capture the moment right before an agent crashes into a wall, but not the control-relevance of the wall while the agent is still some distance away. To address this, we introduce action-bisimulation encoding, a method inspired by the bisimulation invariance pseudometric, that extends single-step controllability with a recursive invariance constraint. By doing this, action-bisimulation learns a multi-step controllability metric that smoothly discounts distant state features that are relevant for control. We demonstrate that action-bisimulation pretraining on reward-free, uniformly random data improves sample efficiency in several environments, including a photorealistic 3D simulation domain, Habitat. Additionally, we provide theoretical analysis and qualitative results demonstrating the information captured by action-bisimulation.
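The wall example can be reproduced with a tabular analogue of the recursive constraint, sketched below under explicit assumptions. A deterministic 1D corridor has actions left and right. The single-step distance between two states is the fraction of actions whose effect (displacement) differs. A distance $d(s, t) = c(s, t) + \gamma \, \mathbb{E}_{a \sim \text{uniform}}[d(T(s,a), T(t,a))]$ is computed by fixed-point iteration. The base distance and the corridor are hypothetical choices, not the paper's learned encoder.

```python
import numpy as np

def corridor(n_states):
    """Deterministic transitions T[s, a]: action 0 = left, 1 = right; walls clip movement."""
    s = np.arange(n_states)
    return np.stack([np.maximum(s - 1, 0), np.minimum(s + 1, n_states - 1)], axis=1)

def single_step_distance(T):
    """Fraction of actions whose displacement differs between two states (myopic control)."""
    disp = T - np.arange(T.shape[0])[:, None]
    return (disp[:, None, :] != disp[None, :, :]).mean(axis=2)

def action_bisim(T, gamma=0.5, iters=200):
    """Fixed point of d = c + gamma * mean_a d(T(s, a), T(t, a)) under a uniform policy."""
    c = single_step_distance(T)
    d = np.zeros_like(c)
    for _ in range(iters):
        nxt = np.mean([d[np.ix_(T[:, a], T[:, a])] for a in range(T.shape[1])], axis=0)
        d = c + gamma * nxt
    return c, d
```

In a 7-state corridor, states 1 and 3 look identical to the single-step distance (both can move left and right), but the recursive distance separates them because state 1 is one step from the wall. This is exactly the longer-horizon control relevance that myopic measures miss.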

Updated: 2024-06-24 13:19:17

标题: 学习基于动作的表示：使用不变性

摘要: 使用高维观察的稳健强化学习智能体必须能够在众多外生干扰因素中识别相关的状态特征。捕捉可控性的表示通过确定哪些因素影响智能体的控制来识别这些状态元素。虽然逆动力学和互信息等方法可以在有限数量的时间步内捕捉可控性，但捕捉长时域元素仍然是一个具有挑战性的问题。短视的可控性可以捕捉到智能体撞墙之前的瞬间，但不能在智能体离墙还有一段距离时捕捉到墙的控制相关性。为了解决这个问题，我们引入了动作互模拟编码，这是一种受互模拟不变性伪度量启发的方法，它通过递归不变性约束扩展了单步可控性。通过这样做，动作互模拟学习到一个多步可控性度量，对与控制相关的远处状态特征进行平滑折扣。我们展示了在无奖励、均匀随机数据上进行动作互模拟预训练，可以在多个环境中提高样本效率，包括逼真的3D模拟领域Habitat。此外，我们提供了理论分析和定性结果，展示了动作互模拟所捕捉到的信息。

更新时间: 2024-06-24 13:19:17

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2403.16369v3

Hacking a surrogate model approach to XAI

In recent years, the number of new applications for highly complex AI systems has risen significantly. Algorithmic decision-making systems (ADMs) are one such application, in which an AI system replaces the decision-making process of a human expert. As one approach to ensuring the fairness and transparency of such systems, explainable AI (XAI) has become more important. One way to achieve explainability is through surrogate models, i.e., training a new, simpler machine learning model on the input-output relationship of a black box model. The simpler model could, for example, be a decision tree, which is thought to be intuitively understandable by humans. However, there is little insight into how well the surrogate model approximates the black box. Our main assumption is that a good surrogate model approach should bring discriminatory behavior of the black box to the attention of humans; prior to our research, we assumed that a surrogate decision tree would identify such a pattern on one of its first levels. However, in this article we show that even if the discriminated subgroup, while otherwise identical in all categories, does not receive a single positive decision from the black box ADM system, the corresponding question of group membership can be pushed down to a level as low as the operator of the system wants. We then generalize this finding to pinpoint the exact level of the tree at which the discriminating question is asked, and show that in a more realistic scenario, where discrimination affects only some fraction of the disadvantaged group, it is even more feasible to hide such discrimination. Our approach can easily be generalized to other surrogate models.

Updated: 2024-06-24 13:18:02

标题: "利用替代模型方法进行XAI的黑客行为"

摘要: 近年来，高度复杂的人工智能系统的新应用数量显著增加。算法决策系统（ADMs）是其中之一，其中人工智能系统取代了人类专家的决策过程。作为确保这类系统公平和透明性的一种方法，可解释人工智能（XAI）变得更加重要。一种可实现解释性的变体是替代模型，即基于黑匣子模型的输入输出关系训练一个新的简单机器学习模型的想法。简单的机器学习模型可以是一个决策树，人们认为这种模型可以直观理解。然而，目前对替代模型如何有效地逼近黑匣子模型并没有多少见解。我们的主要假设是，一个好的替代模型方法应该能够引起人们对这种歧视性行为的关注；在我们的研究之前，我们假设替代决策树会在其最初的几个级别中识别出这种模式。然而，在本文中，我们展示了即使被歧视的子组 - 在其他方面都相同的情况下 - 从黑匣子ADM系统中没有获得单个积极决策，相应的群体归属问题可以被系统操作员推迟到任意低的级别。然后我们将这一发现推广到确定树中提出歧视性问题的确切级别，并展示在更现实的情景中，即歧视仅发生在一部分弱势群体中时，隐藏这种歧视更加可行。我们的方法可以轻松地推广到其他替代模型。

更新时间: 2024-06-24 13:18:02

领域: cs.AI

下载: http://arxiv.org/abs/2406.16626v1

Deep Prompt Multi-task Network for Abuse Language Detection

The detection of abusive language remains a long-standing challenge given the extensive use of social networks, and the accuracy of current detection methods is limited. We argue that existing detection methods fine-tune pre-trained language models (PLMs) to handle downstream tasks and therefore fail to elicit the general knowledge of the PLMs. To address this problem, we propose a novel Deep Prompt Multi-task Network (DPMN) for abusive language detection. Specifically, DPMN designs two forms of prompt tuning for the PLMs: deep prompt tuning and light prompt tuning. We study the effects of different prompt lengths, tuning strategies, and prompt initialization methods on detecting abusive language. In addition, we propose a Task Head based on Bi-LSTM and FFN, which can be used as a short text classifier. Finally, DPMN uses multi-task learning to further improve detection metrics, since the multi-task network can transfer effective knowledge between tasks. The proposed DPMN is evaluated against eight typical methods on three public datasets: OLID, SOLID, and AbuseAnalyzer. The experimental results show that our DPMN outperforms the state-of-the-art methods.

Updated: 2024-06-24 13:15:33

标题: 深度提示多任务网络用于滥用语言检测

摘要: 随着社交网络的广泛使用，检测滥用语言仍然是一个长期存在的挑战。滥用语言检测任务的准确率有限。我们认为现有的检测方法利用预训练语言模型（PLMs）的微调技术来处理下游任务，因此这些方法未能激发PLMs的通用知识。为了解决这个问题，我们提出了一种新颖的深度提示多任务网络（DPMN）用于滥用语言检测。具体来说，DPMN首先为PLMs设计了两种形式的提示调整：深度提示调整和轻量提示调整。我们研究了不同提示长度、调整策略和提示初始化方法对检测滥用语言的影响。此外，我们提出了基于Bi-LSTM和FFN的任务头，可用作短文本分类器。最终，DPMN利用多任务学习进一步改进检测指标。多任务网络具有传递有效知识的功能。所提出的DPMN在三个公共数据集OLID、SOLID和AbuseAnalyzer上与八种典型方法进行了对比评估。实验结果显示我们的DPMN优于最先进的方法。

更新时间: 2024-06-24 13:15:33

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2403.05268v2

Improved Dynamic Regret for Online Frank-Wolfe

To deal with non-stationary online problems with complex constraints, we investigate the dynamic regret of online Frank-Wolfe (OFW), an efficient projection-free algorithm for online convex optimization. It is well known that in offline optimization, the smoothness of functions and the strong convexity of functions, combined with specific properties of the constraint set, can be exploited to achieve fast convergence rates for the Frank-Wolfe (FW) algorithm. However, for OFW, previous studies have only established a dynamic regret bound of $O(\sqrt{T}(V_T+\sqrt{D_T}+1))$ by exploiting the convexity of problems, where $T$ is the number of rounds, $V_T$ is the function variation, and $D_T$ is the gradient variation. In this paper, we derive improved dynamic regret bounds for OFW by extending the fast convergence rates of FW from offline optimization to online optimization. The key technique for this extension is to set the step size of OFW with a line search rule. In this way, we first show that the dynamic regret bound of OFW can be improved to $O(\sqrt{T(V_T+1)})$ for smooth functions. Second, we achieve a better dynamic regret bound of $O(T^{1/3}(V_T+1)^{2/3})$ when the functions are smooth and strongly convex and the constraint set is strongly convex. Finally, for smooth and strongly convex functions whose minimizers lie in the interior of the constraint set, we demonstrate that the dynamic regret of OFW reduces to $O(V_T+1)$, and can be further strengthened to $O(\min\{P_T^\ast,S_T^\ast,V_T\}+1)$ by performing a constant number of FW iterations per round, where $P_T^\ast$ and $S_T^\ast$ denote the path length and squared path length of the minimizers, respectively.
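Online Frank-Wolfe avoids projections by calling a linear minimization oracle, and the abstract names a line-search step size as the key technique. The sketch below assumes a probability-simplex constraint set and quadratic losses f_t(x) = 0.5||x - c_t||^2, for which exact line search has a closed form; the paper's line-search rule and problem settings may differ, and all function names are hypothetical.

```python
def loss(x, c):
    """Quadratic toy loss f(x) = 0.5 * ||x - c||^2."""
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def lmo_simplex(grad):
    """Linear minimization oracle: the simplex vertex minimizing <grad, v>."""
    i = min(range(len(grad)), key=lambda j: grad[j])
    v = [0.0] * len(grad)
    v[i] = 1.0
    return v


def fw_step(x, c):
    """One Frank-Wolfe step on f(x) = 0.5||x - c||^2 with exact line search."""
    g = [xi - ci for xi, ci in zip(x, c)]  # gradient of f at x
    v = lmo_simplex(g)
    d = [vi - xi for vi, xi in zip(v, x)]  # Frank-Wolfe direction
    dd = sum(di * di for di in d)
    if dd == 0.0:
        return list(x)
    # Minimizer of s -> f(x + s*d) is s = -<g, d> / ||d||^2, clipped to [0, 1].
    s = -sum(gi * di for gi, di in zip(g, d)) / dd
    s = max(0.0, min(1.0, s))
    return [xi + s * di for xi, di in zip(x, d)]


def online_frank_wolfe(targets, x0):
    """Play x_t, suffer f_t(x_t), then update with the revealed f_t."""
    x, losses = list(x0), []
    for c in targets:
        losses.append(loss(x, c))
        x = fw_step(x, c)
    return losses


# Each c_t lies in the simplex, so min_x f_t(x) = 0 and the dynamic regret
# of this toy run equals the sum of suffered losses.
targets = [[0.2, 0.3, 0.5]] * 20 + [[0.6, 0.2, 0.2]] * 20
losses = online_frank_wolfe(targets, [1.0, 0.0, 0.0])
print("dynamic regret:", sum(losses))
```

Iterates stay feasible because every step is a convex combination of simplex points, which is what makes the method projection-free.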

Updated: 2024-06-24 13:11:08

标题: 在线Frank-Wolfe算法的改进动态遗憾

摘要: 为了处理具有复杂约束的非平稳在线问题，我们研究了在线Frank-Wolfe（OFW）的动态遗憾，这是一种用于在线凸优化的高效无投影算法。众所周知，在离线优化设置中，函数的光滑性和伴随特定约束集属性的函数的强凸性可以用来实现Frank-Wolfe（FW）算法的快速收敛速度。然而，对于OFW，先前的研究只通过利用问题的凸性建立了$O(\sqrt{T}(V_T+\sqrt{D_T}+1))$的动态遗憾界，其中$T$是轮数，$V_T$是函数变化量，$D_T$是梯度变化量。在本文中，我们通过将FW的快速收敛速度从离线优化扩展到在线优化，得出了OFW的改进动态遗憾界。这种扩展的关键技术是使用线搜索规则来设置OFW的步长。通过这种方式，我们首先证明了对于光滑函数，OFW的动态遗憾界可以改进为$O(\sqrt{T(V_T+1)})$。其次，当函数是光滑且强凸时，并且约束集也是强凸的时，我们实现了更好的动态遗憾界为$O(T^{1/3}(V_T+1)^{2/3})$。最后，对于具有约束集内极小值点的光滑且强凸函数，我们展示了OFW的动态遗憾降低到$O(V_T+1)$，并且通过每轮执行常数次FW迭代，可以进一步加强为$O(\min\{P_T^\ast,S_T^\ast,V_T\}+1)$，其中$P_T^\ast$和$S_T^\ast$分别表示最小值点的路径长度和平方路径长度。

更新时间: 2024-06-24 13:11:08

领域: cs.LG

下载: http://arxiv.org/abs/2302.05620v2

Compact Proofs of Model Performance via Mechanistic Interpretability

In this work, we propose using mechanistic interpretability -- techniques for reverse engineering model weights into human-interpretable algorithms -- to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-$K$ task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless noise as a key challenge for using mechanistic interpretability to generate compact proofs on model performance.
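To see why proof length matters, consider the simplest kind of accuracy certificate for Max-of-K: evaluate the model on every possible input. The sketch below assumes a vocabulary of integer tokens and uses a hypothetical stand-in model instead of a trained transformer; its cost grows as vocab_size**k, which is the kind of expense that compact proofs built from mechanistic understanding aim to avoid.

```python
from itertools import product


def brute_force_accuracy(model, vocab_size, k):
    """Exact accuracy of `model` on Max-of-K, by enumerating all
    vocab_size**k input sequences (exponential in k)."""
    correct = 0
    total = 0
    for seq in product(range(vocab_size), repeat=k):
        total += 1
        if model(seq) == max(seq):
            correct += 1
    return correct / total


def last_token_model(seq):
    """Hypothetical stand-in for a trained model: always predicts the last token."""
    return seq[-1]


print(brute_force_accuracy(last_token_model, 4, 3))
```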

Updated: 2024-06-24 13:06:01

标题: 通过机制可解释性实现模型性能的紧凑证明

摘要: 在这项工作中，我们提出使用机制可解释性（将模型权重逆向工程为人类可解释算法的技术）来推导并紧凑地证明关于模型性能的形式化保证。我们通过形式化证明151个在Max-of-$K$任务上训练的小型Transformer的准确率下界，对这一方法进行了原型验证。我们创建了102种不同的计算机辅助证明策略，并评估了它们在每个模型上的长度和界的紧密程度。通过定量指标，我们发现较短的证明似乎需要并提供更多的机制理解。此外，我们发现更忠实的机制理解会带来更紧的性能界。我们通过定性地检查部分证明来确认这些联系。最后，我们指出，不断累积的无结构噪声是利用机制可解释性生成模型性能紧凑证明的一个关键挑战。

更新时间: 2024-06-24 13:06:01

领域: cs.LG,cs.LO

下载: http://arxiv.org/abs/2406.11779v5

What Do Privacy Advertisements Communicate to Consumers?

When companies release marketing materials aimed at promoting their privacy practices or highlighting specific privacy features, what do they actually communicate to consumers? In this paper, we explore the impact of privacy marketing on: (1) consumers' attitudes toward the organizations providing the campaigns, (2) overall privacy awareness, and (3) the actionability of suggested privacy advice. To this end, we investigated the impact of four privacy advertising videos and one privacy game published by five different technology companies. We conducted 24 semi-structured interviews with participants randomly assigned to view one or two of the videos or play the game. Our findings suggest that awareness of privacy features can contribute to positive perceptions of a company or its products. The ads we tested were more successful in communicating the advertised privacy features than the game we tested. We observed that advertising a single privacy feature using a single metaphor in a short ad increased awareness of the advertised feature. The game failed to communicate privacy features or motivate study participants to use the features. Our results also suggest that privacy campaigns can be useful for raising awareness about privacy features and improving brand image, but may not be the most effective way to teach viewers how to use privacy features.

Updated: 2024-06-24 13:04:31

标题: 隐私广告向消费者传达了什么信息？

摘要: 当公司发布旨在促进其隐私做法或突出特定隐私功能的营销材料时，他们实际上向消费者传达了什么？在本文中，我们探讨了隐私营销对以下方面的影响：（1）消费者对提供广告活动的组织的态度，（2）整体隐私意识，以及（3）建议隐私建议的可操作性。为此，我们调查了由五家不同技术公司发布的四个隐私广告视频和一个隐私游戏的影响。我们进行了24次半结构化访谈，参与者被随机分配观看一个或两个视频或玩游戏。我们的研究结果表明，对隐私功能的认识可以促进对公司或其产品的积极看法。我们测试的广告在传达广告的隐私功能方面比我们测试的游戏更成功。我们观察到，在短暂的广告中使用单一隐私功能和单一隐喻可以增加对广告功能的认识。游戏未能传达隐私功能或激励研究参与者使用这些功能。我们的结果还表明，隐私营销活动可以有助于提高对隐私功能的认识和改善品牌形象，但可能不是教导观众如何使用隐私功能的最有效方式。

更新时间: 2024-06-24 13:04:31

领域: cs.CR,cs.CY,cs.HC

下载: http://arxiv.org/abs/2405.13857v2

Expert with Clustering: Hierarchical Online Preference Learning Framework

Emerging mobility systems are increasingly capable of recommending options to mobility users, guiding them towards personalized yet sustainable system outcomes. Even more so than in typical recommendation systems, it is crucial to minimize regret, because 1) the mobility options directly affect the lives of the users, and 2) the system's sustainability relies on sufficient user participation. In this study, we consider accelerating user preference learning by exploiting a low-dimensional latent space that captures the mobility preferences of users. We introduce a hierarchical contextual bandit framework named Expert with Clustering (EWC), which integrates clustering techniques and prediction with expert advice. EWC efficiently utilizes hierarchical user information and incorporates a novel Loss-guided Distance metric, which is instrumental in generating more representative cluster centroids. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ options, our algorithm achieves a regret bound of $O(N\sqrt{T\log K} + NT)$. This bound consists of two parts: the first term is the regret from the Hedge algorithm, and the second term depends on the average loss from clustering. To the best of the authors' knowledge, this is the first work to analyze the regret of an integrated expert algorithm with k-Means clustering. This regret bound underscores the theoretical and experimental efficacy of EWC, particularly in scenarios that demand rapid learning and adaptation. Experimental results show that EWC reduces regret by 27.57% compared to the LinUCB baseline. Our work offers a data-efficient approach to capturing both individual and collective behaviors, making it highly applicable to contexts with hierarchical structures. We expect the algorithm to be applicable to other settings with layered nuances of user preferences and information.
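The first term of the bound is attributed to the Hedge algorithm. The following is a minimal sketch of standard Hedge (exponential weights) over K options for a single user, with the textbook learning rate for a known horizon T; it does not reproduce EWC's clustering or the Loss-guided Distance metric. With this tuning Hedge's regret is at most sqrt((T/2) ln K), which, summed over N users, gives a term of order N*sqrt(T log K).

```python
import math


def hedge(loss_rounds, eta):
    """Hedge / exponential weights over K experts.

    loss_rounds: list of per-round loss vectors with entries in [0, 1].
    Returns (expected cumulative loss of the learner, final distribution).
    """
    k = len(loss_rounds[0])
    weights = [1.0] * k
    learner_loss = 0.0
    for losses in loss_rounds:
        z = sum(weights)
        probs = [w / z for w in weights]
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        # Multiplicative update: experts with high loss lose weight exponentially.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    z = sum(weights)
    return learner_loss, [w / z for w in weights]


T, K = 100, 4
eta = math.sqrt(8 * math.log(K) / T)  # standard tuning for a known horizon T
rounds = [[0.0, 1.0, 1.0, 1.0] for _ in range(T)]  # option 0 is always best
learner_loss, final_probs = hedge(rounds, eta)
best_expert_loss = min(sum(r[i] for r in rounds) for i in range(K))
regret = learner_loss - best_expert_loss
print("regret:", regret, "bound:", math.sqrt(T * math.log(K) / 2))
```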

Updated: 2024-06-24 13:03:11

标题: 专家与聚类：分层在线偏好学习框架

摘要: 新兴的出行系统越来越能够向出行用户推荐选项，引导他们实现个性化且可持续的系统结果。与典型的推荐系统相比，最小化遗憾更为关键，因为1）出行选项直接影响用户的生活，2）系统的可持续性依赖于足够的用户参与。在这项研究中，我们考虑通过利用一个捕捉用户出行偏好的低维潜在空间来加速用户偏好学习。我们引入了一个名为Expert with Clustering (EWC)的层次上下文老虎机框架，该框架整合了聚类技术和基于专家建议的预测。EWC有效地利用层次用户信息，并结合了一种新颖的Loss-guided Distance度量。这个度量在生成更具代表性的聚类中心方面起着关键作用。在一个包含$N$个用户、每个用户$T$轮以及$K$个选项的推荐场景中，我们的算法实现了$O(N\sqrt{T\log K} + NT)$的遗憾界。这个界由两部分组成：第一项是Hedge算法的遗憾，第二项取决于聚类的平均损失。据作者所知，这是第一项分析集成专家算法与k均值聚类的遗憾的工作。这个遗憾界突出了EWC的理论和实验效力，特别适用于需要快速学习和适应的场景。实验结果表明，与LinUCB基准相比，EWC可以将遗憾大幅减少27.57%。我们的工作提供了一种数据高效的方法，既能捕捉个体行为又能捕捉集体行为，因此在具有层次结构的情境中非常适用。我们期望该算法适用于其他具有用户偏好和信息层次细微差别的设置。

更新时间: 2024-06-24 13:03:11

领域: cs.LG

下载: http://arxiv.org/abs/2401.15062v2

No More Sliding-Windows: Dynamic Functional Connectivity Based On Random Convolutions Without Learning

In the field of dynamic functional connectivity, the sliding-window method is widely used and its stability is generally recognized. However, the sliding-window method's processing of the data within each window is overly simplistic, which to some extent limits its effectiveness. This study proposes a feature expansion method based on random convolution, which achieves better and more noise-resistant results than the sliding-window method without requiring training. Experiments on simulated data show that, within shorter time windows, the dynamic functional connectivity matrix and time series obtained with the random convolution method fit the ground truth more closely (95.59%) than those obtained with the sliding-window method (45.99%). Gender difference studies on real data also reveal that the random convolution method uncovers more gender differences than the sliding-window method. Through theoretical analysis, we propose a more comprehensive convolutional functional connectivity computation model, of which the sliding-window method is a special case, thereby opening up vast potential for research methods in dynamic functional connectivity.
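The abstract states that the sliding-window method is a special case of a more general convolutional model. One way to see the connection: a sliding-window correlation is a weighted correlation whose weights form a box kernel. The sketch below assumes non-negative kernel weights so that a weighted Pearson correlation stays well defined; the random kernel is a hypothetical illustration and not the paper's random-convolution feature expansion.

```python
import math
import random


def weighted_corr(x, y, w):
    """Pearson correlation of x and y under non-negative weights w."""
    s = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / s
    my = sum(wi * yi for wi, yi in zip(w, y)) / s
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / s
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / s
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / s
    return cov / math.sqrt(vx * vy)


def dynamic_fc(x, y, kernel):
    """Slide `kernel` along both series and return one connectivity value
    per position. A box kernel (all ones) is the classic sliding window."""
    L = len(kernel)
    return [weighted_corr(x[t:t + L], y[t:t + L], kernel)
            for t in range(len(x) - L + 1)]


x = [math.sin(0.3 * t) for t in range(60)]
y = [math.sin(0.3 * t + 0.5) for t in range(60)]
box = [1.0] * 10
rng = random.Random(0)
rand_kernel = [rng.random() for _ in range(10)]  # hypothetical random kernel
print(dynamic_fc(x, y, box)[:3])
print(dynamic_fc(x, y, rand_kernel)[:3])
```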

Updated: 2024-06-24 13:02:36

标题: 不再使用滑动窗口：基于无需学习的随机卷积的动态功能连接

摘要: 在动态功能连接领域，滑动窗口方法被广泛应用且其稳定性得到普遍认可。然而，滑动窗口方法在窗口内的数据处理过于简单化，在一定程度上限制了其有效性。本研究提出了一种基于随机卷积的特征扩展方法，与滑动窗口方法相比，该方法实现了更好且更抗噪声的结果，且无需训练。对模拟数据的实验结果显示，使用随机卷积方法获得的动态功能连接矩阵和时间序列在较短时间窗口内与标准答案的拟合度更高（95.59％），相比之下滑动窗口方法为（45.99％）。对真实数据的性别差异研究也表明，随机卷积方法比滑动窗口方法揭示了更多的性别差异。通过理论分析，我们提出了一种更全面的卷积功能连接计算模型，滑动窗口方法是该模型的特例，从而为动态功能连接的研究方法开辟了广阔的潜力。

更新时间: 2024-06-24 13:02:36

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2406.16619v1

Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings

Since the emergence of the Transformer architecture, language model development has accelerated, driven by the models' promising potential. However, releasing these models into production requires a proper understanding of their behavior, particularly in sensitive domains such as medicine. Despite this need, the medical literature still lacks technical assessments of pre-trained language models, which are especially valuable in settings constrained by computational power or budget. To address this gap, we provide a comprehensive survey of language models in the medical domain. In addition, we selected a subset of these models for thorough evaluation, focusing on classification and text generation tasks. Our subset encompasses 53 models, ranging from 110 million to 13 billion parameters, spanning the three families of Transformer-based models and diverse knowledge domains. This study employs a series of approaches for text classification together with zero-shot prompting instead of model training or fine-tuning, which closely resembles the resource-limited setting in which many users of language models find themselves. Encouragingly, our findings reveal remarkable performance across various tasks and datasets, underscoring the latent potential of certain models to contain medical knowledge, even without domain specialization. Consequently, our study advocates for further exploration of model applications in medical contexts, particularly in resource-constrained settings. The code is available at https://github.com/anpoc/Language-models-in-medicine.

Updated: 2024-06-24 12:52:02

标题: 在资源受限情境下评估医学语言模型

摘要: 自从Transformer架构出现以来，语言模型的发展已经增加，这是由它们有希望的潜力驱动的。然而，将这些模型投入生产需要充分理解它们的行为，特别是在医学等敏感领域。尽管存在这种需求，医学文献仍然缺乏对预训练语言模型的技术评估，这在资源受限的环境中尤为重要，无论是计算能力还是预算有限。为了填补这一空白，我们提供了医学领域语言模型的全面调查。此外，我们选择了这些模型的一个子集进行彻底评估，重点放在分类和文本生成任务上。我们的子集包括53个模型，参数范围从1.1亿到130亿，涵盖了基于Transformer的三个家族的模型，来自不同的知识领域。本研究采用一系列文本分类方法，以及零-shot提示，而不是模型训练或微调，这与许多语言模型用户所处的资源有限环境非常相似。令人鼓舞的是，我们的研究结果显示在各种任务和数据集上表现出色，强调了某些模型具有包含医学知识的潜在潜力，即使没有领域专业化。因此，我们的研究主张进一步探索在医学环境中的模型应用，特别是在资源受限的环境中。代码可以在https://github.com/anpoc/Language-models-in-medicine上找到。

更新时间: 2024-06-24 12:52:02

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.16611v1

Causal Fair Machine Learning via Rank-Preserving Interventional Distributions

A decision can be defined as fair if equal individuals are treated equally and unequals unequally. Adopting this definition, the task of designing machine learning (ML) models that mitigate unfairness in automated decision-making systems must include causal thinking when introducing protected attributes: Following a recent proposal, we define individuals as being normatively equal if they are equal in a fictitious, normatively desired (FiND) world, where the protected attributes have no (direct or indirect) causal effect on the target. We propose rank-preserving interventional distributions to define a specific FiND world in which this holds and a warping method for estimation. Evaluation criteria for both the method and the resulting ML model are presented and validated through simulations. Experiments on empirical data showcase the practical application of our method and compare results with "fairadapt" (Ple\v{c}ko and Meinshausen, 2020), a different approach for mitigating unfairness by causally preprocessing data that uses quantile regression forests. With this, we show that our warping approach effectively identifies the most discriminated individuals and mitigates unfairness.
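Rank preservation means that an individual keeps their relative position (quantile) when moved into the FiND world. A minimal sketch, assuming a single numeric variable and plain empirical quantile mapping from a source group onto a reference group, is shown below; the authors' warping method estimates interventional distributions along a causal model, which this sketch ignores.

```python
from bisect import bisect_right


def rank_preserving_warp(values_src, values_ref):
    """Map each source value to the same empirical quantile of a reference
    sample, so the ordering within the source group is preserved."""
    ref = sorted(values_ref)
    src = sorted(values_src)
    n, m = len(src), len(ref)
    warped = []
    for v in values_src:
        q = (bisect_right(src, v) - 0.5) / n  # mid-rank percentile in (0, 1)
        idx = min(int(q * m), m - 1)          # empirical quantile index
        warped.append(ref[idx])
    return warped


# Hypothetical scores of one group mapped onto a reference distribution.
print(rank_preserving_warp([3, 1, 4, 2], [10, 20, 30, 40]))  # [30, 10, 40, 20]
```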

Updated: 2024-06-24 12:51:37

标题: 因果公平机器学习：通过保持排名干预分布

摘要: 如果相同的个体得到相同的对待、不同的个体得到不同的对待，那么一个决策可以被定义为公平的。采用这一定义，在引入受保护属性时，设计能够减少自动决策系统中不公平性的机器学习（ML）模型的任务必须包括因果思维：根据最近的一个提议，如果个体在一个虚构的、规范上期望的（FiND）世界中是相等的，我们就将其定义为规范上相等的，在这个世界中受保护属性对目标没有（直接或间接的）因果影响。我们提出了保持排名的干预分布来定义一个满足这一条件的特定FiND世界，并提出了一种用于估计的扭曲（warping）方法。我们提出了该方法及所得ML模型的评估标准，并通过模拟进行了验证。对实证数据的实验展示了我们方法的实际应用，并将结果与"fairadapt"（Ple\v{c}ko和Meinshausen，2020）进行了比较，后者是另一种通过因果预处理数据来减少不公平性的方法，使用分位数回归森林。由此，我们表明我们的扭曲方法能够有效地识别受歧视最严重的个体并减少不公平性。

更新时间: 2024-06-24 12:51:37

领域: cs.LG,cs.CY,stat.ML

下载: http://arxiv.org/abs/2307.12797v2

Evaluating the Robustness of Deep-Learning Algorithm-Selection Models by Evolving Adversarial Instances

Deep neural networks (DNNs) are increasingly being used to perform algorithm selection in combinatorial optimisation domains, particularly as they accommodate input representations that avoid the need to design and calculate features. Mounting evidence from domains that use images as input shows that deep convolutional networks are vulnerable to adversarial samples, in which a small perturbation of an instance can cause the DNN to misclassify. However, it remains unknown whether deep recurrent networks (DRNs), which have recently shown promise as algorithm selectors in the bin-packing domain, are equally vulnerable. We use an evolutionary algorithm (EA) to find perturbations of instances from two existing benchmarks for online bin packing that cause trained DRNs to misclassify: adversarial samples are successfully generated from up to 56% of the original instances, depending on the dataset. Analysis of the new misclassified instances sheds light on the 'fragility' of some training instances, i.e. instances for which it is trivial to find a small perturbation that results in a misclassification, and on the factors that influence this. Finally, the method generates a large number of new misclassified instances with widely varying confidence, providing a rich new source of training data for creating more robust models.
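The search for adversarial instances can be sketched as a simple (1+1)-style evolutionary loop: mutate one item size within a small bound, keep the child if the model becomes less confident in its original choice, and stop when the prediction flips. The selector below (a threshold on mean item size) is a hypothetical stand-in for a trained DRN, and the fitness and mutation choices are assumptions, not the paper's operators.

```python
import random


def toy_selector(items):
    """Hypothetical stand-in for a trained DRN algorithm selector.
    Returns (chosen heuristic, confidence) for a list of item sizes."""
    mean = sum(items) / len(items)
    label = "FirstFit" if mean < 0.5 else "BestFit"
    return label, abs(mean - 0.5)


def evolve_adversarial(items, model, max_delta=0.05, generations=2000, seed=0):
    """(1+1)-style EA: mutate one item size at a time, keep children that
    lower the model's confidence, stop as soon as the label flips.
    Every item stays within +/- max_delta of its original size."""
    rng = random.Random(seed)
    original_label, _ = model(items)
    parent = list(items)
    _, parent_conf = model(parent)
    for _ in range(generations):
        child = list(parent)
        i = rng.randrange(len(child))
        lo = max(0.01, items[i] - max_delta)
        hi = min(1.0, items[i] + max_delta)
        child[i] = rng.uniform(lo, hi)
        label, conf = model(child)
        if label != original_label:
            return child  # adversarial sample found
        if conf < parent_conf:
            parent, parent_conf = child, conf
    return None  # no misclassification found within the budget


instance = [0.49] * 10
adversarial = evolve_adversarial(instance, toy_selector)
if adversarial is not None:
    print(toy_selector(instance)[0], "->", toy_selector(adversarial)[0])
```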

Updated: 2024-06-24 12:48:44

标题: 评估深度学习算法选择模型的鲁棒性：通过进化对抗实例

摘要: 深度神经网络（DNN）越来越多地被用于在组合优化领域执行算法选择，特别是因为它们可以容纳避免设计和计算特征的输入表示。来自使用图像作为输入的领域的证据表明，深度卷积网络容易受到对抗样本的影响，即对实例的微小扰动可能导致DNN误分类。然而，目前尚不清楚最近被证明在装箱领域作为算法选择器具有潜力的深度递归网络（DRN）是否同样容易受到影响。我们使用进化算法（EA）来找出对在线装箱的两个现有基准实例造成训练过的DRN误分类的扰动：根据数据集，成功地生成了高达原始实例的56%的对抗性样本。对新的误分类实例的分析揭示了一些训练实例的“脆弱性”，即可以轻易找到导致误分类的微小扰动的实例以及影响这一现象的因素。最后，该方法生成了大量的新实例，这些实例被误分类且置信度变化很大，为创建更加健壮的模型提供了丰富的新训练数据来源。

更新时间: 2024-06-24 12:48:44

领域: cs.NE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.16609v1

When Invariant Representation Learning Meets Label Shift: Insufficiency and Theoretical Insights

As a crucial step toward real-world learning scenarios with changing environments, dataset shift theory and invariant representation learning algorithms have been extensively studied to relax the identical-distribution assumption of the classical learning setting. Among the different assumptions about the nature of shifting distributions, generalized label shift (GLS) is the most recently developed, and it shows great potential for dealing with the complex factors within the shift. In this paper, we aim to explore the limitations of current dataset shift theory and algorithms, and to provide new insights by presenting a comprehensive understanding of GLS. On the theoretical side, two informative generalization bounds are derived, and the GLS learner is proved to be sufficiently close to the optimal target model from the Bayesian perspective. The main results show the insufficiency of invariant representation learning and prove the sufficiency and necessity of GLS correction for generalization, providing theoretical support and innovations for exploring generalizable models under dataset shift. On the methodological side, we provide a unified view of existing shift correction frameworks and propose a kernel embedding-based correction algorithm (KECA) to minimize the generalization error and achieve successful knowledge transfer. Both the theoretical results and extensive experimental evaluations demonstrate the sufficiency and necessity of GLS correction for addressing dataset shift and the superiority of the proposed algorithm.
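The abstract does not detail KECA, but kernel embedding methods typically compare distributions through their kernel mean embeddings. As a hedged illustration of that building block only, the sketch computes the biased empirical maximum mean discrepancy (MMD) with an RBF kernel; the bandwidth `gamma` and the sample points are assumptions.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mmd2(xs, ys, gamma=1.0):
    """Biased empirical squared MMD: the squared distance between the
    kernel mean embeddings of the two samples."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


source = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)]  # hypothetical source features
target = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2)]  # hypothetical shifted features
print(mmd2(source, source), mmd2(source, target))
```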

Updated: 2024-06-24 12:47:21

标题: 当不变表示学习遇到标签变化：不足和理论洞见

摘要: 作为朝向具有变化环境的真实学习场景的关键步骤，数据集转移理论和不变表示学习算法已被广泛研究，以放宽经典学习设置中相同分布假设。在对转移分布的基本假设中，广义标签转移（GLS）是最新发展的一个，显示出处理转移中的复杂因素的巨大潜力。本文旨在探讨当前数据集转移理论和算法的局限性，并通过对GLS的全面理解提供新的见解。从理论角度出发，推导出两个信息泛化界限，并从贝叶斯视角证明GLS学习者与最优目标模型足够接近。主要结果显示不变表示学习的不足，并证明GLS校正对泛化的充分性和必要性，为探索在数据集转移下具有泛化能力的模型提供了理论支持和创新。从方法论角度，我们提供了现有转移校正框架的统一视角，并提出了一种基于核嵌入的校正算法（KECA），以最小化泛化误差并实现成功的知识转移。理论结果和广泛的实验评估都证明了GLS校正对解决数据集转移的充分性和必要性，以及所提出算法的优越性。

更新时间: 2024-06-24 12:47:21

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.16608v1

Cherry on the Cake: Fairness is NOT an Optimization Problem

Fair cake-cutting is a mathematical subfield that studies the problem of fairly dividing a resource among a number of participants. The so-called "cake" represents any resource that can be distributed among players. This concept is connected to supervised multi-label classification: any dataset can be thought of as a cake that needs to be distributed, where each label is a player that receives its share of the dataset. In particular, any efficient cake-cutting solution for the dataset is equivalent to an optimal decision function. Although we are not the first to demonstrate this connection, its important ramifications seem to have been partially forgotten. We revisit these classical results and demonstrate how this connection can be used fruitfully for fairness in machine learning problems. Understanding the set of achievable fair decisions is a fundamental step in finding optimal fair solutions and satisfying fairness requirements. Using the tools of cake-cutting theory, we describe the behavior of optimal fair decisions, which, counterintuitively, often exhibit quite unfair properties. Specifically, to satisfy fairness constraints, it is sometimes preferable, in the name of optimality, to purposefully make mistakes and withhold the positive label from deserving individuals in a community in favor of less worthy individuals within the same community. This practice is known in the literature as cherry-picking and has been described as "blatantly unfair."

Updated: 2024-06-24 12:46:16

标题: 蛋糕上的樱桃：公平不是一个优化问题

摘要: 公平的蛋糕切割是研究如何公平地将资源分配给多个参与者的数学子领域。所谓的“蛋糕”作为一个对象，代表着可以分配给玩家的任何资源。这个概念与监督多标签分类有关：任何数据集都可以被视为需要分配的蛋糕，其中每个标签都是一个接收数据集份额的玩家。特别地，对于数据集的任何有效的蛋糕切割解决方案相当于一个最优决策函数。尽管我们并不是第一个展示这种联系的人，但这种对应关系的重要影响似乎已经部分被遗忘。我们重新审视这些经典结果，并展示了如何利用这种联系来解决机器学习中的公平性问题。理解可实现的公平决策集是寻找最优公平解决方案和满足公平要求的基本步骤。通过运用蛋糕切割理论的工具，我们能够描述最优公平决策的行为，与直觉相反，这些决策往往表现出相当不公平的特性。具体而言，为了满足公平约束，有时在最优性的名义下，更可取的做法是有意犯错，拒绝将正标签给予社区中值得的个体，而偏向于同一社区内不那么值得的个体。这种做法在文献中被称为“挑樱桃”（cherry-picking），并被描述为“明显不公平”。

更新时间: 2024-06-24 12:46:16

领域: cs.LG,cs.CY,cs.GT

下载: http://arxiv.org/abs/2406.16606v1

CLEAR: Can Language Models Really Understand Causal Graphs?

Causal reasoning is a cornerstone of how humans interpret the world. To model and reason about causality, causal graphs offer a concise yet effective solution. Given the impressive advancements in language models, a crucial question arises: can they really understand causal graphs? To this end, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we develop a framework to define causal graph understanding, by assessing language models' behaviors through four practical criteria derived from diverse disciplines (e.g., philosophy and psychology). We then develop CLEAR, a novel benchmark that defines three complexity levels and encompasses 20 causal graph-based tasks across these levels. Finally, based on our framework and benchmark, we conduct extensive experiments on six leading language models and summarize five empirical findings. Our results indicate that while language models demonstrate a preliminary understanding of causal graphs, significant potential for improvement remains. Our project website is at https://github.com/OpenCausaLab/CLEAR.
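Benchmarks of this kind need reference answers for graph queries. A minimal sketch of two elementary queries, acyclicity and ancestor sets, is shown below on a hypothetical graph; whether these match any of CLEAR's 20 tasks is an assumption.

```python
from collections import deque


def is_dag(graph):
    """Kahn's algorithm: True if the directed graph has no cycle.
    `graph` maps each node to a list of its children."""
    nodes = set(graph) | {c for cs in graph.values() for c in cs}
    indeg = {n: 0 for n in nodes}
    for cs in graph.values():
        for c in cs:
            indeg[c] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    seen = 0
    while queue:
        n = queue.popleft()
        seen += 1
        for c in graph.get(n, []):
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return seen == len(nodes)


def ancestors(graph, target):
    """All nodes with a directed path into `target` (direct or indirect causes)."""
    parents = {}
    for p, cs in graph.items():
        for c in cs:
            parents.setdefault(c, []).append(p)
    found, stack = set(), [target]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


# Hypothetical causal graph: genes -> smoking -> tar -> cancer, genes -> cancer
g = {"genes": ["smoking", "cancer"], "smoking": ["tar"], "tar": ["cancer"]}
print(is_dag(g), sorted(ancestors(g, "cancer")))
```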

Updated: 2024-06-24 12:46:15

标题: CLEAR: 语言模型真的能理解因果图吗？

摘要: 因果推理是人类解释世界的基石。为了对因果关系进行建模和推理，因果图提供了一种简洁而有效的解决方案。鉴于语言模型取得了令人瞩目的进展，一个关键问题出现了：它们是否真正理解因果图？为此，我们开创性地研究了语言模型对因果图的理解。具体来说，我们开发了一个框架来定义因果图理解，通过从不同学科（如哲学和心理学）衍生出的四个实用标准评估语言模型的行为。然后，我们开发了CLEAR，一个新颖的基准，定义了三个复杂性水平，并涵盖了20个跨越这些水平的基于因果图的任务。最后，基于我们的框架和基准，我们对六种主要语言模型进行了广泛的实验，并总结了五个经验发现。我们的结果表明，尽管语言模型对因果图表现出初步的理解，但仍有很大的改进潜力。我们的项目网站位于https://github.com/OpenCausaLab/CLEAR。

更新时间: 2024-06-24 12:46:15

领域: cs.CL,cs.AI,cs.LG,stat.ME

下载: http://arxiv.org/abs/2406.16605v1

LLMs Are Few-Shot In-Context Low-Resource Language Learners

In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages using only short in-context information, offering a crucial avenue for narrowing the gap between high-resource and low-resource languages. Nonetheless, only a handful of works have explored ICL for low-resource languages, and most of them focus on relatively high-resource languages, such as French and Spanish. In this work, we extensively study ICL and its cross-lingual variant (X-ICL) on 25 low-resource and 7 relatively higher-resource languages. Our study not only assesses the effectiveness of ICL with LLMs in low-resource languages but also identifies the shortcomings of in-context label alignment and introduces a more effective alternative: query alignment. Moreover, we provide valuable insights into various facets of ICL for low-resource languages. Our study concludes that few-shot in-context information is significant for enhancing LLMs' understanding of low-resource languages: semantically relevant information closes the language gap in the target language and aligns the semantics between the target low-resource language and the high-resource language in which the model is proficient. Our work highlights the importance of advancing ICL research, particularly for low-resource languages. Our code is publicly released at https://github.com/SamuelCahyawijaya/in-context-alignment

Updated: 2024-06-24 12:41:52

标题: LLMs是少量样本上下文低资源语言学习者

摘要: 上下文学习（ICL）赋予大型语言模型（LLMs）在少数被低估语言中执行多样任务的能力，仅使用短小的上下文信息，为缩小高资源语言和低资源语言之间差距提供了重要途径。然而，只有少数研究探讨了ICL用于低资源语言，其中大部分集中于相对高资源语言，如法语和西班牙语。在这项工作中，我们广泛研究了ICL及其跨语言变体（X-ICL）在25种低资源语言和7种相对较高资源语言上的应用。我们的研究不仅评估了LLMs在低资源语言中使用ICL的有效性，还发现了在上下文标签对齐中的缺陷，并引入了更有效的替代方案：查询对齐。此外，我们提供了有关ICL在低资源语言中各个方面的宝贵见解。我们的研究结论强调了少样本上下文信息对提高LLMs对低资源理解质量的重要性，通过关闭目标语言中的语言差距，将模型擅长的高资源语言与目标低资源语言之间的语义进行对齐。我们的工作突出了推进ICL研究的重要性，特别是对于低资源语言。我们的代码已公开发布在https://github.com/SamuelCahyawijaya/in-context-alignment。

更新时间: 2024-06-24 12:41:52

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.16512v4

Measuring the Recyclability of Electronic Components to Assist Automatic Disassembly and Sorting Waste Printed Circuit Boards

Waste electrical and electronic equipment has been increasing due to the fast evolution of technology products and competition among many IT sectors. Every year, millions of tons of electronic waste are discarded into the environment, with serious consequences for human health. It is therefore crucial to control this waste flow using technology, especially Artificial Intelligence, and to reclaim critical raw materials for new production processes. In this paper, we focus on measuring the recyclability of waste electronic components (WECs) from waste printed circuit boards (WPCBs) using an innovative mathematical model. This approach evaluates both the recyclability and the recycling difficulty of WECs, and integrates an AI model for improved disassembly and sorting. Assessing the recyclability of the individual electronic components on WPCBs provides insight into the recovery potential of valuable materials and indicates the complexity of recycling in terms of economic worth and production utility. This novel measurement approach helps AI models accurately determine the number of classes to be identified and sorted during the automated disassembly of discarded PCBs. It also facilitates iterative training and validation of the model on individual electronic components.

Updated: 2024-06-24 12:33:56

标题: 衡量电子元件的可回收性以帮助自动拆卸和分类废旧印刷电路板

摘要: 由于技术产品的快速发展和许多IT部门之间的竞争，电子电气设备废弃物不断增加。每年都有数百万吨的电子废物被丢弃到环境中，对人类健康造成严重后果。因此，利用技术（尤其是人工智能）控制这种废物流，同时回收关键原材料用于新的生产过程，至关重要。本文重点研究了利用一种创新的数学模型，对废旧印刷电路板（WPCBs）中的废弃电子元件（WECs）的可回收性进行测量。这种创新方法同时评估了WECs的可回收性和回收难度，并整合了一个用于改进拆卸和分拣的AI模型。评估WPCBs上各个电子元件的可回收性，可以揭示有价值材料的回收潜力，并从经济价值和生产效用的角度指示回收过程的复杂程度。这种新颖的测量方法有助于AI模型准确确定在废弃PCB的自动拆卸过程中需要识别和分拣的类别数量。它还有助于模型对各个电子元件进行迭代训练和验证。

更新时间: 2024-06-24 12:33:56

领域: cs.CV,cs.CY,cs.LG

下载: http://arxiv.org/abs/2406.16593v1

CLUE: A Clinical Language Understanding Evaluation for LLMs

Large Language Models (LLMs) are expected to significantly contribute to patient care, diagnostics, and administrative processes. Emerging biomedical LLMs aim to address healthcare-specific challenges, including privacy demands and computational constraints. Assessing the models' suitability for this sensitive application area is of the utmost importance. However, evaluation has primarily been limited to non-clinical tasks, which do not reflect the complexity of practical clinical applications. To fill this gap, we present the Clinical Language Understanding Evaluation (CLUE), a benchmark tailored to evaluate LLMs on clinical tasks. CLUE includes six tasks to test the practical applicability of LLMs in complex healthcare settings. Our evaluation includes a total of $25$ LLMs. In contrast to previous evaluations, CLUE shows a decrease in performance for nine out of twelve biomedical models. Our benchmark represents a step towards a standardized approach to evaluating and developing LLMs in healthcare to align future model development with the real-world needs of clinical application. We open-source all evaluation scripts and datasets for future research at https://github.com/TIO-IKIM/CLUE.

Updated: 2024-06-24 12:32:41

标题: CLUE：用于LLMs的临床语言理解评估

摘要: 大型语言模型（LLMs）预计将显著促进患者护理、诊断和行政流程。新兴的生物医学LLMs旨在解决医疗保健领域的特定挑战，包括隐私要求和计算限制。评估这些模型在这一敏感应用领域的适用性至关重要。然而，评估主要局限于非临床任务，这些任务并不反映实际临床应用的复杂性。为填补这一空白，我们提出了临床语言理解评估（CLUE），这是一个专门用于评估LLMs在临床任务上的基准。CLUE包括六个任务，用于测试LLMs在复杂医疗环境中的实际适用性。我们的评估包括总共25个LLMs。与先前的评估相比，CLUE显示出十二个生物医学模型中有九个性能下降。我们的基准代表了朝着在医疗保健领域评估和开发LLMs的标准化方法迈出的一步，以使未来的模型开发与临床应用的真实需求保持一致。我们开源所有评估脚本和数据集，以供未来研究使用，网址为https://github.com/TIO-IKIM/CLUE。

更新时间: 2024-06-24 12:32:41

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.04067v3

Forecasting with Deep Learning: Beyond Average of Average of Average Performance

Accurate evaluation of forecasting models is essential for ensuring reliable predictions. Current practices for evaluating and comparing forecasting models focus on summarising performance into a single score, using metrics such as SMAPE. We hypothesize that averaging performance over all samples dilutes relevant information about the relative performance of models, particularly under conditions in which this relative performance differs from the overall accuracy. We address this limitation by proposing a novel framework for evaluating univariate time series forecasting models from multiple perspectives, such as one-step-ahead versus multi-step-ahead forecasting. We show the advantages of this framework by comparing a state-of-the-art deep learning approach with classical forecasting techniques. While classical methods (e.g. ARIMA) are long-standing approaches to forecasting, deep neural networks (e.g. NHITS) have recently shown state-of-the-art forecasting performance on benchmark datasets. Our extensive experiments show that NHITS generally performs best, but its superiority varies with forecasting conditions. For instance, with respect to the forecasting horizon, NHITS only outperforms classical approaches for multi-step-ahead forecasting. Another relevant insight is that, when dealing with anomalies, NHITS is outperformed by methods such as Theta. These findings highlight the importance of aspect-based model evaluation.
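The horizon perspective can be made concrete by computing SMAPE separately for each forecast step rather than averaging over all of them. The sketch uses the common M4-style SMAPE definition (other variants exist), treats 0/0 terms as zero by assumption, and uses invented numbers.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent (M4-style definition, range 0 to 200)."""
    terms = [
        2.0 * abs(f - a) / (abs(a) + abs(f)) if (abs(a) + abs(f)) > 0 else 0.0
        for a, f in zip(actual, forecast)
    ]
    return 100.0 * sum(terms) / len(terms)


def smape_by_horizon(actuals, forecasts):
    """SMAPE per forecast step h across many forecast windows, instead of
    one average over all steps. Inputs are lists of equal-length windows."""
    horizon = len(actuals[0])
    return [
        smape([w[h] for w in actuals], [w[h] for w in forecasts])
        for h in range(horizon)
    ]


actuals = [[100, 100, 100], [200, 200, 200]]
forecasts = [[100, 110, 130], [200, 220, 260]]
print(smape_by_horizon(actuals, forecasts))
```

A single averaged score would mix the perfect first step with the much worse third step; the per-step list keeps them apart.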

Updated: 2024-06-24 12:28:22

标题: 使用深度学习进行预测：超越平均性能的平均值

摘要: 准确评估预测模型对于确保可靠的预测至关重要。目前评估和比较预测模型的当前做法集中在将性能总结为单一分数上，使用指标如SMAPE。我们假设对所有样本的性能进行平均会淡化关于模型相对性能的相关信息。特别是，在这种相对性能与整体准确性不同的情况下。我们通过提出一个新颖的框架来评估多个角度的单变量时间序列预测模型，例如一步向前预测与多步向前预测。我们通过将最先进的深度学习方法与经典预测技术进行比较，展示了这一框架的优势。虽然经典方法（如ARIMA）是长期以来的预测方法，但深度神经网络（如NHITS）最近在基准数据集中展现出最先进的预测性能。我们进行了大量实验，结果表明NHITS通常表现最佳，但其优越性在不同的预测条件下有所不同。例如，就预测视野而言，NHITS仅在多步向前预测时胜过经典方法。另一个相关的见解是，在处理异常值时，NHITS被Theta等方法超越。这些发现强调了基于方面的模型评估的重要性。

更新时间: 2024-06-24 12:28:22

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.16590v1

Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA

Reliable uncertainty estimation plays a crucial role in various safety-critical applications such as medical diagnosis and autonomous driving. In recent years, Bayesian neural networks (BayesNNs) have gained substantial research and industrial interest due to their ability to make accurate predictions with reliable uncertainty estimates. However, the algorithmic complexity of BayesNNs and the resulting hardware performance hinder their adoption in real-life applications. To bridge this gap, this paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs. At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads that still achieve high accuracy and high-quality uncertainty estimation. At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient multi-exit BayesNNs. Several optimization techniques, such as mixing spatial and temporal mappings, are introduced to reduce resource consumption and improve overall hardware performance. Comprehensive experiments demonstrate that our approach achieves higher energy efficiency than CPU, GPU, and other state-of-the-art hardware implementations. To support future development of this research, we have open-sourced our code at: https://github.com/os-hxfan/MCME_FPGA_Acc.git
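In dropout-based BayesNNs, uncertainty comes from aggregating several stochastic forward passes, and a multi-exit network yields one prediction per exit. The sketch assumes that softmax outputs from every (exit, Monte Carlo sample) pair are averaged uniformly and that uncertainty is summarized by predictive entropy; the paper's exact aggregation and its FPGA mapping are not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def multi_exit_mc_predict(logit_samples):
    """Average softmax outputs over all exits and Monte Carlo dropout passes.

    logit_samples: one logit vector per (exit, dropout sample) pair.
    Returns (mean class probabilities, predictive entropy in nats).
    """
    probs = [softmax(z) for z in logit_samples]
    c = len(probs[0])
    mean = [sum(p[k] for p in probs) / len(probs) for k in range(c)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    return mean, entropy


agree = [[5.0, 0.0, 0.0]] * 6  # e.g. 3 exits x 2 MC samples that agree
disagree = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]] * 2
print(multi_exit_mc_predict(agree)[1], multi_exit_mc_predict(disagree)[1])
```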

Updated: 2024-06-24 12:25:04

标题: 在FPGA上利用多出口增强基于Dropout的贝叶斯神经网络

摘要: 可靠的不确定性估计在各种安全关键应用中起着至关重要的作用，如医学诊断和自动驾驶。近年来，由于贝叶斯神经网络（BayesNNs）具有准确预测和可靠不确定性估计的能力，它们在研究和工业领域引起了广泛兴趣。然而，BayesNNs的算法复杂性及由此产生的硬件性能阻碍了它们在实际应用中的采用。为了弥合这一差距，本文提出了一种算法和硬件协同设计框架，可以生成基于现场可编程门阵列（FPGA）的加速器，用于高效的BayesNNs。在算法层面上，我们提出了一种新颖的基于Dropout的多出口BayesNNs，减少了计算和存储开销，同时实现了高准确性和高质量的不确定性估计。在硬件层面上，本文引入了一个转换框架，可以为所提出的高效多出口BayesNNs生成FPGA加速器。多种优化技术，如空间和时间映射的混合，被引入以减少资源消耗并提高整体硬件性能。全面的实验表明，我们的方法相比于CPU、GPU和其他最先进的硬件实现，能够实现更高的能源效率。为了支持未来的研究发展，我们已将我们的代码开源在：https://github.com/os-hxfan/MCME_FPGA_Acc.git

更新时间: 2024-06-24 12:25:04

领域: cs.LG

下载: http://arxiv.org/abs/2406.14593v2

Identifying Easy Instances to Improve Efficiency of ML Pipelines for Algorithm-Selection

Algorithm-selection (AS) methods are essential in order to obtain the best performance from a portfolio of solvers over large sets of instances. However, many AS methods rely on an analysis phase, e.g., one in which features are computed by sampling solutions and then used as input to a machine-learning model. For AS to be efficient, it is therefore important that this analysis phase is not computationally expensive. We propose a method for identifying easy instances that can be solved quickly by a generalist solver without any need for algorithm selection. This saves the computational budget associated with feature computation, which can then be used elsewhere in an AS pipeline, e.g., to enable additional function evaluations on hard problems. Experiments on the BBOB dataset in two settings (batch and streaming) show that identifying easy instances results in substantial savings in function evaluations. Re-allocating the saved budget to hard problems yields performance gains compared to the virtual best solver (VBS) computed with the original budget, the single best solver (SBS), and a trained algorithm selector.
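
The budget logic above can be sketched as follows. Every instance is first probed with a cheap generalist solver. Instances that already reach a target error are treated as easy and skip feature computation. Their unused evaluations are split evenly across the hard instances. The probe interface, the target threshold and the even redistribution rule are illustrative assumptions, and the paper's actual criterion for "easy" may differ.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical probe: (instance, evaluation budget) -> (best error found, evaluations used)
Probe = Callable[[Hashable, int], Tuple[float, int]]


def plan_budgets(
    instances: Sequence[Hashable],
    probe: Probe,
    probe_budget: int,
    full_budget: int,
    target: float,
) -> Tuple[List[Hashable], Dict[Hashable, int]]:
    """Return the easy instances and the evaluation budget left for each hard one."""
    easy: List[Hashable] = []
    remaining: Dict[Hashable, int] = {}
    saved = 0
    for inst in instances:
        error, used = probe(inst, probe_budget)
        if error <= target:
            easy.append(inst)  # solved by the generalist: no features, no selector
            saved += full_budget - used
        else:
            remaining[inst] = full_budget - used  # probe cost counts against the instance
    bonus = saved // len(remaining) if remaining else 0
    return easy, {inst: left + bonus for inst, left in remaining.items()}
```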

Updated: 2024-06-24 12:25:04

标题: 识别易处理实例以提高算法选择的机器学习流程效率

摘要: 算法选择（AS）方法对于在大量实例集合上获得最佳性能的解算器组合至关重要。然而，许多AS方法依赖于分析阶段，例如通过对解决方案进行采样计算特征，并将其用作机器学习模型的输入。为了使AS高效，因此这个分析阶段不应该计算成本过高是非常重要的。我们提出了一种方法来识别可以快速解决的简单实例，无需进行任何算法选择。这样可以节省与特征计算相关的计算预算，这样的预算可以用在AS管道的其他地方，例如，在难题上进行额外的函数评估。在BBOB数据集上进行的两种设置（批处理和流式处理）的实验证明，识别简单实例可以显著节省函数评估。将节省的预算重新分配给困难问题可以提高性能，而与原始预算计算的虚拟最佳解算器（VBS）、最佳解算器（SBS）和训练过的算法选择器相比也可以获得更好的性能。

更新时间: 2024-06-24 12:25:04

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2406.16999v1

Personalized federated learning based on feature fusion

Federated learning enables distributed clients to collaborate on training while storing their data locally to protect client privacy. However, due to the heterogeneity of data, models, and devices, the final global model may perform poorly on the tasks of individual clients. Communication bottlenecks, data heterogeneity, and model heterogeneity have been common challenges in federated learning. In this work, we consider label distribution skew, a type of data heterogeneity that is easily overlooked. In the context of classification, we propose a personalized federated learning approach called pFedPM. In our approach, we replace traditional gradient uploading with feature uploading, which helps reduce communication costs and allows heterogeneous client models. These feature representations also help preserve privacy to some extent. We use a hyperparameter $a$ to mix local and global features, which lets us control the degree of personalization. We also introduce a relation network as an additional decision layer, which provides a non-linear learnable classifier to predict labels. Experimental results show that, with an appropriate setting of $a$, our scheme outperforms several recent FL methods on the MNIST, FEMNIST, and CIFAR-10 datasets while requiring less communication.
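
The feature-mixing step in pFedPM can be pictured with per-class feature prototypes. The sketch below makes three assumptions:

- Each client uploads the mean feature vector of each class it holds.
- The server takes an unweighted per-class average across clients.
- $a$ weights the local prototype, so $a = 1$ is fully personalized.

These choices and all function names are illustrative, not the paper's exact procedure, and the relation-network classifier is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def class_prototypes(features: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Client side: mean feature vector per locally present class (uploaded instead of gradients)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def aggregate(client_protos: Sequence[Dict[int, Vector]]) -> Dict[int, Vector]:
    """Server side: unweighted per-class average over the clients that hold each class."""
    by_class: Dict[int, List[Vector]] = defaultdict(list)
    for protos in client_protos:
        for y, v in protos.items():
            by_class[y].append(v)
    return {y: [sum(col) / len(vs) for col in zip(*vs)] for y, vs in by_class.items()}


def mix(local: Dict[int, Vector], global_: Dict[int, Vector], a: float) -> Dict[int, Vector]:
    """a * local + (1 - a) * global; classes missing locally (label skew) fall back to global."""
    out: Dict[int, Vector] = {}
    for y, g in global_.items():
        loc = local.get(y, g)
        out[y] = [a * lv + (1 - a) * gv for lv, gv in zip(loc, g)]
    return out
```

Under label skew, a client often lacks some classes entirely. The fallback in `mix` lets it still use the global prototype for those classes.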

Updated: 2024-06-24 12:16:51

标题: 基于特征融合的个性化联邦学习

摘要: 联邦学习使分布式客户端能够在本地存储数据以保护客户隐私的同时合作进行训练。然而，由于数据、模型和设备的异质性，最终的全局模型在各个客户端的任务上可能表现不佳。通信瓶颈、数据异质性和模型异质性一直是联邦学习中常见的挑战。在这项工作中，我们考虑了标签分布倾斜问题，这是一种容易被忽视的数据异质性类型。在分类的背景下，我们提出了一种名为pFedPM的个性化联邦学习方法。在我们的过程中，我们用特征上传代替传统的梯度上传，这有助于减少通信成本并允许异质性客户端模型。这些特征表示在一定程度上起到了保护隐私的作用。我们使用超参数$a$来混合本地和全局特征，这使我们能够控制个性化程度。我们还引入了一个关系网络作为额外的决策层，提供一个非线性可学习的分类器来预测标签。实验结果表明，在适当设置$a$的情况下，我们的方案在MNIST、FEMNIST和CIFAR-10数据集上优于几种最近的FL方法，并且通信开销更少。

更新时间: 2024-06-24 12:16:51

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.16583v1

QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

While pets offer companionship, their limited intelligence restricts advanced reasoning and autonomous interaction with humans. Considering this, we propose QuadrupedGPT, a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet. To achieve this goal, the primary challenges include: i) effectively leveraging multimodal observations for decision-making; ii) mastering agile control of locomotion and path planning; iii) developing advanced cognition to execute long-term objectives. QuadrupedGPT processes human commands and environmental contexts using a large multimodal model (LMM). Empowered by its extensive knowledge base, our agent autonomously assigns appropriate parameters for adaptive locomotion policies and, using semantic-aware terrain analysis, plans a safe but efficient path toward the goal. Moreover, QuadrupedGPT is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals through high-level reasoning. Extensive experiments across various benchmarks confirm that QuadrupedGPT can adeptly handle multiple tasks with intricate instructions, marking a significant step toward versatile quadruped agents in open-ended worlds. Our website and code can be found at https://quadruped-hub.github.io/Quadruped-GPT/.

Updated: 2024-06-24 12:14:24

标题: 四足GPT：朝向开放世界中多才多艺的四足代理的发展

摘要: 尽管宠物提供了陪伴，但它们有限的智力限制了与人类的高级推理和自主互动。考虑到这一点，我们提出了QuadrupedGPT，这是一个多才多艺的代理人，旨在精通广泛的复杂任务，其灵活性可与宠物相媲美。为了实现这一目标，主要挑战包括：i)有效利用多模态观察进行决策；ii)掌握灵活的运动控制和路径规划；iii)开发高级认知以执行长期目标。QuadrupedGPT使用大型多模态模型（LMM）处理人类指令和环境上下文。凭借其丰富的知识库，我们的代理人自主分配适当的参数用于适应性运动策略，并引导代理人规划一条安全但高效的路径，利用语义感知地形分析。此外，QuadrupedGPT具有解决问题的能力，使其能够通过高层推理将长期目标分解为一系列可执行的子目标。通过对各种基准测试的广泛实验，证实QuadrupedGPT能够灵活处理多个任务和复杂指令，展示了朝着开放世界中多才多艺的四足代理人迈出的重要一步。我们的网站和代码可以在https://quadruped-hub.github.io/Quadruped-GPT/找到。

更新时间: 2024-06-24 12:14:24

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2406.16578v1

Differentiable Distributionally Robust Optimization Layers

In recent years, there has been growing research interest in decision-focused learning, which embeds optimization problems as a layer in learning pipelines and has demonstrated superior performance over the prediction-focused approach. However, for distributionally robust optimization (DRO), a popular paradigm for decision-making under uncertainty, it is still unknown how to embed it as a layer, i.e., how to differentiate decisions with respect to an ambiguity set. In this paper, we develop such differentiable DRO layers for generic mixed-integer DRO problems with parameterized second-order conic ambiguity sets and discuss their extension to Wasserstein ambiguity sets. To differentiate the mixed-integer decisions, we propose a novel dual-view methodology that handles the continuous and discrete parts of decisions via different principles. Specifically, we construct a differentiable energy-based surrogate to implement the dual-view methodology and use importance sampling to estimate its gradient. We further prove that this surrogate enjoys asymptotic convergence under regularization. As an application of the proposed differentiable DRO layers, we develop a novel decision-focused learning pipeline for contextual distributionally robust decision-making tasks and compare it with the prediction-focused approach in experiments.
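
One standard way to get gradients through a discrete decision, in the spirit of the energy-based surrogate and importance sampling mentioned above, is to replace the hard argmin with a Gibbs distribution $p_\theta(z) \propto \exp(-E(z,\theta)/\tau)$. For this distribution, $\partial_\theta \mathbb{E}_{p_\theta}[z] = -\tfrac{1}{\tau}\operatorname{Cov}_{p_\theta}(z, \partial_\theta E)$. The sketch estimates this covariance by self-normalized importance sampling, using a uniform proposal over a small finite decision set.

The sketch illustrates only this generic identity. The energy, the temperature $\tau$, the proposal and all names are assumptions, and the authors' dual-view treatment of continuous decisions is not reproduced.

```python
import math
import random
from typing import Callable, Sequence

Energy = Callable[[float, float], float]


def snis_grad_expected_decision(
    decisions: Sequence[float],
    energy: Energy,
    denergy_dtheta: Energy,
    theta: float,
    tau: float,
    n_samples: int,
    seed: int = 0,
) -> float:
    """Estimate d/dtheta E_p[z] for p(z) proportional to exp(-energy(z, theta) / tau)."""
    rng = random.Random(seed)
    # Uniform proposal q over the decision set: q is constant, so it cancels
    # once the importance weights are self-normalized.
    zs = [rng.choice(decisions) for _ in range(n_samples)]
    logw = [-energy(z, theta) / tau for z in zs]
    shift = max(logw)  # avoid overflow in exp
    w = [math.exp(lw - shift) for lw in logw]
    total = sum(w)
    w = [wi / total for wi in w]
    g = [denergy_dtheta(z, theta) for z in zs]
    e_z = sum(wi * z for wi, z in zip(w, zs))
    e_g = sum(wi * gi for wi, gi in zip(w, g))
    e_zg = sum(wi * z * gi for wi, z, gi in zip(w, zs, g))
    # Gibbs identity: dE[z]/dtheta = -Cov(z, dE/dtheta) / tau
    return -(e_zg - e_z * e_g) / tau


def exact_expected_decision(
    decisions: Sequence[float], energy: Energy, theta: float, tau: float
) -> float:
    """Exact expectation by enumeration (feasible only for tiny decision sets)."""
    w = [math.exp(-energy(z, theta) / tau) for z in decisions]
    return sum(wi * z for wi, z in zip(w, decisions)) / sum(w)
```

A quick check on the estimator is to compare it against a finite-difference derivative of `exact_expected_decision`.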

Updated: 2024-06-24 12:09:19

标题: 可微的分布鲁棒优化层

摘要: 近年来，人们对以决策为焦点的学习表现出越来越浓厚的研究兴趣，该方法将优化问题作为一层嵌入到学习流程中，并展示出比以预测为焦点的方法更出色的表现。然而，对于分布鲁棒优化（DRO）这一在不确定性下进行决策的流行范式，仍然不清楚如何将其嵌入为一个层，即如何对决策关于模糊集合求导。在本文中，我们为具有参数化二阶锥模糊集合的通用混合整数DRO问题开发了可微分DRO层，并讨论了其扩展到Wasserstein模糊集合的方法。为了对混合整数决策求导，我们提出了一种新颖的双视角方法论，通过不同的原则处理决策的连续和离散部分。具体来说，我们构建了一个可微的基于能量的代理模型来实现双视角方法，并使用重要性抽样来估计其梯度。我们进一步证明，这样的代理模型在正则化下具有渐近收敛性。作为所提出的可微分DRO层的一个应用，我们为上下文分布鲁棒决策任务开发了一个新颖的以决策为焦点的学习流程，并在实验中与以预测为焦点的方法进行比较。

更新时间: 2024-06-24 12:09:19

领域: math.OC,cs.AI,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2406.16571v1

A Survey on Neural Topic Models: Methods, Applications, and Challenges

Topic models have been prevalent for decades to discover latent topics and infer topic proportions of documents in an unsupervised fashion. They have been widely used in various applications like text analysis and context recommendation. Recently, the rise of neural networks has facilitated the emergence of a new research field -- Neural Topic Models (NTMs). Different from conventional topic models, NTMs directly optimize parameters without requiring model-specific derivations. This endows NTMs with better scalability and flexibility, resulting in significant research attention and plentiful new methods and applications. In this paper, we present a comprehensive survey on neural topic models concerning methods, applications, and challenges. Specifically, we systematically organize current NTM methods according to their network structures and introduce the NTMs for various scenarios like short texts and bilingual documents. We also discuss a wide range of popular applications built on NTMs. Finally, we highlight the challenges confronted by NTMs to inspire future research. We accompany this survey with a repository for easier access to the mentioned paper resources: https://github.com/bobxwu/Paper-Neural-Topic-Models.

Updated: 2024-06-24 12:08:06

标题: 一个关于神经主题模型的调查：方法、应用和挑战

摘要: 主题模型已经盛行了几十年，以无监督的方式发现文档的潜在主题并推断主题比例。它们已被广泛应用于文本分析和上下文推荐等各种应用领域。最近，神经网络的兴起促进了一个新的研究领域 -- 神经主题模型（NTMs）。与传统主题模型不同，NTMs直接优化参数而无需模型特定的派生。这赋予了NTMs更好的可扩展性和灵活性，导致了大量新的方法和应用的研究关注。在本文中，我们对神经主题模型进行了全面调查，涉及方法、应用和挑战。具体来说，我们根据它们的网络结构系统地组织目前的NTM方法，并介绍适用于短文本和双语文档等各种场景的NTMs。我们还讨论了建立在NTMs上的各种流行应用程序。最后，我们强调了NTMs所面临的挑战，以激发未来的研究。我们配合这项调查提供了一个存储库，以便更容易访问提到的论文资源：https://github.com/bobxwu/Paper-Neural-Topic-Models。

更新时间: 2024-06-24 12:08:06

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2401.15351v2

AdaTreeFormer: Few Shot Domain Adaptation for Tree Counting from a Single High-Resolution Image

The process of estimating and counting tree density using only a single aerial or satellite image is a difficult task in photogrammetry and remote sensing. However, it plays a crucial role in forest management. The huge variety of trees across varied topography severely hinders tree-counting models from performing well. The purpose of this paper is to propose a framework that is learnt from a source domain with sufficient labeled trees and adapted to a target domain with only a limited number of labeled trees. Our method, termed AdaTreeFormer, contains one shared encoder with a hierarchical feature extraction scheme to extract robust features from the source and target domains. It also consists of three subnets: two for extracting self-domain attention maps from the source and target domains respectively, and one for extracting cross-domain attention maps. For the latter, an attention-to-adapt mechanism is introduced to distill relevant information from different domains while generating tree density maps, and a hierarchical cross-domain feature alignment scheme is proposed that progressively aligns the features from the source and target domains. We also incorporate adversarial learning into the framework to further reduce the gap between the source and target domains. AdaTreeFormer is evaluated on six designed domain adaptation tasks using three tree counting datasets, i.e., Jiangsu, Yosemite, and London. Experimental results show that AdaTreeFormer significantly surpasses the state of the art; e.g., in the cross-domain task from the Yosemite to the Jiangsu dataset, it achieves a reduction of 15.9 points in absolute counting error and an increase of 10.8% in the accuracy of the detected trees' locations. The code and datasets are available at https://github.com/HAAClassic/AdaTreeFormer.
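
AdaTreeFormer outputs tree density maps, while the reported errors are on counts. The usual convention in density-based counting links the two: the predicted count is the sum of the density map, and ground-truth maps are built by placing a normalized kernel at each annotated tree. The sketch below assumes point annotations and a Gaussian kernel. The value of σ and the helper names are hypothetical, and the abstract does not state the paper's exact ground-truth construction.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def density_from_points(points: Sequence[Tuple[int, int]], h: int, w: int, sigma: float) -> Grid:
    """Place a Gaussian at each annotated tree, normalized over the grid so each tree adds exactly 1."""
    grid = [[0.0] * w for _ in range(h)]
    for py, px in points:
        kernel = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma**2)) for x in range(w)]
            for y in range(h)
        ]
        total = sum(map(sum, kernel))
        for y in range(h):
            for x in range(w):
                grid[y][x] += kernel[y][x] / total
    return grid


def count_from_density(density: Grid) -> float:
    return sum(map(sum, density))  # predicted count = integral (sum) of the density map


def mean_absolute_error(pred_counts: Sequence[float], true_counts: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)
```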

Updated: 2024-06-24 12:07:52

标题: AdaTreeFormer：从单个高分辨率图像进行树木计数的少样本域自适应

摘要: 使用仅一幅航空或卫星图像来估计和计算树木密度的过程在摄影测量和遥感领域中是一项困难的任务。然而，这在森林管理中起着至关重要的作用。各种各样的树木和多样的地形严重阻碍了树木计数模型的良好表现。本文旨在提出一个从具有足够标记树木的源域学习并适应仅有有限数量标记树木的目标域的框架。我们的方法被称为AdaTreeFormer，包含一个共享编码器，具有分层特征提取方案，从源域和目标域中提取稳健特征。它还包括三个子网络：两个分别用于从源域和目标域提取自域关注图，一个用于提取跨域关注图。对于后者，引入了一种关注适应机制，从不同域中提取相关信息，同时生成树木密度图；还提出了一种分层跨域特征对齐方案，逐步将源域和目标域的特征对齐。我们还将对抗学习引入框架中，进一步减少源域和目标域之间的差距。我们的AdaTreeFormer在三个树木计数数据集（江苏、约塞米蒂和伦敦）上进行了六个设计的域自适应任务评估。实验结果表明，AdaTreeFormer明显超越了现有技术，例如在从约塞米蒂到江苏数据集的跨域任务中，绝对计数误差减少了15.9个点，检测到的树木位置准确率提高了10.8％。代码和数据集可在https://github.com/HAAClassic/AdaTreeFormer上获得。

更新时间: 2024-06-24 12:07:52

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.02956v2

Generation of Asset Administration Shell with Large Language Model Agents: Toward Semantic Interoperability in Digital Twins in the Context of Industry 4.0

This research introduces a novel approach for achieving semantic interoperability in digital twins and assisting the creation of Asset Administration Shell (AAS) as digital twin model within the context of Industry 4.0. The foundational idea of our research is that the communication based on semantics and the generation of meaningful textual data are directly linked, and we posit that these processes are equivalent if the exchanged information can be serialized in text form. Based on this, we construct a "semantic node" data structure in our research to capture the semantic essence of textual data. Then, a system powered by large language models is designed and implemented to process the "semantic node" and generate standardized digital twin models from raw textual data collected from datasheets describing technical assets. Our evaluation demonstrates an effective generation rate of 62-79%, indicating a substantial proportion of the information from the source text can be translated error-free to the target digital twin instance model with the generative capability of large language models. This result has a direct application in the context of Industry 4.0, and the designed system is implemented as a data model generation tool for reducing the manual effort in creating AAS model. In our evaluation, a comparative analysis of different LLMs and an in-depth ablation study of Retrieval-Augmented Generation (RAG) mechanisms provide insights into the effectiveness of LLM systems for interpreting technical concepts and translating data. Our findings emphasize LLMs' capability to automate AAS instance creation and contribute to the broader field of semantic interoperability for digital twins in industrial applications. The prototype implementation and evaluation results are presented on our GitHub Repository: https://github.com/YuchenXia/AASbyLLM.

Updated: 2024-06-24 12:04:06

标题: 使用大型语言模型代理生成资产管理外壳：在工业4.0背景下实现数字孪生的语义互操作性

摘要: 这项研究介绍了一种新颖的方法，用于实现数字孪生中的语义互操作性，并协助在工业4.0背景下创建资产管理外壳（AAS）作为数字孪生模型。我们研究的基本理念是，基于语义的通信和生成有意义的文本数据直接相关，我们认为，如果交换的信息可以以文本形式序列化，这些过程是等效的。基于此，我们在研究中构建了一个“语义节点”数据结构，以捕获文本数据的语义本质。然后，设计并实施了一个由大型语言模型驱动的系统，用于处理“语义节点”并从描述技术资产的数据表中收集的原始文本数据生成标准化的数字孪生模型。我们的评估表明，生成率为62-79％，表明源文本中的信息的相当大比例可以利用大型语言模型的生成能力无误地转换为目标数字孪生实例模型。这一结果在工业4.0背景下有直接应用，所设计的系统被实施为用于减少创建AAS模型中的手动工作量的数据模型生成工具。在我们的评估中，对不同LLMs进行了比较分析，并对检索增强生成（RAG）机制进行了深入的消融研究，为LLM系统解释技术概念和翻译数据的有效性提供了洞见。我们的发现强调了LLMs自动化AAS实例创建的能力，并为工业应用中数字孪生的语义互操作性领域做出了贡献。我们在GitHub仓库上呈现了原型实现和评估结果：https://github.com/YuchenXia/AASbyLLM。

更新时间: 2024-06-24 12:04:06

领域: cs.AI,cs.IR,cs.MA,cs.SE

下载: http://arxiv.org/abs/2403.17209v4

Knowledge Accumulation in Continually Learned Representations and the Issue of Feature Forgetting

Continual learning research has shown that neural networks suffer from catastrophic forgetting "at the output level", but it is debated whether this is also the case at the level of learned representations. Multiple recent studies ascribe representations a certain level of innate robustness against forgetting -- that they only forget minimally in comparison with forgetting at the output level. We revisit and expand upon the experiments that revealed this difference in forgetting and illustrate the coexistence of two phenomena that affect the quality of continually learned representations: knowledge accumulation and feature forgetting. Taking both aspects into account, we show that, even though forgetting in the representation (i.e. feature forgetting) can be small in absolute terms, when measuring relative to how much was learned during a task, forgetting in the representation tends to be just as catastrophic as forgetting at the output level. Next we show that this feature forgetting is problematic as it substantially slows down the incremental learning of good general representations (i.e. knowledge accumulation). Finally, we study how feature forgetting and knowledge accumulation are affected by different types of continual learning methods.
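
The distinction between absolute and relative forgetting can be made concrete with a small calculation. Take these hypothetical numbers: a linear probe on the frozen representation scores 40% on a task before training on it, 60% right after, and 55% after later tasks. Absolute feature forgetting is then 5 points, but 25% of what the task added has been lost.

The sketch assumes this probe-based normalization. It is an illustration of the idea in the abstract, not necessarily the paper's exact metric.

```python
def relative_forgetting(acc_before: float, acc_after: float, acc_later: float) -> float:
    """Forgetting of a task's representation, normalized by how much that task added.

    acc_before: probe accuracy on the task before training on it
    acc_after:  probe accuracy right after training on it
    acc_later:  probe accuracy after training on subsequent tasks
    """
    learned = acc_after - acc_before
    if learned <= 0:
        raise ValueError("the task added nothing measurable to the representation")
    return (acc_after - acc_later) / learned
```

With the numbers above, `relative_forgetting(0.40, 0.60, 0.55)` is about 0.25, even though the absolute drop is small.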

Updated: 2024-06-24 12:03:00

标题: 持续学习表示中的知识积累与特征遗忘问题

摘要: 持续学习研究表明，神经网络在“输出层”遭受灾难性遗忘，但关于在学习表示层是否也存在这种情况存在争议。最近的多个研究认为，表示具有一定程度的天生抗遗忘性，即它们仅在与输出层遗忘相比时略微遗忘。我们重新审视并扩展了揭示这种遗忘差异的实验，并说明了两种影响持续学习表示质量的现象的共存：知识积累和特征遗忘。考虑到这两个方面，我们表明，即使在绝对值上，表示中的遗忘（即特征遗忘）可能很小，但在衡量相对于任务学习量的情况下，表示中的遗忘往往与输出层的遗忘一样灾难性。接下来，我们表明，这种特征遗忘是有问题的，因为它明显减缓了对好的通用表示进行增量学习（即知识积累）。最后，我们研究了不同类型的持续学习方法对特征遗忘和知识积累的影响。

更新时间: 2024-06-24 12:03:00

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2304.00933v4

Noisy Neighbors: Efficient membership inference attacks against LLMs

The potential of transformer-based LLMs risks being hindered by privacy concerns due to their reliance on extensive datasets, possibly including sensitive information. Regulatory measures like GDPR and CCPA call for robust auditing tools to address potential privacy issues, with Membership Inference Attacks (MIA) being the primary method for assessing LLMs' privacy risks. Unlike traditional MIA approaches, which often require computationally intensive training of additional models, this paper introduces an efficient methodology that generates "noisy neighbors" for a target sample by adding stochastic noise in the embedding space, requiring only that the target model be run in inference mode. Our findings demonstrate that this approach closely matches the effectiveness of employing shadow models, showing its usability in practical privacy auditing scenarios.
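
The core scoring idea can be sketched in a few lines. Add Gaussian noise to a sample's embedding several times and compare the model's loss on these noisy neighbors with its loss on the sample itself. A training member tends to sit in a lower-loss region than its neighbors.

In a real audit, `loss_fn` would run the target LLM in inference mode on perturbed token embeddings. Here it is a toy quadratic. The noise scale σ, the mean-difference score and the names are assumptions.

```python
import random
import statistics
from typing import Callable, Sequence

LossFn = Callable[[Sequence[float]], float]


def noisy_neighbor_score(
    embedding: Sequence[float],
    loss_fn: LossFn,
    n_neighbors: int,
    sigma: float,
    seed: int = 0,
) -> float:
    rng = random.Random(seed)
    target_loss = loss_fn(embedding)
    neighbor_losses = [
        loss_fn([v + rng.gauss(0.0, sigma) for v in embedding])  # one noisy neighbor
        for _ in range(n_neighbors)
    ]
    # A larger positive gap suggests the sample sits in a low-loss basin, which a
    # membership decision would compare against a calibrated threshold.
    return statistics.mean(neighbor_losses) - target_loss
```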

Updated: 2024-06-24 12:02:20

标题: 吵闹的邻居：针对LLMs的高效成员推断攻击

摘要: 基于Transformer的LLMs的潜力可能会受到隐私问题的阻碍，因为它们依赖于可能包含敏感信息的大规模数据集。像GDPR和CCPA这样的监管措施要求使用强大的审计工具来解决潜在的隐私问题，其中成员推断攻击（MIA）是评估LLMs隐私风险的主要方法。与通常需要计算密集型训练额外模型的传统MIA方法不同，本文介绍了一种高效的方法，通过在嵌入空间中添加随机噪声为目标样本生成“嘈杂邻居”，只需要将目标模型运行在推断模式下。我们的研究结果表明，这种方法与使用影子模型的效果相近，显示了它在实际隐私审计场景中的可用性。

更新时间: 2024-06-24 12:02:20

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2406.16565v1

Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies

We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter $f_{\rm NL}$. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range $0.2< z < 1.35$. We identify Galactic extinction, survey depth, and astronomical seeing as the primary sources of systematic error, and employ linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales. Our methods are tested against simulations with and without $f_{\rm NL}$ and systematics, showing superior performance of the neural network treatment. The neural network with a set of nine imaging property maps passes our systematic null test criteria, and is chosen as the fiducial treatment. Assuming the universality relation, we find $f_{\rm NL} = 34^{+24(+50)}_{-44(-73)}$ at 68% (95%) confidence. We apply a series of robustness tests (e.g., cuts on imaging, declination, or scales used) that show consistency in the obtained constraints. We study how the regression method biases the measured angular power spectrum and degrades the $f_{\rm NL}$ constraining power. The use of the nine maps more than doubles the uncertainty compared to using only the three primary maps in the regression. Our results thus motivate the development of more efficient methods that avoid over-correction, protect large-scale clustering information, and preserve constraining power. Additionally, our results encourage further studies of $f_{\rm NL}$ with DESI spectroscopic samples, where the inclusion of 3D clustering modes should help separate imaging systematics and lessen the degradation in the $f_{\rm NL}$ uncertainty.
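
The linear-regression mitigation described above can be sketched for a single imaging map. Fit the normalized galaxy density in each pixel as a linear function of the map, then weight each pixel by the inverse of the fitted trend.

This is a simplified illustration with hypothetical names. The analysis itself regresses on up to nine maps and also uses a neural network. With many maps, such a fit can also absorb genuine large-scale fluctuations, which is the over-correction the abstract warns about.

```python
from typing import List, Sequence


def regression_weights(ngal: Sequence[float], sysmap: Sequence[float]) -> List[float]:
    """Fit n_i / mean(n) ~ a + b * (s_i - mean(s)) by least squares; return weights 1 / fit_i.

    Multiplying pixel counts by these weights removes the part of the density
    fluctuation that is linear in the imaging map.
    """
    n = len(ngal)
    nbar = sum(ngal) / n
    sbar = sum(sysmap) / n
    delta = [g / nbar for g in ngal]  # normalized density per pixel
    ds = [s - sbar for s in sysmap]  # centered systematic map, so sum(ds) == 0
    b = sum(d * x for d, x in zip(delta, ds)) / sum(x * x for x in ds)  # OLS slope
    a = sum(delta) / n  # OLS intercept for a centered regressor (equals 1 here)
    return [1.0 / (a + b * x) for x in ds]
```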

Updated: 2024-06-24 11:56:28

标题: 基于DESI测光亮红星系大尺度成团性的局域原初非高斯性

摘要: 我们利用暗能量光谱仪（DESI）成像巡天中亮红星系的角向成团性来约束局域原初非高斯性参数$f_{\rm NL}$。我们的样本包括超过1200万个目标，覆盖了14,000平方度的天空，红移范围为$0.2< z < 1.35$。我们确定银河消光、巡天深度和天文视宁度是系统误差的主要来源，并采用线性回归和人工神经网络来缓解大尺度上的非宇宙学过量成团。我们的方法在包含和不包含$f_{\rm NL}$及系统误差的模拟上进行了测试，显示出神经网络处理的更优性能。使用九张成像特性图的神经网络通过了我们的系统误差零检验标准，并被选为基准处理。假设普适关系成立，我们在68%(95%)置信水平下得到$f_{\rm NL} = 34^{+24(+50)}_{-44(-73)}$。我们进行了一系列稳健性测试（例如，对成像、赤纬或所用尺度进行截断），结果显示所得约束是一致的。我们研究了回归方法如何使测得的角功率谱产生偏差并削弱$f_{\rm NL}$的约束能力。与回归中仅使用三张主要图相比，使用九张图使不确定性增加了一倍以上。因此，我们的结果促使开发更高效的方法，以避免过度校正、保护大尺度成团信息并保持约束能力。此外，我们的结果鼓励利用DESI光谱样本进一步研究$f_{\rm NL}$，其中纳入三维成团模式应有助于分离成像系统误差并减轻$f_{\rm NL}$不确定性的增大。

更新时间: 2024-06-24 11:56:28

领域: astro-ph.CO,cs.LG,physics.comp-ph,physics.data-an

下载: http://arxiv.org/abs/2307.01753v2

Sigma-point Kalman Filter with Nonlinear Unknown Input Estimation via Optimization and Data-driven Approach for Dynamic Systems

Most works on joint state and unknown input (UI) estimation require the assumption that the UIs are linear; this is potentially restrictive, as it does not hold in many intelligent autonomous systems. To overcome this restriction and circumvent the need to linearize the system, we propose a derivative-free Unknown Input Sigma-point Kalman Filter (SPKF-nUI) in which the SPKF is interconnected with a general nonlinear UI estimator that can be implemented via nonlinear optimization and data-driven approaches. The nonlinear UI estimator uses the posterior state estimate, which is less susceptible to state prediction error. In addition, we introduce a joint sigma-point transformation scheme to incorporate both the state and UI uncertainties in the estimation of SPKF-nUI. An in-depth stochastic stability analysis proves that the proposed SPKF-nUI yields exponentially converging estimation error bounds under reasonable assumptions. Finally, two case studies are carried out on a simulation-based rigid robot and a physical soft robot (i.e., a robot made of soft materials with complex dynamics) to validate the effectiveness of the proposed filter on nonlinear dynamic systems. Our results demonstrate that the proposed SPKF-nUI achieves the lowest state and UI estimation errors compared with existing nonlinear state-UI filters.
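
The building block of any sigma-point Kalman filter is the unscented transform. It propagates a Gaussian through a nonlinear function using $2n+1$ deterministically chosen points instead of a Jacobian, which is what makes the filter derivative-free.

The sketch below uses the common $(\alpha, \beta, \kappa)$ scaling. A joint state/UI scheme could augment the mean and covariance with the UI components. How SPKF-nUI does this is not specified in the abstract, so the example is limited to the plain transform.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def unscented_transform(
    mean: Sequence[float],
    cov: Matrix,
    f: Callable[[Vector], Vector],
    alpha: float = 1.0,
    beta: float = 2.0,
    kappa: float = 0.0,
) -> Tuple[Vector, Matrix]:
    """Propagate a Gaussian (mean, cov) through f using 2n + 1 sigma points."""
    n = len(mean)
    lam = alpha**2 * (n + kappa) - n
    L = cholesky([[(n + lam) * v for v in row] for row in cov])
    sigmas: List[Vector] = [list(mean)]
    for j in range(n):  # mean +/- each column of the scaled matrix square root
        col = [L[i][j] for i in range(n)]
        sigmas.append([m + c for m, c in zip(mean, col)])
        sigmas.append([m - c for m, c in zip(mean, col)])
    wm = [lam / (n + lam)] + [1.0 / (2 * (n + lam))] * (2 * n)
    wc = [wm[0] + (1 - alpha**2 + beta)] + wm[1:]
    ys = [f(s) for s in sigmas]
    d = len(ys[0])
    y_mean = [sum(w * y[k] for w, y in zip(wm, ys)) for k in range(d)]
    y_cov = [
        [sum(w * (y[r] - y_mean[r]) * (y[c] - y_mean[c]) for w, y in zip(wc, ys)) for c in range(d)]
        for r in range(d)
    ]
    return y_mean, y_cov
```

For a linear map the transform reproduces the exact mean and covariance. For $f(x) = x^2$ with a standard normal input and $\kappa = 3 - n$, it recovers the exact mean $\mathbb{E}[x^2] = 1$.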

Updated: 2024-06-24 11:56:15

标题: 非线性未知输入估计的σ点卡尔曼滤波器：动态系统优化和数据驱动方法

摘要: 大多数关于联合状态和未知输入（UI）估计的工作都要求UI是线性的；这可能是限制性的，因为在许多智能自主系统中并不成立。为了克服这一限制并避免需要线性化系统，我们提出了一种无导数未知输入Sigma点卡尔曼滤波器（SPKF-nUI），其中SPKF与一个通用的非线性UI估计器相互连接，可以通过非线性优化和数据驱动方法实现。非线性UI估计器使用后验状态估计，对状态预测误差不太敏感。此外，我们引入了一个联合Sigma点变换方案，将状态和UI的不确定性结合在SPKF-nUI的估计中。深入的随机稳定性分析证明，在合理假设下，所提出的SPKF-nUI的估计误差界呈指数收敛。最后，在基于模拟的刚性机器人和实际软机器人（即由具有复杂动力学的软材料制成的机器人）上进行了两个案例研究，以验证所提出的滤波器在非线性动态系统上的有效性。我们的结果表明，与现有的非线性状态-UI滤波器相比，所提出的SPKF-nUI的状态和UI估计误差最低。

更新时间: 2024-06-24 11:56:15

领域: eess.SY,cs.LG,cs.SY

下载: http://arxiv.org/abs/2306.12361v2

Efficient k-means with Individual Fairness via Exponential Tilting

In location-based resource allocation scenarios, the distances between each individual and the facility are desired to be approximately equal, thereby ensuring fairness. Individually fair clustering is often employed to achieve the principle of treating all points equally, which can be applied in these scenarios. This paper proposes a novel algorithm, tilted k-means (TKM), aiming to achieve individual fairness in clustering. We integrate the exponential tilting into the sum of squared errors (SSE) to formulate a novel objective function called tilted SSE. We demonstrate that the tilted SSE can generalize to SSE and employ the coordinate descent and first-order gradient method for optimization. We propose a novel fairness metric, the variance of the distances within each cluster, which can alleviate the Matthew Effect typically caused by existing fairness metrics. Our theoretical analysis demonstrates that the well-known k-means++ incurs a multiplicative error of O(k log k), and we establish the convergence of TKM under mild conditions. In terms of fairness, we prove that the variance generated by TKM decreases with a scaled hyperparameter. In terms of efficiency, we demonstrate the time complexity is linear with the dataset size. Our experiments demonstrate that TKM outperforms state-of-the-art methods in effectiveness, fairness, and efficiency.
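
The tilted SSE and the proposed fairness metric can both be sketched directly. The sketch assumes the tilted objective takes the usual exponential-tilting form, $\tilde{R}(t) = \frac{1}{t}\log\big(\frac{1}{N}\sum_i e^{t\,\lVert x_i - c_{a(i)}\rVert^2}\big)$. This reduces to the mean squared error as $t \to 0$ and weights far-away points more heavily as $t$ grows. The paper's exact normalization may differ, the coordinate-descent optimization is not shown, and the names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence

Point = Sequence[float]


def sq_dist(p: Point, c: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, c))


def tilted_sse(points: Sequence[Point], centers: Sequence[Point], assign: Sequence[int], t: float) -> float:
    """(1/t) * log(mean(exp(t * squared distance to assigned center))); t = 0 gives the mean SSE."""
    losses = [sq_dist(p, centers[a]) for p, a in zip(points, assign)]
    if t == 0:
        return sum(losses) / len(losses)
    m = max(t * l for l in losses)  # log-sum-exp shift for numerical stability
    return (m + math.log(sum(math.exp(t * l - m) for l in losses) / len(losses))) / t


def within_cluster_distance_variance(
    points: Sequence[Point], centers: Sequence[Point], assign: Sequence[int]
) -> Dict[int, float]:
    """Fairness metric: variance of point-to-center distances inside each cluster."""
    dists: Dict[int, List[float]] = {}
    for p, a in zip(points, assign):
        dists.setdefault(a, []).append(math.sqrt(sq_dist(p, centers[a])))
    return {a: statistics.pvariance(d) for a, d in dists.items()}
```

For $t > 0$ the tilted value is never below the plain mean (by Jensen's inequality). Minimizing it therefore pushes centers toward the worst-served points, which is where the fairness effect comes from.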

Updated: 2024-06-24 11:50:31

标题: 通过指数倾斜实现个体公平性的高效k均值算法

摘要: 在基于位置的资源分配场景中，希望每个个体与设施之间的距离大致相等，从而确保公平性。通常采用个体公平聚类来实现对所有点均等处理的原则，这在这些场景中可以应用。本文提出了一种新颖的算法，倾斜k均值（TKM），旨在实现聚类中的个体公平性。我们将指数倾斜集成到平方误差（SSE）中，制定了一种称为倾斜SSE的新目标函数。我们证明了倾斜SSE能够推广到SSE，并采用坐标下降和一阶梯度方法进行优化。我们提出了一种新颖的公平度量标准，即每个簇内距离的方差，可以缓解现有公平度量标准通常引起的马太效应。我们的理论分析表明，众所周知的k-means++会产生一个O(k log k)的乘法误差，我们建立了TKM在温和条件下的收敛性。在公平性方面，我们证明TKM生成的方差随着一个缩放超参数的减小而减小。在效率方面，我们证明时间复杂度与数据集大小成线性关系。我们的实验表明，TKM在效果、公平性和效率方面优于最先进的方法。

更新时间: 2024-06-24 11:50:31

领域: cs.LG,cs.CY

下载: http://arxiv.org/abs/2406.16557v1

Homomorphisms and Embeddings of STRIPS Planning Models

Determining whether two STRIPS planning instances are isomorphic is the simplest form of comparison between planning instances. It is also a particular case of the problem of finding an isomorphism between a planning instance $P$ and a sub-instance of another instance $P_0$. One application of such a mapping is to efficiently produce a compiled form containing all solutions to $P$ from a compiled form containing all solutions to $P_0$. We also introduce the notion of an embedding from an instance $P$ to another instance $P_0$, which allows us to deduce that $P_0$ has no solution plan if $P$ is unsolvable. In this paper, we study the complexity of these problems. We show that the first is GI-complete and can thus, in theory, be solved in quasi-polynomial time. While we prove the remaining problems to be NP-complete, we propose an algorithm to build an isomorphism when possible. We report extensive experimental trials on benchmark problems which demonstrate conclusively that applying constraint propagation in preprocessing can greatly improve the efficiency of a SAT solver.
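
A brute-force check makes the definition of isomorphism concrete. Two instances are isomorphic if some bijection between their fluents maps the initial state onto the initial state, the goal onto the goal, and the set of actions onto the set of actions. The sketch tries every bijection, so it is exponential in the number of fluents and only an illustration. The paper's algorithm instead relies on constraint propagation and a SAT solver.

The representation is an assumption: actions are precondition/add/delete sets, and action names are ignored.

```python
from itertools import permutations
from typing import Dict, FrozenSet, NamedTuple, Optional, Tuple


class Action(NamedTuple):
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


class Instance(NamedTuple):
    fluents: Tuple[str, ...]
    actions: FrozenSet[Action]
    init: FrozenSet[str]
    goal: FrozenSet[str]


def _rename(s: FrozenSet[str], m: Dict[str, str]) -> FrozenSet[str]:
    return frozenset(m[f] for f in s)


def find_isomorphism(p: Instance, q: Instance) -> Optional[Dict[str, str]]:
    """Try every bijection of fluents (exponential; illustration only)."""
    if len(p.fluents) != len(q.fluents) or len(p.actions) != len(q.actions):
        return None
    for image in permutations(q.fluents):
        m = dict(zip(p.fluents, image))
        if _rename(p.init, m) != q.init or _rename(p.goal, m) != q.goal:
            continue
        mapped = frozenset(
            Action(_rename(a.pre, m), _rename(a.add, m), _rename(a.delete, m))
            for a in p.actions
        )
        if mapped == q.actions:  # the fluent map must carry actions onto actions
            return m
    return None
```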

Updated: 2024-06-24 11:43:18

标题: STRIPS规划模型的同态和嵌入

摘要: 确定两个STRIPS规划实例是否同构是规划实例之间最简单的比较形式。这也是一个特定问题的一种情况，该问题涉及在一个规划实例$P$和另一个实例$P_0$的子实例之间找到同构的问题。这种映射的一个应用是从包含所有$P_0$的解的编译形式中有效地生成包含所有P的解的编译形式。我们还引入了从一个实例$P$到另一个实例$P_0$的嵌入概念，这使我们能够推断如果$P$无解，则$P_0$没有解决方案。在本文中，我们研究了这些问题的复杂性。我们表明第一个问题是GI完全的，因此在理论上可以在准多项式时间内解决。虽然我们证明了其余问题是NP完全的，但我们提出了一个算法来建立同构，如果可能的话。我们对基准问题进行了广泛的实验试验，结果明确表明在预处理中应用约束传播可以极大地提高SAT求解器的效率。

更新时间: 2024-06-24 11:43:18

领域: cs.AI

下载: http://arxiv.org/abs/2406.16555v1

Inference of Sequential Patterns for Neural Message Passing in Temporal Graphs

The modelling of temporal patterns in dynamic graphs is an important current research issue in the development of time-aware GNNs. Whether or not a specific sequence of events in a temporal graph constitutes a temporal pattern does not depend only on the frequency of its occurrence: we also consider whether it deviates from what is expected in a temporal graph where timestamps are randomly shuffled. While accounting for such a random baseline is important to model temporal patterns, it has mostly been ignored by current temporal graph neural networks. To address this issue, we propose HYPA-DBGNN, a novel two-step approach that combines (i) the inference of anomalous sequential patterns in time series data on graphs based on a statistically principled null model with (ii) a neural message passing approach that utilizes a higher-order De Bruijn graph whose edges capture overrepresented sequential patterns. Our method leverages hypergeometric graph ensembles to identify anomalous edges within both first- and higher-order De Bruijn graphs, which encode the temporal ordering of events. The model introduces an inductive bias that enhances model interpretability. We evaluate our approach for static node classification using benchmark datasets and a synthetic dataset that showcases its ability to incorporate the observed inductive bias regarding over- and under-represented temporal edges. We demonstrate the framework's effectiveness in detecting similar patterns within empirical datasets, resulting in superior performance compared to baseline methods in node classification tasks. To the best of our knowledge, our work is the first to introduce statistically informed GNNs that leverage temporal and causal sequence anomalies. HYPA-DBGNN represents a path for bridging the gap between statistical graph inference and neural graph representation learning, with potential applications to static GNNs.
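
The hypergeometric null model can be sketched on a small weighted, directed graph. Each observed edge count is compared with what a degree-preserving random graph would produce: the edge $(u,v)$ receives a weight proportional to $k_{out}(u)\,k_{in}(v)$, and the observed count is scored by its hypergeometric CDF.

This is a simplified, first-order illustration. HYPA-DBGNN applies the idea to first- and higher-order De Bruijn graphs, whose nodes are paths, and the function names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def hypergeom_cdf(k: int, M: int, K: int, n: int) -> float:
    """P(X <= k) for X ~ Hypergeometric(population M, K successes, n draws)."""
    return sum(comb(K, i) * comb(M - K, n - i) for i in range(k + 1)) / comb(M, n)


def hypa_style_scores(edge_counts: Dict[Edge, int]) -> Dict[Edge, float]:
    """Score each observed edge by the CDF of its count under a degree-preserving
    hypergeometric null: near 1 means over-represented, near 0 under-represented."""
    k_out: Counter = Counter()
    k_in: Counter = Counter()
    for (u, v), c in edge_counts.items():
        k_out[u] += c
        k_in[v] += c
    m = sum(edge_counts.values())  # number of draws: total observed edge occurrences
    xi = {(u, v): k_out[u] * k_in[v] for u in k_out for v in k_in}  # combinatorial weights
    M = sum(xi.values())
    return {e: hypergeom_cdf(c, M, xi[e], m) for e, c in edge_counts.items()}
```

In the test below, every node pair has the same expected count of 5 under the null. An observed count of 9 therefore scores near 1, and a count of 1 scores near 0.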

Updated: 2024-06-24 11:41:12

标题: 在时间图中进行神经消息传递的顺序模式推断

摘要: 动态图中时间模式的建模是发展时间感知GNN时的一个重要的当前研究问题。在一个时间图中特定事件序列是否构成时间模式不仅取决于其发生频率。我们考虑它是否偏离了在时间图中时间戳被随机洗牌时的预期。尽管考虑这样一个随机基线对于建模时间模式很重要，但目前的时间图神经网络大多忽视了这一点。为了解决这个问题，我们提出了HYPA-DBGNN，这是一种新颖的两步方法，结合了（i）基于统计原则的空模型对图上时间序列数据中的异常序列模式进行推断，以及（ii）利用高阶De Bruijn图的神经消息传递方法，其边捕捉了过度表示的序列模式。我们的方法利用超几何图集合来识别第一和高阶De Bruijn图中的异常边，这些边编码了事件的时间顺序。该模型引入了一种归纳偏差，增强了模型的可解释性。我们使用基准数据集和一个展示其能够整合观察到的关于过度和不足表示的时间边的合成数据集，评估了我们的方法用于静态节点分类。我们展示了该框架在检测经验数据集中相似模式方面的有效性，其在节点分类任务中相比基准方法表现更好。据我们所知，我们的工作是第一个引入统计启发式GNN的工作，利用时间和因果序列异常。HYPA-DBGNN代表了一个桥梁，可以弥合统计图推断和神经图表示学习之间的差距，具有潜在的应用于静态GNN。

更新时间: 2024-06-24 11:41:12

领域: cs.LG,cs.AI,cs.SI,stat.ML

下载: http://arxiv.org/abs/2406.16552v1

A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework with Gray Code Representation

Flight Trajectory Prediction (FTP) is an essential task in Air Traffic Control (ATC), which can assist air traffic controllers in managing airspace more safely and efficiently. Existing approaches generally perform multi-horizon FTP tasks in an autoregressive manner and thereby suffer from error accumulation and low efficiency. In this paper, a novel framework, called FlightBERT++, is proposed to i) forecast multi-horizon flight trajectories directly in a non-autoregressive way, and ii) address the limitations of the binary encoding (BE) representation in the FlightBERT framework. Specifically, the proposed framework is implemented by a generalized encoder-decoder architecture, in which the encoder learns the temporal-spatial patterns from historical observations and the decoder predicts the flight status for the future horizons. Compared to conventional architectures, an innovative horizon-aware context generator is specifically designed to consider the prior horizon information, which further enables non-autoregressive multi-horizon prediction. Additionally, the Gray code representation and the differential prediction paradigm are designed to cope with the high-bit misclassifications of the BE representation, which significantly reduces the outliers in the predictions. Moreover, a differential prompted decoder is proposed to enhance the capability of the differential predictions by leveraging the stationarity of the differential sequence. Extensive experiments on a real-world flight trajectory dataset validate the proposed framework. The experimental results demonstrate that it outperforms the competitive baselines in both FTP performance and computational efficiency.
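
Gray code helps because of how a discretized coordinate is encoded as bits. Suppose an altitude is discretized to integer units and predicted bit by bit. In plain binary, neighboring values such as 7 (`0111`) and 8 (`1000`) differ in every bit, so a near-miss can flip high-order bits and produce a large outlier. In the binary-reflected Gray code, adjacent integers always differ in exactly one bit.

The discretization and bit width here are hypothetical, and the sketch shows only the encoding, not FlightBERT++'s model.

```python
from typing import List


def to_gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Inverse of to_gray: XOR of all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def to_bits(n: int, width: int) -> List[int]:
    """Most-significant bit first, e.g. as per-bit classification targets."""
    return [(n >> i) & 1 for i in reversed(range(width))]
```

For example, `to_bits(to_gray(7), 4)` is `[0, 1, 0, 0]` and `to_bits(to_gray(8), 4)` is `[1, 1, 0, 0]`: one bit apart, versus four bits in plain binary.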

Updated: 2024-06-24 11:34:19

标题: 一种基于格雷码表示的非自回归多时域飞行轨迹预测框架

摘要: 飞行轨迹预测（FTP）是空中交通管制（ATC）中的一项重要任务，可以帮助空中交通管制员更安全、更有效地管理空域。现有方法通常以自回归方式执行多时域FTP任务，因此存在误差累积和低效率问题。本文提出了一种新颖的框架，称为FlightBERT++，旨在 i）以非自回归方式直接预测多时域飞行轨迹，以及 ii）改进FlightBERT框架中二进制编码（BE）表示的局限性。具体而言，所提出的框架通过通用编码器-解码器架构实现，其中编码器从历史观测中学习时间-空间模式，解码器预测未来时域的飞行状态。与传统架构相比，专门设计了一种创新的时域感知上下文生成器，以考虑先前时域信息，进一步实现非自回归多时域预测。此外，设计了灰码表示和差分预测范式，以应对BE表示的高位误分类，显著减少了预测中的异常值。此外，提出了一种差分提示解码器，通过利用差分序列的平稳性增强差分预测的能力。在真实飞行轨迹数据集上进行了广泛实验证实所提出的框架。实验结果表明，所提出的框架在FTP性能和计算效率方面均优于竞争基线。

更新时间: 2024-06-24 11:34:19

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2305.01658v4

Improving robustness to corruptions with multiplicative weight perturbations

Deep neural networks (DNNs) excel on clean images but struggle with corrupted ones. Incorporating specific corruptions into the data augmentation pipeline can improve robustness to those corruptions but may harm performance on clean images and on other types of distortion. In this paper, we introduce an alternative approach that improves the robustness of DNNs to a wide range of corruptions without compromising accuracy on clean images. We first demonstrate that input perturbations can be mimicked by multiplicative perturbations in the weight space. Leveraging this, we propose Data Augmentation via Multiplicative Perturbation (DAMP), a training method that optimizes DNNs under random multiplicative weight perturbations. We also examine the recently proposed Adaptive Sharpness-Aware Minimization (ASAM) and show that it optimizes DNNs under adversarial multiplicative weight perturbations. Experiments on image classification datasets (CIFAR-10/100, TinyImageNet, and ImageNet) and neural network architectures (ResNet50, ViT-S/16) show that DAMP improves generalization in the presence of corruptions across different settings. Notably, DAMP can train a ViT-S/16 on ImageNet from scratch, reaching a top-1 error of 23.7%, which is comparable to ResNet50, without extensive data augmentation.
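For a single linear layer, the claim that input perturbations can be mimicked by multiplicative weight perturbations is an exact identity: scaling input feature j by eps_j equals scaling column j of the weight matrix by eps_j, since W(x * eps) = (W diag(eps)) x. The sketch below checks this in plain Python and adds a DAMP-style noisy forward pass. Drawing each weight multiplier independently from N(1, sigma^2) is an assumption made here for illustration; the full training loop is omitted.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def perturb_input(x: Vector, eps: Vector) -> Vector:
    """Element-wise multiplicative input noise: x * eps."""
    return [x_j * e_j for x_j, e_j in zip(x, eps)]


def scale_columns(W: Matrix, eps: Vector) -> Matrix:
    """W @ diag(eps): multiply column j of W by eps[j]."""
    return [[w_ij * e_j for w_ij, e_j in zip(row, eps)] for row in W]


def noisy_forward(W: Matrix, x: Vector, sigma: float, rng: random.Random) -> Vector:
    """Forward pass with every weight multiplied by an independent N(1, sigma^2) draw (assumed noise model)."""
    W_noisy = [[w * rng.gauss(1.0, sigma) for w in row] for row in W]
    return matvec(W_noisy, x)


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    x = [rng.uniform(-1, 1) for _ in range(4)]
    eps = [rng.gauss(1.0, 0.2) for _ in range(4)]
    lhs = matvec(W, perturb_input(x, eps))   # perturb the input
    rhs = matvec(scale_columns(W, eps), x)   # perturb the weights instead
    print(max(abs(a - b) for a, b in zip(lhs, rhs)))  # zero up to floating-point rounding
```

For deeper networks the correspondence is no longer an exact identity, which is why the method trains under random weight perturbations rather than relying on this single-layer equality.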

Updated: 2024-06-24 11:20:44

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.16540v1

Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization

Customized image generation, which seeks to synthesize images with consistent characters, is highly relevant to applications such as storytelling, portrait generation, and character design. However, previous approaches have struggled to preserve characters with high-fidelity consistency because of inadequate feature extraction and concept confusion among reference characters. We therefore propose Character-Adapter, a plug-and-play framework designed to generate images that preserve the details of reference characters with high-fidelity consistency. Character-Adapter employs prompt-guided segmentation to capture fine-grained regional features of reference characters and dynamic region-level adapters to mitigate concept confusion. Extensive experiments validate the effectiveness of Character-Adapter. Both quantitative and qualitative results demonstrate that Character-Adapter achieves state-of-the-art performance in consistent character generation, with an improvement of 24.8% over other methods.
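One generic way to keep several reference characters from bleeding into each other is to add each character's adapter features only inside that character's segmentation mask. The sketch below shows that masking step on small 2-D feature maps. It illustrates region-restricted feature injection under stated assumptions (binary masks from a prompt-guided segmenter, additive injection, a hypothetical `region_blend` helper) and is not Character-Adapter's actual attention design.

```python
from typing import List

Grid = List[List[float]]


def region_blend(base: Grid, adapter_feats: List[Grid], masks: List[Grid]) -> Grid:
    """Add each reference's adapter features to `base`, but only where its mask is 1.

    Hypothetical helper: masks are assumed binary (0/1) and the same size as `base`;
    overlapping masks simply accumulate contributions.
    """
    if len(adapter_feats) != len(masks):
        raise ValueError("need one mask per adapter feature map")
    out = [row[:] for row in base]  # copy so the caller's base map is untouched
    for feat, mask in zip(adapter_feats, masks):
        for i, (feat_row, mask_row) in enumerate(zip(feat, mask)):
            for j, (f, m) in enumerate(zip(feat_row, mask_row)):
                out[i][j] += m * f
    return out


if __name__ == "__main__":
    base = [[0.0, 0.0], [0.0, 0.0]]
    char_a = [[1.0, 1.0], [1.0, 1.0]]  # features for reference character A
    char_b = [[5.0, 5.0], [5.0, 5.0]]  # features for reference character B
    mask_a = [[1, 0], [1, 0]]          # A occupies the left column
    mask_b = [[0, 1], [0, 1]]          # B occupies the right column
    print(region_blend(base, [char_a, char_b], [mask_a, mask_b]))  # [[1.0, 5.0], [1.0, 5.0]]
```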

Updated: 2024-06-24 11:16:37

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.16537v1

Token-based Decision Criteria Are Suboptimal in In-context Learning

In-Context Learning (ICL) typically derives classification criteria from the probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, despite delicate calibration through translation and constrained rotation. To address this problem, we propose Hidden Calibration, which abandons token probabilities and instead applies a nearest centroid classifier to the LM's last hidden states. Specifically, each test sample is assigned the label of the nearest centroid, where the centroids are computed beforehand from a few-shot calibration set. Our experiments on 3 models and 10 classification datasets indicate that Hidden Calibration consistently outperforms current token-based calibration methods by about 20%. Further analysis demonstrates that Hidden Calibration finds better classification criteria with less inter-category overlap, and that, with the help of demonstrations, LMs produce linearly separable intra-category clusters; this supports Hidden Calibration and offers new insights into conventional ICL.
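The nearest-centroid step described above can be sketched in a few lines. This is a minimal sketch, assuming the last-layer hidden states have already been extracted as plain lists of floats and using Euclidean distance; the function names are illustrative, and the paper's exact distance and preprocessing may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def fit_centroids(calibration: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the hidden states of the few-shot calibration examples per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for hidden, label in calibration:
        if label not in sums:
            sums[label] = [0.0] * len(hidden)
            counts[label] = 0
        for k, value in enumerate(hidden):
            sums[label][k] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(hidden: Vector, centroids: Dict[str, List[float]]) -> str:
    """Assign the label of the nearest centroid (Euclidean distance assumed)."""
    return min(centroids, key=lambda label: math.dist(hidden, centroids[label]))


if __name__ == "__main__":
    calibration = [
        ([0.0, 0.0], "negative"), ([0.2, 0.0], "negative"),
        ([1.0, 1.0], "positive"), ([1.2, 1.0], "positive"),
    ]
    centroids = fit_centroids(calibration)
    print(predict([0.9, 1.1], centroids))  # positive
```

Because no label-token probabilities are involved, the decision boundary is determined entirely by the geometry of the hidden states.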

Updated: 2024-06-24 11:16:26

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.16535v1

Conditional Bayesian Quadrature

We propose a novel approach for estimating conditional or parametric expectations in settings where obtaining samples or evaluating integrands is costly. Through the framework of probabilistic numerical methods (such as Bayesian quadrature), our approach allows us to incorporate prior information about the integrands, especially prior knowledge about the smoothness of the integrands and of the conditional expectation. As a result, our approach quantifies uncertainty and achieves a fast convergence rate, which is confirmed both theoretically and empirically on challenging tasks in Bayesian sensitivity analysis, computational finance, and decision making under uncertainty.
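Bayesian quadrature, the building block named above, places a Gaussian-process prior on the integrand and returns a posterior mean and variance for the integral. A minimal sketch for one fixed parameter value, assuming an RBF kernel with lengthscale `ell` and a standard normal integration measure, so that the kernel mean has the closed form ell / sqrt(ell^2 + 1) * exp(-y^2 / (2(ell^2 + 1))). The conditional method in the abstract additionally models how such estimates vary with the conditioning parameter; that stage is not shown.

```python
import math
from typing import List, Sequence


def rbf(x: float, y: float, ell: float) -> float:
    """Squared-exponential (RBF) kernel with lengthscale ell."""
    return math.exp(-((x - y) ** 2) / (2.0 * ell ** 2))


def kernel_mean(y: float, ell: float) -> float:
    """Closed form of E[k(X, y)] for X ~ N(0, 1) and the RBF kernel."""
    s2 = ell ** 2 + 1.0
    return ell / math.sqrt(s2) * math.exp(-(y ** 2) / (2.0 * s2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def gram(nodes: Sequence[float], ell: float, jitter: float) -> List[List[float]]:
    """Kernel matrix K with a small diagonal jitter for numerical stability."""
    return [[rbf(a, b, ell) + (jitter if i == j else 0.0) for j, b in enumerate(nodes)]
            for i, a in enumerate(nodes)]


def bq_mean(nodes: Sequence[float], values: Sequence[float], ell: float = 1.0, jitter: float = 1e-10) -> float:
    """Posterior mean of the integral: z^T K^{-1} f."""
    weights = solve(gram(nodes, ell, jitter), list(values))
    return sum(kernel_mean(x, ell) * w for x, w in zip(nodes, weights))


def bq_variance(nodes: Sequence[float], ell: float = 1.0, jitter: float = 1e-10) -> float:
    """Posterior variance of the integral: prior variance minus z^T K^{-1} z."""
    z = [kernel_mean(x, ell) for x in nodes]
    prior = ell / math.sqrt(ell ** 2 + 2.0)  # E[k(X, X')] for independent X, X' ~ N(0, 1)
    return prior - sum(zi * wi for zi, wi in zip(z, solve(gram(nodes, ell, jitter), z)))


if __name__ == "__main__":
    nodes = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
    f = [x ** 2 for x in nodes]  # the true integral E[X^2] is 1
    print(bq_mean(nodes, f), bq_variance(nodes))
```

How close the printed mean is to 1 depends on the lengthscale and node placement; the posterior variance is the uncertainty estimate that the probabilistic-numerics view provides.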

Updated: 2024-06-24 11:09:08

Categories: stat.ML,cs.LG,stat.CO

Download: http://arxiv.org/abs/2406.16530v1

SyROCCo: Enhancing Systematic Reviews using Machine Learning

The sheer number of research outputs published every year makes systematic reviewing increasingly time- and resource-intensive. This paper explores the use of machine learning (ML) techniques to help navigate the systematic review process. ML has previously been used to reliably 'screen' articles for review, that is, to identify relevant articles based on reviewers' inclusion criteria. However, the application of ML techniques to subsequent stages of a review, such as data extraction and evidence mapping, is still in its infancy. We therefore set out to develop a series of tools to assist in profiling and analysing 1,952 publications on the theme of 'outcomes-based contracting'. Tools were developed for the following tasks: assigning publications to 'policy area' categories; identifying and extracting key information for evidence mapping, such as organisations, laws, and geographical information; connecting the evidence base to an existing dataset on the same topic; and identifying subgroups of articles that may share thematic content. An interactive tool using these techniques and a public dataset with their outputs have been released. Our results demonstrate the utility of ML techniques for enhancing evidence accessibility and analysis within the systematic review process. These efforts could yield substantial efficiencies for future systematic reviews and broaden their analytical scope. Our work also suggests implications for the ease with which policymakers and practitioners can access evidence. While ML techniques seem poised to play a significant role in bridging the gap between research and policy by offering innovative ways of gathering, accessing, and analysing data from systematic reviews, we also highlight their current limitations and the need for caution in their application, particularly given the potential for errors and biases.

Updated: 2024-06-24 11:04:43

Categories: cs.CL,cs.CY,cs.DL,cs.LG

Download: http://arxiv.org/abs/2406.16527v1

NARRepair: Non-Autoregressive Code Generation Model for Automatic Program Repair

With the advancement of deep learning techniques, the performance of Automatic Program Repair (APR) techniques has reached a new level. Previous deep learning-based APR techniques essentially modify program statements in an Autoregressive (AR) manner, predicting each output token from the previously generated ones. Because of this word-by-word generation, AR-based APR techniques incur long inference delays, which hinder the widespread adoption of APR techniques in real-life software development. To address this issue, we apply the Non-Autoregressive (NAR) method to the APR task, which can output target code in parallel and thus avoid long inference delays. To effectively adapt the NAR manner to the APR task, in this paper we propose NARRepair, the first customized NAR code generation model for the APR task. NARRepair features three major novelties: 1) using repair actions to alleviate the over-correction issue, 2) extracting dependency information from the AST to alleviate the lack of inter-word dependency information, and 3) employing two-stage decoding to alleviate the lack of contextual information. We evaluated NARRepair on three widely used datasets in the APR community, and the results show that our technique can significantly improve inference speed while maintaining high repair accuracy.
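Because a non-autoregressive model predicts an edit for every source token at once, applying the prediction is a single pass with no dependency between steps, rather than a token-by-token generation loop. The sketch below assumes a hypothetical three-action vocabulary (keep, delete, replace); NARRepair's actual action set and token alignment may differ. A `keep` action leaves already-correct code untouched, which is the intuition behind using repair actions against over-correction.

```python
from typing import List, Optional, Tuple

# Hypothetical per-token repair action: (operation, replacement token or None).
Action = Tuple[str, Optional[str]]


def apply_actions(tokens: List[str], actions: List[Action]) -> List[str]:
    """Apply one predicted action per source token; no step depends on another."""
    if len(tokens) != len(actions):
        raise ValueError("expected exactly one action per source token")
    repaired: List[str] = []
    for token, (op, replacement) in zip(tokens, actions):
        if op == "keep":
            repaired.append(token)
        elif op == "delete":
            continue
        elif op == "replace":
            if replacement is None:
                raise ValueError("replace needs a replacement token")
            repaired.append(replacement)
        else:
            raise ValueError(f"unknown action: {op!r}")
    return repaired


if __name__ == "__main__":
    buggy = ["if", "(", "x", ">", "0", ")", ";"]
    actions = [("keep", None)] * 3 + [("replace", ">=")] + [("keep", None)] * 2 + [("delete", None)]
    print(" ".join(apply_actions(buggy, actions)))  # if ( x >= 0 )
```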

Updated: 2024-06-24 11:04:28

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2406.16526v1

OAML: Outlier Aware Metric Learning for OOD Detection Enhancement

Out-of-distribution (OOD) detection methods have been developed to identify objects that a model has not seen during training. Outlier Exposure (OE) methods use auxiliary datasets to train OOD detectors directly. However, collecting and learning from representative OOD samples can be challenging. To tackle these issues, we propose the Outlier Aware Metric Learning (OAML) framework. The main idea of our method is to use the k-NN algorithm and the Stable Diffusion model to generate outliers for training at the feature level without making any distributional assumptions. To increase feature discrepancies in the semantic space, we develop a mutual information-based contrastive learning approach for learning effectively from OOD data. Both theoretical and empirical results confirm the effectiveness of this contrastive learning technique. Furthermore, we incorporate knowledge distillation into our learning framework to prevent degradation of in-distribution classification accuracy. The combination of contrastive learning and knowledge distillation significantly enhances OOD detection performance. Experimental results across various datasets show that our method significantly outperforms previous OE methods.
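The k-NN part of feature-level outlier generation can be illustrated as follows: in-distribution feature vectors whose k-th nearest neighbour is far away lie near the sparse boundary of the data, and perturbing them yields candidate outliers. This is a minimal sketch under assumptions: Gaussian jitter stands in for the generation step (the Stable Diffusion component is not modelled), and `k`, `m`, `sigma`, and `per_point` are hypothetical parameters.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def kth_neighbor_distance(i: int, feats: List[Vector], k: int) -> float:
    """Distance from feats[i] to its k-th nearest other point."""
    dists = sorted(math.dist(feats[i], f) for j, f in enumerate(feats) if j != i)
    return dists[k - 1]


def boundary_indices(feats: List[Vector], k: int, m: int) -> List[int]:
    """Indices of the m points with the largest k-NN distance (sparsest neighbourhoods)."""
    ranked = sorted(range(len(feats)), key=lambda i: kth_neighbor_distance(i, feats, k), reverse=True)
    return ranked[:m]


def synthesize_outliers(feats: List[Vector], k: int, m: int, sigma: float,
                        per_point: int, rng: random.Random) -> List[List[float]]:
    """Jitter boundary features with Gaussian noise (assumed stand-in) to create feature-level outliers."""
    outliers = []
    for i in boundary_indices(feats, k, m):
        for _ in range(per_point):
            outliers.append([v + rng.gauss(0.0, sigma) for v in feats[i]])
    return outliers


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
    print(boundary_indices(feats, k=1, m=1))  # [4]
    print(len(synthesize_outliers(feats, 1, 2, 0.5, 3, random.Random(0))))  # 6
```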

Updated: 2024-06-24 11:01:43

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2406.16525v1

Carrot and Stick: Inducing Self-Motivation with Positive & Negative Feedback

Positive thinking is considered an important component of self-motivation in various practical fields such as education and the workplace. Previous work, including sentiment transfer and positive reframing, has focused on the positive side of language. However, the self-motivation that drives people to reach their goals has not yet been studied from a computational perspective. Moreover, negative feedback has not yet been explored, even though both positive and negative feedback are necessary to foster self-motivation. To facilitate self-motivation, we propose the CArrot and STICk (CASTIC) dataset, consisting of 12,590 sentences with 5 different strategies for enhancing self-motivation. Our data and code are publicly available here.

Updated: 2024-06-24 10:55:31

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.16521v1

A Survey of Large Language Models for Graphs

Graphs are an essential data structure for representing relationships in real-world scenarios. Prior research has established that Graph Neural Networks (GNNs) deliver impressive results in graph-centric tasks such as link prediction and node classification. Despite these advances, challenges such as data sparsity and limited generalization persist. Recently, large language models (LLMs) have gained attention in natural language processing, where they excel at language comprehension and summarization. Integrating LLMs with graph learning techniques has attracted interest as a way to improve performance on graph learning tasks. In this survey, we conduct an in-depth review of the latest state-of-the-art LLMs applied to graph learning and introduce a novel taxonomy that categorizes existing methods by their framework design. We detail four distinct designs: i) GNNs as Prefix, ii) LLMs as Prefix, iii) LLMs-Graphs Integration, and iv) LLMs-Only, highlighting key methodologies within each category. We explore the strengths and limitations of each framework and point out potential avenues for future research, including overcoming current integration challenges between LLMs and graph learning techniques and venturing into new application areas. This survey aims to serve as a valuable resource for researchers and practitioners eager to leverage large language models in graph learning, and to inspire continued progress in this dynamic field. We maintain the related open-source materials at \url{https://github.com/HKUDS/Awesome-LLM4Graph-Papers}.
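In the LLMs-Only family, a common first step is to serialize the graph into text that the language model can read. A minimal sketch of one such serialization; the prompt layout and the `graph_to_prompt` helper are hypothetical, and the surveyed methods use many different formats.

```python
from typing import Dict, List, Tuple


def graph_to_prompt(node_text: Dict[int, str], edges: List[Tuple[int, int]], question: str) -> str:
    """Render nodes (with their text attributes), edges, and a task question as plain text.

    Hypothetical format chosen for illustration only.
    """
    lines = ["Graph:"]
    for node_id in sorted(node_text):
        lines.append(f"Node {node_id}: {node_text[node_id]}")
    for u, v in edges:
        lines.append(f"Edge: {u} -- {v}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = graph_to_prompt(
        {0: "paper on GNNs", 1: "paper on LLMs", 2: "survey of LLMs for graphs"},
        [(2, 0), (2, 1)],
        "Which node is most likely to be a survey?",
    )
    print(prompt)
```

Plain edge lists scale poorly to large graphs, which is one reason other categories in the taxonomy feed GNN embeddings to the LLM instead of raw text.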

Updated: 2024-06-24 10:25:19

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.08011v2

$\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning

Alphas are pivotal in providing signals for quantitative trading. The industry highly values the discovery of formulaic alphas for their interpretability and ease of analysis, compared with expressive yet overfitting-prone black-box alphas. In this work, we focus on discovering formulaic alphas. Prior studies on automatically generating collections of formulaic alphas were mostly based on genetic programming (GP), which is known to be sensitive to the initial population, to converge to local optima, and to be computationally slow. Recent efforts employing deep reinforcement learning (DRL) for alpha discovery have not fully addressed key practical considerations, such as alpha correlations and validity, which are crucial to their effectiveness. In this work, we propose a novel DRL-based framework for alpha discovery that formulates the alpha discovery process as program construction. Our agent, $\text{Alpha}^2$, assembles an alpha program optimized for an evaluation metric. A DRL-guided search algorithm navigates the search space based on value estimates of potential alpha outcomes. The evaluation metric rewards both the performance and the diversity of alphas, yielding a better final trading strategy. Our formulation of alpha search also enables pre-calculation dimensional analysis, which ensures the logical soundness of alphas and prunes the vast search space to a large extent. Empirical experiments on real-world stock markets demonstrate $\text{Alpha}^2$'s ability to identify a diverse set of logical and effective alphas, which significantly improves the performance of the final trading strategy. The code of our method is available at https://github.com/x35f/alpha2.
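Pre-calculation dimensional analysis can be sketched as type checking over an expression tree: addition and subtraction require operands with identical units, while multiplication and division add or subtract unit exponents. This is an illustrative sketch; the operator set, the tuple encoding of formulas, and the units assigned to `close`, `open`, and `volume` are assumptions, not $\text{Alpha}^2$'s actual grammar.

```python
from typing import Dict, Tuple, Union

Dim = Dict[str, int]                           # unit -> exponent, e.g. {"price": 1}
Expr = Union[str, Tuple[str, "Expr", "Expr"]]  # field name or (op, lhs, rhs)

# Hypothetical units for a few raw market fields.
FIELD_DIMS: Dict[str, Dim] = {
    "close": {"price": 1},
    "open": {"price": 1},
    "volume": {"volume": 1},
}


class DimensionError(ValueError):
    """Raised when a formula combines incompatible units."""


def _combine(a: Dim, b: Dim, sign: int) -> Dim:
    """Add (sign=+1) or subtract (sign=-1) unit exponents, dropping zeros."""
    out = dict(a)
    for unit, exp in b.items():
        out[unit] = out.get(unit, 0) + sign * exp
    return {unit: exp for unit, exp in out.items() if exp != 0}


def infer_dim(expr: Expr) -> Dim:
    """Return the units of `expr`, or raise DimensionError before any data is computed."""
    if isinstance(expr, str):
        return dict(FIELD_DIMS[expr])
    op, lhs, rhs = expr
    a, b = infer_dim(lhs), infer_dim(rhs)
    if op in ("add", "sub"):
        if a != b:
            raise DimensionError(f"cannot {op} {a} and {b}")
        return a
    if op == "mul":
        return _combine(a, b, +1)
    if op == "div":
        return _combine(a, b, -1)
    raise ValueError(f"unknown operator: {op!r}")


if __name__ == "__main__":
    # (close - open) / open is a dimensionless return, so it passes the check.
    print(infer_dim(("div", ("sub", "close", "open"), "open")))  # {}
```

Rejecting a candidate such as `close + volume` before evaluating it on market data is what lets this kind of check prune the search space cheaply.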

Updated: 2024-06-24 10:21:29

Categories: q-fin.CP,cs.AI

Download: http://arxiv.org/abs/2406.16505v1

UNICAD: A Unified Approach for Attack Detection, Noise Reduction and Novel Class Identification

As the use of Deep Neural Networks (DNNs) becomes pervasive, their vulnerability to adversarial attacks and their limitations in handling unseen classes pose significant challenges. The state of the art offers discrete solutions that tackle individual issues, covering specific adversarial attack scenarios, classification, or evolving learning. However, real-world systems need to detect and recover from a wide range of adversarial attacks without sacrificing classification accuracy, and to act flexibly in unseen scenarios. In this paper, we propose UNICAD, a novel framework that integrates a variety of techniques to provide an adaptive solution. For the targeted image classification task, UNICAD achieves accurate image classification, detects unseen classes, and recovers from adversarial attacks using Prototype and Similarity-based DNNs with denoising autoencoders. Our experiments on the CIFAR-10 dataset highlight UNICAD's effectiveness in adversarial mitigation and unseen-class classification, outperforming traditional models.
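The prototype-and-similarity idea behind unseen-class detection can be sketched as follows: compare an embedding with one prototype per known class and flag it as unseen when even the best match falls below a similarity threshold. A minimal sketch; cosine similarity and the threshold value are assumptions, and the denoising-autoencoder stage that UNICAD uses against adversarial noise is not shown.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_or_flag(embedding: Vector, prototypes: Dict[str, Vector], threshold: float) -> str:
    """Return the most similar known class, or "unseen" if no prototype is similar enough."""
    label, sim = max(((lbl, cosine(embedding, p)) for lbl, p in prototypes.items()),
                     key=lambda pair: pair[1])
    return label if sim >= threshold else "unseen"


if __name__ == "__main__":
    prototypes = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
    print(classify_or_flag([0.9, 0.1, 0.0], prototypes, threshold=0.8))  # cat
    print(classify_or_flag([0.0, 0.0, 1.0], prototypes, threshold=0.8))  # unseen
```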

Updated: 2024-06-24 10:10:03

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.16501v1

Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning

While large language models (LLMs) can iteratively reflect on their own outputs, recent studies have observed that they struggle with knowledge-rich problems without access to external resources. In addition to the inefficiency of LLMs in self-assessment, we observe that LLMs struggle to revisit their predictions even after receiving explicit negative feedback. We therefore propose Mirror, a Multiple-perspective self-reflection method for knowledge-rich reasoning, to avoid getting stuck at a particular reflection iteration. Mirror enables LLMs to reflect on multiple-perspective clues, achieved through a heuristic interaction between a Navigator and a Reasoner. It guides agents toward diverse yet plausibly reliable reasoning trajectories without access to ground truth by encouraging (1) diversity among the directions generated by the Navigator and (2) agreement among strategically induced perturbations in the responses generated by the Reasoner. Experiments on five reasoning datasets demonstrate Mirror's superiority over several contemporary self-reflection approaches. Additionally, ablation studies indicate that our strategies alleviate the aforementioned challenges.

Updated: 2024-06-24 10:05:24

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2402.14963v2

OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser

Recent research has shown that combining Mamba, with its selective state space, and the Transformer architecture, with its quadratic self-attention mechanism, outperforms using either Mamba or Transformer alone in language modeling tasks. The quadratic self-attention mechanism effectively alleviates the shortcomings of the selective state space in handling long-term dependencies between arbitrary elements of the sequence. We propose a position information injection method that connects the selective state space model with quadratic attention, and we integrate these two architectures with a cross-domain mixture of experts, so that the model benefits from the advantages of both. We design a new architecture based on a more biomimetic idea, Observer-Thinker-Conceiver-Expresser (OTCE), which, at a small scale, can compete with well-known medium-scale open-source language models in language modeling tasks.

Updated: 2024-06-24 10:05:23

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.16495v1

Wavelet Attention GRU for Efficient Industrial Gas Recognition with Novel Metrics

Gas recognition technology has received considerable attention from researchers in recent years. Nevertheless, the field has faced obstacles in implementing deep learning-based recognition solutions because of the absence of standardized protocols. To tackle this problem, we propose two sets of specialized evaluation metrics for gas recognition algorithms, which make it easier to examine the performance of these algorithms across various datasets. In addition, we present a new model called the Wavelet Attention GRU (WAG), which is based on the wavelet attention mechanism and facilitates more efficient retrieval of sensor signals. Compared with other models, WAG reduces the number of sensors required by 75% while achieving an identification accuracy of 98.33%. This suggests that WAG is a promising approach for advancing gas recognition algorithms.
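The wavelet part of a wavelet-attention model can be illustrated with a one-level Haar transform, which splits a sensor signal into a low-frequency approximation band and a high-frequency detail band and is exactly invertible. The softmax-over-band-energy weighting below is a hypothetical, non-learned stand-in for attention, included only to show where band weights would enter; it is not WAG's mechanism.

```python
import math
from typing import List, Sequence, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) * INV_SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * INV_SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Exact inverse of haar_dwt."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * INV_SQRT2, (a - d) * INV_SQRT2])
    return out


def band_weights(bands: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for attention: softmax over per-band energy."""
    energies = [sum(v * v for v in band) for band in bands]
    top = max(energies)
    exps = [math.exp(e - top) for e in energies]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    reading = [1.0, 3.0, 2.0, 2.0, 0.5, -1.0]  # toy sensor trace
    approx, detail = haar_dwt(reading)
    print(haar_idwt(approx, detail))
    print(band_weights([approx, detail]))
```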

Updated: 2024-06-24 10:05:01

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16997v1

Cross-domain Transfer of Valence Preferences via a Meta-optimization Approach

Cross-domain recommendation offers a potential avenue for alleviating data sparsity and cold-start problems. Embedding and mapping, a classic cross-domain research genre, aims to identify a common mapping function that transforms representations between two domains. Nevertheless, coarse-grained preference representations, non-personalized mapping functions, and excessive reliance on overlapping users limit the performance of previous approaches, especially when overlapping users are sparse. To address these challenges, we propose a novel cross-domain approach, CVPM. CVPM formalizes cross-domain interest transfer as a hybrid architecture of parametric meta-learning and self-supervised learning, which not only transfers user preferences at a finer granularity but also enhances the signal with knowledge from non-overlapping users. Specifically, drawing on insights into user preferences and valence preference theory, we posit that users' positive preferences differ significantly from their negative behaviors, and we therefore employ separate encoders to learn their distributions. In particular, we further use a pre-trained model and item popularity to sample pseudo-interaction items, ensuring the integrity of both distributions. To guarantee personalized preference transfer, we treat each user's mapping as two parts, a common transformation and a personalized bias, where the network used to generate the personalized bias is output by a meta-learner. Furthermore, in addition to the supervised loss for overlapping users, we design group-level and individual-level contrastive tasks for non-overlapping users to avoid model skew and enhance the semantics of representations. Exhaustive data analysis and extensive experimental results demonstrate the effectiveness and advancement of our proposed framework.
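The split of each user's mapping into a common transformation plus a personalized bias, where a meta-learner produces the parameters of the bias network, can be sketched as a small hypernetwork. All shapes and names below are assumptions for illustration: the meta-learner is a single linear map `meta_W` from a user's preference summary to the entries of a d x d bias matrix, which CVPM's actual networks need not be.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def map_user(source_emb: Sequence[float], W_common: Matrix,
             meta_W: Matrix, user_profile: Sequence[float]) -> List[float]:
    """Map a source-domain embedding to the target domain for one user.

    target = W_common @ source_emb + B_u @ source_emb, where the personalized
    d x d matrix B_u is generated from the user's profile by the (hypothetical)
    linear meta-learner meta_W, whose output has d * d entries.
    """
    d = len(source_emb)
    flat = matvec(meta_W, user_profile)
    if len(flat) != d * d:
        raise ValueError("meta_W must output d * d values")
    B_u = [flat[i * d:(i + 1) * d] for i in range(d)]  # reshape into d rows
    common = matvec(W_common, source_emb)             # shared across all users
    bias = matvec(B_u, source_emb)                    # user-specific correction
    return [c + b for c, b in zip(common, bias)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    meta = [[1.0], [0.0], [0.0], [1.0]]  # turns a 1-D profile into a diagonal B_u
    print(map_user([1.0, 2.0], identity, meta, [0.5]))  # [1.5, 3.0]
```

Because `B_u` is produced per user rather than learned as a single shared matrix, two users with the same source embedding can still receive different target representations.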

Updated: 2024-06-24 10:02:24

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2406.16494v1

An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding

Genomic selection (GS), a critical crop breeding strategy, plays a key role in enhancing food production and addressing the global hunger crisis. The predominant approaches in GS currently rely on statistical methods for prediction. However, statistical methods often have two main limitations: strong statistical priors and linear assumptions. A recent trend is to capture non-linear relationships between markers with deep learning. However, because crop datasets typically consist of long sequences with limited samples, the robustness of deep learning models, especially Transformers, remains a challenge. In this work, to unleash the unexplored potential of the attention mechanism for this task, we propose a simple yet effective Transformer-based framework that enables end-to-end training on the whole sequence. Through experiments on the rice3k and wheat3k datasets, we show that, with simple tricks such as k-mer tokenization and random masking, a Transformer can achieve overall superior performance compared with seminal methods on the GS tasks of interest.
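The two tricks named above can be sketched on a marker string. This is a minimal sketch, assuming genotypes are already encoded as characters (for example "0", "1", "2"); non-overlapping k-mers are the default here, and the stride, the mask token, and the masking probability are illustrative choices, not the paper's settings.

```python
import random
from typing import List, Optional


def kmer_tokenize(markers: str, k: int, stride: Optional[int] = None) -> List[str]:
    """Split a marker sequence into k-mers; the default stride=k gives non-overlapping tokens.

    Trailing markers that do not fill a complete k-mer are dropped.
    """
    step = k if stride is None else stride
    return [markers[i:i + k] for i in range(0, len(markers) - k + 1, step)]


def random_mask(tokens: List[str], p: float, rng: random.Random,
                mask_token: str = "[MASK]") -> List[str]:
    """Replace each token with mask_token independently with probability p (training-time augmentation)."""
    return [mask_token if rng.random() < p else tok for tok in tokens]


if __name__ == "__main__":
    genotype = "012021120"
    tokens = kmer_tokenize(genotype, k=3)
    print(tokens)  # ['012', '021', '120']
    print(random_mask(tokens, p=0.3, rng=random.Random(42)))
```

Non-overlapping k-mers shorten a long marker sequence by a factor of k, which directly reduces the quadratic cost of self-attention.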

Updated: 2024-06-24 09:56:35

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.09585v3

Migrating Software Systems towards Post-Quantum-Cryptography -- A Systematic Literature Review

Networks such as the Internet are essential for our connected world. Quantum computing poses a threat to this heterogeneous infrastructure because it undermines fundamental security mechanisms. Therefore, networks and their components must migrate to post-quantum cryptography (PQC). At the moment, there is little knowledge about how such migrations should be structured and implemented in practice. Our systematic literature review addresses approaches for migrating IP networks to PQC. It surveys papers on the migration process and on exemplary real-world software system migrations. On the process side, we found that terminology, migration steps, and roles are not defined precisely or consistently across the literature. Still, we identified four major phases and appropriate substeps, which we matched with emerging role archetypes. In terms of real-world migrations, we see that reports used several different PQC implementations and hybrid solutions to migrate systems of a wide range of types. Across all papers, we noticed three major challenges for adopters: lack of experience with PQC combined with high implementation effort, concerns about the security of the resulting system, and, finally, high complexity. Our findings indicate that recent standardization efforts are already pushing quantum-safe networking forward. However, the literature has not yet reached consensus on definitions and best practices. Implementations are mostly experimental and not necessarily practical, leading to an overall chaotic situation. To better grasp this fast-moving field of (applied) research, our systematic literature review provides a comprehensive overview of its current state and serves as a starting point for delving into PQC migration.

Updated: 2024-06-24 09:49:46

Categories: cs.CR

Download: http://arxiv.org/abs/2404.12854v2

Towards Comprehensive Preference Data Collection for Reward Modeling

Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models (LLMs) with human preferences, thereby enhancing the quality of responses generated. A critical component of RLHF is the reward model, which is trained on preference data and outputs a scalar reward during the inference stage. However, the collection of preference data still lacks thorough investigation. Recent studies indicate that preference data is collected either by AI or humans, where chosen and rejected instances are identified among pairwise responses. We question whether this process effectively filters out noise and ensures sufficient diversity in collected data. To address these concerns, for the first time, we propose a comprehensive framework for preference data collection, decomposing the process into four incremental steps: Prompt Generation, Response Generation, Response Filtering, and Human Labeling. This structured approach ensures the collection of high-quality preferences while reducing reliance on human labor. We conducted comprehensive experiments based on the data collected at different stages, demonstrating the effectiveness of the proposed data collection method.
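The Response Filtering step can be pictured as a routing rule between automatic labeling and human labeling. The sketch below is a hypothetical illustration, not the authors' framework: it assumes each candidate response already carries a scalar score from some AI judge or reward model, drops exact duplicates so the pair is built from distinct responses, and sends a prompt to human labeling only when the best and worst responses are too close to call. The `margin` value and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    score: float  # hypothetical score from an AI judge or reward model


def route_prompt(candidates, margin=1.0):
    """Return ("auto", chosen, rejected), ("human", None, None) or ("drop", None, None)."""
    # Drop exact duplicate responses, keeping the highest score for each text.
    unique = {}
    for c in candidates:
        if c.text not in unique or c.score > unique[c.text].score:
            unique[c.text] = c
    if len(unique) < 2:
        return ("drop", None, None)  # nothing left to compare
    ranked = sorted(unique.values(), key=lambda c: c.score, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best.score - worst.score >= margin:
        # Clear preference: label automatically as a chosen/rejected pair.
        return ("auto", best.text, worst.text)
    # Ambiguous: leave the decision to a human annotator.
    return ("human", None, None)
```

Lowering `margin` shifts work from human annotators to the automatic path at the cost of noisier pairs.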

Updated: 2024-06-24 09:40:39

标题: 朝向用于奖励建模的全面偏好数据收集

摘要: 人类反馈强化学习（RLHF）有助于将大型语言模型（LLMs）与人类偏好对齐，从而提高生成的响应质量。RLHF的一个关键组成部分是奖励模型，该模型在偏好数据上进行训练，并在推理阶段输出标量奖励。然而，偏好数据的收集仍然缺乏彻底的研究。最近的研究表明，偏好数据是由人工智能或人类收集的，选择和拒绝的实例在成对响应中被识别出来。我们质疑这个过程是否有效地过滤了噪音并确保了收集数据的足够多样性。为了解决这些问题，我们首次提出了一个全面的偏好数据收集框架，将该过程分解为四个增量步骤：提示生成、响应生成、响应过滤和人类标记。这种结构化方法确保了高质量偏好的收集，同时减少了对人力的依赖。我们基于不同阶段收集的数据进行了全面实验，展示了所提出的数据收集方法的有效性。

更新时间: 2024-06-24 09:40:39

领域: cs.AI

下载: http://arxiv.org/abs/2406.16486v1

Robust prediction under missingness shifts

Prediction becomes more challenging when covariates are missing. The method chosen to handle missingness can greatly affect model performance. In many real-world problems, the best prediction performance is achieved by models that can leverage the informative nature of a value being missing. Yet the reasons why a covariate goes missing can change once a model is deployed in practice. If such a missingness shift occurs, the conditional probability of a value being missing differs in the target data. Prediction performance on the source data may then no longer be a good selection criterion, and approaches that do not rely on informative missingness may be preferable. However, we show that the Bayes predictor remains unchanged under ignorable shifts, for which the probability of missingness depends only on observed data. Any consistent estimator of the Bayes predictor may therefore yield robust prediction under those conditions, although we show empirically that different methods appear robust to different types of shifts. If the missingness shift is non-ignorable, the Bayes predictor may change due to the shift. While neither approach recovers the Bayes predictor in this case, we found empirically that disregarding missingness was most beneficial when it was highly informative.
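To make the two families of approaches concrete, the sketch below contrasts mean imputation with and without missingness-indicator columns. With indicators, a downstream model can exploit informative missingness; without them, it disregards the missingness pattern. This is an illustrative assumption, not the paper's estimators. Note that the means are fitted on source data and reused on target data, which is exactly where a missingness shift enters.

```python
import numpy as np


def fit_means(X_source):
    """Column means of the observed source values (NaN marks a missing entry)."""
    return np.nanmean(np.asarray(X_source, dtype=float), axis=0)


def impute(X, means, add_indicators=True):
    """Mean-impute NaNs; optionally append one 0/1 missingness indicator per covariate."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    X_imp = np.where(mask, means, X)  # means broadcast across rows
    if add_indicators:
        # Indicators let a downstream model exploit informative missingness.
        return np.hstack([X_imp, mask.astype(float)])
    # Without indicators the model cannot condition on the missingness pattern.
    return X_imp
```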

Updated: 2024-06-24 09:39:30

标题: 缺失数据变化下的稳健预测

摘要: 具有缺失协变量的预测变得更具挑战性。选择处理缺失的方法可以极大地影响模型的表现。在许多现实世界的问题中，最佳的预测性能是由能够利用缺失值的信息性质的模型实现的。然而，一旦模型在实践中部署，导致协变量缺失的原因可能会发生变化。如果发生这种缺失性转变，值得注意的是，目标数据中缺失值的条件概率会有所不同。在源数据中的预测性能可能不再是一个良好的选择标准，而不依赖于信息性缺失的方法可能更可取。然而，我们展示了贝叶斯预测器对于只依赖于观测数据的可忽略偏移保持不变。因此，任何贝叶斯预测器的一致估计量可能会在这些条件下导致稳健的预测，尽管我们在实证中发现不同的方法对不同类型的偏移表现出稳健性。如果缺失性转变是不可忽略的，那么由于此转变，贝叶斯预测器可能会发生变化。在这种情况下，两种方法都无法恢复贝叶斯预测器，但我们在实证中发现，当缺失性是高度信息性时，忽略缺失性是最有益的。

更新时间: 2024-06-24 09:39:30

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.16484v1

Improving Quaternion Neural Networks with Quaternionic Activation Functions

In this paper, we propose novel quaternion activation functions that modify either the quaternion magnitude or the phase, as an alternative to the commonly used split activation functions. We define criteria that are relevant for quaternion activation functions and then propose our novel activation functions based on this analysis. Instead of applying a known activation function such as ReLU or Tanh to each quaternion component separately, these activation functions take the quaternion properties into account and respect the quaternion space $\mathbb{H}$. In particular, all quaternion components are used to compute every output component, carrying the benefit of the Hamilton product (as used, e.g., in quaternion convolution) over to the activation functions. The proposed activation functions can be incorporated into arbitrary quaternion-valued neural networks trained with gradient descent techniques. We further discuss the derivatives of the proposed activation functions and observe beneficial properties for the activation functions that affect the phase. Specifically, they prove to be sensitive over essentially the whole input range, so improved gradient flow can be expected. We provide an elaborate experimental evaluation of our proposed quaternion activation functions, including a comparison with split ReLU and split Tanh on two image classification tasks using the CIFAR-10 and SVHN datasets. There, the quaternion activation functions affecting the phase in particular consistently provide better performance.
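For readers new to quaternion networks, the contrast between a split activation and a magnitude-based one can be sketched in NumPy, with quaternions stored as arrays `[w, x, y, z]`. `split_relu` applies ReLU to each component independently, whereas `magnitude_tanh` (a hypothetical example of the magnitude-modifying family, not necessarily one of the paper's functions) squashes the norm with tanh while keeping the direction, so every output component depends on all input components. The Hamilton product is included for reference; the paper's phase-based activations are not reproduced here.

```python
import numpy as np


def hamilton(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays [w, x, y, z]."""
    w1, x1, y1, z1 = np.moveaxis(p, -1, 0)
    w2, x2, y2, z2 = np.moveaxis(q, -1, 0)
    return np.stack([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ], axis=-1)


def split_relu(q):
    """Split activation: ReLU on each of the four components independently."""
    return np.maximum(q, 0.0)


def magnitude_tanh(q, eps=1e-12):
    """Rescale |q| to tanh(|q|) while keeping the direction q / |q|."""
    r = np.linalg.norm(q, axis=-1, keepdims=True)
    # tanh(r) / r -> 1 as r -> 0, so use scale 1 for (near-)zero quaternions.
    scale = np.where(r > eps, np.tanh(r) / np.maximum(r, eps), 1.0)
    return q * scale
```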

Updated: 2024-06-24 09:36:58

标题: 使用四元数激活函数改进四元数神经网络

摘要: 在这篇论文中，我们提出了新颖的四元数激活函数，其中我们修改四元数的幅度或相位，作为常用的分裂激活函数的替代方案。我们定义了与四元数激活函数相关的标准，随后基于这一分析提出了我们的新颖激活函数。与将已知的激活函数如ReLU或Tanh分别应用于四元数元素不同，这些激活函数考虑了四元数的属性并尊重四元数空间$\mathbb{H}$。特别地，所有四元数分量都被利用来计算所有输出分量，实现了例如四元数卷积中的哈密顿积对激活函数的好处。这些提出的激活函数可以应用于任意四元数值神经网络，使用梯度下降技术进行训练。我们进一步讨论了所提出的激活函数的导数，观察到激活函数对相位的影响具有有益的特性。具体地，它们被证明对基本上整个输入范围敏感，因此可以期望改进的梯度流。我们对我们提出的四元数激活函数进行了详尽的实验评估，包括在使用CIFAR-10和SVHN数据集进行的两项图像分类任务上与分裂ReLU和分裂Tanh的比较。在这里，尤其是影响相位的四元数激活函数始终表现出更好的性能。

更新时间: 2024-06-24 09:36:58

领域: cs.LG,cs.CV,cs.NE

下载: http://arxiv.org/abs/2406.16481v1

Of Mice and Mates: Automated Classification and Modelling of Mouse Behaviour in Groups using a Single Model across Cages

Behavioural experiments often happen in specialised arenas, but this may confound the analysis. To address this issue, we provide tools to study mice in the home-cage environment, equipping biologists with the possibility to capture the temporal aspect of the individual's behaviour and model the interaction and interdependence between cage-mates with minimal human intervention. Our main contribution is the novel Group Behaviour Model (GBM) which summarises the joint behaviour of groups of mice across cages, using a permutation matrix to match the mouse identities in each cage to the model. In support of the above, we also (a) developed the Activity Labelling Module (ALM) to automatically classify mouse behaviour from video, and (b) released two datasets, ABODe for training behaviour classifiers and IMADGE for modelling behaviour.
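The role of the permutation matrix can be illustrated with a small, hypothetical example. Assume we already have a score `loglik[i, j]` for how well cage mouse `i`'s behaviour fits identity slot `j` of a shared model (how the GBM actually computes such scores is not described here). Because cages hold only a few mice, all permutations can be enumerated and the best one returned as a 0/1 permutation matrix.

```python
import itertools

import numpy as np


def best_permutation(loglik):
    """loglik[i, j]: hypothetical log-likelihood of cage mouse i under model slot j.
    Returns P with P[i, j] = 1 if mouse i is assigned to slot j, maximizing the total."""
    loglik = np.asarray(loglik, dtype=float)
    n = loglik.shape[0]
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(n)):
        score = sum(loglik[i, perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    P = np.zeros((n, n), dtype=int)
    P[np.arange(n), list(best_perm)] = 1
    return P
```

For larger groups, `scipy.optimize.linear_sum_assignment` with `maximize=True` solves the same assignment problem without enumerating all n! permutations.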

Updated: 2024-06-24 09:35:41

标题: 《鼠类与伴侣：使用单一模型跨笼自动分类和建模鼠类群体行为》

摘要: 行为实验通常发生在专门的场地，但这可能会使分析变得混乱。为了解决这个问题，我们提供了工具来研究小鼠在家庭笼环境中的行为，为生物学家提供了捕捉个体行为的时间方面，并模拟笼内同伴之间的互动和相互依赖关系的可能性，同时最大限度地减少人类干预。我们的主要贡献是新颖的群体行为模型（GBM），它总结了跨笼中小鼠群体的联合行为，使用排列矩阵将每个笼中的小鼠身份与模型匹配。为了支持上述内容，我们还（a）开发了活动标签模块（ALM）来自动分类视频中的小鼠行为，以及（b）发布了两个数据集，ABODe用于训练行为分类器，IMADGE用于建模行为。

更新时间: 2024-06-24 09:35:41

领域: cs.CV,cs.LG,stat.ML

下载: http://arxiv.org/abs/2306.03066v2

Applications of Post-quantum Cryptography

With the constantly advancing capabilities of quantum computers, conventional cryptographic systems relying on complex math problems may encounter unforeseen vulnerabilities. Unlike regular computers, which are often deemed cost-ineffective for cryptographic attacks, quantum computers have a significant advantage in calculation speed. This distinction could make currently used algorithms less secure or even completely vulnerable, motivating the exploration of post-quantum cryptography (PQC) as the most reasonable solution to quantum threats. This review aims to provide current information on the applications, benefits, and challenges associated with PQC. It is a systematic scoping review restricted to the years 2022 and 2023; only articles published in scientific journals were included. The review examined articles on the applications of quantum computing in various spheres; however, its scope was narrowed to PQC because most of the analyzed articles featured this field. The paper then analyzes various PQC algorithm families, including lattice-based, hash-based, code-based, multivariate polynomial, and isogeny-based cryptography. Each is judged on its potential applications, robustness, and challenges. All of the analyzed algorithms are promising for the post-quantum era in applications such as digital signatures, communication channels, and IoT. Moreover, some of them are already implemented in banking transactions, communication, and intellectual property. Despite their potential, however, these algorithms face serious challenges: they lack standardization, require large amounts of storage and computation power, and might have unknown vulnerabilities that can be discovered only through years of cryptanalysis.
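Hash-based cryptography is the easiest of these families to illustrate in a few lines. The sketch below is a textbook Lamport one-time signature over SHA-256, for illustration only and not a production or standardized scheme. It also makes the storage concern mentioned above tangible: a single signature is 256 × 32 = 8192 bytes, and each key pair may sign only one message. Hash-based schemes such as XMSS and SPHINCS+ build on one-time signatures of this kind.

```python
import hashlib
import secrets


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Secret key: 256 pairs of random 32-byte values; public key: their hashes."""
    sk = [[secrets.token_bytes(32), secrets.token_bytes(32)] for _ in range(256)]
    pk = [[_sha256(a), _sha256(b)] for a, b in sk]
    return sk, pk


def _digest_bits(msg: bytes):
    """The 256 bits of SHA-256(msg), most significant bit first."""
    d = _sha256(msg)
    return [(d[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(sk, msg: bytes):
    # Reveal one secret preimage per digest bit; reusing sk leaks more preimages.
    return [sk[i][bit] for i, bit in enumerate(_digest_bits(msg))]


def verify(pk, msg: bytes, sig) -> bool:
    if len(sig) != 256:
        return False
    return all(_sha256(s) == pk[i][bit]
               for i, (s, bit) in enumerate(zip(sig, _digest_bits(msg))))
```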

Updated: 2024-06-24 09:34:16

标题: 后量子密码学的应用

摘要: 随着量子计算机不断发展的能力，依赖复杂数学问题的传统加密系统可能会遇到未预见的漏洞。常规计算机在密码攻击中通常被认为成本效益不高，而量子计算机在计算速度上具有显著优势。这种区别可能使目前使用的算法不够安全甚至完全容易受到攻击，迫使探索后量子密码学（PQC）作为应对量子威胁的最合理解决方案。本综述旨在提供有关PQC应用、优势和挑战的当前信息。本综述是一项系统性范围综述，范围仅限于2022年和2023年；本文仅使用了发表在科学期刊上的文章。综述检查了有关量子计算在各个领域应用的文章。然而，本文的范围限定在PQC领域，因为大多数分析的文章都涉及到这个领域。随后，本文分析了各种PQC算法，包括基于格、基于哈希、基于码、多变量多项式和基于同源的密码学。每种算法都根据其潜在应用、稳健性和挑战进行评判。所有分析的算法在后量子时代的数字签名、通信渠道和物联网等应用中都很有前景。此外，一些算法已经在银行交易、通信和知识产权领域得到实施。然而，尽管它们具有潜力，这些算法面临严重挑战，因为它们缺乏标准化，需要大量的存储和计算能力，并可能存在未知的漏洞，只有经过多年的密码分析才能发现。

更新时间: 2024-06-24 09:34:16

领域: cs.CR,cs.ET

下载: http://arxiv.org/abs/2406.13258v2

Emerging NeoHebbian Dynamics in Forward-Forward Learning: Implications for Neuromorphic Computing

Advances in neural computation have predominantly relied on the gradient backpropagation algorithm (BP). However, the recent shift towards non-stationary data modeling has highlighted the limitations of this heuristic, exposing that its adaptation capabilities are far from those seen in biological brains. Unlike BP, where weight updates are computed through a reverse error propagation path, Hebbian learning dynamics provide synaptic updates using only information within the layer itself. This has spurred interest in biologically plausible learning algorithms, hypothesized to overcome BP's shortcomings. In this context, Hinton recently introduced the Forward-Forward Algorithm (FFA), which employs local learning rules for each layer and has empirically proven its efficacy in multiple data modeling tasks. In this work we argue that when employing a squared Euclidean norm as a goodness function driving the local learning, the resulting FFA is equivalent to a neo-Hebbian Learning Rule. To verify this result, we compare the training behavior of FFA in analog networks with its Hebbian adaptation in spiking neural networks. Our experiments demonstrate that both versions of FFA produce similar accuracy and latent distributions. The findings herein reported provide empirical evidence linking biological learning rules with currently used training algorithms, thus paving the way towards extrapolating the positive outcomes from FFA to Hebbian learning rules. Simultaneously, our results imply that analog networks trained under FFA could be directly applied to neuromorphic computing, leading to reduced energy usage and increased computational speed.
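The link between a squared-norm goodness and Hebbian learning can be checked with a short derivation. For a ReLU layer $h = \max(Wx, 0)$ with goodness $G = \lVert h \rVert_2^2$, the gradient is $\partial G / \partial W = 2\, h x^\top$, an outer product of post-synaptic activity $h$ and pre-synaptic activity $x$. The NumPy sketch below uses the logistic Forward-Forward loss from Hinton's formulation (goodness compared against a threshold `theta`), which turns the update into that Hebbian term scaled by a gain that is positive for positive data and negative (anti-Hebbian) for negative data. The threshold, learning rate, and function name are assumptions for illustration; this is not the paper's spiking implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ff_layer_update(W, x, positive, theta=2.0, lr=0.01):
    """One local Forward-Forward step for a single ReLU layer."""
    h = np.maximum(W @ x, 0.0)   # post-synaptic activity
    goodness = float(h @ h)      # squared Euclidean norm of the activity
    if positive:
        gain = sigmoid(theta - goodness)    # loss softplus(theta - G): push G up
    else:
        gain = -sigmoid(goodness - theta)   # loss softplus(G - theta): push G down
    # dG/dW = 2 * outer(h, x) for ReLU units: a Hebbian post x pre term.
    return W + lr * gain * 2.0 * np.outer(h, x), goodness
```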

Updated: 2024-06-24 09:33:56

标题: 前向前向学习中新兴的NeoHebbian动态：对神经形态计算的影响

摘要: 神经计算的进展主要依赖于梯度反向传播算法（BP）。然而，最近转向非平稳数据建模凸显了这种启发式方法的局限性，暴露出其适应能力远远不及生物大脑。与BP通过反向误差传播路径计算权重更新不同，赫布学习动态仅使用该层内部的信息提供突触更新。这激起了对生物合理学习算法的兴趣，据推测，这些算法可以克服BP的缺点。在这种背景下，Hinton最近引入了前向-前向算法（FFA），该算法为每一层使用局部学习规则，并在多个数据建模任务中经验性地证明了其有效性。在这项工作中，我们认为当使用平方欧几里得范数作为驱动局部学习的优度函数时，所得的FFA等价于一种新赫布学习规则。为了验证这一结果，我们比较了模拟网络中FFA的训练行为与其在脉冲神经网络中的赫布适应版本。我们的实验表明，FFA的两个版本均产生类似的准确性和潜在分布。本文报道的研究结果提供了将生物学习规则与当前使用的训练算法联系起来的经验证据，从而为将FFA的积极结果推广到赫布学习规则铺平了道路。同时，我们的结果暗示，在FFA下训练的模拟网络可以直接应用于神经形态计算，从而降低能耗并提高计算速度。

更新时间: 2024-06-24 09:33:56

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2406.16479v1

Cyber Protection Applications of Quantum Computing: A Review

Quantum computing is a cutting-edge field of information technology that harnesses the principles of quantum mechanics to perform computations. It has major implications for the cybersecurity industry. Existing cyber protection applications work well, but computer networks still have challenges and vulnerabilities, and data and privacy are sometimes compromised. These complications lead to two research questions: what cyber protection applications of quantum computing exist, and what methods or techniques can be used for cyber protection? Answering them should reveal how much power quantum computing has and to what extent it can outperform conventional computing systems. This scoping review considered 815 papers and shows what could be achieved if quantum technologies were implemented in cyber environments. It discusses domains such as algorithms and applications, bioinformatics, cloud and edge computing, the organization of complex systems, application areas focused on security and threats, and the broader quantum computing ecosystem. In each of these areas, there is significant scope for quantum computing to be implemented and to revolutionize the working environment. Numerous quantum computing applications for cyber protection and a number of techniques to protect data and privacy were identified. The results are not limited to network security but also include data security. The paper also discusses societal aspects, e.g., the applications of quantum computing in the social sciences, as well as how to enhance the efficiency and security of quantum computing in various cybersecurity domains. Finally, it encourages the reader to consider which techniques and methods can be deployed to secure the cyber world.

Updated: 2024-06-24 09:32:23

标题: 量子计算机在网络安全领域的应用：一项综述

摘要: 量子计算是一门利用量子力学原理进行计算的前沿信息技术领域。它对网络安全行业有重大影响。现有的网络保护应用程序运作良好，但计算机网络仍然存在挑战和漏洞。有时数据和隐私也会受到侵害。这些复杂情况引发了研究问题，即量子计算的网络保护应用是什么，以及可以用于网络保护的潜在方法或技术是什么？这些问题将揭示量子计算具有多大的能力，以及在多大程度上可以胜过传统的计算系统。本文综合考虑了815篇论文进行了范围性回顾。它展示了如果量子技术在网络环境中得到应用可能实现的可能性。这篇综述讨论了各种领域，如算法和应用、生物信息学、云和边缘计算、复杂系统的组织、以安全和威胁为重点的应用领域，以及更广泛的量子计算生态系统。在每个领域中，量子计算都有显著的应用空间，并且可以彻底改变工作环境。发现了大量用于网络保护的量子计算应用程序和一些保护数据和隐私的技术。结果不仅限于网络安全，还包括数据安全。本文还讨论了社会方面，例如量子计算在社会科学中的应用。这篇综述讨论了如何增强各种网络安全领域中量子计算的效率和安全性。此外，它鼓励读者思考可以部署哪种技术和方法来保护网络世界。

更新时间: 2024-06-24 09:32:23

领域: cs.CR,cs.ET

下载: http://arxiv.org/abs/2406.13259v2

Machine Learning Applications of Quantum Computing: A Review

At the intersection of quantum computing and machine learning, this review paper explores the transformative impact these technologies are having on the capabilities of data processing and analysis, far surpassing the bounds of traditional computational methods. Drawing upon an in-depth analysis of 32 seminal papers, this review delves into the interplay between quantum computing and machine learning, focusing on transcending the limitations of classical computing in advanced data processing and applications. This review emphasizes the potential of quantum-enhanced methods in enhancing cybersecurity, a critical sector that stands to benefit significantly from these advancements. The literature review, primarily leveraging Science Direct as an academic database, delves into the transformative effects of quantum technologies on machine learning, drawing insights from a diverse collection of studies and scholarly articles. While the focus is primarily on the growing significance of quantum computing in cybersecurity, the review also acknowledges the promising implications for other sectors as the field matures. Our systematic approach categorizes sources based on quantum machine learning algorithms, applications, challenges, and potential future developments, uncovering that quantum computing is increasingly being implemented in practical machine learning scenarios. The review highlights advancements in quantum-enhanced machine learning algorithms and their potential applications in sectors such as cybersecurity, emphasizing the need for industry-specific solutions while considering ethical and security concerns. By presenting an overview of the current state and projecting future directions, the paper sets a foundation for ongoing research and strategic advancement in quantum machine learning.

Updated: 2024-06-24 09:30:24

标题: 量子计算的机器学习应用：综述

摘要: 在量子计算和机器学习的交叉点，这篇综述论文探讨了这些技术对数据处理和分析能力的变革性影响，远远超出了传统计算方法的范围。通过对32篇重要论文的深入分析，本综述深入探讨了量子计算和机器学习之间的相互作用，重点关注在高级数据处理和应用中超越经典计算的限制。本综述强调了量子增强方法在增强网络安全方面的潜力，这是一个重要领域，将极大受益于这些进步。文献综述主要利用Science Direct作为学术数据库，深入探讨了量子技术对机器学习的变革影响，并从各种研究和学术文章中汲取见解。虽然焦点主要集中在量子计算在网络安全中日益重要的意义上，但综述也承认随着该领域的发展成熟，对其他领域的有希望的影响。我们系统的方法根据量子机器学习算法、应用、挑战和潜在未来发展对来源进行分类，发现量子计算越来越多地被实施在实际机器学习场景中。综述突出了量子增强机器学习算法的进展及其在网络安全等领域的潜在应用，强调了对行业特定解决方案的需求，同时考虑了道德和安全问题。通过对当前状态的概述和未来方向的展望，本文为量子机器学习的持续研究和战略进展奠定了基础。

更新时间: 2024-06-24 09:30:24

领域: cs.LG,cs.ET

下载: http://arxiv.org/abs/2406.13262v2

Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector

Fine-tuning large language models (LLMs) can cause them to lose their general capabilities. However, the intrinsic mechanisms behind such forgetting remain unexplored. In this paper, we begin by examining this phenomenon with a focus on knowledge understanding and instruction following, and we identify the latter as the main contributor to forgetting during fine-tuning. Consequently, we propose the Instruction Vector (IV) framework to capture model representations that are highly related to specific instruction-following capabilities, which makes it possible to understand model-intrinsic forgetting. By analyzing IV dynamics before and after training, we suggest that fine-tuning mostly adds specialized reasoning patterns rather than erasing previous skills, and that these added patterns may appear as forgetting. Building on this insight, we develop IV-guided training, which aims to preserve the original computation graph and thereby mitigate catastrophic forgetting. Empirical tests on three benchmarks confirm the efficacy of this new approach, supporting the relationship between IVs and forgetting. Our code will be made available soon.

Updated: 2024-06-24 09:29:28

标题: 通过指令向量对大型语言模型微调中的灾难性遗忘进行可解释分析

摘要: 调整大型语言模型（LLMs）可能导致它们失去其一般能力。然而，这种遗忘背后的内在机制仍未被探索。在本文中，我们通过关注知识理解和指令遵循来开始检查这种现象，后者被确认为在调整过程中遗忘的主要贡献者。因此，我们提出指令向量（IV）框架来捕捉与特定指令跟随能力密切相关的模型表示，从而使我们能够理解模型内在的遗忘。通过分析训练前后的IV动态，我们建议，调整主要添加专门的推理模式，而不是擦除先前的技能，这可能会被视为遗忘。基于这一见解，我们开发了IV引导训练，旨在保留原始计算图，从而缓解灾难性遗忘。对三个基准测试的实证测试证实了这种新方法的有效性，支持IV与遗忘之间的关系。我们的代码将很快提供。

更新时间: 2024-06-24 09:29:28

领域: cs.AI

下载: http://arxiv.org/abs/2406.12227v2

ClickPrompt: CTR Models are Strong Prompt Generators for Adapting Language Models to CTR Prediction

Click-through rate (CTR) prediction has become increasingly indispensable for various Internet applications. Traditional CTR models convert multi-field categorical data into ID features via one-hot encoding and extract the collaborative signals among features. This paradigm suffers from semantic information loss. Another line of research explores the potential of pretrained language models (PLMs) for CTR prediction by converting input data into textual sentences through hard prompt templates. Although these approaches preserve semantic signals, they generally fail to capture collaborative information (e.g., feature interactions, pure ID features), not to mention the unacceptable inference overhead brought by the huge model size. In this paper, we aim to model both semantic and collaborative knowledge for accurate CTR estimation while also addressing the inference inefficiency issue. To benefit from both worlds and close their gaps, we propose a novel model-agnostic framework, ClickPrompt, in which CTR models generate interaction-aware soft prompts for PLMs. We design a prompt-augmented masked language modeling (PA-MLM) pretraining task, in which the PLM has to recover the masked tokens based on both the language context and the soft prompts generated by the CTR model. The collaborative and semantic knowledge from ID and textual features is thereby explicitly aligned and interacts via the prompt interface. Then, we can either tune the CTR model together with the PLM for superior performance, or tune the CTR model alone without the PLM for inference efficiency. Experiments on four real-world datasets validate the effectiveness of ClickPrompt compared with existing baselines.
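A toy version of the prompt interface helps make the idea concrete. The sketch below assumes a minimal stand-in for a CTR model: one-hot ID features per field, an embedding lookup, pairwise element-wise products as feature interactions, and a linear projection to a few soft-prompt vectors sized for a PLM. All sizes and names are hypothetical and the weights are random; ClickPrompt itself is model-agnostic, and these prompts would be fed to the PLM alongside the textual input during PA-MLM pretraining.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: cardinality of each categorical field, embedding and prompt shapes.
FIELD_SIZES = [10, 5, 3]
EMB_DIM, N_PROMPTS, PLM_DIM = 8, 4, 16
N_FIELDS = len(FIELD_SIZES)
N_PAIRS = N_FIELDS * (N_FIELDS - 1) // 2

emb_tables = [rng.normal(size=(n, EMB_DIM)) for n in FIELD_SIZES]
W_prompt = 0.1 * rng.normal(size=((N_FIELDS + N_PAIRS) * EMB_DIM, N_PROMPTS * PLM_DIM))


def one_hot(ids):
    """One one-hot vector per field, as in traditional CTR pipelines."""
    return [np.eye(n)[i] for n, i in zip(FIELD_SIZES, ids)]


def soft_prompts(ids):
    # one_hot @ table is equivalent to looking up one row of the embedding table.
    embs = [oh @ table for oh, table in zip(one_hot(ids), emb_tables)]
    # Element-wise products of field pairs: a simple stand-in for feature interactions.
    pairs = [embs[i] * embs[j] for i in range(N_FIELDS) for j in range(i + 1, N_FIELDS)]
    z = np.concatenate(embs + pairs)
    # Project to N_PROMPTS vectors in the PLM's hidden size.
    return (z @ W_prompt).reshape(N_PROMPTS, PLM_DIM)
```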

Updated: 2024-06-24 09:26:42

标题: 点击提示：点击率模型是适应语言模型到点击率预测的强大提示生成器

摘要: 点击率（CTR）预测在各种互联网应用中变得越来越不可或缺。传统的CTR模型通过独热编码将多字段分类数据转换为ID特征，并提取特征之间的协同信号。这种范式存在语义信息丢失的问题。另一方面的研究探索了预训练语言模型（PLMs）在CTR预测中的潜力，通过使用硬提示模板将输入数据转换为文本句子。虽然保留了语义信号，但它们通常无法捕捉协同信息（例如，特征交互、纯ID特征），更不用说由庞大模型大小带来的无法接受的推理开销。在本文中，我们旨在为准确的CTR估计建模语义知识和协作知识，并同时解决推理效率问题。为了从两个世界中受益并弥合它们之间的差距，我们提出了一个新颖的模型无关框架（即ClickPrompt），在其中我们将CTR模型整合到PLMs中生成具有交互意识的软提示。我们设计了一个增强提示的屏蔽语言建模（PA-MLM）预训练任务，PLM必须基于语言上下文和CTR模型生成的软提示来恢复屏蔽的标记。通过提示界面，从ID和文本特征中获得的协作和语义知识将被明确对齐和相互作用。然后，我们可以使用PLM调整CTR模型以获得更优异的性能，或者仅使用CTR模型进行调整以获得推理效率。在四个真实世界数据集上的实验证实了ClickPrompt相对于现有基线的有效性。

更新时间: 2024-06-24 09:26:42

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2310.09234v4

Deep-Learning Approach for Tissue Classification using Acoustic Waves during Ablation with an Er:YAG Laser (Updated)

Today's mechanical tools for bone cutting (osteotomy) cause mechanical trauma that prolongs the healing process. Medical device manufacturers aim to minimize this trauma, and minimally invasive surgery using laser cutting is one such innovation. This method ablates tissue with laser light instead of mechanical tools, reducing post-surgery healing time. A reliable feedback system is crucial during laser surgery to prevent damage to surrounding tissues. We propose a tissue classification method that analyzes the acoustic waves generated during laser ablation, and we demonstrate its applicability in an ex-vivo experiment. Ablation with a microsecond-pulsed Er:YAG laser produces acoustic waves, which were acquired with an air-coupled transducer. These waves were used to classify five porcine tissue types: hard bone, soft bone, muscle, fat, and skin. For automated tissue classification, we compared five Neural Network (NN) approaches: a one-dimensional Convolutional Neural Network (CNN) with time-dependent input, a Fully-connected Neural Network (FcNN) with either the frequency spectrum or the principal components of the frequency spectrum as input, and a combination of a CNN and an FcNN with time-dependent data and its frequency spectrum as input. Consecutive acoustic waves were used to improve classification accuracy. Grad-CAM activation maps over the frequencies showed that low frequencies are the most important for this task. Our results indicated that combining time-dependent data with its frequency spectrum achieved the highest classification accuracy (65.5%-75.5%). We also found that the frequency spectrum alone was sufficient, with no additional benefit from applying Principal Components Analysis (PCA).
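The input representations compared above can be prepared with NumPy alone. The sketch below is an assumption about preprocessing, not the authors' pipeline: it computes the magnitude spectrum of each recorded waveform with a real FFT, optionally projects spectra onto principal components fitted with an SVD, and returns the time-domain and spectral views that a combined CNN and FcNN model would consume. Sampling rate, window length, and the number of components are left open.

```python
import numpy as np


def magnitude_spectrum(waves):
    """Magnitude of the real FFT of each waveform (one waveform per row)."""
    return np.abs(np.fft.rfft(np.asarray(waves, dtype=float), axis=-1))


def pca_fit(X, k):
    """Return the mean and the top-k principal axes (as rows) of X via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def pca_transform(X, mean, components):
    return (X - mean) @ components.T


def combined_input(waves):
    """Time-domain samples plus their spectrum: the two inputs of a two-branch model."""
    waves = np.asarray(waves, dtype=float)
    return waves, magnitude_spectrum(waves)
```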

Updated: 2024-06-24 09:25:33

标题: 使用Er:YAG激光进行消融过程中利用声波进行组织分类的深度学习方法（更新版）

摘要: 今天的骨切割（骨切术）机械工具会引起机械性创伤，延长愈合过程。医疗器械制造商的目标是最小化这种创伤，采用激光切割作为一种创新的微创手术方法。这种方法利用激光光而不是机械工具切割组织，减少术后愈合时间。在激光手术过程中，一个可靠的反馈系统至关重要，以防止对周围组织造成损伤。我们提出了一种组织分类方法，分析在激光切割过程中产生的声波，展示了其在离体实验中的适用性。微秒脉冲Er:YAG激光的切割过程产生声波，通过空气耦合传感器获取。这些声波用于对五种猪组织类型进行分类：硬骨、软骨、肌肉、脂肪和皮肤。为了实现自动化组织分类，我们比较了五种神经网络（NN）方法：一个具有时间相关输入的一维卷积神经网络（CNN）、一个全连接神经网络（FcNN）以频谱或频谱主成分作为输入、以及一个结合了CNN和FcNN的方法，其输入为时间相关数据及其频谱。连续的声波被用来提高分类准确性。Grad-Cam确定了频率的激活图，显示低频率对于这一任务最为重要。我们的结果表明，将时间相关数据与其频谱结合可以实现最高的分类准确性（65.5%-75.5%）。我们还发现，仅使用频谱就足够了，不需要应用主成分分析（PCA）来获得额外的好处。

更新时间: 2024-06-24 09:25:33

领域: physics.med-ph,cs.AI,eess.IV,q-bio.TO

下载: http://arxiv.org/abs/2406.14570v2

Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

State-of-the-art Dynamic Facial Expression Recognition (DFER) technology has made remarkable progress in deriving emotional mappings of facial expressions from video content, underpinned by training on voluminous datasets. Yet DFER datasets contain a substantial amount of noisy data. The noise arises from low-quality captures that defy logical labeling and from instances mislabeled due to annotation bias, giving rise to two principal types of uncertainty: uncertainty regarding data usability and uncertainty concerning label reliability. To address both, we have designed a two-stage framework for Seeking Certain data In extensive Uncertain data (SCIU). It aims to purge DFER datasets of these uncertainties, so that only clean, verified data is used in training. To mitigate the issue of low-quality samples, we introduce the Coarse-Grained Pruning (CGP) stage, which assesses sample weights and prunes samples deemed unusable because of their low weight. For samples with incorrect annotations, the Fine-Grained Correction (FGC) stage evaluates prediction stability to rectify mislabeled data. Moreover, SCIU is designed as a universally compatible, plug-and-play framework that integrates seamlessly with prevailing DFER methods. Rigorous experiments on prevalent DFER datasets and against numerous benchmark methods substantiate SCIU's capacity to markedly improve performance metrics.
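The two stages can be caricatured in a few lines of NumPy. This is a hypothetical sketch, not SCIU itself: it assumes per-sample weights are already available (how CGP computes them is not described here) and keeps the highest-weighted fraction, and it treats "prediction stability" as the last `min_agree` epoch predictions all agreeing with each other while disagreeing with the given label, in which case the label is replaced.

```python
import numpy as np


def coarse_prune(weights, keep_ratio=0.9):
    """Indices of the highest-weight samples (hypothetical CGP-style pruning)."""
    weights = np.asarray(weights)
    k = int(np.ceil(keep_ratio * len(weights)))
    return np.sort(np.argsort(-weights)[:k])


def fine_correct(labels, pred_history, min_agree=3):
    """pred_history: (epochs, n) predicted classes. Relabel a sample when its last
    `min_agree` predictions agree with each other but not with the given label."""
    labels = np.array(labels, copy=True)
    recent = np.asarray(pred_history)[-min_agree:]
    stable = np.all(recent == recent[0], axis=0)
    flip = stable & (recent[0] != labels)
    labels[flip] = recent[0][flip]
    return labels, flip
```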

Updated: 2024-06-24 09:25:02

标题: 在不确定性中寻找确定性：双阶段统一框架解决动态面部表情识别中的不确定性

摘要: 当代动态面部表情识别（DFER）技术的最新发展使得从视频内容中推导面部表情的情绪映射取得了显著进展，这一进展得益于在大量数据集上进行训练。然而，DFER数据集包含大量噪音数据。噪音来源于质量低下的捕捉，这些捕捉难以进行合理标记，以及由于注释偏差而导致的错误标记，产生了两种主要类型的不确定性：关于数据可用性的不确定性和关于标签可靠性的不确定性。为了解决这两种不确定性，我们精心设计了一个旨在在广泛的不确定数据中寻找确定数据（SCIU）的两阶段框架。这一举措旨在清除DFER数据集中的这些不确定性，从而确保只有干净、经过验证的数据被用于训练过程。为了减轻低质量样本的问题，我们引入了粗粒度修剪（CGP）阶段，该阶段评估样本权重并剪除那些由于低权重而被认为不可用的样本。对于带有错误注释的样本，精细纠正（FGC）阶段评估预测稳定性以纠正错误标记的数据。此外，SCIU被构想为一个普遍兼容、即插即用的框架，旨在与现有的DFER方法无缝集成。对流行的DFER数据集和多个基准方法进行的严格实验证实了SCIU显著提升性能指标的能力。

更新时间: 2024-06-24 09:25:02

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.16473v1

The Hidden Pitfalls of the Cosine Similarity Loss

We show that the gradient of the cosine similarity between two points goes to zero in two under-explored settings: (1) if a point has large magnitude or (2) if the points are on opposite ends of the latent space. Counterintuitively, we prove that optimizing the cosine similarity between points forces them to grow in magnitude. Thus, (1) is unavoidable in practice. We then observe that these derivations are extremely general -- they hold across deep learning architectures and for many of the standard self-supervised learning (SSL) loss functions. This leads us to propose cut-initialization: a simple change to network initialization that helps all studied SSL methods converge faster.
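Both settings follow from the standard gradient of cosine similarity. With $\hat{x} = x/\lVert x \rVert$, $\hat{y} = y/\lVert y \rVert$ and $c = \hat{x}^\top \hat{y}$, one has $\nabla_x \cos(x, y) = (\hat{y} - c\,\hat{x}) / \lVert x \rVert$, whose norm is $\sin\theta / \lVert x \rVert$: it vanishes when $\lVert x \rVert$ is large and when $\theta \to \pi$. The gradient is also orthogonal to $x$, so a gradient step gives $\lVert x + \eta g \rVert^2 = \lVert x \rVert^2 + \eta^2 \lVert g \rVert^2$, i.e. the norm can only grow. The NumPy check below verifies these facts numerically; it does not reproduce the paper's cut-initialization.

```python
import numpy as np


def cos_sim(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def cos_sim_grad(x, y):
    """Analytic gradient of cos(x, y) with respect to x."""
    nx = np.linalg.norm(x)
    xh, yh = x / nx, y / np.linalg.norm(y)
    return (yh - (xh @ yh) * xh) / nx


def numeric_grad(x, y, eps=1e-6):
    """Central finite differences, used only to check the analytic formula."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (cos_sim(x + e, y) - cos_sim(x - e, y)) / (2 * eps)
    return g
```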

Updated: 2024-06-24 09:16:59

标题: 余弦相似性损失的潜在陷阱

摘要: 我们展示了在两种未被深入探讨的情况下，两点之间的余弦相似度梯度趋近于零：（1）如果一个点具有较大的幅度，或者（2）如果这些点位于潜在空间的相反端。出乎意料的是，我们证明优化点之间的余弦相似度会导致它们增加幅度。因此，在实践中（1）是不可避免的。然后，我们观察到这些推导是非常普遍的——它们适用于深度学习架构和许多标准的自监督学习（SSL）损失函数。这促使我们提出切割初始化：一种简单的网络初始化改变，有助于所有研究的SSL方法更快地收敛。

更新时间: 2024-06-24 09:16:59

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.16468v1

SLOctolyzer: Fully automatic analysis toolkit for segmentation and feature extracting in scanning laser ophthalmoscopy images

Purpose: To describe SLOctolyzer: an open-source analysis toolkit for en face retinal vessels appearing in infrared reflectance scanning laser ophthalmoscopy (SLO) images. Methods: SLOctolyzer includes two main modules: segmentation and measurement. The segmentation module uses deep learning methods to delineate retinal anatomy, while the measurement module quantifies key retinal vascular features such as vessel complexity, density, tortuosity, and calibre. We evaluate the segmentation module on unseen data and measure its reproducibility. Results: SLOctolyzer's segmentation module performed well on unseen internal test data (Dice for all-vessels, 0.9097; arteries, 0.8376; veins, 0.8525; optic disc, 0.9430; fovea, 0.8837). External validation on images with severe retinal pathology showed decreased performance (Dice for arteries, 0.7180; veins, 0.7470; optic disc, 0.9032). SLOctolyzer had good reproducibility (mean difference for fractal dimension, -0.0007; vessel density, -0.0003; vessel calibre, -0.3154 $\mu$m; tortuosity density, 0.0013). SLOctolyzer can process a macula-centred SLO image in under 20 seconds and a disc-centred SLO image in under 30 seconds using a standard laptop CPU. Conclusions: To our knowledge, SLOctolyzer is the first open-source tool to convert raw SLO images into reproducible and clinically meaningful retinal vascular parameters. SLO images are captured simultaneously with optical coherence tomography (OCT), and we believe our software will be useful for extracting retinal vascular measurements from large OCT image sets and linking them to ocular or systemic diseases. It requires no specialist knowledge or proprietary software, and it allows manual correction of segmentations and re-computation of vascular metrics. SLOctolyzer is freely available at https://github.com/jaburke166/SLOctolyzer.
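For orientation, the evaluation metric and two of the vascular features mentioned above have simple textbook definitions. The NumPy sketch below computes the Dice coefficient, vessel density as the vessel-pixel fraction within a region of interest, and a box-counting estimate of fractal dimension. These are generic definitions, not SLOctolyzer's code; the toolkit's actual regions of interest, box sizes, and fractal dimension estimator may differ.

```python
import numpy as np


def dice(pred, target):
    """Dice coefficient 2 * |A and B| / (|A| + |B|) for binary masks."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    denom = pred.sum() + target.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, target).sum() / denom


def vessel_density(mask, roi=None):
    """Fraction of pixels inside the region of interest labelled as vessel."""
    mask = np.asarray(mask, bool)
    roi = np.ones_like(mask) if roi is None else np.asarray(roi, bool)
    return mask[roi].mean()


def box_counting_dimension(mask, sizes=(2, 4, 8, 16)):
    """Slope of log(number of occupied boxes) against log(1 / box size)."""
    mask = np.asarray(mask, bool)
    counts = []
    for s in sizes:
        h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return slope
```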

Updated: 2024-06-24 09:16:17

标题: SLOctolyzer：扫描激光眼底成像图像分割和特征提取的全自动分析工具包

摘要: 目的：描述SLOctolyzer：一种开源分析工具包，用于在红外反射扫描激光眼底镜（SLO）图像中显示的面对视网膜血管。方法：SLOctolyzer包括两个主要模块：分割和测量。分割模块使用深度学习方法勾画视网膜解剖结构，而测量模块量化关键的视网膜血管特征，如血管复杂性、密度、扭曲度和直径。我们使用未见数据评估分割模块，并测量其可重现性。结果：SLOctolyzer的分割模块在未见内部测试数据中表现良好（所有血管的Dice系数为0.9097；动脉为0.8376；静脉为0.8525；视盘为0.9430；黄斑为0.8837）。针对严重视网膜病理的外部验证显示性能下降（动脉的Dice系数为0.7180；静脉为0.7470；视盘为0.9032）。SLOctolyzer具有良好的可重现性（分形维度的平均差异为-0.0007；血管密度为-0.0003；血管直径为-0.3154微米；扭曲密度为0.0013）。SLOctolyzer可以在标准笔记本电脑CPU下在不到20秒内处理以黄斑为中心的SLO图像，在不到30秒内处理以视盘为中心的SLO图像。结论：据我们所知，SLOctolyzer是第一个将原始SLO图像转换为可重现且临床意义视网膜血管参数的开源工具。SLO图像与光学相干断层扫描（OCT）同时捕捉，我们相信我们的软件将有助于从大型OCT图像集中提取视网膜血管测量，并将其与眼部或全身疾病联系起来。它不需要专业知识或专有软件，并允许手动更正分割并重新计算血管指标。SLOctolyzer可在https://github.com/jaburke166/SLOctolyzer免费获取。

更新时间: 2024-06-24 09:16:17

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.16466v1

InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection

The prevalence of sarcasm in social media, conveyed through text-image combinations, presents significant challenges for sentiment analysis and intention mining. Current multi-modal sarcasm detection methods have been shown to struggle with biases from spurious cues, leading to a superficial understanding of the complex interactions between text and image. To address these issues, we propose InterCLIP-MEP, a robust framework for multi-modal sarcasm detection. InterCLIP-MEP introduces a refined variant of CLIP, Interactive CLIP (InterCLIP), as the backbone, enhancing sample representations by embedding cross-modality information in each encoder. Furthermore, a novel training strategy is designed to adapt InterCLIP for a Memory-Enhanced Predictor (MEP). MEP uses a dynamic dual-channel memory to store valuable historical knowledge of test samples and then leverages this memory as a non-parametric classifier to derive the final prediction. By using InterCLIP to encode text-image interactions more effectively and incorporating MEP, InterCLIP-MEP recognizes multi-modal sarcasm more robustly. Experiments demonstrate that InterCLIP-MEP achieves state-of-the-art performance on the MMSD2.0 benchmark. Code and data are available at: https://github.com/CoderChen01/InterCLIP-MEP
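A non-parametric, memory-based classifier of this kind can be sketched in a few lines. The design below is an assumption, not the published MEP: it keeps one fixed-size channel per class (0 = non-sarcastic, 1 = sarcastic), routes each test sample's normalized feature into the channel of the backbone's predicted class, keeps the lowest-entropy (most confident) entries when a channel is full, and predicts by average cosine similarity to each channel. Features and class probabilities would come from the backbone (InterCLIP in the paper); here they are plain NumPy arrays.

```python
import numpy as np


class DualChannelMemory:
    """Hypothetical two-channel memory used as a non-parametric classifier."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.slots = {0: [], 1: []}  # 0 = non-sarcastic, 1 = sarcastic

    def update(self, feat, probs):
        label = int(np.argmax(probs))
        entropy = float(-np.sum(probs * np.log(probs + 1e-12)))
        feat = feat / np.linalg.norm(feat)
        slot = self.slots[label]
        if len(slot) < self.capacity:
            slot.append((entropy, feat))
            return
        # Channel full: replace the least confident entry if the new one is better.
        worst = max(range(len(slot)), key=lambda i: slot[i][0])
        if entropy < slot[worst][0]:
            slot[worst] = (entropy, feat)

    def predict(self, feat):
        feat = feat / np.linalg.norm(feat)
        scores = [np.mean([f @ feat for _, f in self.slots[c]]) if self.slots[c] else -np.inf
                  for c in (0, 1)]
        return int(np.argmax(scores))
```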

Updated: 2024-06-24 09:13:42

标题: InterCLIP-MEP: 互动式CLIP和增强记忆预测器用于多模态讽刺检测

摘要: 社交媒体中讽刺的普遍存在，通过文本-图像组合传达，给情感分析和意图挖掘带来了重大挑战。当前的多模态讽刺检测方法已被证明难以应对虚假线索的偏见，导致对文本和图像之间复杂交互的表面理解。为了解决这些问题，我们提出了InterCLIP-MEP，一个用于多模态讽刺检测的强大框架。InterCLIP-MEP引入了CLIP的精细变种，即交互式CLIP(InterCLIP)，作为骨干，通过在每个编码器中嵌入跨模态信息来增强样本表示。此外，设计了一种新颖的训练策略，以适应InterCLIP以用于Memory-Enhanced Predictor(MEP)。MEP使用动态双通道内存存储有价值的测试样本的历史知识，然后利用这个内存作为非参数分类器来推导最终预测。通过更有效地使用InterCLIP来编码文本-图像交互并结合MEP，InterCLIP-MEP提供了更强大的多模态讽刺识别。实验证明，InterCLIP-MEP在MMSD2.0基准测试上实现了最先进的性能。代码和数据可在[https://github.com/CoderChen01/InterCLIP-MEP](https://github.com/CoderChen01/InterCLIP-MEP)上找到。

更新时间: 2024-06-24 09:13:42

领域: cs.CL,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.16464v1
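
The memory-as-classifier idea in the MEP description can be pictured with a minimal sketch. It assumes one bounded queue of feature vectors per class, a confidence threshold for writing test samples, and mean cosine similarity as the decision rule. The class name `DualChannelMemory` and every parameter are hypothetical and are not taken from the InterCLIP-MEP code.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class DualChannelMemory:
    """Toy memory with one bounded FIFO channel per class (0 = not sarcastic, 1 = sarcastic)."""

    def __init__(self, capacity=4, threshold=0.8):
        self.channels = {0: deque(maxlen=capacity), 1: deque(maxlen=capacity)}
        self.threshold = threshold  # assumed confidence gate for writing test samples

    def write(self, feature, label, confidence):
        # Store a test sample under its predicted label only if the prediction is confident.
        if confidence >= self.threshold:
            self.channels[label].append(feature)

    def predict(self, feature, fallback_label):
        # Non-parametric decision: the class whose stored features are most similar on average.
        scores = {
            label: sum(cosine(feature, m) for m in mem) / len(mem)
            for label, mem in self.channels.items()
            if mem
        }
        if len(scores) < 2:
            return fallback_label  # memory not populated yet; defer to the parametric head
        return max(scores, key=scores.get)


memory = DualChannelMemory(capacity=4)
memory.write([1.0, 0.0], label=1, confidence=0.9)
memory.write([0.0, 1.0], label=0, confidence=0.95)
memory.write([0.5, 0.5], label=1, confidence=0.3)  # ignored: below the confidence gate
print(memory.predict([0.9, 0.1], fallback_label=0))  # prints 1
```

Bounding each channel with `deque(maxlen=...)` keeps only the most recent confident samples, which is one simple way to make such a memory dynamic.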

MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts

Leveraging the models' outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution (OOD) samples without requiring access to the corresponding ground-truth labels. Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence, which leads to prediction bias, especially under natural shifts. In this work, we first study the relationship between logits and generalization performance from the perspective of the low-density separation assumption. Our findings motivate our proposed method MaNo, which (1) applies a data-dependent normalization to the logits to reduce prediction bias, and (2) takes the $L_p$ norm of the matrix of normalized logits as the estimation score. Our theoretical analysis highlights the connection between the provided score and the model's uncertainty. We conduct an extensive empirical study on common unsupervised accuracy estimation benchmarks and demonstrate that MaNo achieves state-of-the-art performance across various architectures in the presence of synthetic, natural, or subpopulation shifts.

Updated: 2024-06-24 09:12:08

标题: MANO：利用矩阵范数在分布偏移下进行无监督准确率估计

摘要: 利用模型的输出，特别是logits，是估计预训练神经网络在分布外（OOD）样本上测试准确率的一种常见方法，且无需访问相应的真实标签。尽管这些方法易于实现且计算效率高，但目前基于logit的方法容易受到过度自信问题的影响，导致预测偏差，在自然偏移下尤为明显。在这项工作中，我们首先从低密度分离假设的角度研究logits与泛化性能之间的关系。这些发现促使我们提出了MaNo方法，该方法（1）对logits应用依赖数据的归一化以减少预测偏差，（2）将归一化logits矩阵的$L_p$范数作为估计分数。我们的理论分析揭示了该分数与模型不确定性之间的联系。我们在常见的无监督准确率估计基准上进行了广泛的实证研究，并证明了在合成偏移、自然偏移或子群体偏移下，MaNo在各种架构上均实现了最先进的性能。

更新时间: 2024-06-24 09:12:08

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.18979v2
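
To make the score in the MaNo description concrete, here is a minimal sketch in plain Python. It assumes plain softmax as the normalization (the abstract only says the normalization is data-dependent) and an entry-wise $L_p$ norm averaged over the $N \times K$ entries; `mano_style_score` and the default `p=4` are illustrative choices. Higher scores correspond to more confident normalized outputs, which is the quantity such a score is meant to relate to accuracy.

```python
import math


def softmax(row):
    """Numerically stable softmax of one row of logits."""
    m = max(row)
    exps = [math.exp(z - m) for z in row]
    s = sum(exps)
    return [e / s for e in exps]


def mano_style_score(logits, p=4):
    """Entry-wise L_p norm of the N x K matrix of normalized logits, scaled by N*K.

    Plain softmax stands in for the paper's data-dependent normalization (assumption).
    """
    n, k = len(logits), len(logits[0])
    probs = [softmax(row) for row in logits]
    total = sum(abs(q) ** p for row in probs for q in row)
    return (total / (n * k)) ** (1.0 / p)


confident = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]
uncertain = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]
print(mano_style_score(confident) > mano_style_score(uncertain))  # True
```

For all-equal logits every normalized entry is $1/K$, so the score reduces to $1/K$; it grows as rows concentrate their mass on one class.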

Advancing Surgical VQA with Scene Graph Knowledge

The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer vision with language capabilities is emerging as a necessity. Our work aims to advance Visual Question Answering (VQA) in the surgical context with scene graph knowledge, addressing two main challenges in the current surgical VQA systems: removing question-condition bias in the surgical VQA dataset and incorporating scene-aware reasoning in the surgical VQA model design. First, we propose a Surgical Scene Graph-based dataset, SSG-QA, generated by employing segmentation and detection models on publicly available datasets. We build surgical scene graphs using spatial and action information of instruments and anatomies. These graphs are fed into a question engine, generating diverse QA pairs. Our SSG-QA dataset provides a more complex, diverse, geometrically grounded, unbiased, and surgical action-oriented dataset compared to existing surgical VQA datasets. We then propose SSG-QA-Net, a novel surgical VQA model incorporating a lightweight Scene-embedded Interaction Module (SIM), which integrates geometric scene knowledge in the VQA model design by employing cross-attention between the textual and the scene features. Our comprehensive analysis of the SSG-QA dataset shows that SSG-QA-Net outperforms existing methods across different question types and complexities. We highlight that the primary limitation in the current surgical VQA systems is the lack of scene knowledge to answer complex queries. We present a novel surgical VQA dataset and model and show that results can be significantly improved by incorporating geometric scene features in the VQA model design. The source code and the dataset will be made publicly available at: https://github.com/CAMMA-public/SSG-QA

Updated: 2024-06-24 09:07:33

标题: 用场景图知识推进外科视觉问答

摘要: 现代手术室变得越来越复杂，需要创新的术中支持系统。虽然外科数据科学的重点主要集中在视频分析上，但将外科计算机视觉与语言能力相结合已逐渐成为必要。我们的工作旨在通过场景图知识推进外科环境中的视觉问答（VQA），解决当前外科VQA系统中的两个主要挑战：消除外科VQA数据集中的问题条件偏见，并在外科VQA模型设计中融入场景感知推理。首先，我们提出了一个基于外科场景图的数据集SSG-QA，通过在公开可用数据集上应用分割和检测模型生成。我们利用器械和解剖学的空间和动作信息构建外科场景图。这些图被输入到一个问题引擎中，生成多样化的问答对。与现有外科VQA数据集相比，我们的SSG-QA数据集提供了一个更复杂、多样化、几何基础、无偏见和外科行动导向的数据集。然后，我们提出了SSG-QA-Net，一种新颖的外科VQA模型，集成了轻量级的场景嵌入交互模块（SIM），通过在文本和场景特征之间应用交叉注意力，在VQA模型设计中融入几何场景知识。我们对SSG-QA数据集的全面分析显示，SSG-QA-Net在不同问题类型和复杂度上优于现有方法。我们强调当前外科VQA系统的主要限制是缺乏场景知识以回答复杂查询。我们提出了一种新颖的外科VQA数据集和模型，并展示通过在VQA模型设计中融入几何场景特征可以显著改善结果。源代码和数据集将在以下网址公开提供：https://github.com/CAMMA-public/SSG-QA

更新时间: 2024-06-24 09:07:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2312.10251v3
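
A toy version of the question-engine step (scene graph in, QA pairs out) might look like the following. The node labels, relation names, and question templates are hypothetical; SSG-QA also uses spatial information and far richer templates.

```python
from collections import Counter

# Hypothetical toy scene graph: node id -> class label, plus (subject, relation, object) edges.
nodes = {0: "grasper", 1: "gallbladder", 2: "hook", 3: "liver"}
edges = [(0, "retracts", 1), (2, "dissects", 1)]


def generate_qa(nodes, edges):
    """Instantiate simple question templates from a scene graph (illustrative templates only)."""
    qa = []
    # Counting questions are answered directly from node labels.
    for label, n in sorted(Counter(nodes.values()).items()):
        qa.append((f"How many {label} instances are visible?", str(n)))
    # Action questions are answered from the relation edges.
    for s, rel, o in edges:
        qa.append((f"What is the {nodes[s]} doing to the {nodes[o]}?", rel))
        qa.append((f"Which instrument {rel} the {nodes[o]}?", nodes[s]))
    return qa


for question, answer in generate_qa(nodes, edges):
    print(question, "->", answer)
```

Because every answer is read off the graph rather than written by hand, the answer distribution follows the scene content, which is the property a graph-driven question engine is meant to exploit.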

VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

Large Language Models (LLMs) are trained on corpora containing large amounts of program code, which greatly improves their code comprehension and generation capabilities. However, comprehensive research on detecting program vulnerabilities, a more specific code-related task, and on evaluating the performance of LLMs in this more specialized scenario is still lacking. To address common challenges in vulnerability analysis, our study introduces a new benchmark, VulDetectBench, specifically designed to assess the vulnerability detection capabilities of LLMs. The benchmark comprehensively evaluates LLMs' ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty. We evaluate the performance of 17 models (both open- and closed-source) and find that while existing models can achieve over 80% accuracy on tasks related to vulnerability identification and classification, they still fall short on specific, more detailed vulnerability analysis tasks, with less than 30% accuracy, making it difficult to provide valuable auxiliary information for professional vulnerability mining. Our benchmark effectively evaluates the capabilities of various LLMs at different levels in the specific task of vulnerability detection, providing a foundation for future research and improvements in this critical area of code security. VulDetectBench is publicly available at https://github.com/Sweetaroo/VulDetectBench.

Updated: 2024-06-24 09:02:57

标题: VulDetectBench：利用大型语言模型评估漏洞检测的深度能力

摘要: 大型语言模型（LLMs）的训练语料库包含大量程序代码，极大地提高了模型对代码的理解和生成能力。然而，针对检测程序漏洞（一项与代码相关的更具体任务）的全面研究，以及对LLMs在这一更专业场景中性能的评估，仍然缺乏。为了解决漏洞分析中的常见挑战，我们的研究引入了一个新的基准VulDetectBench，专门用于评估LLMs的漏洞检测能力。该基准通过五个难度递增的任务，全面评估LLM识别、分类和定位漏洞的能力。我们评估了17个模型（包括开源和闭源），发现虽然现有模型在与漏洞识别和分类相关的任务上可以达到80%以上的准确率，但在特定、更详细的漏洞分析任务上仍存在不足，准确率不到30%，这使得它们很难为专业漏洞挖掘提供有价值的辅助信息。我们的基准有效评估了各种LLMs在漏洞检测这一特定任务中不同层次的能力，为代码安全这一关键领域的未来研究和改进奠定了基础。VulDetectBench可以在https://github.com/Sweetaroo/VulDetectBench上公开获取。

更新时间: 2024-06-24 09:02:57

领域: cs.CR,cs.AI,cs.SE

下载: http://arxiv.org/abs/2406.07595v3

Noise-Robust Loss Functions: Enhancing Bounded Losses for Large-Scale Noisy Data Learning

Large annotated datasets inevitably contain noisy labels, which poses a major challenge for training deep neural networks, as they easily memorize the labels. Noise-robust loss functions have emerged as a notable strategy to counteract this issue, but it remains challenging to create a robust loss function that is not susceptible to underfitting. Through a quantitative approach, this paper explores the limited overlap between the network output at initialization and regions of non-vanishing gradients of bounded loss functions in the initial learning phase. Using these insights, we address underfitting of the MAE loss with a novel method denoted as logit bias, which adds a real number $\epsilon$ to the logit at the position of the correct class. This method enables bounded losses to learn, even on datasets like WebVision, consisting of over a million images from 1000 classes. Extensive numerical experiments show that the logit bias enables MAE to compete with state-of-the-art noise-robust loss functions. In addition, we demonstrate that our method can be used to determine optimal parameters for other loss functions without having to train networks. Remarkably, our method determines the hyperparameters based on the number of classes, resulting in loss functions that require no dataset- or noise-dependent parameters.

Updated: 2024-06-24 09:02:08

标题: 噪声鲁棒损失函数：增强有界损失以用于大规模嘈杂数据学习

摘要: 大型注释数据集不可避免地包含噪声标签，这对训练深度神经网络构成了重要挑战，因为它们很容易记住标签。噪声鲁棒的损失函数已经成为应对这一问题的一个值得注意的策略，但是创建一个不易欠拟合的鲁棒损失函数仍然具有挑战性。通过定量方法，本文探讨了网络输出在初始化阶段与有界损失函数的非零梯度区域之间的有限重叠。利用这些见解，我们通过一种称为logit偏差的新方法来解决MAE损失的欠拟合问题，该方法在正确类别的位置上将一个实数$\epsilon$添加到logit中。这种方法使有界损失能够学习，即使在像WebVision这样由1000个类别中的超过一百万张图片组成的数据集上也是如此。大量的数值实验表明，logit偏差使MAE能够与最先进的噪声鲁棒损失函数竞争。此外，我们证明了我们的方法可以用于确定其他损失函数的最佳参数，而无需训练网络。值得注意的是，我们的方法根据类别数量确定超参数，从而导致损失函数不需要任何数据集或噪声相关参数。

更新时间: 2024-06-24 09:02:08

领域: cs.LG,cond-mat.dis-nn,cs.AI

下载: http://arxiv.org/abs/2306.05497v2
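
The logit-bias idea fits in a few lines. This sketch assumes MAE is computed as the L1 distance between the one-hot target and the softmax output, which equals $2(1 - p_y)$, so its gradient with respect to the correct-class logit is $-2 p_y (1 - p_y)$. With 1000 classes and uninformative logits, $p_y \approx 10^{-3}$ and the gradient nearly vanishes; adding $\epsilon$ raises $p_y$. The value $\epsilon = \ln(K-1)$ below is only an illustration that puts $p_y$ at 0.5 for all-zero logits, not the rule the paper derives.

```python
import math


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mae_with_logit_bias(logits, target, eps=0.0):
    """MAE between the one-hot target and softmax output after adding eps to the target logit."""
    z = list(logits)
    z[target] += eps  # logit bias: shift only the correct-class logit
    p = softmax(z)
    return sum(abs((1.0 if i == target else 0.0) - pi) for i, pi in enumerate(p))


def target_logit_grad(logits, target, eps=0.0):
    """d(MAE)/d(z_target); since MAE = 2 * (1 - p_y), this equals -2 * p_y * (1 - p_y)."""
    z = list(logits)
    z[target] += eps
    p_y = softmax(z)[target]
    return -2.0 * p_y * (1.0 - p_y)


k = 1000
zeros = [0.0] * k
print(target_logit_grad(zeros, 0))                       # about -0.002: almost no signal
print(target_logit_grad(zeros, 0, eps=math.log(k - 1)))  # about -0.5: p_y is pushed to 0.5
```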

Automated Privacy-Preserving Techniques via Meta-Learning

Sharing private data for learning tasks is pivotal for transparent and secure machine learning applications. Many privacy-preserving techniques have been proposed for this task, aiming to transform the data while ensuring the privacy of individuals. Some of these techniques have been incorporated into tools, whereas others are accessed through various online platforms. However, such tools require manual configuration, which can be complex and time-consuming. Moreover, they require substantial expertise, potentially restricting their use to those with advanced technical knowledge. In this paper, we propose AUTOPRIV, the first automated privacy-preservation method that eliminates the need for any manual configuration. AUTOPRIV employs meta-learning to automate the de-identification process, facilitating the secure release of data for machine learning tasks. The main goal is to anticipate the predictive performance and privacy risk of a large set of privacy configurations. We provide a ranked list of the most promising solutions, which are likely to achieve an optimal approximation within a new domain. AUTOPRIV is highly effective, as it considerably reduces computational complexity and energy consumption.

Updated: 2024-06-24 08:53:45

标题: 通过元学习实现自动隐私保护技术

摘要: 分享私人数据用于学习任务对于透明和安全的机器学习应用至关重要。许多保护隐私的技术已经被提出，旨在在确保个人隐私的同时转换数据。其中一些技术已被整合到工具中，而其他技术可以通过各种在线平台访问。然而，这些工具需要手动配置，这可能会复杂且耗时。此外，它们需要相当的专业知识，可能会限制其使用范围仅限于具有高级技术知识的人员。在本文中，我们提出了AUTOPRIV，这是第一个自动化隐私保护方法，可以消除任何手动配置的需要。AUTOPRIV利用元学习来自动化去识别过程，促进了数据安全释放用于机器学习任务。其主要目标是预测一个庞大的隐私配置集合的预测性能和隐私风险。我们提供了一个最有前途的解决方案的排名列表，这些解决方案可能在新领域内实现最佳逼近。AUTOPRIV非常有效，因为它大大降低了计算复杂性和能耗。

更新时间: 2024-06-24 08:53:45

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2406.16456v1

Guardrails for avoiding harmful medical product recommendations and off-label promotion in generative AI models

Generative AI (GenAI) models have demonstrated remarkable capabilities in a wide variety of medical tasks. However, as these models are trained using generalist datasets with very limited human oversight, they can learn uses of medical products that have not been adequately evaluated for safety and efficacy, nor approved by regulatory agencies. Given the scale at which GenAI may reach users, unvetted recommendations pose a public health risk. In this work, we propose an approach to identify potentially harmful product recommendations, and demonstrate it using a recent multimodal large language model.

Updated: 2024-06-24 08:50:26

标题: 生成式人工智能模型中避免有害医疗产品推荐和超说明书推广的防护栏

摘要: 生成式人工智能（GenAI）模型在各种医学任务中展示出卓越的能力。然而，由于这些模型是在人类监督非常有限的情况下使用通用数据集训练的，它们可能会学习到未经充分评估安全性和有效性、也未经监管机构批准的医疗产品用途。鉴于GenAI可能触及的用户规模，未经审查的建议会对公共健康构成风险。在这项工作中，我们提出了一种识别潜在有害产品推荐的方法，并使用一个最新的多模态大型语言模型进行了演示。

更新时间: 2024-06-24 08:50:26

领域: cs.AI

下载: http://arxiv.org/abs/2406.16455v1

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming

When building Large Language Models (LLMs), it is paramount to bear safety in mind and protect them with guardrails. Indeed, LLMs should never generate content promoting or normalizing harmful, illegal, or unethical behavior that may contribute to harm to individuals or society. This principle applies to both normal and adversarial use. In response, we introduce ALERT, a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy. It is designed to evaluate the safety of LLMs through red teaming methodologies and consists of more than 45k instructions categorized using our novel taxonomy. By subjecting LLMs to adversarial testing scenarios, ALERT aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models. Furthermore, the fine-grained taxonomy enables researchers to perform an in-depth evaluation that also helps one to assess the alignment with various policies. In our experiments, we extensively evaluate 10 popular open- and closed-source LLMs and demonstrate that many of them still struggle to attain reasonable levels of safety.

Updated: 2024-06-24 08:50:22

标题: ALERT：通过红队测试评估大型语言模型安全性的综合基准

摘要: 在构建大型语言模型（LLMs）时，牢记安全并用防护栏保护它们至关重要。事实上，LLMs绝不应生成宣扬或正常化有害、非法或不道德行为、可能对个人或社会造成伤害的内容。这一原则同时适用于正常使用和对抗性使用。为此，我们引入了ALERT，一个基于新颖细粒度风险分类法评估安全性的大规模基准。它旨在通过红队方法评估LLMs的安全性，包含超过45,000条按照我们的新分类法分类的指令。通过将LLMs置于对抗测试场景中，ALERT旨在识别漏洞、指导改进，并增强语言模型的整体安全性。此外，细粒度分类法使研究人员能够进行深入评估，也有助于评估模型与各种政策的一致性。在我们的实验中，我们对10种流行的开源和闭源LLMs进行了广泛评估，并证明其中许多仍然难以达到合理的安全水平。

更新时间: 2024-06-24 08:50:22

领域: cs.CL,cs.CY,cs.LG,I.2

下载: http://arxiv.org/abs/2404.08676v3
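
The fine-grained taxonomy is what makes per-category reporting possible. A minimal sketch, assuming each response has already been judged safe or unsafe and that a category's score is its fraction of safe responses; the category names are placeholders.

```python
from collections import defaultdict


def safety_scores(results):
    """results: iterable of (category, is_safe) pairs.

    Returns the per-category fraction of safe responses and the overall fraction.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [safe, total]
    for category, is_safe in results:
        counts[category][1] += 1
        if is_safe:
            counts[category][0] += 1
    n_total = sum(total for _, total in counts.values())
    if n_total == 0:
        raise ValueError("no judged responses")
    per_category = {c: safe / total for c, (safe, total) in counts.items()}
    overall = sum(safe for safe, _ in counts.values()) / n_total
    return per_category, overall


# Hypothetical judged responses; the category names are placeholders, not ALERT's taxonomy labels.
judged = [("hate", True), ("hate", False), ("weapons", True)]
per_category, overall = safety_scores(judged)
print(per_category, round(overall, 3))  # {'hate': 0.5, 'weapons': 1.0} 0.667
```

Reporting per-category rates next to the overall rate shows where a model is weak even when its aggregate score looks acceptable.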

Prompting with Divide-and-Conquer Program Makes Large Language Models Discerning to Hallucination and Deception

Foundation models, such as Large Language Models (LLMs), have attracted a significant amount of interest due to their large number of applications. However, when handling tasks involving repetitive sub-tasks and/or deceptive content, such as arithmetic calculation and article-level fake news detection, simple instructional prompts yield inaccurate responses. Existing works show that more complicated prompting strategies, such as Chain-of-Thought and Least-to-Most, can unlock LLMs' powerful capacity in diverse areas. Recent research reveals that a simple divide-and-conquer prompting strategy, i.e., simply dividing the input sequence into multiple sub-inputs, can also substantially improve LLMs' performance in some specific tasks such as misinformation detection. In this paper, we examine the utility of the divide-and-conquer (DaC) prompting strategy and identify the kinds of tasks on which this strategy is advantageous. Specifically, we provide a theoretical analysis of DaC prompting that identifies the specific tasks where it can bring a performance boost with a theoretical guarantee. We then present two cases (large integer arithmetic and fact verification) where the experimental results align with our theoretical analysis.

Updated: 2024-06-24 08:49:29

标题: 使用分而治之程序提示使大型语言模型能够识别幻觉和欺骗

摘要: 基础模型，如大型语言模型（LLMs），由于其广泛的应用而引起了大量关注。然而，在处理涉及重复子任务和/或欺骗性内容的任务时，例如算术计算和文章级虚假新闻检测，简单的指令式提示会产生不准确的响应。现有研究表明，更复杂的提示策略，如思维链和由少到多，可以释放LLM在各个领域的强大能力。最近的研究表明，简单的分而治之提示策略，即简单地将输入序列分成多个子输入，也可以大大提高LLM在某些特定任务（如虚假信息检测）中的表现。本文旨在检验分而治之（DaC）提示策略的实用性，并回答这种策略在哪类任务中具有优势。具体而言，我们对DaC提示策略进行了理论分析，从而确定DaC提示能够在理论保证下带来性能提升的具体任务。然后，我们给出了两个案例（大整数算术和事实验证），其实验结果与我们的理论分析一致。

更新时间: 2024-06-24 08:49:29

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.05359v5
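
Large integer multiplication is a convenient way to see why dividing the input helps: each sub-task is short, and the merge step is exact. In this sketch `solve_subtask` is a hypothetical stand-in for one sub-prompt to an LLM, computed here with ordinary integer arithmetic, and the chunk width is arbitrary.

```python
def solve_subtask(a, chunk):
    """Stand-in for one sub-prompt (hypothetical): multiply a by a short chunk of digits."""
    return a * chunk


def dac_multiply(a, b, chunk_digits=3):
    """Divide b into fixed-width digit chunks, solve each sub-task, merge with place-value shifts."""
    if b < 0:
        raise ValueError("this sketch assumes a non-negative multiplier")
    digits = str(b)
    total = 0
    # Walk chunks from the least significant end; chunk i is shifted by 10 ** (chunk_digits * i).
    for i, end in enumerate(range(len(digits), 0, -chunk_digits)):
        start = max(0, end - chunk_digits)
        chunk = int(digits[start:end])
        total += solve_subtask(a, chunk) * 10 ** (chunk_digits * i)
    return total


print(dac_multiply(987654321, 123456789) == 987654321 * 123456789)  # True
```

The merge is deterministic, so any error in the final answer can only come from a sub-task, which is the kind of decomposition where checking short pieces is easier than checking one long output.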

Node-like as a Whole: Structure-aware Searching and Coarsening for Graph Classification

Graph Transformers (GTs) have made remarkable achievements in graph-level tasks. However, most existing works regard graph structures as a form of guidance or bias for enhancing node representations, which focuses on node-central perspectives and lacks explicit representations of edges and structures. One natural question is, can we treat graph structures node-like as a whole to learn high-level features? Through experimental analysis, we explore the feasibility of this assumption. Based on our findings, we propose a novel multi-view graph representation learning model via structure-aware searching and coarsening (GRLsc) on GT architecture for graph classification. Specifically, we build three unique views, original, coarsening, and conversion, to learn a thorough structural representation. We compress loops and cliques via hierarchical heuristic graph coarsening and restrict them with well-designed constraints, which builds the coarsening view to learn high-level interactions between structures. We also introduce line graphs for edge embeddings and switch to edge-central perspective to construct the conversion view. Experiments on eight real-world datasets demonstrate the improvements of GRLsc over 28 baselines from various architectures.

Updated: 2024-06-24 08:45:52

标题: 整体类似节点：结构感知搜索和粗化用于图分类

摘要: 图变换器（GTs）在图级任务中取得了显著的成就。然而，大多数现有的工作将图结构视为增强节点表示的一种指导或偏置，这种做法侧重于以节点为中心的视角，缺乏对边和结构的显式表示。一个自然的问题是，我们是否可以将图结构像节点一样作为整体来处理，以学习高级特征？通过实验分析，我们探讨了这一假设的可行性。基于我们的发现，我们在GT架构上提出了一种新颖的、基于结构感知搜索和粗化的多视图图表示学习模型（GRLsc），用于图分类。具体地，我们构建了三个独特的视图，即原始视图、粗化视图和转换视图，以学习全面的结构表示。我们通过层次化启发式图粗化压缩环和团，并用精心设计的约束对其加以限制，由此构建粗化视图以学习结构之间的高级交互。我们还引入了线图用于边嵌入，并切换到以边为中心的视角来构建转换视图。在八个真实世界数据集上的实验表明，GRLsc相较于来自各种架构的28个基线均有提升。

更新时间: 2024-06-24 08:45:52

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2404.11869v2
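
The conversion view relies on line graphs, in which every edge of the original graph becomes a node and two such nodes are linked when the original edges share an endpoint. A minimal sketch for simple undirected graphs (no self-loops or parallel edges assumed):

```python
from itertools import combinations


def line_graph(edges):
    """Line graph of a simple undirected graph.

    Returns (canonical edge list, line-graph edges), where line-graph node i is edges[i].
    """
    edges = [tuple(sorted(e)) for e in edges]
    incident = {}  # original vertex -> indices of edges touching it
    for idx, (u, v) in enumerate(edges):
        incident.setdefault(u, []).append(idx)
        incident.setdefault(v, []).append(idx)
    lg_edges = set()
    for idxs in incident.values():
        # Every pair of edges meeting at this vertex becomes a line-graph edge.
        for i, j in combinations(idxs, 2):
            lg_edges.add((i, j))
    return edges, sorted(lg_edges)


path_edges, path_lg = line_graph([(0, 1), (1, 2), (2, 3)])
print(path_lg)  # [(0, 1), (1, 2)]
```

Edge features of the original graph become node features of the line graph, so a standard node-level encoder can then operate from an edge-centric perspective.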

Learning in Wilson-Cowan model for metapopulation

The Wilson-Cowan model for metapopulation, a Neural Mass Network Model, treats different subcortical regions of the brain as connected nodes, with connections representing various types of structural, functional, or effective neuronal connectivity between these regions. Each region comprises interacting populations of excitatory and inhibitory cells, consistent with the standard Wilson-Cowan model. By incorporating stable attractors into such a metapopulation model's dynamics, we transform it into a learning algorithm capable of achieving high image and text classification accuracy. We test it on MNIST and Fashion MNIST; in combination with convolutional neural networks, on CIFAR-10 and TF-FLOWERS; and, in combination with a transformer architecture (BERT), on IMDB, always showing high classification accuracy. These numerical evaluations illustrate that minimal modifications to the Wilson-Cowan model for metapopulation can reveal unique and previously unobserved dynamics.

Updated: 2024-06-24 08:45:03

标题: 元种群Wilson-Cowan模型中的学习

摘要: 元种群Wilson-Cowan模型是一种神经质量网络模型（Neural Mass Network Model），它将大脑的不同皮层下区域视为相连的节点，连接表示这些区域之间各种类型的结构、功能或有效神经连接。与标准Wilson-Cowan模型一致，每个区域由相互作用的兴奋性和抑制性细胞群体组成。通过将稳定吸引子纳入这种元种群模型的动力学中，我们将其转变为一种能够实现高图像和文本分类准确率的学习算法。我们在MNIST和Fashion MNIST上对其进行了测试，在CIFAR-10和TF-FLOWERS上将其与卷积神经网络结合测试，并在IMDB上将其与Transformer架构（BERT）结合测试，均表现出较高的分类准确率。这些数值评估表明，对元种群Wilson-Cowan模型进行最小修改即可揭示独特且以前未观察到的动力学。

更新时间: 2024-06-24 08:45:03

领域: q-bio.NC,cond-mat.dis-nn,cond-mat.stat-mech,cs.AI,cs.NE

下载: http://arxiv.org/abs/2406.16453v1
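
For readers unfamiliar with the underlying dynamics, this sketch integrates a simplified textbook form of the Wilson-Cowan equations on a small network, $\tau_E \dot{E}_i = -E_i + S(w_{EE} E_i - w_{EI} I_i + g \sum_j C_{ij} E_j + P_i - \theta)$ and $\tau_I \dot{I}_i = -I_i + S(w_{IE} E_i - w_{II} I_i - \theta)$, with nodes coupled through their excitatory populations via a connectivity matrix `C`. It omits the refractory term and does not include the stable-attractor learning mechanism the authors add; all parameter values are placeholders.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_metapopulation(C, P, steps=2000, dt=0.01, tau_e=1.0, tau_i=2.0,
                            w_ee=12.0, w_ei=10.0, w_ie=10.0, w_ii=2.0,
                            g=1.0, theta=3.0):
    """Forward-Euler integration of coupled Wilson-Cowan nodes (all parameters illustrative).

    C[i][j]: connection strength from node j to node i; P[i]: external drive to node i.
    """
    n = len(C)
    E = [0.1] * n  # excitatory activity per node
    I = [0.1] * n  # inhibitory activity per node
    for _ in range(steps):
        new_E, new_I = [], []
        for i in range(n):
            coupling = g * sum(C[i][j] * E[j] for j in range(n))
            drive_e = w_ee * E[i] - w_ei * I[i] + coupling + P[i] - theta
            drive_i = w_ie * E[i] - w_ii * I[i] - theta
            new_E.append(E[i] + dt / tau_e * (-E[i] + sigmoid(drive_e)))
            new_I.append(I[i] + dt / tau_i * (-I[i] + sigmoid(drive_i)))
        E, I = new_E, new_I
    return E, I


C = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]]
E, I = simulate_metapopulation(C, P=[1.0, 0.0, 0.0])
print([round(e, 3) for e in E])
```

Because each Euler step is a convex combination of the current activity and a sigmoid output whenever `dt / tau` is at most 1, activities stay in (0, 1), matching their interpretation as population firing fractions.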

Theory on Mixture-of-Experts in Continual Learning

Continual learning (CL) has garnered significant attention because of its ability to adapt to new tasks that arrive over time. Catastrophic forgetting (of old tasks) has been identified as a major issue in CL, as the model adapts to new tasks. The Mixture-of-Experts (MoE) model has recently been shown to effectively mitigate catastrophic forgetting in CL, by employing a gating network to sparsify and distribute diverse tasks among multiple experts. However, there is a lack of theoretical analysis of MoE and its impact on the learning performance in CL. This paper provides the first theoretical results to characterize the impact of MoE in CL via the lens of overparameterized linear regression tasks. We establish the benefit of MoE over a single expert by proving that the MoE model can diversify its experts to specialize in different tasks, while its router learns to select the right expert for each task and balance the loads across all experts. Our study further suggests an intriguing fact that the MoE in CL needs to terminate the update of the gating network after sufficient training rounds to attain system convergence, which is not needed in the existing MoE studies that do not consider the continual task arrival. Furthermore, we provide explicit expressions for the expected forgetting and overall generalization error to characterize the benefit of MoE in the learning performance in CL. Interestingly, adding more experts requires additional rounds before convergence, which may not enhance the learning performance. Finally, we conduct experiments on both synthetic and real datasets to extend these insights from linear models to deep neural networks (DNNs), which also shed light on the practical algorithm design for MoE in CL.

Updated: 2024-06-24 08:29:58

标题: 持续学习中的混合专家理论

摘要: 持续学习（CL）因其能够适应随时间到来的新任务而引起了广泛关注。随着模型适应新任务，（对旧任务的）灾难性遗忘被认为是CL中的一个主要问题。最近已经显示出混合专家（MoE）模型可以通过利用门控网络将多样化任务稀疏分布在多个专家之间有效地减轻CL中的灾难性遗忘。然而，对MoE及其对CL中学习性能的影响缺乏理论分析。本文通过过参数化线性回归任务的角度提供了首个理论结果，以表征MoE在CL中的影响。我们通过证明MoE模型能够使其专家多样化以专门从事不同任务，同时其路由器学习选择每个任务的正确专家并在所有专家之间平衡负载，以证明MoE相对于单个专家的好处。我们的研究进一步表明，CL中的MoE需要在经过足够的训练轮数后终止对门控网络的更新，以实现系统收敛，而这在不考虑持续任务到达的现有MoE研究中是不需要的。此外，我们提供了期望遗忘和总体泛化误差的明确表达式，以表征MoE在CL中学习性能的好处。有趣的是，增加更多专家需要额外的轮数才能收敛，这可能不会提高学习性能。最后，我们在合成和真实数据集上进行实验，将这些见解从线性模型扩展到深度神经网络（DNNs），这也为MoE在CL中的实际算法设计提供了启示。

更新时间: 2024-06-24 08:29:58

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.16437v1
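
The finding that the gating network should stop updating after enough rounds can be illustrated with a toy top-1 router. The perceptron-style update, the `freeze_after` round, and the idea that the correct expert for each task is supplied (for example, the expert with the lowest loss) are all assumptions for illustration, not the paper's training rule.

```python
class Top1Router:
    """Toy linear gate that routes each task to one expert and stops learning after a fixed round."""

    def __init__(self, num_experts, dim, lr=0.1, freeze_after=50):
        self.w = [[0.0] * dim for _ in range(num_experts)]  # one gating vector per expert
        self.lr = lr
        self.freeze_after = freeze_after  # hypothetical termination round for gate updates

    def route(self, x):
        scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in self.w]
        return max(range(len(scores)), key=scores.__getitem__)  # ties go to the lowest index

    def update(self, x, target_expert, round_idx):
        """Perceptron-style correction toward target_expert; returns False once the gate is frozen."""
        if round_idx >= self.freeze_after:
            return False
        chosen = self.route(x)
        if chosen != target_expert:
            for d, xd in enumerate(x):
                self.w[target_expert][d] += self.lr * xd
                self.w[chosen][d] -= self.lr * xd
        return True


router = Top1Router(num_experts=3, dim=2, freeze_after=5)
task = [1.0, 0.0]
router.update(task, target_expert=2, round_idx=0)
print(router.route(task))                                  # 2
print(router.update(task, target_expert=1, round_idx=10))  # False: gate is frozen
print(router.route(task))                                  # still 2
```

Freezing the gate fixes the task-to-expert assignment, so later tasks can no longer reshuffle which expert serves earlier ones.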

Finding (and exploiting) vulnerabilities on IP Cameras: the Tenda CP3 case study

Consumer IP cameras are now the most widely adopted solution for remote monitoring in various contexts, such as private homes or small offices. While the security of these devices has been scrutinized, most approaches are limited to relatively shallow network-based analyses. In this paper, we discuss a methodology for the security analysis and identification of remotely exploitable vulnerabilities in IP cameras, which includes static and dynamic analyses of executables extracted from IP camera firmware. Compared to existing methodologies, our approach leverages the context of the target device to focus on the identification of malicious invocation sequences that could lead to exploitable vulnerabilities. We demonstrate the application of our methodology by using the Tenda CP3 IP camera as a case study. We identified five novel CVEs, with CVSS scores ranging from 7.5 to 9.8. To partially automate our analysis, we also developed a custom tool based on Ghidra and rhabdomancer.

Updated: 2024-06-24 08:29:05

标题: 发现（和利用）IP摄像头的漏洞：Tenda CP3案例研究

摘要: 消费者IP摄像头现在是远程监控在各种情境中被广泛采用的解决方案，如私人住宅或小型办公室。虽然这些设备的安全性受到了审查，但大多数方法都局限于相对较浅的基于网络的分析。在本文中，我们讨论了一种用于安全分析和识别IP摄像头中可远程利用漏洞的方法论，其中包括从IP摄像头固件中提取的可执行文件的静态和动态分析。与现有方法论相比，我们的方法利用目标设备的上下文来集中于识别可能导致可利用漏洞的恶意调用序列。我们通过使用Tenda CP3 IP摄像头作为案例研究来展示我们方法的应用。我们确定了五个新的CVE，CVSS评分范围从7.5到9.8不等。为了部分自动化我们的分析，我们还开发了一个基于Ghidra和rhabdomancer的定制工具。

更新时间: 2024-06-24 08:29:05

领域: cs.CR

下载: http://arxiv.org/abs/2406.15103v2
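
The static-analysis step of looking for call sites of potentially dangerous functions, which rhabdomancer automates inside Ghidra, reduces to filtering a call list against a watch-list. This sketch does not use the Ghidra API; it assumes call sites were already exported as tuples, and the function names, addresses, and watch-list are hypothetical.

```python
# Hypothetical watch-list; a real audit would use a curated, tiered list of risky C functions.
RISKY = {
    "system": "command execution",
    "popen": "command execution",
    "strcpy": "unbounded copy",
    "sprintf": "unbounded format",
}


def flag_call_sites(call_sites):
    """call_sites: (caller_function, address, callee_name) tuples, e.g. exported from a disassembler.

    Returns the sites that call a watch-listed function, sorted by address, for manual review.
    """
    hits = [
        (addr, caller, callee, RISKY[callee])
        for caller, addr, callee in call_sites
        if callee in RISKY
    ]
    return sorted(hits)


sites = [
    ("http_set_wifi", 0x4012A0, "strcpy"),
    ("http_set_wifi", 0x4012C8, "strlen"),
    ("ntp_update", 0x402F10, "system"),
]
for addr, caller, callee, reason in flag_call_sites(sites):
    print(hex(addr), caller, callee, reason)
```

A flagged site is only a starting point: deciding whether it is exploitable still requires tracing whether attacker-controlled input reaches it, which is where the device context described above comes in.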

TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs

Benchmarking is the de-facto standard for evaluating LLMs, due to its speed, replicability and low cost. However, recent work has pointed out that the majority of the open source benchmarks available today have been contaminated or leaked into LLMs, meaning that LLMs have access to test data during pretraining and/or fine-tuning. This raises serious concerns about the validity of benchmarking studies conducted so far and the future of evaluation using benchmarks. To solve this problem, we propose Private Benchmarking, a solution where test datasets are kept private and models are evaluated without revealing the test data to the model. We describe various scenarios (depending on the trust placed on model owners or dataset owners), and present solutions to avoid data contamination using private benchmarking. For scenarios where the model weights need to be kept private, we describe solutions from confidential computing and cryptography that can aid in private benchmarking. We build an end-to-end system, TRUCE, that enables such private benchmarking showing that the overheads introduced to protect models and benchmark are negligible (in the case of confidential computing) and tractable (when cryptographic security is required). Finally, we also discuss solutions to the problem of benchmark dataset auditing, to ensure that private benchmarks are of sufficiently high quality.

Updated: 2024-06-24 08:28:18

标题: TRUCE：通过私有基准测试防止数据污染并改进LLM的比较评估

摘要: 基准测试因其速度快、可复现且成本低，已成为评估LLMs的事实标准。然而，最近的研究指出，当今大多数开源基准测试已被污染或泄漏到LLMs中，这意味着LLMs在预训练和/或微调过程中可以接触到测试数据。这引发了对迄今为止进行的基准研究的有效性以及未来使用基准进行评估的严重关切。为了解决这个问题，我们提出了私有基准测试，这是一种测试数据集保持私有、且评估时不向模型透露测试数据的解决方案。我们描述了各种情景（取决于对模型所有者或数据集所有者的信任程度），并提出了利用私有基准测试避免数据污染的解决方案。对于需要保持模型权重私有的情况，我们描述了来自机密计算和密码学的、可帮助进行私有基准测试的解决方案。我们构建了一个端到端系统TRUCE来实现这种私有基准测试，结果表明为保护模型和基准所引入的开销可以忽略不计（在机密计算的情况下），或是可控的（当需要密码学安全时）。最后，我们还讨论了基准数据集审计的解决方案，以确保私有基准具有足够高的质量。

更新时间: 2024-06-24 08:28:18

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2403.00393v2

Towards Bayesian Data Selection

A wide range of machine learning algorithms iteratively add data to the training sample. Examples include semi-supervised learning, active learning, multi-armed bandits, and Bayesian optimization. We embed this kind of data addition into decision theory by framing data selection as a decision problem. This paves the way for finding Bayes-optimal selections of data. For the illustrative case of self-training in semi-supervised learning, we derive the respective Bayes criterion. We further show that deploying this criterion mitigates the issue of confirmation bias by empirically assessing our method for generalized linear models, semi-parametric generalized additive models, and Bayesian neural networks on simulated and real-world data.

Updated: 2024-06-24 08:27:13

标题: 走向贝叶斯数据选择

摘要: 许多机器学习算法会迭代地将数据添加到训练样本中，例如半监督学习、主动学习、多臂老虎机和贝叶斯优化。我们将数据选择视为一个决策问题，从而将这种数据添加嵌入到决策理论中。这为寻找贝叶斯最优的数据选择铺平了道路。以半监督学习中的自训练为例，我们推导出了相应的贝叶斯准则。我们进一步在模拟数据和真实数据上，针对广义线性模型、半参数广义加性模型和贝叶斯神经网络对该方法进行了实证评估，表明部署这一准则可以缓解确认偏差问题。

更新时间: 2024-06-24 08:27:13

领域: stat.ML,cs.AI,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2406.12560v2
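
Framing data selection as a decision problem means choosing, among candidate additions, the one with the highest posterior expected utility. The sketch below is a generic Monte Carlo version for a logistic model, with the log-likelihood of the pseudo-label as the utility; it is not the Bayes criterion derived in the paper, and the posterior draws are hypothetical.

```python
import math


def log_sigmoid(t):
    """Numerically stable log(1 / (1 + exp(-t)))."""
    return -math.log1p(math.exp(-t)) if t >= 0 else t - math.log1p(math.exp(t))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def expected_utility(x, y, posterior_samples):
    """Monte Carlo estimate of E_theta[log p(y | x, theta)] for a logistic model."""
    logs = [log_sigmoid(dot(w, x)) if y == 1 else log_sigmoid(-dot(w, x))
            for w in posterior_samples]
    return sum(logs) / len(logs)


def select_pseudo_label(candidates, posterior_samples):
    """Return the (x, pseudo_label) pair with the highest expected utility."""
    best = None
    for x in candidates:
        # Pseudo-label with the posterior-averaged predictive probability.
        p1 = sum(math.exp(log_sigmoid(dot(w, x))) for w in posterior_samples) / len(posterior_samples)
        y = 1 if p1 >= 0.5 else 0
        u = expected_utility(x, y, posterior_samples)
        if best is None or u > best[0]:
            best = (u, x, y)
    return best[1], best[2]


samples = [[1.0], [2.0]]  # hypothetical posterior draws of a 1-D weight
print(select_pseudo_label([[0.1], [3.0]], samples))  # ([3.0], 1)
```

Averaging over posterior draws, rather than using a single point estimate, is what lets parameter uncertainty enter the selection; the paper's contribution is deriving which utility makes this choice Bayes-optimal.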

On-Device Soft Sensors: Real-Time Fluid Flow Estimation from Level Sensor Data

Soft sensors are crucial in bridging autonomous systems' physical and digital realms, enhancing sensor fusion and perception. Instead of deploying soft sensors on the Cloud, this study shifts towards employing on-device soft sensors, promising heightened efficiency and bolstering data security. Our approach substantially improves energy efficiency by deploying Artificial Intelligence (AI) directly on devices within a wireless sensor network. Furthermore, the synergistic integration of the Microcontroller Unit and Field-Programmable Gate Array (FPGA) leverages the rapid AI inference capabilities of the latter. Empirical evidence from our real-world use case demonstrates that FPGA-based soft sensors achieve inference times of only 1.04 to 12.04 microseconds. These results highlight the considerable potential of our approach for executing real-time inference tasks efficiently, thereby presenting a feasible alternative that effectively addresses the latency challenges intrinsic to Cloud-based deployments.

Updated: 2024-06-24 08:25:50

标题: 设备上的软传感器：基于液位传感器数据的实时流体流动估计

摘要: 软传感器在连接自主系统的物理和数字领域、增强传感器融合和感知方面起着至关重要的作用。本研究不将软传感器部署在云端，而是转向采用设备端软传感器，有望提高效率并增强数据安全性。我们的方法通过在无线传感器网络内的设备上直接部署人工智能（AI），显著提高了能源效率。此外，微控制器单元与现场可编程门阵列（FPGA）的协同集成利用了后者的快速AI推理能力。我们在真实用例中的实证证据表明，基于FPGA的软传感器的推理时间仅为1.04至12.04微秒。这些结果突显了我们的方法在高效执行实时推理任务方面的巨大潜力，从而提供了一个有效应对云端部署固有延迟挑战的可行替代方案。

更新时间: 2024-06-24 08:25:50

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2311.15036v2

Surgical Triplet Recognition via Diffusion Model

Surgical triplet recognition is an essential building block to enable next-generation context-aware operating rooms. The goal is to identify the combinations of instruments, verbs, and targets presented in surgical video frames. In this paper, we propose DiffTriplet, a new generative framework for surgical triplet recognition employing the diffusion model, which predicts surgical triplets via iterative denoising. To handle the challenge of triplet association, two unique designs are proposed in our diffusion framework, i.e., association learning and association guidance. During training, we optimize the model in the joint space of triplets and individual components to capture the dependencies among them. At inference, we integrate association constraints into each update of the iterative denoising process, which refines the triplet prediction using the information of individual components. Experiments on the CholecT45 and CholecT50 datasets show the superiority of the proposed method in achieving a new state-of-the-art performance for surgical triplet recognition. Our code will be released.

Updated: 2024-06-24 08:22:40

标题: 基于扩散模型的手术三元组识别

摘要: 手术三元组识别是实现下一代具有上下文感知的手术室的重要基础。其目标是识别手术视频帧中呈现的器械、动词和目标的组合。在本文中，我们提出了DiffTriplet，这是一个利用扩散模型进行手术三元组识别的新的生成框架，通过迭代去噪来预测手术三元组。为了处理三元组关联的挑战，在我们的扩散框架中提出了两个独特的设计，即关联学习和关联指导。在训练过程中，我们在三元组与各个组件的联合空间中优化模型，以捕捉它们之间的依赖关系。在推断中，我们将关联约束集成到迭代去噪过程的每次更新中，利用各个组件的信息来完善三元组预测。对CholecT45和CholecT50数据集的实验表明，所提出的方法在实现手术三元组识别的新最先进性能方面具有优越性。我们的代码将会发布。

更新时间: 2024-06-24 08:22:40

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.13210v2

Dynamic Pseudo Label Optimization in Point-Supervised Nuclei Segmentation

Deep learning has achieved impressive results in nuclei segmentation, but the massive requirement for pixel-wise labels remains a significant challenge. To alleviate the annotation burden, existing methods generate pseudo masks for model training using point labels. However, the generated masks are inevitably different from the ground truth, and these dissimilarities are not handled reasonably during the network training, resulting in the subpar performance of the segmentation model. To tackle this issue, we propose a framework named DoNuSeg, enabling Dynamic pseudo label Optimization in point-supervised Nuclei Segmentation. Specifically, DoNuSeg takes advantage of class activation maps (CAMs) to adaptively capture regions with semantics similar to annotated points. To leverage semantic diversity in the hierarchical feature levels, we design a dynamic selection module to choose the optimal one among CAMs from different encoder blocks as pseudo masks. Meanwhile, a CAM-guided contrastive module is proposed to further enhance the accuracy of pseudo masks. In addition to exploiting the semantic information provided by CAMs, we consider location priors inherent to point labels, developing a task-decoupled structure for effectively differentiating nuclei. Extensive experiments demonstrate that DoNuSeg outperforms state-of-the-art point-supervised methods. The code is available at https://github.com/shinning0821/MICCAI24-DoNuSeg.

Updated: 2024-06-24 08:20:53

标题: 动态伪标签优化在基于点监督的细胞核分割中的应用

摘要: 深度学习在细胞核分割方面取得了令人印象深刻的成果，但对像素级标签的巨大需求仍然是一个重要挑战。为了减轻标注负担，现有方法使用点标签生成伪掩模进行模型训练。然而，生成的掩模与真实标注不可避免地有所不同，而这些差异在网络训练过程中并未得到合理处理，导致分割模型性能不佳。为了解决这个问题，我们提出了一个名为DoNuSeg的框架，实现点监督细胞核分割中的动态伪标签优化。具体来说，DoNuSeg利用类激活图（CAMs）来自适应地捕获与标注点语义相似的区域。为了利用特征层次结构中的语义多样性，我们设计了一个动态选择模块，从不同编码器块的CAM中选择最佳的一个作为伪掩模。同时，提出了一个CAM引导的对比模块，进一步提高伪掩模的准确性。除了利用CAM提供的语义信息外，我们考虑到点标签固有的位置先验，开发了一种有效区分细胞核的任务解耦结构。大量实验证明，DoNuSeg优于最先进的点监督方法。代码可在https://github.com/shinning0821/MICCAI24-DoNuSeg获取。

更新时间: 2024-06-24 08:20:53

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.16427v1
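
The dynamic selection module picks one CAM among several encoder blocks. The abstract does not state the selection criterion, so this sketch assumes one: prefer the CAM whose activation at the annotated points most exceeds its image-wide mean, then binarize it into a pseudo mask with an arbitrary threshold. Both function names are hypothetical.

```python
def select_cam(cams, points):
    """Pick one CAM from several encoder blocks (hypothetical criterion).

    cams: list of H x W grids with values in [0, 1]; points: annotated (row, col) nucleus locations.
    Score = mean activation at annotated points minus the CAM's image-wide mean.
    """
    best_idx, best_score = None, float("-inf")
    for idx, cam in enumerate(cams):
        flat = [v for row in cam for v in row]
        global_mean = sum(flat) / len(flat)
        point_mean = sum(cam[r][c] for r, c in points) / len(points)
        score = point_mean - global_mean
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


def to_pseudo_mask(cam, threshold=0.5):
    """Binarize a CAM into a pseudo mask."""
    return [[1 if v >= threshold else 0 for v in row] for row in cam]


flat_cam = [[0.5, 0.5], [0.5, 0.5]]    # uninformative block
peaked_cam = [[0.9, 0.2], [0.1, 0.1]]  # responds at the annotated point
chosen = select_cam([flat_cam, peaked_cam], points=[(0, 0)])
print(chosen, to_pseudo_mask([flat_cam, peaked_cam][chosen]))  # 1 [[1, 0], [0, 0]]
```

Subtracting the global mean penalizes CAMs that are uniformly bright, so a block is preferred only when its activation is concentrated where nuclei were actually annotated.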

Fault Detection for agents on power grid topology optimization: A Comprehensive analysis

The topology optimization of transmission networks using Deep Reinforcement Learning (DRL) has increasingly come into focus. Various researchers have proposed different DRL agents, which are often benchmarked on the Grid2Op environment from the Learning to Run a Power Network (L2RPN) challenges. These environments offer many advantages, including realistic chronics and underlying power flow backends. However, the interpretation of agent survival or failure is not always clear, as there are a variety of potential causes. In this work, we focus on the failures of the power grid to identify patterns and detect them a priori. We collect the failed chronics of three different agents on the WCCI 2022 L2RPN environment, totaling about 40k data points. By clustering, we are able to detect five distinct clusters, identifying different failure types. Further, we propose a multi-class prediction approach to detect failures beforehand and evaluate five different models. Here, the Light Gradient-Boosting Machine (LightGBM) shows the best performance, with an accuracy of 86%. It also correctly identifies failure and survival observations 91% of the time. Finally, we provide a detailed feature importance analysis that identifies critical features and regions in the grid.

Updated: 2024-06-24 08:20:43

标题: 电网拓扑优化中的代理故障检测：综合分析

摘要: 使用深度强化学习（DRL）对输电网络进行拓扑优化越来越受到关注。各种研究人员提出了不同的DRL代理，这些代理通常在Learning to Run a Power Network（L2RPN）挑战赛的Grid2Op环境中进行基准测试。这些环境具有许多优势，例如逼真的时间序列和底层潮流计算后端。然而，对代理存活或失败的解释并不总是清晰的，因为存在多种潜在原因。在这项工作中，我们专注于电网的故障，以识别其模式并预先检测它们。我们在WCCI 2022 L2RPN环境中收集了三种不同代理的失败时间序列，总计约4万个数据点。通过聚类，我们检测出五个不同的簇，对应不同的故障类型。此外，我们提出了一种多类预测方法来提前检测故障，并评估了五种不同的模型。其中，轻量级梯度提升机（LightGBM）表现最佳，准确率达86％。它还能在91％的情况下正确区分故障与存活观测。最后，我们提供了详细的特征重要性分析，识别了电网中的关键特征和区域。

更新时间: 2024-06-24 08:20:43

领域: cs.LG,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2406.16426v1

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks. However, implementing these methods on different models requires non-trivial effort. We present LlamaFactory, a unified framework that integrates a suite of cutting-edge efficient training methods. Through its built-in web UI, LlamaBoard, it allows users to flexibly customize the fine-tuning of 100+ LLMs without writing any code. We empirically validate the efficiency and effectiveness of our framework on language modeling and text generation tasks. It has been released at https://github.com/hiyouga/LLaMA-Factory and has received over 24,000 stars and 3,000 forks.

Updated: 2024-06-24 08:20:04

标题: LlamaFactory：统一高效微调100多种语言模型

摘要: 高效微调对于将大型语言模型（LLMs）适配到下游任务至关重要。然而，要在不同模型上实现这些方法需要付出不少努力。我们提出了LlamaFactory，这是一个集成了一套前沿高效训练方法的统一框架。它通过内置的Web UI LlamaBoard，无需编写代码即可灵活定制100多个LLMs的微调。我们在语言建模和文本生成任务上通过实验验证了该框架的效率和有效性。该框架已发布在https://github.com/hiyouga/LLaMA-Factory ，并获得了超过24,000颗星和3,000次分叉（fork）。

更新时间: 2024-06-24 08:20:04

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.13372v3

Investigating the impact of 2D gesture representation on co-speech gesture generation

Co-speech gestures play a crucial role in interactions between humans and embodied conversational agents (ECAs). Recent deep learning methods enable the generation of realistic, natural co-speech gestures synchronized with speech, but such approaches require large amounts of training data. "In-the-wild" datasets, which are compiled from videos on sources such as YouTube using human pose detection models, offer a solution by providing 2D skeleton sequences paired with speech. Concurrently, innovative lifting models have emerged that can transform these 2D pose sequences into their 3D counterparts, yielding large and diverse datasets of 3D gestures. However, the derived 3D pose estimates are essentially a pseudo-ground truth, since the actual ground truth is the 2D motion data. This distinction raises the question of how the dimensionality of the gesture representation affects the quality of the generated motions, a topic that, to our knowledge, remains largely unexplored. In this work, we evaluate the impact of the dimensionality of the training data, 2D or 3D joint coordinates, on the performance of a multimodal speech-to-gesture deep generative model. We use a lifting model to convert 2D-generated sequences of body poses to 3D. We then compare the gesture sequences generated directly in 3D with those generated in 2D and lifted to 3D as a post-processing step.
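The abstract does not name the measure used to compare directly generated 3D gestures with 2D-generated, lifted ones. Mean per-joint position error (MPJPE) is one common way to compare two pose sequences of the same length; the sketch below assumes that metric and frame-aligned sequences, and is not the paper's evaluation protocol.

```python
import math
from typing import Sequence, Tuple

Joint = Tuple[float, float, float]
Frame = Sequence[Joint]


def mpjpe(seq_a: Sequence[Frame], seq_b: Sequence[Frame]) -> float:
    """Mean per-joint position error: average Euclidean distance between
    corresponding joints, taken over all frames and joints."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same number of frames")
    total, count = 0.0, 0
    for frame_a, frame_b in zip(seq_a, seq_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same number of joints")
        for joint_a, joint_b in zip(frame_a, frame_b):
            total += math.dist(joint_a, joint_b)
            count += 1
    return total / count


# One frame, two joints: the lifted second joint is 2 units off along z.
direct_3d = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
lifted_3d = [[(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]]
print(mpjpe(direct_3d, lifted_3d))  # 1.0
```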

Updated: 2024-06-24 08:19:00

标题: 研究二维手势表征对共语手势生成的影响

摘要: 言语手势在人类与具身谈话代理（ECA）之间的互动中起着至关重要的作用。最近的深度学习方法使得能够生成与言语同步的逼真、自然的言语手势，但这些方法需要大量的训练数据。通过人体姿势检测模型从YouTube等来源编译视频的“野外”数据集提供了一个解决方案，提供了与言语配对的2D骨架序列。同时，出现了创新的提升模型，能够将这些2D姿势序列转换为它们的3D对应物，从而产生了大规模且多样化的3D手势数据集。然而，衍生的3D姿势估计实质上是一种伪基准，实际基准是2D运动数据。这种区别引发了有关手势表示维度对生成动作质量的影响的问题，据我们所知，这一主题在很大程度上尚未被探讨。在这项工作中，我们评估训练数据维度（2D或3D关节坐标）对多模式言语到手势深度生成模型性能的影响。我们使用一个提升模型将2D生成的身体姿势序列转换为3D。然后，我们将直接在3D生成的手势序列与在2D生成并提升到3D的手势进行比较。

更新时间: 2024-06-24 08:19:00

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2406.15111v2

SWAP-NAS: Sample-Wise Activation Patterns for Ultra-fast NAS

Training-free metrics (a.k.a. zero-cost proxies) are widely used to avoid resource-intensive neural network training, especially in Neural Architecture Search (NAS). Recent studies show that existing training-free metrics have several limitations, such as limited correlation and poor generalisation across different search spaces and tasks. Hence, we propose Sample-Wise Activation Patterns and its derivative, SWAP-Score, a novel high-performance training-free metric. It measures the expressivity of networks over a batch of input samples. The SWAP-Score is strongly correlated with ground-truth performance across various search spaces and tasks, outperforming 15 existing training-free metrics on NAS-Bench-101/201/301 and TransNAS-Bench-101. The SWAP-Score can be further enhanced by regularisation, which leads to even higher correlations in cell-based search space and enables model size control during the search. For example, Spearman's rank correlation coefficient between regularised SWAP-Score and CIFAR-100 validation accuracies on NAS-Bench-201 networks is 0.90, significantly higher than 0.80 from the second-best metric, NWOT. When integrated with an evolutionary algorithm for NAS, our SWAP-NAS achieves competitive performance on CIFAR-10 and ImageNet in approximately 6 minutes and 9 minutes of GPU time respectively.
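As we read the abstract, SWAP-Score gauges expressivity by looking at how a network's activation patterns vary across a batch of inputs. The sketch below makes that concrete for a single ReLU layer under our own assumptions: it records each neuron's on/off state for every sample and counts the distinct per-neuron patterns. The exact definition, the layers included, and the regularised variant are given in the paper and are not reproduced here.

```python
from typing import List, Sequence


def relu_activation_pattern(weights: Sequence[Sequence[float]], biases: Sequence[float],
                            x: Sequence[float]) -> List[int]:
    """On/off state (1 if pre-activation > 0) of each ReLU neuron for one input sample."""
    return [1 if sum(w * xi for w, xi in zip(row, x)) + b > 0 else 0
            for row, b in zip(weights, biases)]


def swap_like_score(weights: Sequence[Sequence[float]], biases: Sequence[float],
                    batch: Sequence[Sequence[float]]) -> int:
    """Count distinct sample-wise activation patterns: for each neuron, the tuple
    of its on/off states across all samples in the batch."""
    per_sample = [relu_activation_pattern(weights, biases, x) for x in batch]  # samples x neurons
    per_neuron = set(zip(*per_sample))  # transpose to neurons x samples, then deduplicate
    return len(per_neuron)


# Four neurons on 2-D inputs; neurons 0 and 2 are identical, so they share a pattern.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
bias = [0.0, 0.0, 0.0, 0.0]
batch = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]]
print(swap_like_score(W, bias, batch))  # 3
```

No training is involved: the score only needs one forward pass over a batch, which is what makes such metrics cheap enough to rank thousands of candidate architectures.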

Updated: 2024-06-24 08:18:29

标题: SWAP-NAS：用于超快速NAS的样本级激活模式

摘要: 免训练度量（即零成本代理）被广泛用于避免资源密集型的神经网络训练，尤其是在神经结构搜索（NAS）中。最近的研究表明，现有的免训练度量存在一些局限性，如与不同搜索空间和任务的相关性有限，泛化能力差等。因此，我们提出了样本级激活模式及其衍生物SWAP-Score，一种新颖的高性能免训练度量。它衡量了网络对一批输入样本的表达能力。SWAP-Score与不同搜索空间和任务中的实际表现强相关，优于NAS-Bench-101/201/301和TransNAS-Bench-101上的15种现有免训练度量。SWAP-Score可以通过正则化进一步增强，从而在基于单元的搜索空间中实现更高的相关性，并在搜索过程中实现模型大小控制。例如，在NAS-Bench-201网络上，正则化的SWAP-Score与CIFAR-100验证准确度的Spearman等级相关系数为0.90，显著高于第二好的度量NWOT的0.80。将SWAP-NAS与NAS的进化算法集成后，在大约6分钟和9分钟的GPU时间内，在CIFAR-10和ImageNet上实现了竞争性的表现。

更新时间: 2024-06-24 08:18:29

领域: cs.LG,cs.CV,cs.NE

下载: http://arxiv.org/abs/2403.04161v5

Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization

Combinatorial Optimization is crucial to numerous real-world applications, yet it still presents challenges due to its (NP-)hard nature. Among existing approaches, heuristics often offer the best trade-off between quality and scalability, making them suitable for industrial use. While Reinforcement Learning (RL) offers a flexible framework for designing heuristics, it has not yet fully replaced handcrafted heuristics in industrial solvers. Existing learned methods still lack the ability to adapt to specific instances and to fully leverage the available computational budget. The current best methods rely either on a collection of pre-trained policies or on data-inefficient fine-tuning, and hence fail to fully utilize newly available information within the constraints of the budget. In response, we present MEMENTO, an RL approach that leverages memory to improve the adaptation of neural solvers at inference time. MEMENTO enables updating the action distribution dynamically based on the outcome of previous decisions. We validate its effectiveness on benchmark problems, in particular the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem, demonstrating that it can be successfully combined with standard methods to boost their performance under a given budget, both in and out of distribution, improving their performance on all 12 evaluated tasks.

Updated: 2024-06-24 08:18:19

标题: 记忆增强型神经求解器用于组合优化中的高效自适应

摘要: 组合优化在众多现实世界应用中至关重要，但由于其(NP-)难度，仍然存在挑战。在现有方法中，启发式方法往往在质量和可扩展性之间提供最佳折衷，使它们适用于工业应用。虽然强化学习（RL）为设计启发式方法提供了灵活的框架，但在工业求解器中，其应用仍然不完整。现有的学习方法仍然缺乏适应特定实例并充分利用可用计算预算的能力。当前最佳方法要么依赖于一组预训练策略，要么依赖于数据低效的微调；因此未能充分利用预算约束下新获得的信息。为此，我们提出了MEMENTO，一种利用记忆改善神经求解器在推理时间适应性的RL方法。MEMENTO能够根据先前决策的结果动态更新动作分布。我们在基准问题上验证了其有效性，特别是在旅行推销员和容量车辆路径规划中，表明它可以成功地与标准方法结合，提高它们在给定预算下的性能，无论是在分布内还是分布外，在所有12个评估任务中都提高了性能。

更新时间: 2024-06-24 08:18:19

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.16424v1

Exploring Cross-Domain Few-Shot Classification via Frequency-Aware Prompting

Cross-Domain Few-Shot Learning has made great strides with the development of meta-learning. However, most existing methods focus on learning a domain-adaptive inductive bias (meta-knowledge) through feature-wise manipulation or improved task diversity, while neglecting the fact that deep networks tend to rely on high-frequency cues to make classification decisions. This degrades the robustness of the learned inductive bias, since high-frequency information is fragile and easily disturbed by noise. Hence, in this paper we make one of the first attempts to propose a Frequency-Aware Prompting method with mutual attention for Cross-Domain Few-Shot classification, which lets networks simulate how human visual perception selects different frequency cues when facing new recognition tasks. Specifically, we first propose a frequency-aware prompting mechanism, in which the high-frequency components of the decomposed source image are replaced either with samples from a normal distribution or with zeros to obtain frequency-aware augmented samples. Then, a mutual attention module is designed to learn a generalizable inductive bias under CD-FSL settings. More importantly, the proposed method is a plug-and-play module that can be directly applied to most off-the-shelf CD-FSL methods. Experimental results on CD-FSL benchmarks demonstrate the effectiveness of our proposed method and show that it robustly improves the performance of existing CD-FSL methods. Resources are available at https://github.com/tinkez/FAP_CDFSC.
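The abstract describes replacing the high-frequency components of a decomposed source image with normally distributed samples or with zeros, but not which decomposition or cutoff is used. A minimal sketch, assuming a discrete Fourier transform and a hypothetical integer `cutoff`, applied to a single 1-D row of pixel values for brevity (an image version would use a 2-D transform):

```python
import cmath
import random
from typing import List, Optional, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def frequency_aware_augment(signal: Sequence[float], cutoff: int, mode: str = "zero",
                            sigma: float = 1.0,
                            rng: Optional[random.Random] = None) -> List[float]:
    """Replace every frequency bin above `cutoff` with zeros ("zero") or Gaussian
    samples ("normal"), keeping the spectrum conjugate-symmetric so the output is real."""
    if mode not in ("zero", "normal"):
        raise ValueError("mode must be 'zero' or 'normal'")
    rng = rng or random.Random(0)
    n = len(signal)
    spectrum = dft(signal)
    for k in range(cutoff + 1, n // 2 + 1):
        if mode == "zero":
            value = 0j
        else:
            value = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        if 2 * k == n:  # the Nyquist bin of an even-length signal must be real
            value = complex(value.real, 0.0)
        spectrum[k] = value
        spectrum[n - k] = value.conjugate()
    return [v.real for v in idft(spectrum)]


row = [1.0, -1.0, 1.0, -1.0]  # a pure highest-frequency pattern
zeroed = frequency_aware_augment(row, cutoff=1)                # all high frequencies removed
noisy = frequency_aware_augment(row, cutoff=1, mode="normal")  # high frequencies resampled
```

Low-frequency structure (here the DC component, which is zero) passes through untouched, so the augmented sample differs from the source only in the cues the abstract identifies as fragile.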

Updated: 2024-06-24 08:14:09

标题: 通过频率感知提示探索跨领域少样本分类

摘要: 随着元学习的发展，跨领域少样本学习取得了巨大进展。然而，大多数现有方法更关注通过特征层面的操作或任务多样性的提升来学习领域自适应的归纳偏置（元知识），而忽视了深度网络倾向于依赖高频线索做出分类决策的现象；由于高频信息脆弱且容易受到噪声干扰，这会降低所学归纳偏置的稳健性。因此，在本文中，我们率先尝试提出一种带有互注意力的频率感知提示方法，用于跨领域少样本分类，使网络在面对新的识别任务时能够模拟人类视觉感知选择不同频率线索的过程。具体来说，首先提出了一个频率感知提示机制，在该机制中，分解后源图像的高频分量被替换为正态分布采样值或置零，以获得频率感知增强样本。然后，设计了一个互注意力模块，在CD-FSL设置下学习可泛化的归纳偏置。更重要的是，所提出的方法是一个即插即用的模块，可以直接应用于大多数现成的CD-FSL方法。在CD-FSL基准测试上的实验结果显示了所提方法的有效性，并且它能稳健地提升现有CD-FSL方法的性能。资源位于https://github.com/tinkez/FAP_CDFSC。

更新时间: 2024-06-24 08:14:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.16422v1

This actually looks like that: Proto-BagNets for local and global interpretability-by-design

Interpretability is a key requirement for the use of machine learning models in high-stakes applications, including medical diagnosis. Explaining black-box models mostly relies on post-hoc methods that do not faithfully reflect the model's behavior. As a remedy, prototype-based networks have been proposed, but their interpretability is limited as they have been shown to provide coarse, unreliable, and imprecise explanations. In this work, we introduce Proto-BagNets, an interpretable-by-design prototype-based model that combines the advantages of bag-of-local feature models and prototype learning to provide meaningful, coherent, and relevant prototypical parts needed for accurate and interpretable image classification tasks. We evaluated the Proto-BagNet for drusen detection on publicly available retinal OCT data. The Proto-BagNet performed comparably to the state-of-the-art interpretable and non-interpretable models while providing faithful, accurate, and clinically meaningful local and global explanations. The code is available at https://github.com/kdjoumessi/Proto-BagNets.

Updated: 2024-06-24 08:13:07

标题: 这确实看起来像那样：用于局部和全局设计即可解释性的Proto-BagNets

摘要: 可解释性是在包括医学诊断在内的高风险应用中使用机器学习模型的关键要求。解释黑盒模型主要依赖于事后方法，这些方法并不能忠实地反映模型的行为。作为补救，人们提出了基于原型的网络，但它们的可解释性有限，因为已有研究表明它们提供的解释是粗糙的、不可靠的和不精确的。在这项工作中，我们介绍了Proto-BagNets，这是一种设计即可解释的基于原型的模型，结合了局部特征袋模型和原型学习的优势，为准确且可解释的图像分类任务提供有意义、连贯且相关的原型部件。我们在公开的视网膜OCT数据上评估了Proto-BagNet的玻璃膜疣（drusen）检测性能。Proto-BagNet的性能与最先进的可解释和不可解释模型相当，同时提供了忠实、准确且具有临床意义的局部和全局解释。代码可在https://github.com/kdjoumessi/Proto-BagNets 上找到。

更新时间: 2024-06-24 08:13:07

领域: cs.AI

下载: http://arxiv.org/abs/2406.15168v2

Quantum Deep Reinforcement Learning for Robot Navigation Tasks

We utilize hybrid quantum deep reinforcement learning to learn navigation tasks for a simple, wheeled robot in simulated environments of increasing complexity. For this, we train parameterized quantum circuits (PQCs) with two different encoding strategies in a hybrid quantum-classical setup, as well as a classical neural network baseline, using the double deep Q network (DDQN) reinforcement learning algorithm. Quantum deep reinforcement learning (QDRL) has previously been studied in several relatively simple benchmark environments, mainly from the OpenAI Gym suite. However, the scaling behavior and applicability of QDRL to more demanding tasks closer to real-world problems, e.g., from the robotics domain, have not been studied previously. Here, we show that quantum circuits in hybrid quantum-classical reinforcement learning setups are capable of learning optimal policies in multiple robotic navigation scenarios with notably fewer trainable parameters than a classical baseline. Across a large number of experimental configurations, we find that the employed quantum circuits outperform the classical neural network baselines when the number of trainable parameters is held equal. Yet the classical neural network consistently showed better results in terms of training time and stability, with at least one order of magnitude more trainable parameters than the best-performing quantum circuits. Furthermore, when validating the robustness of the learning methods in a large and dynamic environment, we find that the classical baseline produces more stable and better-performing policies overall.
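Both the PQC agents and the classical baseline are trained with DDQN. Its central ingredient is the target value used in the Q-learning update, sketched below in plain Python; in the hybrid setup the Q-value vectors would come from measuring the PQC, which is not shown, and the numbers are illustrative only.

```python
from typing import Sequence


def ddqn_target(reward: float, done: bool, gamma: float,
                q_online_next: Sequence[float], q_target_next: Sequence[float]) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# The online net prefers action 1, so the target net's value for action 1 is used,
# not the target net's own maximum (10.0); this decoupling curbs overestimation.
print(ddqn_target(1.0, False, 0.5, [0.2, 0.9, 0.1], [3.0, 2.0, 10.0]))  # 2.0
```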

Updated: 2024-06-24 08:08:33

标题: 量子深度强化学习用于机器人导航任务

摘要: 我们利用混合量子深度强化学习来学习增加复杂性的模拟环境中简单轮式机器人的导航任务。为此，我们在混合量子-经典设置中使用两种不同的编码策略训练参数化量子电路（PQCs），以及一个使用双深度Q网络（DDQN）强化学习算法的经典神经网络基线。量子深度强化学习（QDRL）先前已在几个相对简单的基准环境中进行研究，主要来自OpenAI gym套件。然而，QDRL的扩展行为和适用性到更接近真实世界问题的更具挑战性任务，例如来自机器人领域的任务，以前尚未进行研究。在这里，我们展示了在混合量子-经典强化学习设置中的量子电路能够学习多个机器人导航场景中的最佳策略，相比于经典基线具有明显更少的可训练参数。在大量实验配置中，我们发现所使用的量子电路在等同可训练参数的情况下优于经典神经网络基线。然而，经典神经网络在训练时间和稳定性方面一贯显示出更好的结果，其可训练参数至少比表现最佳的量子电路多一个数量级。然而，在验证学习方法在大型和动态环境中的稳健性时，我们发现经典基线总体上产生更稳定和更好的性能策略。

更新时间: 2024-06-24 08:08:33

领域: cs.RO,cs.LG,quant-ph

下载: http://arxiv.org/abs/2202.12180v3

EEGEncoder: Advancing BCI with Transformer-Based Motor Imagery Classification

Brain-computer interfaces (BCIs) harness electroencephalographic (EEG) signals for direct neural control of devices, offering significant benefits for individuals with motor impairments. Traditional machine learning methods for EEG-based motor imagery (MI) classification face challenges such as manual feature extraction and susceptibility to noise. This paper introduces EEGEncoder, a deep learning framework that employs modified transformers and temporal convolutional networks (TCNs) to overcome these limitations. We propose a fusion architecture, the Dual-Stream Temporal-Spatial Block (DSTS), that captures temporal and spatial features and improves the accuracy of the MI classification task. Additionally, we use multiple parallel structures to enhance the performance of the model. When tested on the BCI Competition IV-2a dataset, our model outperforms current state-of-the-art techniques.
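TCNs, which EEGEncoder combines with transformers, are built from dilated causal 1-D convolutions. A minimal single-channel sketch of one such layer; the kernel and dilation are illustrative, not EEGEncoder's configuration.

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float], kernel: Sequence[float],
                          dilation: int = 1) -> List[float]:
    """y[t] = sum_i kernel[i] * x[t - i*dilation], treating samples before t = 0 as zero,
    so y[t] never depends on future samples."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


eeg = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dilated_causal_conv1d(eeg, kernel=[1.0, 1.0], dilation=2))  # [1.0, 2.0, 4.0, 6.0, 8.0]
```

Stacking such layers with dilations 1, 2, 4, ... gives a receptive field of 1 + (k - 1) times the sum of the dilations for kernel size k, so long EEG windows can be covered with few layers.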

Updated: 2024-06-24 08:02:17

标题: EEGEncoder：用基于Transformer的运动想象分类推动BCI的发展

摘要: 脑机接口（BCIs）利用脑电信号直接以神经活动控制设备，为运动障碍人士带来重要益处。传统的基于脑电的运动想象（MI）分类机器学习方法面临手动特征提取和易受噪声影响等挑战。本文介绍了EEGEncoder，这是一个采用改进的Transformer和时间卷积网络（TCNs）来克服这些限制的深度学习框架。我们创新地提出了一种融合架构，即双流时空块（DSTS），以捕获时间和空间特征，提高运动想象分类任务的准确性。此外，我们使用多个并行结构来增强模型的性能。在BCI竞赛IV-2a数据集上测试时，我们的模型结果优于当前最先进的技术。

更新时间: 2024-06-24 08:02:17

领域: cs.HC,cs.LG

下载: http://arxiv.org/abs/2404.14869v2

PenSLR: Persian end-to-end Sign Language Recognition Using Ensembling

Sign Language Recognition (SLR) is a fast-growing field that aims to bridge the communication gap between people with hearing impairments and people without hearing loss. Existing solutions for Persian Sign Language (PSL) are limited to word-level interpretation, underscoring the need for more advanced and comprehensive solutions. Moreover, previous work on other languages mainly focuses on modifying neural network architectures or hardware configurations rather than benefiting from the aggregated results of multiple models. In this paper, we introduce PenSLR, a glove-based sign language system consisting of an Inertial Measurement Unit (IMU) and five flexible sensors, powered by a deep learning framework capable of predicting variable-length sequences. We achieve this in an end-to-end manner by leveraging the Connectionist Temporal Classification (CTC) loss function, which eliminates the need to segment the input signals. To further enhance its capabilities, we propose a novel ensembling technique that leverages a multiple sequence alignment algorithm known as Star Alignment. Furthermore, we introduce a new PSL dataset covering 16 PSL signs with more than 3000 time-series samples in total. We use this dataset to evaluate the performance of our system on four word-level and sentence-level metrics. Our evaluations show that PenSLR achieves word accuracies of 94.58% and 96.70% in subject-independent and subject-dependent setups, respectively. These achievements are attributable to our ensembling algorithm, which not only boosts word-level performance by 0.51% and 1.32% in the respective scenarios but also improves sentence-level accuracy by 1.46% and 4.00%, respectively.
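With a CTC loss, the network emits a label (or a special blank) for every sensor frame, and variable-length sign sequences are recovered without segmenting the input. The simplest decoder is greedy best-path decoding, sketched below; the blank index and probabilities are assumptions, and the Star Alignment ensembling that PenSLR adds on top is not shown.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: take the most likely label per frame,
    merge consecutive repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    return [label for label, _ in groupby(best_path) if label != blank]


# Per-frame probabilities over {blank, sign 1, sign 2}; the blank frame separates
# two consecutive occurrences of sign 1.
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
    [0.6, 0.2, 0.2],
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

Without the blank frame in the middle, the two occurrences of sign 1 would be merged into one, which is why CTC needs the blank symbol.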

Updated: 2024-06-24 07:59:34

标题: PenSLR：使用集成技术的波斯手语端到端识别

摘要: 手语识别（SLR）是一个快速发展的领域，旨在填补聋人与无听力损失人群之间的沟通差距。现有的波斯手语（PSL）解决方案仅限于单词级别的解释，突显了对更高级和全面解决方案的需求。此外，先前针对其他语言的工作主要集中在调整神经网络架构或硬件配置，而不是从多个模型的聚合结果中获益。在本文中，我们介绍了PenSLR，一个基于手套的手语系统，由一个惯性测量单元（IMU）和五个柔性传感器组成，由一个能够预测可变长度序列的深度学习框架驱动。我们通过利用连接时序分类（CTC）损失函数，以端到端的方式实现这一目标，消除了对输入信号进行分割的需求。为了进一步增强其功能，我们提出了一种新的集成技术，利用了一种称为星对齐（Star Alignment）的多序列对齐算法。此外，我们引入了一个新的PSL数据集，包括16个PSL手势，总共超过3000个时间序列样本。我们利用这个数据集根据四个单词级别和句子级别的指标评估我们系统的性能。我们的评估结果显示，PenSLR在受试者无关和受试者相关的设置中分别实现了94.58%和96.70%的单词准确率。这些成就归因于我们的集成算法，它不仅在各自的情况下将单词级别性能提升了0.51%和1.32%，而且在句子级准确率方面分别获得了1.46%和4.00%的显著提升。

更新时间: 2024-06-24 07:59:34

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2406.16388v1

Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach

Websites are critical in today's digital world, with over 1.11 billion currently active and approximately 252,000 new sites launched daily. Converting website layout designs into functional UI code is a time-consuming yet indispensable step of website development. Manually converting visual designs into functional code presents significant challenges, especially for non-experts. To explore automatic design-to-code solutions, we first conduct a motivating study on GPT-4o and identify three types of issues in generating UI code: element omission, element distortion, and element misarrangement. We further reveal that focusing on smaller visual segments can help multimodal large language models (MLLMs) mitigate these failures in the generation process. In this paper, we propose DCGen, a divide-and-conquer-based approach to automate the translation of webpage designs into UI code. DCGen starts by dividing screenshots into manageable segments, generating descriptions for each segment, and then reassembling them into complete UI code for the entire screenshot. We conduct extensive testing on a dataset comprising real-world websites with various MLLMs and demonstrate that DCGen achieves up to a 14% improvement in visual similarity over competing methods. To the best of our knowledge, DCGen is the first segment-aware prompt-based approach for generating UI code directly from screenshots.
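The abstract says DCGen divides a screenshot into manageable segments but does not give the division rule. The sketch below assumes one simple, hypothetical rule: cut a grayscale screenshot at horizontal runs of background-only rows. Each returned band would then be described by the MLLM and the pieces reassembled, which is not shown.

```python
from typing import List, Sequence, Tuple


def split_on_blank_rows(image: Sequence[Sequence[int]], background: int = 255,
                        min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start_row, end_row) bands of content separated by at least
    `min_gap` consecutive rows that contain only background pixels."""
    segments: List[Tuple[int, int]] = []
    start, last_content, gap = None, -1, 0
    for r, row in enumerate(image):
        if all(px == background for px in row):
            gap += 1
            if start is not None and gap >= min_gap:
                segments.append((start, last_content + 1))
                start = None
        else:
            if start is None:
                start = r
            last_content = r
            gap = 0
    if start is not None:
        segments.append((start, last_content + 1))
    return segments


W, K = 255, 0  # white background, black content
screenshot = [
    [W, W, W],
    [K, K, W],
    [W, K, K],
    [W, W, W],
    [W, W, W],
    [K, W, K],
]
print(split_on_blank_rows(screenshot))             # [(1, 3), (5, 6)]
print(split_on_blank_rows(screenshot, min_gap=3))  # [(1, 6)]
```

Raising `min_gap` yields fewer, larger segments, one plausible knob for trading off segment size against how much context each MLLM call sees.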

Updated: 2024-06-24 07:58:36

标题: 从屏幕截图自动生成UI代码：一种基于分而治之的方法

摘要: 在当今数字世界中，网站至关重要，目前有超过11亿个活跃网站，每天大约有252,000个新网站上线。将网站布局设计转化为功能性UI代码是网站开发中耗时但不可或缺的步骤。将视觉设计手动转化为功能代码存在显著挑战，尤其对于非专家而言。为了探索自动设计到代码解决方案，我们首先对GPT-4o进行了动机研究，并确定了生成UI代码时的三种问题类型：元素遗漏、元素失真和元素错位。我们进一步揭示，关注较小的视觉片段可以帮助多模态大型语言模型（MLLMs）在生成过程中减轻这些故障。在本文中，我们提出了DCGen，一种基于分而治之的方法，用于自动将网页设计翻译成UI代码。DCGen首先将屏幕截图分成可管理的片段，为每个片段生成描述，然后将它们重新组装成对应整个屏幕截图的完整UI代码。我们使用由真实网站组成的数据集和多种MLLMs进行了广泛测试，并展示DCGen在视觉相似性方面比竞争方法提高了达14%。据我们所知，DCGen是第一个从屏幕截图直接生成UI代码的分段感知提示式方法。

更新时间: 2024-06-24 07:58:36

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2406.16386v1

Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

Large Language Models (LLMs) have demonstrated impressive capabilities in many natural language tasks. However, the auto-regressive generation process makes LLMs prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning. In this paper, by casting the multi-step reasoning of LLMs as a heuristic search problem, we aim to alleviate this pathology by introducing Q*, a general, versatile, and agile framework for guiding the decoding process of LLMs with deliberative planning. By learning a plug-and-play Q-value model as a heuristic function that estimates expected future rewards, Q* can effectively guide LLMs to select the most promising next reasoning step without fine-tuning them for the current task, which avoids significant computational overhead and the potential risk of performance degradation on other tasks. Extensive experiments on GSM8K, MATH, and MBPP demonstrate the superiority of our method in improving the reasoning performance of existing open-source LLMs.
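Q* casts multi-step reasoning as heuristic search, ranking partial reasoning traces by accumulated reward plus a learned Q-value estimate of future reward. A minimal best-first sketch of that idea follows; `propose`, `q_value`, `step_reward`, and `is_terminal` are hypothetical stand-ins for the LLM step generator and the learned Q-value model, and the toy problem (reach 6 with steps +1/+2/+3, each step costing 1) is ours.

```python
import heapq
import math
from itertools import count
from typing import Callable, Iterable, Optional, Tuple

State = Tuple[str, ...]  # a partial reasoning trace: the steps taken so far


def q_star_search(propose: Callable[[State], Iterable[str]],
                  q_value: Callable[[State, str], float],
                  step_reward: Callable[[State, str], float],
                  is_terminal: Callable[[State], bool],
                  max_expansions: int = 1000) -> Optional[State]:
    """Best-first search that always expands the trace with the highest
    f = g (reward accumulated so far) + h (Q-value estimate of reward still to come)."""
    tie = count()  # tie-breaker so the heap never compares states
    frontier = [(0.0, next(tie), 0.0, ())]  # entries are (-f, tie, g, state)
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, g, state = heapq.heappop(frontier)
        if is_terminal(state):
            return state
        for step in propose(state):
            g_next = g + step_reward(state, step)
            f_next = g_next + q_value(state, step)
            heapq.heappush(frontier, (-f_next, next(tie), g_next, state + (step,)))
    return None


# Toy problem: reach a total of exactly 6 using steps "+1", "+2", "+3".
TARGET = 6


def total(state: State) -> int:
    return sum(int(s) for s in state)


def propose(state: State) -> Iterable[str]:
    return [s for s in ("+1", "+2", "+3") if total(state) + int(s) <= TARGET]


def q_value(state: State, step: str) -> float:
    # Stand-in for a learned Q model: minus the fewest further steps still needed.
    remaining = TARGET - total(state) - int(step)
    return -math.ceil(remaining / 3)


plan = q_star_search(propose, q_value,
                     step_reward=lambda state, step: -1.0,
                     is_terminal=lambda state: total(state) == TARGET)
print(plan)  # ('+3', '+3')
```

The heuristic never requires retraining the generator; swapping in a better `q_value` changes which steps get expanded, mirroring the plug-and-play claim in the abstract.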

Updated: 2024-06-24 07:50:56

标题: Q*：通过深思熟虑的规划改进LLMs的多步推理

摘要: 大型语言模型（LLMs）在许多自然语言任务中展示了令人印象深刻的能力。然而，自回归生成过程使LLMs在执行多步推理时容易产生错误、幻觉和不一致的陈述。在本文中，通过将LLMs的多步推理视为启发式搜索问题，我们旨在通过引入Q*来缓解这种病理现象，这是一个通用、多功能和灵活的框架，用于引导LLMs解码过程并进行深思熟虑的规划。通过学习一个即插即用的Q值模型作为启发式函数来估计未来预期奖励，我们的Q*可以有效地引导LLMs选择最有前景的下一个推理步骤，而无需为当前任务微调LLMs，避免了大量的计算开销和在其他任务上性能退化的潜在风险。对GSM8K、MATH和MBPP的大量实验证明了我们方法的优越性，有助于提高现有开源LLMs的推理性能。

更新时间: 2024-06-24 07:50:56

领域: cs.AI

下载: http://arxiv.org/abs/2406.14283v2

Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach

Amid the rapid evolution of LLMs, evaluation is becoming increasingly important for understanding these models and driving them forward. Evaluations have revealed that factors such as scaling, training type, and architecture profoundly impact the performance of LLMs. However, the extent and nature of these impacts remain subjects of debate, because most assessments have been restricted to a limited number of models and data points. The effects of these factors on performance scores can be clarified more effectively through a statistical lens. Our study undertakes a thorough re-examination of these LLMs, targeting the inadequacies of current evaluation methods. Building on a uniform evaluation framework, our research leverages an expansive dataset of evaluation results and introduces a comprehensive statistical methodology. This includes the application of ANOVA, Tukey HSD tests, GAMM, and clustering techniques, offering a robust and transparent approach to deciphering LLM performance data. Contrary to prevailing findings, our results challenge assumptions about emergent abilities and about the influence of particular training types and architectures in LLMs. These findings furnish new perspectives on the characteristics, intrinsic nature, and developmental trajectories of LLMs. By providing straightforward and reliable methods to scrutinize and reassess LLM performance data, this study contributes a nuanced perspective on the efficiency and potential of LLMs.
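Of the statistical tools listed, one-way ANOVA is the simplest to show. The sketch computes the F statistic for benchmark scores grouped by a single factor (for example, training type; the grouping and scores here are hypothetical). For a p-value, `scipy.stats.f_oneway` returns the same statistic together with its p-value.

```python
from typing import Sequence


def one_way_anova_f(*groups: Sequence[float]) -> float:
    """F = between-group mean square / within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical benchmark scores of models grouped by training type.
pretrained_only = [1.0, 2.0, 3.0]
instruction_tuned = [4.0, 5.0, 6.0]
print(one_way_anova_f(pretrained_only, instruction_tuned))  # 13.5
```

A large F only says that some group means differ; post-hoc tests such as Tukey HSD are then needed to tell which pairs of groups differ, which is why the two are typically used together.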

Updated: 2024-06-24 07:49:25

标题: 在LLM中对大规模评估结果的全面重新评估：多面统计方法

摘要: 在LLM快速发展的背景下，评估在理解和推动这些模型向前发展方面的重要性日益凸显。评估揭示了诸如扩展、训练类型、架构等因素深刻影响LLM的表现。然而，这些影响的程度和性质仍然存在争议，因为大多数评估仅限于有限数量的模型和数据点。通过统计学的视角可以更有效地阐明这些因素对性能得分的影响。我们的研究对这些LLM进行了彻底的重新审视，针对当前评估方法的不足之处。随着统一评估框架的出现，我们的研究利用了一个庞大的评估结果数据集，引入了全面的统计方法论。这包括ANOVA、Tukey HSD测试、GAMM和聚类技术的应用，提供了一种强大且透明的方法来解读LLM性能数据。与当前研究结果相反，我们的结果挑战了关于涌现能力以及特定训练类型和架构在LLM中的影响的假设。这些发现为LLM的特征、内在性质和发展轨迹提供了新的视角。通过提供简单可靠的方法来审查和重新评估LLM性能数据，本研究为LLM的效率和潜力提供了细致入微的视角。

更新时间: 2024-06-24 07:49:25

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.15250v2

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation directions, each of which facilitates a variety of applications. Our work offers a holistic view that unifies numerous existing studies and suggests potential research directions. We envision our work as a useful roadmap for future research on LLMs.

Updated: 2024-06-24 07:42:32

标题: 关于奖励模型、参数更新和上下文提示之间的转换

摘要: 尽管预训练大型语言模型（LLMs）具有一般能力，但它们仍需要进一步适应以更好地服务于实际应用。在本文中，我们展示了三种流行且不同的适应工具：参数更新、奖励建模和上下文提示的互换性。这种互换性建立了一个三角形框架，具有六个转换方向，每个方向都促进了各种应用。我们的工作提供了一个统一众多现有研究的整体视角，并提出了潜在的研究方向。我们设想我们的工作将成为未来LLMs研究的有用路线图。

更新时间: 2024-06-24 07:42:32

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.16377v1

Tempora-Fusion: Time-Lock Puzzle with Efficient Verifiable Homomorphic Linear Combination

To securely transmit sensitive information into the future, Time-Lock Puzzles (TLPs) have been developed. Their applications include scheduled payments, timed commitments, e-voting, and sealed-bid auctions. Homomorphic TLP is a key variant of TLP that enables computation on puzzles from different clients. This allows a solver/server to tackle only a single puzzle encoding the computation's result. However, existing homomorphic TLPs lack support for verifying the correctness of the computation results. We address this limitation by introducing Tempora-Fusion, a TLP that allows a server to perform homomorphic linear combinations of puzzles from different clients while ensuring verification of computation correctness. This scheme avoids asymmetric-key cryptography for verification, thus paving the way for efficient implementations. We discuss our scheme's application in various domains, such as federated learning, scheduled payments in online banking, and e-voting.
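The abstract does not spell out Tempora-Fusion's construction. For orientation only, here is the classic Rivest-Shamir-Wagner repeated-squaring puzzle that time-lock puzzles build on: the solver needs `t` sequential modular squarings, whereas the creator, who knows the factors of `n`, can shortcut the computation. The tiny primes are insecure toy values, and the homomorphic linear combination and verification that Tempora-Fusion adds are not shown.

```python
def solve_puzzle(x: int, t: int, n: int) -> int:
    """Solver's work: t sequential squarings modulo n (no known shortcut without n's factors)."""
    y = x % n
    for _ in range(t):
        y = y * y % n
    return y


def create_puzzle_key(x: int, t: int, p: int, q: int) -> int:
    """Creator's shortcut: reduce the exponent 2**t modulo phi(n) using the secret primes.
    Valid when gcd(x, n) == 1 (Euler's theorem)."""
    n, phi = p * q, (p - 1) * (q - 1)
    return pow(x, pow(2, t, phi), n)


p, q = 1009, 1013  # toy primes; real puzzles use a large RSA modulus
n, x, t = p * q, 5, 50
key = create_puzzle_key(x, t, p, q)
secret = 123456
locked = (secret + key) % n  # message masked with the puzzle's output

# Later, the solver recovers the key by sequential squaring and unmasks the message.
recovered = (locked - solve_puzzle(x, t, n)) % n
print(recovered == secret)  # True
```

The asymmetry is the point: creating the puzzle costs two modular exponentiations, while solving it costs `t` squarings that cannot be parallelized, so `t` sets the release delay.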

Updated: 2024-06-24 07:39:38

标题: Tempora-Fusion：具有高效可验证同态线性组合的时间锁谜题

摘要: 为了安全地将敏感信息传输到未来，已经开发了时间锁谜题（TLPs）。它们的应用包括定期支付、定时承诺、电子投票和密封竞标拍卖。同态TLP是TLP的一个关键变体，它使不同客户端的谜题上的计算成为可能。这使得解谜者/服务器只需解决一个编码计算结果的谜题。然而，现有的同态TLP缺乏验证计算结果正确性的支持。我们通过引入Tempora-Fusion来解决这一限制，这是一种TLP，允许服务器对来自不同客户端的谜题进行同态线性组合，并确保验证计算正确性。该方案避免了用于验证的非对称密钥密码学，从而为高效实现铺平了道路。我们讨论了我们方案在各个领域的应用，如联邦学习、在线银行定期支付和电子投票。

更新时间: 2024-06-24 07:39:38

领域: cs.CR,cs.CE,cs.LG

下载: http://arxiv.org/abs/2406.15070v2

CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation

Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources. However, existing methods, whether they feed LMs raw or preprocessed materials, remain prone to errors. To address this, we introduce CaLM, a novel verification framework. CaLM leverages the insight that a robust grounded response should be consistent with information derived solely from its cited sources. Our framework empowers smaller LMs, which rely less on parametric memory and excel at processing relevant information given a query, to validate the output of larger LMs. A larger LM's response is verified if it closely aligns with the output of the smaller LM, which relies exclusively on the cited documents. Responses showing discrepancies are iteratively refined through a feedback loop. Experiments on three open-domain question-answering datasets demonstrate significant absolute average performance gains of 1.5% to 7% without requiring any model fine-tuning.
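A minimal sketch of the verify-and-refine loop described above. The model callables, the exact-match agreement test, the feedback wording, and the round limit are hypothetical stand-ins; the abstract does not specify how CaLM measures consistency between the two models' outputs.

```python
from typing import Callable, List, Tuple


def calm_style_verify(question: str,
                      large_lm: Callable[[str, str], Tuple[str, List[str]]],
                      small_lm: Callable[[str, List[str]], str],
                      agree: Callable[[str, str], bool],
                      max_rounds: int = 3) -> Tuple[str, bool]:
    """Accept the large model's grounded answer only if a small model that sees
    nothing but the cited documents reaches a consistent answer; otherwise feed
    the discrepancy back and let the large model try again."""
    feedback, answer = "", ""
    for _ in range(max_rounds):
        answer, cited_docs = large_lm(question, feedback)
        check = small_lm(question, cited_docs)
        if agree(answer, check):
            return answer, True
        feedback = f"A reader of your cited sources answered {check!r}; please revise."
    return answer, False


# Toy stand-ins: the large model answers from memory first, then follows the feedback.
def toy_large_lm(question: str, feedback: str) -> Tuple[str, List[str]]:
    docs = ["The capital of France is Paris."]
    return ("Lyon", docs) if not feedback else ("Paris", docs)


def toy_small_lm(question: str, docs: List[str]) -> str:
    return "Paris" if any("Paris" in d for d in docs) else "unknown"


result = calm_style_verify("What is the capital of France?", toy_large_lm, toy_small_lm,
                           agree=lambda a, b: a.strip().lower() == b.strip().lower())
print(result)  # ('Paris', True)
```

The second element of the result records whether the answer was verified or merely the last attempt after `max_rounds`, so a caller can decide how to treat unverified answers.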

Updated: 2024-06-24 07:39:26

标题: CaLM：对比大型和小型语言模型以验证有据生成

摘要: 有据生成（grounded generation）旨在使语言模型（LMs）能够通过准确引用可验证的来源，产生更可信、更可追责的回应。然而，现有方法无论是向LMs提供原始材料还是预处理后的材料，仍然容易出错。为了解决这个问题，我们引入了CaLM，一种新颖的验证框架。CaLM基于这样一个洞察：一个稳健的有据回应应当与仅从其引用来源中得出的信息保持一致。我们的框架让较小的LMs来验证较大LMs的输出，因为较小的LMs对参数化记忆的依赖较少，并且擅长在给定查询时处理相关信息。若较大LM的回应与仅依赖引用文档的较小LM输出高度一致，则该回应被验证通过。存在差异的回应则通过反馈循环进行迭代改进。在三个开放领域问答数据集上的实验表明，在无需任何模型微调的情况下，平均绝对性能提升了1.5%至7%。

更新时间: 2024-06-24 07:39:26

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.05365v2

Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction

Urban traffic speed prediction aims to estimate future traffic speeds in order to improve urban transportation services. Enormous efforts have been made to exploit Graph Neural Networks (GNNs) for modeling the spatial correlations and temporal dependencies of evolving traffic speed patterns, regularized by the graph topology. While achieving promising results, current traffic speed prediction methods still ignore topology-free patterns, which cannot be captured by GNNs. To tackle this challenge, we propose a generic model that enables current GNN-based methods to preserve topology-free patterns. Specifically, we first develop a Dual Cross-Scale Transformer (DCST) architecture, including a Spatial Transformer and a Temporal Transformer, to preserve the cross-scale topology-free patterns and their associated dynamics, respectively. Then, to further integrate topology-regularized and topology-free patterns, we propose a distillation-style learning framework in which an existing GNN-based method serves as the teacher model and the proposed DCST architecture serves as the student model. The teacher model injects the learned topology-regularized patterns into the student model, which integrates them with the topology-free patterns. Extensive experimental results demonstrate the effectiveness of our methods.
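The abstract does not give the distillation objective. A common form, assumed here rather than taken from the paper, mixes a supervised error against the ground-truth speeds with an imitation term that pulls the student (DCST) toward the GNN teacher's predictions; `alpha` is a hypothetical weighting.

```python
from typing import Sequence


def distillation_loss(student: Sequence[float], teacher: Sequence[float],
                      target: Sequence[float], alpha: float = 0.5) -> float:
    """alpha * MAE(student, ground truth) + (1 - alpha) * MSE(student, teacher)."""
    n = len(target)
    mae_truth = sum(abs(s - y) for s, y in zip(student, target)) / n
    mse_teacher = sum((s - t) ** 2 for s, t in zip(student, teacher)) / n
    return alpha * mae_truth + (1 - alpha) * mse_teacher


# Speeds (km/h) for two road segments: student (DCST), frozen GNN teacher, ground truth.
print(distillation_loss([50.0, 60.0], [52.0, 60.0], [51.0, 58.0]))  # 1.75
```

Only the student is updated; the teacher's predictions act as soft targets that carry the topology-regularized patterns into a model that has no graph input of its own.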

Updated: 2024-06-24 07:32:58

标题: 让图神经网络再次伟大：一种用于交通速度预测的无拓扑模式的通用集成范式

摘要: 城市交通速度预测旨在估计未来的交通速度，以改善城市交通服务。为了利用图神经网络（GNNs）来建模交通速度演化模式的空间相关性和时间依赖性，已经付出了巨大的努力，并受到图拓扑的规范化。尽管取得了令人满意的结果，但目前的交通速度预测方法仍然存在忽视无拓扑模式的问题，这些模式无法被GNNs捕捉到。为了解决这一挑战，我们提出了一个通用模型，使当前基于GNN的方法能够保留无拓扑模式。具体地，我们首先开发了一个双交叉尺度变换器（DCST）架构，包括一个空间变换器和一个时间变换器，分别用于保留跨尺度的无拓扑模式和相关动态。然后，为了进一步整合拓扑规范化/无拓扑模式，我们提出了一个蒸馏式学习框架，其中现有的基于GNN的方法被视为教师模型，而提出的DCST架构被视为学生模型。教师模型将学习到的拓扑规范化模式注入到学生模型中，以整合无拓扑模式。大量的实验结果证明了我们方法的有效性。

更新时间: 2024-06-24 07:32:58

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.16992v1

The Uli Dataset: An Exercise in Experience Led Annotation of oGBV

Online gender-based violence has grown concomitantly with the adoption of the internet and social media. Its effects are worse in the Global Majority, where many users use social media in languages other than English. The scale and volume of conversations on the internet have created a need for automated detection of hate speech, and more specifically of gendered abuse. There is, however, a lack of language-specific and contextual data for building such automated tools. In this paper, we present a dataset on gendered abuse in three languages: Hindi, Tamil, and Indian English. The dataset comprises tweets annotated along three questions pertaining to the experience of gendered abuse, by experts who identify as women or as members of the LGBTQIA community in South Asia. Through this dataset, we demonstrate a participatory approach to creating datasets that drive AI systems.

Updated: 2024-06-24 07:31:19

标题: 《Uli数据集：基于经验的oGBV标注练习》

摘要: 在线性别暴力随着互联网和社交媒体的普及而不断增长。其影响在全球多数世界（Global Majority）更为严重，那里的许多用户以英语以外的语言使用社交媒体。互联网上对话的规模和数量使得自动检测仇恨言论，尤其是性别化辱骂，变得必要。然而，目前缺乏构建此类自动化工具所需的特定语言和特定语境的数据。在本文中，我们提供了一个关于三种语言（印地语、泰米尔语和印度英语）中性别化辱骂的数据集。该数据集由推文组成，由自我认同为南亚女性或LGBTQIA社区成员的专家，围绕与性别辱骂经历相关的三个问题进行标注。通过这个数据集，我们展示了一种以参与式方法创建驱动人工智能系统的数据集的做法。

更新时间: 2024-06-24 07:31:19

领域: cs.CL,cs.AI,cs.SI

下载: http://arxiv.org/abs/2311.09086v3

Machine Learning with Real-time and Small Footprint Anomaly Detection System for In-Vehicle Gateway

An Anomaly Detection System (ADS) is an essential part of a modern gateway Electronic Control Unit (ECU) for detecting abnormal behaviors and attacks in vehicles. Among existing attacks, the "one-time" attack is the most challenging to detect, especially under the strict constraints of a gateway ECU: a real-time budget at the microsecond or even nanosecond level and a limited code footprint. To address these challenges, we first propose using self-information theory to generate values for training and testing models, aiming to achieve real-time detection performance for the "one-time" attack, which has not been well studied in the past. Second, self-information is generated with a logarithm calculation, which yields the smallest footprint and thus reduces the cost in the gateway. Finally, our proposed method uses an unsupervised model that requires no training data for anomalies or attacks. We compare different machine learning methods, ranging from typical machine learning models to deep learning models, e.g., the Hidden Markov Model (HMM), Support Vector Data Description (SVDD), and Long Short-Term Memory (LSTM). Experimental results show that our proposed method achieves an 8.7 times lower False Positive Rate (FPR), 1.77 times faster testing time, and a 4.88 times smaller footprint.
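The abstract states that the method scores traffic by self-information, I(x) = -log p(x), learned without any attack data. Which quantity x is scored is not stated; the sketch below assumes, hypothetically, that x is a CAN arbitration ID and uses a fixed threshold. After training, scoring a frame is a single table lookup, which is in keeping with the small-footprint argument.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable


def fit_self_information(normal_traffic: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Estimate p(x) from attack-free traffic and store I(x) = -log2 p(x) per observed value."""
    counts = Counter(normal_traffic)
    total = sum(counts.values())
    return {x: -math.log2(c / total) for x, c in counts.items()}


def is_anomalous(x: Hashable, table: Dict[Hashable, float], threshold: float) -> bool:
    """Flag values whose self-information exceeds the threshold;
    values never seen in normal traffic are treated as infinitely surprising."""
    return table.get(x, math.inf) > threshold


# Hypothetical CAN arbitration IDs observed during normal driving.
normal = [0x100, 0x100, 0x100, 0x100, 0x200, 0x200, 0x300, 0x300]
table = fit_self_information(normal)
print(table[0x100], table[0x200])                 # 1.0 2.0
print(is_anomalous(0x100, table, threshold=2.5))  # False
print(is_anomalous(0x7FF, table, threshold=2.5))  # True
```

Because each frame is scored on its own, a single injected frame with a rare or unseen value can be flagged immediately, which is the situation a "one-time" attack creates.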

Updated: 2024-06-24 07:23:52

标题: 用于车载网关的实时小占用空间异常检测系统的机器学习

摘要: 异常检测系统（ADS）是现代网关电子控制单元（ECU）的重要组成部分，用于检测车辆中的异常行为和攻击。在现有的攻击中，“一次性”攻击是最具挑战性的，同时还要考虑网关ECU的严格约束，包括微秒甚至纳秒级实时预算和代码的有限占用空间。为了解决这些挑战，我们提出使用自信息理论生成用于训练和测试模型的值，旨在实现“一次性”攻击的实时检测性能，这在过去并未得到充分研究。其次，自信息的生成基于对数运算，这导致最小的占用空间以减少网关成本。最后，我们提出的方法使用无监督模型，无需训练数据来检测异常或攻击。我们比较了不同的机器学习方法，从典型的机器学习模型到深度学习模型，例如隐马尔可夫模型（HMM）、支持向量数据描述（SVDD）和长短期记忆（LSTM）。实验结果表明，我们提出的方法实现了8.7倍更低的误报率（FPR）、1.77倍更快的测试时间和4.88倍更小的占用空间。

更新时间: 2024-06-24 07:23:52

领域: cs.CR

下载: http://arxiv.org/abs/2406.16369v1

Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL

Reinforcement learning agents tend to develop habits that are effective only under specific policies. Following an initial exploration phase where agents try out different actions, they eventually converge onto a particular policy. As this occurs, the distribution over state-action trajectories becomes narrower, leading agents to repeatedly experience the same transitions. This repetitive exposure fosters spurious correlations between certain observations and rewards. Agents may then pick up on these correlations and develop simplistic habits tailored to the specific set of trajectories dictated by their policy. The problem is that these habits may yield incorrect outcomes when agents are forced to deviate from their typical trajectories, prompted by changes in the environment. This paper presents a mathematical characterization of this phenomenon, termed policy confounding, and illustrates, through a series of examples, the circumstances under which it occurs.

Updated: 2024-06-24 07:06:44

标题: 不良习惯：政策混淆和RL中的轨迹外泛化

摘要: 强化学习代理往往会形成只在特定策略下有效的习惯。在初始探索阶段中，代理尝试不同的行动，最终会收敛到特定的策略上。随着这一过程的发生，状态-行动轨迹的分布会变得更窄，导致代理重复经历相同的转换。这种重复的暴露会促进某些观测和奖励之间的虚假相关性。代理可能会注意到这些相关性，并形成针对特定轨迹集的简单习惯，受其策略所主导。问题在于，当代理被迫偏离典型轨迹时，由于环境变化所促使，这些习惯可能会产生错误的结果。本文对这种现象进行了数学描述，称为策略混淆，并通过一系列示例说明了其发生的情况。

更新时间: 2024-06-24 07:06:44

领域: cs.LG

下载: http://arxiv.org/abs/2306.02419v2

VCR: Visual Caption Restoration

We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.
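The idea of "adjustable caption visibility" can be illustrated with a toy masking step. This is a hypothetical sketch, not the VCR-Wiki construction code: it assumes a rendered text line occupies a known band of pixel rows and hides the middle of that band, leaving thin top and bottom slivers of the glyphs exposed as pixel-level hints.

```python
def mask_text_line(image, top, bottom, visible_fraction, fill=255):
    """Hide the middle of a rendered text line occupying rows [top, bottom).

    visible_fraction / 2 of the line height stays exposed at the top and at
    the bottom, so only thin slivers of the glyphs remain as pixel-level hints.
    Larger fractions give an easier task, smaller ones a harder task.
    """
    height = bottom - top
    keep = int(round(height * visible_fraction / 2))
    masked = [list(row) for row in image]  # copy so the original image is untouched
    for r in range(top + keep, bottom - keep):
        masked[r] = [fill] * len(masked[r])
    return masked


# A 10-row "text line" of dark pixels (0); white (255) is the mask colour.
line = [[0, 0, 0] for _ in range(10)]
easy = mask_text_line(line, 0, 10, visible_fraction=0.4)
hard = mask_text_line(line, 0, 10, visible_fraction=0.2)
```

Here the easy variant leaves four pixel rows of the line visible and the hard variant leaves two; a real pipeline would mask selected words or n-grams rather than whole rows.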

Updated: 2024-06-24 07:05:01

标题: VCR：视觉字幕修复

摘要: 我们介绍了Visual Caption Restoration (VCR)，这是一项新颖的视觉-语言任务，挑战模型使用图像内的像素级提示准确恢复部分遮挡的文本。这项任务源于这样一个观察结果：嵌入在图像中的文本与常见的视觉元素和自然语言本质上不同，因为需要将视觉、文本和嵌入在图像中的文本的模态进行对齐。虽然许多作品已经将嵌入在图像中的文本整合到视觉问答任务中，但这些任务的方法通常依赖于光学字符识别或蒙版语言建模，从而将任务主要减少为基于文本的处理。然而，在VCR中，基于文本的处理变得无效，因为准确的文本恢复取决于提供的图像、上下文和被遮挡文本的微小暴露区域的微妙线索的结合信息。我们开发了一个管道，使用图像-标题对为VCR任务生成合成图像，通过调整标题的可见性来控制任务的难度。通过这个管道，我们构建了一个名为VCR-Wiki的VCR数据集，使用来自维基百科的带有标题的图像，包括211万英文实体和34.6万中文实体，有易和难两种变体。我们的结果显示，当前的视觉语言模型在VCR任务中明显落后于人类表现，仅仅在我们的数据集上微调模型并没有明显改进。我们发布了VCR-Wiki和数据构建代码，以促进未来的研究。

更新时间: 2024-06-24 07:05:01

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.06462v2

Relation Extraction with Fine-Tuned Large Language Models in Retrieval Augmented Generation Frameworks

Information Extraction (IE) is crucial for converting unstructured data into structured formats like Knowledge Graphs (KGs). A key task within IE is Relation Extraction (RE), which identifies relationships between entities in text. Various RE methods exist, including supervised, unsupervised, weakly supervised, and rule-based approaches. Recent studies leveraging pre-trained language models (PLMs) have shown significant success in this area. In the current era dominated by Large Language Models (LLMs), fine-tuning these models can overcome limitations of zero-shot, prompting-based LLM methods for RE, especially regarding domain adaptation and identifying implicit relations between entities in sentences. These implicit relations, which cannot be easily extracted from a sentence's dependency tree, require logical inference for accurate identification. This work explores the performance of fine-tuned LLMs and their integration into a Retrieval-Augmented Generation (RAG)-based RE approach to address the challenges of identifying implicit relations at the sentence level, particularly when LLMs act as generators within the RAG framework. Empirical evaluations on the TACRED, TACRED-Revisited (TACREV), Re-TACRED, and SemEVAL datasets show significant performance improvements with fine-tuned LLMs, including Llama2-7B, Mistral-7B, and T5 (Large). Notably, our approach achieves substantial gains on SemEVAL, where implicit relations are common, surpassing previous results on this dataset. Additionally, our method outperforms previous works on TACRED, TACREV, and Re-TACRED, demonstrating strong performance across diverse evaluation scenarios.
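The retrieval half of a RAG-based RE setup can be sketched as follows. This assumes the retriever is a simple bag-of-words cosine similarity over labeled training sentences and that the retrieved examples are pasted into the generator's prompt as demonstrations. The prompt format, field names, and relation labels are hypothetical; the paper's actual retriever and its fine-tuned generators are not reproduced.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_demonstrations(query, train_examples, k=2):
    """Return the k labeled examples whose sentences are most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(train_examples,
                    key=lambda ex: cosine(q, bag_of_words(ex["sentence"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, head, tail, demos):
    """Few-shot prompt: retrieved demonstrations first, then the unlabeled query."""
    blocks = [f"Sentence: {ex['sentence']}\nHead: {ex['head']}\nTail: {ex['tail']}\n"
              f"Relation: {ex['relation']}\n" for ex in demos]
    blocks.append(f"Sentence: {query}\nHead: {head}\nTail: {tail}\nRelation:")
    return "\n".join(blocks)


train = [
    {"sentence": "Alice was born in Paris.", "head": "Alice", "tail": "Paris",
     "relation": "per:city_of_birth"},
    {"sentence": "Acme hired Bob as its chief engineer.", "head": "Bob", "tail": "Acme",
     "relation": "per:employee_of"},
    {"sentence": "Carol was born in Lyon in 1980.", "head": "Carol", "tail": "Lyon",
     "relation": "per:city_of_birth"},
]
demos = retrieve_demonstrations("Dave was born in Rome.", train, k=2)
prompt = build_prompt("Dave was born in Rome.", "Dave", "Rome", demos)
```

The generator (in the paper, a fine-tuned LLM) would complete the text after the final `Relation:`; a dense embedding retriever could replace `bag_of_words` without changing the rest.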

Updated: 2024-06-24 06:57:05

标题: 使用微调的大型语言模型在检索增强生成框架中进行关系抽取

摘要: 信息抽取（IE）对于将非结构化数据转换为知识图谱（KGs）等结构化格式至关重要。 IE中的一个关键任务是关系抽取（RE），它识别文本中实体之间的关系。存在各种关系抽取方法，包括监督、无监督、弱监督和基于规则的方法。最近利用预训练语言模型（PLMs）的研究在这一领域取得了显著成功。在当前以大型语言模型（LLMs）主导的时代，微调这些模型可以克服与零射击LLM提示为基础的RE方法相关的限制，特别是关于领域适应挑战和在句子中识别实体之间的隐含关系。这些隐含关系无法轻松从句子的依赖树中提取出来，需要逻辑推理来进行准确识别。本文探讨了微调LLMs的性能以及将它们集成到基于检索增强的（RAG）RE方法中，以解决在句子级别识别隐含关系的挑战，特别是当LLMs在RAG框架内充当生成器时。对TACRED、TACRED-Revisited（TACREV）、Re-TACRED和SemEVAL数据集的实证评估显示，微调的LLMs，包括Llama2-7B、Mistral-7B和T5（Large），显著提高了性能。值得注意的是，我们的方法在SemEVAL上取得了显著的增益，超越了该数据集上的先前结果。此外，我们的方法在TACRED、TACREV和Re-TACRED上优于先前的工作，展现出在各种评估场景中的卓越性能。

更新时间: 2024-06-24 06:57:05

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.14745v2

Towards Lightweight Graph Neural Network Search with Curriculum Graph Sparsification

Graph Neural Architecture Search (GNAS) has achieved superior performance on various graph-structured tasks. However, existing GNAS studies overlook the application of GNAS in resource-constrained scenarios. This paper proposes a joint graph data and architecture mechanism that identifies important sub-architectures via valuable graph data. To search for optimal lightweight Graph Neural Networks (GNNs), we propose a Lightweight Graph Neural Architecture Search with Graph SparsIfication and Network Pruning (GASSIP) method. In particular, GASSIP comprises an operation-pruned architecture search module to enable efficient lightweight GNN search. Meanwhile, we design a novel curriculum graph data sparsification module with an architecture-aware edge-removing difficulty measurement to help select optimal sub-architectures. With the aid of two differentiable masks, we iteratively optimize these two modules to efficiently search for the optimal lightweight architecture. Extensive experiments on five benchmarks demonstrate the effectiveness of GASSIP. In particular, our method achieves on-par or even higher node classification performance with searched GNNs that have half as many model parameters or fewer, and with a sparser graph.
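A curriculum edge-sparsification step can be sketched as below. This is a simplified stand-in, not GASSIP's architecture-aware difficulty measurement: it assumes each edge has one learnable mask logit, turns the logits into soft scores with a sigmoid, and removes a linearly growing fraction of the lowest-scoring edges as training proceeds. The schedule and names are hypothetical.

```python
import math


def edge_mask(logits):
    """Soft, differentiable-style mask: sigmoid of one learnable logit per edge."""
    return [1.0 / (1.0 + math.exp(-logit)) for logit in logits]


def removal_ratio(step, total_steps, final_ratio):
    """Linear curriculum: fraction of edges removed grows from 0 to final_ratio."""
    return final_ratio * min(step, total_steps) / total_steps


def sparsify_edges(edges, scores, ratio):
    """Drop the `ratio` fraction of edges with the lowest mask scores."""
    n_remove = int(len(edges) * ratio)
    dropped = set(sorted(range(len(edges)), key=lambda i: scores[i])[:n_remove])
    return [e for i, e in enumerate(edges) if i not in dropped]


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
scores = edge_mask([2.0, -2.0, 0.0, -1.0])  # hypothetical learned logits
halfway = sparsify_edges(edges, scores, removal_ratio(5, 10, 0.5))
final = sparsify_edges(edges, scores, removal_ratio(10, 10, 0.5))
```

Removing few edges early and more later is the "curriculum" part; in the paper the mask is optimized jointly with the architecture search rather than fixed as here.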

Updated: 2024-06-24 06:53:37

标题: 朝向轻量级图神经网络搜索：基于课程图稀疏化的方法

摘要: 图神经结构搜索（GNAS）在各种图结构任务上取得了优越的性能。然而，现有的GNAS研究忽视了在资源受限情况下应用GNAS的可能性。本文提出设计一种联合图数据和架构机制，通过有价值的图数据识别重要的子架构。为了搜索最佳轻量级图神经网络（GNN），我们提出了一种带有图稀疏化和网络修剪的轻量级图神经架构搜索（GASSIP）方法。具体而言，GASSIP包括一个操作修剪的架构搜索模块，以实现高效的轻量级GNN搜索。同时，我们设计了一个新颖的课程图数据稀疏化模块，具有架构感知的边删除难度测量，以帮助选择最佳子架构。借助两个可微分的掩码，我们迭代优化这两个模块，以高效地搜索最佳轻量级架构。对五个基准数据集进行的广泛实验验证了GASSIP的有效性。特别地，我们的方法在搜索到的GNN中的模型参数达到一半或更少，并且图更稀疏时，节点分类性能达到了与之相当甚至更高的水平。

更新时间: 2024-06-24 06:53:37

领域: cs.LG,cs.AI,cs.SI

下载: http://arxiv.org/abs/2406.16357v1

Continuous-time Autoencoders for Regular and Irregular Time Series Imputation

Time series imputation is one of the most fundamental tasks in time series analysis. Real-world time series datasets are frequently incomplete (or irregular, with missing observations), in which case imputation is essential. Many different time series imputation methods have been proposed, and recent self-attention-based methods show state-of-the-art imputation performance. However, designing an imputation method based on continuous-time recurrent neural networks (RNNs), i.e., neural controlled differential equations (NCDEs), has long been overlooked. To this end, we redesign time series (variational) autoencoders based on NCDEs. Our method, called continuous-time autoencoder (CTA), encodes an input time series sample into a continuous hidden path (rather than a hidden vector) and decodes it to reconstruct and impute the input. In our experiments with 4 datasets and 19 baselines, our method shows the best imputation performance in almost all cases.
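An NCDE evolves its hidden state as dz = f_theta(z) dX, driven by the increments of the observed path X. The sketch below is a generic, minimal explicit-Euler version of that update, not CTA's encoder or decoder: it assumes a tanh vector field with hypothetical weights and uses raw observation differences instead of the learned interpolation and adaptive solvers real implementations use. Putting time in one channel of X is how irregular gaps between observations affect the dynamics.

```python
import math


def vector_field(z, weights, bias, input_dim):
    """f_theta(z): map a hidden state of length h to an h x d matrix (tanh keeps it bounded).

    `weights` has h * d rows of length h; `bias` has h * d entries.
    """
    h = len(z)
    flat = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
            for row, b in zip(weights, bias)]
    return [flat[i * input_dim:(i + 1) * input_dim] for i in range(h)]


def ncde_euler(z0, path, weights, bias):
    """Integrate dz = f(z) dX along the observed path with explicit Euler steps.

    Returns the discretized hidden path (one state per observation).
    """
    d = len(path[0])
    z = list(z0)
    trajectory = [list(z)]
    for x_prev, x_next in zip(path, path[1:]):
        dx = [b - a for a, b in zip(x_prev, x_next)]
        F = vector_field(z, weights, bias, d)
        z = [zi + sum(f_ij * dx_j for f_ij, dx_j in zip(f_row, dx)) for zi, f_row in zip(z, F)]
        trajectory.append(list(z))
    return trajectory


# Hidden size h = 2, input channels d = 2 (e.g. [time, value]).
weights = [[0.0, 0.0]] * 4
traj = ncde_euler([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0]], weights, [1.0, 0.0, 0.0, 1.0])
```

With zero weights, f is the constant matrix diag(tanh 1, tanh 1), so one step along dX = [1, 0] moves the first hidden coordinate by tanh(1) and leaves the second unchanged.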

Updated: 2024-06-24 06:53:23

标题: 连续时间自编码器用于规则和不规则时间序列插补

摘要: 时间序列插补是时间序列中最基本的任务之一。现实世界中的时间序列数据集经常是不完整的（或者不规则，存在缺失观测值），在这种情况下，插补是必不可少的。已经提出了许多不同的时间序列插补方法。最近基于自注意力的方法展示了最先进的插补性能。然而，长期以来一直忽视了基于连续时间递归神经网络（RNNs），即神经控制微分方程（NCDEs）的插补方法的设计。为此，我们重新设计了基于NCDEs的时间序列（变分）自动编码器。我们的方法称为连续时间自编码器（CTA），将输入时间序列样本编码为连续隐藏路径（而不是隐藏向量），并解码以重建和插补输入。在我们对4个数据集和19个基线进行的实验中，我们的方法在几乎所有情况下都展示了最佳的插补性能。

更新时间: 2024-06-24 06:53:23

领域: cs.LG,cs.IR

下载: http://arxiv.org/abs/2312.16581v3

Compact Model Parameter Extraction via Derivative-Free Optimization

In this paper, we address the problem of compact model parameter extraction, simultaneously extracting tens of parameters via derivative-free optimization. Traditionally, parameter extraction is performed manually by dividing the complete set of parameters into smaller subsets, each targeting different operational regions of the device, a process that can take several days or even weeks. Our approach streamlines this process by employing derivative-free optimization to identify a parameter set that best fits the compact model to the data without performing an exhaustive number of simulations. We further adapt the optimization to critical issues in device modeling through a carefully chosen loss function that (i) evaluates model performance consistently across varying magnitudes by focusing on relative rather than absolute errors, (ii) prioritizes accuracy in key operational regions of the device above a certain threshold, and (iii) reduces sensitivity to outliers. Furthermore, we use a train-test split to assess the model fit and avoid overfitting: we fit 80% of the data and test the model's efficacy on the remaining 20%. We demonstrate the effectiveness of our methodology by successfully modeling two semiconductor devices: a diamond Schottky diode and a GaN-on-SiC HEMT, the latter involving the ASM-HEMT DC model, which requires simultaneously extracting 35 model parameters to fit the model to the measured data. These examples showcase the practical benefits of derivative-free optimization in device modeling.
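The loss design and train-test split described above can be illustrated on a toy device. This is a hedged sketch under strong assumptions: the "compact model" is just the ideal Shockley diode with two parameters (log10 of the saturation current and the ideality factor), the data are synthetic, the optimizer is a compact hand-written Nelder-Mead (one common derivative-free method, not necessarily the one the authors use), and the threshold and weights are hypothetical. A real flow would call a circuit simulator where `diode_current` is evaluated.

```python
import math
import random

VT = 0.02585  # thermal voltage near 300 K, in volts (assumed)


def diode_current(v, log10_is, n):
    """Ideal Shockley diode; a toy stand-in for a real compact model."""
    return 10.0 ** log10_is * (math.exp(v / (n * VT)) - 1.0)


def robust_relative_loss(params, data, threshold=1e-9, low_weight=0.1):
    """Soft-L1 penalty on relative error; points below `threshold` count less."""
    log10_is, n = params
    if not (-20.0 < log10_is < 0.0 and 0.5 < n < 5.0):
        return float("inf")  # keep the search inside a physical region
    total = 0.0
    for v, i_meas in data:
        r = (diode_current(v, log10_is, n) - i_meas) / i_meas  # relative, not absolute, error
        weight = 1.0 if abs(i_meas) >= threshold else low_weight
        total += weight * (math.sqrt(1.0 + r * r) - 1.0)  # grows only linearly for outliers
    return total / len(data)


def nelder_mead(f, x0, step=0.1, tol=1e-16, max_iter=5000):
    """Compact derivative-free Nelder-Mead (reflect, expand, contract, shrink)."""
    dim = len(x0)
    simplex = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(dim)]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(dim + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(p[i] for p in simplex[:-1]) / dim for i in range(dim)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:  # shrink every vertex toward the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    return simplex[0], values[0]


# Synthetic "measurements" from a known device, then an 80/20 train-test split.
voltages = [0.10 + 0.05 * k for k in range(15)]
measured = [(v, diode_current(v, -12.0, 1.5)) for v in voltages]
random.Random(0).shuffle(measured)
split = int(0.8 * len(measured))
train, test = measured[:split], measured[split:]

best, train_loss = nelder_mead(lambda p: robust_relative_loss(p, train), [-11.5, 1.4])
test_loss = robust_relative_loss(best, test)
```

The optimizer only ever evaluates the loss, never its gradient, which is what lets the same loop wrap a black-box simulator; the held-out `test_loss` plays the role of the 20% check for overfitting.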

Updated: 2024-06-24 06:52:50

标题: 通过无导数优化进行紧凑模型参数提取

摘要: 在本文中，我们解决了通过无导数优化同时提取数十个参数的紧凑模型参数提取问题。传统上，参数提取是通过手动将完整的参数集分成较小的子集进行的，每个子集针对设备的不同操作区域，这个过程可能需要几天甚至几周的时间。我们的方法通过使用无导数优化来简化这个过程，以确定最适合紧凑模型的参数集，而无需执行大量的仿真。我们进一步增强了优化过程，以解决设备建模中的关键问题，通过谨慎选择一个损失函数，该函数通过关注相对误差（而不是绝对误差）来一致评估模型在不同数量级下的性能，优先考虑设备的关键操作区域的准确性超过一定阈值，并减少对异常值的敏感性。此外，我们利用训练-测试分离的概念来评估模型拟合度并避免过度拟合。这是通过拟合80%的数据并使用剩余20%测试模型效果来实现的。我们通过成功对两种半导体器件进行建模来展示我们方法的有效性：一种是金刚石肖特基二极管，另一种是GaN-on-SiC HEMT，后者涉及ASM-HEMT DC模型，需要同时提取35个模型参数以将模型拟合到测量数据。这些示例展示了我们方法的有效性，并展示了无导数优化在设备建模中的实际益处。

更新时间: 2024-06-24 06:52:50

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2406.16355v1

ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

Diffusion models are instrumental in text-to-audio (TTA) generation. Unfortunately, they suffer from slow inference due to an excessive number of queries to the underlying denoising network per generation. To address this bottleneck, we introduce ConsistencyTTA, a framework requiring only a single non-autoregressive network query, thereby accelerating TTA by hundreds of times. We achieve this by proposing a "CFG-aware latent consistency model," which adapts consistency generation into a latent space and incorporates classifier-free guidance (CFG) into model training. Moreover, unlike diffusion models, ConsistencyTTA can be fine-tuned closed-loop with audio-space text-aware metrics, such as CLAP score, to further enhance the generations. Our objective and subjective evaluation on the AudioCaps dataset shows that compared to diffusion-based counterparts, ConsistencyTTA reduces inference computation by 400x while retaining generation quality and diversity.
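For reference, standard classifier-free guidance combines an unconditional and a text-conditional prediction at every sampling step, eps_hat = eps_u + w (eps_c - eps_u), which is part of why guided diffusion sampling is query-hungry. The minimal sketch below shows only that standard formula on plain lists; how ConsistencyTTA bakes guidance into its distilled single-step model is not reproduced or assumed here.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_weight):
    """Classifier-free guidance: move from the unconditional prediction
    toward (and, for weights above 1, past) the text-conditional one."""
    return [u + guidance_weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]


guided = cfg_combine([0.0, 1.0], [1.0, 1.0], guidance_weight=3.0)
```

A weight of 0 returns the unconditional prediction and a weight of 1 the conditional one; larger weights extrapolate along the conditional direction, here giving `[3.0, 1.0]`.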

Updated: 2024-06-24 06:51:55

标题: ConsistencyTTA：使用一致性蒸馏加速基于扩散的文本到音频生成

摘要: 扩散模型在文本到音频（TTA）生成中起着关键作用。不幸的是，由于每次生成时对底层去噪网络的查询过多，它们面临着推断速度缓慢的问题。为了解决这一瓶颈，我们引入了ConsistencyTTA，这是一个框架，只需要进行一次非自回归网络查询，从而将TTA加速数百倍。我们通过提出“CFG感知潜在一致性模型”来实现这一目标，该模型将一致性生成调整到潜在空间，并将无分类器指导（CFG）纳入模型训练中。此外，与扩散模型不同，ConsistencyTTA可以与音频空间文本感知度度量（如CLAP分数）进行闭环微调，以进一步增强生成效果。我们对AudioCaps数据集进行客观和主观评估表明，与基于扩散的对应模型相比，ConsistencyTTA将推断计算量减少了400倍，同时保持了生成质量和多样性。

更新时间: 2024-06-24 06:51:55

领域: cs.SD,cs.LG,cs.MM,eess.AS

下载: http://arxiv.org/abs/2309.10740v3

METRIK: Measurement-Efficient Randomized Controlled Trials using Transformers with Input Masking

Clinical randomized controlled trials (RCTs) collect hundreds of measurements spanning various metric types (e.g., laboratory tests, cognitive/motor assessments, etc.) across hundreds to thousands of subjects to evaluate the effect of a treatment, but at significant trial expense. To reduce the number of measurements, trial protocols can be revised to remove metrics extraneous to the study's objective, but doing so requires additional human labor and limits the set of hypotheses that can be studied with the collected data. In contrast, a planned missing design (PMD) can reduce the amount of data collected without removing any metric by imputing the unsampled data. Standard PMDs randomly sample data to leverage statistical properties of imputation algorithms, but they are ad hoc and hence suboptimal. Methods that learn PMDs produce more sample-efficient PMDs, but are not suitable for RCTs because they require ample prior data (150+ subjects) to model the data distribution. Therefore, we introduce a framework called Measurement EfficienT Randomized Controlled Trials using Transformers with Input MasKing (METRIK), which, for the first time, calculates a PMD specific to the RCT from a modest amount of prior data (e.g., 60 subjects). Specifically, METRIK models the PMD as a learnable input masking layer that is optimized with a state-of-the-art imputer based on the Transformer architecture. METRIK implements a novel sampling and selection algorithm to generate a PMD that satisfies the trial designer's objective, i.e., whether to maximize sampling efficiency or imputation performance for a given sampling budget. Evaluated across five real-world clinical RCT datasets, METRIK improves both the sampling efficiency of the generated PMD and the imputation performance under it by leveraging correlations over time and across metrics, thereby removing the need to manually remove metrics from the RCT.

Updated: 2024-06-24 06:47:47

标题: METRIK：使用输入屏蔽的变压器进行高效测量的随机对照试验

摘要: 临床随机对照试验（RCTs）收集了数百种不同类型的测量数据（例如实验室检测、认知/运动评估等），涵盖100到1000名受试者，以评估治疗效果，但这样做会带来昂贵的试验费用。为了减少测量数据的数量，试验方案可以进行修订，去除与研究目标无关的指标，但这需要额外的人力成本，并限制了可以使用收集到的数据研究的假设范围。相反，计划缺失设计（PMD）可以通过填补未采样数据来减少收集到的数据量，而不删除任何指标。标准PMD随机采样数据以利用填补算法的统计特性，但是这些方法是临时的，因此不够优化。学习PMD的方法可以产生更加高效的PMD，但对于RCTs并不适用，因为它们需要充分的先验数据（150+受试者）来建模数据分布。因此，我们引入了一个名为使用Transformer和输入掩码的测量效率随机对照试验（METRIK）的框架，该框架首次从适量的先验数据（例如60名受试者）中计算出特定于RCT的PMD。具体来说，METRIK将PMD建模为一个可学习的输入掩码层，该层通过基于Transformer架构的最先进的填补器进行优化。METRIK实施了一种新颖的采样和选择算法，以生成一个满足试验设计者目标的PMD，即是否在给定的采样预算下最大化采样效率或填补性能。通过对五个真实世界的临床RCT数据集进行评估，METRIK通过利用时间和指标之间的相关性提高了在生成的PMD下的采样效率和填补性能，从而消除了需要手动移除RCT中指标的需求。

更新时间: 2024-06-24 06:47:47

领域: cs.LG,stat.ME

下载: http://arxiv.org/abs/2406.16351v1

AnnotatedTables: A Large Tabular Dataset with Language Model Annotations

Tabular data is ubiquitous in real-world applications and abundant on the web, yet its annotation has traditionally required human labor, posing a significant scalability bottleneck for tabular machine learning. Our methodology can successfully annotate a large amount of tabular data and can be flexibly steered to generate various types of annotations based on specific research objectives, as we demonstrate with SQL annotation and input-target column annotation as examples. As a result, we release AnnotatedTables, a collection of 32,119 databases with LLM-generated annotations. The dataset includes 405,616 valid SQL programs, making it the largest SQL dataset with associated tabular data that supports query execution. To further demonstrate the value of our methodology and dataset, we perform two follow-up research studies. 1) We investigate whether LLMs can translate SQL programs to Rel programs, a database language previously unknown to LLMs, while obtaining the same execution results. Using our Incremental Prompt Engineering methods based on execution feedback, we show that LLMs can produce adequate translations with few-shot learning. 2) We evaluate the performance of TabPFN, a recent neural tabular classifier trained on Bayesian priors, on 2,720 tables with input-target columns identified and annotated by LLMs. On average, TabPFN performs on par with the baseline AutoML method, though the relative performance can vary significantly from one data table to another, making both models viable for practical applications depending on the situation. Our findings underscore the potential of LLMs in automating the annotation of large volumes of diverse tabular data.
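Counting only "valid" SQL programs implies checking each generated query against real data. The sketch below illustrates one way to do that filtering with Python's built-in `sqlite3` module. It is an assumption that the databases can be loaded into SQLite; the schema, rows, and candidate queries are hypothetical stand-ins for LLM output.

```python
import sqlite3


def keep_executable_sql(schema_sql, rows_sql, candidate_queries):
    """Return only the candidate queries that execute without error on the given database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        conn.executescript(rows_sql)
        valid = []
        for query in candidate_queries:
            try:
                conn.execute(query).fetchall()  # execution feedback: does it run?
            except sqlite3.Error:
                continue  # syntax errors, unknown columns, etc.
            valid.append(query)
        return valid
    finally:
        conn.close()


schema = "CREATE TABLE sales (region TEXT, amount REAL);"
rows = "INSERT INTO sales VALUES ('north', 10.0), ('south', 5.5);"
candidates = [
    "SELECT region, SUM(amount) FROM sales GROUP BY region",
    "SELECT revenue FROM sales",  # hallucinated column
    "SELEC * FROM sales",          # syntax error
]
valid = keep_executable_sql(schema, rows, candidates)
```

The caught error messages could also be fed back into the next prompt, which is the general shape of the execution-feedback loop the abstract describes for SQL-to-Rel translation.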

Updated: 2024-06-24 06:44:14

标题: AnnotatedTables：一个带有语言模型标注的大型表格数据集

摘要: 表格数据在现实世界的应用中无处不在，在网络上也非常丰富，但其注释传统上需要人工劳动，为表格机器学习带来了重要的可扩展性瓶颈。我们的方法可以成功地注释大量的表格数据，并且可以灵活地引导生成基于特定研究目标的各种类型的注释，正如我们以SQL注释和输入-目标列注释为例所示。因此，我们发布了AnnotatedTables，这是一个包含32,119个数据库的收藏，带有LLM生成的注释。该数据集包括405,616个有效的SQL程序，使其成为支持查询执行的带有相关表格数据的最大SQL数据集。为了进一步展示我们的方法论和数据集的价值，我们进行了两项后续研究。 1）我们调查LLMs是否可以将SQL程序翻译成Rel程序，这是LLMs之前不熟悉的数据库语言，同时获得相同的执行结果。我们利用基于执行反馈的增量提示工程方法，展示了LLMs可以通过少量学习产生足够的翻译。2）我们评估了TabPFN的性能，这是一个最近在贝叶斯先验条件下训练的神经表格分类器，在2,720个由LLMs识别和注释的表格上。平均而言，TabPFN与基线AutoML方法的表现相当，尽管相对性能在不同数据表之间可能有显著差异，使得两种模型在具体应用中都是可行的，取决于情况。我们的发现强调了LLMs在自动化大量多样化表格数据注释方面的潜力。

更新时间: 2024-06-24 06:44:14

领域: cs.LG

下载: http://arxiv.org/abs/2406.16349v1

VulZoo: A Comprehensive Vulnerability Intelligence Dataset

Software vulnerabilities pose critical security and risk concerns for many software systems. Many techniques have been proposed to effectively assess and prioritize these vulnerabilities before they cause serious consequences. To evaluate their performance, these solutions often craft their own experimental datasets from limited information sources, such as MITRE CVE and NVD, lacking a global overview of broad vulnerability intelligence. The repetitive data preparation process further complicates the verification and comparison of new solutions. To resolve this issue, in this paper, we propose VulZoo, a comprehensive vulnerability intelligence dataset that covers 17 popular vulnerability information sources. We also construct connections among these sources, enabling more straightforward configuration and adaptation for different vulnerability assessment tasks (e.g., vulnerability type prediction). Additionally, VulZoo provides utility scripts for automatic data synchronization and cleaning, relationship mining, and statistics generation. We make VulZoo publicly available and maintain it with incremental updates to facilitate future research. We believe that VulZoo serves as a valuable input to vulnerability assessment and prioritization studies. The dataset with utility scripts is available at https://github.com/NUS-Curiosity/VulZoo.

Updated: 2024-06-24 06:39:07

标题: VulZoo：综合漏洞情报数据集

摘要: 软件漏洞对许多软件系统都构成了关键的安全和风险问题。许多技术已被提出，以有效评估和优先考虑这些漏洞，在它们造成严重后果之前。为了评估它们的性能，这些解决方案通常会从有限的信息来源（如MITRE CVE和NVD）中制定自己的实验数据集，缺乏对广泛漏洞情报的全局概述。重复的数据准备过程进一步使新解决方案的验证和比较变得复杂。为了解决这个问题，在本文中，我们提出了VulZoo，一个涵盖17个流行漏洞信息来源的综合漏洞情报数据集。我们还在这些来源之间建立连接，使得对于不同漏洞评估任务（如漏洞类型预测）更加简单的配置和适应成为可能。此外，VulZoo提供了用于自动数据同步和清理、关系挖掘和统计生成的实用脚本。我们将VulZoo公开并定期更新，以促进未来研究。我们相信VulZoo对于漏洞评估和优先考虑研究是一种有价值的输入。包含实用脚本的数据集可以在https://github.com/NUS-Curiosity/VulZoo上找到。

更新时间: 2024-06-24 06:39:07

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2406.16347v1

Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks

Large language models (LLMs) and large visual language models (LVLMs) have been at the forefront of the artificial intelligence field, particularly for tasks like text generation, video captioning, and question-answering. Typically, these models are trained on broad knowledge bases or datasets to increase generalizability, learn relationships between topics, and recognize patterns. Instead, we propose to provide instructional datasets specific to the task of each modality within a distinct domain and then fine-tune the model's parameters using LoRA. With our approach, we can eliminate all noise irrelevant to the given task while also ensuring that the model generates with enhanced precision. For this work, we use Video-LLaVA to generate recipes from cooking videos without transcripts. Video-LLaVA's multimodal architecture allows us to provide cooking images to its image encoder, cooking videos to its video encoder, and general cooking questions to its text encoder. Thus, we aim to remove all noise unrelated to cooking while improving our model's capabilities to generate specific ingredient lists and detailed instructions. As a result, our approach to fine-tuning Video-LLaVA improves on the baseline Video-LLaVA by 2% on the YouCook2 dataset. While this may seem like a marginal increase, our model trains on an image instruction dataset 2.5% the size of Video-LLaVA's and a video instruction dataset 23.76% the size of Video-LLaVA's.
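LoRA keeps a pretrained weight W frozen and learns a low-rank update, so a layer computes y = W x + (alpha / r) B A x with A of shape r x d_in and B of shape d_out x r. The minimal pure-Python sketch below shows that forward pass and the resulting trainable-parameter count; the matrices are toy values, and the sketch is not the authors' Video-LLaVA fine-tuning setup.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x, W, A, B, alpha):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    r = len(A)  # rank = number of rows of A
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [y + (alpha / r) * u for y, u in zip(base, update)]


def lora_trainable_params(d_in, d_out, r):
    """A contributes r * d_in parameters and B contributes d_out * r."""
    return r * (d_in + d_out)


W = [[1, 0], [0, 1]]   # frozen 2 x 2 weight
A = [[1, 1]]           # rank r = 1
B = [[1], [0]]
y = lora_forward([1, 2], W, A, B, alpha=2)
```

For a 4096 x 4096 projection with r = 8, `lora_trainable_params(4096, 4096, 8)` is 65,536 trainable values versus 16,777,216 frozen ones, about 0.39%. B is usually initialized to zeros so training starts from the unmodified model.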

Updated: 2024-06-24 06:39:02

标题: 定向域微调：为特定训练任务定制独立模态

摘要: 大型语言模型（LLMs）和大型视觉语言模型（LVLMs）一直处于人工智能领域的前沿，特别是在文本生成、视频字幕和问答等任务中。通常，更适用于在更广泛的知识库或数据集上训练这些模型，以增加泛化能力，学习主题之间的关系并识别模式。相反，我们提出提供针对每种模态任务的教学数据集，并使用LORA微调模型的参数。通过我们的方法，我们可以消除与给定任务无关的所有噪音，同时确保模型生成的精度提高。在这项工作中，我们使用Video-LLaVA在没有转录的情况下生成烹饪视频的食谱。Video-LLaVA的多模态架构使我们可以将烹饪图像提供给其图像编码器，将烹饪视频提供给其视频编码器，并将一般烹饪问题提供给其文本编码器。因此，我们的目标是消除与烹饪无关的所有噪音，同时提高我们模型生成特定成分列表和详细指导的能力。结果，我们微调Video-LLaVA的方法比基线Video-LLaVA在YouCook2数据集上提高了2%。虽然这可能看起来是一个边际增长，但我们的模型在图像指导数据集上训练的大小是Video-LLaVA的2.5%，视频指导数据集是Video-LLaVA的23.76%。

更新时间: 2024-06-24 06:39:02

领域: cs.CV,cs.AI,F.2.2; I.2.7

下载: http://arxiv.org/abs/2406.16346v1

Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents

Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage remarkable reasoning and language processing capabilities to take a proactive, autonomous role in pursuing users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing agents that address the challenges of goal-seeking (including generating instrumental goals and plans), such as hallucinations inherent in foundation models, explainability of the reasoning process, complex accountability, etc. To address this issue, we have performed a systematic literature review to understand the state-of-the-art foundation model-based agents and the broader ecosystem. In this paper, we present a pattern catalogue, derived from that literature review, consisting of 17 architectural patterns with analyses of their context, forces, and trade-offs. The proposed catalogue can provide holistic guidance for the effective use of patterns, and support the architecture design of foundation model-based agents by facilitating goal-seeking and plan generation.

Updated: 2024-06-24 06:32:27

标题: Agent Design Pattern Catalogue：基于基础模型代理的架构模式集合

摘要: 基于基础模型的生成式人工智能为开发和实施代理提供了便利，这些代理可以利用突出的推理和语言处理能力，以主动、自主的方式追求用户的目标。然而，目前缺乏系统性知识来指导从业者设计代理时考虑到目标追求的挑战（包括生成工具性目标和计划），比如基础模型中固有的幻觉、推理过程的可解释性、复杂的问责制等问题。为了解决这个问题，我们进行了系统性文献回顾，以了解基于基础模型的代理和更广泛生态系统的最新发展。在本文中，我们提出了一个包含17种架构模式的模式目录，分析了先前文献回顾的上下文、力量和权衡结果。提出的目录可以为有效使用模式提供全面指导，并通过促进目标追求和计划生成来支持基于基础模型的代理的架构设计。

更新时间: 2024-06-24 06:32:27

领域: cs.AI,cs.SE

下载: http://arxiv.org/abs/2405.10467v3

In-context Pretraining: Language Modeling Beyond Document Boundaries

Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts, but the prior documents provide no signal for predicting the next document. We instead present In-Context Pretraining, a new approach where language models are pretrained on a sequence of related documents, thereby explicitly encouraging them to read and reason across document boundaries. We can do In-Context Pretraining by simply changing the document ordering so that each context contains related documents, and directly applying existing pretraining pipelines. However, this document sorting problem is challenging. There are billions of documents and we would like the sort to maximize contextual similarity for every document without repeating any data. To do this, we introduce approximate algorithms for finding related documents with efficient nearest neighbor search and constructing coherent input contexts with a graph traversal algorithm. Our experiments show In-Context Pretraining offers a simple and scalable approach to significantly enhance LMs' performance: we see notable improvements in tasks that require more complex contextual reasoning, including in-context learning (+8%), reading comprehension (+15%), faithfulness to previous contexts (+16%), long-context reasoning (+5%), and retrieval augmentation (+9%).
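The ordering problem (visit every document exactly once while keeping neighbors similar) can be sketched with a greedy nearest-neighbor traversal. This is a simplified illustration, assuming word-level Jaccard similarity and an O(n^2) search; the paper instead uses efficient approximate nearest-neighbor search and its own graph traversal at the scale of billions of documents.

```python
def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def order_documents(docs):
    """Greedy path: start at document 0, then repeatedly hop to the most similar
    unvisited document. Each document is placed exactly once (no repeated data)."""
    if not docs:
        return []
    unvisited = set(range(1, len(docs)))
    order = [0]
    while unvisited:
        last = order[-1]
        # ties are broken toward the lower index to keep the result deterministic
        nxt = max(unvisited, key=lambda j: (jaccard(docs[last], docs[j]), -j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def pack_contexts(docs, order, docs_per_context):
    """Chunk the ordered documents into consecutive pretraining contexts."""
    return [[docs[i] for i in order[k:k + docs_per_context]]
            for k in range(0, len(order), docs_per_context)]


docs = ["cats purr softly", "dogs bark loudly", "cats purr and sleep", "dogs bark at night"]
order = order_documents(docs)
contexts = pack_contexts(docs, order, docs_per_context=2)
```

Because the traversal only reorders documents, the packed contexts can be fed to an unchanged pretraining pipeline, which is the property the abstract emphasizes.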

Updated: 2024-06-24 06:28:42

标题: 上下文预训练：超越文档边界的语言建模

摘要: 大型语言模型（LMs）目前被训练为根据文档前缀预测标记，使它们能够直接执行长篇生成和提示式任务，这些任务可以简化为文档完成。现有的预训练流水线通过连接随机集合的短文档来训练LMs，以创建输入上下文，但先前的文档对预测下一个文档没有信号。相反，我们提出了一种新方法，称为上下文预训练，在这种方法中，语言模型在一系列相关文档上进行预训练，从而明确鼓励它们跨越文档边界阅读和推理。我们可以通过简单地改变文档排序来进行上下文预训练，使每个上下文包含相关文档，并直接应用现有的预训练流水线。然而，这个文档排序问题是具有挑战性的。有数十亿个文档，我们希望排序能够最大化每个文档的上下文相似性，而不重复任何数据。为了做到这一点，我们引入了用于找到相关文档的近似算法，具有高效的最近邻搜索，并使用图遍历算法构建连贯的输入上下文。我们的实验证明，上下文预训练提供了一种简单且可扩展的方法，显著提高了LMs的性能：在需要更复杂的上下文推理的任务中，我们看到了显著的改进，包括上下文学习（+8％），阅读理解（+15％），忠实于先前上下文（+16％），长篇推理（+5％）和检索增强（+9％）。

更新时间: 2024-06-24 06:28:42

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2310.10638v6

Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models

The rapid advancement of Text-to-Image (T2I) generative models has enabled the synthesis of high-quality images guided by textual descriptions. Despite this significant progress, these models are often prone to generating content that contradicts the input text, which poses a challenge to their reliability and practical deployment. To address this problem, we introduce a novel diffusion-based framework that significantly enhances the alignment of generated images with their corresponding descriptions, addressing the inconsistency between visual output and textual input. Our framework is built upon a comprehensive analysis of inconsistency phenomena, categorizing them based on their manifestation in the image. Leveraging a state-of-the-art large language model, we first extract objects and construct a knowledge graph to predict the locations of these objects in potentially generated images. We then integrate a state-of-the-art controllable image generation model with a visual text generation module to generate an image that is consistent with the original prompt, guided by the predicted object locations. Through extensive experiments on an advanced multimodal hallucination benchmark, we demonstrate the efficacy of our approach in accurately generating images that are consistent with the original prompt. The code can be accessed via https://github.com/TruthAI-Lab/PCIG.

Updated: 2024-06-24 06:12:16

标题: 即时一致性图像生成（PCIG）：集成LLMs、知识图谱和可控扩散模型的统一框架

摘要: 文本到图像(T2I)生成模型的快速发展使得能够通过文本描述合成高质量图像成为可能。尽管取得了显著进展，但这些模型往往容易生成与输入文本相矛盾的内容，这对它们的可靠性和实际部署构成挑战。为了解决这一问题，我们引入了一种新颖的基于扩散的框架，显著增强了生成图像与其对应描述之间的对齐，解决了视觉输出与文本输入之间的不一致性。我们的框架建立在对不一致现象的全面分析基础上，根据它们在图像中的表现进行分类。利用最先进的大型语言模块，我们首先提取对象并构建知识图谱，以预测这些对象在可能生成的图像中的位置。然后，我们将最先进的可控图像生成模型与视觉文本生成模块结合起来，通过预测的对象位置指导生成与原始提示一致的图像。通过对一个先进的多模态幻觉基准数据集进行大量实验，我们展示了我们的方法在准确生成图像而无与原始提示不一致的效果。可以通过https://github.com/TruthAI-Lab/PCIG访问代码。

更新时间: 2024-06-24 06:12:16

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.16333v1

Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging

While large language models (LLMs) excel in many domains, their complexity and scale challenge deployment in resource-limited environments. Current compression techniques, such as parameter pruning, often fail to effectively utilize the knowledge from pruned parameters. To address these challenges, we propose Manifold-Based Knowledge Alignment and Layer Merging Compression (MKA), a novel approach that uses manifold learning and the Normalized Pairwise Information Bottleneck (NPIB) measure to merge similar layers, reducing model size while preserving essential performance. We evaluate MKA on multiple benchmark datasets and various LLMs. Our findings show that MKA not only preserves model performance but also achieves substantial compression ratios, outperforming traditional pruning methods. Moreover, when coupled with quantization, MKA delivers even greater compression. Specifically, on the MMLU dataset using the Llama3-8B model, MKA achieves a compression ratio of 43.75% with a minimal performance decrease of only 2.82%. The proposed MKA method offers a resource-efficient and performance-preserving model compression technique for LLMs.
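The "merge similar layers" step can be illustrated with a simplified stand-in. This sketch does not implement manifold learning or the NPIB measure: it assumes cosine similarity between layers' (mean) output representations as the similarity score, and merges the most similar adjacent pair by averaging their flattened weights. Both choices are hypothetical simplifications.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar_adjacent(layer_outputs):
    """Index i such that layers i and i+1 produce the most similar representations."""
    scores = [cosine(layer_outputs[i], layer_outputs[i + 1])
              for i in range(len(layer_outputs) - 1)]
    return max(range(len(scores)), key=scores.__getitem__)


def merge_layers(layer_weights, i, alpha=0.5):
    """Replace layers i and i+1 by one layer whose weights are a convex combination."""
    merged = [alpha * a + (1 - alpha) * b
              for a, b in zip(layer_weights[i], layer_weights[i + 1])]
    return layer_weights[:i] + [merged] + layer_weights[i + 2:]


outputs = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.1], [1.0, 1.0]]  # toy per-layer representations
weights = [[1.0], [2.0], [4.0], [8.0]]                      # toy flattened layer weights
i = most_similar_adjacent(outputs)
compressed = merge_layers(weights, i)
```

Repeating the select-and-merge step removes one layer per iteration, so the compression ratio is controlled by how many merges are performed.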

Updated: 2024-06-24 05:57:55

标题: 修剪通过合并：基于流形对齐的层合并压缩LLMs

摘要: 尽管大型语言模型（LLMs）在许多领域表现出色，但其复杂性和规模挑战了在资源有限的环境中的部署。当前的压缩技术，如参数修剪，通常未能有效利用修剪参数中的知识。为了解决这些挑战，我们提出了一种基于流形学习和标准化成对信息瓶颈（NPIB）度量的Manifold-Based Knowledge Alignment and Layer Merging Compression（MKA）的新方法，通过合并相似的层来减小模型大小，同时保留必要的性能。我们在多个基准数据集和各种LLMs上评估了MKA。我们的研究结果表明，MKA不仅保留了模型性能，还实现了显著的压缩比，优于传统的修剪方法。此外，当与量化结合时，MKA可以实现更大的压缩。具体来说，在MMLU数据集上使用Llama3-8B模型，MKA实现了43.75%的压缩比，仅减少了2.82%的性能。提出的MKA方法为LLMs提供了一种资源高效且保持性能的模型压缩技术。

更新时间: 2024-06-24 05:57:55

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.16330v1

PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

Recent advancements in large language models (LLMs) have raised concerns about inference costs, increasing the need for research into model compression. While knowledge distillation (KD) is a prominent method for this, research on KD for generative language models like LLMs is relatively sparse, and the approach of distilling student-friendly knowledge, which has shown promising performance in KD for classification models, remains unexplored in generative language models. To explore this approach, we propose PromptKD, a simple yet effective method that utilizes prompt tuning (for the first time in KD) to enable generative language models to transfer student-friendly knowledge. Unlike previous works in classification that require fine-tuning the entire teacher model for extracting student-friendly knowledge, PromptKD achieves similar effects by adding a small number of prompt tokens and tuning only the prompt with student guidance. Extensive experiments on instruction-following datasets show that PromptKD achieves state-of-the-art performance while adding only 0.0007% of the teacher's parameters as prompts. Further analysis suggests that distilling student-friendly knowledge effectively alleviates exposure bias throughout the entire training process, leading to performance enhancements.
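Two quantities in this abstract can be made concrete: the token-level divergence that distillation minimizes between teacher and student next-token distributions, and the share of parameters a soft prompt adds. The sketch below assumes the forward KL(teacher || student) as a generic KD objective (the paper's exact loss may differ), and the prompt-size numbers in the usage are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions over the vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def prompt_param_fraction(num_prompt_tokens, hidden_size, total_params):
    """Share of parameters added when only a soft prompt (one embedding per token) is trained."""
    return num_prompt_tokens * hidden_size / total_params


teacher = softmax([2.0, 1.0, 0.0])   # toy teacher next-token distribution
student = softmax([0.0, 1.0, 2.0])   # toy student distribution
loss = kl_divergence(teacher, student)
```

Only the prompt embeddings would receive gradients in prompt tuning, which is why the added fraction stays tiny: for example, 10 tokens of width 100 on a 1M-parameter model is 0.1%.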

Updated: 2024-06-24 05:40:38

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.12842v2

Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning

Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs). Its modular, plug-and-play nature allows various domain-specific LoRAs to be integrated, enhancing LLM capabilities. Open-source platforms like Hugging Face and ModelScope have introduced a new computational paradigm, Uploadable Machine Learning (UML). In UML, contributors use decentralized data to train specialized adapters, which are then uploaded to a central platform to improve LLMs. This platform uses these domain-specific adapters to handle mixed-task requests requiring personalized service. Previous research on LoRA composition either focuses on specific tasks or fixes the LoRA selection during training. However, in UML, the pool of LoRAs is dynamically updated with new uploads, requiring a selection mechanism that generalizes to unseen LoRAs. Additionally, the mixed-task nature of downstream requests necessitates personalized services. To address these challenges, we propose Retrieval-Augmented Mixture of LoRA Experts (RAMoLE), a framework that adaptively retrieves and composes multiple LoRAs based on input prompts. RAMoLE has three main components: LoraRetriever for identifying and retrieving relevant LoRAs, an on-the-fly MoLE mechanism for coordinating the retrieved LoRAs, and efficient batch inference for handling heterogeneous requests. Experimental results show that RAMoLE consistently outperforms baselines, highlighting its effectiveness and scalability.
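Here is a toy sketch of retrieve-then-compose. It assumes each uploaded LoRA is indexed by an embedding vector and contributes a low-rank update B·A. Mixing with a softmax over retrieval scores is a stand-in for RAMoLE's learned MoLE gate. Square d × d updates are assumed for brevity, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def retrieve_and_mix(query, pool, k=2, temperature=1.0):
    """Retrieve the k LoRAs whose embeddings best match the query, then mix them.

    pool: list of dicts with 'emb' (vector), 'B' (d x r) and 'A' (r x d).
    Mixing weights are a softmax over retrieval scores (stand-in for a learned gate).
    Returns (retrieved indices, combined update delta_W = sum_i w_i * B_i @ A_i).
    """
    scores = [cosine(query, lora["emb"]) for lora in pool]
    top = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i] / temperature) for i in top]
    weights = [e / sum(exps) for e in exps]
    d = len(pool[top[0]]["B"])
    delta = [[0.0] * d for _ in range(d)]
    for w, i in zip(weights, top):
        update = matmul(pool[i]["B"], pool[i]["A"])
        delta = [[x + w * y for x, y in zip(r1, r2)] for r1, r2 in zip(delta, update)]
    return top, delta


B, A = [[1.0], [0.0]], [[0.0, 1.0]]  # rank-1 adapters, identical for simplicity
pool = [{"emb": [1.0, 0.0], "B": B, "A": A},
        {"emb": [0.0, 1.0], "B": B, "A": A},
        {"emb": [0.7, 0.7], "B": B, "A": A}]
top, delta = retrieve_and_mix([1.0, 0.1], pool)
```

Because the mixing weights sum to one, composing identical adapters returns their shared update unchanged. This gives a quick sanity check on the mixing step.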

Updated: 2024-06-24 05:24:41

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16989v1

Lesion-Aware Cross-Phase Attention Network for Renal Tumor Subtype Classification on Multi-Phase CT Scans

Multi-phase computed tomography (CT) has been widely used for the preoperative diagnosis of kidney cancer due to its non-invasive nature and its ability to characterize renal lesions. However, since the enhancement patterns of renal lesions across CT phases differ even for the same lesion type, visual assessment by radiologists suffers from inter-observer variability in clinical practice. Although deep learning-based approaches have recently been explored for the differential diagnosis of kidney cancer, they do not explicitly model the relationships between CT phases in the network design, which limits diagnostic performance. In this paper, we propose a novel lesion-aware cross-phase attention network (LACPANet) that effectively captures the temporal dependencies of renal lesions across CT phases to accurately classify lesions into five major pathological subtypes from time-series multi-phase CT images. We introduce a 3D inter-phase lesion-aware attention mechanism to learn effective 3D lesion features, which are used to estimate attention weights describing the inter-phase relations of the enhancement patterns. We also present a multi-scale attention scheme that captures and aggregates temporal patterns of lesion features at different spatial scales for further improvement. Extensive experiments on multi-phase CT scans of kidney cancer patients from our collected dataset demonstrate that LACPANet outperforms state-of-the-art approaches in diagnostic accuracy.
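For intuition about cross-phase attention, here is a minimal scaled dot-product self-attention over per-phase lesion feature vectors. It omits learned query/key/value projections, 3D feature extraction, and the multi-scale scheme. It only illustrates how inter-phase attention weights are formed and is not LACPANet itself.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_phase_attention(phase_feats):
    """Scaled dot-product self-attention across CT phases.

    phase_feats: T feature vectors, one pooled lesion feature per phase.
    Returns (T x T attention weights, T attended feature vectors).
    Row t of the weights says how much phase t draws on every phase.
    """
    d = len(phase_feats[0])
    attn = []
    for q in phase_feats:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in phase_feats]
        attn.append(softmax(scores))
    out = [[sum(w * v[j] for w, v in zip(row, phase_feats)) for j in range(d)]
           for row in attn]
    return attn, out


# Four hypothetical phases with 3-dimensional lesion features.
feats = [[1.0, 0.0, 0.5], [0.8, 0.2, 0.4], [0.1, 0.9, 0.3], [0.0, 1.0, 0.2]]
attn, out = cross_phase_attention(feats)
```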

Updated: 2024-06-24 05:15:15

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.16322v1

Multimodal Graph Benchmark

Associating unstructured data with structured information is crucial for real-world tasks that require relevance search. However, existing graph learning benchmarks often overlook the rich semantic information associated with each node. To bridge this gap, we introduce the Multimodal Graph Benchmark (MM-GRAPH), the first comprehensive multimodal graph benchmark that incorporates both textual and visual information. MM-GRAPH goes beyond previous efforts, which have primarily focused on text-attributed graphs with various connectivity patterns. MM-GRAPH consists of five graph learning datasets of various scales, suited to different learning tasks. Their multimodal node features enable a more comprehensive evaluation of graph learning algorithms in real-world scenarios. To facilitate research on multimodal graph learning, we further provide an extensive study of the performance of various graph neural networks in the presence of features from different modalities. MM-GRAPH aims to foster research on multimodal graph learning and drive the development of more advanced and robust graph learning algorithms. By providing a diverse set of datasets and benchmarks, MM-GRAPH enables researchers to evaluate and compare their models in realistic settings, ultimately improving performance on real-world applications that rely on multimodal graph data.
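As an illustrative sketch of learning on a multimodal graph, the code below concatenates a text feature vector and an image feature vector per node. It then applies one mean-aggregation message-passing step, the basic operation inside many GNNs. Concatenation as the fusion rule and the helper names are assumptions, not part of the benchmark's specification.

```python
def fuse(text_feats, image_feats):
    """Concatenate per-node text and image features (one simple fusion choice)."""
    return [list(t) + list(v) for t, v in zip(text_feats, image_feats)]


def mean_aggregate(features, edges):
    """One message-passing step on an undirected graph: each node averages
    its own features with those of its neighbours."""
    n = len(features)
    neighbours = [{i} for i in range(n)]  # include a self-loop
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    dim = len(features[0])
    return [[sum(features[j][k] for j in neighbours[i]) / len(neighbours[i])
             for k in range(dim)]
            for i in range(n)]


# Path graph 0 - 1 - 2 with 1-d text and 1-d image features per node.
x = fuse([[1.0], [2.0], [3.0]], [[0.0], [0.0], [3.0]])
h = mean_aggregate(x, [(0, 1), (1, 2)])
```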

Updated: 2024-06-24 05:14:09

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16321v1

On Parameter Estimation in Deviated Gaussian Mixture of Experts

We consider the parameter estimation problem in the deviated Gaussian mixture of experts, in which the data are generated from $(1 - \lambda^{\ast}) g_0(Y| X)+ \lambda^{\ast} \sum_{i = 1}^{k_{\ast}} p_{i}^{\ast} f(Y|(a_{i}^{\ast})^{\top}X+b_i^{\ast},\sigma_{i}^{\ast})$, where $X, Y$ are respectively a covariate vector and a response variable, $g_{0}(Y|X)$ is a known function, $\lambda^{\ast} \in [0, 1]$ is the true but unknown mixing proportion, and $(p_{i}^{\ast}, a_{i}^{\ast}, b_{i}^{\ast}, \sigma_{i}^{\ast})$ for $1 \leq i \leq k_{\ast}$ are the unknown parameters of the Gaussian mixture of experts. This problem arises in goodness-of-fit testing, where we would like to test whether the data are generated from $g_{0}(Y|X)$ (null hypothesis) or from the whole mixture (alternative hypothesis). Based on the algebraic structure of the expert functions and the distinguishability between $g_0$ and the mixture part, we construct novel Voronoi-based loss functions to capture the convergence rates of maximum likelihood estimation (MLE) for our models. We further demonstrate that our proposed loss functions characterize the local convergence rates of parameter estimation more accurately than the generalized Wasserstein loss, which is commonly used for estimating parameters in the Gaussian mixture of experts.
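The data-generating density above can be evaluated directly. This sketch assumes a scalar response and treats $\sigma_i^{\ast}$ as the Gaussian variance; the abstract does not say whether it is a variance or a standard deviation. $g_0$ can be any user-supplied conditional density, and the function names are hypothetical.

```python
import math


def gaussian_pdf(y, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def deviated_moe_density(y, x, g0, lam, experts):
    """(1 - lam) * g0(y | x) + lam * sum_i p_i * N(y; a_i^T x + b_i, sigma_i).

    experts: list of (p_i, a_i, b_i, sigma_i); sigma_i is used as a variance (assumption).
    g0:      known conditional density, called as g0(y, x).
    """
    mixture = sum(
        p * gaussian_pdf(y, sum(ai * xi for ai, xi in zip(a, x)) + b, s)
        for p, a, b, s in experts
    )
    return (1 - lam) * g0(y, x) + lam * mixture


def g0(y, x):
    """Example null model: N(0, 1), independent of x (hypothetical choice)."""
    return gaussian_pdf(y, 0.0, 1.0)


experts = [(0.4, [1.0], 0.0, 0.5), (0.6, [-2.0], 1.0, 1.0)]
```

Setting `lam = 0` recovers the null model $g_0$ exactly, which is the boundary case the goodness-of-fit test probes.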

Updated: 2024-06-24 05:13:06

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2402.05220v2

Testing the Limits of Jailbreaking Defenses with the Purple Problem

The rise of "jailbreak" attacks on language models has led to a flurry of defenses aimed at preventing undesirable responses. We critically examine the two stages of the defense pipeline: (i) defining what constitutes unsafe outputs, and (ii) enforcing the definition via methods such as input processing or fine-tuning. To test the efficacy of existing enforcement mechanisms, we consider a simple and well-specified definition of unsafe outputs: outputs that contain the word "purple". Surprisingly, existing fine-tuning and input defenses fail on this simple problem, casting doubt on whether enforcement algorithms can be robust for more complicated definitions. We find that real safety benchmarks similarly test enforcement for a fixed definition. We hope that future research leads to effective and fast enforcement, as well as high-quality definitions for enforcement and evaluation.
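The Purple Problem's definition is simple enough to write as a one-line check, which is what makes it a clean test of enforcement. The sketch assumes case-insensitive matching; the paper may define the match differently. The helper names are hypothetical.

```python
def is_unsafe(output: str) -> bool:
    """Purple Problem definition: an output is unsafe iff it contains 'purple'.
    Case-insensitive matching is an assumption made for this sketch."""
    return "purple" in output.lower()


def attack_success_rate(outputs):
    """Fraction of model outputs that violate the definition."""
    return sum(is_unsafe(o) for o in outputs) / len(outputs)


outputs = ["The sky is red.", "Purple rain", "blue", "I love PURPLE."]
asr = attack_success_rate(outputs)
```

The check is the definition, not an enforcement mechanism. The defenses studied in the paper act on inputs or model weights, and an attack succeeds whenever the model's output still satisfies `is_unsafe`.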

Updated: 2024-06-24 05:01:06

Categories: cs.CR,cs.CL,cs.LG

Download: http://arxiv.org/abs/2403.14725v2

Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy

We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich directional structure of optimization trajectories, represented by their pointwise parameters. To this end, we introduce natural notions of the complexity of optimization trajectories, both qualitative and quantitative, that capture the directional nature of optimization in neural networks: when there is redundancy, and when there is exploration. We use them to reveal the inherent nuance and interplay between various optimization choices, such as momentum and weight decay. Further, the trajectory perspective helps us see the effect of scale on regularizing the directional nature of trajectories. As a by-product, we observe an intriguing heterogeneity of Q, K, V dynamics in the middle attention layers of LLMs, which is homogenized by scale. Importantly, we put the significant directional redundancy to the test by demonstrating that training only the scalar batchnorm parameters, starting some way into training, matches the performance of training the entire network. This shows the potential of hybrid optimization schemes geared towards efficiency.
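One simple way to probe directional redundancy, assumed here for illustration, is to compare flattened parameter checkpoints along a trajectory by cosine similarity. Off-diagonal values near 1 mean later checkpoints mostly point in the same direction. The abstract does not state that the paper uses exactly this measure, and the function names are hypothetical.

```python
import math


def trajectory_similarity(checkpoints):
    """Pairwise cosine similarity between flattened, non-zero parameter checkpoints."""
    norms = [math.sqrt(sum(w * w for w in c)) for c in checkpoints]
    n = len(checkpoints)
    return [[sum(a * b for a, b in zip(checkpoints[i], checkpoints[j])) / (norms[i] * norms[j])
             for j in range(n)]
            for i in range(n)]


def mean_off_diagonal(sim):
    """Average similarity between distinct checkpoints (high = redundant directions)."""
    n = len(sim)
    vals = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(vals) / len(vals)


redundant = trajectory_similarity([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])    # same direction
exploring = trajectory_similarity([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])   # turning around
```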

Updated: 2024-06-24 04:53:34

Categories: cs.LG,cs.CL,stat.ML

Download: http://arxiv.org/abs/2403.07379v2

A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts

The mixture-of-experts (MoE) model combines multiple submodels via gating functions to achieve better performance in numerous regression and classification applications. On the theoretical side, previous work has tried to understand the model in regression settings through convergence analysis of maximum likelihood estimation in the Gaussian MoE model. However, no such analysis exists in the literature for classification problems. We close this gap by establishing the convergence rates of density estimation and parameter estimation in the softmax gating multinomial logistic MoE model. Notably, when some of the expert parameters vanish, these rates are shown to be slower than polynomial rates owing to an inherent interaction between the softmax gating and expert functions via partial differential equations. To address this issue, we propose a novel class of modified softmax gating functions that transform the input before delivering it to the gating functions. As a result, the previous interaction disappears and the parameter estimation rates improve significantly.
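The model can be written out concretely. The sketch below computes class probabilities $P(Y=c\mid x)=\sum_i \mathrm{softmax}_i(\beta_{1i}^\top x+\beta_{0i})\,\mathrm{softmax}_c(\eta_{ic}^\top x+\tau_{ic})$. The parameter names are assumptions. The optional `transform` applied before gating only imitates the idea of a modified gate, since the abstract does not give the exact function class.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mnl_moe_probs(x, gates, experts, transform=None):
    """P(Y = c | x) for a softmax-gated multinomial logistic mixture of experts.

    gates:     list of (beta1_i, beta0_i), one gating score per expert.
    experts:   experts[i] is a list of (eta_ic, tau_ic), one logit per class c.
    transform: optional map applied to x before gating only; None gives standard
               softmax gating (the modified-gate form here is an assumption).
    """
    gx = x if transform is None else transform(x)
    weights = softmax([dot(b1, gx) + b0 for b1, b0 in gates])
    probs = [0.0] * len(experts[0])
    for w, params in zip(weights, experts):
        class_probs = softmax([dot(eta, x) + tau for eta, tau in params])
        probs = [p + w * q for p, q in zip(probs, class_probs)]
    return probs


gates = [([1.0, -1.0], 0.0), ([-0.5, 0.5], 0.2)]
experts = [
    [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([0.0, 0.0], 0.5)],
    [([0.5, 0.5], 0.0), ([-1.0, 0.0], 0.1), ([0.0, -1.0], 0.0)],
]
x = [0.3, -0.2]
p_standard = mnl_moe_probs(x, gates, experts)
p_modified = mnl_moe_probs(x, gates, experts, transform=lambda v: [math.tanh(t) for t in v])
```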

Updated: 2024-06-24 04:53:09

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2310.14188v2

Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization

Federated Learning (FL) offers a promising framework for collaborative and privacy-preserving machine learning across distributed data sources. However, the substantial communication costs associated with FL pose a significant challenge to its efficiency. Specifically, in each communication round, the communication cost scales linearly with the model's dimension, which presents a formidable obstacle, especially for large models. Despite various communication-efficient strategies, the intrinsic dimension-dependent communication cost remains a major bottleneck for current FL implementations. In this paper, we introduce a novel dimension-free communication strategy for FL that leverages zeroth-order optimization techniques. We propose a new algorithm, FedDisco, which transmits only a constant number of scalar values between the clients and the server in each communication round, thereby reducing the communication cost from $\mathscr{O}(d)$ to $\mathscr{O}(1)$, where $d$ is the dimension of the model parameters. Theoretically, for non-convex functions, we prove that our algorithm achieves state-of-the-art rates. These rates show a linear speedup in the number of clients and local steps under standard assumptions, and a dimension-free rate in low effective rank scenarios. Empirical evaluations on classic deep learning training and large language model fine-tuning show significant reductions in communication overhead compared to traditional FL approaches. Our code is available at https://github.com/ZidongLiu/FedDisco.
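A zeroth-order method can make communication dimension-free because a random direction can be regenerated from a shared seed, so only a scalar needs to travel. The sketch below shows this with a two-point (SPSA-style) estimator on a toy quadratic with a single client. It illustrates the principle under those assumptions and is not the FedDisco algorithm, which also handles multiple clients and local steps. All names are hypothetical.

```python
import random


def direction(seed, dim):
    """Gaussian direction that client and server can both regenerate from a shared seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def client_scalar(loss, w, seed, mu=1e-3):
    """Two-point zeroth-order estimate of the directional derivative of loss along z(seed)."""
    z = direction(seed, len(w))
    plus = loss([wi + mu * zi for wi, zi in zip(w, z)])
    minus = loss([wi - mu * zi for wi, zi in zip(w, z)])
    return (plus - minus) / (2 * mu)


def server_update(w, seed, scalar, lr):
    """Rebuild the step from (seed, scalar): only one float crossed the network."""
    z = direction(seed, len(w))
    return [wi - lr * scalar * zi for wi, zi in zip(w, z)]


def toy_loss(w):
    """Quadratic objective with its minimum at the all-ones vector."""
    return sum((wi - 1.0) ** 2 for wi in w)


def run_demo(steps=300, lr=0.05, dim=5):
    w = [0.0] * dim
    for t in range(steps):
        g = client_scalar(toy_loss, w, seed=t)       # client: O(d) compute, 1 float sent
        w = server_update(w, seed=t, scalar=g, lr=lr)
    return toy_loss(w)


final_loss = run_demo()
```

The per-round message is the pair (seed, scalar) regardless of `dim`. The price is a noisier gradient estimate, which is why the step size here is small.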

Updated: 2024-06-24 04:52:25

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2405.15861v2

Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?

Aligning language models with human preferences is a common approach to making them useful to end users. However, most alignment work is done in English, and human preference datasets are dominated by English, reflecting only the preferences of English-speaking annotators. Nevertheless, it is common practice to use English preference data, either directly or translated into the target language, when aligning a multilingual language model. The question is whether such an alignment strategy marginalizes the preferences of non-English-speaking users. To this end, we investigate the effect of aligning Japanese language models with (mostly) English resources. In particular, we evaluate whether the commonsense morality of the resulting fine-tuned models is aligned with Japanese culture, using the JCommonsenseMorality (JCM) and ETHICS datasets. The experimental results show that the fine-tuned model outperforms the SFT model. However, it does not show the same level of improvement as a model fine-tuned on JCM, suggesting that while some aspects of commonsense morality are transferable, others may not be.

Updated: 2024-06-24 04:50:12

Categories: cs.CL,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2406.16316v1

LLM-Assisted Content Conditional Debiasing for Fair Text Embedding

Mitigating biases in machine learning models has become an increasing concern in Natural Language Processing (NLP), particularly in developing fair text embeddings, which are crucial yet challenging for real-world applications like search engines. In response, this paper proposes a novel method for learning fair text embeddings. First, we define a novel content-conditional equal distance (CCED) fairness criterion for text embeddings, ensuring content-conditional independence between sensitive attributes and text embeddings. Building on CCED, we introduce a content-conditional debiasing (CCD) loss. It ensures that embeddings of texts with different sensitive attributes but identical content stay at the same distance from the embedding of their corresponding neutral text. Additionally, we tackle the issue of insufficient training data by instructing Large Language Models (LLMs) to fairly augment texts into different sensitive groups. Our extensive evaluations show that our approach effectively enhances fairness while maintaining the utility of embeddings. Furthermore, our augmented dataset, combined with the CCED metric, serves as a new benchmark for evaluating fairness.
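Here is a minimal sketch of a CCD-style penalty for one content item. It assumes Euclidean distance and a squared difference between every pair of group-specific distances to the neutral embedding. The paper's exact distance and weighting may differ, and `ccd_loss` is a hypothetical name.

```python
import math
from itertools import combinations


def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ccd_loss(neutral, variants):
    """Content-conditional debiasing penalty for one content item.

    neutral:  embedding of the neutral text.
    variants: embeddings of the same content rewritten for different sensitive groups.
    The penalty is zero exactly when every variant is equally far from the neutral text.
    """
    dists = [euclid(v, neutral) for v in variants]
    return sum((a - b) ** 2 for a, b in combinations(dists, 2))


fair = ccd_loss([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])     # both at distance 1
biased = ccd_loss([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]])   # distances 1 and 2
```

Because the penalty depends only on distances to the shared neutral anchor, it leaves the embeddings free to move together. This is one way to read "maintaining utility".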

Updated: 2024-06-24 04:49:16

Categories: cs.CL,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2402.14208v3

Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?

Dense-to-sparse gating mixture of experts (MoE) has recently become an effective alternative to the well-known sparse MoE. The sparse MoE fixes the number of activated experts, which could limit the exploration of potential experts. The dense-to-sparse model instead uses a temperature to control the softmax weight distribution and the sparsity of the MoE during training, in order to stabilize expert specialization. However, while there have been previous attempts to understand the sparse MoE theoretically, a comprehensive analysis of the dense-to-sparse gating MoE has remained elusive. In this paper, we therefore explore the impact of the dense-to-sparse gate on maximum likelihood estimation under the Gaussian MoE. We demonstrate that, due to interactions between the temperature and other model parameters via certain partial differential equations, the convergence rates of parameter estimation are slower than any polynomial rate and can be as slow as $\mathcal{O}(1/\log(n))$, where $n$ denotes the sample size. To address this issue, we propose a novel activation dense-to-sparse gate, which routes the output of a linear layer through an activation function before passing it to the softmax function. By imposing linear independence conditions on the activation function and its derivatives, we show that the parameter estimation rates improve significantly, to polynomial rates. Finally, we conduct a simulation study to empirically validate our theoretical results.
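To make the role of temperature concrete, the sketch below compares a temperature softmax gate with an "activation" variant that first passes the linear-layer outputs through an activation. Using tanh as the activation is an assumption for illustration; the paper only requires linear independence conditions on the activation and its derivatives.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature softmax: high temperature gives dense weights, low gives near one-hot."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def activation_gate(logits, temperature, act=math.tanh):
    """Pass the linear-layer outputs through an activation before the temperature softmax."""
    return softmax([act(z) for z in logits], temperature)


logits = [1.0, 2.0, 3.0]
dense = softmax(logits, temperature=100.0)   # early training: nearly uniform weights
sparse = softmax(logits, temperature=0.01)   # late training: nearly one expert
gated = activation_gate(logits, temperature=0.5)
```

Annealing the temperature from high to low moves the gate from the dense regime to the sparse one. This is the schedule the dense-to-sparse design relies on.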

Updated: 2024-06-24 04:45:30

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2401.13875v2

Thinking Inside The Box: Privacy Against Stronger Adversaries

In this thesis, we study extensions of statistical cryptographic primitives. In particular, we study leakage-resilient secret sharing, non-malleable extractors, and immunized ideal one-way functions. The thesis is divided into three main chapters. In the first chapter, we show that 2-out-of-2 leakage-resilient (and also non-malleable) secret sharing requires randomness sources that are also extractable. This rules out the possibility of using min-entropy sources. In the second chapter, we introduce collision-resistant seeded extractors and show that any seeded extractor can be made collision-resistant at a small overhead in seed length. We then use this to construct a two-source non-malleable extractor with entropy rate 0.81 in one source and polylogarithmic in the other. This non-malleable extractor leads to the first statistical privacy amplification protocol against memory-tampering adversaries. In the final chapter, we study the hardness of the data-structure variant of the 3SUM problem, motivated by a recent construction that immunizes random oracles against preprocessing adversaries. We give worst-case data-structure hardness for the 3SUM problem, matching known barriers in data structures for adaptive adversaries. We also give a slightly stronger lower bound in the non-adaptive case. Lastly, we give a novel result in the bit-probe setting.
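For reference, the baseline that leakage-resilient schemes strengthen is ordinary 2-out-of-2 secret sharing, sketched below with XOR. This sketch is not leakage-resilient. It assumes a uniform random mask drawn from Python's `secrets` module; the thesis' first result concerns what happens when such randomness comes from weaker sources. The function names are hypothetical.

```python
from __future__ import annotations

import secrets


def share(secret: bytes) -> tuple[bytes, bytes]:
    """Split a secret into two shares; either share alone is uniformly random."""
    mask = secrets.token_bytes(len(secret))
    return mask, bytes(s ^ m for s, m in zip(secret, mask))


def reconstruct(share1: bytes, share2: bytes) -> bytes:
    """XOR the two shares to recover the secret."""
    return bytes(a ^ b for a, b in zip(share1, share2))


s1, s2 = share(b"attack at dawn")
```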

Updated: 2024-06-24 04:40:28

Categories: cs.CR

Download: http://arxiv.org/abs/2406.16313v1

Multi-Fidelity Residual Neural Processes for Scalable Surrogate Modeling

Multi-fidelity surrogate modeling aims to learn an accurate surrogate at the highest fidelity level by combining data from multiple sources. Traditional methods relying on Gaussian processes scale poorly to high-dimensional data. Deep learning approaches use neural network-based encoders and decoders to improve scalability. These approaches share encoded representations across fidelities without including the corresponding decoder parameters. This hinders inference performance, especially in out-of-distribution scenarios where the highest-fidelity data has limited domain coverage. To address these limitations, we propose Multi-fidelity Residual Neural Processes (MFRNP), a novel multi-fidelity surrogate modeling framework. MFRNP explicitly models the residual between the aggregated output from lower fidelities and the ground truth at the highest fidelity. The aggregation introduces decoders into the information-sharing step and optimizes the lower-fidelity decoders to accurately capture both in-fidelity and cross-fidelity information. We show that MFRNP significantly outperforms state-of-the-art methods in learning partial differential equations and on a real-world climate modeling task. Our code is published at: https://github.com/Rose-STL-Lab/MFRNP
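The residual idea can be shown with a toy. First aggregate the lower-fidelity predictions, then fit only the gap to the scarce high-fidelity data. In this sketch the aggregation weights are fixed and the residual model is a least-squares line. Both are stand-ins for MFRNP's learned decoders and neural process, and the data and names are hypothetical.

```python
def aggregate(low_fid_preds, weights):
    """Weighted combination of lower-fidelity predictions at each input point."""
    n = len(low_fid_preds[0])
    return [sum(w * preds[i] for w, preds in zip(weights, low_fid_preds)) for i in range(n)]


def fit_linear_residual(xs, residuals):
    """Least-squares line r ~ c0 + c1 * x (stand-in for a neural residual model)."""
    n = len(xs)
    mx, mr = sum(xs) / n, sum(residuals) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxr = sum((x - mx) * (r - mr) for x, r in zip(xs, residuals))
    c1 = sxr / sxx
    return mr - c1 * mx, c1


def predict_high(x, agg_x, coeffs):
    """High-fidelity prediction = aggregated low-fidelity output + learned residual."""
    c0, c1 = coeffs
    return agg_x + c0 + c1 * x


xs = [0.0, 1.0, 2.0, 3.0]
low = [[x ** 2 for x in xs], [x ** 2 + 0.1 for x in xs]]   # two cheap simulators (toy)
high = [x ** 2 + 2 * x + 0.5 for x in xs]                   # scarce high-fidelity truth (toy)
agg = aggregate(low, [0.5, 0.5])
coeffs = fit_linear_residual(xs, [h - a for h, a in zip(high, agg)])
```

Here the residual is exactly linear, so the fitted correction recovers the high-fidelity function even at inputs not in the training set.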

Updated: 2024-06-24 04:33:30

Categories: cs.LG

Download: http://arxiv.org/abs/2402.18846v2

On Least Square Estimation in Softmax Gating Mixture of Experts

The mixture of experts (MoE) model is a statistical machine learning design that aggregates multiple expert networks using a softmax gating function to form a more intricate and expressive model. Although MoE models are commonly used in many applications owing to their scalability, their mathematical and statistical properties are complex and difficult to analyze. As a result, previous theoretical works have primarily focused on probabilistic MoE models by imposing the impractical assumption that the data are generated from a Gaussian MoE model. In this work, we investigate the performance of least squares estimators (LSE) under a deterministic MoE model where the data are sampled according to a regression model, a setting that has remained largely unexplored. We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions. We demonstrate that the rates for estimating strongly identifiable experts, namely the widely used feed-forward networks with activation functions $\mathrm{sigmoid}(\cdot)$ and $\tanh(\cdot)$, are substantially faster than those of polynomial experts, which we show exhibit a surprisingly slow estimation rate. Our findings have important practical implications for expert selection.
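The deterministic regression setting can be written down directly. The regression function is a softmax-weighted sum of experts, and the LSE minimizes the mean squared error over the parameters. The sketch assumes scalar inputs and sigmoid experts; passing `expert=lambda z: z` gives a polynomial (here linear) expert for comparison. Parameter names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def moe_regression(x, params, expert=sigmoid):
    """f(x) = sum_i softmax_i(b1_i * x + b0_i) * expert(a_i * x + c_i), scalar x.

    params: list of (b1, b0, a, c), one tuple per expert.
    """
    g = softmax([b1 * x + b0 for b1, b0, _, _ in params])
    return sum(gi * expert(a * x + c) for gi, (_, _, a, c) in zip(g, params))


def squared_loss(params, xs, ys, expert=sigmoid):
    """Empirical mean squared error; the LSE is its minimizer over params."""
    return sum((y - moe_regression(x, params, expert)) ** 2 for x, y in zip(xs, ys)) / len(xs)


true_params = [(2.0, 0.0, 1.5, -0.5), (-2.0, 0.3, -1.0, 0.2)]
xs = [i / 10 for i in range(-20, 21)]
ys = [moe_regression(x, true_params) for x in xs]   # noiseless toy regression data
```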

Updated: 2024-06-24 04:32:55

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2402.02952v2

MD tree: a model-diagnostic tree grown on loss landscape

This paper considers "model diagnosis", which we formulate as a classification problem. Given a pre-trained neural network (NN), the goal is to predict the source of failure from a set of failure modes (such as a wrong hyperparameter, inadequate model size, and insufficient data) without knowing the training configuration of the pre-trained NN. The conventional diagnosis approach uses training and validation errors to determine whether the model is underfitting or overfitting. However, we show that rich information about NN performance is encoded in the optimization loss landscape, which provides more actionable insights than validation-based measurements. Therefore, we propose a diagnosis method called MD tree based on loss landscape metrics and experimentally demonstrate its advantage over classical validation-based approaches. We verify the effectiveness of MD tree in multiple practical scenarios: (1) use several models trained on one dataset to diagnose a model trained on another dataset, essentially a few-shot dataset transfer problem; (2) use small models (or models trained with small data) to diagnose big models (or models trained with big data), essentially a scale transfer problem. In a dataset transfer task, MD tree achieves an accuracy of 87.7%, outperforming validation-based approaches by 14.88%. Our code is available at https://github.com/YefanZhou/ModelDiagnosis.
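MD tree is a learned decision tree over loss-landscape metrics. Its simplest building block, a one-split decision stump on a single metric, is sketched below. The metric values, the 0/1 failure-mode labels, and the function names are hypothetical, and a full tree would recurse on several metrics.

```python
def fit_stump(values, labels):
    """Fit a one-split decision stump on a single scalar metric.

    labels are 0/1 failure modes (e.g. 0 = 'insufficient data', 1 = 'bad
    hyperparameter'; hypothetical). Returns (threshold, label_at_or_below,
    label_above) with the fewest training errors.
    """
    best = None
    for thr in sorted(set(values)):
        for lo_label in (0, 1):
            hi_label = 1 - lo_label
            errors = sum((lo_label if v <= thr else hi_label) != y
                         for v, y in zip(values, labels))
            if best is None or errors < best[0]:
                best = (errors, thr, lo_label, hi_label)
    _, thr, lo_label, hi_label = best
    return thr, lo_label, hi_label


def predict(stump, value):
    thr, lo_label, hi_label = stump
    return lo_label if value <= thr else hi_label


# Hypothetical loss-landscape metric (e.g. a sharpness score) for five trained models.
metric = [0.1, 0.2, 0.3, 0.8, 0.9]
failure_mode = [0, 0, 0, 1, 1]
stump = fit_stump(metric, failure_mode)
```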

Updated: 2024-06-24 04:31:17

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.16988v1

Federated Learning for Estimating Heterogeneous Treatment Effects

Machine learning methods for estimating heterogeneous treatment effects (HTE) facilitate large-scale personalized decision-making across domains such as healthcare, policy making, and education. Current machine learning approaches for HTE require substantial amounts of data per treatment, and the high cost of interventions makes centrally collecting so much data for each intervention a formidable challenge. To overcome this obstacle, we propose a novel framework for collaborative learning of HTE estimators across institutions via Federated Learning. We show that even with diverse interventions and subject populations across clients, one can jointly learn a common feature representation. At the same time, each institution can privately learn the specific predictive functions for outcomes under its distinct interventions. Our framework and the associated algorithm build on this insight. They use tabular transformers to map multiple input data to feature representations, which are then used for outcome prediction via multi-task learning. We also propose a novel way of federated training of personalised transformers that can work with heterogeneous input feature spaces. Experimental results on real-world clinical trial data demonstrate the effectiveness of our method.
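The shared/private split can be sketched as federated averaging applied only to the shared encoder, while each client's treatment-specific heads stay local. This is a minimal illustration under that assumption. It omits the tabular transformers and personalised training described above, and the dictionary layout is hypothetical.

```python
def fedavg_shared(client_models, sizes):
    """Average only the shared 'encoder' parameters, weighted by client sample counts.

    client_models: list of dicts {'encoder': [floats], 'heads': {treatment: [floats]}}.
    Treatment-specific heads stay local and private; encoders are replaced by the average.
    """
    total = sum(sizes)
    dim = len(client_models[0]["encoder"])
    avg = [sum(n * m["encoder"][k] for m, n in zip(client_models, sizes)) / total
           for k in range(dim)]
    return [{"encoder": list(avg), "heads": m["heads"]} for m in client_models]


clients = [
    {"encoder": [0.0, 0.0], "heads": {"drug_A": [1.0]}},               # hospital 1
    {"encoder": [3.0, 6.0], "heads": {"drug_B": [2.0], "placebo": [0.5]}},  # hospital 2
]
updated = fedavg_shared(clients, sizes=[1, 2])
```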

Updated: 2024-06-24 04:21:33

Categories: cs.LG

Download: http://arxiv.org/abs/2402.17705v2

Anomaly Detection of Tabular Data Using LLMs

Large language models (LLMs) have shown their potential in long-context understanding and mathematical reasoning. In this paper, we study the problem of using LLMs to detect tabular anomalies and show that pre-trained LLMs are zero-shot batch-level anomaly detectors. That is, without extra distribution-specific model fitting, they can discover hidden outliers in a batch of data, demonstrating their ability to identify low-density data regions. Some LLMs are not well aligned with anomaly detection and frequently output factual errors. For these, we apply simple yet effective data-generating processes to simulate synthetic batch-level anomaly detection datasets. We also propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies. Experiments on a large anomaly detection benchmark (ODDS) show that (i) GPT-4 performs on par with state-of-the-art transductive learning-based anomaly detection methods and (ii) our synthetic dataset and fine-tuning strategy are effective at aligning LLMs to this task.
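Here is a minimal sketch of a synthetic batch-level anomaly example of the kind the fine-tuning step relies on. Inliers are drawn from a Gaussian and a few shifted outliers are added. The batch is shuffled and serialized into a text prompt, and the outlier row indices are kept as the label. The generator, prompt wording, and shift range are assumptions, not the paper's data-generating processes.

```python
import random


def make_batch(n_inliers=20, n_outliers=2, dim=2, seed=0):
    """Synthetic batch-level anomaly-detection example (hypothetical generator).

    Inliers ~ N(0, 1) per coordinate; outliers have every coordinate at
    magnitude 6 to 10. Returns (rows, outlier_row_indices, prompt_text).
    """
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_inliers)]
    rows += [[rng.choice([-1, 1]) * rng.uniform(6, 10) for _ in range(dim)]
             for _ in range(n_outliers)]
    order = list(range(len(rows)))
    rng.shuffle(order)                       # hide outliers among inliers
    rows = [rows[i] for i in order]
    outliers = sorted(pos for pos, i in enumerate(order) if i >= n_inliers)
    lines = [f"Row {k}: " + ", ".join(f"{v:.2f}" for v in row) for k, row in enumerate(rows)]
    prompt = "Which rows are anomalous?\n" + "\n".join(lines)
    return rows, outliers, prompt


rows, outliers, prompt = make_batch()
```

The returned indices serve as the fine-tuning target for the serialized prompt.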

Updated: 2024-06-24 04:17:03

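Title: Detecting Anomalies in Tabular Data Using LLMs

Abstract: Large language models (LLMs) have shown their potential in long-text understanding and mathematical reasoning. This paper studies the problem of using LLMs to detect tabular anomalies and shows that pre-trained LLMs are zero-shot batch-level anomaly detectors. That is, without additional distribution-specific model fitting, they can discover hidden outliers in a batch of data, indicating that they can identify low-density data regions. For LLMs that are poorly aligned with anomaly detection and frequently output factual errors, we apply simple yet effective data-generating processes to simulate synthetic batch-level anomaly detection datasets and propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies. Experiments on a large anomaly detection benchmark (ODDS) show that i) GPT-4 performs on par with state-of-the-art transductive learning-based anomaly detection methods, and ii) our synthetic dataset and fine-tuning strategy are effective in aligning LLMs to this task.

Updated: 2024-06-24 04:17:03

Fields: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.16308v1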

CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Cyber threat intelligence (CTI) is crucial in today's cybersecurity landscape, providing essential insights to understand and mitigate ever-evolving cyber threats. The recent rise of Large Language Models (LLMs) has shown potential in this domain, but concerns about their reliability, accuracy, and hallucinations persist. While existing benchmarks provide general evaluations of LLMs, there are no benchmarks that address the practical and applied aspects of CTI-specific tasks. To bridge this gap, we introduce CTIBench, a benchmark designed to assess LLMs' performance in CTI applications. CTIBench includes multiple datasets focused on evaluating the knowledge LLMs have acquired about the cyber-threat landscape. Our evaluation of several state-of-the-art models on these tasks provides insights into their strengths and weaknesses in CTI contexts, contributing to a better understanding of LLM capabilities in CTI.

Updated: 2024-06-24 04:14:26

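Title: CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Abstract: Cyber threat intelligence (CTI) is crucial in today's cybersecurity landscape, providing key insights for understanding and mitigating ever-evolving cyber threats. Recently emerging large language models (LLMs) have shown potential in this domain, but concerns about their reliability, accuracy, and hallucinations persist. Although existing benchmarks provide general evaluations of LLMs, no benchmark addresses the practical and applied aspects of CTI-specific tasks. To bridge this gap, we introduce CTIBench, a benchmark designed to evaluate the performance of LLMs in CTI applications. CTIBench includes multiple datasets focused on evaluating the knowledge LLMs have acquired about the cyber-threat landscape. Our evaluation of several state-of-the-art models on these tasks provides insights into their strengths and weaknesses in CTI contexts, contributing to a better understanding of LLM capabilities in CTI.

Updated: 2024-06-24 04:14:26

Fields: cs.CR,cs.AI

Download: http://arxiv.org/abs/2406.07599v2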

Cascade Reward Sampling for Efficient Decoding-Time Alignment

Aligning large language models (LLMs) with human preferences is critical for their deployment. Recently, decoding-time alignment has emerged as an effective plug-and-play technique that requires no fine-tuning of model parameters. However, generating text that achieves both high reward and high likelihood remains a significant challenge. Existing methods often fail to generate high-reward text or incur substantial computational costs. In this paper, we propose Cascade Reward Sampling (CARDS) to address both issues, guaranteeing the generation of high-reward and high-likelihood text at significantly lower cost. Our approach rests on an analysis of reward models (RMs) on incomplete text and on the observation that high-reward prefixes induce high-reward complete text. Building on both, we use rejection sampling to iteratively generate small semantic segments that form such prefixes. The segment length is dynamically determined by the predictive uncertainty of the LLM. This strategy guarantees desirable prefixes for subsequent generation and significantly reduces both wasteful token re-generation and the number of reward-model evaluations. Our experiments demonstrate substantial gains over the baselines in both generation efficiency and alignment ratings, achieving five-times-faster text generation and a 99% win-or-tie rate in GPT-4/Claude-3 helpfulness evaluations.
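The segment-level rejection loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation:

- `propose` and `reward` are hypothetical callables standing in for the LLM and the reward model.
- The acceptance threshold is a fixed hypothetical constant.
- `segment_end` uses next-token entropy as an assumed measure of predictive uncertainty for closing a segment.

```python
import math
from typing import Callable, List, Optional, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def segment_end(step_dists: List[Sequence[float]], tau: float) -> int:
    """Length of the next segment: close it right after the first high-uncertainty step."""
    for i, probs in enumerate(step_dists):
        if entropy(probs) > tau:
            return i + 1
    return len(step_dists)


def cascade_sample(propose: Callable[[str], str],
                   reward: Callable[[str], float],
                   threshold: float,
                   n_segments: int,
                   max_tries: int = 20) -> str:
    """Grow the text segment by segment, rejecting segments whose new prefix scores too low."""
    prefix = ""
    for _ in range(n_segments):
        best: Optional[str] = None
        best_r = -math.inf
        for _ in range(max_tries):
            cand = prefix + propose(prefix)  # candidate = accepted prefix + one new segment
            r = reward(cand)                 # the reward model scores the partial text
            if r > best_r:
                best, best_r = cand, r
            if r >= threshold:
                break                        # accept the first segment that clears the bar
        # Keep the accepted segment, or the best-scoring candidate if none was accepted.
        prefix = best if best is not None else prefix
    return prefix
```

Because each accepted segment is appended to an already accepted prefix, a rejected draw costs only one short segment rather than a whole response.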

Updated: 2024-06-24 04:08:35

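Title: Cascade Reward Sampling for Efficient Decoding-Time Alignment

Abstract: Aligning large language models (LLMs) with human preferences is critical for their deployment. Recently, decoding-time alignment has emerged as an effective plug-and-play technique that requires no fine-tuning of model parameters. However, generating text that has both high reward and high likelihood remains a significant challenge. Existing methods often fail to generate high-reward text or incur substantial computational costs. In this paper, we propose Cascade Reward Sampling (CARDS) to address both problems, ensuring that text with both high reward and high likelihood is generated at very low cost. Our method starts from an analysis of reward models (RMs) on incomplete text and from the observation that high-reward prefixes induce high-reward complete text. On this basis, we use rejection sampling to iteratively generate small semantic segments that form such prefixes. The segment length is dynamically determined by the predictive uncertainty of the LLM. This strategy ensures desirable prefixes for subsequent generation and significantly reduces both wasteful token re-generation and the number of reward-model evaluations. Our experiments show substantial gains over the baselines in both generation efficiency and alignment ratings, achieving five-times-faster text generation and a 99% win-or-tie rate in GPT-4/Claude-3 helpfulness evaluations.

Updated: 2024-06-24 04:08:35

Fields: cs.CL,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.16306v1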

On Computing Pairwise Statistics with Local Differential Privacy

We study the problem of computing pairwise statistics, i.e., ones of the form $\binom{n}{2}^{-1} \sum_{i \ne j} f(x_i, x_j)$, where $x_i$ denotes the input to the $i$th user, with differential privacy (DP) in the local model. This formulation captures important metrics such as Kendall's $\tau$ coefficient, Area Under Curve, Gini's mean difference, Gini's entropy, etc. We give several novel and generic algorithms for the problem, leveraging techniques from DP algorithms for linear queries.
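For orientation, the sketch below gives a non-private reference computation of such a statistic. It averages a kernel f over the $\binom{n}{2}$ unordered pairs of users, with Kendall's $\tau$ (tau-a) and Gini's mean difference as example kernels. The local-DP mechanisms from the paper are not reproduced here; a private algorithm would approximate this quantity from randomized user reports.

```python
from itertools import combinations
from math import comb
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pairwise_statistic(xs: Sequence[T], f: Callable[[T, T], float]) -> float:
    """Average of f over all unordered pairs {i, j}, i != j (non-private reference)."""
    n = len(xs)
    return sum(f(a, b) for a, b in combinations(xs, 2)) / comb(n, 2)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def kendall_pair(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """+1 for a concordant pair, -1 for a discordant one (kernel of Kendall's tau-a)."""
    return sign(p[0] - q[0]) * sign(p[1] - q[1])


def gini_pair(a: float, b: float) -> float:
    """Kernel of Gini's mean difference."""
    return abs(a - b)
```

For example, perfectly co-ranked inputs `[(1, 1), (2, 2), (3, 3)]` give $\tau = 1$, and the values `[1.0, 2.0, 4.0]` have a Gini mean difference of `(1 + 3 + 2) / 3 = 2.0`.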

Updated: 2024-06-24 04:06:09

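Title: On Computing Pairwise Statistics with Local Differential Privacy

Abstract: We study the problem of computing pairwise statistics with differential privacy (DP) in the local model, i.e., statistics of the form $\binom{n}{2}^{-1} \sum_{i \ne j} f(x_i, x_j)$, where $x_i$ denotes the input of the $i$th user. This formulation captures important metrics such as Kendall's $\tau$ coefficient, Area Under Curve, Gini's mean difference, Gini's entropy, etc. We propose several novel and generic algorithms for the problem, leveraging techniques from DP algorithms for linear queries.

Updated: 2024-06-24 04:06:09

Fields: cs.DS,cs.CR

Download: http://arxiv.org/abs/2406.16305v1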

WeatherQA: Can Multimodal Language Models Reason about Severe Weather?

Severe convective weather events, such as hail, tornadoes, and thunderstorms, often occur quickly yet cause significant damage, costing billions of dollars every year. This highlights the importance of forecasting severe weather threats hours in advance to better prepare meteorologists and residents in at-risk areas. Can modern large foundation models perform such forecasting? Existing weather benchmarks typically focus only on predicting time-series changes in certain weather parameters (e.g., temperature, moisture) with text-only features. In this work, we introduce WeatherQA, the first multimodal dataset designed for machines to reason about complex combinations of weather parameters (a.k.a., ingredients) and predict severe weather in real-world scenarios. The dataset includes over 8,000 (multi-images, text) pairs for diverse severe weather events. Each pair contains rich information crucial for forecasting -- the images describe the ingredients capturing environmental instability, surface observations, and radar reflectivity, and the text contains forecast analyses written by human experts. With WeatherQA, we evaluate state-of-the-art vision language models, including GPT4, Claude3.5, Gemini-1.5, and a fine-tuned Llama3-based VLM, by designing two challenging tasks: (1) multi-choice QA for predicting affected area and (2) classification of the development potential of severe convection. These tasks require deep understanding of domain knowledge (e.g., atmospheric dynamics) and complex reasoning over multimodal data (e.g., interactions between weather parameters). We show a substantial gap between the strongest VLM, GPT4o, and human reasoning. Our comprehensive case study with meteorologists further reveals the weaknesses of the models, suggesting that better training and data integration are necessary to bridge this gap. WeatherQA link: https://github.com/chengqianma/WeatherQA.

Updated: 2024-06-24 03:55:30

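Title: WeatherQA: Can Multimodal Language Models Reason about Severe Weather?

Abstract: Severe convective weather events, such as hail, tornadoes, and thunderstorms, often develop quickly yet cause significant damage, costing billions of dollars every year. This highlights the importance of forecasting severe weather threats hours in advance to better prepare meteorologists and residents in at-risk areas. Can modern large foundation models perform such forecasting? Existing weather benchmarks typically focus only on predicting time-series changes in certain weather parameters (e.g., temperature, moisture) using text-only features. In this work, we introduce WeatherQA, the first multimodal dataset designed for machines to reason about complex combinations of weather parameters (a.k.a. "ingredients") and predict severe weather in real-world scenarios. The dataset includes over 8,000 (multi-image, text) pairs covering diverse severe weather events. Each pair contains rich information crucial for forecasting. The images depict the ingredients that capture environmental instability, surface observations, and radar reflectivity, and the text contains forecast analyses written by human experts. With WeatherQA, we evaluate state-of-the-art vision language models, including GPT4, Claude3.5, Gemini-1.5, and a fine-tuned Llama3-based VLM, by designing two challenging tasks: (1) multiple-choice QA for predicting the affected area and (2) classification of the development potential of severe convection. These tasks require deep understanding of domain knowledge (e.g., atmospheric dynamics) and complex reasoning over multimodal data (e.g., interactions between weather parameters). We show a substantial gap between the strongest VLM, GPT4o, and human reasoning. Our comprehensive case study with meteorologists further reveals the weaknesses of the models, suggesting that better training and data integration are necessary to bridge this gap. WeatherQA link: https://github.com/chengqianma/WeatherQA.

Updated: 2024-06-24 03:55:30

Fields: cs.AI,cs.CL,cs.CV,physics.ao-ph

Download: http://arxiv.org/abs/2406.11217v2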

UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

With the surge in the amount of video data, video summarization techniques, including visual-modal (VM) and textual-modal (TM) summarization, are attracting more and more attention. However, unimodal summarization inevitably loses the rich semantics of the video. In this paper, we focus on a more comprehensive video summarization task named Bimodal Semantic Summarization of Videos (BiSSV). Specifically, we first construct a large-scale dataset, BIDS, in (video, VM-Summary, TM-Summary) triplet format. Unlike traditional processing methods, our construction procedure contains a VM-Summary extraction algorithm that aims to preserve the most salient content of long videos. Based on BIDS, we propose a unified framework, UBiSS, for the BiSSV task, which models the saliency information in the video and generates a TM-Summary and a VM-Summary simultaneously. We further optimize our model with a list-wise ranking-based objective to improve its capacity to capture highlights. Lastly, we propose a metric, $NDCG_{MS}$, to provide a joint evaluation of the bimodal summary. Experiments show that our unified framework achieves better performance than multi-stage summarization pipelines. Code and data are available at https://github.com/MeiYutingg/UBiSS.
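The proposed $NDCG_{MS}$ builds on normalized discounted cumulative gain. As background, the sketch below shows standard NDCG@k with linear gains and the common $\log_2(\text{rank}+1)$ discount. It is not the paper's $NDCG_{MS}$, whose joint bimodal definition is not reproduced here. The input is assumed to be the ground-truth saliency of each clip, listed in the order the model ranked the clips.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: item at 0-based rank r is discounted by log2(r + 2)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(ranked_relevances: Sequence[float], k: int) -> float:
    """NDCG@k: DCG of the predicted order divided by DCG of the ideal (sorted) order."""
    ideal = sorted(ranked_relevances, reverse=True)[:k]
    best = dcg(ideal)
    return dcg(ranked_relevances[:k]) / best if best > 0 else 0.0
```

A perfect ranking such as `[3, 2, 1]` scores 1.0, while the fully reversed `[1, 2, 3]` scores about 0.79.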

Updated: 2024-06-24 03:55:25

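Title: UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

Abstract: With the surge in the amount of video data, video summarization techniques, including visual-modal (VM) and textual-modal (TM) summarization, are attracting more and more attention. However, unimodal summarization inevitably loses the rich semantics of the video. This paper focuses on a more comprehensive video summarization task called Bimodal Semantic Summarization of Videos (BiSSV). Specifically, we first construct a large-scale dataset, BIDS, in (video, VM-Summary, TM-Summary) triplet format. Unlike traditional processing methods, our construction procedure includes a VM-Summary extraction algorithm aimed at preserving the most salient content of long videos. Based on BIDS, we propose a unified framework, UBiSS, for the BiSSV task, which models the saliency information in the video and generates a TM-Summary and a VM-Summary simultaneously. We further optimize our model with a list-wise ranking-based objective to improve its ability to capture highlights. Finally, we propose a metric, $NDCG_{MS}$, to provide a joint evaluation of the bimodal summary. Experiments show that our unified framework performs better than multi-stage summarization pipelines. Code and data are available at https://github.com/MeiYutingg/UBiSS.

Updated: 2024-06-24 03:55:25

Fields: cs.CV,cs.AI,cs.MM

Download: http://arxiv.org/abs/2406.16301v1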

Landscaping Linear Mode Connectivity

The presence, in certain cases, of linear paths in parameter space between two different network solutions, i.e., linear mode connectivity (LMC), has garnered interest on both theoretical and practical fronts. Significant research has either designed practical algorithms that connect networks by adjusting for permutation symmetries or, more theoretically, constructed paths through which networks can be connected. Yet the core reasons why LMC occurs, when it does, in the highly non-convex loss landscapes of neural networks are far from clear. In this work, we take a step towards understanding it by providing a model of how the loss landscape needs to behave topographically for LMC (or its absence) to manifest. Concretely, we present a 'mountainside and ridge' perspective that neatly ties together the different geometric features that can be spotted in the loss landscape along training runs. We complement this perspective with a theoretical analysis of the barrier height, for which we provide empirical support and which additionally serves as a faithful predictor of layer-wise LMC. We close with a toy example that provides further intuition on how barriers arise in the first place. All in all, this showcases the larger aim of the work: to provide a working model of the landscape and its topography for the occurrence of LMC.
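To make "barrier height" concrete, the sketch below evaluates one common definition from the LMC literature. The barrier is the maximum, over $t \in [0, 1]$, of $L((1-t)\theta_A + t\theta_B) - [(1-t)L(\theta_A) + tL(\theta_B)]$, estimated on an evenly spaced grid of $t$. The paper's own theoretical analysis and its layer-wise variant are not reproduced. The toy losses used to check it are hypothetical.

```python
from typing import Callable, List, Sequence

Params = Sequence[float]


def interpolate(a: Params, b: Params, t: float) -> List[float]:
    """Point (1 - t) * a + t * b on the straight line between two parameter vectors."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def loss_barrier(loss: Callable[[Params], float], a: Params, b: Params, steps: int = 21) -> float:
    """Largest excess of the path loss over the linear interpolation of the endpoint losses."""
    la, lb = loss(a), loss(b)
    barrier = 0.0
    for s in range(steps):
        t = s / (steps - 1)
        gap = loss(interpolate(a, b, t)) - ((1 - t) * la + t * lb)
        barrier = max(barrier, gap)
    return barrier
```

A convex loss such as $\|\theta\|^2$ never produces a barrier. The double well $(x^2 - 1)^2$ between its minima at $x = \pm 1$ has a barrier of 1 at the midpoint, so those two minima are not linearly mode-connected.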

Updated: 2024-06-24 03:53:30

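Title: Landscaping Linear Mode Connectivity

Abstract: The presence, in certain cases, of linear paths in parameter space between two different network solutions, i.e., linear mode connectivity (LMC), has attracted interest on both theoretical and practical fronts. Substantial research has either designed practical algorithms that connect networks by adjusting for permutation symmetries or, more theoretically, constructed paths through which networks can be connected. However, the core reasons why LMC occurs, when it does occur, in the highly non-convex loss landscapes of neural networks are far from clear. In this work, we take a step towards understanding LMC by providing a model of how the loss landscape needs to behave topographically for LMC (or its absence) to manifest. Concretely, we present a "mountainside and ridge" perspective that neatly ties together the different geometric features that can be observed in the loss landscape along training runs. We complement this perspective with a theoretical analysis of the barrier height, for which we provide empirical support and which also serves as a faithful predictor of layer-wise LMC. Finally, we use a toy example to provide further intuition on how barriers arise in the first place. Altogether, this showcases the larger aim of the work: to provide a working model of the loss landscape and its topography for the occurrence of LMC.

Updated: 2024-06-24 03:53:30

Fields: cs.LG

Download: http://arxiv.org/abs/2406.16300v1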

Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other

Emergent Large Language Models (LLMs) distinguish themselves from traditional language models through their extraordinary performance and powerful deduction capacity. However, the computational and storage costs of these LLMs are staggering, which has made quantization a trending topic. To address the accuracy decay caused by quantization, two streams of post-training quantization methods stand out. One uses other weights to compensate for the existing quantization error, while the other transfers the quantization difficulty to other parts of the model. Combining the merits of both, we introduce Learnable Singular value Increment (LSI) as an advanced solution. LSI uses Singular Value Decomposition to extract the singular values of the weights and makes them learnable, helping the weights compensate each other conditioned on activations. Incorporating LSI into existing techniques, we achieve state-of-the-art performance in diverse quantization settings, whether weight-only, weight-activation, or extremely low-bit. By unleashing the potential of LSI, efficient fine-tuning of quantized models is no longer a prohibitive problem.

Updated: 2024-06-24 03:52:52

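Title: Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other

Abstract: Emerging large language models (LLMs) distinguish themselves from traditional language models through their outstanding performance and powerful reasoning capacity. However, the computational resources and storage costs these LLMs require are staggering, so quantization has become a popular topic. To address the accuracy decay caused by quantization, two streams of post-training quantization methods have emerged. One uses other weights to compensate for the existing quantization error, while the other transfers the quantization difficulty to other parts of the model. Combining the merits of both, we introduce Learnable Singular value Increment (LSI) as an advanced solution. LSI uses singular value decomposition to extract the singular values of the weights and makes them learnable, helping the weights compensate each other conditioned on activations. Combining LSI with existing techniques, we achieve state-of-the-art performance in various quantization settings, whether weight-only, weight-activation, or extremely low-bit. By unleashing the potential of LSI, efficient fine-tuning of quantized models is no longer a prohibitive problem.

Updated: 2024-06-24 03:52:52

Fields: cs.CL,cs.AI,F.2.3

Download: http://arxiv.org/abs/2406.16299v1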

AI for Equitable Tennis Training: Leveraging AI for Equitable and Accurate Classification of Tennis Skill Levels and Training Phases

Numerous studies have demonstrated the manifold benefits of tennis, such as improved overall physical and mental health. Unfortunately, many children and youth from low-income families are unable to engage in this sport. The main obstacles are financial constraints such as private lesson expenses, as well as the logistics of traveling to and from such lessons and clinics. While several tennis self-training systems exist, they are often tailored for professionals and are prohibitively expensive. The present study aims to classify tennis players' skill levels and to segment tennis strokes into phases characterized by motion attributes. This work is a step toward an AI-based tennis self-training model delivered as an affordable, convenient application on everyday devices such as an iPhone or an Apple Watch. We collected motion data, including motion yaw, roll, and pitch, from inertial measurement units (IMUs) worn by participating junior tennis players. For this pilot study, data from twelve participants were processed using Support Vector Machine (SVM) algorithms. The SVM models achieved an overall accuracy of 77% in classifying players as beginners or intermediates, with low rates of false positives and false negatives, effectively distinguishing skill levels. Additionally, the tennis swings were successfully classified into five phases based on the collected motion data. These findings indicate that SVM-based classification can be a reliable foundation for developing an equitable and accessible AI-driven tennis training system.

Updated: 2024-06-24 03:40:41

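Title: AI for Equitable Tennis Training: Leveraging AI for Equitable and Accurate Classification of Tennis Skill Levels and Training Phases

Abstract: Many studies have demonstrated the multiple benefits of tennis, such as improved overall physical and mental health. Unfortunately, many children and adolescents from low-income families cannot take part in this sport because of financial constraints. The main obstacles are the cost of private lessons and the logistics of traveling to and from lessons and clinics. Although several tennis self-training systems exist, they are usually aimed at professionals and are expensive. This study aims to classify tennis players' skill levels and to divide tennis swings into phases characterized by motion attributes. The goal is the future development of an AI-based tennis self-training model that runs on everyday devices such as an iPhone or Apple Watch to improve tennis skills. We collected motion data, including motion yaw, roll, and pitch, from inertial measurement units (IMUs) worn by participating junior tennis players. In this pilot study, data from twelve participants were processed using Support Vector Machine (SVM) algorithms. The SVM models achieved an overall accuracy of 77% in classifying players as beginners or intermediates, with low false-positive and false-negative rates, effectively distinguishing skill levels. In addition, based on the collected motion data, the tennis swings were successfully classified into five phases. These findings indicate that SVM-based classification can be a reliable foundation for developing an equitable and accessible AI-based tennis training system.

Updated: 2024-06-24 03:40:41

Fields: eess.SP,cs.LG

Download: http://arxiv.org/abs/2406.16987v1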

FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

Recent advances in reinforcement learning (RL) heavily rely on a variety of well-designed benchmarks, which provide environmental platforms and consistent criteria to evaluate existing and novel algorithms. Specifically, in multi-agent RL (MARL), a plethora of benchmarks based on cooperative games have spurred the development of algorithms that improve the scalability of cooperative multi-agent systems. However, for the competitive setting, a lightweight and open-sourced benchmark with challenging gaming dynamics and visual inputs has not yet been established. In this work, we present FightLadder, a real-time fighting game platform, to empower competitive MARL research. Along with the platform, we provide implementations of state-of-the-art MARL algorithms for competitive games, as well as a set of evaluation metrics to characterize the performance and exploitability of agents. We demonstrate the feasibility of this platform by training a general agent that consistently defeats 12 built-in characters in single-player mode, and expose the difficulty of training a non-exploitable agent without human knowledge and demonstrations in two-player mode. FightLadder provides meticulously designed environments to address critical challenges in competitive MARL research, aiming to catalyze a new era of discovery and advancement in the field. Videos and code at https://sites.google.com/view/fightladder/home.
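Exploitability, one of the quantities the platform's metrics are meant to characterize, is easiest to see in a two-player zero-sum matrix game. The sketch below computes how much a best-responding opponent can win against a fixed mixed strategy in rock-paper-scissors, whose game value is 0. A real-time fighting game is not a matrix game, and FightLadder's own evaluation metrics are not reproduced here; the payoff matrix and function are illustrative only.

```python
from typing import Sequence

# Row player's payoff in rock-paper-scissors (zero-sum: the column player gets the negative).
RPS = [
    [0, -1, 1],   # row plays rock vs column rock, paper, scissors
    [1, 0, -1],   # row plays paper
    [-1, 1, 0],   # row plays scissors
]


def exploitability(payoff: Sequence[Sequence[float]], strategy: Sequence[float]) -> float:
    """Payoff a best-responding column player earns against a fixed row strategy.

    In a symmetric zero-sum game with value 0, this is 0 exactly at equilibrium."""
    n_cols = len(payoff[0])
    row_value_per_col = [
        sum(p * payoff[i][j] for i, p in enumerate(strategy)) for j in range(n_cols)
    ]
    return -min(row_value_per_col)
```

The uniform mix is unexploitable (exploitability 0). An agent that always plays rock loses 1 per game to an opponent that always plays paper.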

Updated: 2024-06-24 03:38:46

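Title: FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

Abstract: Recent progress in reinforcement learning (RL) relies heavily on a variety of well-designed benchmarks, which provide environmental platforms and consistent criteria for evaluating existing and novel algorithms. Specifically, in multi-agent RL (MARL), numerous benchmarks based on cooperative games have driven the development of algorithms that improve the scalability of cooperative multi-agent systems. However, for the competitive setting, a lightweight, open-source benchmark with challenging game dynamics and visual inputs has not yet been established. In this work, we present FightLadder, a real-time fighting game platform, to advance competitive MARL research. Along with the platform, we provide implementations of state-of-the-art MARL algorithms for competitive games, as well as a set of evaluation metrics to characterize the performance and exploitability of agents. We demonstrate the feasibility of the platform by training a general agent that consistently defeats 12 built-in characters in single-player mode. We also reveal how difficult it is, in two-player mode, to train a non-exploitable agent without human knowledge and demonstrations. FightLadder provides carefully designed environments to address critical challenges in competitive MARL research, aiming to catalyze new discoveries and advances in the field. Videos and code are available at https://sites.google.com/view/fightladder/home.

Updated: 2024-06-24 03:38:46

Fields: cs.MA,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.02081v2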

Relaxing Continuous Constraints of Equivariant Graph Neural Networks for Physical Dynamics Learning

Incorporating Euclidean symmetries (e.g. rotation equivariance) as inductive biases into graph neural networks has improved their generalization ability and data efficiency in unbounded physical dynamics modeling. However, in various scientific and engineering applications, the symmetries of dynamics are frequently discrete due to the boundary conditions. Thus, existing GNNs either overlook necessary symmetry, resulting in suboptimal representation ability, or impose excessive equivariance, which fails to generalize to unobserved symmetric dynamics. In this work, we propose a general Discrete Equivariant Graph Neural Network (DEGNN) that guarantees equivariance to a given discrete point group. Specifically, we show that such discrete equivariant message passing could be constructed by transforming geometric features into permutation-invariant embeddings. Through relaxing continuous equivariant constraints, DEGNN can employ more geometric feature combinations to approximate unobserved physical object interaction functions. Two implementation approaches of DEGNN are proposed based on ranking or pooling permutation-invariant functions. We apply DEGNN to various physical dynamics, ranging from particle, molecular, crowd to vehicle dynamics. In twenty scenarios, DEGNN significantly outperforms existing state-of-the-art approaches. Moreover, we show that DEGNN is data efficient, learning with less data, and can generalize across scenarios such as unobserved orientation.
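The key idea, that a permutation-invariant function of a feature's group orbit is invariant to the discrete group, can be checked on a toy case. The sketch assumes the cyclic point group C4 (90-degree rotations) as the example group and a hypothetical, deliberately non-invariant feature `phi`. Rotating a vector by 90 degrees only permutes its C4 orbit, so sorting (ranking) or summing (pooling) `phi` over the orbit gives the same result. In DEGNN such functions are learned and used inside message passing, which this sketch does not model.

```python
from typing import Callable, List, Tuple

Vec = Tuple[float, float]


def c4_orbit(v: Vec) -> List[Vec]:
    """Images of v under the four 90-degree rotations (the cyclic point group C4)."""
    x, y = v
    return [(x, y), (-y, x), (-x, -y), (y, -x)]


def invariant_by_ranking(v: Vec, phi: Callable[[Vec], float]) -> List[float]:
    """Sort phi over the orbit; a C4 rotation of v only reorders the orbit."""
    return sorted(phi(u) for u in c4_orbit(v))


def invariant_by_pooling(v: Vec, phi: Callable[[Vec], float]) -> float:
    """Sum-pool phi over the orbit, which is likewise unchanged by C4 rotations."""
    return sum(phi(u) for u in c4_orbit(v))


def phi(u: Vec) -> float:
    # Hypothetical feature that is NOT rotation-invariant on its own.
    return max(u[0], 0.0) + 0.5 * u[1]
```

Unlike a continuously equivariant model, these embeddings are only guaranteed to be unchanged under the four C4 rotations. A 30-degree rotation, for instance, generally changes them, which is the relaxation the abstract describes.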

Updated: 2024-06-24 03:37:51

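Title: Relaxing Continuous Constraints of Equivariant Graph Neural Networks for Physical Dynamics Learning

Abstract: Incorporating Euclidean symmetries (e.g., rotation equivariance) into graph neural networks as inductive biases has improved their generalization ability and data efficiency in unbounded physical dynamics modeling. However, in various scientific and engineering applications, the symmetries of the dynamics are often discrete because of boundary conditions. Consequently, existing GNNs either overlook necessary symmetries, resulting in poor representation ability, or impose excessive equivariance, failing to generalize to unobserved symmetric dynamics. In this work, we propose a general Discrete Equivariant Graph Neural Network (DEGNN) that guarantees equivariance to a given discrete point group. Specifically, we show that such discrete equivariant message passing can be constructed by transforming geometric features into permutation-invariant embeddings. By relaxing continuous equivariance constraints, DEGNN can use more combinations of geometric features to approximate unobserved physical object interaction functions. We propose two implementations of DEGNN, based on ranking or pooling permutation-invariant functions. We apply DEGNN to various physical dynamics, ranging from particle, molecular, and crowd to vehicle dynamics. In twenty scenarios, DEGNN significantly outperforms existing state-of-the-art approaches. Moreover, we show that DEGNN is data efficient, learning with less data, and can generalize to scenarios such as unobserved orientations.

Updated: 2024-06-24 03:37:51

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16295v1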

LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments

Recent advances in Large Language Models (LLMs) have shown inspiring achievements in constructing autonomous agents that rely on language descriptions as inputs. However, it remains unclear how well LLMs can function as few-shot or zero-shot embodied agents in dynamic interactive environments. To address this gap, we introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds. Compared with previous LLM-based testbeds, LangSuitE (i) offers adaptability to diverse environments without multiple simulation engines, (ii) evaluates agents' capacity to develop "internalized world knowledge" with embodied observations, and (iii) allows easy customization of communication and action strategies. To address the embodiment challenge, we devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information. Comprehensive benchmark results illustrate challenges and insights of embodied planning. LangSuitE represents a significant step toward building embodied generalists in the context of language models.

Updated: 2024-06-24 03:36:29

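Title: LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments

Abstract: Recent advances in large language models (LLMs) have shown inspiring achievements in building autonomous agents that rely on language descriptions as inputs. However, it remains unclear how well LLMs can function as few-shot or zero-shot embodied agents in dynamic interactive environments. To fill this gap, we introduce LangSuitE, a versatile, simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds. Compared with previous LLM-based testbeds, LangSuitE (i) adapts to diverse environments without multiple simulation engines, (ii) evaluates agents' capacity to develop "internalized world knowledge" from embodied observations, and (iii) allows easy customization of communication and action strategies. To address the embodiment challenge, we design a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states with respect to history information. Comprehensive benchmark results illustrate the challenges of and insights into embodied planning. LangSuitE represents an important step toward building embodied generalists in the context of language models.

Updated: 2024-06-24 03:36:29

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.16294v1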

Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels

Traditional supervised learning heavily relies on human-annotated datasets, especially in data-hungry neural approaches. However, various tasks, especially multi-label tasks like document-level relation extraction, pose challenges in fully manual annotation due to the specific domain knowledge and large class sets. Therefore, we address the multi-label positive-unlabelled learning (MLPUL) problem, where only a subset of positive classes is annotated. We propose Mixture Learner for Partially Annotated Classification (MLPAC), an RL-based framework combining the exploration ability of reinforcement learning and the exploitation ability of supervised learning. Experimental results across various tasks, including document-level relation extraction, multi-label image classification, and binary PU learning, demonstrate the generalization and effectiveness of our framework.

Updated: 2024-06-24 03:36:19

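Title: Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels

Abstract: Traditional supervised learning relies heavily on human-annotated datasets, especially in data-hungry neural approaches. However, various tasks, especially multi-label tasks such as document-level relation extraction, make fully manual annotation challenging because of the specialized domain knowledge and large class sets involved. We therefore address the multi-label positive-unlabelled learning (MLPUL) problem, in which only a subset of positive classes is annotated. We propose Mixture Learner for Partially Annotated Classification (MLPAC), an RL-based framework that combines the exploration ability of reinforcement learning with the exploitation ability of supervised learning. Experimental results across various tasks, including document-level relation extraction, multi-label image classification, and binary PU learning, demonstrate the generalization and effectiveness of our framework.

Updated: 2024-06-24 03:36:19

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.16293v1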

ZeroG: Investigating Cross-dataset Zero-shot Transferability in Graphs

With the development of foundation models such as large language models, zero-shot transfer learning has become increasingly significant. This is highlighted by the generative capabilities of NLP models like GPT-4 and the retrieval-based approaches of CV models like CLIP, both of which effectively bridge the gap between seen and unseen data. In graph learning, new graphs keep emerging and human labeling remains costly, which amplifies the need for zero-shot transfer learning. This drives the search for approaches that generalize across diverse graph data without dataset-specific and label-specific fine-tuning. In this study, we extend such paradigms to zero-shot transferability in graphs by introducing ZeroG, a new framework tailored to enable cross-dataset generalization. To address inherent challenges such as feature misalignment, mismatched label spaces, and negative transfer, we leverage a language model to encode both node attributes and class semantics, ensuring consistent feature dimensions across datasets. We also propose a prompt-based subgraph sampling module that enriches the semantic information and structure information of extracted subgraphs using prompting nodes and neighborhood aggregation, respectively. We further adopt a lightweight fine-tuning strategy that reduces the risk of overfitting and maintains the zero-shot learning efficacy of the language model. The results underscore the effectiveness of our model in achieving significant cross-dataset zero-shot transferability, opening pathways for the development of graph foundation models. Code and data are available at https://github.com/NineAbyss/ZeroG.
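The core trick of encoding node attributes and class descriptions with the same text encoder can be illustrated as a zero-shot classifier. Because both live in one feature space, a node is assigned to the nearest class embedding after neighborhood aggregation. This is a minimal sketch under loudly stated assumptions:

- A bag-of-words count vector stands in for the language model.
- Simple mean aggregation over a node and its neighbors stands in for the paper's aggregation.
- Prompt nodes, subgraph sampling, and fine-tuning are omitted.
- All function names and example texts are hypothetical.

```python
import math
from typing import Dict, List


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Deterministic word -> index map over every text we will encode."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}


def encode(text: str, vocab: Dict[str, int]) -> List[float]:
    """Toy stand-in for a language-model encoder: a bag-of-words count vector."""
    vec = [0.0] * len(vocab)
    for w in text.lower().split():
        vec[vocab[w]] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def zero_shot_predict(node_texts: Dict[int, str], adj: Dict[int, List[int]],
                      class_texts: Dict[str, str]) -> Dict[int, str]:
    """Mean-pool each node with its neighbours, then pick the most similar class text."""
    vocab = build_vocab(list(node_texts.values()) + list(class_texts.values()))
    emb = {n: encode(t, vocab) for n, t in node_texts.items()}
    cls = {c: encode(t, vocab) for c, t in class_texts.items()}
    preds: Dict[int, str] = {}
    for n, h in emb.items():
        group = [h] + [emb[m] for m in adj.get(n, [])]
        pooled = [sum(col) / len(group) for col in zip(*group)]
        preds[n] = max(sorted(cls), key=lambda c: cosine(pooled, cls[c]))
    return preds
```

No labels from the target graph are used. A new dataset only needs textual class descriptions, which is what makes the label spaces of different graphs comparable.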

Updated: 2024-06-24 03:34:02

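Title: ZeroG: Investigating Cross-dataset Zero-shot Transferability in Graphs

Abstract: With the development of foundation models such as large language models, zero-shot transfer learning has become increasingly important. This is highlighted by the generative capabilities of NLP models like GPT-4 and the retrieval-based approaches of computer vision models like CLIP, both of which effectively bridge the gap between seen and unseen data. In graph learning, new graphs keep emerging and human labeling remains challenging, which increases the need for zero-shot transfer learning. This drives the exploration of methods that generalize across diverse graph data without dataset-specific and label-specific fine-tuning. In this study, we extend this paradigm to zero-shot transferability in graphs by introducing ZeroG, a new framework designed to enable cross-dataset generalization. To address inherent challenges such as feature misalignment, mismatched label spaces, and negative transfer, we use a language model to encode both node attributes and class semantics, ensuring consistent feature dimensions across datasets. We also propose a prompt-based subgraph sampling module that enriches the semantic and structural information of extracted subgraphs using prompting nodes and neighborhood aggregation, respectively. We further adopt a lightweight fine-tuning strategy that reduces the risk of overfitting and preserves the zero-shot learning efficacy of the language model. The results underscore the effectiveness of our model in achieving significant cross-dataset zero-shot transferability, opening pathways for the development of graph foundation models. Code and data are available at https://github.com/NineAbyss/ZeroG.

Updated: 2024-06-24 03:34:02

Fields: cs.LG

Download: http://arxiv.org/abs/2402.11235v2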

Accurately Classifying Out-Of-Distribution Data in Facial Recognition

Standard classification theory assumes that the distributions of images in the test and training sets are identical. Unfortunately, real-life scenarios typically feature unseen data ("out-of-distribution data") that differs from the data in the training distribution ("in-distribution data"). This issue is most prevalent in social justice problems, where data from under-represented groups may appear in the test data without being represented in equal proportion in the training data. This may result in a model returning confidently wrong decisions and predictions. We are interested in the following question: can the performance of a neural network on facial images of out-of-distribution data improve when it is trained simultaneously on multiple in-distribution datasets? We approach this problem by incorporating the Outlier Exposure model and investigate how the model's performance changes when other facial-image datasets are incorporated. We observe that the accuracy and other metrics of the model can be increased by three measures. The first is applying Outlier Exposure. The second is incorporating a trainable weight parameter that increases the model's emphasis on outlier images. The third is re-weighting the importance of different class labels. We also tested whether sorting the images and determining outliers via image features affects the metrics more than sorting by average pixel value. Our goal was to make models not only more accurate but also fairer by scanning a more expanded range of images. We also tested the datasets in reverse order to see whether a fairer dataset with balanced features affects the model's accuracy.
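For readers unfamiliar with Outlier Exposure, the sketch below writes out its usual objective. It is the mean cross-entropy on labeled in-distribution images plus a weight $\lambda$ times the cross-entropy from the uniform distribution to the model's softmax on outlier images, which pushes outputs on outliers toward maximal uncertainty. Several parts are assumptions: `lam` is a fixed hypothetical value (the abstract mentions a trainable weight, which is not modeled), the logits are toy numbers, and class re-weighting is omitted.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard loss on one labeled in-distribution example."""
    return -log_softmax(logits)[label]


def uniform_cross_entropy(logits: Sequence[float]) -> float:
    """Cross-entropy from the uniform distribution to the softmax; smallest for uniform outputs."""
    lp = log_softmax(logits)
    return -sum(lp) / len(lp)


def outlier_exposure_loss(in_batch: List[Tuple[Sequence[float], int]],
                          outlier_batch: List[Sequence[float]],
                          lam: float = 0.5) -> float:
    """Mean in-distribution CE plus lam times the mean uniform CE on outlier images."""
    ce = sum(cross_entropy(z, y) for z, y in in_batch) / len(in_batch)
    oe = sum(uniform_cross_entropy(z) for z in outlier_batch) / len(outlier_batch)
    return ce + lam * oe
```

A confidently wrong prediction on an outlier, such as logits `[5.0, 0.0, 0.0]`, costs more than the minimum value $\log 3$ reached by a uniform prediction over three classes.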

Updated: 2024-06-24 03:19:39

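Title: Accurately Classifying Out-Of-Distribution Data in Facial Recognition

Abstract: Standard classification theory assumes that the distributions of images in the test and training sets are identical. Unfortunately, real-life scenarios typically involve unseen data ("out-of-distribution data") that differs from the data in the training distribution ("in-distribution data"). This issue is most prevalent in social justice problems, where data from under-represented groups may appear in the test data without a corresponding proportion in the training data. This may lead a model to return confidently wrong decisions and predictions. The question we are interested in is: when a neural network is trained simultaneously on multiple in-distribution datasets, can its performance on facial images of out-of-distribution data improve? We approach this problem by incorporating the Outlier Exposure model and investigate how the model's performance changes when other facial-image datasets are used. We observe that the accuracy and other metrics of the model can be improved by three measures. The first is applying Outlier Exposure. The second is introducing a trainable weight parameter that increases the model's emphasis on outlier images. The third is re-weighting the importance of different class labels. We also tested whether sorting the images and determining outliers via image features is more effective than sorting by average pixel value. Our goal is to make models not only more accurate but also fairer by scanning a broader range of images. We also tested the datasets in reverse order to see whether a fairer dataset with balanced features affects the model's accuracy.

Updated: 2024-06-24 03:19:39

Fields: cs.CV,cs.CY,cs.LG

Download: http://arxiv.org/abs/2404.03876v2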

Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation

Fine-tuning pretrained large models for downstream tasks is an important problem, but it suffers from huge memory overhead due to their large-scale parameters. This work strives to reduce the memory overhead of fine-tuning from the perspectives of activation functions and layer normalization. To this end, we propose the Approximate Backpropagation (Approx-BP) theory, which establishes the theoretical feasibility of decoupling the forward and backward passes. We apply our Approx-BP theory to backpropagation training and derive memory-efficient alternatives to the GELU and SiLU activation functions, which use derivative functions of ReLUs in the backward pass while keeping their forward pass unchanged. In addition, we introduce a Memory-Sharing Backpropagation strategy, which enables the activation memory to be shared by two adjacent layers, thereby removing redundant activation memory usage. Our method neither induces extra computation nor reduces training efficiency. We conduct extensive experiments with pretrained vision and language models, and the results demonstrate that our proposal can reduce peak memory usage by up to ~30%. Our code is released at https://github.com/yyyyychen/LowMemoryBP.
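The decoupling of forward and backward passes can be shown on a single activation. The sketch keeps the exact GELU forward pass, but its backward pass uses a ReLU-style derivative, so only a boolean mask has to be saved instead of the full-precision input. It assumes the simplest choice, the plain ReLU step derivative $\mathbb{1}[x > 0]$. The paper derives its own ReLU-based derivative functions, which may differ, and the function names here are hypothetical.

```python
import math
from typing import List, Tuple


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_gelu_forward(xs: List[float]) -> Tuple[List[float], List[bool]]:
    """Forward pass is unchanged; only the sign mask (x > 0) is kept for backward."""
    return [gelu(x) for x in xs], [x > 0 for x in xs]


def approx_gelu_backward(grad_out: List[float], mask: List[bool]) -> List[float]:
    """Backward substitutes the ReLU derivative (1 if x > 0 else 0) for GELU'(x)."""
    return [g if m else 0.0 for g, m in zip(grad_out, mask)]
```

In a tensor framework, the mask could be packed at one bit per element instead of storing 16- or 32-bit activations. That packing is where a memory saving of this kind would come from; it is not modeled here.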

Updated: 2024-06-24 03:09:15

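Title: Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation

Abstract: Fine-tuning pretrained large models for downstream tasks is an important problem, but the huge memory overhead caused by their large-scale parameters makes it difficult. This work aims to reduce the memory overhead of fine-tuning from the perspectives of activation functions and layer normalization. To this end, we propose the Approximate Backpropagation (Approx-BP) theory, which establishes the theoretical feasibility of decoupling the forward and backward passes. We apply Approx-BP to backpropagation training and derive memory-efficient alternatives to the GELU and SiLU activation functions, which use derivative functions of ReLUs in the backward pass while keeping their forward pass unchanged. In addition, we introduce a Memory-Sharing Backpropagation strategy that allows activation memory to be shared by two adjacent layers, thereby removing redundant activation memory usage. Our method neither introduces extra computation nor reduces training efficiency. We conduct extensive experiments with pretrained vision and language models, and the results show that our proposal can reduce peak memory usage by up to about 30%. Our code is released at https://github.com/yyyyychen/LowMemoryBP.

Updated: 2024-06-24 03:09:15

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16282v1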

UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction

Urban spatio-temporal prediction is crucial for informed decision-making, such as traffic management, resource optimization, and emergency response. Despite remarkable breakthroughs in pretrained natural language models that enable one model to handle diverse tasks, a universal solution for spatio-temporal prediction remains challenging. Existing prediction approaches are typically tailored to specific spatio-temporal scenarios, requiring task-specific model designs and extensive domain-specific training data. In this study, we introduce UniST, a universal model designed for general urban spatio-temporal prediction across a wide range of scenarios. Inspired by large language models, UniST achieves success through: (i) utilizing diverse spatio-temporal data from different scenarios, (ii) effective pre-training to capture complex spatio-temporal dynamics, and (iii) knowledge-guided prompts to enhance generalization capabilities. Together, these designs unlock the potential of building a universal model for various scenarios. Extensive experiments on more than 20 spatio-temporal scenarios demonstrate UniST's efficacy in advancing state-of-the-art performance, especially in few-shot and zero-shot prediction. The datasets and code implementation are released at https://github.com/tsinghua-fib-lab/UniST.

Updated: 2024-06-24 02:51:27

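Title: UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction

Abstract: Urban spatio-temporal prediction is crucial for decision-making, such as traffic management, resource optimization, and emergency response. Pretrained natural language models have achieved remarkable breakthroughs that enable one model to handle diverse tasks. Nevertheless, a universal solution for spatio-temporal prediction remains challenging. Existing prediction approaches are typically tailored to specific spatio-temporal scenarios, requiring task-specific model designs and large amounts of domain-specific training data. In this study, we introduce UniST, a universal model for general urban spatio-temporal prediction across a wide range of scenarios. Inspired by large language models, UniST achieves success through: (i) utilizing diverse spatio-temporal data from different scenarios, (ii) effective pre-training to capture complex spatio-temporal dynamics, and (iii) knowledge-guided prompts to enhance generalization. Together, these designs unlock the potential of building a universal model for various scenarios. Extensive experiments on more than 20 spatio-temporal scenarios demonstrate UniST's efficacy in advancing state-of-the-art performance, especially in few-shot and zero-shot prediction. The datasets and code implementation are released at https://github.com/tsinghua-fib-lab/UniST.

Updated: 2024-06-24 02:51:27

Fields: cs.LG

Download: http://arxiv.org/abs/2402.11838v4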

ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation

With large language models (LLMs) achieving remarkable breakthroughs in natural language processing (NLP), LLM-enhanced recommender systems have received much attention and are being actively explored. In this paper, we focus on adapting and empowering a pure large language model for zero-shot and few-shot recommendation tasks. First and foremost, we identify and formulate the lifelong sequential behavior incomprehension problem for LLMs in recommendation domains. That is, LLMs fail to extract useful information from the textual context of a long user behavior sequence, even when the context length is far from the LLM's context limit. To address this issue and improve the recommendation performance of LLMs, we propose a novel framework, Retrieval-enhanced Large Language models (ReLLa), for recommendation tasks in both zero-shot and few-shot settings. For zero-shot recommendation, we perform semantic user behavior retrieval (SUBR) to improve the data quality of testing samples, which greatly reduces the difficulty for LLMs of extracting the essential knowledge from user behavior sequences. For few-shot recommendation, we further design retrieval-enhanced instruction tuning (ReiT) by adopting SUBR as a data augmentation technique for training samples. Specifically, we develop a mixed training dataset consisting of both the original data samples and their retrieval-enhanced counterparts. We conduct extensive experiments on three real-world public datasets to demonstrate the superiority of ReLLa over existing baseline models, as well as its capability for lifelong sequential behavior comprehension. Notably, with less than 10% of the training samples, few-shot ReLLa can outperform traditional CTR models trained on the entire training set (e.g., DCNv2, DIN, SIM). The code is available at https://github.com/LaVieEnRose365/ReLLa.
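The difference between truncating a long history and retrieving from it can be illustrated in a few lines. The baseline keeps the K most recent behaviors. The retrieval variant keeps the K behaviors most semantically similar to the target item and then restores their chronological order. The sketch assumes that item embeddings are precomputed by some text encoder. The vectors, titles, and function names here are hypothetical, and the paper's exact retrieval details are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Behavior = Tuple[str, Sequence[float]]  # (item title, precomputed semantic embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_recent(history: List[Behavior], k: int) -> List[str]:
    """Baseline: keep only the last k behaviors (an empty list when k == 0)."""
    return [title for title, _ in history[max(len(history) - k, 0):]]


def semantic_retrieve(history: List[Behavior], target: Sequence[float], k: int) -> List[str]:
    """Keep the k behaviors most similar to the target item, in their original time order."""
    ranked = sorted(range(len(history)),
                    key=lambda i: cosine(history[i][1], target), reverse=True)
    keep = sorted(ranked[:k])  # restore chronological order for the prompt
    return [history[i][0] for i in keep]
```

With the same prompt budget, retrieval surfaces older but relevant behaviors that truncation would drop, for example an early sci-fi title when the target item is also sci-fi.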

Updated: 2024-06-24 02:44:28

标题: ReLLa：用于推荐中终身序列行为理解的检索增强型大型语言模型

摘要: 随着大型语言模型（LLMs）在自然语言处理（NLP）领域取得显著突破，LLM增强的推荐系统已经引起了广泛关注，并目前正在积极探索。在本文中，我们专注于为零样本和少样本推荐任务调整和增强纯大型语言模型。首先，我们确定并阐述了LLMs在推荐领域中的终身顺序行为不理解问题，即，即使上下文的长度远未达到LLMs的上下文限制，LLMs也无法从长用户行为序列的文本上下文中提取有用信息。为了解决这个问题并提高LLMs的推荐性能，我们提出了一个新颖的框架，即检索增强的大型语言模型（ReLLa）用于零样本和少样本设置的推荐任务。对于零样本推荐，我们执行语义用户行为检索（SUBR）以提高测试样本的数据质量，从而大大降低了LLMs从用户行为序列中提取基本知识的难度。至于少样本推荐，我们进一步设计了检索增强指导调整（ReiT），通过采用SUBR作为训练样本的数据增强技术。具体来说，我们开发了一个混合训练数据集，包括原始数据样本和它们的检索增强对应样本。我们在三个真实世界的公共数据集上进行了大量实验，以证明ReLLa相对于现有基线模型的优越性，以及其对终身顺序行为理解的能力。值得强调的是，仅使用不到10%的训练样本，少样本ReLLa就可以胜过在整个训练集上训练的传统CTR模型（例如DCNv2，DIN，SIM）。代码可在\url{https://github.com/LaVieEnRose365/ReLLa}上找到。

更新时间: 2024-06-24 02:44:28

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2308.11131v5

Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement

Text-to-Image Diffusion Models (T2I DMs) have garnered significant attention for their ability to generate high-quality images from textual descriptions. However, these models often produce images that do not fully align with the input prompts, resulting in semantic inconsistencies. The most prominent of these semantic inconsistencies is catastrophic-neglect, where the images generated by T2I DMs miss key objects mentioned in the prompt. We first conduct an empirical study on this issue, exploring the prevalence of catastrophic-neglect, potential mitigation strategies based on feature enhancement, and the resulting insights. Guided by the empirical findings, we propose an automated repair approach named Patcher to address catastrophic-neglect in T2I DMs. Specifically, Patcher first determines whether any objects in the prompt are neglected, and then applies attention-guided feature enhancement to these neglected objects, resulting in a repaired prompt. Experimental results on three versions of Stable Diffusion demonstrate that Patcher effectively repairs catastrophic-neglect, achieving a 10.1%-16.3% higher Correct Rate in image generation than baselines.

Updated: 2024-06-24 02:38:30

标题: 通过注意力引导的特征增强修复文本到图像扩散模型中的灾难性忽视

摘要: 文本到图像扩散模型（T2I DMs）因其能够从文本描述中生成高质量图像而受到广泛关注。然而，这些模型经常生成与输入提示不完全一致的图像，导致语义不一致。在这些语义不一致中，最突出的问题是灾难性忽略，即T2I DMs生成的图像缺少提示中提到的关键对象。我们首先对这个问题进行了实证研究，探讨了灾难性忽略的普遍性、具有特征增强的潜在缓解策略以及所获得的见解。在实证发现的指导下，我们提出了一个名为Patcher的自动修复方法，以解决T2I DMs中的灾难性忽略问题。具体而言，Patcher首先确定提示中是否有任何被忽略的对象，然后对这些被忽略的对象应用注意力引导的特征增强，从而生成一个修复后的提示。对三个版本的稳定扩散的实验结果表明，Patcher有效修复了灾难性忽略问题，在图像生成中的正确率比基线提高了10.1%-16.3%。

更新时间: 2024-06-24 02:38:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.16272v1

Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment

Best-of-N (BoN) sampling with a reward model has been shown to be an effective strategy for aligning Large Language Models (LLMs) to human preferences at decoding time. However, BoN sampling is susceptible to a problem known as reward hacking: because the reward model is an imperfect proxy for the true objective, over-optimizing the proxy reward can degrade performance on the true objective. A common way to prevent reward hacking in preference learning is to optimize the reward with proximity regularization (e.g., KL regularization), which keeps the language model close to the reference model. In this research, we propose Regularized Best-of-N (RBoN), a variant of BoN that aims to mitigate reward hacking by incorporating a proximity term into response selection, similar to preference learning techniques. We evaluate RBoN on the AlpacaFarm and Anthropic's hh-rlhf datasets and find that it outperforms BoN. As an application, we use RBoN to generate a pairwise preference learning dataset. Experimental results show that a DPO model trained on a dataset generated with RBoN outperforms a DPO model trained on a dataset generated with vanilla BoN. Our code is available at https://github.com/CyberAgentAILab/regularized-bon
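The sketch below contrasts vanilla BoN with a proximity-regularized selection rule. It assumes a KL-style proximity term. With a point mass on the chosen response y, KL(δ_y || π_ref) reduces to −log π_ref(y|x), so the score becomes reward + β·log π_ref(y|x). This is one plausible instantiation, not necessarily the exact RBoN variant in the repository. The function names are hypothetical.

```python
import numpy as np

def best_of_n(rewards):
    """Vanilla BoN: pick the candidate with the highest proxy reward."""
    return int(np.argmax(rewards))

def regularized_best_of_n(rewards, ref_logprobs, beta):
    """Proximity-regularized BoN (sketch under a KL-style assumption).

    score(y) = R(x, y) - beta * KL(delta_y || pi_ref) = R(x, y) + beta * log pi_ref(y | x).
    Candidates that the reference model finds very unlikely are penalized,
    which discourages picking responses that merely exploit the reward model.
    """
    scores = np.asarray(rewards, dtype=float) + beta * np.asarray(ref_logprobs, dtype=float)
    return int(np.argmax(scores))
```

For example, with rewards `[0.90, 0.85, 0.40]` and reference log-probabilities `[-60.0, -20.0, -15.0]`, vanilla BoN picks candidate 0, while `beta=0.01` shifts the choice to candidate 1. Setting `beta=0` recovers plain BoN.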

Updated: 2024-06-24 02:31:06

标题: 规范化的最佳N抽样以减轻语言模型对齐的奖励欺骗

摘要: 最佳-N（BoN）采样结合奖励模型已被证明是一种有效的策略，用于在解码时将大型语言模型（LLMs）与人类偏好对齐。BoN采样容易受到一种被称为奖励欺骗的问题的影响。由于奖励模型是真实目标的一个不完美代理，过度优化其值可能会影响其在真实目标上的表现。防止偏好学习技术中的奖励欺骗的常见解决方案是使用接近正则化（例如，KL正则化）来优化奖励，以确保语言模型保持接近参考模型。在这项研究中，我们提出了正则化最佳-N（RBoN），这是一种旨在通过在响应选择中加入接近项来减轻奖励欺骗的BoN变体，类似于偏好学习技术。我们在AlpacaFarm和Anthropic的hh-rlhf数据集上评估了RBoN，并发现它优于BoN。作为RBoN的应用，我们使用RBoN生成了一个成对偏好学习数据集。实验结果显示，使用RBoN生成的数据集训练的DPO模型优于使用普通BoN生成的DPO模型。我们的代码可在https://github.com/CyberAgentAILab/regularized-bon中找到。

更新时间: 2024-06-24 02:31:06

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.01054v3

Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams

Identifying heavy hitters and estimating the frequencies of flows are fundamental tasks in various network domains. Existing approaches to this challenge can broadly be categorized into two groups: hashing-based and competing-counter-based. The Count-Min sketch is a standard example of a hashing-based algorithm, and the Space Saving algorithm is an example of a competing-counter algorithm. Recent works have explored the use of machine learning to enhance algorithms for frequency estimation problems within the algorithms-with-predictions framework. However, these works have focused solely on the hashing-based approach, which may not be best for identifying heavy hitters. In this paper, we present the first learned competing-counter-based algorithm, called LSS, which builds on the well-known Space Saving algorithm to identify heavy hitters and top-k items and to estimate flow frequencies. We provide theoretical insights into how, and to what extent, our approach can improve upon Space Saving, backed by experimental results on both synthetic and real-world datasets. Our evaluation demonstrates that LSS can enhance the accuracy and efficiency of Space Saving in identifying heavy hitters and top-k items and in estimating flow frequencies.
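For reference, the classical (non-learned) Space Saving baseline that LSS builds on fits in a few lines. The sketch below covers only that baseline. It does not show how LSS incorporates learned predictions, which the abstract does not detail. The helper names are illustrative.

```python
def space_saving(stream, m):
    """Classical Space Saving with m counters.

    Returns {item: (estimated_count, max_overestimation)}.
    The estimate never undercounts, and count - error is a lower bound on the true frequency.
    """
    counters = {}  # item -> [estimated count, overestimation bound]
    for x in stream:
        if x in counters:
            counters[x][0] += 1
        elif len(counters) < m:
            counters[x] = [1, 0]
        else:
            # Evict the item with the smallest count; the newcomer inherits that count as error.
            victim = min(counters, key=lambda k: counters[k][0])
            min_count = counters.pop(victim)[0]
            counters[x] = [min_count + 1, min_count]
    return {k: tuple(v) for k, v in counters.items()}

def guaranteed_heavy_hitters(counters, threshold):
    """Items whose lower-bound frequency (count - error) reaches the threshold."""
    return sorted(k for k, (c, e) in counters.items() if c - e >= threshold)
```

With `m=2` on the stream `aaabbc`, item `b` is evicted when `c` arrives. `c` is then reported with count 3 and error 2, so only `a` is a guaranteed heavy hitter at threshold 2.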

Updated: 2024-06-24 02:31:00

标题: 基于学习的流量中的重要数据包识别和流频率估计

摘要: 识别重要节点和估计流量频率是各种网络领域中的基本任务。解决这一挑战的现有方法可以大致分为两类，基于哈希和基于竞争计数器。 Count-Min草图是基于哈希的算法的标准示例，Space Saving算法是竞争计数器算法的一个示例。最近的研究探讨了利用机器学习来增强频率估计问题算法的方法，在带有预测框架的算法下。然而，这些研究仅专注于基于哈希的方法，这可能不是最适合识别重要节点的方法。在本文中，我们提出了第一个学习基于竞争计数器的算法，称为LSS，用于识别重要节点、前k个节点和流量频率估计，利用了著名的Space Saving算法。我们提供了理论见解，说明我们的方法如何以及在多大程度上能够改善Space Saving算法，支持对合成和真实世界数据集的实验结果。我们的评估表明，LSS可以提高Space Saving在识别重要节点、前k个节点和估计流量频率方面的准确性和效率。

更新时间: 2024-06-24 02:31:00

领域: cs.DS,cs.LG

下载: http://arxiv.org/abs/2406.16270v1

EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models

Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, meaning that they are unaware of unseen events or generate text with incorrect facts owing to outdated or noisy data. To this end, many knowledge editing approaches for LLMs have emerged, aiming to subtly inject or edit updated knowledge, or adjust undesired behavior, while minimizing the impact on unrelated inputs. Nevertheless, due to significant differences among knowledge editing methods and variations in task setups, no standard implementation framework is available to the community, which hinders practitioners from applying knowledge editing in real applications. To address these issues, we propose EasyEdit, an easy-to-use knowledge editing framework for LLMs. It supports various cutting-edge knowledge editing approaches and can be readily applied to many well-known LLMs such as T5, GPT-J, and LLaMA. Empirically, we report knowledge editing results on LLaMA-2 with EasyEdit, demonstrating that knowledge editing surpasses traditional fine-tuning in terms of reliability and generalization. We have released the source code on GitHub, along with Google Colab tutorials and comprehensive documentation to help beginners get started. In addition, we present an online system for real-time knowledge editing and a demo video.

Updated: 2024-06-24 02:17:57

标题: EasyEdit：一种用于大型语言模型的易于使用的知识编辑框架

摘要: 大型语言模型（LLMs）通常存在知识截断或谬误问题，这意味着它们对未见事件不知情或生成具有不正确事实的文本，原因是数据过时/嘈杂。为此，出现了许多针对LLMs的知识编辑方法，旨在微妙地注入/编辑更新的知识或调整不希望的行为，同时最大程度地减小对无关输入的影响。然而，由于各种知识编辑方法之间存在显著差异，以及任务设置的变化，社区中没有标准的实现框架可用，这阻碍了从业者将知识编辑应用于应用程序。为了解决这些问题，我们提出了EasyEdit，一个易于使用的LLMs知识编辑框架。它支持各种前沿的知识编辑方法，并可以轻松应用于许多知名的LLMs，如T5、GPT-J、LlaMA等。从实证上讲，我们使用EasyEdit在LlaMA-2上报告了知识编辑结果，证明知识编辑在可靠性和泛化方面优于传统的微调。我们已在GitHub上发布了源代码，以及适用于初学者的Google Colab教程和全面的文档。此外，我们提供了一个用于实时知识编辑的在线系统和一个演示视频。

更新时间: 2024-06-24 02:17:57

领域: cs.CL,cs.AI,cs.CV,cs.IR,cs.LG

下载: http://arxiv.org/abs/2308.07269v3

EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models

In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique for enhancing the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to strike a delicate balance between data quantity and data quality. Nevertheless, due to persistent inconsistencies among instruction processing methods, no standard open-source implementation framework for instruction processing is available to the community, which hinders practitioners' further development and progress. To facilitate instruction processing research and development, we present EasyInstruct, an easy-to-use instruction processing framework for LLMs, which modularizes instruction generation, selection, and prompting, while also considering their combination and interaction. EasyInstruct is publicly released and actively maintained at https://github.com/zjunlp/EasyInstruct, along with an online demo app and a demo video for quick start, and we call for broader research centered on instruction data and synthetic data.

Updated: 2024-06-24 02:10:23

标题: EasyInstruct：用于大型语言模型的易于使用的指令处理框架

摘要: 近年来，指令调优越来越受到关注，并成为提升大型语言模型（LLMs）能力的关键技术。为了构建高质量的指令数据集，许多指令处理方法被提出，旨在实现数据数量和数据质量之间的微妙平衡。然而，由于不同指令处理方法之间存在的不一致性，社区中尚无标准的开源指令处理实现框架可用，这阻碍了从业者进一步开发和推进。为了促进指令处理研究和发展，我们提出了EasyInstruct，一种易于使用的LLMs指令处理框架，模块化地生成、选择和提示指令，并考虑它们的组合和互动。EasyInstruct已在https://github.com/zjunlp/EasyInstruct上公开发布并得到积极维护，同时还提供在线演示应用程序和演示视频供快速入门，呼吁广泛开展以指令数据和合成数据为中心的研究。

更新时间: 2024-06-24 02:10:23

领域: cs.CL,cs.AI,cs.HC,cs.IR,cs.LG

下载: http://arxiv.org/abs/2402.03049v4

One Thousand and One Pairs: A "novel" challenge for long-context language models

Synthetic long-context LLM benchmarks (e.g., "needle-in-the-haystack") test only surface-level retrieval capabilities, but how well can long-context LLMs retrieve, synthesize, and reason over information across book-length inputs? We address this question by creating NoCha, a dataset of 1,001 minimally different pairs of true and false claims about 67 recently published English fictional books, written by human readers of those books. In contrast to existing long-context benchmarks, our annotators confirm that the largest share of pairs in NoCha requires global reasoning over the entire book to verify. Our experiments show that while human readers easily perform this task, it is enormously challenging for all ten long-context LLMs that we evaluate: no open-weight model performs above random chance (despite their strong performance on synthetic benchmarks), while GPT-4o achieves the highest accuracy at 55.8%. Further analysis reveals that (1) on average, models perform much better on pairs that require only sentence-level retrieval than on those requiring global reasoning; (2) model-generated explanations for their decisions are often inaccurate even for correctly labeled claims; and (3) models perform substantially worse on speculative fiction books that contain extensive world-building. The methodology proposed in NoCha allows the benchmark dataset to evolve and makes it easy to analyze future models.
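The "random chance" comparison depends on how the minimal pairs are scored. The sketch below assumes pair-level scoring: a pair counts as correct only if the true claim is labeled true and the false claim is labeled false. The abstract does not spell out this protocol. Under that assumption, independent coin flips give 25% accuracy rather than 50%. The function name is hypothetical.

```python
def pair_accuracy(predictions):
    """Pair-level accuracy (assumed scoring protocol).

    predictions: list of (label_for_true_claim, label_for_false_claim) booleans.
    A pair is correct only if the true claim is judged True AND the false claim False.
    """
    correct = sum(1 for t, f in predictions if t is True and f is False)
    return correct / len(predictions)
```

Enumerating all four label combinations once, `[(True, False), (True, True), (False, False), (False, True)]`, gives 0.25, which is the chance level under this scoring.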

Updated: 2024-06-24 02:03:57

标题: 一千零一对：长篇语言模型的“新颖”挑战

摘要: 合成长上下文LLM基准（例如“大海捞针”）仅测试表面级别的检索能力，但是长上下文LLM能够多好地检索、综合和推理全书长度的输入信息呢？我们通过创建NoCha数据集来回答这个问题，该数据集包含了1,001对真实和虚假论断，涉及67本最近出版的英文小说书籍，这些论断由这些书籍的人类读者编写。与现有的长上下文基准相比，我们的注释员确认NoCha中最大的一部分对需要全书范围的推理来验证。我们的实验表明，虽然人类读者很容易完成这项任务，但对我们评估的所有十个长上下文LLMs来说，这是极具挑战性的：没有一个开放权重模型能够超过随机概率（尽管它们在合成基准上表现出色），而GPT-4o的准确率最高，达到55.8%。进一步分析发现：（1）平均而言，模型在仅需要句级检索而非全局推理的对中表现要好得多；（2）模型对其决策的生成解释通常对于标记正确的论断来说也是不准确的；以及（3）模型在包含大量世界构建的幻想小说书籍上表现明显较差。NoCha提出的方法允许基准数据集的发展和未来模型的简易分析。

更新时间: 2024-06-24 02:03:57

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.16264v1

Video-Infinity: Distributed Long Video Generation

Diffusion models have recently achieved remarkable results in video generation. Despite these encouraging results, the generated videos are typically limited to a small number of frames, yielding clips that last only a few seconds. The primary challenges in producing longer videos are the substantial memory requirements and the long processing time on a single GPU. A straightforward solution would be to split the workload across multiple GPUs, which, however, raises two issues: (1) ensuring that all GPUs communicate effectively to share timing and context information, and (2) adapting existing video diffusion models, which are usually trained on short sequences, to create longer videos without additional training. To tackle these issues, in this paper we introduce Video-Infinity, a distributed inference pipeline that enables parallel processing across multiple GPUs for long-form video generation. Specifically, we propose two coherent mechanisms: Clip parallelism and Dual-scope attention. Clip parallelism optimizes the gathering and sharing of context information across GPUs, which minimizes communication overhead, while Dual-scope attention modulates the temporal self-attention to balance local and global contexts efficiently across the devices. Together, the two mechanisms distribute the workload and enable fast generation of long videos. Under an 8 × Nvidia 6000 Ada GPU (48 GB) setup, our method generates videos of up to 2,300 frames in approximately 5 minutes, enabling long video generation at a speed 100 times faster than prior methods.
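The sketch below illustrates only the indexing side of the two mechanisms. Frames are split into contiguous clips, one per GPU. Each frame then attends to a local neighborhood plus a sparse set of global frames. The strided choice of global frames, the window size, and the function names are assumptions for illustration. The actual pipeline exchanges context features between GPUs inside the attention layers, which is not shown here.

```python
def split_into_clips(num_frames, num_gpus):
    """Assign contiguous frame ranges to GPUs (simplified clip parallelism)."""
    base, extra = divmod(num_frames, num_gpus)
    clips, start = [], 0
    for g in range(num_gpus):
        size = base + (1 if g < extra else 0)  # spread any remainder over the first GPUs
        clips.append(list(range(start, start + size)))
        start += size
    return clips

def dual_scope_keys(frame, num_frames, local_window, global_stride):
    """Frames that `frame` attends to: a local neighborhood plus sparse global context (assumed strided)."""
    local = set(range(max(0, frame - local_window), min(num_frames, frame + local_window + 1)))
    global_ctx = set(range(0, num_frames, global_stride))
    return sorted(local | global_ctx)
```

For example, 10 frames on 3 GPUs split as `[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]`. The local scope keeps short-range motion coherent, while the shared global frames give every clip the same long-range reference.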

Updated: 2024-06-24 01:56:12

标题: Video-Infinity: 分布式长视频生成

摘要: 最近，扩散模型在视频生成方面取得了显著的成果。尽管表现令人鼓舞，但生成的视频通常受限于少量帧数，导致片段仅持续几秒钟。生成更长视频的主要挑战包括巨大的内存需求和在单个GPU上需要的延长处理时间。一个简单的解决方案是在多个GPU之间分配工作量，然而，这会带来两个问题：(1)确保所有GPU有效地通信以共享时间和上下文信息，以及(2)修改现有的视频扩散模型，这些模型通常是在短序列上训练的，以在没有额外训练的情况下生成更长的视频。为了解决这些问题，在本文中我们介绍了Video-Infinity，一个分布式推理管道，可以实现跨多个GPU的并行处理，用于生成长格式视频。具体来说，我们提出了两种相互一致的机制：片段并行和双范围注意力。片段并行优化了跨GPU间的上下文信息的收集和共享，从而最小化了通信开销，而双范围注意力调节了时间自注意力，有效平衡了设备间的本地和全局上下文。这两种机制共同协作，分担工作量，并实现快速生成长视频。在一个8 x Nvidia 6000 Ada GPU（48G）配置下，我们的方法在大约5分钟内生成长达2,300帧的视频，使长视频生成速度比先前方法快100倍。

更新时间: 2024-06-24 01:56:12

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.16260v1

User Story Tutor (UST) to Support Agile Software Developers

User Stories record what must be built in projects that use agile practices. User Stories serve both to estimate effort, generally measured in Story Points, and to plan what should be done in a Sprint. Therefore, it is essential to train software engineers to write simple, easily readable, and comprehensive User Stories. To that end, we designed, implemented, applied, and evaluated a web application called User Story Tutor (UST). UST checks the description of a given User Story for readability and, if needed, recommends appropriate practices for improvement. UST also estimates a User Story's effort in Story Points using Machine Learning techniques. As such, UST may support the continuing education of agile development teams when writing and reviewing User Stories. UST's ease of use was evaluated by 40 agile practitioners according to the Technology Acceptance Model (TAM) and AttrakDiff. The TAM evaluation averages were good for almost all of the variables considered, and the AttrakDiff evaluation framework produced similarly good results. These results suggest that UST can be used reliably. Applying UST to assist in writing User Stories is a viable technique that, at the very least, agile development teams can use to complement and enhance current User Story creation.
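The abstract does not say which readability measure UST uses. The sketch below assumes the standard Flesch Reading Ease formula, 206.835 − 1.015·(words/sentences) − 84.6·(syllables/words), as one common choice. It uses a rough vowel-group heuristic for syllable counting. Higher scores mean easier text.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, dropping a silent final 'e'; at least one syllable."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(1, n)

def flesch_reading_ease(text):
    """Flesch Reading Ease (assumed metric); higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) / len(sentences) - 84.6 * syllables / len(words)
```

A short story such as "As a user, I want to log in. Then I can see my cart." scores far higher than one packed with long nominalizations. A tool like UST could flag the latter for rewriting.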

Updated: 2024-06-24 01:55:01

标题: 用户故事辅导员（UST）以支持敏捷软件开发者

摘要: 用户故事记录了在使用敏捷实践的项目中必须构建的内容。用户故事旨在估计工作量，通常以故事点为单位衡量，并计划在一个迭代中应该完成的任务。因此，培训软件工程师如何创建简单、易读和全面的用户故事至关重要。出于这个原因，我们设计、实施、应用和评估了一个名为用户故事导师（UST）的网络应用程序。UST检查给定用户故事的描述是否易读，并在需要时推荐改进的适当实践。UST还利用机器学习技术估计用户故事的工作量。因此，UST可以在编写和审查用户故事时支持敏捷开发团队的继续教育。40名敏捷从业者根据技术接受模型（TAM）和AttrakDiff对UST的易用性进行评估。TAM评估的平均值在几乎所有考虑的变量中都很好。应用AttrakDiff评估框架产生了类似的良好结果。显然，UST可以以良好的可靠性使用。使用UST来辅助编写用户故事是一种可行的技术，至少可以被敏捷开发用来补充和增强当前的用户故事创建。

更新时间: 2024-06-24 01:55:01

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2406.16259v1

MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention

Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently to facilitate learning, thus hindering sample efficiency. In this work, we introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention. Instead of inferring the complete human behavior characteristics, MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions. It then employs Residual Q-Learning (RQL) to align the policy with human preferences using this residual reward function. Extensive evaluations on simulated and real-world tasks demonstrate that MEReQ achieves sample-efficient policy alignment from human intervention.

Updated: 2024-06-24 01:51:09

标题: MEReQ：最大熵残差Q逆强化学习用于来自干预的样本高效对齐

摘要: 将机器人行为与人类偏好保持一致对于在以人为中心的环境中部署具有实体的人工智能代理至关重要。一个有前途的解决方案是通过人类干预进行互动模仿学习，即一个人类专家观察策略的执行并提供干预作为反馈。然而，现有方法常常未能有效利用先验策略来促进学习，从而阻碍了样本效率。在这项工作中，我们介绍了MEReQ（最大熵残差Q逆强化学习），旨在通过人类干预实现样本高效的对齐。MEReQ不是推断完整的人类行为特征，而是推断捕捉人类专家与先验策略之间奖励函数差异的残差奖励函数。然后，它利用残差Q学习（RQL）使用这个残差奖励函数来将策略与人类偏好保持一致。对模拟和真实世界任务的广泛评估表明，MEReQ实现了通过人类干预进行样本高效的策略对齐。

更新时间: 2024-06-24 01:51:09

领域: cs.RO,cs.AI,cs.LG,I.2.6; I.2.9

下载: http://arxiv.org/abs/2406.16258v1

Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

Machine unlearning is the process of efficiently removing the influence of a training data instance from a trained machine learning model without retraining it from scratch. A popular subclass of unlearning approaches is exact machine unlearning, which focuses on techniques that explicitly guarantee the removal of a data instance's influence from a model. Exact unlearning approaches use a machine learning model whose individual components are trained on disjoint subsets of the data. During deletion, exact unlearning approaches retrain only the affected components rather than the entire model. While existing approaches reduce retraining costs, retraining a model component can still be expensive for an organization, as it requires halting a system in production, which leads to service failure and adversely impacts customers. To address these challenges, we introduce an exact unlearning framework, Sequence-aware Sharded Sliced Training (S3T), designed to enhance the deletion capabilities of an exact unlearning system while minimizing the impact on the model's performance. At the core of S3T is a lightweight parameter-efficient fine-tuning approach that enables parameter isolation by sequentially training layers on disjoint data slices. This enables efficient unlearning by simply deactivating the layers affected by a data deletion. Furthermore, to reduce the retraining cost and improve model performance, we train the model on multiple data sequences, which allows S3T to handle a larger number of deletion requests. Both theoretically and empirically, we demonstrate that S3T attains superior deletion capabilities and better performance than baselines across a wide range of settings.
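The deletion logic can be sketched with a simplified model. Layer i's parameter-efficient weights are trained on data slice `sequence[i]`, on top of layers 0..i-1. Deleting data from a slice therefore invalidates that layer and every layer trained after it. With several trained sequences, one can serve the sequence that keeps the most layers active. The layer/slice structure and function names here are assumptions for illustration, not the S3T code.

```python
def usable_layers(sequence, deleted_slices):
    """Number of leading layers unaffected by deletions.

    sequence: order of slice ids used to train successive layers (assumed structure).
    A deletion in slice k invalidates the layer trained on k and all layers after it.
    """
    for depth, slice_id in enumerate(sequence):
        if slice_id in deleted_slices:
            return depth
    return len(sequence)

def best_sequence(sequences, deleted_slices):
    """Among several trained slice orderings, pick the one that keeps the most layers active."""
    return max(sequences, key=lambda seq: usable_layers(seq, deleted_slices))
```

Training on multiple orderings helps because a deletion that wipes out most of one sequence may only touch the last layers of another. For example, deleting from slice 1 keeps one layer of `[0, 1, 2, 3]` but two layers of `[2, 0, 1, 3]`.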

Updated: 2024-06-24 01:45:13

标题: 朝向可伸缩的精确机器遗忘：使用参数高效微调

摘要: 机器遗忘是一种有效地从经过训练的机器学习模型中移除训练数据实例的过程，而无需从头开始重新训练。一种流行的遗忘方法子类是准确的机器遗忘，它专注于明确保证从模型中移除数据实例影响的技术。准确的遗忘方法使用一个机器学习模型，其中各个组件在数据的不相交子集上进行训练。在删除过程中，准确的遗忘方法仅重新训练受影响的组件，而不是整个模型。虽然现有方法减少了重新训练成本，但对于组织来说仍然可能昂贵，因为重新训练模型组件需要停止生产系统，导致服务故障并对客户产生不利影响。为了解决这些挑战，我们引入了一个准确的遗忘框架--序列感知分片训练（S3T），旨在增强准确的遗忘系统的删除能力，同时最大程度地减少对模型性能的影响。在S3T的核心，我们利用了一种轻量级参数高效的微调方法，通过按顺序训练具有不相交数据切片的层来实现参数隔离。这使得通过简单地停用受数据删除影响的层来实现高效遗忘。此外，为了减少重新训练成本并提高模型性能，我们在多个数据序列上对模型进行训练，这使得S3T能够处理更多的删除请求。在理论和实证方面，我们证明了与各种设置下的基线相比，S3T实现了卓越的删除能力和增强的性能。

更新时间: 2024-06-24 01:45:13

领域: cs.LG

下载: http://arxiv.org/abs/2406.16257v1

Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios

In the context of machine unlearning, the primary challenge lies in effectively removing traces of private data from trained models while maintaining model performance and security against privacy attacks like membership inference attacks. Traditional gradient-based unlearning methods often rely on extensive historical gradients, which becomes impractical with high unlearning ratios and may reduce the effectiveness of unlearning. Addressing these limitations, we introduce Mini-Unlearning, a novel approach that capitalizes on a critical observation: unlearned parameters correlate with retrained parameters through contraction mapping. Our method, Mini-Unlearning, utilizes a minimal subset of historical gradients and leverages this contraction mapping to facilitate scalable, efficient unlearning. This lightweight, scalable method significantly enhances model accuracy and strengthens resistance to membership inference attacks. Our experiments demonstrate that Mini-Unlearning not only works under higher unlearning ratios but also outperforms existing techniques in both accuracy and security, offering a promising solution for applications requiring robust unlearning capabilities.

Updated: 2024-06-24 01:43:30

标题: 使用最小梯度依赖的机器取消学习，适用于高取消学习比率

摘要: 在机器遗忘的背景下，主要挑战在于有效地从经过训练的模型中移除私人数据的痕迹，同时保持模型性能和对隐私攻击（如成员推断攻击）的安全性。传统的基于梯度的遗忘方法通常依赖于大量的历史梯度，这在高遗忘比率下变得不切实际，并可能降低遗忘的有效性。为了解决这些限制，我们引入了Mini-Unlearning，一种新颖的方法，利用了一个关键观察：未学习的参数与重新训练的参数之间通过收缩映射相关联。我们的方法，Mini-Unlearning，利用了一个最小的历史梯度子集，并利用这个收缩映射来促进可扩展、高效的遗忘。这种轻量级、可扩展的方法显著提高了模型的准确性，并增强了对成员推断攻击的抵抗力。我们的实验证明，Mini-Unlearning不仅在更高的遗忘比率下有效，而且在准确性和安全性方面优于现有技术，为需要强大遗忘能力的应用提供了一个有前途的解决方案。

更新时间: 2024-06-24 01:43:30

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2406.16986v1

Intrinsic LoRA: A Generalist Approach for Discovering Knowledge in Generative Models

Generative models excel at creating images that closely mimic real scenes, suggesting that they inherently encode scene representations. We introduce Intrinsic LoRA (I-LoRA), a general approach that uses Low-Rank Adaptation (LoRA) to discover scene intrinsics such as normals, depth, albedo, and shading from a wide array of generative models. I-LoRA is lightweight, adding only a small number of parameters to the model and requiring very small datasets for this knowledge discovery. Our approach, applicable to diffusion models, GANs, and autoregressive models alike, generates intrinsics using the same output head as the original images. Through control experiments, we establish a correlation between the generative model's quality and the accuracy of the extracted intrinsics. Finally, scene intrinsics obtained by our method with just hundreds to thousands of labeled images perform on par with those from supervised methods trained on millions of labeled examples.
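Why LoRA adds so few parameters is easiest to see in a generic LoRA layer. A frozen weight W is augmented by a trainable low-rank update B·A of rank r. The sketch below is standard LoRA in NumPy. It is not I-LoRA's specific choice of which layers to adapt, and the class name is illustrative.

```python
import numpy as np

class LoRALinear:
    """Generic LoRA: y = x W^T + (alpha / r) * x A^T B^T, with W frozen and A, B trainable."""

    def __init__(self, W, r, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                      # frozen pretrained weight (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out); the low-rank path is zero until B is trained.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size
```

For a 64×64 weight and rank 4, the adapter trains 4·64 + 64·4 = 512 parameters instead of 4,096. Because B starts at zero, the adapted layer initially reproduces the original model exactly.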

Updated: 2024-06-24 01:42:55

标题: 内在的LoRA：一种在生成模型中发现知识的通用方法

摘要: 生成模型在创建近似真实场景的图像方面表现出色，这表明它们固有地编码了场景表示。我们引入了内在LoRA（I-LoRA），这是一种通用方法，利用低秩适应（LoRA）从各种生成模型中发现场景内在因素，如法线、深度、反照率和阴影。I-LoRA是轻量级的，对模型的参数增加很小，需要非常小的数据集来进行这种知识发现。我们的方法适用于扩散模型、GANs和自回归模型，使用相同的输出头部生成内在因素和原始图像。通过控制实验，我们建立了生成模型质量和提取的内在因素准确性之间的相关性。最后，通过我们的方法仅使用数百至数千张标记图像获得的场景内在因素，表现与经过数百万标记示例训练的监督方法相当。

更新时间: 2024-06-24 01:42:55

领域: cs.CV,cs.AI,cs.GR,cs.LG

下载: http://arxiv.org/abs/2311.17137v2

Uncertainty-Aware Reward-Free Exploration with General Function Approximation

Mastering multiple tasks through exploration and learning in an environment poses a significant challenge in reinforcement learning (RL). Unsupervised RL has been introduced to address this challenge by training policies with intrinsic rewards rather than extrinsic rewards. However, current intrinsic reward designs and unsupervised RL algorithms often overlook the heterogeneous nature of collected samples, thereby reducing their sample efficiency. To overcome this limitation, in this paper we propose a reward-free RL algorithm called GFA-RFE. The key ideas behind our algorithm are an uncertainty-aware intrinsic reward for exploring the environment and an uncertainty-weighted learning process to handle heterogeneous uncertainty across samples. Theoretically, we show that to find an $\epsilon$-optimal policy, GFA-RFE needs to collect $\tilde{O} (H^2 \log N_{\mathcal F} (\epsilon) \mathrm{dim} (\mathcal F) / \epsilon^2 )$ episodes, where $\mathcal F$ is the value function class with covering number $N_{\mathcal F} (\epsilon)$ and generalized eluder dimension $\mathrm{dim} (\mathcal F)$. This bound is better than those of all existing reward-free RL algorithms. We further implement and evaluate GFA-RFE across various domains and tasks in the DeepMind Control Suite. Experimental results show that GFA-RFE outperforms or performs comparably to state-of-the-art unsupervised RL algorithms.
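The two ingredients named above can be sketched in the linear special case of general function approximation. The uncertainty of a feature vector φ is the elliptical width sqrt(φᵀΣ⁻¹φ), which can serve as an exploration bonus. Regression then down-weights high-uncertainty samples. The weighting rule 1/max(1, u)² is one common choice and is an assumption here, not necessarily the paper's exact scheme. The function names are illustrative.

```python
import numpy as np

def uncertainty(Sigma_inv, phi):
    """Elliptical width sqrt(phi^T Sigma^{-1} phi); large for poorly explored directions.
    Used here as an uncertainty-aware intrinsic reward (exploration bonus)."""
    return float(np.sqrt(phi @ Sigma_inv @ phi))

def sample_weight(u, floor=1.0):
    """Assumed weighting: down-weight high-uncertainty samples, cap weight at 1."""
    return 1.0 / max(floor, u) ** 2

def weighted_ridge(features, targets, weights, lam=1.0):
    """Solve min_theta sum_i w_i (phi_i^T theta - y_i)^2 + lam * ||theta||^2."""
    Phi = np.asarray(features, dtype=float)
    w = np.asarray(weights, dtype=float)
    Sigma = lam * np.eye(Phi.shape[1]) + Phi.T @ (w[:, None] * Phi)
    theta = np.linalg.solve(Sigma, Phi.T @ (w * np.asarray(targets, dtype=float)))
    return theta, Sigma
```

The weighted Gram matrix `Sigma` returned by the regression is also what the next round's uncertainty (through its inverse) would be computed from. This ties exploration and learning together.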

Updated: 2024-06-24 01:37:18

标题: 不确定性感知的不带奖励的探索与一般函数逼近

摘要: 通过在环境中探索和学习掌握多个任务在强化学习（RL）中提出了一个重大挑战。无监督RL已经被引入来解决这一挑战，通过训练具有内在奖励而非外在奖励的策略。然而，当前的内在奖励设计和无监督RL算法经常忽视收集样本的异质性，从而降低它们的样本效率。为了克服这一限制，在本文中，我们提出了一种称为GFA-RFE的无奖励RL算法。我们算法背后的关键思想是一种用于探索环境的不确定性感知内在奖励，以及一个处理不同样本中异质性不确定性的不确定性加权学习过程。理论上，我们展示了为了找到一个$\epsilon$-最优策略，GFA-RFE需要收集$\tilde{O} (H^2 \log N_{\mathcal F} (\epsilon) \mathrm{dim} (\mathcal F) / \epsilon^2 )$个回合，其中$\mathcal F$是具有覆盖数$N_{\mathcal F} (\epsilon)$和广义eluder维度$\mathrm{dim} (\mathcal F)$的值函数类。这样的结果胜过所有现有的无奖励RL算法。我们进一步在DeepMind控制套件的各种领域和任务中实施和评估GFA-RFE。实验结果表明，GFA-RFE的表现优于或与最先进的无监督RL算法的表现相媲美。

更新时间: 2024-06-24 01:37:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.16255v1

Confidence Regulation Neurons in Language Models

Despite their widespread use, the mechanisms by which large language models (LLMs) represent and regulate uncertainty in next-token predictions remain largely unexplored. This study investigates two critical components believed to influence this uncertainty: the recently discovered entropy neurons and a new set of components that we term token frequency neurons. Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits. Our work shows that entropy neurons operate by writing onto an unembedding null space, allowing them to impact the residual stream norm with minimal direct effect on the logits themselves. We observe entropy neurons across a range of models with up to 7 billion parameters. On the other hand, token frequency neurons, which we discover and describe here for the first time, boost or suppress each token's logit in proportion to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution. Finally, we present a detailed case study in which entropy neurons actively manage confidence in the setting of induction, i.e., detecting and continuing repeated subsequences.
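The null-space mechanism can be reproduced in a toy setting with random matrices. Here a direction exists that the unembedding maps to zero, and real models only have an approximate (effective) null space. Adding such a direction to the residual stream leaves the unembedded direction untouched but inflates the residual norm. The final LayerNorm therefore divides by a larger scale, which uniformly shrinks the logits and raises the output entropy. The dimensions and names are illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Final LayerNorm without learned affine parameters."""
    xc = x - x.mean()
    return xc / np.sqrt((xc ** 2).mean() + eps)

def softmax_entropy(logits):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
d_model, vocab = 8, 5
W_U = rng.normal(size=(vocab, d_model))  # toy unembedding matrix
x = rng.normal(size=d_model)             # toy residual stream
xc = x - x.mean()

# Direction v with W_U v = 0 (invisible to the unembedding), mean zero (LayerNorm
# centering leaves it unchanged) and orthogonal to xc (so it purely adds norm).
A = np.vstack([W_U, np.ones((1, d_model)), xc[None, :]])
v = np.linalg.svd(A)[2][-1]  # last right-singular vector spans the null space of A

logits_before = W_U @ layer_norm(x)
logits_after = W_U @ layer_norm(x + 3.0 * v)  # an "entropy neuron" writing 3*v
```

`logits_after` is a scaled-down copy of `logits_before`, so the predicted ranking is unchanged while confidence drops. This mirrors the effect the paper attributes to entropy neurons.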

Updated: 2024-06-24 01:31:03

标题: 语言模型中的信心调节神经元

摘要: 尽管大型语言模型（LLMs）被广泛使用，但它们代表和调节下一个标记预测中的不确定性的机制仍然大部分未被探索。本研究调查了两个被认为影响这种不确定性的关键组件：最近发现的熵神经元和我们称之为标记频率神经元的新组件。熵神经元的特征是具有异常高的权重范数，并影响最终层归一化（LayerNorm）的缩放因子，从而有效地缩小logits。我们的工作表明，熵神经元通过写入反嵌入（unembedding）矩阵的零空间来发挥作用，使其能够影响残差流范数，而对logits本身的直接影响极小。我们在参数规模最高达70亿的一系列模型中都观察到了熵神经元。另一方面，我们首次发现并描述的标记频率神经元按照每个标记的对数频率成比例地提升或抑制其logit，从而将输出分布推向或推离unigram分布。最后，我们提供了一个详细的案例研究，其中熵神经元在归纳（induction）场景中积极管理置信度，即检测并延续重复的子序列。

更新时间: 2024-06-24 01:31:03

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.16254v1

Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling

Even though Non-rigid Structure-from-Motion (NRSfM) has been extensively studied and great progress has been made, key challenges still hinder its broad real-world application: 1) the inherent motion/rotation ambiguity requires either explicit camera motion recovery with extra constraints or complex Procrustean Alignment; 2) existing low-rank modeling of the global shape can over-penalize drastic deformations in the 3D shape sequence. This paper proposes to resolve these issues from a spatial-temporal modeling perspective. First, we propose a novel Temporally-smooth Procrustean Alignment module that estimates 3D deforming shapes and adjusts the camera motion by aligning the 3D shape sequence consecutively. Our new alignment module removes the need for a complex reference 3D shape during alignment, which is more conducive to non-isotropic deformation modeling. Second, we propose a spatially-weighted approach that enforces the low-rank constraint adaptively at different locations to better accommodate drastic, spatially-variant deformation reconstruction. Our method outperforms existing low-rank-based methods, and extensive experiments across different datasets validate its effectiveness.
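The consecutive alignment idea can be illustrated with classical orthogonal Procrustes (the Kabsch algorithm). Each frame's centered 3D shape is rotated onto the previously aligned frame rather than onto a single complex reference shape. This sketch covers only that alignment step, under the assumption that shapes are given as (P, 3) point arrays. The paper's module also estimates the deforming shapes and camera motion, which is not shown, and the function names are illustrative.

```python
import numpy as np

def procrustes_rotation(source, target):
    """Kabsch: rotation R (det = +1) minimizing ||source @ R.T - target||_F for centered (P, 3) shapes."""
    H = source.T @ target
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    return Vt.T @ D @ U.T

def align_sequence(shapes):
    """Align each frame to the previously aligned frame (consecutive, temporally smooth)."""
    aligned = [shapes[0] - shapes[0].mean(axis=0)]
    for S in shapes[1:]:
        Sc = S - S.mean(axis=0)
        R = procrustes_rotation(Sc, aligned[-1])
        aligned.append(Sc @ R.T)
    return aligned
```

Because neighboring frames differ only slightly in a smoothly deforming sequence, frame-to-frame alignment stays well conditioned even when the first and last shapes differ substantially.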

Updated: 2024-06-24 01:30:48

标题: 非刚性运动结构：时间平滑的普罗克鲁斯特对准和空间变形建模

摘要: 尽管非刚性结构运动(NRSfM)已经被广泛研究并取得了巨大进展，但仍然存在一些关键挑战，限制了它们在广泛实际应用中的应用：1)固有的运动/旋转模糊要求要么明确地恢复摄像机运动并增加额外约束，要么进行复杂的Procrustean对齐；2)现有的全局形状的低秩建模可能会过度惩罚3D形状序列中的剧烈变形。本文提出从空间-时间建模的角度解决上述问题。首先，我们提出了一个新颖的时间平滑Procrustean对齐模块，该模块估计3D变形形状并通过连续对齐3D形状序列来调整摄像机运动。我们的新对齐模块弥补了在对齐过程中对复杂参考3D形状的要求，这有利于非各向异性变形建模。其次，我们提出了一种空间加权方法，以在不同位置自适应地强制执行低秩约束，以更好地适应剧烈空间变形重建。我们的建模优于现有基于低秩的方法，并通过不同数据集上的广泛实验验证了我们方法的有效性。

更新时间: 2024-06-24 01:30:48

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.04309v2

Graph-Augmented LLMs for Personalized Health Insights: A Case Study in Sleep Analysis

Health monitoring systems have revolutionized modern healthcare by enabling the continuous capture of physiological and behavioral data, which is essential for preventive measures and early health intervention. While integrating this data with Large Language Models (LLMs) has shown promise in delivering interactive health advice, traditional methods like Retrieval-Augmented Generation (RAG) and fine-tuning often fail to fully utilize the complex, multi-dimensional, and temporally relevant data from wearable devices. These conventional approaches typically provide limited actionable and personalized health insights because of their inadequate capacity to dynamically integrate and interpret diverse health data streams. In response, this paper introduces a graph-augmented LLM framework designed to significantly enhance the personalization and clarity of health insights. Using a hierarchical graph structure, the framework captures inter- and intra-patient relationships, enriching LLM prompts with dynamic feature importance scores derived from a Random Forest model. The effectiveness of this approach is demonstrated through a sleep analysis case study involving 20 college students during the COVID-19 lockdown, highlighting the potential of our model to generate actionable and personalized health insights efficiently. We use another LLM to evaluate the insights for relevance, comprehensiveness, actionability, and personalization, addressing the critical need for models that process and interpret complex health data effectively. Our findings show that augmenting prompts with our framework yields significant improvements in all four criteria. With our framework, we can elicit well-crafted, more thoughtful responses tailored to a specific patient.

Updated: 2024-06-24 01:22:54

标题: 图形增强LLMs用于个性化健康洞察：睡眠分析案例研究

摘要: 健康监测系统通过实现对生理和行为数据的持续捕获，为预防措施和早期健康干预提供了必不可少的支持，从而彻底改变了现代医疗保健。将这些数据与大型语言模型（LLMs）集成在一起已经显示出在提供互动健康建议方面具有潜力，然而传统方法如检索增强生成（RAG）和微调通常未能充分利用可穿戴设备中复杂、多维和时间相关的数据。这些传统方法通常由于无法动态集成和解释各种健康数据流而提供有限的可行和个性化的健康见解。为此，本文介绍了一种图增强的LLM框架，旨在显著提升个性化和清晰度的健康见解。该框架利用分层图结构捕获患者之间和内部的关系，将LLM提示与来自随机森林模型的动态特征重要性得分相结合，从而丰富LLM提示。通过对20名大学生在COVID-19封锁期间进行的睡眠分析案例研究来展示这种方法的有效性，突显了我们的模型在有效生成可行和个性化的健康见解方面的潜力。我们利用另一个LLM来评估这些见解的相关性、全面性、可行性和个性化程度，解决了处理和解释复杂健康数据的有效模型的迫切需求。我们的研究结果表明，通过我们的框架增强提示在所有四个标准上都取得了显著的改善。通过我们的框架，我们可以引出为特定患者量身定制的精心制作的、更深思熟虑的回应。

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16252v1
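The abstract describes enriching LLM prompts with feature importance scores over a hierarchical graph of inter- and intra-patient relationships. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the two-level graph is reduced to a cohort list of patient nodes, the importance scores are hard-coded stand-ins for random forest output, and PatientNode, build_prompt, the feature names, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PatientNode:
    """Hypothetical patient node; the cohort list plays the role of the upper graph level."""
    patient_id: str
    # Intra-patient data: one dict per day, feature name -> value.
    daily: list = field(default_factory=list)
    # Feature importance scores as a random forest might produce (hard-coded here).
    importance: dict = field(default_factory=dict)


def top_features(importance: dict, k: int) -> list:
    """Return the k highest-scoring (feature, score) pairs, best first."""
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:k]


def mean(values: list) -> float:
    return sum(values) / len(values) if values else float("nan")


def build_prompt(patient: PatientNode, cohort: list, k: int = 2) -> str:
    """Pair each top feature with intra-patient and inter-patient context."""
    lines = [f"Patient {patient.patient_id}: features ranked by importance for sleep quality:"]
    for name, score in top_features(patient.importance, k):
        own = [day[name] for day in patient.daily if name in day]
        # Inter-patient context: the same feature across every other patient.
        peers = [day[name] for other in cohort if other is not patient
                 for day in other.daily if name in day]
        lines.append(f"- {name} (importance {score:.2f}): "
                     f"patient mean {mean(own):.1f}, cohort mean {mean(peers):.1f}")
    lines.append("Give personalized, actionable advice to improve this patient's sleep.")
    return "\n".join(lines)


alice = PatientNode("A",
                    daily=[{"steps": 4000, "screen_time_h": 6.0},
                           {"steps": 6000, "screen_time_h": 5.0}],
                    importance={"screen_time_h": 0.5, "steps": 0.3, "caffeine_mg": 0.2})
bob = PatientNode("B", daily=[{"steps": 9000, "screen_time_h": 3.0}])
prompt = build_prompt(alice, cohort=[alice, bob])
print(prompt)
```

Only the top-k features reach the prompt, so the importance scores act as a filter that keeps the context short while still contrasting each patient with the cohort.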

An Optimal Tightness Bound for the Simulation Lemma

We present a bound for value-prediction error with respect to model misspecification that is tight, including constant factors. This is a direct improvement of the "simulation lemma," a foundational result in reinforcement learning. We demonstrate that existing bounds are quite loose, becoming vacuous for large discount factors, due to the suboptimal treatment of compounding probability errors. By carefully considering this quantity on its own, instead of as a subcomponent of value error, we derive a bound that is sub-linear with respect to transition function misspecification. We then demonstrate broader applicability of this technique, improving a similar bound in the related subfield of hierarchical abstraction.

Updated: 2024-06-24 01:09:33

Title: An Optimal Tightness Bound for the Simulation Lemma

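Abstract: We present a bound on value-prediction error with respect to model misspecification that is tight, including constant factors. This directly improves the "simulation lemma," a foundational result in reinforcement learning. We show that existing bounds are quite loose and become vacuous for large discount factors because they treat compounding probability errors suboptimally. By carefully considering this quantity on its own, rather than as a subcomponent of value error, we derive a bound that is sub-linear in transition-function misspecification. We then show the broader applicability of this technique by improving a similar bound in the related subfield of hierarchical abstraction.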

Fields: cs.LG

Download: http://arxiv.org/abs/2406.16249v1
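The looseness attributed to existing bounds can be seen numerically. The sketch below assumes one commonly cited form of the simulation lemma for a fixed policy, equal rewards in [0, R_max], and transition error eps_T (the largest L1 distance between true and model next-state distributions): ||V - V_hat||_inf <= gamma * eps_T * R_max / (2 (1 - gamma)^2). It does not reproduce the paper's new bound; the two-state chains and the helper names are hypothetical.

```python
def evaluate_chain(p01: float, p10: float, rewards: tuple, gamma: float) -> tuple:
    """Exact values V = (I - gamma*T)^-1 r for a two-state chain (an MDP under a fixed policy).

    p01 = P(next=1 | state 0), p10 = P(next=0 | state 1).
    """
    m00, m01 = 1 - gamma * (1 - p01), -gamma * p01
    m10, m11 = -gamma * p10, 1 - gamma * (1 - p10)
    det = m00 * m11 - m01 * m10
    r0, r1 = rewards
    # Cramer's rule for the 2x2 linear system (I - gamma*T) V = r.
    return (r0 * m11 - m01 * r1) / det, (m00 * r1 - m10 * r0) / det


def transition_error(true_chain: tuple, model_chain: tuple) -> float:
    """eps_T: the largest L1 distance between true and model next-state distributions."""
    # For two-state rows (1 - p, p) and (1 - q, q) the L1 distance is 2|p - q|.
    return max(2 * abs(p - q) for p, q in zip(true_chain, model_chain))


def classical_bound(eps_t: float, r_max: float, gamma: float) -> float:
    """Assumed textbook form: gamma * eps_T * R_max / (2 (1 - gamma)^2), equal rewards."""
    return gamma * eps_t * r_max / (2 * (1 - gamma) ** 2)


rewards = (1.0, 0.0)                               # rewards in [0, R_max] with R_max = 1
true_chain, model_chain = (0.10, 0.10), (0.15, 0.05)
eps_t = transition_error(true_chain, model_chain)  # 0.1 up to float rounding
results = {}
for gamma in (0.5, 0.9, 0.99):
    v = evaluate_chain(*true_chain, rewards, gamma)
    v_hat = evaluate_chain(*model_chain, rewards, gamma)
    actual = max(abs(x - y) for x, y in zip(v, v_hat))
    trivial = 1.0 / (1 - gamma)                     # values lie in [0, R_max / (1 - gamma)]
    results[gamma] = (actual, classical_bound(eps_t, 1.0, gamma), trivial)
    print(f"gamma={gamma}: actual={actual:.2f}  classical={results[gamma][1]:.2f}  "
          f"trivial={trivial:.2f}")
```

The bound follows from V - V_hat = gamma (T - T_hat) V + gamma T_hat (V - V_hat). With these chains, at gamma = 0.99 the actual gap is about 23.8, while the classical bound evaluates to about 495, nearly five times the trivial bound of 100; its 1/(1 - gamma)^2 growth is what makes it vacuous for large discount factors.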

ClaimVer: Explainable Claim-Level Verification and Evidence Attribution of Text Through Knowledge Graphs

In the midst of widespread misinformation and disinformation circulating on social media and the proliferation of AI-generated texts, it has become increasingly difficult for people to validate and trust the information they encounter. Many fact-checking approaches and tools have been developed, but they often lack the explainability or granularity needed to be useful in various contexts. A text validation method that is easy to use, accessible, and able to perform fine-grained evidence attribution has become crucial. More importantly, building user trust in such a method requires presenting the rationale behind each prediction, as research shows this significantly influences people's belief in automated systems. It is also paramount to localize specific problematic content and draw users' attention to it, instead of providing simple blanket labels. In this paper, we present ClaimVer, a human-centric framework tailored to meet users' informational and verification needs by generating rich annotations and thereby reducing cognitive load. Designed to deliver comprehensive evaluations of texts, it highlights each claim, verifies it against a trusted knowledge graph (KG), presents the evidence, and provides succinct, clear explanations for each claim prediction. Finally, our framework introduces an attribution score, enhancing its applicability across a wide range of downstream tasks.

Updated: 2024-06-24 01:08:24

Title: ClaimVer: Explainable Claim-Level Verification and Evidence Attribution of Text Through Knowledge Graphs

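Abstract: Amid widespread misinformation and disinformation on social media and the proliferation of AI-generated text, it has become increasingly difficult for people to validate and trust the information they encounter. Many fact-checking methods and tools have been developed, but they often lack the explainability or granularity needed to be useful across contexts. A text validation method that is easy to use, accessible, and capable of fine-grained evidence attribution has become essential. More importantly, building user trust in such a method requires presenting the rationale behind each prediction, since research shows this significantly affects people's trust in automated systems. It is also crucial to localize problematic content and direct users' attention to it, rather than providing simple blanket labels. In this paper, we present ClaimVer, a human-centric framework designed to meet users' informational and verification needs by generating rich annotations that reduce cognitive load. Designed to provide comprehensive evaluations of text, it highlights each claim, verifies it against a trusted knowledge graph (KG), presents the evidence, and provides concise, clear explanations for each claim prediction. Finally, our framework introduces an attribution score that broadens its applicability across a wide range of downstream tasks.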

Fields: cs.CL,cs.CY,cs.LG

Download: http://arxiv.org/abs/2403.09724v2
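The claim-level loop described above (isolate a claim, check it against a trusted KG, return a verdict with evidence) can be sketched with claims already extracted as (subject, relation, object) triples. This is a toy under stated assumptions, not ClaimVer itself: claim extraction and the generated explanations are omitted, the three verdict labels are assumed, and attribution_score is a hypothetical stand-in (the share of claims with supporting KG evidence), not the paper's definition.

```python
from enum import Enum

Triple = tuple[str, str, str]  # (subject, relation, object)


class Verdict(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    NOT_ENOUGH_INFO = "not enough info"


def verify_claim(claim: Triple, kg: set[Triple]) -> tuple:
    """Return (verdict, evidence triples) for one claim checked against the KG."""
    if claim in kg:
        return Verdict.SUPPORTED, [claim]
    subject, relation, obj = claim
    # Same subject and relation but a different object counts as conflicting evidence.
    conflicts = sorted(t for t in kg if t[:2] == (subject, relation) and t[2] != obj)
    if conflicts:
        return Verdict.REFUTED, conflicts
    return Verdict.NOT_ENOUGH_INFO, []


def attribution_score(claims: list, kg: set[Triple]) -> float:
    """Hypothetical score: fraction of claims that have supporting KG evidence."""
    verdicts = [verify_claim(c, kg)[0] for c in claims]
    return sum(v is Verdict.SUPPORTED for v in verdicts) / len(claims)


kg = {("Paris", "capital_of", "France"),
      ("Marie Curie", "born_in", "Warsaw")}
claims = [("Paris", "capital_of", "France"),
          ("Marie Curie", "born_in", "Paris"),
          ("Warsaw", "located_in", "Poland")]  # true, but absent from this small KG
for claim in claims:
    verdict, evidence = verify_claim(claim, kg)
    print(claim, "->", verdict.value, evidence)
print("attribution score:", attribution_score(claims, kg))
```

The third claim shows why a per-claim label beats a blanket one: "not enough info" flags a coverage gap in the KG rather than calling the text false.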

Source-Free Domain Adaptation with Diffusion-Guided Source Data Generation

This paper introduces a novel approach to leverage the generalizability of Diffusion Models for Source-Free Domain Adaptation (DM-SFDA). Our proposed DM-SFDA method fine-tunes a pre-trained text-to-image diffusion model to generate source domain images, using features from the target images to guide the diffusion process. Specifically, the pre-trained diffusion model is fine-tuned to generate source samples that minimize entropy and maximize confidence for the pre-trained source model. We then use a diffusion model-based image mixup strategy to bridge the domain gap between the source and target domains. We validate our approach through comprehensive experiments across a range of datasets, including Office-31 [39], Office-Home [48], and VisDA [35]. The results demonstrate significant improvements in SFDA performance, highlighting the potential of diffusion models in generating contextually relevant, domain-specific images.

Updated: 2024-06-24 00:37:16

Title: Source-Free Domain Adaptation with Diffusion-Guided Source Data Generation

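Abstract: This paper introduces a novel approach that leverages the generalizability of diffusion models for source-free domain adaptation (DM-SFDA). The proposed DM-SFDA method fine-tunes a pre-trained text-to-image diffusion model to generate source-domain images, using features of the target images to guide the diffusion process. Specifically, the pre-trained diffusion model is fine-tuned to generate source samples that minimize entropy and maximize confidence for the pre-trained source model. We then use a diffusion-model-based image mixup strategy to bridge the domain gap between the source and target domains. We validate our method through comprehensive experiments on a range of datasets, including Office-31 [39], Office-Home [48], and VisDA [35]. The results show significant improvements in SFDA performance, highlighting the potential of diffusion models for generating contextually relevant, domain-specific images.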

Fields: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.04929v2
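The fine-tuning signal described above, generated source samples that minimize entropy and maximize confidence for the frozen source model, can be written as a per-sample loss on that model's logits. The sketch below assumes a combined loss of entropy minus confidence_weight times the top-class probability; the weighting and all helper names are hypothetical. In the actual method this quantity would be back-propagated into the diffusion model, which is not shown, and the diffusion-based mixup step is omitted.

```python
import math


def softmax(logits: list) -> list:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: list) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_loss(logits: list, confidence_weight: float = 1.0) -> float:
    """Assumed objective: low entropy and a high top-class probability give a lower loss."""
    probs = softmax(logits)
    return entropy(probs) - confidence_weight * max(probs)


def batch_loss(batch_logits: list, confidence_weight: float = 1.0) -> float:
    """Mean loss over the source model's logits for a batch of generated images."""
    total = sum(sample_loss(logits, confidence_weight) for logits in batch_logits)
    return total / len(batch_logits)


confident = [4.0, 0.0, 0.0]   # source model is sure of class 0
ambiguous = [0.1, 0.0, -0.1]  # source model is close to uniform
print(sample_loss(confident), sample_loss(ambiguous))
print(batch_loss([confident, ambiguous]))
```

Samples the source model classifies confidently score lower, so a generator optimized against this loss is pushed toward images that look like the source domain to that classifier.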

Soley: Identification and Automated Detection of Logic Vulnerabilities in Ethereum Smart Contracts Using Large Language Models

Modern blockchains, such as Ethereum, support the deployment and execution of so-called smart contracts, autonomous digital programs that can hold significant cryptocurrency value. Executing smart contracts requires gas costs paid by users, which define the limits of the contract's execution. Logic vulnerabilities in smart contracts can lead to financial losses and are often the root cause of high-impact cyberattacks. Our objective is threefold: (i) empirically investigate logic vulnerabilities in real-world smart contracts extracted from code changes on GitHub, (ii) introduce Soley, an automated method for detecting logic vulnerabilities in smart contracts that leverages Large Language Models (LLMs), and (iii) examine the mitigation strategies smart contract developers employ to address these vulnerabilities in real-world scenarios. We obtained smart contracts and related code changes from GitHub. To address the first and third objectives, we qualitatively investigated the available logic vulnerabilities using an open coding method, identifying these vulnerabilities and their mitigation strategies. For the second objective, we extracted various logic vulnerabilities, applied preprocessing techniques, and implemented and trained the proposed Soley model. We evaluated the performance of Soley and various LLMs and compared the results with the state-of-the-art baseline on the task of logic vulnerability detection. From our analysis, we identified nine novel logic vulnerabilities, extending existing taxonomies with these vulnerabilities. Furthermore, we introduced several mitigation strategies extracted from observed developer modifications in real-world scenarios. Our Soley method outperforms existing methods in automatically identifying logic vulnerabilities. Interestingly, LLMs proved effective at this task without requiring extensive feature engineering.

Updated: 2024-06-24 00:15:18

Title: Soley: Identification and Automated Detection of Logic Vulnerabilities in Ethereum Smart Contracts Using Large Language Models

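Abstract: Modern blockchains such as Ethereum support the deployment and execution of so-called smart contracts, autonomous digital programs that carry significant cryptocurrency value. Executing a smart contract requires users to pay gas fees, which define the limits of the contract's execution. Logic vulnerabilities in smart contracts can cause financial losses and are often the root cause of high-impact cyberattacks. Our objective is threefold: (i) empirically investigate logic vulnerabilities in real-world smart contracts extracted from code changes on GitHub; (ii) introduce Soley, an automated method that uses large language models (LLMs) to detect logic vulnerabilities in smart contracts; and (iii) examine the mitigation strategies that smart contract developers adopt to address these vulnerabilities in real-world scenarios. We collected smart contracts and related code changes from GitHub. For the first and third objectives, we qualitatively investigated the available logic vulnerabilities using an open coding method and identified these vulnerabilities and their mitigation strategies. For the second objective, we extracted various logic vulnerabilities, applied preprocessing techniques, and implemented and trained the proposed Soley model. We evaluated the performance of Soley and various LLMs and compared the results with the state-of-the-art baseline for logic vulnerability detection. Our analysis identified nine novel logic vulnerabilities, which extend existing taxonomies. In addition, we extracted several mitigation strategies from observed developer modifications in real-world scenarios. Soley outperforms existing methods at automatically identifying logic vulnerabilities. Interestingly, LLMs proved effective at this task without extensive feature engineering.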

Fields: cs.ET,cs.CR

Download: http://arxiv.org/abs/2406.16244v1
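The gas mechanism mentioned at the start of the Soley abstract (users pay for each execution step, and the gas limit bounds how far a contract can run) can be illustrated with a toy stack machine. Everything here is an assumption for illustration: the opcodes, their costs, and run are hypothetical and do not follow the EVM gas schedule. As on Ethereum, though, running out of gas aborts the call and none of its state changes persist.

```python
from dataclasses import dataclass

# Invented per-opcode costs for illustration; these are NOT the EVM gas schedule.
GAS_COST = {"PUSH": 3, "ADD": 3, "STORE": 20}


class OutOfGas(Exception):
    """Raised when the next instruction would exceed the caller's gas limit."""


@dataclass
class Result:
    storage: dict
    gas_used: int


def run(program: list, gas_limit: int) -> Result:
    """Execute a toy stack program, charging gas before each instruction."""
    stack, storage, gas_used = [], {}, 0
    for op, *args in program:
        cost = GAS_COST[op]
        if gas_used + cost > gas_limit:
            # Storage is only returned on success, so aborting discards every write,
            # loosely mirroring how an out-of-gas call reverts its state changes.
            raise OutOfGas(f"{op} needs {cost} gas but only {gas_limit - gas_used} remains")
        gas_used += cost
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "STORE":
            key, value = stack.pop(), stack.pop()
            storage[key] = value
    return Result(storage, gas_used)


# Compute 2 + 40 and store the result under key 0.
program = [("PUSH", 2), ("PUSH", 40), ("ADD",), ("PUSH", 0), ("STORE",)]
result = run(program, gas_limit=100)
print(result)  # Result(storage={0: 42}, gas_used=32)
try:
    run(program, gas_limit=31)
except OutOfGas as err:
    print("reverted:", err)
```

With a limit of 31 the program runs out of gas on the final STORE, so the computed sum is never persisted; a limit of 32 or more is enough for the full run.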