Arxiv Day: Article

SHARP-Net: A Refined Pyramid Network for Deficiency Segmentation in Culverts and Sewer Pipes

This paper introduces the Semantic Haar-Adaptive Refined Pyramid Network (SHARP-Net), a novel architecture for semantic segmentation. SHARP-Net integrates a bottom-up pathway featuring Inception-like blocks with varying filter sizes ($3\times3$ and $5\times5$), parallel max-pooling, and additional spatial detection layers. This design captures multi-scale features and fine structural details. Depth-wise separable convolutions are used throughout the network to reduce complexity. The top-down pathway of SHARP-Net generates high-resolution features through upsampling and information fusion using $1\times1$ and $3\times3$ depth-wise separable convolutions. We evaluated our model on the challenging Culvert-Sewer Defects dataset that we developed and on the benchmark DeepGlobe Land Cover dataset. Our experimental evaluation demonstrated the effectiveness of the base model (excluding Haar-like features) in handling irregular defect shapes, occlusions, and class imbalances. It outperformed state-of-the-art methods, including U-Net, CBAM U-Net, ASCU-Net, FPN, and SegFormer, achieving average improvements of 14.4% and 12.1% on the Culvert-Sewer Defects and DeepGlobe Land Cover datasets, respectively, with IoU scores of 77.2% and 70.6%. Training time was also reduced. Furthermore, integrating carefully selected and fine-tuned Haar-like features improved the performance of deep learning models by at least 20%. With Haar-like features, the proposed SHARP-Net achieved an IoU of 94.75%, a 22.74% improvement over the base model. Applying these features to other deep learning models yielded a 35.0% improvement, demonstrating their versatility and effectiveness. SHARP-Net thus provides a powerful and efficient solution for accurate semantic segmentation in challenging real-world scenarios.
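The abstract credits depth-wise separable convolutions with reducing complexity. As a rough illustration (not SHARP-Net's actual layer configuration), the sketch below counts the weights of a standard $k\times k$ convolution versus a depth-wise $k\times k$ convolution followed by a $1\times1$ point-wise convolution. The 64 input and 128 output channels are arbitrary assumptions, and biases are ignored by default.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Weights in a standard k x k convolution mapping c_in -> c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dw_separable_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Depth-wise k x k conv (one filter per input channel) followed by a 1x1 point-wise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Illustrative channel sizes (assumption): 64 -> 128, for the two kernel sizes named in the abstract.
for k in (3, 5):
    std = conv_params(k, 64, 128)
    sep = dw_separable_params(k, 64, 128)
    print(f"{k}x{k}: standard={std}, separable={sep}, ratio={sep / std:.3f}")
```

Without biases the separable-to-standard ratio is $1/C_{out} + 1/k^2$, so the saving grows with kernel size, which is relevant for the $5\times5$ branches.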

Updated: 2024-08-02 23:55:04

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.08879v1

Deep Learning Meets OBIA: Tasks, Challenges, Strategies, and Perspectives

Deep learning has gained significant attention in remote sensing, especially in pixel- or patch-level applications. Despite initial attempts to integrate deep learning into object-based image analysis (OBIA), its full potential there remains largely unexplored. As OBIA usage becomes more widespread, this article comprehensively reviews and expands its task subdomains, with or without the integration of deep learning. We also identify and summarize five prevailing strategies for addressing deep learning's limitations in directly processing unstructured object data within OBIA, and we recommend several important directions for future research. Our goal is to inspire more exploration in this fascinating yet overlooked area and to facilitate the integration of deep learning into OBIA processing workflows.

Updated: 2024-08-02 23:54:02

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.01607v1

CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models

We are releasing a new suite of security benchmarks for LLMs, CYBERSECEVAL 3, to continue the conversation on empirically measuring LLM cybersecurity risks and capabilities. CYBERSECEVAL 3 assesses 8 different risks across two broad categories: risk to third parties, and risk to application developers and end users. Compared to previous work, we add new areas focused on offensive security capabilities: automated social engineering, scaling manual offensive cyber operations, and autonomous offensive cyber operations. In this paper we discuss applying these benchmarks to the Llama 3 models and a suite of contemporaneous state-of-the-art LLMs, enabling us to contextualize risks both with and without mitigations in place.

Updated: 2024-08-02 23:47:27

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2408.01605v1

FIVB ranking: Misstep in the right direction

This work uses a statistical framework to present and evaluate the ranking algorithm that has been used by the Fédération Internationale de Volleyball (FIVB) since 2020. The salient feature of the FIVB ranking is its use of a probabilistic model that explicitly calculates the probabilities of upcoming games. This explicit modeling is new in the context of official rankings, and we study the optimality of its parameters as well as its relationship with the ranking algorithm as such. The analysis is carried out using both analytical and numerical methods. We conclude that, from the modeling perspective, using the home-field advantage (HFA) would be beneficial and that weighting the game results is counterproductive. Regarding the algorithm itself, we explain the rationale behind the approximations currently used and show how to find new parameters that improve its performance. Finally, we propose a new model that drastically simplifies both the implementation and the interpretation of the resulting algorithm.
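For readers unfamiliar with rating systems that produce explicit win probabilities, here is a generic Elo-style sketch with an optional home-field advantage term. It is an illustration under stated assumptions, not the FIVB algorithm: the logistic curve, the 400-point scale, and the step size `k=20` are conventional Elo choices, and the FIVB model uses its own outcome model and match weights.

```python
def expected_score(r_home, r_away, hfa=0.0, scale=400.0):
    """Logistic (Elo-style) probability that the home team wins; `hfa` shifts the home rating."""
    return 1.0 / (1.0 + 10 ** (-(r_home + hfa - r_away) / scale))


def elo_update(r_home, r_away, home_won, k=20.0, hfa=0.0):
    """Move both ratings toward the observed result; the sum of the two ratings is conserved."""
    p_home = expected_score(r_home, r_away, hfa)
    delta = k * ((1.0 if home_won else 0.0) - p_home)
    return r_home + delta, r_away - delta


print(expected_score(1500, 1500))             # 0.5: equal ratings, no home advantage
print(expected_score(1500, 1500, hfa=50))     # above 0.5 once a home advantage is modeled
print(elo_update(1500, 1500, home_won=True))  # (1510.0, 1490.0)
```

Setting `hfa=0` reproduces a model without home-field advantage, which is the kind of modeling choice the paper evaluates.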

Updated: 2024-08-02 23:46:55

Categories: cs.LG

Download: http://arxiv.org/abs/2408.01603v1

A General-Purpose Device for Interaction with LLMs

This paper investigates integrating large language models (LLMs) with advanced hardware, focusing on developing a general-purpose device designed for enhanced interaction with LLMs. Initially, we analyze the current landscape, where virtual assistants and LLMs are reshaping human-technology interactions, highlighting pivotal advancements and setting the stage for a new era of intelligent hardware. Despite substantial progress in LLM technology, a significant gap exists in hardware development, particularly concerning scalability, efficiency, affordability, and multimodal capabilities. This disparity presents both challenges and opportunities, underscoring the need for hardware that is not only powerful but also versatile and capable of managing the sophisticated demands of modern computation. Our proposed device addresses these needs by emphasizing scalability, multimodal data processing, enhanced user interaction, and privacy considerations, offering a comprehensive platform for LLM integration in various applications.

Updated: 2024-08-02 23:43:29

Categories: cs.AR,cs.AI,cs.CL,cs.HC,cs.RO

Download: http://arxiv.org/abs/2408.10230v1

Physics-Informed Geometry-Aware Neural Operator

Engineering design problems often involve solving parametric Partial Differential Equations (PDEs) under variable PDE parameters and domain geometry. Recently, neural operators have shown promise in learning PDE operators and quickly predicting the PDE solutions. However, training these neural operators typically requires large datasets, the acquisition of which can be prohibitively expensive. To overcome this, physics-informed training offers an alternative way of building neural operators, eliminating the high computational costs associated with Finite Element generation of training data. Nevertheless, current physics-informed neural operators struggle with limitations, either in handling varying domain geometries or varying PDE parameters. In this research, we introduce a novel method, the Physics-Informed Geometry-Aware Neural Operator (PI-GANO), designed to simultaneously generalize across both PDE parameters and domain geometries. We adopt a geometry encoder to capture the domain geometry features, and design a novel pipeline to integrate this component within the existing DCON architecture. Numerical results demonstrate the accuracy and efficiency of the proposed method.
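Physics-informed training replaces labeled solution data with a penalty on how badly a candidate solution violates the PDE. Below is a minimal sketch that assumes the 1D Poisson problem $-u'' = f$ on $[0,1]$ with $u(0)=u(1)=0$ as a stand-in example. It uses finite differences where practical physics-informed operators typically use automatic differentiation, and it does not model PI-GANO's geometry encoder or the DCON architecture.

```python
import numpy as np


def physics_loss(u: np.ndarray, x: np.ndarray, f: np.ndarray, bc_weight: float = 1.0) -> float:
    """Mean squared residual of -u'' = f at interior points plus a Dirichlet penalty u(0)=u(1)=0."""
    h = x[1] - x[0]
    u_xx = (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2  # central second difference
    residual = -u_xx - f[1:-1]
    boundary = u[0] ** 2 + u[-1] ** 2
    return float(np.mean(residual**2) + bc_weight * boundary)


x = np.linspace(0.0, 1.0, 101)
f = np.pi**2 * np.sin(np.pi * x)  # chosen so that u = sin(pi x) is the exact solution

print(physics_loss(np.sin(np.pi * x), x, f))  # near zero: exact solution, only discretization error
print(physics_loss(x * (1 - x), x, f))        # much larger: satisfies the BCs but not the PDE
```

No reference solutions enter the loss; only the PDE and boundary conditions do, which is why this style of training avoids generating Finite Element data.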

Updated: 2024-08-02 23:11:42

Categories: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2408.01600v1

Emergent Representations of Program Semantics in Language Models Trained on Programs

We present evidence that language models (LMs) of code can learn to represent the formal semantics of programs, despite being trained only to perform next-token prediction. Specifically, we train a Transformer model on a synthetic corpus of programs written in a domain-specific language for navigating 2D grid world environments. Each program in the corpus is preceded by a (partial) specification in the form of several input-output grid world states. Despite providing no further inductive biases, we find that a probing classifier is able to extract increasingly accurate representations of the unobserved, intermediate grid world states from the LM hidden states over the course of training, suggesting the LM acquires an emergent ability to interpret programs in the formal sense. We also develop a novel interventional baseline that enables us to disambiguate what is represented by the LM as opposed to learned by the probe. We anticipate that this technique may be generally applicable to a broad range of semantic probing experiments. In summary, this paper does not propose any new techniques for training LMs of code, but develops an experimental framework for and provides insights into the acquisition and representation of formal semantics in statistical models of code. Our code is available at https://github.com/charlesjin/emergent-semantics.
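A probing classifier asks whether a quantity can be read off hidden states with a simple (here linear) map. The sketch below runs a linear probe on synthetic vectors that stand in for LM hidden states (the 32-dimensional linear encoding, 4 latent states, and noise level are all assumptions) and adds a shuffled-label control. The control is a common sanity check, not the paper's interventional baseline, which instead separates what the LM represents from what the probe learns.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_states = 2000, 32, 4

# Synthetic stand-in for LM hidden states: each latent state is linearly encoded, plus noise.
states = rng.integers(0, n_states, size=n)
directions = rng.normal(size=(n_states, d))
hidden = directions[states] + 0.5 * rng.normal(size=(n, d))


def with_bias(h):
    return np.hstack([h, np.ones((len(h), 1))])


def fit_linear_probe(h, y, k):
    """Least-squares regression onto one-hot labels: a simple linear probe."""
    w, *_ = np.linalg.lstsq(with_bias(h), np.eye(k)[y], rcond=None)
    return w


def probe_accuracy(w, h, y):
    return float(np.mean(np.argmax(with_bias(h) @ w, axis=1) == y))


train, test = slice(0, 1500), slice(1500, None)
w = fit_linear_probe(hidden[train], states[train], n_states)
acc = probe_accuracy(w, hidden[test], states[test])

# Control: the same probe trained on shuffled labels should drop to roughly chance (about 0.25).
shuffled = rng.permutation(states)
w_ctrl = fit_linear_probe(hidden[train], shuffled[train], n_states)
acc_ctrl = probe_accuracy(w_ctrl, hidden[test], shuffled[test])
print(acc, acc_ctrl)
```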

Updated: 2024-08-02 23:09:32

Categories: cs.LG,cs.AI,cs.CL,cs.PL

Download: http://arxiv.org/abs/2305.11169v3

Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models

Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that effectively utilizes unknown similarities between related tasks and is robust against a fraction of outlier tasks from arbitrary distributions. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Additionally, iterative unsupervised multi-task and transfer learning methods may suffer from an initialization alignment problem, and two alignment algorithms are proposed to resolve the issue. Finally, we demonstrate the effectiveness of our methods through simulations and real data examples. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.
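The proposed procedure builds on the EM algorithm. For reference, here is a minimal sketch of standard single-task EM for a two-component 1D GMM; it does not include the paper's multi-task sharing, outlier-task robustness, or alignment steps, and the percentile-based initialization is an assumption.

```python
import numpy as np


def em_gmm_1d(x, n_iter=100):
    """Standard EM for a two-component 1D Gaussian mixture (single task)."""
    # Initialization (assumption): means at the 25th/75th percentiles, pooled variance, equal weights.
    mu = np.percentile(x, [25, 75]).astype(float)
    var = np.full(2, x.var())
    weights = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to weights_k * N(x_i | mu_k, var_k).
        dens = weights * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum-likelihood updates.
        nk = r.sum(axis=0)
        weights = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return weights, mu, var


rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 600), rng.normal(3.0, 0.5, 400)])
weights, mu, var = em_gmm_1d(x)
print(np.round(weights, 2), np.round(mu, 2), np.round(var, 2))
```

The cluster labels EM returns are arbitrary up to permutation; across several tasks this is the kind of initialization alignment issue the abstract says its two alignment algorithms resolve.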

Updated: 2024-08-02 22:59:18

Categories: stat.ML,cs.LG,math.ST,stat.ME,stat.TH

Download: http://arxiv.org/abs/2209.15224v4

Trustworthy Machine Learning under Social and Adversarial Data Sources

Machine learning has witnessed remarkable breakthroughs in recent years. As machine learning permeates various aspects of daily life, individuals and organizations increasingly interact with these systems, exhibiting a wide range of social and adversarial behaviors. These behaviors may have a notable impact on the behavior and performance of machine learning systems. Specifically, during these interactions, data may be generated by strategic individuals, collected by self-interested data collectors, possibly poisoned by adversarial attackers, and used to create predictors, models, and policies satisfying multiple objectives. As a result, the machine learning systems' outputs might degrade, such as the susceptibility of deep neural networks to adversarial examples (Shafahi et al., 2018; Szegedy et al., 2013) and the diminished performance of classic algorithms in the presence of strategic individuals (Ahmadi et al., 2021). Addressing these challenges is imperative for the success of machine learning in societal settings.

Updated: 2024-08-02 22:51:52

Categories: cs.LG,cs.AI,cs.GT

Download: http://arxiv.org/abs/2408.01596v1

Responsible AI Question Bank: A Comprehensive Tool for AI Risk Assessment

The rapid growth of Artificial Intelligence (AI) has underscored the urgent need for responsible AI practices. Despite increasing interest, a comprehensive AI risk assessment toolkit remains lacking. This study introduces our Responsible AI (RAI) Question Bank, a comprehensive framework and tool designed to support diverse AI initiatives. By integrating AI ethics principles such as fairness, transparency, and accountability into a structured question format, the RAI Question Bank aids in identifying potential risks, aligning with emerging regulations like the EU AI Act, and enhancing overall AI governance. A key benefit of the RAI Question Bank is its systematic approach to linking lower-level risk questions to higher-level ones and related themes, preventing siloed assessments and ensuring a cohesive evaluation process. Case studies illustrate the practical application of the RAI Question Bank in assessing AI projects, from evaluating risk factors to informing decision-making processes. The study also demonstrates how the RAI Question Bank can be used to ensure compliance with standards, mitigate risks, and promote the development of trustworthy AI systems. This work advances RAI by providing organizations with a valuable tool to navigate the complexities of ethical AI development and deployment while ensuring comprehensive risk management.

Updated: 2024-08-02 22:40:20

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2408.11820v1

SumRecom: A Personalized Summarization Approach by Learning from Users' Feedback

Existing multi-document summarization approaches produce a uniform summary for all users without considering individual interests, which is highly impractical. Making a user-specific summary is challenging because it requires: i) acquiring relevant information about a user; ii) aggregating and integrating that information into a user model; and iii) using the provided information to make the personalized summary. In this paper, we therefore address a substantial and challenging problem in summarization: recommending a summary for a specific user. The proposed approach, called SumRecom, brings the human into the loop and focuses on three aspects: personalization, interaction, and learning the user's interest without the need for reference summaries. SumRecom has two steps: i) a user preference extractor that captures the user's inclination in choosing essential concepts, and ii) a summarizer that discovers the best-fitting summary for the user based on the given feedback. Various automatic and human evaluations on the benchmark dataset demonstrate the superiority of SumRecom in generating user-specific summaries.

Keywords: Document summarization; Interactive summarization; Personalized summarization; Reinforcement learning.

Updated: 2024-08-02 22:33:59

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2408.07294v1

Mission: Impossible Language Models

Chomsky and others have very directly claimed that large language models (LLMs) are equally capable of learning languages that are possible and impossible for humans to learn. However, there is very little published experimental evidence to support such a claim. Here, we develop a set of synthetic impossible languages of differing complexity, each designed by systematically altering English data with unnatural word orders and grammar rules. These languages lie on an impossibility continuum: at one end are languages that are inherently impossible, such as random and irreversible shuffles of English words, and on the other, languages that may not be intuitively impossible but are often considered so in linguistics, particularly those with rules based on counting word positions. We report on a wide range of evaluations to assess the capacity of GPT-2 small models to learn these uncontroversially impossible languages, and crucially, we perform these assessments at various stages throughout training to compare the learning process for each language. Our core finding is that GPT-2 struggles to learn impossible languages when compared to English as a control, challenging the core claim. More importantly, we hope our approach opens up a productive line of inquiry in which different LLM architectures are tested on a variety of impossible languages in an effort to learn more about how LLMs can be used as tools for these cognitive and typological investigations.
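The perturbations are easiest to grasp on a concrete sentence. The functions below are hypothetical illustrations in the spirit of the impossibility continuum the abstract describes (a random shuffle, deterministic reorderings, and a rule based on counting word positions). They are not the authors' exact language definitions; the length-based seeding, the hop distance of 4, and the `<MARK>` token are assumptions.

```python
import random


def reverse_language(tokens):
    """Reverse word order: deterministic and invertible."""
    return tokens[::-1]


def deterministic_shuffle(tokens, seed=0):
    """Apply the same permutation to every sentence of a given length (seeded by length)."""
    rng = random.Random(seed * 1000 + len(tokens))
    perm = list(range(len(tokens)))
    rng.shuffle(perm)
    return [tokens[i] for i in perm]


def nondeterministic_shuffle(tokens, rng):
    """Shuffle each sentence independently, so the original order cannot be recovered."""
    out = list(tokens)
    rng.shuffle(out)
    return out


def count_based_marker(tokens, verb_index, hop=4, marker="<MARK>"):
    """Counting rule: insert a marker after the `hop` tokens that follow the verb (clipped at the end)."""
    pos = min(verb_index + 1 + hop, len(tokens))
    return tokens[:pos] + [marker] + tokens[pos:]


sentence = "the dog barks at the big red cat".split()
print(reverse_language(sentence))
print(deterministic_shuffle(sentence))
print(nondeterministic_shuffle(sentence, random.Random(42)))
print(count_based_marker(sentence, verb_index=2))
```

The counting rule is trivial to implement yet has no analogue in attested human grammars, which is why such languages are commonly treated as impossible in linguistics.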

Updated: 2024-08-02 21:59:03

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2401.06416v2

When a Relation Tells More Than a Concept: Exploring and Evaluating Classifier Decisions with CoReX

Explanations for Convolutional Neural Networks (CNNs) based on the relevance of input pixels might be too unspecific to evaluate which input features impact model decisions, and how. Especially in complex real-world domains like biology, the presence of specific concepts, and of relations between concepts, might discriminate between classes. Pixel relevance is not expressive enough to convey this type of information. Consequently, model evaluation is limited, and relevant aspects present in the data that influence the model decisions might be overlooked. This work presents a novel method to explain and evaluate CNN models that uses a concept- and relation-based explainer (CoReX). It explains the predictive behavior of a model on a set of images by masking (ir-)relevant concepts from the decision-making process and by constraining relations in a learned interpretable surrogate model. We test our approach with several image data sets and CNN architectures. Results show that CoReX explanations are faithful to the CNN model in terms of predictive outcomes. We further demonstrate through a human evaluation that CoReX is a suitable tool for generating combined explanations that help assess the classification quality of CNNs. Finally, we show that CoReX supports the identification and re-classification of incorrect or ambiguous classifications.

Updated: 2024-08-02 21:55:52

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2405.01661v2

OpenLogParser: Unsupervised Parsing with Open-Source Large Language Models

Log parsing is a critical step that transforms unstructured log data into structured formats, facilitating subsequent log-based analysis. Traditional syntax-based log parsers are efficient and effective, but they often lose accuracy when processing logs that deviate from their predefined rules. Recently, large language model (LLM)-based log parsers have shown superior parsing accuracy. However, existing LLM-based parsers face three main challenges: 1) time-consuming and labor-intensive manual labeling for fine-tuning or in-context learning, 2) increased parsing costs due to the vast volume of log data and the limited context size of LLMs, and 3) privacy risks from using commercial models like ChatGPT with sensitive log information. To overcome these limitations, this paper introduces OpenLogParser, an unsupervised log parsing approach that leverages open-source LLMs (i.e., Llama3-8B) to enhance privacy and reduce operational costs while achieving state-of-the-art parsing accuracy. OpenLogParser first groups logs with similar static text but varying dynamic variables using a fixed-depth grouping tree. It then parses logs within these groups using three components: i) similarity scoring-based retrieval augmented generation, which selects diverse logs within each group based on Jaccard similarity, helping the LLM distinguish between static text and dynamic variables; ii) self-reflection, which iteratively queries the LLM to refine log templates and improve parsing accuracy; and iii) log template memory, which stores parsed templates to reduce LLM queries and improve parsing efficiency. Our evaluation on LogHub-2.0 shows that OpenLogParser achieves 25% higher parsing accuracy and processes logs 2.7 times faster than state-of-the-art LLM-based parsers. In short, OpenLogParser addresses the privacy and cost concerns of using commercial LLMs while achieving state-of-the-art parsing efficiency and accuracy.
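The grouping and retrieval steps can be illustrated with a small sketch. It is not OpenLogParser's code: the grouping key (token count plus the first `depth` tokens, similar in spirit to Drain-style parse trees), the greedy max-min diversity rule, and the names `group_logs`, `jaccard`, and `select_diverse` are assumptions made for illustration.

```python
from collections import defaultdict


def group_logs(lines, depth=1):
    """Hypothetical fixed-depth grouping: bucket by token count, then by the first `depth` tokens."""
    groups = defaultdict(list)
    for line in lines:
        toks = line.split()
        groups[(len(toks), *toks[:depth])].append(line)
    return groups


def jaccard(a, b):
    """Jaccard similarity between the token sets of two log lines."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def select_diverse(group, k=3):
    """Greedily pick up to k logs, each time adding the one least similar to those already chosen."""
    unique = list(dict.fromkeys(group))  # drop exact duplicates, keep order
    chosen = [unique[0]]
    while len(chosen) < min(k, len(unique)):
        candidates = [line for line in unique if line not in chosen]
        chosen.append(min(candidates, key=lambda c: max(jaccard(c, s) for s in chosen)))
    return chosen


logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 192.168.1.7 closed",
    "User alice logged in",
    "User bob logged in",
]
groups = group_logs(logs)
for key, members in groups.items():
    print(key, select_diverse(members, k=2))
```

Showing the LLM several lines of one group that differ only in their variable slots makes the static template (`Connection from <*> closed`) easier to separate from the dynamic values.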

Updated: 2024-08-02 21:54:13

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2408.01585v1

GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS

Multi-agent learning algorithms have been successful at generating superhuman planning in a wide variety of games but have had little impact on the design of deployed multi-agent planners. A key bottleneck in applying these techniques to multi-agent planning is that they require billions of steps of experience. To enable the study of multi-agent planning at this scale, we present GPUDrive, a GPU-accelerated, multi-agent simulator built on top of the Madrona Game Engine that can generate over a million steps of experience per second. Observation, reward, and dynamics functions are written directly in C++, allowing users to define complex, heterogeneous agent behaviors that are lowered to high-performance CUDA. We show that using GPUDrive we are able to effectively train reinforcement learning agents over many scenes in the Waymo Motion dataset, yielding highly effective goal-reaching agents in minutes for individual scenes and generally capable agents in a few hours. We ship these trained agents as part of the code base at https://github.com/Emerge-Lab/gpudrive.
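A back-of-envelope check of what the stated throughput buys, assuming a budget of $10^9$ environment steps and ignoring policy-update and other training overhead:

$$
t = \frac{N_{\text{steps}}}{\text{throughput}} = \frac{10^{9}\ \text{steps}}{10^{6}\ \text{steps/s}} = 10^{3}\ \text{s} \approx 16.7\ \text{min}
$$

At over a million steps per second, a billion steps of experience is therefore on the order of tens of minutes of simulation rather than days.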

Updated: 2024-08-02 21:37:46

Categories: cs.AI,cs.AR,cs.GR,cs.PF

Download: http://arxiv.org/abs/2408.01584v1

Conformal Diffusion Models for Individual Treatment Effect Estimation and Inference

Estimating treatment effects from observational data is of central interest across numerous application domains. Individual treatment effect offers the most granular measure of treatment effect on an individual level, and is the most useful to facilitate personalized care. However, its estimation and inference remain underdeveloped due to several challenges. In this article, we propose a novel conformal diffusion model-based approach that addresses those intricate challenges. We integrate the highly flexible diffusion modeling, the model-free statistical inference paradigm of conformal inference, along with propensity score and covariate local approximation that tackle distributional shifts. We unbiasedly estimate the distributions of potential outcomes for individual treatment effect, construct an informative confidence interval, and establish rigorous theoretical guarantees. We demonstrate the competitive performance of the proposed method over existing solutions through extensive numerical studies.
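Conformal inference wraps any point predictor in an interval with finite-sample marginal coverage under exchangeability. The sketch shows plain split conformal prediction for a regression target, as a generic building block only. The linear predictor stands in for the paper's diffusion model, the miscoverage level `alpha=0.1` is an assumption, and the propensity-score and covariate-local-approximation adjustments the authors use to handle distribution shift are not shown. It needs NumPy 1.22 or later for the `method` argument of `np.quantile`.

```python
import numpy as np


def split_conformal_interval(x_cal, y_cal, x_new, predict, alpha=0.1):
    """Split conformal interval using absolute residuals on a held-out calibration set."""
    scores = np.abs(y_cal - predict(x_cal))
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample corrected quantile level
    q = np.quantile(scores, level, method="higher")
    center = predict(x_new)
    return center - q, center + q


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 3000)
y = 2.0 * x + rng.normal(0.0, 0.3, 3000)

# Stand-in point predictor fit on a separate training split (not a diffusion model).
coef = np.polyfit(x[:500], y[:500], deg=1)


def predict(z):
    return np.polyval(coef, z)


lo, hi = split_conformal_interval(x[500:1000], y[500:1000], x[1000:], predict)
coverage = float(np.mean((y[1000:] >= lo) & (y[1000:] <= hi)))
print(f"empirical coverage: {coverage:.3f} (target at least 0.90)")
```

The guarantee is marginal and holds without assumptions on the model, which is what makes this paradigm "model-free"; shifts between treated and control covariates break exchangeability, hence the extra machinery in the paper.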

Updated: 2024-08-02 21:35:08

Categories: stat.ML,cs.AI,cs.LG,stat.ME

Download: http://arxiv.org/abs/2408.01582v1

Huge Ensembles Part II: Properties of a Huge Ensemble of Hindcasts Generated with Spherical Fourier Neural Operators

In Part I, we created an ensemble based on Spherical Fourier Neural Operators. As initial condition perturbations, we used bred vectors, and as model perturbations, we used multiple checkpoints trained independently from scratch. Based on diagnostics that assess the ensemble's physical fidelity, our ensemble has comparable performance to operational weather forecasting systems. However, it requires several orders of magnitude fewer computational resources. Here in Part II, we generate a huge ensemble (HENS), with 7,424 members initialized each day of summer 2023. We enumerate the technical requirements for running huge ensembles at this scale. HENS precisely samples the tails of the forecast distribution and presents a detailed sampling of internal variability. For extreme climate statistics, HENS samples events 4$\sigma$ away from the ensemble mean. At each grid cell, HENS improves the skill of the most accurate ensemble member and enhances coverage of possible future trajectories. As a weather forecasting model, HENS issues extreme weather forecasts with better uncertainty quantification. It also reduces the probability of outlier events, in which the verification value lies outside the ensemble forecast distribution.
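To see why ensemble size matters for tail events, assume (purely for illustration) that members at a grid cell are drawn from a Gaussian around the ensemble mean. The expected number of the 7,424 members landing beyond $\pm4\sigma$ is then $7424 \cdot P(|Z|>4)$:

```python
import math


def two_sided_tail(k_sigma):
    """P(|Z| > k_sigma) for a standard normal Z."""
    return math.erfc(k_sigma / math.sqrt(2.0))


members = 7424
p = two_sided_tail(4.0)
expected = members * p
print(f"P(|Z| > 4) = {p:.3e}; expected members beyond 4 sigma per forecast: {expected:.2f}")
```

Under this Gaussian assumption, even an ensemble of this size contains fewer than one $4\sigma$ member per forecast at a given location on average. Real forecast distributions need not be Gaussian.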

Updated: 2024-08-02 21:31:34

Categories: cs.LG,physics.ao-ph

Download: http://arxiv.org/abs/2408.01581v1

Spatio-Temporal Partial Sensing Forecast for Long-term Traffic

Traffic forecasting uses recent measurements by sensors installed at chosen locations to forecast the future road traffic. Existing work either assumes all locations are equipped with sensors or focuses on short-term forecast. This paper studies partial sensing traffic forecast of long-term traffic, assuming sensors only at some locations. The study is important in lowering the infrastructure investment cost in traffic management since deploying sensors at all locations could incur prohibitively high cost. However, the problem is challenging due to the unknown distribution at unsensed locations, the intricate spatio-temporal correlation in long-term forecasting, as well as noise in data and irregularities in traffic patterns (e.g., road closure). We propose a Spatio-Temporal Partial Sensing (STPS) forecast model for long-term traffic prediction, with several novel contributions, including a rank-based embedding technique to capture irregularities and overcome noise, a spatial transfer matrix to overcome the spatial distribution shift from permanently sensed locations to unsensed locations, and a multi-step training process that utilizes all available data to successively refine the model parameters for better accuracy. Extensive experiments on several real-world traffic datasets demonstrate that STPS outperforms the state-of-the-art and achieves superior accuracy in partial sensing long-term forecasting.

Updated: 2024-08-02 21:22:22

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.02689v1

SoK: Payment Channel Networks

Payment Channel Networks (PCNs) have been proposed as a solution to the scalability, throughput, and cost limitations of on-chain transactions. By executing transactions off-chain, PCNs significantly reduce the burden on the blockchain, leading to faster transaction processing, lower transaction fees, and enhanced privacy. Despite these advantages, PCNs still pose a variety of research challenges that require further exploration. In this paper, we survey recent work on several aspects of PCNs, such as pathfinding and routing, virtual channels, state channels, payment channel hubs, and rebalancing. This survey aims to give the reader a detailed understanding of the current state of the art in PCN research, highlighting a few important advancements. Additionally, we highlight various unresolved issues in PCN research. Specifically, this paper seeks to answer the following crucial question: What interesting and non-trivial challenges in PCN research require immediate attention from the academic and research community? By addressing this question, we aim to identify the most pressing problems and future research directions that interested readers can immediately work on. Through this analysis, we hope to inspire researchers and practitioners to tackle these challenges and make PCNs more secure and versatile.
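
Pathfinding in a PCN has to respect channel balances: a payment can only traverse channels whose sending side holds at least the payment amount. The following is a minimal sketch under stated assumptions. It uses a toy graph of directed balances and ignores fees, timelocks, and privacy. The name `find_route` and the graph layout are hypothetical and not taken from the survey. Breadth-first search then finds a feasible route with the fewest hops:

```python
from collections import deque

def find_route(balances, source, target, amount):
    """Return a list of nodes from source to target, or None.

    balances maps (u, v) -> balance that u can push to v over their channel.
    Only channels with balance >= amount are usable (fees are ignored).
    """
    # Keep only directed channels that can carry the payment.
    adj = {}
    for (u, v), bal in balances.items():
        if bal >= amount:
            adj.setdefault(u, []).append(v)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to reconstruct the route.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

With `balances = {("A", "B"): 50, ("B", "C"): 20, ("A", "D"): 40, ("D", "C"): 40}`, a payment of 30 from A to C cannot use the B-C channel and is routed through D. BFS minimizes hop count only; practical PCN routers also weigh fees and the probability that a payment succeeds.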

Updated: 2024-08-02 21:16:09

Categories: cs.CR

Download: http://arxiv.org/abs/2407.20968v2

Deep Learning Framework for History Matching CO2 Storage with 4D Seismic and Monitoring Well Data

Geological carbon storage entails the injection of megatonnes of supercritical CO2 into subsurface formations. The properties of these formations are usually highly uncertain, which makes design and optimization of large-scale storage operations challenging. In this paper we introduce a history matching strategy that enables the calibration of formation properties based on early-time observations. Early-time assessments are essential to assure the operation is performing as planned. Our framework involves two fit-for-purpose deep learning surrogate models that provide predictions for in-situ monitoring well data and interpreted time-lapse (4D) seismic saturation data. These two types of data are at very different scales of resolution, so it is appropriate to construct separate, specialized deep learning networks for their prediction. This approach results in a workflow that is more straightforward to design and more efficient to train than a single surrogate that provides global high-fidelity predictions. The deep learning models are integrated into a hierarchical Markov chain Monte Carlo (MCMC) history matching procedure. History matching is performed on a synthetic case with and without 4D seismic data, which allows us to quantify the impact of 4D seismic on uncertainty reduction. The use of both data types is shown to provide substantial uncertainty reduction in key geomodel parameters and to enable accurate predictions of CO2 plume dynamics. The overall history matching framework developed in this study represents an efficient way to integrate multiple data types and to assess the impact of each on uncertainty reduction and performance predictions.
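
The history-matching loop calls a cheap surrogate inside MCMC instead of an expensive simulator. The following is a plain random-walk Metropolis sampler, not the paper's hierarchical scheme. It assumes a single scalar parameter, a Gaussian prior and noise model, and a stand-in `surrogate` function. None of these are the paper's actual models.

```python
import math
import random

def surrogate(m):
    # Hypothetical stand-in for a trained deep-learning surrogate that maps
    # a geomodel parameter to a predicted observation (identity here).
    return m

def log_posterior(m, obs, noise_sd, prior_sd=1.0):
    # Unnormalized log posterior: Gaussian prior N(0, prior_sd^2) plus a
    # Gaussian misfit between the observation and the surrogate prediction.
    log_prior = -0.5 * (m / prior_sd) ** 2
    log_like = -0.5 * ((obs - surrogate(m)) / noise_sd) ** 2
    return log_prior + log_like

def metropolis(obs, noise_sd, n_steps=20000, step=0.5, burn_in=2000, seed=0):
    # Random-walk Metropolis: propose m' = m + N(0, step^2) and accept with
    # probability min(1, p(m' | obs) / p(m | obs)).
    rng = random.Random(seed)
    m = 0.0
    lp = log_posterior(m, obs, noise_sd)
    samples = []
    for i in range(n_steps):
        proposal = m + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, obs, noise_sd)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            m, lp = proposal, lp_new
        if i >= burn_in:  # discard early samples taken before the chain mixes
            samples.append(m)
    return samples
```

In this conjugate toy case, the exact posterior is Gaussian with mean 0.8 and variance 0.2 (prior precision 1 plus data precision 4). The samples from `metropolis(1.0, 0.5)` should therefore approach those values. A surrogate pays off in this loop because MCMC needs thousands of posterior evaluations, and each one must be cheap.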

Updated: 2024-08-02 21:14:13

Categories: cs.LG,physics.geo-ph,stat.ML

Download: http://arxiv.org/abs/2408.01575v1

Telecom Foundation Models: Applications, Challenges, and Future Trends

Telecom networks are becoming increasingly complex, with diversified deployment scenarios, multiple standards, and multi-vendor support. The intricate nature of the telecom network ecosystem makes it challenging to manage, operate, and optimize networks effectively. To address these hurdles, Artificial Intelligence (AI) has been widely adopted to solve different tasks in telecom networks. However, these conventional AI models are often designed for specific tasks and rely on extensive, costly-to-collect labeled data, and they require specialized telecom expertise for development and maintenance. Such models usually fail to generalize across diverse deployment scenarios and applications. In contrast, Foundation Models (FMs) show effective generalization across language, vision, and decision-making tasks. FMs can be trained on multiple data modalities generated by the telecom ecosystem and can leverage specialized domain knowledge. Moreover, FMs can be fine-tuned to solve numerous specialized tasks with minimal task-specific labeled data and, in some instances, can leverage context to solve previously unseen problems. At the dawn of 6G, this paper investigates the potential opportunities of using FMs to shape the future of telecom technologies and standards. In particular, the paper outlines a conceptual process for developing Telecom FMs (TFMs) and discusses emerging opportunities for orchestrating specialized TFMs for network configuration, operation, and maintenance. Finally, the paper discusses the limitations and challenges of developing and deploying TFMs.

Updated: 2024-08-02 21:09:13

Categories: cs.NI,cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.03964v1

Integrating Cognitive AI with Generative Models for Enhanced Question Answering in Skill-based Learning

In online learning, the ability to provide quick and accurate feedback to learners is crucial. In skill-based learning, learners need to understand the underlying concepts and mechanisms of a skill to be able to apply it effectively. While videos are a common tool in online learning, they cannot comprehend or assess the skills being taught. Additionally, while Generative AI methods are effective in searching and retrieving answers from a text corpus, it remains unclear whether these methods exhibit any true understanding. This limits their ability to provide explanations of skills or help with problem-solving. This paper proposes a novel approach that merges Cognitive AI and Generative AI to address these challenges. We employ a structured knowledge representation, the TMK (Task-Method-Knowledge) model, to encode skills taught in an online Knowledge-based AI course. Leveraging techniques such as Large Language Models, Chain-of-Thought, and Iterative Refinement, we outline a framework for generating reasoned explanations in response to learners' questions about skills.

Updated: 2024-08-02 21:06:51

Categories: cs.AI

Download: http://arxiv.org/abs/2407.19393v2

DECO: Liberating Web Data Using Decentralized Oracles for TLS

Thanks to the widespread deployment of TLS, users can access private data over channels with end-to-end confidentiality and integrity. What they cannot do, however, is prove to third parties the provenance of such data, i.e., that it genuinely came from a particular website. Existing approaches either introduce undesirable trust assumptions or require server-side modifications. As a result, the value of users' private data is locked up at its point of origin. Users cannot export their data with integrity preserved to other applications without help and permission from the current data holder. We propose DECO (short for decentralized oracle) to address the above problems. DECO allows users to prove that a piece of data accessed via TLS came from a particular website and, optionally, to prove statements about such data in zero-knowledge, keeping the data itself secret. DECO is the first such system that works without trusted hardware or server-side modifications. DECO can liberate data from centralized web-service silos, making it accessible to a rich spectrum of applications. To demonstrate the power of DECO, we implement three applications that are hard to achieve without it: a private financial instrument using smart contracts, conversion of legacy credentials to anonymous credentials, and verifiable claims against price discrimination.

Updated: 2024-08-02 21:05:15

Categories: cs.CR

Download: http://arxiv.org/abs/1909.00938v6

Counterfactual Explanations for Medical Image Classification and Regression using Diffusion Autoencoder

Counterfactual explanations (CEs) aim to enhance the interpretability of machine learning models by illustrating how alterations in input features would affect the resulting predictions. Common CE approaches require an additional model and are typically constrained to binary counterfactuals. In contrast, we propose a novel method that operates directly on the latent space of a generative model, specifically a Diffusion Autoencoder (DAE). This approach offers inherent interpretability by enabling the generation of CEs and the continuous visualization of the model's internal representation across decision boundaries. Our method leverages the DAE's ability to encode images into a semantically rich latent space in an unsupervised manner, eliminating the need for labeled data or separate feature extraction models. We show that these latent representations are helpful for medical condition classification and for the ordinal regression of pathology severity, for conditions such as vertebral compression fractures (VCF) and diabetic retinopathy (DR). Beyond binary CEs, our method supports the visualization of ordinal CEs using a linear model, providing deeper insights into the model's decision-making process and enhancing interpretability. Experiments across various medical imaging datasets demonstrate the method's advantages in interpretability and versatility. The linear manifold of the DAE's latent space allows for meaningful interpolation and manipulation, making it a powerful tool for exploring medical image properties. Our code is available at https://github.com/matanat/dae_counterfactual.
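
With a linear model on a latent space, a counterfactual can be produced by moving a latent code along the model's weight vector until its score reaches a chosen target. The following is a minimal sketch of that geometric idea, not the authors' code. It assumes a list-based latent vector `z` and a linear score `w·z + b`. The DAE encoder and decoder that would map images to and from `z` are omitted, and `linear_counterfactual` is a hypothetical name:

```python
def linear_counterfactual(z, w, b, target_score):
    """Shift latent z along w so that w.z' + b equals target_score.

    Moving along w (the normal of the linear decision boundary) is the
    smallest L2 change to z that reaches the target score.
    """
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    norm_sq = sum(wi * wi for wi in w)
    if norm_sq == 0:
        raise ValueError("weight vector must be non-zero")
    # New score = score + alpha * ||w||^2, so solve for alpha.
    alpha = (target_score - score) / norm_sq
    return [zi + alpha * wi for zi, wi in zip(z, w)]
```

Sweeping `target_score` over several values, for instance one per severity grade, yields a sequence of latent codes. Under these assumptions, decoding each code would give an ordinal series of counterfactual images.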

Updated: 2024-08-02 21:01:30

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.01571v1

VAE-IF: Deep feature extraction with averaging for fully unsupervised artifact detection in routinely acquired ICU time-series

Artifacts are a common problem in physiological time series collected from intensive care units (ICU) and other settings. They affect the quality and reliability of clinical research and patient care. Manual annotation of artifacts is costly and time-consuming, rendering it impractical. Automated methods are desired. Here, we propose a novel fully unsupervised approach to detect artifacts in clinical-standard, minute-by-minute resolution ICU data without any prior labeling or signal-specific knowledge. Our approach combines a variational autoencoder (VAE) and an isolation forest (IF) into a hybrid model to learn features and identify anomalies in different types of vital signs, such as blood pressure, heart rate, and intracranial pressure. We evaluate our approach on a real-world ICU dataset and compare it with supervised benchmark models based on long short-term memory (LSTM) and XGBoost and statistical methods such as ARIMA. We show that our unsupervised approach achieves comparable sensitivity to fully supervised methods and generalizes well to an external dataset. We also visualize the latent space learned by the VAE and demonstrate its ability to disentangle clean and noisy samples. Our approach offers a promising solution for cleaning ICU data in clinical research and practice without the need for any labels whatsoever.

Updated: 2024-08-02 20:43:09

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2312.05959v2

Pre-trained Gaussian Processes for Bayesian Optimization

Bayesian optimization (BO) has become a popular strategy for global optimization of expensive real-world functions. Contrary to a common expectation that BO is suited to optimizing black-box functions, it actually requires domain knowledge about those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process (GP) priors that specify initial beliefs on functions. However, even with expert knowledge, it is non-trivial to quantitatively define a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. We detail what pre-training entails for GPs using a KL divergence based loss function, and propose a new pre-training based BO framework named HyperBO. Theoretically, we show bounded posterior predictions and near-zero regrets for HyperBO without assuming the "ground truth" GP prior is known. To verify our approach in realistic setups, we collect a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art deep learning models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, HyperBO is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods on both our new tuning dataset and existing multi-task BO benchmarks.
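
The abstract refers to a KL-divergence-based pre-training loss for GPs but does not give its form. For reference, below is the standard closed-form KL divergence between two $k$-dimensional Gaussians. A loss of this kind would evaluate it if, for instance, it compared an empirical distribution of observed function values with the GP prior's marginal on the same inputs. That comparison is an assumption here, not the paper's stated objective:

```latex
D_{\mathrm{KL}}\big(\mathcal{N}(\mu_0,\Sigma_0)\,\|\,\mathcal{N}(\mu_1,\Sigma_1)\big)
  = \frac{1}{2}\Big[\operatorname{tr}\big(\Sigma_1^{-1}\Sigma_0\big)
  + (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0)
  - k + \ln\frac{\det\Sigma_1}{\det\Sigma_0}\Big]
```

As a sanity check, the expression is zero when $\mu_0=\mu_1$ and $\Sigma_0=\Sigma_1$, since then the trace equals $k$, the quadratic term vanishes, and $\ln 1 = 0$.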

Updated: 2024-08-02 20:13:29

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2109.08215v6

Dynamic In-context Learning with Conversational Models for Data Extraction and Materials Property Prediction

The advent of natural language processing and large language models (LLMs) has revolutionized the extraction of data from unstructured scholarly papers. However, ensuring data trustworthiness remains a significant challenge. In this paper, we introduce PropertyExtractor, an open-source tool that leverages advanced conversational LLMs like Google gemini-pro and OpenAI gpt-4, blends zero-shot with few-shot in-context learning, and employs engineered prompts for the dynamic refinement of structured information hierarchies, enabling autonomous, efficient, scalable, and accurate identification, extraction, and verification of material property data. Our tests on material data demonstrate precision and recall that exceed 95% with an error rate of approximately 9%, highlighting the effectiveness and versatility of the toolkit. Finally, databases for 2D material thicknesses, a critical parameter for device integration, and energy bandgap values are developed using PropertyExtractor. Specifically for the thickness database, the rapid evolution of the field has outpaced both experimental measurements and computational methods, creating a significant data gap. Our work addresses this gap and showcases the potential of PropertyExtractor as a reliable and efficient tool for the autonomous generation of various material property databases, advancing the field.

Updated: 2024-08-02 20:07:30

Categories: cond-mat.mtrl-sci,cs.AI,physics.comp-ph

Download: http://arxiv.org/abs/2405.10448v2

Robot-Enabled Machine Learning-Based Diagnosis of Gastric Cancer Polyps Using Partial Surface Tactile Imaging

In this paper, to collectively address the existing limitations of endoscopic diagnosis of Advanced Gastric Cancer (AGC) tumors, we propose for the first time (i) the utilization and evaluation of our recently developed Vision-based Tactile Sensor (VTS) and (ii) a complementary Machine Learning (ML) algorithm for classifying tumors by their textural features. Leveraging a seven-DoF robotic manipulator and unique, custom-designed, additively manufactured realistic AGC tumor phantoms, we demonstrated the advantages of automated data collection using the VTS, which addresses the data scarcity and biases encountered in traditional ML-based approaches. Our ML model, trained on synthetic data, was successfully evaluated and compared with traditional ML models using various statistical metrics, even under mixed morphological characteristics and partial sensor contact.

Updated: 2024-08-02 20:01:23

Categories: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.01554v1

Evaluating the Impact of Advanced LLM Techniques on AI-Lecture Tutors for a Robotics Course

This study evaluates the performance of Large Language Models (LLMs) as an Artificial Intelligence-based tutor for a university course. In particular, different advanced techniques are utilized, such as prompt engineering, Retrieval-Augmented Generation (RAG), and fine-tuning. We assessed the different models and techniques using common similarity metrics such as BLEU-4, ROUGE, and BERTScore, complemented by a small human evaluation of helpfulness and trustworthiness. Our findings indicate that RAG combined with prompt engineering significantly enhances model responses and produces more accurate factual answers. In the context of education, RAG appears to be an ideal technique, as it enriches the model's input with additional information and material that is usually already available for a university course. Fine-tuning, on the other hand, can produce fairly small yet still strong expert models, but poses the risk of overfitting. Our study further asks how we should measure the performance of LLMs and how well current measurements represent correctness or relevance. We find high correlation among the similarity metrics and a bias in most of them toward shorter responses. Overall, our research points to both the potential and the challenges of integrating LLMs into educational settings, suggesting a need for balanced training approaches and advanced evaluation frameworks.

Updated: 2024-08-02 19:49:19

Categories: cs.CL,cs.AI,cs.CY,cs.RO

Download: http://arxiv.org/abs/2408.04645v1

Post-Quantum Cryptography (PQC) Network Instrument: Measuring PQC Adoption Rates and Identifying Migration Pathways

The problem of adopting quantum-resistant cryptographic network protocols, or post-quantum cryptography (PQC), is critically important to democratizing quantum computing. The problem is urgent because practical quantum computers will break classical encryption in the next few decades. Encrypted data from the past has already been collected and can be decrypted in the near future. The main challenges of adopting post-quantum cryptography lie in algorithmic complexity and in hardware, software, and network implementation. The grand question of how existing cyberinfrastructure will support post-quantum cryptography remains unanswered. This paper describes: i) the design of a novel Post-Quantum Cryptography (PQC) network instrument located at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign and forming part of the FABRIC testbed; ii) the latest results on PQC adoption rates across a wide spectrum of network protocols (Secure Shell (SSH), Transport Layer Security (TLS), etc.); iii) the current state of PQC implementation in key scientific applications (e.g., OpenSSH or SciTokens); iv) the challenges of achieving quantum resistance; and v) a discussion of potential novel attacks. This is the first large-scale measurement of PQC adoption at national-scale supercomputing centers and FABRIC testbeds. Our results show that only OpenSSH and Google Chrome have successfully implemented PQC, with an initial adoption rate of 0.029% (6,044 out of 20,556,816) for OpenSSH connections at NCSA coming from major Internet Service Providers or Autonomous Systems (ASes) such as OARNET, GTT, Google Fiber Webpass (U.S.), and Uppsala Lans Landsting (Sweden), and with adoption increasing year over year for 2023-2024. Our analyses identify pathways for migrating current applications to quantum-resistant cryptography.
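
As a quick consistency check, the reported connection counts reproduce the stated adoption rate:

```latex
\frac{6{,}044}{20{,}556{,}816} \approx 2.94 \times 10^{-4} = 0.0294\% \approx 0.029\%
```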

Updated: 2024-08-02 19:33:38

Categories: cs.NI,cs.CR,quant-ph

Download: http://arxiv.org/abs/2408.00054v2

ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets

While ECG data is crucial for diagnosing and monitoring heart conditions, it also contains unique biometric information that poses significant privacy risks. Existing ECG re-identification studies rely on exhaustive analysis of numerous deep learning features, which limits them to ad hoc explainability for clinicians' decision-making. In this work, we explore the explainability of ECG re-identification risks using transparent machine learning models. We use SHapley Additive exPlanations (SHAP) analysis to identify and explain the key features contributing to re-identification risks. We conduct an empirical analysis of identity re-identification risks using ECG data from five diverse real-world datasets encompassing 223 participants. By employing transparent machine learning models, we reveal how different ECG features vary in their contribution to re-identifying individuals, with an accuracy of 0.76 for gender, 0.67 for age group, and 0.82 for participant ID re-identification. Our approach provides valuable insights for clinical experts and guides the development of effective privacy-preserving mechanisms. Further, our findings emphasize the necessity of robust privacy measures in real-world health applications and offer detailed, actionable insights for enhancing data anonymization techniques.

Updated: 2024-08-02 19:24:55

Categories: eess.SP,cs.LG

Download: http://arxiv.org/abs/2408.10228v1

Should We Attend More or Less? Modulating Attention for Fairness

The advances in natural language processing (NLP) pose both opportunities and challenges. While recent progress enables the development of high-performing models for a variety of tasks, it also poses the risk of models learning harmful biases from the data, such as gender stereotypes. In this work, we investigate the role of attention, a widely-used technique in current state-of-the-art NLP models, in the propagation of social biases. Specifically, we study the relationship between the entropy of the attention distribution and the model's performance and fairness. We then propose a novel method for modulating attention weights to improve model fairness after training. Since our method is only applied post-training and pre-inference, it is an intra-processing method and is, therefore, less computationally expensive than existing in-processing and pre-processing approaches. Our results show an increase in fairness and minimal performance loss on different text classification and generation tasks using language models of varying sizes. WARNING: This work uses language that is offensive.
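
The abstract relates fairness to the entropy of the attention distribution. This sketch shows one simple post-training way to modulate that entropy: rescale the attention logits by a temperature before the softmax. Higher temperatures flatten the distribution and raise its entropy. This is an illustration only and not necessarily the authors' method, and the helper names are hypothetical:

```python
import math

def softmax(logits, temperature=1.0):
    # Divide logits by the temperature, then normalize in a numerically
    # stable way by subtracting the maximum before exponentiating.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)
```

With logits `[2.0, 0.5, 0.1, -1.0]`, raising the temperature from 0.5 to 1.0 to 2.0 increases the entropy of the resulting attention weights toward its maximum of ln 4, which corresponds to uniform attention over the four tokens.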

Updated: 2024-08-02 19:20:25

Categories: cs.CL,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2305.13088v2

Momentum Capture and Prediction System Based on Wimbledon Open2023 Tournament Data

There is a hidden energy in tennis that cannot be seen or touched. It is the force that controls the flow of the game and is present in all types of matches. This mysterious force is momentum. This study introduces an evaluation model that combines the Entropy Weight Method (EWM) and Grey Relational Analysis (GRA) to quantify momentum's impact on match outcomes. Empirical validation was conducted through Mann-Whitney U and Kolmogorov-Smirnov tests, which yielded p-values of 0.0043 and 0.00128, respectively. These results underscore the non-random association between momentum shifts and match outcomes, highlighting the critical role of momentum in tennis. In addition, our investigation focuses on creating a predictive model that combines the XGBoost machine learning algorithm with the SHAP framework. This model predicts match swings with high accuracy (0.999013 for multiple matches and 0.992738 for finals). Its ability to identify the influence of specific factors on match dynamics, such as the distance run by both players during points, demonstrates its strength. The model's generalizability was thoroughly evaluated using datasets from the four Grand Slam tournaments. The results demonstrate its adaptability to different match scenarios, despite minor variations in predictive accuracy. It offers strategic insights that can help players respond effectively to opponents' shifts in momentum, enhancing their competitive edge.
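
The abstract pairs the Entropy Weight Method (EWM) with Grey Relational Analysis (GRA) but does not list the indicators used. A minimal sketch of the two standard steps follows. It assumes a small matrix of benefit-type indicators that has already been min-max normalized to [0, 1], with rows as points or games and columns as hypothetical momentum indicators. It also uses the conventional distinguishing coefficient ρ = 0.5:

```python
import math

def entropy_weights(matrix):
    """Entropy Weight Method over the columns of a normalized matrix.

    matrix: list of rows (at least two) of non-negative values in [0, 1].
    Columns whose values are more dispersed receive larger weights.
    """
    m, n = len(matrix), len(matrix[0])
    if m < 2:
        raise ValueError("EWM needs at least two rows")
    divergence = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        if total == 0:
            e = 1.0  # an all-zero column carries no information
        else:
            p = [v / total for v in col]
            e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(m)
        divergence.append(1.0 - e)  # degree of divergence d_j = 1 - e_j
    s = sum(divergence)
    if s == 0:
        return [1.0 / n] * n  # no informative column: fall back to equal weights
    return [d / s for d in divergence]

def grey_relational_grades(matrix, weights, rho=0.5):
    """Weighted grey relational grade of each row against the ideal row."""
    n = len(matrix[0])
    # Ideal reference sequence: the best (largest) value in each column.
    ideal = [max(row[j] for row in matrix) for j in range(n)]
    deltas = [[abs(ideal[j] - row[j]) for j in range(n)] for row in matrix]
    d_min = min(min(r) for r in deltas)
    d_max = max(max(r) for r in deltas)
    if d_max == 0:
        return [1.0] * len(matrix)  # every row already equals the ideal
    grades = []
    for r in deltas:
        # Grey relational coefficient for each column of this row.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in r]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades
```

A higher grade means a row lies closer to the ideal sequence. Turning such grades into a per-point momentum score is the paper's own modeling step and is not reproduced here.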

Updated: 2024-08-02 19:14:49

Categories: cs.LG,stat.AP

Download: http://arxiv.org/abs/2408.01544v1

AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion

Underwater perception and 3D surface reconstruction are challenging problems with broad applications in construction, security, marine archaeology, and environmental monitoring. Treacherous operating conditions, fragile surroundings, and limited navigation control often dictate that submersibles restrict their range of motion and, thus, the baseline over which they can capture measurements. In the context of 3D scene reconstruction, it is well known that smaller baselines make reconstruction more challenging. Our work develops a physics-based multimodal acoustic-optical neural surface reconstruction framework (AONeuS) capable of effectively integrating high-resolution RGB measurements with low-resolution depth-resolved imaging sonar measurements. By fusing these complementary modalities, our framework can reconstruct accurate high-resolution 3D surfaces from measurements captured over heavily restricted baselines. Through extensive simulations and in-lab experiments, we demonstrate that AONeuS dramatically outperforms recent RGB-only and sonar-only surface reconstruction methods based on inverse differentiable rendering. A website visualizing the results of our paper is available at https://aoneus.github.io/

Updated: 2024-08-02 19:02:51

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2402.03309v3

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

We present Emu Video, a text-to-video generation model that factorizes generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions, namely adjusted noise schedules for diffusion and multi-stage training, that enable us to directly generate high-quality, high-resolution videos without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality over all prior work: 81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorized approach naturally lends itself to animating images based on a user's text prompt, where our generations are preferred 96% of the time over prior work.

Updated: 2024-08-02 18:55:25

Fields: cs.CV,cs.AI,cs.GR,cs.LG,cs.MM

Download: http://arxiv.org/abs/2311.10709v2

Active Learning for Neural PDE Solvers

Solving partial differential equations (PDEs) is a fundamental problem in engineering and science. While neural PDE solvers can be more efficient than established numerical solvers, they often require large amounts of training data that is costly to obtain. Active Learning (AL) could help surrogate models reach the same accuracy with smaller training sets by querying classical solvers with more informative initial conditions and PDE parameters. While AL is more common in other domains, it has yet to be studied extensively for neural PDE solvers. To bridge this gap, we introduce AL4PDE, a modular and extensible active learning benchmark. It provides multiple parametric PDEs and state-of-the-art surrogate models for the solver-in-the-loop setting, enabling the evaluation of existing and the development of new AL methods for PDE solving. We use the benchmark to evaluate batch active learning algorithms such as uncertainty- and feature-based methods. We show that AL reduces the average error by up to 71% compared to random sampling and significantly reduces worst-case errors. Moreover, AL generates similar datasets across repeated runs, with consistent distributions over the PDE parameters and initial conditions. The acquired datasets are reusable, providing benefits for surrogate models not involved in the data generation.
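
The following is a minimal sketch of one uncertainty-based batch acquisition rule of the kind the benchmark evaluates. It assumes scalar surrogate outputs. Each candidate PDE parameter is scored by the variance of an ensemble's predictions, and the top-scoring candidates are the ones that would be sent to the classical solver. `select_batch` and the toy ensemble are hypothetical and not part of the AL4PDE API.

```python
import statistics


def select_batch(candidates, ensemble, batch_size):
    """Return the batch_size candidates on which the ensemble disagrees most.

    candidates: PDE parameters / initial-condition descriptors (here scalars).
    ensemble:   surrogate models, each mapping a candidate to a scalar prediction.
    """
    scored = []
    for idx, cand in enumerate(candidates):
        preds = [model(cand) for model in ensemble]
        # Population variance of the members' predictions is the uncertainty score.
        scored.append((statistics.pvariance(preds), idx))
    scored.sort(reverse=True)
    return [candidates[idx] for _, idx in scored[:batch_size]]


# Hypothetical toy ensemble whose disagreement grows with |a|.
ensemble = [lambda a: a, lambda a: 2 * a, lambda a: 3 * a]
pool = [0.1, 0.5, 2.0, 1.0]
print(select_batch(pool, ensemble, batch_size=2))  # [2.0, 1.0]
```

In a full loop, the selected parameters would be simulated by the classical solver and added to the training set. The surrogates would then be retrained before the next round.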

Updated: 2024-08-02 18:48:58

Fields: cs.LG,cs.AI,cs.CE,cs.NE

Download: http://arxiv.org/abs/2408.01536v1

An Adaptive Tensor-Train Decomposition Approach for Efficient Deep Neural Network Compression

In the field of model compression, choosing an appropriate rank for tensor decomposition is pivotal for balancing model compression rate and efficiency. However, this selection, whether done manually or through optimization-based automatic methods, often increases computational complexity. Manual rank selection lacks efficiency and scalability, often requiring extensive trial and error, while optimization-based automatic methods significantly increase the computational burden. To address this, we introduce a novel, automatic, and budget-aware rank selection method for efficient model compression, which employs Layer-Wise Imprinting Quantitation (LWIQ). LWIQ quantifies each layer's significance within a neural network by integrating a proxy classifier. This classifier assesses the layer's impact on overall model performance, allowing for a more informed adjustment of tensor rank. Furthermore, our approach includes a scaling factor to accommodate varying computational budget constraints. This budget awareness eliminates the need for repetitive rank recalculations for different budget scenarios. Experimental results with ResNet-56 on the CIFAR-10 dataset show the following, compared with the state-of-the-art proxy-based automatic tensor rank selection method. LWIQ improves rank search efficiency by 63.2%. Accuracy drops by only 0.86%, with a 3.2x smaller model.
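
Rank choice matters because it directly sets the size of a tensor-train layer. The sketch below assumes the standard TT format, with cores of shape (r_{k-1}, n_k, r_k) and boundary ranks of 1. It only computes that cost side and does not implement LWIQ. The layer shape and ranks are illustrative.

```python
from math import prod


def tt_param_count(mode_sizes, ranks):
    """Parameters in a tensor train with cores of shape (r_{k-1}, n_k, r_k).

    mode_sizes: n_1..n_d of the reshaped weight tensor.
    ranks:      the internal TT-ranks r_1..r_{d-1}; boundary ranks r_0 = r_d = 1.
    """
    if len(ranks) != len(mode_sizes) - 1:
        raise ValueError("need exactly d - 1 internal ranks")
    r = [1, *ranks, 1]
    return sum(r[k] * n * r[k + 1] for k, n in enumerate(mode_sizes))


def compression_ratio(mode_sizes, ranks):
    """Dense parameter count divided by TT parameter count."""
    return prod(mode_sizes) / tt_param_count(mode_sizes, ranks)


# A 3x3 convolution with 64 input and 64 output channels, viewed as a
# (64, 9, 64) tensor; the rank choice (8, 8) is purely illustrative.
print(tt_param_count([64, 9, 64], [8, 8]))     # 1600
print(compression_ratio([64, 9, 64], [8, 8]))  # 23.04
```

Cost grows roughly quadratically in the ranks. A per-layer importance score such as LWIQ's can therefore trade accuracy against size by assigning larger ranks only to the layers that matter most.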

Updated: 2024-08-02 18:47:11

Fields: cs.LG,68T10, 65K10

Download: http://arxiv.org/abs/2408.01534v1

Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization

In the digital age, the emergence of deepfakes and synthetic media presents a significant threat to societal and political integrity. Deepfakes based on multi-modal manipulation, such as audio-visual manipulation, are more realistic and pose a greater threat. Current multi-modal deepfake detectors are often based on attention-based fusion of heterogeneous data streams from multiple modalities. However, the heterogeneous nature of the data (such as audio and visual signals) creates a distributional modality gap. This gap makes effective fusion, and hence multi-modal deepfake detection, challenging. In this paper, we propose a novel multi-modal attention framework based on recurrent neural networks (RNNs) that leverages contextual information for audio-visual deepfake detection. The proposed approach applies attention to multi-modal multi-sequence representations and learns the contributing features among them for deepfake detection and localization. Thorough experimental validation on the audio-visual deepfake datasets FakeAVCeleb, AV-Deepfake1M, TVIL, and LAV-DF demonstrates the efficacy of our approach. Cross-comparison with published studies shows that our approach improves accuracy and precision by 3.47% and 2.05% in deepfake detection and localization, respectively, thus achieving state-of-the-art performance. To facilitate reproducibility, the code and dataset information are available at https://github.com/vcbsl/audiovisual-deepfake/.

Updated: 2024-08-02 18:45:01

Fields: cs.SD,cs.AI,cs.CV,cs.MM,eess.AS

Download: http://arxiv.org/abs/2408.01532v1

A Structured Framework for Predicting Sustainable Aviation Fuel Properties using Liquid-Phase FTIR and Machine Learning

Sustainable aviation fuels have the potential to reduce emissions and environmental impact. To help identify viable sustainable aviation fuels and accelerate research, several machine learning models have been developed to predict relevant physicochemical properties. However, many of these models have limited applicability. Some rely on data from complex analytical techniques with confined spectral ranges. Others use feature decomposition methods with limited interpretability. Using liquid-phase Fourier Transform Infrared (FTIR) spectra, this study presents a structured method for creating accurate and interpretable property prediction models for neat molecules, aviation fuels, and blends. Liquid-phase FTIR spectra can be collected quickly and consistently, offering high reliability, sensitivity, and component specificity while using less than 2 mL of sample. The method first decomposes FTIR spectra into fundamental building blocks using Non-negative Matrix Factorization (NMF) to enable scientific analysis of FTIR spectral attributes and fuel properties. The NMF features are then used to create five ensemble models for predicting final boiling point, flash point, freezing point, density at 15 °C, and kinematic viscosity at -20 °C. All models were trained using experimental property data from neat molecules, aviation fuels, and blends. The models accurately predict properties while enabling interpretation of the relationships between a fuel's compositional elements, such as functional groups or chemical classes, and its properties. To support sustainable aviation fuel research and development, the models and data are available through an interactive web tool.
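
The NMF step can be illustrated with the classic Lee-Seung multiplicative updates, which keep both factors non-negative. This is a sketch under assumptions, since the abstract does not state the study's exact NMF variant, rank, or preprocessing. The toy spectra and helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for V ≈ W H with W, H >= 0.

    Rows of V could be spectra (one per fuel sample) and columns wavenumbers;
    rows of H are then non-negative "building block" spectra and W holds the
    per-sample loadings that could serve as model features.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)        # H <- H * WtV / WtWH
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))        # W <- W * VHt / WHHt
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


# Toy "spectra": 4 samples mixed from 2 hypothetical component spectra.
components = [[1.0, 0.5, 0.0, 0.0], [0.0, 0.2, 1.0, 0.4]]
loadings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.3, 0.9]]
V = matmul(loadings, components)
w_init, h_init = nmf(V, rank=2, n_iter=0)
w_fit, h_fit = nmf(V, rank=2, n_iter=200)
# The multiplicative updates are non-increasing in this error, so the fitted
# error is no larger than the error at initialization.
print(frob_error(V, w_init, h_init), frob_error(V, w_fit, h_fit))
```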

Updated: 2024-08-02 18:43:22

Fields: physics.chem-ph,cs.LG

Download: http://arxiv.org/abs/2408.01530v1

Can multivariate Granger causality detect directed connectivity of a multistable and dynamic biological decision network model?

Extracting causal connections can advance interpretable AI and machine learning. Granger causality (GC) is a robust statistical method for estimating directed influences, or directed connectivity (DC), between signals. GC has been widely applied to analyzing neuronal signals in biological neural networks and other domains. However, its application to complex, nonlinear, and multistable neural networks is less explored. In this study, we applied time-domain multivariate Granger causality (MVGC) to the time series of neural activity of all nodes in a trained, multistable, biologically based decision neural network model with real-time decision uncertainty monitoring. Two conditions could readily reveal the original model's DC: challenging two-choice decisions, in which input signals could be closely matched, and the appropriate application of fine-grained sliding time windows. Furthermore, the identified DC varied depending on whether the network made correct or erroneous decisions. Integrating the DC identified from different decision outcomes recovered most of the original model's architecture, despite some spurious and missing connections. This approach could serve as an initial exploration to enhance the interpretability and transparency of dynamic, multistable, and nonlinear biological or AI systems. It does so by revealing causal connections throughout different phases of neural network dynamics and outcomes.

Updated: 2024-08-02 18:40:15

Fields: q-bio.NC,cs.LG,cs.NE,math.DS

Download: http://arxiv.org/abs/2408.01528v1

Analyzing LLMs' Capabilities to Establish Implicit User Sentiment of Software Desirability

This study explores the use of several LLMs to provide quantitative zero-shot sentiment analysis of implicit software desirability expressed by users. Unlike other methods that simply classify sentiment as positive, neutral, or negative, the study provides scaled numerical sentiment analysis. Numerical analysis provides deeper insight into the magnitude of sentiment, supporting better decisions regarding product desirability. Data were collected using the Microsoft Product Desirability Toolkit (PDT), a well-known qualitative user experience analysis tool. For initial exploration, the PDT metric was given to users of ZORQ, a gamification system used in undergraduate computer science education. The collected PDT data were analyzed for quantitative sentiment by three kinds of systems. These were several LLMs (Claude Sonnet 3 and 3.5, GPT4, and GPT4o), a leading transfer learning technique, Twitter-Roberta-Base-Sentiment (TRBS), and Vader, a leading sentiment analysis tool. Each system was asked to evaluate the data in two ways. First, it looked at the sentiment expressed in the individual PDT word/explanation pairs. Second, it looked at the sentiment expressed by the users in their grouped selection of five words and explanations as a whole. Each LLM was also asked to provide its confidence (low, medium, high) in its sentiment score, along with an explanation of why it selected that sentiment value. All LLMs tested were able to statistically detect user sentiment from the users' grouped data, whereas TRBS and Vader were not. The confidence ratings and explanations provided by the LLMs helped in understanding user sentiment. This study contributes to a deeper understanding of how to evaluate user experiences, toward the goal of creating a universal tool that quantifies expressed implicit sentiment.

Updated: 2024-08-02 18:40:10

Fields: cs.CL,cs.AI,cs.HC,cs.LG,cs.SE,I.2.7; D.2.8; I.2.6; H.5.2

Download: http://arxiv.org/abs/2408.01527v1

A probabilistic framework for learning non-intrusive corrections to long-time climate simulations from short-time training data

Chaotic systems, such as turbulent flows, are ubiquitous in science and engineering. However, their study remains a challenge due to the large range of scales involved and the strong interaction with other, often not fully understood, physics. As a consequence, the spatiotemporal resolution required for accurate simulation of these systems is typically computationally infeasible. This is particularly true for long-term risk assessment applications, such as quantifying extreme weather risk due to climate change. Data-driven modeling offers some promise of alleviating these obstacles. However, the scarcity of high-quality simulations limits the data available to train such models, a problem often compounded by the lack of stability in long-horizon simulations. As such, computational, algorithmic, and data restrictions generally imply that the probability of rare extreme events is not accurately captured. In this work, we present a general strategy for training neural network models to non-intrusively correct under-resolved long-time simulations of chaotic systems. The approach is based on training a post-processing correction operator on under-resolved simulations nudged towards a high-fidelity reference. This enables us to learn the dynamics of the underlying system directly, which allows us to use very little training data, even when its statistics are far from converged. Additionally, through the use of probabilistic network architectures, we are able to leverage the uncertainty due to the limited training data to further improve extrapolation capabilities. We apply our framework to severely under-resolved simulations of quasi-geostrophic flow. We demonstrate its ability to accurately predict anisotropic statistics over time horizons more than 30 times longer than the data seen in training.

Updated: 2024-08-02 18:34:30

Fields: cs.LG,math.DS,physics.ao-ph,physics.flu-dyn

Download: http://arxiv.org/abs/2408.02688v1

DEM: A Method for Certifying Deep Neural Network Classifier Outputs in Aerospace

Software development in the aerospace domain requires adhering to strict, high-quality standards. While regulatory guidelines exist for commercial software in this domain (e.g., ARP-4754 and DO-178), they do not apply to software with deep neural network (DNN) components. Consequently, it is unclear how aerospace systems can benefit from the deep learning revolution. Our work seeks to address this challenge with a novel, output-centric approach to DNN certification. Our method employs statistical verification techniques. Its key advantage is that it can flag specific inputs for which the DNN's output may be unreliable, so that a human expert can inspect them later. To achieve this, our method conducts a statistical analysis of the DNN's predictions for other, nearby inputs in order to detect inconsistencies. This contrasts with existing techniques, which typically attempt to certify the entire DNN rather than individual outputs. Our method uses the DNN as a black box and makes no assumptions about its topology. We hope that this work constitutes another step towards integrating DNNs in safety-critical applications, especially in the aerospace domain, where high standards of quality and reliability are crucial.
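
The following is a minimal sketch of the output-centric idea under a simple assumed scheme. Inputs are sampled uniformly around the query point, and the classifier is queried only as a black box. The query is flagged if the fraction of neighbors sharing its label falls below a threshold. The sampling radius, sample count, 0.95 threshold, and toy classifier are hypothetical choices, not the statistical test used by DEM.

```python
import random


def flag_unreliable(classify, x, radius, n_samples=200, min_agreement=0.95, seed=0):
    """Flag x if too few nearby inputs receive the same label as x itself.

    classify is treated as a black box: only its outputs are used. Neighbors
    are drawn uniformly from an L-infinity ball of the given radius.
    Returns (flagged, agreement_fraction).
    """
    rng = random.Random(seed)
    label = classify(x)
    agree = 0
    for _ in range(n_samples):
        neighbor = [xi + rng.uniform(-radius, radius) for xi in x]
        if classify(neighbor) == label:
            agree += 1
    agreement = agree / n_samples
    return agreement < min_agreement, agreement


def classify(v):
    """Hypothetical stand-in for a DNN: a linear decision boundary sum(v) = 0."""
    return int(sum(v) > 0)


print(flag_unreliable(classify, [5.0, 5.0], radius=0.1))  # (False, 1.0)
print(flag_unreliable(classify, [0.0, 0.0], radius=0.1))  # flagged: on the boundary
```

Points far from the decision boundary pass unflagged. Points near it, where the label is fragile, are routed to a human expert.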

Updated: 2024-08-02 18:27:15

Fields: cs.SE,cs.LG

Download: http://arxiv.org/abs/2401.02283v4

Gradient flow in parameter space is equivalent to linear interpolation in output space

We prove that the usual gradient flow in parameter space that underlies many training algorithms for neural networks in deep learning can be continuously deformed into an adapted gradient flow which yields (constrained) Euclidean gradient flow in output space. Moreover, if the Jacobian of the outputs with respect to the parameters is full rank (for fixed training data), then the time variable can be reparametrized so that the resulting flow is simply linear interpolation, and a global minimum can be achieved.
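
For a squared loss, the equivalence can be sketched in a few lines. This is one construction consistent with the abstract, and it is not necessarily the adapted flow used in the paper. It assumes the following:
- a squared loss $L = \tfrac12\lVert u - y\rVert^2$;
- network outputs $u = f(\theta) \in \mathbb{R}^N$ stacked over the fixed training set, with targets $y$;
- a Jacobian $J = \partial u / \partial \theta$ of full row rank $N$.

```latex
\begin{aligned}
\text{parameter-space flow:}\quad & \dot{\theta} = -J^\top (u - y)
  \;\Rightarrow\; \dot{u} = J\dot{\theta} = -J J^\top (u - y),\\
\text{adapted flow:}\quad & \dot{\theta} = -J^\top (J J^\top)^{-1} (u - y)
  \;\Rightarrow\; \dot{u} = -(u - y),\\
\text{solution:}\quad & u(t) = y + e^{-t}\bigl(u(0) - y\bigr),\\
\text{with } s = 1 - e^{-t}:\quad & u(s) = (1 - s)\,u(0) + s\,y, \qquad s \in [0, 1).
\end{aligned}
```

The full-rank assumption is what makes $J J^\top$ invertible. Under the reparametrized time $s$, the outputs move along the straight line from $u(0)$ to $y$, which is the global minimum of this loss.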

Updated: 2024-08-02 18:23:17

Fields: cs.LG,cs.AI,math-ph,math.MP,math.OC,stat.ML,62M45, 37C10

Download: http://arxiv.org/abs/2408.01517v1

Expanding the Horizon: Enabling Hybrid Quantum Transfer Learning for Long-Tailed Chest X-Ray Classification

Quantum machine learning (QML) has the potential to improve the multi-label classification of rare, albeit critical, diseases in large-scale chest X-ray (CXR) datasets. This potential stems from QML's theoretical advantages over classical machine learning (CML) in sample efficiency and generalizability. Prior literature has explored QML with CXRs. However, it has focused on binary classification tasks with small datasets, due to limited access to quantum hardware and computationally expensive simulations. To that end, we implemented a Jax-based framework that enables the simulation of medium-sized qubit architectures with significant improvements in wall-clock time over current software offerings. We evaluated our Jax-based framework in terms of efficiency and performance for hybrid quantum transfer learning. The task was long-tailed classification across 8, 14, and 19 disease labels using large-scale CXR datasets. The Jax-based framework resulted in up to a 58% and 95% speed-up compared to PyTorch and TensorFlow implementations, respectively. However, compared to CML, QML demonstrated slower convergence and an average AUROC of 0.70, 0.73, and 0.74 for the classification of 8, 14, and 19 CXR disease labels, respectively. In comparison, the CML models had an average AUROC of 0.77, 0.78, and 0.80, respectively. In conclusion, our work presents an accessible implementation of hybrid quantum transfer learning for long-tailed CXR classification with a computationally efficient Jax-based framework.

Updated: 2024-08-02 18:18:48

Fields: cs.CV,cs.AI,cs.LG,quant-ph

Download: http://arxiv.org/abs/2405.00156v2

Adaptive Planning with Generative Models under Uncertainty

Planning with generative models has emerged as an effective decision-making paradigm across a wide range of domains, including reinforcement learning and autonomous navigation. While continuous replanning at each timestep might seem intuitive because it allows decisions to be made based on the most recent environmental observations, it results in substantial computational challenges, primarily due to the complexity of the generative model's underlying deep learning architecture. Our work addresses this challenge by introducing a simple adaptive planning policy that leverages the generative model's ability to predict long-horizon state trajectories, enabling the execution of multiple actions consecutively without the need for immediate replanning. We propose to use the predictive uncertainty derived from a Deep Ensemble of inverse dynamics models to dynamically adjust the intervals between planning sessions. In our experiments conducted on locomotion tasks within the OpenAI Gym framework, we demonstrate that our adaptive planning policy allows for a reduction in replanning frequency to only about 10% of the steps without compromising the performance. Our results underscore the potential of generative modeling as an efficient and effective tool for decision-making.
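
The following is a hedged sketch of the adaptive policy. Actions are recovered from the planned state trajectory by an ensemble of inverse dynamics models. Replanning happens only when the plan runs out or when the ensemble's spread exceeds a threshold. The planner, environment, ensemble, and threshold are hypothetical toy stand-ins, not the paper's models.

```python
import statistics


def adaptive_rollout(plan_fn, inverse_models, step_fn, state, n_steps, threshold):
    """Execute planned states open-loop and replan only when needed.

    plan_fn(state)          -> list of predicted future states (hypothetical planner)
    inverse_models[i](s, t) -> action moving s towards t (one ensemble member)
    step_fn(state, action)  -> next state (the environment)
    Replanning is triggered when the plan is used up or when the standard
    deviation of the ensemble's actions exceeds `threshold`.
    """
    plan, cursor, replans = plan_fn(state), 0, 1
    for _ in range(n_steps):
        if cursor >= len(plan):
            plan, cursor, replans = plan_fn(state), 0, replans + 1
        actions = [m(state, plan[cursor]) for m in inverse_models]
        if statistics.pstdev(actions) > threshold:
            plan, cursor, replans = plan_fn(state), 0, replans + 1
            actions = [m(state, plan[cursor]) for m in inverse_models]
        state = step_fn(state, statistics.fmean(actions))  # execute the mean action
        cursor += 1
    return state, replans


def planner(s):
    return [s + k for k in range(1, 6)]  # toy 5-step plan: move +1 per step


def env(s, a):
    return s + a


agreeing = [lambda s, t: t - s] * 3
print(adaptive_rollout(planner, agreeing, env, 0, 20, threshold=0.1))  # (20.0, 4)
disagreeing = [lambda s, t: t - s, lambda s, t: t - s + 0.5]
print(adaptive_rollout(planner, disagreeing, env, 0, 20, threshold=0.1)[1])  # 21
```

With a confident ensemble, the planner runs once per 5-step plan (4 calls in 20 steps). With a disagreeing ensemble, it falls back to replanning at every step.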

Updated: 2024-08-02 18:07:53

Fields: cs.RO,cs.LG

Download: http://arxiv.org/abs/2408.01510v1

Blockchain Economic Denial of Sustainability Attack: Exploiting Latency Optimization in Ethereum Transaction Forwarding

Strategies related to the blockchain concept of Extractable Value (MEV/BEV), such as arbitrage and front- or back-running, create an economic incentive for network nodes to reduce latency. One way to do so is to minimize transaction validation time, a core feature that secures blockchain networks. A modified node that neglects to filter invalid transactions in the Ethereum P2P network introduces novel attack vectors. In this work, we formalize and evaluate a Blockchain Economic Denial of Sustainability (EDoS) attack, which can cause operators of modified nodes financial losses in traffic costs. We 1) mathematically define the attack model, 2) identify thousands of empirical instances of a similar attack in the wild, 3) empirically measure the model parameters from our two monitoring nodes, and 4) conduct attack simulations on the local network to compare its performance with existing Denial-of-Service attacks. We show that an attacker can amplify network traffic at modified nodes by a factor of 3,600. The attacker can also cause economic damages 13,800 times greater than the amount needed to carry out the attack. Despite these risks, aggressive latency reduction may still be profitable enough to justify the existence of modified nodes. To assess this trade-off, we 1) simulate the transaction validation process in the local network and 2) empirically measure the latency reduction by deploying our modified node in the Ethereum testnet. We conclude with a cost-benefit analysis of skipping validation and provide mitigation strategies against this attack.

Updated: 2024-08-02 18:06:33

Fields: cs.CR

Download: http://arxiv.org/abs/2408.01508v1

Efficient Graph Coloring with Neural Networks: A Physics-Inspired Approach for Large Graphs

The graph coloring problem is an optimization problem involving the assignment of one of q colors to each vertex of a graph such that no two adjacent vertices share the same color. This problem is NP-hard and arises in various practical applications. In this work, we present a novel algorithm that leverages graph neural networks to tackle the problem efficiently, particularly for large graphs. We propose a physics-inspired approach that leverages tools used in statistical mechanics to improve the training and performance of the algorithm. The scaling of our method is evaluated for different connectivities and graph sizes. Finally, we demonstrate the effectiveness of our method on a dataset of Erdos-Renyi graphs, showing its applicability also in hard-to-solve connectivity regions where traditional methods struggle.
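
The abstract does not state the training objective. One common physics-inspired choice is assumed here. Coloring is relaxed to per-vertex softmax probabilities over the q colors, and the objective minimizes the energy of an antiferromagnetic Potts model. That energy penalizes adjacent vertices that share color mass. A GNN would output the per-vertex logits. The helpers below are hypothetical and only evaluate the objective; they do not train a network.

```python
import math


def softmax(logits):
    """Numerically stable softmax: per-vertex logits -> color probabilities."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def potts_energy(edges, probs):
    """Relaxed antiferromagnetic Potts energy: sum over edges of p_i . p_j.

    It is 0 exactly when every edge joins vertices with disjoint color
    support, so minimizing it pushes towards a proper coloring.
    """
    return sum(sum(a * b for a, b in zip(probs[i], probs[j])) for i, j in edges)


def hard_conflicts(edges, probs):
    """Number of monochromatic edges after taking each vertex's argmax color."""
    colors = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return sum(colors[i] == colors[j] for i, j in edges)


triangle = [(0, 1), (1, 2), (0, 2)]
uniform = [softmax([0.0, 0.0, 0.0]) for _ in range(3)]  # q = 3 colors
proper = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(potts_energy(triangle, uniform))  # ~1.0 (each edge contributes 1/3)
print(potts_energy(triangle, proper), hard_conflicts(triangle, proper))  # 0.0 0
```

Because the energy is differentiable in the probabilities, it can serve directly as an unsupervised loss. The conflict count is then the discrete metric reported after rounding.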

Updated: 2024-08-02 18:02:51

Fields: cs.LG,68T20

Download: http://arxiv.org/abs/2408.01503v1

NeuralFactors: A Novel Factor Learning Approach to Generative Modeling of Equities

The use of machine learning for statistical modeling (and thus, generative modeling) has grown in popularity with the proliferation of time series models, text-to-image models, and especially large language models. Fundamentally, the goal of classical factor modeling is the statistical modeling of stock returns. In this work, we explore using deep generative modeling to enhance classical factor models. Prior work has explored the use of deep generative models to model hundreds of stocks, leading to accurate risk forecasting and alpha portfolio construction. However, that model does not lend itself to factor-model interpretation, because the factor exposures cannot be deduced. In this work, we introduce NeuralFactors, a novel machine-learning-based approach to factor analysis. In NeuralFactors, a neural network outputs factor exposures and factor returns and is trained using the same methodology as variational autoencoders. We show that this model outperforms prior approaches in terms of both log-likelihood performance and computational efficiency. Further, we show that this method is competitive with prior work in several tasks: generating realistic synthetic data, covariance estimation, risk analysis (e.g., value at risk, or VaR, of portfolios), and portfolio optimization. Finally, due to the connection to classical factor analysis, we analyze how the factors our model learns cluster together and show that the factor exposures could be used for embedding stocks.
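
For intuition, consider the classical linear factor structure that NeuralFactors builds on, r = B f + ε. It implies the stock covariance B F Bᵀ + D that feeds covariance estimation and risk analysis. The sketch assumes that structure with known exposures B, factor covariance F, and diagonal idiosyncratic variance D. It does not reproduce the neural network or its VAE-style training, and the numbers are illustrative.

```python
def factor_covariance(exposures, factor_cov, idio_var):
    """Return Cov(r) = B F B^T + D for the linear model r = B f + eps.

    exposures:  B, one row of factor exposures per stock (n x k)
    factor_cov: F, covariance of the factor returns (k x k)
    idio_var:   diagonal of D, idiosyncratic variance per stock (length n)
    """
    n, k = len(exposures), len(factor_cov)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = sum(exposures[i][a] * factor_cov[a][b] * exposures[j][b]
                    for a in range(k) for b in range(k))
            cov[i][j] = s + (idio_var[i] if i == j else 0.0)
    return cov


def portfolio_variance(weights, cov):
    """w^T Sigma w for portfolio weights w."""
    return sum(wi * cov[i][j] * wj
               for i, wi in enumerate(weights) for j, wj in enumerate(weights))


# Two stocks, one factor (illustrative numbers only).
B = [[1.0], [0.5]]
F = [[0.04]]
D = [0.01, 0.02]
sigma = factor_covariance(B, F, D)
print(sigma)                                  # approx [[0.05, 0.02], [0.02, 0.03]]
print(portfolio_variance([0.5, 0.5], sigma))  # approx 0.03
```

A portfolio variance like this is the ingredient that a parametric VaR estimate would scale. A learned model would instead supply B and the factor distribution per date.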

Updated: 2024-08-02 18:01:09

Fields: q-fin.ST,cs.LG

Download: http://arxiv.org/abs/2408.01499v1

Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting

Large Language Models (LLMs) exhibit remarkable proficiency in addressing a diverse array of tasks within the Natural Language Processing (NLP) domain, and various prompt design strategies significantly augment their capabilities. However, these prompts, while beneficial, each possess inherent limitations. There are two primary prompt design methodologies. The first, exemplified by Chain of Thought (CoT), involves manually crafting prompts specific to individual datasets, hence the term Expert-Designed Prompts (EDPs). Once these prompts are established, they are unalterable, and their effectiveness is capped by the expertise of the human designers. When applied to LLMs, the static nature of EDPs results in a uniform approach to both simple and complex problems within the same dataset, leading to inefficient use of tokens on straightforward problems. The second method uses prompts generated autonomously by the LLM, known as LLM-Derived Prompts (LDPs). LDPs provide tailored solutions to specific problems, mitigating the limitations of EDPs. However, LDPs may suffer a decline in performance on complex problems due to potential error accumulation during the solution planning process. To address these challenges, we have conceived a novel Prompt Recursive Search (PRS) framework that leverages the LLM to generate solutions specific to the problem, thereby conserving tokens. The framework incorporates an assessment of problem complexity and an adjustable structure, reducing the likelihood of errors. We have substantiated the efficacy of the PRS framework through extensive experiments using LLMs with different numbers of parameters across a spectrum of datasets in various domains. Compared to the CoT method, the PRS method increased accuracy on the BBH dataset by 8% using the Llama3-7B model, achieving a 22% improvement.

Updated: 2024-08-02 17:59:42

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2408.01423v1

A Survey on Data Selection for Language Models

A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as the quality of available text data can vary. Filtering out data can also decrease the carbon footprint and financial costs of training models by reducing the amount of training required. Data selection methods aim to determine which candidate data points to include in the training dataset and how to appropriately sample from the selected data points. The promise of improved data selection methods has caused the volume of research in the area to rapidly expand. However, because deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive, few organizations have the resources for extensive data selection research. Consequently, knowledge of effective data selection practices has become concentrated within a few organizations, many of which do not openly share their findings and methodologies. To narrow this gap in knowledge, we present a comprehensive review of existing literature on data selection methods and related research areas, providing a taxonomy of existing approaches. By describing the current landscape of research, this work aims to accelerate progress in data selection by establishing an entry point for new and established researchers. Additionally, throughout this review we draw attention to noticeable holes in the literature and conclude the paper by proposing promising avenues for future research.

Updated: 2024-08-02 17:59:31

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2402.16827v3

Using a CNN Model to Assess Visual Artwork's Creativity

Assessing artistic creativity has long challenged researchers, with traditional methods proving time-consuming. Recent studies have applied machine learning to evaluate creativity in drawings, but not paintings. Our research addresses this gap by developing a CNN model to automatically assess the creativity of students' paintings. Using a dataset of 600 paintings by professionals and children, our model achieved 90% accuracy and faster evaluation times than human raters. This approach demonstrates the potential of machine learning in advancing artistic creativity assessment, offering a more efficient alternative to traditional methods.

Updated: 2024-08-02 17:57:32

Categories: cs.CV,cs.HC,cs.LG

Download: http://arxiv.org/abs/2408.01481v1

Mission Impossible: A Statistical Perspective on Jailbreaking LLMs

Large language models (LLMs) are trained on a deluge of text data with limited quality control. As a result, LLMs can exhibit unintended or even harmful behaviours, such as leaking information, producing fake news or generating hate speech. Countermeasures, commonly referred to as preference alignment, include fine-tuning the pretrained LLMs with carefully crafted text examples of desired behaviour. Even then, empirical evidence shows that preference-aligned LLMs can be enticed into harmful behaviour. This so-called jailbreaking of LLMs is typically achieved by adversarially modifying the input prompt to the LLM. Our paper provides theoretical insights into the phenomenon of preference alignment and jailbreaking from a statistical perspective. Under our framework, we first show that pretrained LLMs will mimic harmful behaviour if it is present in the training corpus. Under the same framework, we then introduce a statistical notion of alignment and lower-bound the jailbreaking probability, showing that jailbreaking is unpreventable under reasonable assumptions. Based on these insights, we propose an alteration to the currently prevalent alignment strategy, RLHF. Specifically, we introduce a simple modification to the RLHF objective, which we call E-RLHF, that aims to increase the likelihood of safe responses. E-RLHF brings no additional training cost and is compatible with other methods. Empirically, we demonstrate that E-RLHF outperforms RLHF on all alignment problems put forward by the AdvBench and HarmBench projects without sacrificing model performance as measured by the MT-Bench project.

Updated: 2024-08-02 17:55:50

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2408.01420v1

Gemma 2: Improving Open Language Models at a Practical Size

In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local and global attention (Beltagy et al., 2020a) and grouped-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next-token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
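The abstract says the 2B and 9B models are trained with knowledge distillation rather than next-token prediction. The sketch below contrasts the two per-token objectives, assuming the distillation target is the teacher's full next-token distribution and the loss is KL(teacher || student) in the style of Hinton et al. (2015). The function names, the temperature parameter and the toy logits are illustrative assumptions, not Gemma 2's training code.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one vector of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(student_logits, targets):
    """Cross-entropy against the observed next tokens (one-hot targets)."""
    total = 0.0
    for logits, target in zip(student_logits, targets):
        total -= math.log(softmax(logits)[target])
    return total / len(targets)


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean per-position KL(teacher || student) over the whole vocabulary.

    The student is pushed towards the teacher's entire next-token
    distribution instead of a single observed token.
    """
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        p = softmax(t, temperature)  # teacher distribution
        q = softmax(s, temperature)  # student distribution
        total += sum(pi * (math.log(pi) - math.log(qi))
                     for pi, qi in zip(p, q) if pi > 0)
    return total / len(student_logits)


# Toy example: 2 positions, vocabulary of 3 tokens (made-up values).
teacher = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
student = [[1.5, 1.2, 0.3], [0.4, 2.0, 0.2]]
print(distillation_loss(student, teacher))
print(next_token_loss(student, [0, 1]))
```

With one-hot targets only the observed token carries a target at each position; under distillation every vocabulary entry receives a soft target from the teacher.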

Updated: 2024-08-02 17:52:12

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2408.00118v2

Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs

Humans spontaneously use increasingly efficient language as interactions progress, by adapting and forming ad-hoc conventions. This phenomenon has been studied extensively using reference games, showing properties of human language that go beyond relaying intents. It remains unexplored whether multimodal large language models (MLLMs) similarly increase communication efficiency during interactions, and what mechanisms they may adopt for this purpose. We introduce ICCA, an automated framework to evaluate such conversational adaptation as an in-context behavior in MLLMs. We evaluate several state-of-the-art MLLMs, and observe that while they may understand the increasingly efficient language of their interlocutor, they do not spontaneously make their own language more efficient over time. This latter ability can only be elicited in some models (e.g., GPT-4) with heavy-handed prompting. This shows that this property of linguistic interaction does not arise from current training regimes, even though it is a common hallmark of human language. ICCA is available at https://github.com/lil-lab/ICCA.

Updated: 2024-08-02 17:51:57

Categories: cs.CL,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.01417v1

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability

Interpretability provides a toolset for understanding how and why neural networks behave in certain ways. However, there is little unity in the field: most studies employ ad-hoc evaluations and do not share theoretical foundations, making it difficult to measure progress and compare the pros and cons of different techniques. Furthermore, while mechanistic understanding is frequently discussed, the basic causal units underlying these mechanisms are often not explicitly defined. In this paper, we propose a perspective on interpretability research grounded in causal mediation analysis. Specifically, we describe the history and current state of interpretability taxonomized according to the types of causal units (mediators) employed, as well as methods used to search over mediators. We discuss the pros and cons of each mediator, providing insights as to when particular kinds of mediators and search methods are most appropriate depending on the goals of a given study. We argue that this framing yields a more cohesive narrative of the field, as well as actionable insights for future work. Specifically, we recommend a focus on discovering new mediators with better trade-offs between human-interpretability and compute-efficiency, and which can uncover more sophisticated abstractions from neural networks than the primarily linear mediators employed in current work. We also argue for more standardized evaluations that enable principled comparisons across mediator types, such that we can better understand when particular causal units are better suited to particular use cases.

Updated: 2024-08-02 17:51:42

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.01416v1

Conditional LoRA Parameter Generation

Generative models have achieved remarkable success in image, video, and text domains. Inspired by this, researchers have explored utilizing generative models to generate neural network parameters. However, these efforts have been limited by the parameter size and the practicality of generating high-performance parameters. In this paper, we propose COND P-DIFF, a novel approach that demonstrates the feasibility of controllable high-performance parameter generation, particularly for LoRA (Low-Rank Adaptation) weights, during the fine-tuning process. Specifically, we employ an autoencoder to extract efficient latent representations for parameters. We then train a conditional latent diffusion model to synthesize high-performing model parameters from random noise based on specific task conditions. Experimental results in both computer vision and natural language processing domains consistently demonstrate that COND P-DIFF can generate high-performance parameters conditioned on the given task. Moreover, we observe that the parameter distribution generated by COND P-DIFF exhibits differences compared to the distribution obtained through normal optimization methods, indicating a certain level of generalization capability. Our work paves the way for further exploration of condition-driven parameter generation, offering a promising direction for task-specific adaptation of neural networks.
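COND P-DIFF generates LoRA weights rather than full weight matrices. For reference, here is a minimal sketch of how such weights enter a linear layer under the standard LoRA formulation (Hu et al., 2021), h = Wx + (alpha / r) BAx. The helper names and toy matrices are hypothetical, and the alpha / r scaling is an assumption about the LoRA variant in use.

```python
def matvec(m, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha):
    """LoRA layer: h = W x + (alpha / r) * B (A x).

    W: frozen d_out x d_in weight; A: r x d_in; B: d_out x r.
    Only A and B are trained (or, in a generative setting, synthesized).
    """
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


def lora_param_count(d_out, d_in, r):
    """Adapter parameters, to compare with d_out * d_in for the full matrix."""
    return r * (d_in + d_out)


W = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
x = [1.0, 2.0, 3.0]
A = [[1.0, 1.0, 1.0]]      # rank r = 1
B = [[0.5], [-1.0]]
print(lora_forward(W, A, B, x, alpha=2.0))  # [13.0, -7.0]
```

A rank-8 adapter on a 4096 x 4096 matrix has 65,536 parameters instead of about 16.8 million, which illustrates why targeting LoRA weights eases the parameter-size limitation the abstract mentions.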

Updated: 2024-08-02 17:43:34

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.01415v1

A Comprehensive Evaluation on Event Reasoning of Large Language Models

Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and must deal with the diversity of inter-event relations and reasoning paradigms. How well LLMs accomplish event reasoning across various relations and reasoning paradigms remains unknown. To fill this gap, we comprehensively evaluate the event reasoning abilities of LLMs. We introduce a novel benchmark, EV2, for EValuation of EVent reasoning. EV2 consists of two levels of evaluation, schema and instance, and is comprehensive in relations and reasoning paradigms. We conduct extensive experiments on EV2. We find that LLMs have the ability to accomplish event reasoning, but their performance is far from satisfactory. We also observe an imbalance in the event reasoning abilities of LLMs. In addition, LLMs possess event schema knowledge; however, they are not aligned with humans in how they utilize this knowledge. Based on these findings, we guide LLMs to utilize event schema knowledge as memory, which leads to improvements in event reasoning.

Updated: 2024-08-02 17:39:32

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.17513v2

Derivation of Back-propagation for Graph Convolutional Networks using Matrix Calculus and its Application to Explainable Artificial Intelligence

This paper provides a comprehensive and detailed derivation of the backpropagation algorithm for graph convolutional neural networks using matrix calculus. The derivation is extended to include arbitrary element-wise activation functions and an arbitrary number of layers. The study addresses two fundamental problems, namely node classification and link prediction. To validate our method, we compare it with reverse-mode automatic differentiation. The experimental results demonstrate that the median sum of squared errors of the updated weight matrices, when comparing our method to the approach using reverse-mode automatic differentiation, falls within the range of $10^{-18}$ to $10^{-14}$. These outcomes are obtained from conducting experiments on a five-layer graph convolutional network, applied to a node classification problem on Zachary's karate club social network and a link prediction problem on a drug-drug interaction network. Finally, we show how the derived closed-form solution can facilitate the development of explainable AI and sensitivity analysis.
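To make the closed-form derivation concrete, here is a minimal one-layer sketch, assuming a tanh activation, a squared-error loss and the common symmetric normalization A_hat = D^{-1/2}(A + I)D^{-1/2}. The paper covers arbitrary element-wise activations and depth and validates against reverse-mode automatic differentiation; this sketch instead checks the matrix-calculus gradient against central finite differences. All function names are hypothetical.

```python
import math


def matmul(P, Q):
    """(n x m) @ (m x p) for lists of lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def normalized_adjacency(adj):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def forward(a_hat, X, W, Y):
    """One layer H = tanh(A_hat X W) with loss L = 0.5 * ||H - Y||_F^2."""
    M = matmul(a_hat, X)  # aggregated node features
    H = [[math.tanh(z) for z in row] for row in matmul(M, W)]
    loss = 0.5 * sum((h - y) ** 2 for hr, yr in zip(H, Y) for h, y in zip(hr, yr))
    return M, H, loss


def grad_W(a_hat, X, W, Y):
    """Matrix calculus: dL/dW = (A_hat X)^T [(H - Y) * (1 - H^2)]."""
    M, H, _ = forward(a_hat, X, W, Y)
    delta = [[(h - y) * (1.0 - h * h) for h, y in zip(hr, yr)]
             for hr, yr in zip(H, Y)]  # dL/dZ, element-wise
    return matmul(transpose(M), delta)


def numeric_grad_W(a_hat, X, W, Y, eps=1e-6):
    """Central finite differences, used only to check grad_W."""
    g = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            w_plus = [row[:] for row in W]
            w_minus = [row[:] for row in W]
            w_plus[i][j] += eps
            w_minus[i][j] -= eps
            diff = forward(a_hat, X, w_plus, Y)[2] - forward(a_hat, X, w_minus, Y)[2]
            g[i][j] = diff / (2 * eps)
    return g


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0-1-2
a_hat = normalized_adjacency(adj)
X = [[1.0, 0.5], [0.2, -0.3], [0.0, 1.0]]
W = [[0.1, -0.2], [0.4, 0.3]]
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
analytic = grad_W(a_hat, X, W, Y)
numeric = numeric_grad_W(a_hat, X, W, Y)
print(analytic)
```

Stacking layers repeats the same pattern: each layer's delta is propagated back through A_hat and the next layer's weights before the local activation derivative is applied.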

Updated: 2024-08-02 17:33:52

Categories: cs.LG

Download: http://arxiv.org/abs/2408.01408v1

Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning

Evaluating the performance of an ongoing policy plays a vital role in many areas, such as medicine and economics, by providing crucial guidance on early stopping of the online experiment and timely feedback from the environment. Policy evaluation in online learning, which infers the mean outcome of the optimal policy (i.e., its value) in real time, has therefore attracted increasing attention. Yet this problem is particularly challenging due to the dependent data generated in the online environment, the unknown optimal policy, and the complex exploration-exploitation trade-off in the adaptive experiment. In this paper, we aim to overcome these difficulties in policy evaluation for online learning. We explicitly derive the probability of exploration, which quantifies the probability of exploring non-optimal actions under commonly used bandit algorithms. We use this probability to conduct valid inference on the online conditional mean estimator under each action and develop the doubly robust interval estimation (DREAM) method to infer the value under the estimated optimal policy in online learning. The proposed value estimator provides double protection for consistency and is asymptotically normal, with a Wald-type confidence interval provided. Extensive simulation studies and real data applications demonstrate the empirical validity of the proposed DREAM method.
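As a simplified illustration of the doubly robust idea (not the DREAM method itself, which targets the value under the estimated optimal policy and derives the probability of exploration analytically), the sketch below computes an augmented inverse-propensity-weighted (AIPW) estimate of one arm's mean reward from adaptively collected bandit data. Assumptions: two Bernoulli arms, epsilon-greedy play with logged per-round propensities, and a fixed, deliberately wrong outcome model; no confidence interval is computed. All names are hypothetical.

```python
import random


def run_epsilon_greedy(true_means, horizon, epsilon, rng):
    """Bernoulli bandit played with epsilon-greedy.

    Logs (action, reward, probs), where probs holds the probability the
    policy gave every arm at that round, known before the action is drawn.
    """
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    log = []
    for _ in range(horizon):
        means = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(k)]
        greedy = max(range(k), key=lambda a: means[a])
        probs = [epsilon / k + (1.0 - epsilon if a == greedy else 0.0)
                 for a in range(k)]
        action = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        sums[action] += reward
        log.append((action, reward, probs))
    return log


def doubly_robust_value(log, arm, mu_hat):
    """AIPW estimate of the mean reward of `arm`.

    Each term is mu_hat + 1{A_t = arm} / p_t(arm) * (Y_t - mu_hat). With exact
    logged propensities the average is unbiased even if mu_hat is wrong; with
    a correct mu_hat, errors in the propensities do not bias it either.
    """
    terms = [mu_hat + (1.0 if a == arm else 0.0) / probs[arm] * (y - mu_hat)
             for a, y, probs in log]
    return sum(terms) / len(terms)


rng = random.Random(0)
log = run_epsilon_greedy([0.3, 0.7], horizon=20000, epsilon=0.2, rng=rng)
# A deliberately misspecified outcome model for arm 0 (0.9 instead of 0.3).
print(doubly_robust_value(log, arm=0, mu_hat=0.9))
```

Because the propensities are exact, the estimate for arm 0 should come out close to 0.3 even though mu_hat = 0.9; a better outcome model would mainly reduce the variance of the correction term.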

Updated: 2024-08-02 17:31:24

Categories: stat.ML,cs.LG,math.ST,stat.ME,stat.TH

Download: http://arxiv.org/abs/2110.15501v4

Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer

Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks, leveraging pre-collected datasets and Transformer's capability to model long sequences. Recent works have demonstrated that using parts of trajectories from training tasks as prompts in DT enhances its performance on unseen tasks, giving rise to Prompt-DT methods. However, collecting data from specific environments can be both costly and unsafe in many scenarios, leading to suboptimal performance and limited few-shot prompt abilities due to the data-hungry nature of Transformer-based models. Additionally, the limited datasets used in pre-training make it challenging for Prompt-DT type of methods to distinguish between various RL tasks through prompts alone. To address these challenges, we introduce the Language model-initialized Prompt Decision Transformer (LPDT), which leverages pre-trained language models for meta-RL tasks and fine-tunes the model using Low-rank Adaptation (LoRA). We further incorporate prompt regularization to effectively differentiate between tasks based on prompt feature representations. Our approach integrates pre-trained language model and RL tasks seamlessly. Extensive empirical studies demonstrate that initializing with a pre-trained language model significantly enhances the performance of Prompt-DT on unseen tasks compared to baseline methods.

Updated: 2024-08-02 17:25:34

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2408.01402v1

"A Good Bot Always Knows Its Limitations": Assessing Autonomous System Decision-making Competencies through Factorized Machine Self-confidence

How can intelligent machines assess their competencies in completing tasks? This question has come into focus for autonomous systems that algorithmically reason and make decisions under uncertainty. It is argued here that machine self-confidence (a form of meta-reasoning based on self-assessments of an agent's knowledge about the state of the world and itself, as well as its ability to reason about and execute tasks) leads to many eminently computable and useful competency indicators for such agents. This paper presents a culmination of work on this concept in the form of a computational framework called Factorized Machine Self-confidence (FaMSeC), which provides a holistic, engineering-focused description of the factors driving an algorithmic decision-making process, including outcome assessment, solver quality, model quality, alignment quality, and past experience. In FaMSeC, self-confidence indicators are derived from hierarchical "problem-solving statistics" embedded within broad classes of probabilistic decision-making algorithms such as Markov decision processes. The problem-solving statistics are obtained by evaluating and grading probabilistic exceedance margins with respect to given competency standards, which are specified for each of the various decision-making competency factors by the informee (e.g., a non-expert user or an expert system designer). This approach allows "algorithmic goodness of fit" evaluations to be easily incorporated into the design of many kinds of autonomous agents in the form of human-interpretable competency self-assessment reports. Detailed descriptions and application examples for a Markov decision process agent show how two of the FaMSeC factors (outcome assessment and solver quality) can be computed and reported for a range of possible tasking contexts through novel use of meta-utility functions, behavior simulations, and surrogate prediction models.

Updated: 2024-08-02 17:10:43

Categories: cs.AI,cs.CY,cs.HC,cs.LG,cs.RO

Download: http://arxiv.org/abs/2407.19631v2

FT K-Means: A High-Performance K-Means on GPU with Fault Tolerance

K-Means is a widely used clustering algorithm; however, its efficiency is primarily constrained by the computational cost of distance computation. Existing implementations suffer from suboptimal utilization of computational units and lack resilience against soft errors. To address these challenges, we introduce FT K-Means, a high-performance GPU-accelerated implementation of K-Means with online fault tolerance. We first present a stepwise optimization strategy that achieves competitive performance compared to NVIDIA's cuML library. We further improve FT K-Means with a template-based code generation framework that supports different data types and adapts to different input shapes. A novel warp-level tensor-core error correction scheme is proposed to address the failure of existing fault tolerance methods due to memory asynchronization during copy operations. Our experimental evaluations on NVIDIA T4 and A100 GPUs demonstrate that FT K-Means without fault tolerance outperforms cuML's K-Means implementation, showing a performance increase of 10%-300% in scenarios involving irregular data shapes. Moreover, the fault tolerance feature of FT K-Means introduces an overhead of only 11%, maintaining robust performance even with tens of errors injected per second.
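The abstract identifies distance computation as the bottleneck. A common way to make it GPU-friendly (an assumption here; the abstract does not detail FT K-Means' kernels) is to expand the squared distance so that the cross term becomes a matrix product. This pure-Python reference shows that expansion inside plain Lloyd iterations; it contains no GPU code and no fault tolerance, and the function names are hypothetical.

```python
def squared_distances(points, centroids):
    """||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 for every point/centroid pair.

    The x.c cross terms form a points x centroids matrix product, the part
    a GPU implementation can hand to GEMM / tensor-core kernels.
    """
    x_norms = [sum(v * v for v in x) for x in points]
    c_norms = [sum(v * v for v in c) for c in centroids]
    dots = [[sum(a * b for a, b in zip(x, c)) for c in centroids] for x in points]
    return [[x_norms[i] - 2.0 * dots[i][j] + c_norms[j]
             for j in range(len(centroids))] for i in range(len(points))]


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    dist = squared_distances(points, centroids)
    return [min(range(len(centroids)), key=row.__getitem__) for row in dist]


def update(points, labels, centroids):
    """Cluster means; an empty cluster keeps its previous centroid."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, label in zip(points, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += p[d]
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(len(centroids))]


def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return centroids, assign(points, centroids)


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = kmeans(points, [[0.0, 0.0], [10.0, 10.0]])
print(centroids, labels)  # [[0.0, 0.5], [10.0, 10.5]] [0, 0, 1, 1]
```

The expanded form can lose precision through cancellation when a point lies very close to a centroid with a large norm, which is worth keeping in mind when comparing it against the direct difference-based distance.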

Updated: 2024-08-02 17:01:36

Categories: cs.DC,cs.LG

Download: http://arxiv.org/abs/2408.01391v1

Parallel Strategies for Best-First Generalized Planning

In recent years, there has been renewed interest in closing the performance gap between state-of-the-art planning solvers and generalized planning (GP), a research area of AI that studies the automated synthesis of algorithmic-like solutions capable of solving multiple classical planning instances. One of the current advancements has been the introduction of Best-First Generalized Planning (BFGP), a GP algorithm based on a novel solution space that can be explored with heuristic search, one of the foundations of modern planners. This paper evaluates the application of parallel search techniques to BFGP, another critical component in closing the performance gap. We first discuss why BFGP is well suited for parallelization and some of its differentiating characteristics from classical planners. Then, we propose two simple shared-memory parallel strategies with good scaling with the number of cores.

Updated: 2024-08-02 16:58:02

Categories: cs.AI,I.2.8; D.1.3

Download: http://arxiv.org/abs/2407.21485v2

On the instance optimality of detecting collisions and subgraphs

Suppose you are given a function $f\colon [n] \to [n]$ via (black-box) query access to the function. You are looking to find something local, like a collision (a pair $x \neq y$ s.t. $f(x)=f(y)$). The question is whether knowing the "shape" of the function helps you or not (by shape we mean that some permutation of the function is known). Formally, we investigate the unlabeled instance optimality of substructure detection problems in graphs and functions. A problem is $g(n)$-instance optimal if it admits an algorithm $A$ satisfying that for any possible input, the (randomized) query complexity of $A$ is at most $g(n)$ times larger than the query complexity of any algorithm $A'$ which solves the same problem while holding an unlabeled copy of the input (i.e., any $A'$ that "knows the structure of the input"). Our results point to a trichotomy of unlabeled instance optimality among substructure detection problems in graphs and functions: 1. A few very simple properties have an $O(1)$-instance optimal algorithm. 2. Most properties of graphs and functions, with examples such as containing a fixed point or a $3$-collision in functions, or a triangle in graphs, are $n^{\Omega(1)}$-far from instance optimality. 3. The problems of collision detection in functions and finding a claw in a graph serve as a middle ground between the two regimes. We show that these two properties are $\Omega(\log n)$-far from instance optimality, and conjecture that this bound is tight. We provide evidence towards this conjecture, by proving that finding a claw in a graph is $O(\log(n))$-instance optimal among all input graphs for which the query complexity of an algorithm holding an unlabeled certificate is $O\left(\sqrt{\frac{n}{\log n}}\right)$.
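For intuition about the kind of baseline that instance optimality is measured against, here is a minimal label-oblivious collision finder (a hypothetical helper, not an algorithm from the paper): it queries random distinct points of [n] and stops at the first repeated value. For a 2-to-1 function it typically succeeds after on the order of √n queries by the birthday bound; the paper asks how much an algorithm holding an unlabeled copy of the function can improve on such strategies.

```python
import random


def find_collision(f, n, rng, max_queries=None):
    """Query f on random distinct points of [n] = {0, ..., n-1} until a value repeats.

    Returns (x, y, queries) with x != y and f(x) == f(y), or None if no
    collision is seen within max_queries queries (default: all n points).
    """
    if max_queries is None:
        max_queries = n
    order = list(range(n))
    rng.shuffle(order)
    first_preimage = {}  # value -> first queried point that produced it
    for queries, x in enumerate(order[:max_queries], start=1):
        value = f(x)
        if value in first_preimage:
            return first_preimage[value], x, queries
        first_preimage[value] = x
    return None


rng = random.Random(1)
n = 1000
print(find_collision(lambda x: x // 2, n, rng))           # a 2-to-1 function
print(find_collision(lambda x: (7 * x + 3) % n, n, rng))  # a permutation: None
```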

Updated: 2024-08-02 16:57:10

Categories: cs.DS,cs.CC,cs.CR

Download: http://arxiv.org/abs/2312.10196v2

NeuralBeta: Estimating Beta Using Deep Learning

Traditional approaches to estimating beta in finance often involve rigid assumptions and fail to adequately capture beta dynamics, limiting their effectiveness in use cases like hedging. To address these limitations, we have developed a novel method using neural networks called NeuralBeta, which is capable of handling both univariate and multivariate scenarios and tracking the dynamic behavior of beta. To address the issue of interpretability, we introduce a new output layer inspired by regularized weighted linear regression, which provides transparency into the model's decision-making process. We conducted extensive experiments on both synthetic and market data, demonstrating NeuralBeta's superior performance compared to benchmark methods across various scenarios, especially instances where beta is highly time-varying, e.g., during regime shifts in the market. This model not only represents an advancement in the field of beta estimation, but also shows potential for applications in other financial contexts that assume linear relationships.
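The abstract describes an output layer inspired by regularized weighted linear regression. A minimal sketch of the closed form such a layer could evaluate, assuming a univariate, no-intercept ridge objective in which a network would supply the per-observation weights. Here fixed exponential weights stand in for learned ones, the regime-shift data is synthetic, and the function names are hypothetical.

```python
def weighted_ridge_beta(x, y, weights, lam):
    """Minimise sum_t w_t (y_t - beta x_t)^2 + lam * beta^2 (no intercept).

    Closed form: beta = sum(w x y) / (sum(w x^2) + lam).
    """
    num = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    den = sum(w * xi * xi for w, xi in zip(weights, x)) + lam
    return num / den


def exponential_weights(length, half_life):
    """Weights that halve every `half_life` steps into the past (newest = 1)."""
    decay = 0.5 ** (1.0 / half_life)
    return [decay ** (length - 1 - t) for t in range(length)]


# Synthetic regime shift: beta is 0.5 for 50 steps, then 2.0 for 50 steps.
market = [0.01 if t % 2 == 0 else -0.01 for t in range(100)]
asset = [(0.5 if t < 50 else 2.0) * m for t, m in enumerate(market)]

uniform = weighted_ridge_beta(market, asset, [1.0] * 100, lam=0.0)
recent = weighted_ridge_beta(market, asset, exponential_weights(100, 5), lam=0.0)
print(uniform, recent)
```

With uniform weights the estimate averages the two regimes (1.25), whereas recency weighting tracks the new beta of 2.0; a positive lam shrinks the estimate toward zero, which keeps the division stable when the weighted market variance is small.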

Updated: 2024-08-02 16:55:08

Categories: q-fin.ST,cs.LG

Download: http://arxiv.org/abs/2408.01387v1

Learning Visual Quadrupedal Loco-Manipulation from Demonstrations

Quadruped robots are progressively being integrated into human environments. Despite the growing locomotion capabilities of quadrupedal robots, their interaction with objects in realistic scenes is still limited. While additional robotic arms on quadrupedal robots enable manipulating objects, they are sometimes redundant given that a quadruped robot is essentially a mobile unit equipped with four limbs, each possessing 3 degrees of freedom (DoFs). Hence, we aim to empower a quadruped robot to execute real-world manipulation tasks using only its legs. We decompose the loco-manipulation process into a low-level reinforcement learning (RL)-based controller and a high-level Behavior Cloning (BC)-based planner. By parameterizing the manipulation trajectory, we synchronize the efforts of the upper and lower layers, thereby leveraging the advantages of both RL and BC. Our approach is validated through simulations and real-world experiments, demonstrating the robot's ability to perform tasks that demand mobility and high precision, such as lifting a basket from the ground while moving, closing a dishwasher, pressing a button, and pushing a door. Project website: https://zhengmaohe.github.io/leg-manip

Updated: 2024-08-02 16:51:52

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2403.20328v2

Mapping the Provenance Ontology to Basic Formal Ontology

The Provenance Ontology (PROV-O) is a World Wide Web Consortium (W3C) recommended ontology used to structure data about provenance across a wide variety of domains. Basic Formal Ontology (BFO) is a top-level ontology and ISO/IEC standard used to structure a wide variety of ontologies, such as the OBO Foundry ontologies and the Common Core Ontologies (CCO). To enhance interoperability between these two ontologies, their extensions, and the data organized by them, an alignment is presented according to specific mapping criteria and a methodology that prioritizes structural and semantic considerations. The alignment is evaluated by checking its logical consistency with canonical examples of PROV-O instances and by formalizing the mapping criteria as SPARQL queries that retrieve terms which do not satisfy them. A variety of semantic web technologies are used in support of the FAIR (Findable, Accessible, Interoperable, Reusable) principles.

Updated: 2024-08-02 16:50:17

Categories: cs.DB,cs.AI,cs.LO

Download: http://arxiv.org/abs/2408.03866v1

Explaining a probabilistic prediction on the simplex with Shapley compositions

Originating in game theory, Shapley values are widely used for explaining a machine learning model's prediction by quantifying the contribution of each feature's value to the prediction. This requires a scalar prediction as in binary classification, whereas a multiclass probabilistic prediction is a discrete probability distribution, living on a multidimensional simplex. In such a multiclass setting the Shapley values are typically computed separately on each class in a one-vs-rest manner, ignoring the compositional nature of the output distribution. In this paper, we introduce Shapley compositions as a well-founded way to properly explain a multiclass probabilistic prediction, using the Aitchison geometry from compositional data analysis. We prove that the Shapley composition is the unique quantity satisfying linearity, symmetry and efficiency on the Aitchison simplex, extending the corresponding axiomatic properties of the standard Shapley value. We demonstrate this proper multiclass treatment in a range of scenarios.
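
One way to see how the Aitchison geometry enters: the centred log-ratio (clr) transform maps the simplex, with perturbation and powering as its vector operations, linearly onto the zero-sum subspace of R^K. By that linearity, writing the Shapley formula with perturbation and powering is equivalent to applying the ordinary Shapley formula to clr coordinates and mapping each result back with the inverse clr. The minimal sketch below assumes that route; the additive-logit toy model (`toy_value`, `CONTRIB`) is hypothetical and only serves to check the efficiency property, namely that the per-feature clr vectors sum to clr(f(x)) minus clr(f(baseline)).

```python
import math
from itertools import combinations


def clr(p):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(pi) for pi in p]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def clr_inv(z):
    """Inverse clr: exponentiate and renormalise onto the simplex (a softmax)."""
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    total = sum(e)
    return [ei / total for ei in e]


def shapley_clr(value, n_features):
    """Exact Shapley values of a simplex-valued game, computed in clr coordinates.

    value(S) returns a probability vector for a frozenset S of feature indices.
    Returns one clr vector per feature; clr_inv maps each one to a composition.
    """
    n = n_features
    k = len(value(frozenset()))
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        acc = [0.0] * k
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                with_i, without_i = clr(value(s | {i})), clr(value(s))
                for c in range(k):
                    acc[c] += weight * (with_i[c] - without_i[c])
        phis.append(acc)
    return phis


# Hypothetical toy model: each "present" feature adds fixed logits to a 3-class softmax.
CONTRIB = {0: [1.0, 0.0, -1.0], 1: [0.0, 2.0, 0.0]}


def toy_value(s):
    logits = [sum(CONTRIB[i][c] for i in s) for c in range(3)]
    return clr_inv(logits)
```

Exact enumeration is exponential in the number of features; it is used here only to keep the check transparent.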

Updated: 2024-08-02 16:40:58

Categories: cs.LG,cs.GT

Download: http://arxiv.org/abs/2408.01382v1

Resampling and averaging coordinates on data

We introduce algorithms for robustly computing intrinsic coordinates on point clouds. Our approach relies on generating many candidate coordinates by subsampling the data and varying hyperparameters of the embedding algorithm (e.g., manifold learning). We then identify a subset of representative embeddings by clustering the collection of candidate coordinates and using shape descriptors from topological data analysis. The final output is the embedding obtained as an average of the representative embeddings using generalized Procrustes analysis. We validate our algorithm on both synthetic data and experimental measurements from genomics, demonstrating robustness to noise and outliers.
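
The final averaging step can be illustrated with generalized Procrustes analysis in two dimensions. The sketch below is a simplified version under stated assumptions: every candidate embedding lists the same points in the same order, alignment uses translation, uniform scaling and proper rotations only (no reflections), and the number of iterations is fixed. The clustering and the topological shape-descriptor selection that precede this step are not shown.

```python
import math


def center(points):
    """Translate a 2D configuration so that its centroid is at the origin."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]


def normalize(points):
    """Scale a centred configuration to unit Frobenius norm."""
    norm = math.sqrt(sum(x * x + y * y for x, y in points))
    return [(x / norm, y / norm) for x, y in points]


def rotate_onto(a, b):
    """Rotate centred configuration a to best match b in least squares (no reflection)."""
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(cross, dot)  # closed-form optimal angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in a]


def generalized_procrustes(embeddings, iters=20):
    """Align all embeddings to a common mean shape and return (mean, aligned)."""
    shapes = [normalize(center(e)) for e in embeddings]
    mean = shapes[0]
    for _ in range(iters):
        shapes = [rotate_onto(s, mean) for s in shapes]
        mean = normalize([
            (sum(s[i][0] for s in shapes) / len(shapes),
             sum(s[i][1] for s in shapes) / len(shapes))
            for i in range(len(mean))
        ])
    return mean, shapes
```

Because the optimal rotation angle has a closed form in 2D, no SVD is needed here; in higher dimensions the rotation step is usually solved with an SVD (the orthogonal Procrustes problem).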

Updated: 2024-08-02 16:37:33

Categories: stat.ML,cs.CG,cs.LG

Download: http://arxiv.org/abs/2408.01379v1

Adaptive Recruitment Resource Allocation to Improve Cohort Representativeness in Participatory Biomedical Datasets

Large participatory biomedical studies, which recruit individuals to join a dataset, are gaining popularity and investment, especially for analysis by modern AI methods. Because they purposively recruit participants, these studies are uniquely positioned to address a lack of historical representation, an issue that has affected many biomedical datasets. In this work, we define representativeness as the similarity between the distribution of a set of attributes in a cohort and the corresponding target population distribution, and our goal is to mirror the U.S. population across distributions of age, gender, race, and ethnicity. Many participatory studies recruit at several institutions, so we introduce a computational approach that adaptively allocates recruitment resources among sites to improve representativeness. In simulated recruitment of 10,000-participant cohorts from medical centers in the STAR Clinical Research Network, we show that our approach yields a more representative cohort than existing baselines. These results highlight the value of computational modeling in guiding recruitment efforts.
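
The summary does not spell out the allocation rule, so the sketch below uses a simple greedy heuristic purely as an illustration, not the authors' method. Each site is assumed to have a known expected distribution of recruits over demographic categories (`site_mixes`, a hypothetical input), representativeness is measured by total variation distance to the target distribution, and each batch of recruitment capacity goes to the site that most reduces that distance.

```python
def tv_distance(p, q):
    """Total variation distance between two distributions given as equal-length lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def greedy_allocate(site_mixes, target, total, batch=100):
    """Assign recruitment batches to sites one at a time, greedily.

    site_mixes: hypothetical per-site expected distribution of recruits over categories.
    target: target population distribution over the same categories.
    Returns the number of participants allocated to each site.
    """
    counts = [0.0] * len(target)  # expected recruits per category so far
    alloc = [0] * len(site_mixes)
    for _ in range(total // batch):
        best_site, best_dist = None, float("inf")
        for s, mix in enumerate(site_mixes):
            trial = [c + batch * m for c, m in zip(counts, mix)]
            n = sum(trial)
            dist = tv_distance([t / n for t in trial], target)
            if dist < best_dist:  # ties keep the lower-indexed site
                best_site, best_dist = s, dist
        counts = [c + batch * m for c, m in zip(counts, site_mixes[best_site])]
        alloc[best_site] += batch
    return alloc
```

With two sites that each recruit only one category and a 50/50 target, the greedy rule alternates between them and splits the budget evenly.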

Updated: 2024-08-02 16:32:30

Categories: cs.LG,cs.CY

Download: http://arxiv.org/abs/2408.01375v1

CCVA-FL: Cross-Client Variations Adaptive Federated Learning for Medical Imaging

Federated Learning (FL) offers a privacy-preserving approach to train models on decentralized data. Its potential in healthcare is significant, but challenges arise due to cross-client variations in medical image data, exacerbated by limited annotations. This paper introduces Cross-Client Variations Adaptive Federated Learning (CCVA-FL) to address these issues. CCVA-FL aims to minimize cross-client variations by transforming images into a common feature space. It involves expert annotation of a subset of images from each client, followed by the selection of a client with the least data complexity as the target. Synthetic medical images are then generated using Scalable Diffusion Models with Transformers (DiT) based on the target client's annotated images. These synthetic images, capturing diversity and representing the original data, are shared with other clients. Each client then translates its local images into the target image space using image-to-image translation. The translated images are subsequently used in a federated learning setting to develop a server model. Our results demonstrate that CCVA-FL outperforms Vanilla Federated Averaging by effectively addressing data distribution differences across clients without compromising privacy.
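
For reference, the Vanilla Federated Averaging baseline aggregates client models on the server by averaging their parameters, weighted by each client's number of training samples. The sketch shows only that aggregation step for one round, with parameters flattened into plain lists (an assumption for brevity); local training, the image-to-image translation and the DiT-based synthesis used in CCVA-FL are not shown.

```python
def fed_avg(client_params, client_sizes):
    """Federated Averaging: sample-size-weighted mean of client parameter vectors."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[j] * n for p, n in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]
```

A client holding three times as many samples pulls the server model three times as strongly toward its local parameters.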

Updated: 2024-08-02 16:30:55

Categories: cs.CV,cs.AI,cs.LG,I.2.10; I.4.0; I.4.1; I.4.2; I.4.6; I.4.7; I.4.8; I.4.9; I.4.10; I.2.10; I.5.1; I.5.2; I.5.4; J.2; I.2.6; I.2.11; I.2.10

Download: http://arxiv.org/abs/2407.11652v3

Hybrid Coordinate Descent for Efficient Neural Network Learning Using Line Search and Gradient Descent

This paper presents a novel coordinate descent algorithm that combines one-directional line search and gradient information to update the parameters of a squared error loss function. Each parameter is updated by either the line search or the gradient method, depending on whether the modulus of the gradient of the loss with respect to that parameter exceeds a predefined threshold. Notably, a larger threshold value improves algorithmic efficiency. Although the line search method can be slower than gradient descent, it is parallelizable, which reduces computation time. Experiments on a 2-layer Rectified Linear Unit (ReLU) network with synthetic data illustrate the impact of hyperparameters on convergence rates and computational efficiency.
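
A minimal sketch of the per-coordinate switching rule on a squared error loss. Two assumptions are made for illustration: a plain gradient step is used when the partial derivative exceeds the threshold in absolute value and a one-directional line search (over a fixed, hypothetical grid of step sizes along the descent direction) is used otherwise, since the summary does not state which branch applies on which side; and a linear model stands in for the paper's 2-layer ReLU network.

```python
def loss(w, xs, ys):
    """Mean squared error of the linear model w . x."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def partial_grad(w, xs, ys, j):
    """Partial derivative of the mean squared error with respect to w[j]."""
    return sum(
        2 * (sum(wi * xi for wi, xi in zip(w, x)) - y) * x[j] for x, y in zip(xs, ys)
    ) / len(xs)


def hybrid_cd(w, xs, ys, threshold=1.0, lr=0.05,
              steps=(1.0, 0.5, 0.25, 0.1, 0.05, 0.01), sweeps=50):
    w = list(w)
    for _ in range(sweeps):
        for j in range(len(w)):
            g = partial_grad(w, xs, ys, j)
            if abs(g) > threshold:
                w[j] -= lr * g  # assumed branch: gradient step for large gradients
            elif g != 0.0:
                # Assumed branch: one-directional line search along -sign(g) on coordinate j.
                direction = -1.0 if g > 0 else 1.0
                best_wj, best_loss = w[j], loss(w, xs, ys)
                for step in steps:  # candidates are independent, hence parallelizable
                    trial = list(w)
                    trial[j] += direction * step
                    trial_loss = loss(trial, xs, ys)
                    if trial_loss < best_loss:
                        best_wj, best_loss = trial[j], trial_loss
                w[j] = best_wj
    return w
```

The candidate steps in the line search are evaluated independently of each other, which is where the parallelism mentioned above would plausibly come from.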

Updated: 2024-08-02 16:29:54

Categories: cs.LG

Download: http://arxiv.org/abs/2408.01374v1

Data Debugging is NP-hard for Classifiers Trained with SGD

Data debugging aims to find a subset of the training data such that the model obtained by retraining on the subset has better accuracy. Many heuristic approaches have been proposed; however, none of them is guaranteed to solve this problem effectively. This leaves open the question of whether there exists an efficient algorithm to find a subset such that the model retrained on it has better accuracy. To answer this question and provide a theoretical basis for developing better data debugging algorithms, we investigate the computational complexity of a problem named Debuggable. Given a machine learning model $\mathcal{M}$ obtained by training on dataset $D$ and a test instance $(\mathbf{x}_\text{test},y_\text{test})$ where $\mathcal{M}(\mathbf{x}_\text{test})\neq y_\text{test}$, Debuggable asks whether there exists a subset $D^\prime$ of $D$ such that the model $\mathcal{M}^\prime$ obtained by retraining on $D^\prime$ satisfies $\mathcal{M}^\prime(\mathbf{x}_\text{test})=y_\text{test}$. To cover a wide range of commonly used models, we take an SGD-trained linear classifier as the model and derive the following main results. (1) If the loss function and the dimension of the model are not fixed, Debuggable is NP-complete regardless of the order in which the training samples are processed during SGD. (2) For hinge-like loss functions, a comprehensive analysis of the computational complexity of Debuggable is provided. (3) If the loss function is linear, Debuggable can be solved in linear time; that is, data debugging is easy in this case. These results not only highlight the limitations of current approaches but also offer new insights into data debugging.
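
To make the decision problem concrete, the sketch below decides Debuggable by brute force for a hypothetical one-dimensional linear classifier trained with one epoch of SGD on the hinge loss (learning rate 1, weight initialised at 0, training order preserved within each subset). It enumerates subsets smallest first, so in the worst case it examines all 2^|D| of them; the NP-completeness result indicates that, unless P = NP, no polynomial-time algorithm exists in the general setting of result (1).

```python
from itertools import combinations


def sgd_hinge(data, lr=1.0, epochs=1):
    """Hypothetical 1D model: train w for w * x with SGD on the hinge loss, in the given order."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * w * x < 1:  # hinge loss active: subgradient step toward y * x
                w += lr * y * x
    return w


def predict(w, x):
    return 1 if w * x > 0 else -1


def find_debug_subset(data, x_test, y_test):
    """Brute-force Debuggable: try order-preserving subsets, smallest first."""
    for size in range(len(data) + 1):
        for idx in combinations(range(len(data)), size):
            subset = [data[i] for i in idx]
            if predict(sgd_hinge(subset), x_test) == y_test:
                return subset
    return None  # the instance is not debuggable
```

With `data = [(1.0, 1), (-3.0, 1)]`, training on both points ends at w = -2 and misclassifies the test point x = 1 with label +1, while retraining on the first point alone gives w = 1 and fixes it.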

Updated: 2024-08-02 16:17:59

Categories: cs.CC,cs.LG

Download: http://arxiv.org/abs/2408.01365v1

Autoencoders in Function Space

Autoencoders have found widespread application, in both their original deterministic form and their variational formulation (VAEs). In scientific applications it is often of interest to consider data that are composed of functions; the same perspective is useful in image processing. In practice, discretisation (of differential equations arising in the sciences) or pixellation (of images) renders problems finite dimensional, but conceiving first of algorithms that operate on functions, and only then discretising or pixellating, leads to better algorithms that smoothly operate between different levels of discretisation or pixellation. In this paper, function-space versions of the autoencoder (FAE) and variational autoencoder (FVAE) are introduced, analysed, and deployed. Well-definedness of the objective function governing VAEs is a subtle issue, even in finite dimensions, and even more so in function space. The FVAE objective is well defined whenever the data distribution is compatible with the chosen generative model; this happens, for example, when the data arise from a stochastic differential equation. The FAE objective is valid much more broadly and can be straightforwardly applied to data governed by differential equations. Pairing these objectives with neural operator architectures, which can be evaluated on any mesh, enables new applications of autoencoders to inpainting, superresolution, and generative modelling of scientific data.
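
The idea of defining an algorithm on functions first and discretising afterwards can be illustrated with a toy linear "autoencoder" that is not the FAE or FVAE: the encoder projects a function onto the first few orthonormal sine modes on [0, 1] using the midpoint rule on whatever grid the data come on, and the decoder can be queried at any point. The basis and the grids are illustrative assumptions; in the paper, neural operator architectures play the role of this mesh-independent encoder and decoder.

```python
import math


def sine_basis(k, x):
    """Orthonormal sine basis on [0, 1]: sqrt(2) * sin(k * pi * x)."""
    return math.sqrt(2.0) * math.sin(k * math.pi * x)


def midpoint_grid(n):
    """Cell midpoints of a uniform n-cell grid on [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


def encode(u_values, grid, n_modes):
    """Project samples of u on a uniform midpoint grid onto the first n_modes basis functions."""
    dx = 1.0 / len(grid)
    return [
        sum(u * sine_basis(k, x) for u, x in zip(u_values, grid)) * dx
        for k in range(1, n_modes + 1)
    ]


def decode(z, x):
    """Evaluate the reconstruction at any point x, independent of the encoding grid."""
    return sum(zk * sine_basis(k, x) for k, zk in enumerate(z, start=1))
```

For u(x) = sin(pi x) + 0.5 sin(3 pi x), the latent code computed from 64 samples matches the one computed from 1024 samples, and the decoder returns values between grid points; this is the discretisation independence that the function-space formulation is meant to preserve.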

Updated: 2024-08-02 16:13:51

Categories: stat.ML,cs.LG,62G07 (Primary) 65M99, 68T07 (Secondary),I.2.6

Download: http://arxiv.org/abs/2408.01362v1

Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance

With the growing popularity of text-to-image generative models, there has been increasing focus on understanding their risks and biases. Recent work has found that state-of-the-art models struggle to depict everyday objects with the true diversity of the real world and show notable gaps between geographic regions. In this work, we aim to increase the diversity of generated images of common objects such that per-region variations are representative of the real world. We introduce an inference-time intervention, contextualized Vendi Score Guidance (c-VSG), that guides the backward steps of latent diffusion models to increase the diversity of a sample relative to a "memory bank" of previously generated images, while keeping the amount of variation within that of an exemplar set of real-world contextualizing images. We evaluate c-VSG with two geographically representative datasets and find that it substantially increases the diversity of generated images, both for the worst-performing regions and on average, while maintaining or improving image quality and consistency. Additionally, qualitative analyses reveal that the diversity of generated images is significantly improved, including for regions that the original model portrays reductively. We hope that this work is a step towards text-to-image generative models that reflect the true geographic diversity of the world.
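
c-VSG builds on the Vendi Score, a diversity measure defined as the exponential of the Shannon entropy of the eigenvalues of K/n, where K is an n × n similarity matrix with unit diagonal. The sketch computes only the score, using a dependency-free cyclic Jacobi eigenvalue routine (in practice `numpy.linalg.eigvalsh` would handle this step); the guidance itself, the memory bank and the contextualizing exemplar set are not shown.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def vendi_score(k):
    """Vendi Score: exp of the Shannon entropy of the eigenvalues of K / n."""
    n = len(k)
    eig = jacobi_eigenvalues([[v / n for v in row] for row in k])
    entropy = -sum(lam * math.log(lam) for lam in eig if lam > 1e-12)
    return math.exp(entropy)
```

The score behaves like an effective number of distinct items: it is 1 when all items are identical, n when they are mutually dissimilar, and 2 for two tight, mutually dissimilar clusters.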

Updated: 2024-08-02 16:09:49

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.04551v2

A Reparameterized Discrete Diffusion Model for Text Generation

This work studies discrete diffusion probabilistic models with applications to natural language generation. We derive an alternative yet equivalent formulation of sampling from discrete diffusion processes and leverage this insight to develop a family of reparameterized discrete diffusion models. The derived generic framework is highly flexible, offers a fresh perspective on the generation process in discrete diffusion models, and features more effective training and decoding techniques. We conduct extensive experiments to evaluate the text generation capability of our model, demonstrating significant improvements over existing diffusion models.

Updated: 2024-08-02 16:09:14

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2302.05737v3

MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code

With the advent of large language models (LLMs), numerous software service providers (SSPs) are dedicated to developing LLMs customized for code generation tasks, such as CodeLlama and Copilot. However, attackers can leverage these LLMs to create malicious software, which may pose threats to the software ecosystem. For example, they can automate the creation of advanced phishing malware. To address this issue, we first conduct an empirical study and design a prompt dataset, MCGTest, which required approximately 400 person-hours of work and consists of 406 malicious code generation tasks. Utilizing this dataset, we propose MCGMark, the first robust, code structure-aware, and encodable watermarking approach to trace LLM-generated code. We embed encodable information by controlling token selection and ensure output quality based on probabilistic outliers. Additionally, we enhance the robustness of the watermark by considering the structural features of malicious code, preventing the watermark from being embedded in easily modified positions, such as comments. We validate the effectiveness and robustness of MCGMark on DeepSeek-Coder. MCGMark achieves an embedding success rate of 88.9% within a maximum output limit of 400 tokens. Furthermore, it demonstrates strong robustness and has minimal impact on the quality of the output code. Our approach assists SSPs in tracing malicious code generated by LLMs and holding the responsible parties accountable.
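
To illustrate what embedding encodable information by controlling token selection can mean, the sketch below uses a generic keyed vocabulary-partition scheme, a swapped-in technique rather than MCGMark's own algorithm (which additionally relies on probabilistic outliers to preserve quality and on code-structure analysis). At each position a keyed hash splits the candidate tokens into two buckets, and the most probable candidate from the bucket matching the next payload bit is emitted; positions listed in `skip`, standing in for easily modified regions such as comments, carry no bit. The key, the candidate lists and the hashing scheme are all hypothetical.

```python
import hashlib


def partition_bit(key, prev_token, token):
    """Hypothetical keyed assignment of a candidate token to bucket 0 or 1."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] & 1


def embed(bits, candidate_lists, key, skip=frozenset()):
    """Choose one token per step; candidate_lists[step] is sorted by model probability."""
    out, prev, i = [], "<s>", 0
    for step, candidates in enumerate(candidate_lists):
        choice = candidates[0]  # default: most probable token
        if step not in skip and i < len(bits):
            matching = [t for t in candidates if partition_bit(key, prev, t) == bits[i]]
            if matching:
                choice = matching[0]  # most probable token in the bucket of bit i
            i += 1  # this position carries bit i (corrupted if its bucket was empty)
        out.append(choice)
        prev = choice
    return out


def extract(tokens, key, n_bits, skip=frozenset()):
    """Recover the payload from emitted tokens using only the key and skip positions."""
    bits, prev = [], "<s>"
    for step, tok in enumerate(tokens):
        if step not in skip and len(bits) < n_bits:
            bits.append(partition_bit(key, prev, tok))
        prev = tok
    return bits
```

Extraction needs only the key, the emitted tokens and the skip positions, not the model itself; a bit can be corrupted when every candidate at its position falls into the wrong bucket, which becomes unlikely as the candidate list grows.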

Updated: 2024-08-02 16:04:52

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2408.01354v1

PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval

In the realm of cross-modal retrieval, seamlessly integrating diverse modalities within multimedia remains a formidable challenge, especially given the complexities introduced by noisy correspondence learning (NCL). Such noise often stems from mismatched data pairs, a significant obstacle distinct from traditional noisy labels. This paper introduces the Pseudo-Classification based Pseudo-Captioning (PC$^2$) framework to address this challenge. PC$^2$ offers a threefold strategy. Firstly, it establishes an auxiliary "pseudo-classification" task that interprets captions as categorical labels, steering the model to learn image-text semantic similarity through a non-contrastive mechanism. Secondly, unlike prevailing margin-based techniques, we capitalize on PC$^2$'s pseudo-classification capability to generate pseudo-captions that provide more informative and tangible supervision for each mismatched pair. Thirdly, the oscillation of pseudo-classification is used to assist in correcting correspondences. In addition to these technical contributions, we develop a realistic NCL dataset called Noise of Web (NoW), which could serve as a powerful new NCL benchmark in which noise exists naturally. Empirical evaluations of PC$^2$ show marked improvements over existing state-of-the-art robust cross-modal retrieval techniques on both simulated and realistic datasets under various NCL settings. The dataset and source code are released at https://github.com/alipay/PC2-NoiseofWeb.

Updated: 2024-08-02 15:54:49

Categories: cs.MM,cs.AI,cs.CV,cs.IR,cs.LG

Download: http://arxiv.org/abs/2408.01349v1

A multi-criteria approach for selecting an explanation from the set of counterfactuals produced by an ensemble of explainers

Counterfactuals are widely used to explain ML model predictions by providing alternative scenarios for obtaining more desired predictions. They can be generated by a variety of methods that optimize different, sometimes conflicting, quality measures and produce quite different solutions. However, choosing the most appropriate explanation method and one of the generated counterfactuals is not an easy task. Instead of forcing the user to test many different explanation methods and analyse conflicting solutions, in this paper we propose a multi-stage ensemble approach that selects a single counterfactual based on multiple-criteria analysis. It offers a compromise solution that scores well on several popular quality measures. This approach exploits the dominance relation and the ideal point decision aid method, which selects one counterfactual from the Pareto front. The experiments demonstrate that the proposed approach generates fully actionable counterfactuals with attractive compromise values of the considered quality measures.
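
The selection stage can be sketched as follows, assuming every quality measure is to be minimised (negate any measure that should be maximised): keep the counterfactuals not dominated by any other, rescale each measure to [0, 1] over that Pareto front, and return the one closest in Euclidean distance to the ideal point formed by the per-measure best values. The min-max rescaling and the Euclidean metric are assumptions of this sketch; the paper's multi-stage procedure may differ in these details.

```python
import math


def dominates(a, b):
    """a dominates b: no worse on every measure and strictly better on one (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores):
    """Indices of candidates that no other candidate dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def ideal_point_choice(scores):
    """Pick the Pareto-optimal candidate closest to the ideal point after min-max rescaling."""
    front = pareto_front(scores)
    dims = len(scores[0])
    lo = [min(scores[i][d] for i in front) for d in range(dims)]  # ideal point
    hi = [max(scores[i][d] for i in front) for d in range(dims)]

    def dist(i):
        return math.sqrt(sum(
            ((scores[i][d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0) ** 2
            for d in range(dims)
        ))

    return min(front, key=dist)
```

For instance, with scores (proximity, sparsity) of (0.1, 5), (0.5, 1), (0.3, 2) and (0.6, 6), the last candidate is dominated, and the balanced candidate (0.3, 2) is chosen over the two extremes.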

Updated: 2024-08-02 15:54:21

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2403.13940v2

StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation

Multimodal semantic segmentation shows significant potential for enhancing segmentation accuracy in complex scenes. However, current methods often incorporate specialized feature fusion modules tailored to specific modalities, thereby restricting input flexibility and increasing the number of training parameters. To address these challenges, we propose StitchFusion, a straightforward yet effective modal fusion framework that integrates large-scale pre-trained models directly as encoders and feature fusers. This approach facilitates comprehensive multi-modal and multi-scale feature fusion, accommodating any visual modal inputs. Specifically, our framework achieves modal integration during encoding by sharing multi-modal visual information. To enhance information exchange across modalities, we introduce a multi-directional adapter module (MultiAdapter) to enable cross-modal information transfer during encoding. By leveraging MultiAdapter to propagate multi-scale information across pre-trained encoders during the encoding process, StitchFusion achieves multi-modal visual information integration during encoding. Extensive comparative experiments demonstrate that our model achieves state-of-the-art performance on four multi-modal segmentation datasets with minimal additional parameters. Furthermore, the experimental integration of MultiAdapter with existing Feature Fusion Modules (FFMs) highlights their complementary nature. Our code is available at StitchFusion_repo.

Updated: 2024-08-02 15:41:16

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.01343v1

Leveraging Knowledge Graph Embedding for Effective Conversational Recommendation

Conversational recommender systems (CRS), which combine the techniques of dialogue systems and recommender systems, have attracted increasing interest recently. In contrast to traditional recommender systems, a CRS learns user preferences better through interactions (i.e., conversations), which further boosts recommendation performance. However, existing studies on CRS fail to effectively address the relationships among attributes, users, and items, which may lead to inappropriate questions and inaccurate recommendations. In view of this, we propose a knowledge graph based conversational recommender system (referred to as KG-CRS). Specifically, we first integrate the user-item graph and the item-attribute graph into a dynamic graph that changes during the dialogue as negative items or attributes are removed. We then learn informative embeddings of users, items, and attributes by also considering propagation through neighbors on the graph. Extensive experiments on three real datasets validate the superiority of our method over state-of-the-art approaches on both the recommendation and conversation tasks.

Updated: 2024-08-02 15:38:55

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2408.01342v1

A Hyperparameter Study for Quantum Kernel Methods

Quantum kernel methods are a promising approach in quantum machine learning thanks to the guarantees connected to them. Their amenability to analytic treatment also opens up the possibility of prescreening datasets based on their potential for a quantum advantage. To do so, earlier works developed the geometric difference, which can be understood as a closeness measure between two kernel-based machine learning approaches, most importantly between a quantum kernel and a classical kernel. This metric links the quantum and classical model complexities and was developed to bound the generalization error. This raises the question of how the metric behaves in an empirical setting. In this work, we investigate the effects of hyperparameter choice on model performance and on the generalization gap between classical and quantum kernels. The importance of hyperparameters is also well known in classical machine learning. Of special interest are the hyperparameters associated with the quantum Hamiltonian evolution feature map, as well as the number of qubits to trace out before computing a projected quantum kernel. We conduct a thorough investigation of the hyperparameters across 11 datasets and identify certain aspects that can be exploited. Analyzing the effects of certain hyperparameter settings on empirical performance, as measured by cross-validation accuracy, and on generalization ability, as measured by the geometric difference described above, brings us one step closer to understanding the potential of quantum kernel methods on classical datasets.

Updated: 2024-08-02 15:38:38

Categories: quant-ph,cs.LG

Download: http://arxiv.org/abs/2310.11891v3

Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services

The widespread adoption of large language models (LLMs) has created a pressing need for an efficient, secure and private serving infrastructure, which allows researchers to run open source or custom fine-tuned LLMs and assures users that their data remains private and is not stored without their consent. While high-performance computing (HPC) systems equipped with state-of-the-art GPUs are well suited for training LLMs, their batch scheduling paradigm is not designed to support real-time serving of AI applications. Cloud systems, on the other hand, are well suited for web services but commonly lack access to the computational power of HPC clusters, especially expensive and scarce high-end GPUs, which are required for optimal inference speed. We propose an architecture with an implementation consisting of a web service that runs on a cloud VM with secure access to a scalable backend running a multitude of LLM models on HPC systems. By offering a web service that uses our HPC infrastructure to host LLMs, we leverage the trusted environment of local universities and research centers to offer a private and secure alternative to commercial LLM services. Our solution natively integrates with the HPC batch scheduler Slurm, enabling seamless deployment on HPC clusters, and is able to run side by side with regular Slurm workloads while utilizing gaps in the schedule created by Slurm. To ensure the security of the HPC system, we use the SSH ForceCommand directive to construct a robust circuit breaker, which prevents successful attacks on the web-facing server from affecting the cluster. We have successfully deployed our system as a production service and made the source code available at https://github.com/gwdg/chat-ai.
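
The circuit breaker relies on OpenSSH's `ForceCommand` directive in `sshd_config`: whatever command the client requests, sshd runs the forced command instead and exposes the original request in the `SSH_ORIGINAL_COMMAND` environment variable. A minimal sketch of such a forced-command gate follows; the allow-list, the command names and the job script `serve_llm.sh` are hypothetical, and the actual Chat AI implementation may differ. The script would be referenced from a line such as `ForceCommand /usr/local/bin/gate.py` (path hypothetical).

```python
import os
import shlex
import sys

# Hypothetical allow-list: the only actions the web-facing server may trigger on the cluster.
ALLOWED = {
    "status": ["squeue", "--me"],
    "start": ["sbatch", "serve_llm.sh"],  # hypothetical job script name
}


def resolve(original_command):
    """Map the client's requested command to a fixed argv, or None if it is not permitted."""
    if not original_command:
        return None
    try:
        parts = shlex.split(original_command)
    except ValueError:  # e.g. unbalanced quotes
        return None
    if len(parts) != 1 or parts[0] not in ALLOWED:
        return None
    return ALLOWED[parts[0]]


def main():
    """Entry point for the forced command: run the mapped argv or refuse."""
    argv = resolve(os.environ.get("SSH_ORIGINAL_COMMAND", ""))
    if argv is None:
        sys.stderr.write("command not permitted\n")
        sys.exit(1)
    os.execvp(argv[0], argv)  # replace this process; no shell is involved
```

Because the gate accepts only exact single-word requests and executes a fixed argument list without a shell, shell metacharacters in a request (for example `status; rm -rf /`) are rejected rather than interpreted. `main()` is meant to be wired up as the script's entry point.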

Updated: 2024-08-02 15:34:22

Categories: cs.DC,cs.AI

Download: http://arxiv.org/abs/2407.00110v2

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

Multimodal models that jointly process audio and language hold great promise in audio understanding and are increasingly being adopted in the music domain. By allowing users to query via text and obtain information about a given audio input, these models have the potential to enable a variety of music understanding tasks via language-based interfaces. However, their evaluation poses considerable challenges, and it remains unclear how to effectively assess their ability to correctly interpret music-related inputs with current methods. Motivated by this, we introduce MuChoMusic, a benchmark for evaluating music understanding in multimodal language models focused on audio. MuChoMusic comprises 1,187 multiple-choice questions, all validated by human annotators, on 644 music tracks sourced from two publicly available music datasets, and covering a wide variety of genres. Questions in the benchmark are crafted to assess knowledge and reasoning abilities across several dimensions that cover fundamental musical concepts and their relation to cultural and functional contexts. Through the holistic analysis afforded by the benchmark, we evaluate five open-source models and identify several pitfalls, including an over-reliance on the language modality, pointing to a need for better multimodal integration. Data and code are open-sourced.

Updated: 2024-08-02 15:34:05

Categories: cs.SD,cs.CL,cs.LG,cs.MM,eess.AS

Download: http://arxiv.org/abs/2408.01337v1

Sparse Linear Regression when Noises and Covariates are Heavy-Tailed and Contaminated by Outliers

We investigate the problem of estimating the coefficients of a linear regression model under a sparsity assumption when the covariates and noise are sampled from heavy-tailed distributions. Additionally, we consider the situation in which the covariates and noise are not only sampled from heavy-tailed distributions but also contaminated by outliers. Our estimators can be computed efficiently and exhibit sharp error bounds.
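To make the setting concrete, the following sketch fits a sparse linear model with heavy-tailed covariates, heavy-tailed noise, and a few grossly corrupted responses. It uses a generic robust estimator: the Huber loss plus an L1 penalty, minimized by proximal gradient descent (ISTA). This standard technique was chosen for illustration and is not the estimator proposed in the paper. The values of `lam` and `delta` and the simulated data are assumptions.

```python
import numpy as np


def huber_psi(r: np.ndarray, delta: float) -> np.ndarray:
    """Derivative of the Huber loss: identity near zero, clipped in the tails."""
    return np.clip(r, -delta, delta)


def soft_threshold(z: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def huber_lasso(X, y, lam=0.1, delta=1.345, n_iter=2000):
    """Minimize (1/n) sum huber(y - X b) + lam * ||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ huber_psi(y - X @ beta, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


rng = np.random.default_rng(0)
n, p = 200, 20
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
X = rng.standard_t(df=5, size=(n, p))             # heavy-tailed covariates
y = X @ beta_true + rng.standard_t(df=2, size=n)  # heavy-tailed noise
y[:10] += 50.0                                    # gross outliers in 5% of responses
beta_hat = huber_lasso(X, y)
```

The clipped gradient bounds the influence of any single residual, so the injected outliers barely move the estimate. The L1 proximal step pushes the off-support coefficients to or near zero.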

Updated: 2024-08-02 15:33:04

Categories: stat.ML,cs.LG,62J07

Download: http://arxiv.org/abs/2408.01336v1

A Backbone for Long-Horizon Robot Task Understanding

End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-based Backbone Framework (TBBF) to enhance robot task understanding and transferability. This framework uses therbligs (basic action elements) as the backbone to decompose high-level robot tasks into elemental robot configurations, which are then integrated with current foundation models to improve task understanding. The approach consists of two stages: offline training and online testing. During the offline training stage, we developed the Meta-RGate SynerFusion (MGSF) network for accurate therblig segmentation across various tasks. In the online testing stage, after a one-shot demonstration of a new task is collected, our MGSF network extracts high-level knowledge, which is then encoded into the image using Action Registration (ActionREG). Additionally, the Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC) is employed to ensure precise action execution, facilitating trajectory transfer in novel robot scenarios. Experimental results validate these methods, achieving 94.37% recall in therblig segmentation and success rates of 94.4% and 80% in real-world online robot testing for simple and complex scenarios, respectively. Supplementary material is available at: https://sites.google.com/view/therbligsbasedbackbone/home

Updated: 2024-08-02 15:32:42

Categories: cs.RO,cs.AI,cs.CV,cs.HC

Download: http://arxiv.org/abs/2408.01334v1

Deep Reinforcement Learning for Traveling Purchaser Problems

The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Due to the coupling between routing and purchasing, existing works on TPPs commonly address route construction and purchase planning simultaneously, which, however, leads to exact methods with high computational cost and heuristics with sophisticated design but limited performance. In sharp contrast, we propose a novel approach based on deep reinforcement learning (DRL), which addresses route construction and purchase planning separately, while evaluating and optimizing the solution from a global perspective. The key components of our approach include a bipartite graph representation for TPPs to capture the market-product relations, and a policy network that extracts information from the bipartite graph and uses it to sequentially construct the route. One significant benefit of our framework is that we can efficiently construct the route using the policy network, and once the route is determined, the associated purchasing plan can be easily derived through linear programming, while, leveraging DRL, we can train the policy network to optimize the global solution objective. Furthermore, by introducing a meta-learning strategy, the policy network can be trained stably on large-sized TPP instances, and generalize well across instances of varying sizes and distributions, even to much larger instances that are never seen during training. Experiments on various synthetic TPP instances and the TPPLIB benchmark demonstrate that our DRL-based approach can significantly outperform well-established TPP heuristics, reducing the optimality gap by 40%-90%, and also showing an advantage in runtime, especially on large-sized instances.
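The decoupling described above works because purchasing becomes easy once the route is fixed. This hedged sketch assumes fractional quantities and no constraints linking different products; the paper's exact LP may differ. Under those assumptions the purchasing LP decomposes into one continuous knapsack per product, and buying from the cheapest visited market first solves each knapsack exactly. `purchase_plan` and the data layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# offers[market][product] = (unit_price, available_supply)
Offers = Dict[str, Dict[str, Tuple[float, float]]]


def purchase_plan(route: List[str], offers: Offers, demand: Dict[str, float]
                  ) -> Optional[Tuple[float, Dict[Tuple[str, str], float]]]:
    """Cheapest purchasing plan for a fixed route, or None if demand cannot be met.

    The LP  min sum p[m,k] x[m,k]  s.t.  sum_m x[m,k] = d[k],  0 <= x[m,k] <= q[m,k]
    splits into one continuous knapsack per product; greedy-by-price is optimal.
    """
    total, plan = 0.0, {}
    for product, need in demand.items():
        candidates = sorted(
            (offers[m][product][0], m, offers[m][product][1])
            for m in route if product in offers.get(m, {})
        )
        for price, market, supply in candidates:
            if need <= 0:
                break
            qty = min(need, supply)
            plan[(market, product)] = qty
            total += price * qty
            need -= qty
        if need > 1e-12:
            return None  # visited markets do not hold enough supply
    return total, plan
```

For example, with market A selling product `x` at 2.0 (supply 3) and market B selling it at 1.0 (supply 2), a demand of 4 on route A, B buys 2 units from B and 2 from A, for a total cost of 6.0.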

Updated: 2024-08-02 15:30:14

Categories: math.OC,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.02476v3

HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction

As recommendation services need to address increasingly diverse distributions, such as multi-population, multi-scenario, multi-target, and multi-interest ones, a growing body of recent work has focused on multi-distribution modeling and made great progress. However, most of these works only model a single type of multi-distribution, ignoring that mixed multi-distributions often coexist and form hierarchical relationships. To address these challenges, we propose a flexible modeling paradigm, named Hierarchical Multi-Distribution Network (HMDN), which efficiently models these hierarchical relationships and can seamlessly integrate with existing multi-distribution methods, such as Mixture-of-Experts (MoE) and Dynamic-Weight (DW) models. Specifically, we first design a hierarchical multi-distribution representation refinement module that employs multi-level residual quantization to obtain fine-grained hierarchical representations. Then, the refined hierarchical representation is integrated into existing single multi-distribution models, seamlessly expanding them into mixed multi-distribution models. Experimental results on both public and industrial datasets validate the effectiveness and flexibility of HMDN.
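Multi-level residual quantization, which HMDN uses to obtain hierarchical representations, can be sketched in a few lines. Each level picks the codeword nearest to whatever the previous levels failed to explain, so the sequence of indices reads as a coarse-to-fine hierarchical code. The codebooks here are fixed toy arrays. In a model such as HMDN they would presumably be learned. `residual_quantize` is a hypothetical helper.

```python
import numpy as np


def residual_quantize(x: np.ndarray, codebooks: list) -> tuple:
    """Encode x level by level; each level quantizes what earlier levels missed.

    Returns the per-level code indices (coarse to fine) and the reconstruction.
    """
    residual = x.astype(float)
    codes, recon = [], np.zeros_like(residual)
    for codebook in codebooks:  # each codebook: (num_codes, dim)
        idx = int(np.argmin(np.sum((codebook - residual) ** 2, axis=1)))
        codes.append(idx)
        recon += codebook[idx]
        residual = residual - codebook[idx]
    return codes, recon
```

A prefix of the code list (for example, only the first index) groups inputs at a coarser level, which is what makes the representation hierarchical.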

Updated: 2024-08-02 15:29:59

Categories: cs.LG

Download: http://arxiv.org/abs/2408.01332v1

UnifiedNN: Efficient Neural Network Training on the Cloud

Nowadays, cloud-based services are widely favored over the traditional approach of training a Neural Network (NN) model locally. Often, a cloud service processes multiple requests from users and thus trains multiple NN models concurrently. However, training NN models concurrently is challenging: it typically requires significant computing resources and takes a long time to complete. In this paper, we present UnifiedNN to effectively train multiple NN models concurrently on the cloud. UnifiedNN "combines" multiple NN models and features several memory and time conservation mechanisms to train them simultaneously without impacting the accuracy of the training process. Specifically, UnifiedNN merges multiple NN models into a single large unified model in order to train all models at once efficiently. We have implemented a prototype of UnifiedNN in PyTorch and compared its performance with relevant state-of-the-art frameworks. Our experimental results demonstrate that UnifiedNN can reduce memory consumption by up to 53% and training time by up to 81% compared with vanilla PyTorch, without impacting model training and testing accuracy. Finally, our results indicate that, when training multiple models concurrently, UnifiedNN can reduce memory consumption by up to 52% and training time by up to 41% compared with state-of-the-art frameworks.
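The idea of training several models as one larger model can be illustrated with batched linear regression. Stacking the weights of K independent models into one array lets a single batched forward and backward pass update all of them at once. This generic batching sketch rests on simplifying assumptions: identical architectures, plain gradient descent, and NumPy instead of PyTorch. It is not UnifiedNN's merging procedure and does not include its memory and time conservation mechanisms.

```python
import numpy as np


def train_merged(X: np.ndarray, y: np.ndarray, lr: float = 0.1, steps: int = 500) -> np.ndarray:
    """Train K independent linear regressors as one batched model.

    X: (K, n, d) inputs, one dataset per model; y: (K, n) targets.
    All K weight vectors live in one (K, d) array, so each step is a single
    batched forward and backward pass instead of K separate ones.
    """
    K, n, d = X.shape
    W = np.zeros((K, d))
    for _ in range(steps):
        pred = np.einsum("knd,kd->kn", X, W)                   # forward for all models
        grad = 2.0 / n * np.einsum("knd,kn->kd", X, pred - y)  # per-model MSE gradient
        W -= lr * grad
    return W
```

Because the models share no parameters, each row of `W` follows exactly the trajectory it would follow if trained alone. Only the computation is merged.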

Updated: 2024-08-02 15:29:39

Categories: cs.LG

Download: http://arxiv.org/abs/2408.01331v1

Mixed moving average field guided learning for spatio-temporal data

Influenced mixed moving average fields are a versatile modeling class for spatio-temporal data. However, their predictive distribution is not generally known. Under this modeling assumption, we define a novel spatio-temporal embedding and a theory-guided machine learning approach that employs a generalized Bayesian algorithm to make ensemble forecasts. We use Lipschitz predictors and determine fixed-time and any-time PAC-Bayesian bounds in the batch learning setting. A highlight of our methodology is that it performs causal forecasts, with potential application to data exhibiting short- and long-range spatial and temporal dependence. We then test the performance of our learning methodology using linear predictors and data sets simulated from a spatio-temporal Ornstein-Uhlenbeck process.
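The paper tests its methodology on data simulated from a spatio-temporal Ornstein-Uhlenbeck process. The sketch below uses a much simpler, purely temporal stand-in, which is an assumption: the spatial dimension and the generalized Bayesian ensemble are omitted. It simulates an OU process with its exact discretization and fits a linear one-step predictor, whose coefficient should approach $e^{-\lambda \Delta t}$.

```python
import numpy as np


def simulate_ou(n: int, lam: float, sigma: float, dt: float, seed: int = 0) -> np.ndarray:
    """Exact discretization of dX = -lam X dt + sigma dW on a regular time grid."""
    rng = np.random.default_rng(seed)
    phi = np.exp(-lam * dt)
    noise_sd = sigma * np.sqrt((1.0 - phi ** 2) / (2.0 * lam))
    x = np.empty(n)
    x[0] = rng.normal(scale=sigma / np.sqrt(2.0 * lam))  # start in the stationary law
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise_sd * rng.normal()
    return x


def fit_linear_predictor(x: np.ndarray) -> float:
    """Least-squares coefficient a in the causal one-step predictor x[t+1] ≈ a * x[t]."""
    past, future = x[:-1], x[1:]
    return float(past @ future / (past @ past))
```

The predictor only uses past values, mirroring the causal forecasting emphasized in the abstract.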

Updated: 2024-08-02 15:26:37

Categories: stat.ML,cs.LG,math.ST,stat.TH,60E07, 60E15, 60G25, 60G60, 62C10

Download: http://arxiv.org/abs/2301.00736v4

A Robotics-Inspired Scanpath Model Reveals the Importance of Uncertainty and Semantic Object Cues for Gaze Guidance in Dynamic Scenes

How we perceive objects around us depends on what we actively attend to, yet our eye movements depend on the perceived objects. Still, object segmentation and gaze behavior are typically treated as two independent processes. Drawing on an information processing pattern from robotics, we present a mechanistic model that simulates these processes for dynamic real-world scenes. Our image-computable model uses the current scene segmentation for object-based saccadic decision-making while using the foveated object to refine its scene segmentation recursively. To model this refinement, we use a Bayesian filter, which also provides an uncertainty estimate for the segmentation that we use to guide active scene exploration. We demonstrate that this model closely resembles observers' free viewing behavior, measured by scanpath statistics, including foveation duration and saccade amplitude distributions used for parameter fitting and higher-level statistics not used for fitting. These include how object detections, inspections, and returns are balanced and a delay of returning saccades without an explicit implementation of such temporal inhibition of return. Extensive simulations and ablation studies show that uncertainty promotes balanced exploration and that semantic object cues are crucial to form the perceptual units used in object-based attention. Moreover, we show how our model's modular design allows for extensions, such as incorporating saccadic momentum or pre-saccadic attention, to further align its output with human scanpaths.
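The loop of segmenting, measuring uncertainty, and choosing the next fixation can be illustrated with a toy categorical Bayes filter. Each object carries a belief over a few hypothetical semantic labels. Foveating an object multiplies its belief by an observation likelihood, and the next saccade targets the object with the highest entropy. This sketch is deliberately simplified. The paper's model filters full scene segmentations, and its decision rule may weigh other factors.

```python
import numpy as np


def update_belief(belief: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """One Bayes-filter measurement update for a categorical belief."""
    posterior = belief * likelihood
    return posterior / posterior.sum()


def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def next_fixation(beliefs: np.ndarray) -> int:
    """Pick the object whose label belief is currently most uncertain."""
    return int(np.argmax([entropy(b) for b in beliefs]))
```

After an object has been inspected, its entropy drops, so attention moves on without any explicit inhibition-of-return rule. This loosely echoes the behavior described in the abstract.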

Updated: 2024-08-02 15:20:34

Categories: cs.CV,cs.AI,q-bio.NC

Download: http://arxiv.org/abs/2408.01322v1

A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data types, including text, images, videos, audio, and physiological sequences, MLLMs address the complexities of real-world applications far beyond the capabilities of single-modality systems. In this paper, we systematically survey the applications of MLLMs in multimodal tasks such as natural language, vision, and audio. We also provide a comparative analysis of the focus of different MLLMs across these tasks, offer insights into the shortcomings of current MLLMs, and suggest potential directions for future research. Through these discussions, this paper aims to provide valuable insights for the further development and application of MLLMs.

Updated: 2024-08-02 15:14:53

Categories: cs.AI

Download: http://arxiv.org/abs/2408.01319v1

LLMs Plagiarize: Ensuring Responsible Sourcing of Large Language Model Training Data Through Knowledge Graph Comparison

In light of recent legal allegations brought by publishers, newspapers, and other creators of copyrighted corpora against large language model developers who use their copyrighted materials for training or fine-tuning, we propose a novel system, a variant of a plagiarism detection system, that assesses whether a knowledge source has been used in the training or fine-tuning of a large language model. Unlike current methods, our approach uses Resource Description Framework (RDF) triples to create knowledge graphs from both a source document and an LLM continuation of that document. These graphs are then analyzed with respect to content, using cosine similarity, and with respect to structure, using a normalized version of graph edit distance that shows the degree of isomorphism. Traditional plagiarism systems focus on content matching and keyword identification between a source and a target corpus. Our approach, by contrast, enables a broader and more accurate evaluation of similarity between a source document and an LLM continuation by focusing on the relationships between ideas and how they are organized with regard to one another. Additionally, our approach requires access neither to the training corpus nor to LLM metrics such as perplexity, which may be unavailable in closed "black-box" large language model systems. We thus assess, through similarity measures, whether an LLM has "plagiarized" a corpus in its continuation. A prototype of our system will be made available in a linked GitHub repository.
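The two comparison axes described above, content and structure, can be sketched over sets of (subject, predicate, object) triples. Content similarity is computed as a cosine over word counts of the triple text. Structure is compared with the Jaccard distance between labeled edge sets. The Jaccard distance is a cheap stand-in chosen for illustration, not the normalized graph edit distance the authors use, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def content_cosine(a: Iterable[Triple], b: Iterable[Triple]) -> float:
    """Cosine similarity between bag-of-words vectors built from the triple text."""
    ca = Counter(w.lower() for t in a for part in t for w in part.split())
    cb = Counter(w.lower() for t in b for part in t for w in part.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def structural_distance(a: Set[Triple], b: Set[Triple]) -> float:
    """Jaccard distance between labeled edge sets (0 = identical edges).

    A crude proxy for graph edit distance: it ignores node relabelings.
    """
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0
```

A continuation that reuses the source's relations but swaps one entity keeps a high content score while its structural distance rises. Reporting both signals separately exposes that difference.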

Updated: 2024-08-02 15:13:26

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.02659v2

Point Prediction for Streaming Data

We present two new approaches for point prediction with streaming data. One is based on the Count-Min sketch (CMS) and the other on Gaussian process priors with a random bias. These methods are intended for the most general predictive problems, where no true model can be usefully formulated for the data stream. In statistical contexts, this is often called the $\mathcal{M}$-open problem class. Under the assumption that the data consist of i.i.d. samples from a fixed distribution function $F$, we show that the CMS-based estimates of the distribution function are consistent. We compare our new methods with two established predictors in terms of cumulative $L^1$ error. One is based on the Shtarkov solution (often called the normalized maximum likelihood) in the normal experts setting and the other on Dirichlet process priors. These comparisons are made for two cases. The first is one-pass, meaning that the predictors are updated using the fact that the CMS is a sketch. For predictors that are not one-pass, we use streaming $K$-means to give a representative subset of fixed size that can be updated as data accumulate. Preliminary computational work suggests that the one-pass median version of the CMS method is rarely outperformed by the other methods for sufficiently complex data. We also find that predictors based on Gaussian process priors with random biases perform well. The Shtarkov predictors we use here did not perform as well, probably because we used only the simplest example. The other predictors seemed to perform well mainly when the data did not look like they came from an $\mathcal{M}$-open data generator.
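For reference, a Count-Min sketch keeps a small table of counters updated by several independent hash functions. It answers point queries with the minimum over rows, which can overcount but never undercount. The sketch below assumes integer-binned stream values and derives an approximate median by scanning a known finite domain. Both the binning and the median rule are illustrative assumptions, not the paper's CMS-based predictor.

```python
import random
from typing import List


class CountMinSketch:
    """Count-Min sketch over non-negative integer items (e.g. binned stream values)."""

    PRIME = 2_147_483_647  # 2^31 - 1; items are assumed to be smaller than this

    def __init__(self, width: int, depth: int, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        # One (a, b) pair per row for the hash h(x) = ((a*x + b) mod P) mod width.
        self.params = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(depth)]
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]
        self.total = 0

    def _cols(self, item: int) -> List[int]:
        return [((a * item + b) % self.PRIME) % self.width for a, b in self.params]

    def add(self, item: int, count: int = 1) -> None:
        for row, col in enumerate(self._cols(item)):
            self.table[row][col] += count
        self.total += count

    def estimate(self, item: int) -> int:
        """Never below the true count; overestimates only through hash collisions."""
        return min(self.table[row][col] for row, col in enumerate(self._cols(item)))

    def median(self, domain_size: int) -> int:
        """Approximate median of a stream over the items 0..domain_size-1."""
        running = 0
        for item in range(domain_size):
            running += self.estimate(item)
            if running >= self.total / 2:
                return item
        return domain_size - 1
```

Memory stays fixed at `width * depth` counters regardless of stream length, which is what makes a one-pass predictor feasible.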

Updated: 2024-08-02 15:12:52

Categories: stat.ML,cs.LG,35A01, 65L10, 65L12, 65L20, 65L70

Download: http://arxiv.org/abs/2408.01318v1

Synergistic pathways of modulation enable robust task packing within neural dynamics

Understanding how brain networks learn and manage multiple tasks simultaneously is of interest in both neuroscience and artificial intelligence. In this regard, a recent research thread in theoretical neuroscience has focused on how recurrent neural network models and their internal dynamics enact multi-task learning. Managing different tasks requires a mechanism to convey information about task identity or context into the model, which, from a biological perspective, may involve mechanisms of neuromodulation. In this study, we use recurrent network models to probe the distinctions between two forms of contextual modulation of neural dynamics: at the level of neuronal excitability and at the level of synaptic strength. We characterize these mechanisms in terms of their functional outcomes, focusing on their robustness to context ambiguity and, relatedly, their efficiency with respect to packing multiple tasks into finite-size networks. We also demonstrate a distinction between these mechanisms at the level of the neuronal dynamics they induce. Together, these characterizations indicate complementarity and synergy in how these mechanisms act, potentially over multiple time scales, toward enhancing the robustness of multi-task learning.
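The two forms of contextual modulation can be written as two ways of letting a context index change one RNN update. In this hypothetical parametrization, neuronal excitability is a per-neuron gain on the pre-activation, and synaptic modulation is an elementwise factor on the recurrent weight matrix. The abstract does not give the paper's exact formulation.

```python
import numpy as np


def rnn_step(h, x, W, U, context, gains=None, masks=None):
    """One step of h' = tanh(gain * (W_c @ h + U @ x)) under two modulation schemes.

    gains[c]: per-neuron excitability vector for context c (neuronal modulation).
    masks[c]: elementwise factor on the recurrent weights for context c (synaptic).
    With neither given, this is an unmodulated tanh RNN step.
    """
    W_c = W * masks[context] if masks is not None else W
    g = gains[context] if gains is not None else 1.0
    return np.tanh(g * (W_c @ h + U @ x))
```

Gain modulation needs only N parameters per context, while synaptic modulation needs N*N. That gap is one way to see why the two mechanisms can differ in how efficiently they pack tasks into a finite network.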

Updated: 2024-08-02 15:12:01

Categories: q-bio.NC,cs.AI

Download: http://arxiv.org/abs/2408.01316v1

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking an initial step toward creating an avatar chatbot system that does not rely on intermediate text. To this end, we introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corpus, containing approximately 9,000 dialogues totaling 340 hours, recorded based on the open-domain dialogue dataset TopicalChat. MultiDialog contains parallel audio-visual recordings of conversation partners acting according to a given script, with emotion annotations, which we expect to open up research opportunities in multimodal synthesis. Our Face-to-Face spoken dialogue model incorporates a textually pretrained large language model and adapts it to the audio-visual spoken dialogue domain through joint speech-text pretraining. Through extensive experiments, we validate the effectiveness of our model in facilitating face-to-face conversation. The demo and data are available at https://multidialog.github.io and https://huggingface.co/datasets/IVLLab/MultiDialog, respectively.

Updated: 2024-08-02 15:05:47

Categories: cs.CV,cs.AI,cs.HC

Download: http://arxiv.org/abs/2406.07867v2

PsybORG+: Cognitive Modeling for Triggering and Detection of Cognitive Biases of Advanced Persistent Threats

Advanced Persistent Threats (APTs) pose a significant challenge to cybersecurity due to their sophisticated and stealthy nature. Traditional cybersecurity measures fail to defend against APTs. Cognitive vulnerabilities can significantly influence attackers' decision-making processes, which presents an opportunity for defenders to exploit these weaknesses. This paper introduces PsybORG, a multi-agent cybersecurity simulation environment designed to model APT behaviors influenced by cognitive vulnerabilities. PsybORG uses a Hidden Markov Model (HMM) to simulate attacker behaviors. We use Bayesian inference and decision tree analysis of action sequences to infer cognitive vulnerabilities. In addition, we build a system called PsybORG+ for generating synthetic data. We also design a trigger to stimulate the sunk cost fallacy in attackers. Our contributions include the mathematical modeling of APTs, the development of PsybORG, and the implementation of techniques to infer attackers' cognitive vulnerabilities.
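Bayesian inference over hidden cognitive states from an action sequence can be illustrated with the HMM forward algorithm. All states, observations, and probabilities below are hypothetical and serve only to show the mechanics. Repeatedly observing the "keep investing" action should raise the posterior probability of a sunk-cost-biased state.

```python
import numpy as np

# Hypothetical two-state model: 0 = unbiased attacker, 1 = sunk-cost-biased attacker.
# Observations: 0 = abandon the current target, 1 = keep investing in it.
START = np.array([0.5, 0.5])
TRANS = np.array([[0.9, 0.1],   # row = current state, column = next state
                  [0.1, 0.9]])
EMIT = np.array([[0.6, 0.4],    # unbiased: abandons more often
                 [0.1, 0.9]])   # biased: keeps investing


def filter_posterior(observations):
    """Forward algorithm: P(hidden state at the last step | all observations)."""
    alpha = START * EMIT[:, observations[0]]
    alpha /= alpha.sum()
    for obs in observations[1:]:
        alpha = (alpha @ TRANS) * EMIT[:, obs]  # predict, then weight by the evidence
        alpha /= alpha.sum()                    # normalize each step to avoid underflow
    return alpha
```

With these numbers, four consecutive "keep investing" actions push the biased-state posterior to roughly 0.9. A defender could use such a posterior to decide when a sunk-cost trigger is likely to work.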

Updated: 2024-08-02 15:00:58

Categories: cs.CR

Download: http://arxiv.org/abs/2408.01310v1

Decentralized Smoothing ADMM for Quantile Regression with Non-Convex Sparse Penalties

In the rapidly evolving internet-of-things (IoT) ecosystem, effective data analysis techniques are crucial for handling distributed data generated by sensors. Addressing the limitations of existing methods, such as the sub-gradient approach, which fails to distinguish between active and non-active coefficients effectively, this paper introduces the decentralized smoothing alternating direction method of multipliers (DSAD) for penalized quantile regression. Our method leverages non-convex sparse penalties like the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD), improving the identification and retention of significant predictors. DSAD incorporates a total variation norm within a smoothing ADMM framework, achieving consensus among distributed nodes and ensuring uniform model performance across disparate data sources. This approach overcomes traditional convergence challenges associated with non-convex penalties in decentralized settings. We present theoretical proofs and extensive simulation results to validate the effectiveness of the DSAD, demonstrating its superiority in achieving reliable convergence and enhancing estimation accuracy compared with prior methods.
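Three of the ingredients named above can be written down directly: the quantile check loss, one common Huber-type smoothing of it, and the proximal map of the minimax concave penalty (firm thresholding). Unlike soft thresholding, firm thresholding leaves large coefficients unshrunk. The specific smoothing is an assumption and may differ from DSAD's. The sketch omits the decentralized ADMM iterations and the total-variation consensus term.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return np.where(u < 0, tau - 1.0, tau) * u


def smoothed_check_loss(u, tau, h):
    """rho_tau(u) = (|u| + (2 tau - 1) u) / 2, with |u| replaced by a Huber function of width h."""
    u = np.asarray(u, dtype=float)
    abs_smooth = np.where(np.abs(u) <= h, u ** 2 / (2 * h) + h / 2, np.abs(u))
    return 0.5 * (abs_smooth + (2 * tau - 1) * u)


def mcp_prox(z, lam, gamma):
    """Proximal map (unit step) of the minimax concave penalty with gamma > 1."""
    z = np.asarray(z, dtype=float)
    a = np.abs(z)
    shrunk = np.sign(z) * (a - lam) / (1 - 1 / gamma)
    return np.where(a <= lam, 0.0, np.where(a <= gamma * lam, shrunk, z))
```

Outside the window `|u| <= h`, the smoothed loss equals the check loss exactly. The smoothing only removes the kink at zero, which is what lets gradient-based ADMM updates be used.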

Updated: 2024-08-02 15:00:04

Categories: cs.LG

Download: http://arxiv.org/abs/2408.01307v1

Bond Graphs for multi-physics informed Neural Networks for multi-variate time series

Following the trend of hybrid Artificial Intelligence techniques, Physics-Informed Machine Learning has attracted growing interest. It operates mainly by imposing data, learning, or architecture biases through simulation data, Partial Differential Equations, or equivariance and invariance properties. While it has shown great success on tasks involving a single physical domain, such as fluid dynamics, existing methods are not adapted to tasks with complex multi-physics and multi-domain phenomena. In addition, it is mainly formulated as an end-to-end learning scheme. To address these challenges, we propose to leverage Bond Graphs, a multi-physics modeling approach, together with Message Passing Graph Neural Networks. We propose a Neural Bond graph Encoder (NBgE) that produces multi-physics-informed representations that can be fed into any task-specific model. It provides a unified way to integrate both data and architecture biases in deep learning. Our experiments on two challenging multi-domain physical systems, a Direct Current Motor and the Respiratory System, demonstrate the effectiveness of our approach on a multivariate time-series forecasting task.
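For intuition about what a bond graph encodes, consider the DC motor used in the experiments. In a standard bond-graph model, an electrical 1-junction (voltage source, resistor, inductor) and a mechanical 1-junction (rotor inertia, friction) are coupled by a gyrator with motor constant K. This structure yields the two state equations simulated below. The parameter values are illustrative assumptions, and the sketch shows only the physics, not the NBgE encoder.

```python
def simulate_dc_motor(V=1.0, R=1.0, L=0.5, K=0.01, J=0.01, b=0.1, dt=1e-3, T=10.0):
    """Forward-Euler simulation of the DC-motor state equations.

    Electrical 1-junction:  L di/dt = V - R i - K w   (source, resistor, inductor, gyrator)
    Mechanical 1-junction:  J dw/dt = K i - b w       (gyrator, rotor inertia, friction)
    Returns the final current i and angular velocity w.
    """
    i = w = 0.0
    for _ in range(int(T / dt)):
        di = (V - R * i - K * w) / L
        dw = (K * i - b * w) / J
        i, w = i + dt * di, w + dt * dw
    return i, w
```

Setting both derivatives to zero gives the steady state $\omega = KV/(Rb + K^2)$ and $i = b\omega/K$. Here the gyrator term $K$ is the single place where the electrical and mechanical domains exchange power.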

Updated: 2024-08-02 14:59:48

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.13586v2

CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning

The key challenge of cross-modal domain-incremental learning (DIL) is to enable the learning model to continuously learn from novel data with different feature distributions under the same task without forgetting old ones. However, existing top-performing methods still suffer from high forgetting rates because they lack intra-domain knowledge extraction and an inter-domain common prompting strategy. In this paper, we propose a simple yet effective framework, CP-Prompt, which trains a limited number of parameters to instruct a pre-trained model to learn new domains while avoiding forgetting existing feature distributions. CP-Prompt captures intra-domain knowledge by compositionally inserting personalized prompts into multi-head self-attention layers and then learns inter-domain knowledge with a common prompting strategy. CP-Prompt outperforms state-of-the-art baselines on three widely evaluated DIL tasks. The source code is available at https://github.com/dannis97500/CP_Prompt.
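One common way to insert prompts into multi-head self-attention is prefix-style: learnable key and value vectors are prepended, so every query can attend to them while the backbone weights stay frozen. The sketch below assumes this prefix form, which may differ from CP-Prompt's exact insertion scheme. A composed prompt could be formed by concatenating a common prompt with a domain-specific one before passing it in.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def prompted_attention(x, Wq, Wk, Wv, prompt_k, prompt_v, num_heads):
    """Multi-head self-attention with prompt keys/values prepended (prefix style).

    x: (n, d) tokens; prompt_k, prompt_v: (m, d) prompt vectors for the current domain.
    Only the prompts would be trained; Wq, Wk, Wv stay frozen.
    """
    n, d = x.shape
    dh = d // num_heads

    def split_heads(t):  # (tokens, d) -> (heads, tokens, dh)
        return t.reshape(t.shape[0], num_heads, dh).transpose(1, 0, 2)

    q = split_heads(x @ Wq)
    k = split_heads(np.concatenate([prompt_k, x @ Wk], axis=0))  # prompts come first
    v = split_heads(np.concatenate([prompt_v, x @ Wv], axis=0))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, n, m + n)
    out = attn @ v                                          # (heads, n, dh)
    return out.transpose(1, 0, 2).reshape(n, d), attn
```

The output keeps the input shape `(n, d)`, so prompts can be swapped per domain without touching the layers around the attention block.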

Updated: 2024-08-02 14:58:54

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.21043v2

Routoo: Learning to Route to Large Language Models Effectively

Developing foundational large language models (LLMs) is becoming increasingly costly and inefficient. Also, closed-source and larger open-source models generally offer better response quality but come with higher inference costs than smaller models. In this paper, we introduce Routoo, an architecture designed to optimize the selection of LLMs for specific prompts based on performance, cost, and efficiency. Routoo consists of two key components: a performance predictor and cost-aware decoding. The performance predictor is a lightweight LLM that estimates the performance of various underlying LLMs without needing to execute and evaluate them. Cost-aware decoding then selects the most suitable model based on these predictions and other constraints such as cost and latency. We evaluated Routoo on the MMLU benchmark across 57 domains using open-source models. Our results show that Routoo matches the performance of the Mixtral 8x7b model while reducing inference costs by one-third. Additionally, when higher costs are allowed, Routoo surpasses Mixtral's accuracy by over 5% at equivalent cost, achieving an accuracy of 75.9%. When GPT4 is integrated into our model pool, Routoo nearly matches GPT4's performance at half the cost and exceeds it with a 25% cost reduction. These outcomes highlight Routoo's potential to achieve new state-of-the-art results cost-effectively by leveraging the collective knowledge of multiple LLMs.
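The final selection step can be sketched as a constrained argmax over per-prompt predicted scores. The sketch assumes the performance predictor has already produced a score for each candidate model. It then applies a hypothetical rule: filter by budget, and trade score against cost with a weight. Routoo's cost-aware decoding may combine these signals differently, and the model names and numbers are made up.

```python
from typing import Dict, Optional


def route(predicted_score: Dict[str, float], cost: Dict[str, float],
          budget: float, cost_weight: float = 0.0) -> Optional[str]:
    """Pick the affordable model with the best predicted score minus a cost penalty."""
    affordable = [m for m in predicted_score if cost[m] <= budget]
    if not affordable:
        return None  # no model fits the budget for this prompt
    return max(affordable, key=lambda m: predicted_score[m] - cost_weight * cost[m])
```

Raising `cost_weight` shifts traffic toward cheaper models, and raising `budget` admits larger ones. These two knobs loosely mirror the accuracy-versus-cost operating points reported above.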

Updated: 2024-08-02 14:50:05

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2401.13979v2

Amortized Probabilistic Detection of Communities in Graphs

Learning community structures in graphs has broad applications across scientific domains. While graph neural networks (GNNs) have been successful in encoding graph structures, existing GNN-based methods for community detection are limited by requiring knowledge of the number of communities in advance, in addition to lacking a proper probabilistic formulation to handle uncertainty. We propose a simple framework for amortized community detection, which addresses both of these issues by combining the expressive power of GNNs with recent methods for amortized clustering. Our models consist of a graph representation backbone that extracts structural information and an amortized clustering network that naturally handles variable numbers of clusters. Both components combine into well-defined models of the posterior distribution of graph communities and are jointly optimized given labeled graphs. At inference time, the models yield parallel samples from the posterior of community labels, quantifying uncertainty in a principled way. We evaluate several models from our framework on synthetic and real datasets, and demonstrate improved performance compared to previous methods. As a separate contribution, we extend recent amortized probabilistic clustering architectures by adding attention modules, which yield further improvements on community detection tasks.
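Because the models return parallel samples of community labels, one natural, permutation-invariant way to summarize their uncertainty is a co-assignment matrix: the fraction of samples in which two nodes share a community. A minimal sketch, assuming the sampled label vectors are already available (the samples below are made up); this post-processing step is illustrative and not necessarily how the authors report uncertainty.

```python
from typing import List, Sequence


def co_assignment(samples: Sequence[Sequence[int]]) -> List[List[float]]:
    """Estimate P(node i and node j share a community) from sampled labelings.

    Community labels are only defined up to permutation, so each sample is
    compared pairwise (same label or not) rather than by raw label values.
    """
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    s = len(samples)
    return [[c / s for c in row] for row in counts]


# Three hypothetical posterior samples for a 4-node graph; the second sample
# uses different label ids, which does not matter because only the partition counts.
samples = [
    [0, 0, 1, 1],
    [2, 2, 5, 5],
    [0, 0, 0, 1],
]
P = co_assignment(samples)
print(P[0][1])  # 1.0: nodes 0 and 1 are always together
print(P[1][2])  # 1/3: uncertain membership
```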

Updated: 2024-08-02 14:44:47

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2010.15727v4

A Systematic Mapping Study on SDN Controllers for Enhancing Security in IoT Networks

Context: The growing number of Internet of Things (IoT) devices has led to an increase in deceptive manipulations by malicious actors, who should be prevented from targeting IoT networks. Cybersecurity threats have evolved and become increasingly sophisticated, to the point that they can exploit any vulnerability found in IoT networks. However, with the introduction of the Software Defined Network (SDN) into IoT networks as the central monitoring unit, IoT networks become less vulnerable and less prone to threats. Objective: To present a comprehensive and unbiased overview of the state of the art in enhancing IoT network security using SDN controllers. Method: We review the current body of knowledge on enhancing the security of IoT networks using SDN through a Systematic Mapping Study (SMS) that follows established guidelines. Results: The SMS comprises 33 primary studies analyzed against four major research questions. It highlights current research trends and identifies gaps in SDN-IoT network security. Conclusion: We conclude that the SDN controller architecture most commonly used for securing IoT networks is the centralized controller architecture, although this architecture is not without its limitations. Additionally, the predominant technique used for risk mitigation is machine learning.

Updated: 2024-08-02 14:44:15

Categories: cs.CR,cs.NI,cs.SE

Download: http://arxiv.org/abs/2408.01303v1

A Decision-driven Methodology for Designing Uncertainty-aware AI Self-Assessment

Artificial intelligence (AI) has revolutionized decision-making processes and systems throughout society and, in particular, has emerged as a significant technology in high-impact scenarios of national interest. Yet, despite AI's impressive predictive capabilities in controlled settings, it still suffers from a range of practical setbacks preventing its widespread use in various critical scenarios. In particular, it is generally unclear if a given AI system's predictions can be trusted by decision-makers in downstream applications. To address the need for more transparent, robust, and trustworthy AI systems, a suite of tools has been developed to quantify the uncertainty of AI predictions and, more generally, enable AI to "self-assess" the reliability of its predictions. In this manuscript, we categorize methods for AI self-assessment along several key dimensions and provide guidelines for selecting and designing the appropriate method for a practitioner's needs. In particular, we focus on uncertainty estimation techniques that consider the impact of self-assessment on the choices made by downstream decision-makers and on the resulting costs and benefits of decision outcomes. To demonstrate the utility of our methodology for self-assessment design, we illustrate its use for two realistic national-interest scenarios. This manuscript is a practical guide for machine learning engineers and AI system users to select the ideal self-assessment techniques for each problem.
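To make the decision-driven view concrete, the sketch below shows how a probability estimate interacts with downstream costs: the same prediction can warrant acting, ignoring, or deferring to a human depending on the cost structure. The action names and cost values are hypothetical, and this expected-cost rule is a standard decision-theoretic illustration rather than the manuscript's methodology.

```python
from typing import Dict, Sequence


def choose_action(probs: Sequence[float], cost: Dict[str, Sequence[float]]) -> str:
    """Return the action with the lowest expected cost.

    probs[k] is the model's predicted probability that class k is true.
    cost[action][k] is the (assumed) cost of taking `action` when class k is true.
    """
    expected = {
        action: sum(p * c for p, c in zip(probs, costs))
        for action, costs in cost.items()
    }
    return min(expected, key=expected.get)


# Hypothetical binary scenario: a false alarm is cheap, missing a real threat
# is expensive, and deferring to a human analyst has a fixed cost.
COSTS = {
    "ignore": [0.0, 100.0],  # [cost if class 0 true, cost if class 1 true]
    "alarm": [5.0, 0.0],
    "defer": [2.0, 2.0],
}
print(choose_action([0.99, 0.01], COSTS))  # ignore
print(choose_action([0.60, 0.40], COSTS))  # defer
print(choose_action([0.05, 0.95], COSTS))  # alarm
```

Because the decision depends on the probabilities, a poorly calibrated self-assessment directly changes which action is taken, which is why the cost structure matters when choosing an uncertainty method.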

Updated: 2024-08-02 14:43:45

Categories: stat.ML,cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.01301v1

Secure Targeted Message Dissemination in IoT Using Blockchain Enabled Edge Computing

Smart devices are considered an integral part of the Internet of Things (IoT); they aim to form a dynamic network that exchanges information, collects and analyzes data, and makes optimal decisions autonomously, to achieve more efficient, automatic, and economical services. Message dissemination among these smart devices enables adding new features, sending updated instructions, alerts, or safety messages, communicating pricing information, billing amounts, or incentives, and installing security patches. On the one hand, such message dissemination directly benefits all parties involved in the IoT system. On the other hand, because dissemination is performed remotely, smart devices, vendors, and other involved authorities may face a number of security-, privacy-, and performance-related concerns when disseminating messages to targeted devices. To this end, in this paper we design STarEdgeChain, a security- and privacy-aware targeted message dissemination scheme for IoT, to show how blockchain, together with advanced cryptographic techniques, can address such concerns. Specifically, STarEdgeChain employs permissioned-blockchain-assisted edge computing to expedite the dissemination of a single signcrypted message to targeted groups of devices, while avoiding reliance on multiple unicast transmissions. Finally, we develop a software prototype of STarEdgeChain and show its practicality for smart devices. The code is publicly available at https://github.com/mbaqer/Blockchain-IoT

Updated: 2024-08-02 14:43:24

Categories: cs.CR,cs.NI

Download: http://arxiv.org/abs/2401.06384v2

Assessing Robustness of Machine Learning Models using Covariate Perturbations

As machine learning models become increasingly prevalent in critical decision-making models and systems in fields such as finance and healthcare, ensuring their robustness against adversarial attacks and changes in the input data is paramount, especially when models may overfit. This paper proposes a comprehensive framework for assessing the robustness of machine learning models through covariate perturbation techniques. We explore various perturbation strategies to assess robustness and examine their impact on model predictions, including separate strategies for numeric and non-numeric variables, summaries of perturbations to assess and compare model robustness across different scenarios, and local robustness diagnosis to identify regions in the data where a model is particularly unstable. Through empirical studies on a real-world dataset, we demonstrate the effectiveness of our approach in comparing robustness across models, identifying instabilities in the model, and enhancing model robustness.
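One simple way to realize separate perturbation strategies for numeric and non-numeric variables, plus a summary score, is sketched below. It assumes Gaussian noise scaled by each numeric column's standard deviation and random resampling of categorical levels; the noise scale, the resampling scheme, and the mean-absolute-change summary are illustrative assumptions, not the paper's exact strategies.

```python
import random
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def perturb(rows: Sequence[Row], numeric: Sequence[str], categorical: Sequence[str],
            scale: float, rng: random.Random) -> List[Row]:
    """Return perturbed copies of `rows`.

    Numeric columns get Gaussian noise with std = scale * column std.
    Categorical columns are resampled from observed levels with probability `scale`.
    """
    stds = {c: pstdev([float(r[c]) for r in rows]) for c in numeric}
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    out = []
    for r in rows:
        new = dict(r)
        for c in numeric:
            new[c] = float(r[c]) + rng.gauss(0.0, scale * stds[c])
        for c in categorical:
            if rng.random() < scale:
                new[c] = rng.choice(levels[c])
        out.append(new)
    return out


def robustness(model: Callable[[Row], float], rows: Sequence[Row],
               numeric: Sequence[str], categorical: Sequence[str],
               scale: float = 0.1, repeats: int = 20, seed: int = 0) -> float:
    """Mean absolute change in model output under perturbation (lower = more robust)."""
    rng = random.Random(seed)
    base = [model(r) for r in rows]
    diffs = []
    for _ in range(repeats):
        pert = perturb(rows, numeric, categorical, scale, rng)
        diffs.extend(abs(model(p) - b) for p, b in zip(pert, base))
    return mean(diffs)


# Two hypothetical models on toy data: one varies smoothly with income,
# the other flips entirely with the categorical region.
data = [{"income": 30 + i, "region": "north" if i % 2 else "south"} for i in range(50)]


def smooth(r: Row) -> float:
    return 0.01 * float(r["income"])


def jumpy(r: Row) -> float:
    return 1.0 if r["region"] == "north" else 0.0


print(robustness(smooth, data, ["income"], ["region"]))
print(robustness(jumpy, data, ["income"], ["region"]))
```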

Updated: 2024-08-02 14:41:36

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2408.01300v1

Reinforcement Learning applied to Insurance Portfolio Pursuit

When faced with a new customer, many factors contribute to an insurance firm's decision about which offer to make to that customer. In addition to the expected cost of providing the insurance, the firm must consider the other offers likely to be made to the customer, and how sensitive the customer is to differences in price. Moreover, firms often target a specific portfolio of customers that could depend on, e.g., age, location, and occupation. Given such a target portfolio, firms may choose to modulate an individual customer's offer based on whether the firm desires the customer within their portfolio. We term the problem of modulating offers to achieve a desired target portfolio the portfolio pursuit problem. Having formulated the portfolio pursuit problem as a sequential decision-making problem, we devise a novel reinforcement learning algorithm for its solution. We test our method on a complex synthetic market environment, and demonstrate that it outperforms a baseline method which mimics current industry approaches to portfolio pursuit.

Updated: 2024-08-02 14:40:19

Categories: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2408.00713v2

Optimal Mixed Integer Linear Optimization Trained Multivariate Classification Trees

Multivariate decision trees are powerful machine learning tools for classification and regression that attract many researchers and industry professionals. An optimal binary tree has two types of vertices: (i) branching vertices, which have exactly two children and at which datapoints are assessed on a set of discrete features, and (ii) leaf vertices, at which datapoints are given a prediction. Such a tree can be obtained by solving a biobjective optimization problem that seeks to (i) maximize the number of correctly classified datapoints and (ii) minimize the number of branching vertices. Branching vertices split on linear combinations of training features and can therefore be thought of as hyperplanes. In this paper, we propose two cut-based mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees (in which leaf vertices assign discrete classes). Our models leverage on-the-fly identification of minimal infeasible subsystems (MISs), from which we derive cutting planes that take the form of packing constraints. We show theoretical improvements over the strongest flow-based MILO formulation currently in the literature and conduct experiments on publicly available datasets to show our models' scalability, their strength relative to traditional branch-and-bound approaches, and their robustness in out-of-sample test performance. Our code and data are available on GitHub.
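To make the two objectives concrete, the sketch below shows how a multivariate binary classification tree routes datapoints through hyperplane splits, and how the two quantities being traded off (correct classifications and branching vertices) are counted. It only evaluates a given tree and does not implement the paper's cut-based MILO formulations; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: int  # class assigned to every datapoint reaching this leaf


@dataclass
class Branch:
    weights: Sequence[float]  # hyperplane coefficients a
    bias: float               # hyperplane offset b
    left: "Node"              # taken when a.x + b < 0
    right: "Node"             # taken when a.x + b >= 0


Node = Union[Leaf, Branch]


def predict(node: Node, x: Sequence[float]) -> int:
    """Route x through multivariate (hyperplane) splits until a leaf is reached."""
    while isinstance(node, Branch):
        value = sum(w * xi for w, xi in zip(node.weights, x)) + node.bias
        node = node.right if value >= 0 else node.left
    return node.label


def n_correct(tree: Node, X, y) -> int:
    """First objective (maximized): number of correctly classified datapoints."""
    return sum(predict(tree, x) == t for x, t in zip(X, y))


def n_branching(node: Node) -> int:
    """Second objective (minimized): number of branching vertices."""
    if isinstance(node, Leaf):
        return 0
    return 1 + n_branching(node.left) + n_branching(node.right)


# A single oblique split x1 + x2 >= 1 separates this toy data perfectly.
tree = Branch([1.0, 1.0], -1.0, Leaf(0), Leaf(1))
X = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.3), (0.9, 0.4)]
y = [0, 1, 0, 1]
print(n_correct(tree, X, y), n_branching(tree))  # 4 1
```

A single oblique split here does the work of several axis-aligned ones, which is the practical appeal of multivariate branching.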

Updated: 2024-08-02 14:37:28

Categories: cs.LG,cs.DM,math.CO

Download: http://arxiv.org/abs/2408.01297v1

Regularized Contrastive Partial Multi-view Outlier Detection

In recent years, multi-view outlier detection (MVOD) methods have advanced significantly, aiming to identify outliers within multi-view datasets. A key point is to better detect class outliers and class-attribute outliers, which only exist in multi-view data. However, existing methods either cannot reduce the impact of outliers when learning view-consistent information or struggle in cases with varying neighborhood structures. Moreover, most of them do not apply to partial multi-view data in real-world scenarios. To overcome these drawbacks, we propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD). In this framework, we use contrastive learning to learn view-consistent information and distinguish outliers by their degree of consistency. Specifically, we propose (1) an outlier-aware contrastive loss with a potential outlier memory bank, motivated by a theoretical analysis, to eliminate the outliers' bias; (2) a neighbor alignment contrastive loss to capture the view-shared local structural correlation; and (3) a spreading regularization loss to prevent the model from overfitting to outliers. With the Cross-view Relation Transfer technique, we can easily impute missing-view samples based on the features of their neighbors. Experimental results on four benchmark datasets demonstrate that our proposed approach outperforms state-of-the-art competitors under different settings.
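The underlying detection principle (outliers are samples whose views disagree) and neighbour-based filling of a missing view can be illustrated on pre-computed embeddings. This is a simplified, hypothetical scoring rule: it omits the contrastive losses, memory bank, and regularization that RCPMOD actually trains with, and plain neighbour averaging is only a stand-in for Cross-view Relation Transfer.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def outlier_scores(view1: Sequence[Sequence[float]],
                   view2: Sequence[Sequence[float]]) -> List[float]:
    """Score each sample by cross-view inconsistency: 1 - cosine similarity.

    Assumes both views are already embedded into a shared space.
    """
    return [1.0 - cosine(a, b) for a, b in zip(view1, view2)]


def impute_missing_view(neighbor_ids: Sequence[int],
                        other_view: Sequence[Sequence[float]]) -> List[float]:
    """Fill a missing view with the mean of its neighbours' features in that view."""
    dim = len(other_view[0])
    return [sum(other_view[i][d] for i in neighbor_ids) / len(neighbor_ids)
            for d in range(dim)]


v1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v2 = [[0.9, 0.1], [0.1, 0.9], [-1.0, 1.0]]  # third sample disagrees across views
scores = outlier_scores(v1, v2)
print(max(range(3), key=scores.__getitem__))  # 2
print(impute_missing_view([0, 1], v2))
```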

Updated: 2024-08-02 14:34:27

Categories: cs.MM,cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.07819v1

Don't Waste Your Time: Early Stopping Cross-Validation

State-of-the-art automated machine learning systems for tabular data often employ cross-validation to ensure that measured performance generalizes to unseen data or that subsequent ensembling does not overfit. However, using k-fold cross-validation instead of holdout validation drastically increases the computational cost of validating a single configuration. While it ensures better generalization and, by extension, better performance, the additional cost is often prohibitive for effective model selection within a time budget. We aim to make model selection with cross-validation more effective. Therefore, we study early stopping of cross-validation during model selection. We investigate the impact of early stopping on random search for two algorithms, MLP and random forest, across 36 classification datasets. We further analyze the impact of the number of folds by considering 3, 5, and 10 folds. In addition, we investigate the impact of early stopping with Bayesian optimization instead of random search, and with repeated cross-validation. Our exploratory study shows that even a simple-to-understand and easy-to-implement method consistently allows model selection to converge faster: in ~94% of all datasets, by ~214% on average. Moreover, stopping cross-validation early enables model selection to explore the search space more exhaustively, considering 167% more configurations on average within one hour, while also obtaining better overall performance.
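A very simple early-stopping rule of this kind can be sketched as follows: after each fold, drop a configuration whose running mean is already below the incumbent's full k-fold mean. This is an illustrative "aggressive" criterion assuming higher scores are better; the exact criteria studied in the paper may differ, and the per-fold scores below are made up rather than taken from real training runs.

```python
from statistics import mean
from typing import Callable, Optional, Sequence, Tuple


def select_with_early_stopping(
    configs: Sequence[str],
    fold_score: Callable[[str, int], float],
    k: int,
) -> Tuple[Optional[str], float, int]:
    """Random-search-style model selection with k-fold CV and early stopping.

    Returns (best config, its mean score, number of folds actually evaluated).
    """
    best, best_mean, folds_run = None, float("-inf"), 0
    for cfg in configs:
        scores = []
        for fold in range(k):
            scores.append(fold_score(cfg, fold))
            folds_run += 1
            # Running mean already below the incumbent: stop this config early.
            # (Aggressive: a later fold could in principle still recover.)
            if mean(scores) < best_mean:
                break
        else:  # all k folds completed without stopping
            if mean(scores) > best_mean:
                best, best_mean = cfg, mean(scores)
    return best, best_mean, folds_run


# Hypothetical per-fold validation accuracies for three configurations.
TABLE = {"a": [0.80, 0.82, 0.81], "b": [0.60, 0.90, 0.90], "c": [0.85, 0.86, 0.84]}
best, best_mean, folds_run = select_with_early_stopping(
    list(TABLE), lambda cfg, fold: TABLE[cfg][fold], k=3)
print(best, folds_run)  # c 7  (instead of 9 folds without early stopping)
```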

Updated: 2024-08-02 14:33:32

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.03389v2

Feature Clock: High-Dimensional Effects in Two-Dimensional Plots

Humans struggle to perceive and interpret high-dimensional data. Therefore, high-dimensional data are often projected into two dimensions for visualization. Many applications benefit from complex nonlinear dimensionality reduction techniques, but the effects of individual high-dimensional features are hard to explain in the two-dimensional space. Most visualization solutions use multiple two-dimensional plots, each showing the effect of one high-dimensional feature in two dimensions; this approach creates a need for a visual inspection of k plots for a k-dimensional input space. Our solution, Feature Clock, provides a novel approach that eliminates the need to inspect these k plots to grasp the influence of original features on the data structure depicted in two dimensions. Feature Clock enhances the explainability and compactness of visualizations of embedded data and is available in an open-source Python library.
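The general idea of summarizing every original feature as a single direction in the 2D plot, instead of inspecting k separate plots, can be sketched by correlating each feature with the two embedding axes. This correlation-based arrow is an assumption made for illustration and is not necessarily how the Feature Clock library computes its directions.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0


def feature_arrows(embedding: Sequence[Tuple[float, float]],
                   features: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """For each original feature, return a 2D arrow (corr with x, corr with y).

    The arrow's direction shows where the feature increases in the 2D plot,
    and its length how strongly the feature aligns with the embedding axes.
    """
    xs = [p[0] for p in embedding]
    ys = [p[1] for p in embedding]
    arrows = []
    for j in range(len(features[0])):
        col = [row[j] for row in features]
        arrows.append((pearson(col, xs), pearson(col, ys)))
    return arrows


# Toy 2D embedding; feature 0 grows left-to-right, feature 1 bottom-to-top.
emb = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
feats = [(2 * x, y) for x, y in emb]
print(feature_arrows(emb, feats))
```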

Updated: 2024-08-02 14:31:37

Categories: cs.LG

Download: http://arxiv.org/abs/2408.01294v1

3DPX: Progressive 2D-to-3D Oral Image Reconstruction with Hybrid MLP-CNN Networks

Panoramic X-ray (PX) is a prevalent modality in dental practice owing to its wide availability and low cost. However, as a 2D projection image, PX does not contain 3D anatomical information and therefore has limited use in dental applications that can benefit from 3D information, e.g., tooth angular misalignment detection and classification. Reconstructing 3D structures directly from 2D PX has recently been explored to address this limitation, with existing methods relying primarily on Convolutional Neural Networks (CNNs) for direct 2D-to-3D mapping. These methods, however, are unable to correctly infer depth-axis spatial information. In addition, they are limited by the intrinsic locality of convolution operations, as convolution kernels only capture information from immediately neighboring pixels. In this study, we propose a progressive hybrid Multilayer Perceptron (MLP)-CNN pyramid network (3DPX) for 2D-to-3D oral PX reconstruction. We introduce a progressive reconstruction strategy in which 3D images are progressively reconstructed in 3DPX, with guidance imposed on the intermediate reconstruction result at each pyramid level. Further, motivated by recent advances in MLPs, which show promise in capturing fine-grained long-range dependencies, 3DPX integrates MLPs and CNNs to improve semantic understanding during reconstruction. Extensive experiments on two large datasets involving 464 studies demonstrate that 3DPX outperforms state-of-the-art 2D-to-3D oral reconstruction methods, including standalone MLPs and transformers, in reconstruction quality, and also improves the performance of downstream angular misalignment classification tasks.

Updated: 2024-08-02 14:28:10

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2408.01292v1

Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives

The emergence of Generative Artificial Intelligence (AI) and Large Language Models (LLMs) has marked a new era of Natural Language Processing (NLP), introducing unprecedented capabilities that are revolutionizing various domains. This paper explores the current state of these cutting-edge technologies, demonstrating their remarkable advancements and wide-ranging applications. Our paper provides a holistic perspective on the technical foundations, practical applications, and emerging challenges within the evolving landscape of Generative AI and LLMs. We believe that understanding the generative capabilities of AI systems and the specific context of LLMs is crucial for researchers, practitioners, and policymakers to collaboratively shape the responsible and ethical integration of these technologies into various domains. Furthermore, we identify and address the main research gaps, providing valuable insights to guide future research within the AI research community.

Updated: 2024-08-02 14:26:55

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.14962v3

Semi-Supervised Dual-Stream Self-Attentive Adversarial Graph Contrastive Learning for Cross-Subject EEG-based Emotion Recognition

Electroencephalography (EEG) is an objective tool for emotion recognition with promising applications. However, the scarcity of labeled data remains a major challenge in this field, limiting the widespread use of EEG-based emotion recognition. In this paper, a semi-supervised Dual-stream Self-Attentive Adversarial Graph Contrastive learning framework (termed DS-AGC) is proposed to tackle the challenge of limited labeled data in cross-subject EEG-based emotion recognition. The DS-AGC framework includes two parallel streams for extracting non-structural and structural EEG features. The non-structural stream incorporates a semi-supervised multi-domain adaptation method to alleviate the distribution discrepancy among the labeled source domain, the unlabeled source domain, and the unknown target domain. The structural stream develops a graph contrastive learning method to extract effective graph-based feature representations from multiple EEG channels in a semi-supervised manner. Further, a self-attentive fusion module is developed for feature fusion, sample selection, and emotion recognition; it highlights the EEG features most relevant to emotions and the labeled-source samples closest to the target domain. Extensive experiments on two benchmark databases (SEED and SEED-IV) using a semi-supervised cross-subject leave-one-subject-out cross-validation scheme show that the proposed model outperforms existing methods under different incomplete-label conditions (with an average improvement of 5.83% on SEED and 6.99% on SEED-IV), demonstrating its effectiveness in addressing the label scarcity problem in cross-subject EEG-based emotion recognition.
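For readers unfamiliar with the evaluation protocol, the sketch below generates semi-supervised cross-subject leave-one-subject-out splits: each subject is held out once as the unlabeled target, and the remaining source subjects are divided into labeled and unlabeled groups. The subject IDs are hypothetical, and splitting labeled versus unlabeled data at the subject level is an assumption for illustration; the paper's incomplete-label conditions may be defined differently (e.g., per sample).

```python
import random
from typing import Dict, Iterator, List, Sequence


def semi_supervised_loso(subjects: Sequence[str], labeled_fraction: float,
                         seed: int = 0) -> Iterator[Dict[str, List[str]]]:
    """Yield one split per held-out (target) subject.

    The remaining subjects form the source domain; a fraction of them is
    treated as labeled and the rest as unlabeled. The target is never labeled.
    """
    rng = random.Random(seed)
    for target in subjects:
        source = [s for s in subjects if s != target]
        rng.shuffle(source)
        n_labeled = max(1, round(labeled_fraction * len(source)))
        yield {
            "labeled_source": sorted(source[:n_labeled]),
            "unlabeled_source": sorted(source[n_labeled:]),
            "target": [target],
        }


# SEED has 15 subjects; the IDs here are placeholders.
subjects = [f"s{i}" for i in range(1, 16)]
splits = list(semi_supervised_loso(subjects, labeled_fraction=0.5))
print(len(splits), splits[0]["target"], len(splits[0]["labeled_source"]))
```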

Updated: 2024-08-02 14:25:40

Categories: eess.SP,cs.HC,cs.LG

Download: http://arxiv.org/abs/2308.11635v2

Weakly Supervised Text-to-SQL Parsing through Question Decomposition

Text-to-SQL parsers are crucial in enabling non-experts to effortlessly query relational data. Training such parsers, by contrast, generally requires expertise in annotating natural language (NL) utterances with corresponding SQL queries. In this work, we propose a weak supervision approach for training text-to-SQL parsers. We take advantage of the recently proposed question meaning representation called QDMR, an intermediate between NL and formal query languages. Given questions, their QDMR structures (annotated by non-experts or automatically predicted), and the answers, we are able to automatically synthesize SQL queries that are used to train text-to-SQL models. We test our approach by experimenting on five benchmark datasets. Our results show that the weakly supervised models perform competitively with those trained on annotated NL-SQL data. Overall, we effectively train text-to-SQL parsers, while using zero SQL annotations.
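The role of the answers as weak supervision can be illustrated by execution-based filtering: candidate SQL queries are executed, and only one whose result matches the known answer is kept as a training target. This minimal sketch uses Python's built-in `sqlite3`; the table, the candidate queries, and the QDMR in the comment are hypothetical, and the paper's actual QDMR-to-SQL synthesis is considerably more involved.

```python
import sqlite3
from typing import List, Optional, Sequence


def pick_query(conn: sqlite3.Connection, candidates: Sequence[str],
               gold_answer: List[tuple]) -> Optional[str]:
    """Return the first candidate SQL whose execution result matches the answer.

    Results are compared order-insensitively, since answer supervision
    usually carries no ordering information.
    """
    target = sorted(gold_answer)
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # malformed or invalid candidate: skip it
        if sorted(rows) == target:
            return sql
    return None


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flights (origin TEXT, dest TEXT);
INSERT INTO flights VALUES ('Boston', 'Denver'), ('Boston', 'Dallas'), ('Austin', 'Denver');
""")
# Hypothetical candidates for "How many flights leave Boston?", e.g. derived from a
# QDMR such as: 1. return flights; 2. return #1 from Boston; 3. return number of #2
candidates = [
    "SELECT COUNT(*) FROM flights",
    "SELECT COUNT(*) FROM flights WHERE dest = 'Boston'",
    "SELECT COUNT(*) FROM flights WHERE origin = 'Boston'",
]
print(pick_query(conn, candidates, [(2,)]))
```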

Updated: 2024-08-02 14:21:43

Categories: cs.CL,cs.AI,cs.DB

Download: http://arxiv.org/abs/2112.06311v4

Answering Questions by Meta-Reasoning over Multiple Chains of Thought

Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a voting mechanism over the final answers, but the intermediate steps themselves are discarded. While such approaches improve performance, they do not consider the relations between intermediate steps across chains and do not provide a unified explanation for the predicted answer. We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought, rather than aggregating their answers. MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer. MCR outperforms strong baselines on 7 multi-hop QA datasets. Moreover, our analysis reveals that MCR explanations exhibit high quality, enabling humans to verify its answers.
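The difference between answer voting and meta-reasoning can be sketched as a difference in what reaches the final step: voting keeps only each chain's last line, whereas a meta-reasoning input exposes every intermediate step so a model can mix facts across chains. The prompt wording, chain contents, and function names below are hypothetical and are not MCR's actual prompt.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(chains: Sequence[List[str]]) -> str:
    """Baseline: keep only each chain's final answer and vote; steps are discarded."""
    return Counter(chain[-1] for chain in chains).most_common(1)[0][0]


def meta_reasoning_prompt(question: str, chains: Sequence[List[str]]) -> str:
    """MCR-style input: expose all intermediate steps so a model can combine them."""
    parts = [f"Question: {question}"]
    for i, chain in enumerate(chains, start=1):
        steps = "\n".join(f"  {s}" for s in chain)
        parts.append(f"Chain {i}:\n{steps}")
    parts.append("Using facts from any of the chains above, explain and answer the question.")
    return "\n\n".join(parts)


chains = [
    ["Film X was directed by A.", "A was born in 1950.", "1950"],
    ["Film X was directed by B.", "B was born in 1962.", "1962"],
    ["Film X was directed by B.", "B was born in 1962.", "1962"],
]
print(majority_vote(chains))  # 1962
print(meta_reasoning_prompt("When was the director of Film X born?", chains))
```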

Updated: 2024-08-02 14:18:51

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2304.13007v4

A Tiny Supervised ODL Core with Auto Data Pruning for Human Activity Recognition

In this paper, we introduce a low-cost and low-power tiny supervised on-device learning (ODL) core that can address the distributional shift of input data for human activity recognition. Although ODL for resource-limited edge devices has been studied recently, how exactly to provide training labels to these devices at runtime remains an open issue. To address this problem, we propose to combine automatic data pruning with supervised ODL to reduce the number of queries needed to acquire predicted labels from a nearby teacher device, and thus save power during model retraining. The data pruning threshold is tuned automatically, eliminating manual threshold tuning. As a tinyML solution operating at a few mW for human activity recognition, we design a supervised ODL core that supports our automatic data pruning, using a 45nm CMOS process technology. We show that the memory required by the core is smaller than that of a same-shaped multilayer perceptron (MLP) and that its power consumption is only 3.39mW. Experiments on a human activity recognition dataset show that the proposed automatic data pruning reduces the communication volume by 55.7%, and power consumption accordingly, with only a 0.9% accuracy loss.
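The idea of querying the nearby teacher device only when needed, with a threshold that tunes itself, can be sketched as below. The moving-average threshold rule and the `query_teacher` callback are assumptions for illustration; the paper's automatic tuning rule and its hardware implementation are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple


def prune_and_query(confidences: Sequence[float],
                    query_teacher: Callable[[int], int],
                    momentum: float = 0.9) -> Tuple[List[Tuple[int, int]], int]:
    """Query the teacher only for samples the student is unsure about.

    The threshold is not set by hand: it tracks an exponential moving average
    of the student's own confidence, so "unsure" means "less confident than usual".
    Returns the (index, teacher_label) pairs for retraining and the query count.
    """
    threshold = confidences[0] if confidences else 0.0
    labeled, queries = [], 0
    for i, conf in enumerate(confidences):
        if conf < threshold:  # low confidence: worth spending communication on a label
            labeled.append((i, query_teacher(i)))
            queries += 1
        threshold = momentum * threshold + (1 - momentum) * conf
    return labeled, queries


# Hypothetical stream of student confidences and a stand-in teacher.
stream = [0.9, 0.95, 0.5, 0.92, 0.4, 0.97]
labeled, queries = prune_and_query(stream, query_teacher=lambda i: i % 3)
print(queries, [i for i, _ in labeled])  # 2 [2, 4]
```

Only two of six samples trigger communication here, which illustrates (but does not reproduce) the communication savings reported above.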

Updated: 2024-08-02 14:09:39

Categories: cs.LG,cs.AR

Download: http://arxiv.org/abs/2408.01283v1

DragD3D: Realistic Mesh Editing with Rigidity Control Driven by 2D Diffusion Priors

Direct mesh editing and deformation are key components of the geometric modeling and animation pipeline. Mesh editing methods are typically framed as optimization problems combining user-specified vertex constraints with a regularizer that determines the positions of the remaining vertices. The choice of regularizer is key to the realism and authenticity of the final result. Physics- and geometry-based regularizers are not aware of the global context and semantics of the object, and the more recent deep learning priors are limited to a specific class of 3D object deformations. Our main contribution is a vertex-based mesh editing method called DragD3D, based on (1) a novel optimization formulation that decouples the rotation and stretch components of the deformation and combines a 3D geometric regularizer with (2) the recently introduced DDS loss, which scores the faithfulness of the rendered 2D image to one from a diffusion model. Thus, our deformation method achieves globally realistic shape deformation that is not restricted to any class of objects. Our new formulation directly optimizes the transformation of the neural Jacobian field, explicitly separating the rotational and stretching components. The objective function of the optimization combines the approximate gradients of DDS and the gradients from the geometric loss to satisfy the vertex constraints. Additional user control over the desired global shape deformation is made possible by allowing explicit per-triangle deformation control as well as explicit separation of the rotational and stretching components of the deformation. We show that our deformations can be controlled to yield realistic shape deformations that are aware of the global context of the objects, and provide better results than using geometric regularizers alone.
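One standard way to separate the rotational and stretching parts of a local deformation Jacobian is the polar decomposition J = RS, with R a rotation and S symmetric positive definite. The sketch below handles the 2x2 case in closed form for readability; triangle meshes in 3D need the 3x3 case (typically computed via an SVD), and whether DragD3D uses exactly this parameterization is an assumption, not something the abstract states.

```python
import math
from typing import List, Tuple

Mat2 = List[List[float]]


def polar_2x2(J: Mat2) -> Tuple[float, Mat2]:
    """Split a 2x2 Jacobian J (det > 0) into a rotation angle theta and stretch S.

    J = R(theta) @ S with S symmetric positive definite.
    """
    (a, b), (c, d) = J
    theta = math.atan2(c - b, a + d)  # the angle that makes R(theta)^T J symmetric
    co, si = math.cos(theta), math.sin(theta)
    # S = R(theta)^T @ J
    S = [[co * a + si * c, co * b + si * d],
         [-si * a + co * c, -si * b + co * d]]
    return theta, S


def compose(theta: float, S: Mat2) -> Mat2:
    """Rebuild J = R(theta) @ S, e.g. after editing theta and S separately."""
    co, si = math.cos(theta), math.sin(theta)
    R = [[co, -si], [si, co]]
    return [[sum(R[i][k] * S[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# A rotation by 0.3 rad applied after stretching x by a factor of 2.
J = compose(0.3, [[2.0, 0.0], [0.0, 1.0]])
theta, S = polar_2x2(J)
print(round(theta, 6), [[round(v, 6) for v in row] for row in S])
```

With the two factors separated, one could, for example, leave theta free while penalizing S away from the identity to make an edit more rigid; this is offered only to illustrate why the decoupling gives rigidity control.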

Updated: 2024-08-02 14:08:59

Categories: cs.GR,cs.LG

Download: http://arxiv.org/abs/2310.04561v2

Comprehensive Library of Variational LSE Solvers

Linear systems of equations arise in many mathematical domains, as well as in machine learning. By employing noisy intermediate-scale quantum devices, variational solvers promise to accelerate finding solutions for large systems. Although there is a wealth of theoretical research on these algorithms, only fragmentary implementations exist. To fill this gap, we have developed the variational-lse-solver framework, which implements existing approaches from the literature and introduces several enhancements. The user-friendly interface is designed for researchers who work at the abstraction level of identifying and developing end-to-end applications.

Updated: 2024-08-02 13:59:12

Categories: quant-ph,cs.LG,cs.SE

Download: http://arxiv.org/abs/2404.09916v2

Certified Robust Invariant Polytope Training in Neural Controlled ODEs

We consider a nonlinear control system modeled as an ordinary differential equation subject to disturbance, with a state feedback controller parameterized as a feedforward neural network. We propose a framework for training controllers with certified robust forward invariant polytopes, where any trajectory initialized inside the polytope remains within the polytope, regardless of the disturbance. First, we parameterize a family of lifted control systems in a higher dimensional space, where the original neural controlled system evolves on an invariant subspace of each lifted system. We use interval analysis and neural network verifiers to further construct a family of lifted embedding systems, carefully capturing the knowledge of this invariant subspace. If the vector field of any lifted embedding system satisfies a sign constraint at a single point, then a certain convex polytope of the original system is robustly forward invariant. Treating the neural network controller and the lifted system parameters as variables, we propose an algorithm to train controllers with certified forward invariant polytopes in the closed-loop control system. Through two examples, we demonstrate how the simplicity of the sign constraint allows our approach to scale with system dimension to over $50$ states, and outperform state-of-the-art Lyapunov-based sampling approaches in runtime.
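To make the "sign constraint at a single point" idea concrete, here is a minimal sketch for the simplest case: the polytope is an axis-aligned box, and the closed loop is assumed linear, $\dot{x} = Ax + w$, with the neural controller already folded into a hypothetical matrix `A` and the disturbance bounded by $w_{lo} \le w \le w_{hi}$. This is not the paper's lifted embedding construction or its neural network verifier. It only shows how invariance reduces to checking the signs of an embedding-style bound evaluated at the single point $(lo, hi)$.

```python
def box_is_robustly_invariant(A, w_lo, w_hi, lo, hi):
    """Sufficient sign check for robust forward invariance of the box [lo, hi]
    under dx/dt = A x + w with w_lo <= w <= w_hi (hypothetical linear closed loop).

    For each coordinate i, bound dx_i/dt from below on the face x_i = lo_i
    and from above on the face x_i = hi_i, letting every other coordinate
    and the disturbance range over their intervals.
    """
    n = len(A)
    for i in range(n):
        d_lo = A[i][i] * lo[i] + w_lo[i]  # smallest dx_i/dt on the lower face
        d_hi = A[i][i] * hi[i] + w_hi[i]  # largest dx_i/dt on the upper face
        for j in range(n):
            if j != i:
                d_lo += min(A[i][j] * lo[j], A[i][j] * hi[j])
                d_hi += max(A[i][j] * lo[j], A[i][j] * hi[j])
        # Trajectories must not be able to leave through either face.
        if d_lo < 0 or d_hi > 0:
            return False
    return True


A = [[-2.0, 1.0], [0.0, -1.0]]
print(box_is_robustly_invariant(A, [-0.5, -0.5], [0.5, 0.5], [-1, -1], [1, 1]))  # True
print(box_is_robustly_invariant(A, [-1.5, -1.5], [1.5, 1.5], [-1, -1], [1, 1]))  # False
```

The check costs one pass over the matrix regardless of how the box was found, which hints at why a single-point sign test can scale with the state dimension.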

Updated: 2024-08-02 13:55:26

Categories: cs.LG,cs.SY,eess.SY,math.OC

Download: http://arxiv.org/abs/2408.01273v1

LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation

Emerging large language/multimodal models facilitate the evolution of mobile agents, especially in mobile UI task automation. However, existing evaluation approaches, which rely on human validation or established datasets to compare agent-predicted actions with predefined action sequences, are unscalable and unfaithful. To overcome these limitations, this paper presents LlamaTouch, a testbed for on-device mobile UI task execution and faithful, scalable task evaluation. Observing that the task execution process only transfers UI states, LlamaTouch employs a novel evaluation approach that only assesses whether an agent traverses all manually annotated, essential application/system states. LlamaTouch comprises three key techniques: (1) On-device task execution that enables mobile agents to interact with realistic mobile environments for task execution. (2) Fine-grained UI component annotation that merges pixel-level screenshots and textual screen hierarchies to explicitly identify and precisely annotate essential UI components with a rich set of designed annotation primitives. (3) A multi-level application state matching algorithm that utilizes exact and fuzzy matching to accurately detect critical information in each screen, even with unpredictable UI layout/content dynamics. LlamaTouch currently incorporates four mobile agents and 496 tasks, encompassing both tasks from widely used datasets and our self-constructed ones to cover more diverse mobile applications. Evaluation results demonstrate LlamaTouch's high evaluation faithfulness in real-world mobile environments and its better scalability than human validation. LlamaTouch also enables easy task annotation and integration of new mobile agents. Code and dataset are publicly available at https://github.com/LlamaTouch/LlamaTouch.
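The combination of exact and fuzzy matching can be illustrated with a small sketch. It assumes that each essential state is annotated as a list of `(mode, text)` pairs and that a screen is reduced to its visible text strings; both the annotation format and the 0.8 similarity threshold are hypothetical and are not LlamaTouch's actual primitives or API. Python's standard `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the testbed uses.

```python
from difflib import SequenceMatcher


def component_matches(expected, screen_texts, mode, threshold=0.8):
    """True if some text on screen matches `expected` (hypothetical rule)."""
    expected = expected.lower()
    for text in screen_texts:
        text = text.lower()
        if mode == "exact" and text == expected:
            return True
        # Fuzzy mode tolerates small wording/layout differences.
        if mode == "fuzzy" and SequenceMatcher(None, expected, text).ratio() >= threshold:
            return True
    return False


def state_reached(essential, screen_texts):
    """An essential state counts as reached only if every annotated component matches."""
    return all(component_matches(text, screen_texts, mode) for mode, text in essential)


essential = [("exact", "Settings"), ("fuzzy", "Wi-Fi turned on")]
print(state_reached(essential, ["Settings", "Wi-Fi is turned on", "Bluetooth"]))  # True
print(state_reached(essential, ["Display", "Wi-Fi is turned on"]))  # False
```

Checking only annotated essential components, rather than the full action sequence, is what lets an agent take a different path and still be judged correct.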

Updated: 2024-08-02 13:49:32

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2404.16054v2

D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions

Large vision language models (VLMs) have progressed remarkably from research to general-purpose applications. LLaVA-Med, a pioneering large language and vision assistant for biomedicine, can perform multi-modal biomedical image and data analysis to provide a natural language interface for radiologists. While it is highly generalizable and works with multi-modal data, it is currently limited by well-known challenges of large language models. Hallucinations and imprecision in responses can lead to misdiagnosis, which currently hinders the clinical adaptability of VLMs. To create precise, user-friendly models in healthcare, we propose D-Rax, a domain-specific, conversational, radiologic assistance tool that can be used to gain insights about a particular radiologic image. In this study, we enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting, offering comprehensive insights from medical imaging and aiding in the formulation of accurate diagnoses. D-Rax is achieved by fine-tuning the LLaVA-Med architecture on our curated, enhanced instruction-following data, comprising images, instructions, as well as disease diagnosis and demographic predictions derived from MIMIC-CXR imaging data, CXR-related visual question answering (VQA) pairs, and predictive outcomes from multiple expert AI models. We observe statistically significant improvements in responses when evaluated on both open- and closed-ended conversations. By combining state-of-the-art diagnostic models with VLMs, D-Rax enables clinicians to interact with medical images using natural language, which could potentially streamline their decision-making process, enhance diagnostic accuracy, and save them time.

Updated: 2024-08-02 13:45:53

Categories: cs.AI,cs.CL,cs.LG,eess.IV

Download: http://arxiv.org/abs/2407.02604v2

The virtual CAT: A tool for algorithmic thinking assessment in Swiss compulsory education

In today's digital era, algorithmic thinking (AT) skills are crucial, and not only in computer science-related fields. These abilities enable individuals to break down complex problems into more manageable steps and create a sequence of actions to solve them. To address the increasing demand for AT assessments in educational settings and the limitations of current methods, this paper introduces the virtual Cross Array Task (CAT), a digital adaptation of an unplugged assessment activity designed to evaluate algorithmic skills in Swiss compulsory education. This tool offers scalable and automated assessment, reducing human involvement and mitigating potential data collection errors. The platform features gesture-based and visual block-based programming interfaces, ensuring its usability for diverse learners, and is further supported by multilingual capabilities. To evaluate the virtual CAT platform, we conducted a pilot evaluation in Switzerland involving a heterogeneous group of students. The findings show the platform's usability, proficiency, and suitability for assessing AT skills among students of diverse ages, development stages, and educational backgrounds, as well as the feasibility of large-scale data collection.

Updated: 2024-08-02 13:36:17

Categories: cs.HC,cs.AI,cs.CY

Download: http://arxiv.org/abs/2408.01263v1

Detection and Characterization of Coordinated Online Behavior: A Survey

Coordination is a fundamental aspect of life. The advent of social media has also made it integral to online human interactions, such as those that characterize thriving online communities and social movements. At the same time, coordination is also core to effective disinformation, manipulation, and hate campaigns. This survey collects, categorizes, and critically discusses the body of work produced as a result of the growing interest in coordinated online behavior. We reconcile industry and academic definitions, propose a comprehensive framework to study coordinated online behavior, and review and critically discuss the existing detection and characterization methods. Our analysis identifies open challenges and promising directions of research, serving as a guide for scholars, practitioners, and policymakers in understanding and addressing the complexities inherent to online coordination.

Updated: 2024-08-02 13:27:56

Categories: cs.SI,cs.AI,cs.CY,cs.HC,cs.LG

Download: http://arxiv.org/abs/2408.01257v1

Canonical Decision Diagrams Modulo Theories

Decision diagrams (DDs) are powerful tools for effectively representing propositional formulas, and they are widely used in many domains, in particular in formal verification and knowledge compilation. Some forms of DDs (e.g., OBDDs, SDDs) are canonical, that is, (under given conditions on the atom list) they univocally represent equivalence classes of formulas. Given the limited expressiveness of propositional logic, a few attempts to lift DDs to the SMT level have been presented in the literature. Unfortunately, these techniques still suffer from some limitations: most procedures are theory-specific; some produce theory DDs (T-DDs) that do not univocally represent T-valid formulas or T-inconsistent formulas; none of these techniques provably produces theory-canonical T-DDs, which (under given conditions on the T-atom list) univocally represent T-equivalence classes of formulas. Also, these procedures are not easy to implement, and very few implementations are actually available. In this paper, we present a novel, very general technique to lift DDs to the SMT level, which has several advantages: it is very easy to implement on top of an AllSMT solver and a DD package, which are used as black boxes; it works for every form of DD and every theory, or combination thereof, supported by the AllSMT solver; and it produces theory-canonical T-DDs if the propositional DD is canonical. We have implemented a prototype tool for both T-OBDDs and T-SDDs on top of OBDD and SDD packages and the MathSAT SMT solver. Preliminary empirical evaluation supports the effectiveness of the approach.
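The notion of theory-canonicity can be illustrated with a toy reduced ordered BDD. In place of an AllSMT solver, the sketch brute-forces the truth assignments of two hypothetical atoms over a real variable $x$, namely $a_0 := (x < 1)$ and $a_1 := (x < 2)$. It keeps only the T-consistent models, using the hand-written fact that $x < 1$ implies $x < 2$. This is a simplification for illustration, not the paper's procedure or MathSAT's interface. The formulas $a_0 \lor a_1$ and $a_1$ are propositionally different but T-equivalent, so they should map to the same node only after theory filtering.

```python
class BDD:
    """Reduced ordered BDD with a unique table: equal functions get equal node ids."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.unique = {}  # (var, low, high) -> node id; ids 0 and 1 are False/True

    def mk(self, var, low, high):
        if low == high:  # both branches agree, so the test on `var` is redundant
            return low
        return self.unique.setdefault((var, low, high), len(self.unique) + 2)

    def build(self, f, prefix=()):
        """Shannon-expand f over variables 0..num_vars-1 in a fixed order."""
        if len(prefix) == self.num_vars:
            return 1 if f(*prefix) else 0
        low = self.build(f, prefix + (False,))
        high = self.build(f, prefix + (True,))
        return self.mk(len(prefix), low, high)


def t_consistent(a0, a1):
    # Hypothetical theory knowledge: (x < 1) together with not (x < 2) is unsatisfiable.
    return not (a0 and not a1)


def t_restrict(f):
    """Keep only the T-consistent models of f (what an AllSMT enumeration would yield)."""
    return lambda *atoms: t_consistent(*atoms) and f(*atoms)


bdd = BDD(num_vars=2)
f1 = lambda a0, a1: a0 or a1  # (x < 1) or (x < 2)
f2 = lambda a0, a1: a1        # (x < 2)
plain = (bdd.build(f1), bdd.build(f2))
theory = (bdd.build(t_restrict(f1)), bdd.build(t_restrict(f2)))
print(plain[0] == plain[1], theory[0] == theory[1])  # False True
```

Because the propositional BDD is canonical for a fixed variable order, any two formulas with the same set of T-consistent models end up as the same node, which is the property the abstract calls theory-canonical.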

Updated: 2024-08-02 13:27:16

Categories: cs.LO,cs.AI

Download: http://arxiv.org/abs/2404.16455v3

The MovieLens Beliefs Dataset: Collecting Pre-Choice Data for Online Recommender Systems

An increasingly important aspect of designing recommender systems involves considering how recommendations will influence consumer choices. This paper addresses this issue by introducing a method for collecting user beliefs about un-experienced items - a critical predictor of choice behavior. We implemented this method on the MovieLens platform, resulting in a rich dataset that combines user ratings, beliefs, and observed recommendations. We document challenges to such data collection, including selection bias in response and limited coverage of the product space. This unique resource empowers researchers to delve deeper into user behavior and analyze user choices absent recommendations, measure the effectiveness of recommendations, and prototype algorithms that leverage user belief data, ultimately leading to more impactful recommender systems. The dataset can be found at https://grouplens.org/datasets/movielens/ml_belief_2024/.

Updated: 2024-08-02 13:26:44

Categories: cs.IR,cs.AI,cs.HC

Download: http://arxiv.org/abs/2405.11053v3

Fine-grained Attention in Hierarchical Transformers for Tabular Time-series

Tabular data is ubiquitous in many real-life systems. In particular, time-dependent tabular data, where rows are chronologically related, is typically used for recording historical events, e.g., financial transactions, healthcare records, or stock history. Recently, hierarchical variants of the attention mechanism of transformer architectures have been used to model tabular time-series data. First, rows (or columns) are encoded separately by computing attention between their fields. Subsequently, the encoded rows (or columns) are attended to one another to model the entire tabular time-series. While efficient, this approach constrains the attention granularity and limits its ability to learn patterns at the field level across separate rows or columns. We take a first step to address this gap by proposing Fieldy, a fine-grained hierarchical model that contextualizes fields at both the row and column levels. We compare our proposal against state-of-the-art models on regression and classification tasks using public tabular time-series datasets. Our results show that combining row-wise and column-wise attention improves performance without increasing model size. Code and data are available at https://github.com/raphaaal/fieldy.
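The difference between row-level and column-level field attention comes down to which fields each field may attend to. The toy sketch below lays the fields of a table out on a grid, builds a row mask and a column mask, and applies masked scaled dot-product attention to scalar features. It is a minimal illustration under these assumptions, not Fieldy's architecture: the one-dimensional features and the choice to stack a row pass followed by a column pass are hypothetical simplifications.

```python
import math


def field_masks(n_rows, n_cols):
    """Boolean masks over flattened fields: same-row and same-column visibility."""
    pos = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    row_mask = [[pr == qr for (qr, _) in pos] for (pr, _) in pos]
    col_mask = [[pc == qc for (_, qc) in pos] for (_, pc) in pos]
    return row_mask, col_mask


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention over scalar features (d = 1), restricted by mask."""
    out = []
    for i, qi in enumerate(q):
        scores = [qi * kj if mask[i][j] else -math.inf for j, kj in enumerate(k)]
        top = max(scores)  # finite, because each field always sees itself
        weights = [math.exp(s - top) for s in scores]  # masked entries become 0.0
        total = sum(weights)
        out.append(sum(w * vj for w, vj in zip(weights, v)) / total)
    return out


# 2 rows x 2 columns; zero queries/keys give uniform attention, i.e. a masked mean.
row_mask, col_mask = field_masks(2, 2)
v = [1.0, 2.0, 3.0, 4.0]
zeros = [0.0] * 4
by_row = masked_attention(zeros, zeros, v, row_mask)
print(by_row)  # [1.5, 1.5, 3.5, 3.5]
print(masked_attention(zeros, zeros, v, col_mask))  # [2.0, 3.0, 2.0, 3.0]
print(masked_attention(zeros, zeros, by_row, col_mask))  # [2.5, 2.5, 2.5, 2.5]
```

After a row pass followed by a column pass, every field has received information from every other field, while each individual attention step stays restricted to one row or one column.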

Updated: 2024-08-02 13:25:16

Categories: cs.LG,I.2.6

Download: http://arxiv.org/abs/2406.15327v2

MERA: A Comprehensive LLM Evaluation in Russian

Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As model size increases, LMs demonstrate enhancements in measurable aspects and develop new qualitative features. However, despite researchers' attention and the rapid growth of LM applications, their capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce an open Multimodal Evaluation of Russian-language Architectures (MERA), a new instruction benchmark for evaluating foundation models oriented towards the Russian language. The benchmark encompasses 21 evaluation tasks for generative models in 11 skill domains and is designed as a black-box test to ensure the exclusion of data leakage. The paper introduces a methodology to evaluate FMs and LMs in zero- and few-shot fixed instruction settings that can be extended to other modalities. We propose an evaluation methodology, an open-source code base for the MERA assessment, and a leaderboard with a submission system. We evaluate open LMs as baselines and find that they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential societal drawbacks.

Updated: 2024-08-02 13:23:18

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2401.04531v3

Insights from the Design Space Exploration of Flow-Guided Nanoscale Localization

Nanodevices with Terahertz (THz)-based wireless communication capabilities are providing a primer for flow-guided localization within the human bloodstream. Such localization makes it possible to assign the locations of sensed events to the events themselves, offering benefits such as early and precise diagnostics and reduced costs and invasiveness. Flow-guided localization is still in a rudimentary phase, with only a handful of works targeting the problem. Nonetheless, the performance assessments of the proposed solutions are already carried out in a non-standardized way, usually along a single performance metric, ignoring various aspects that are relevant at such a scale (e.g., nanodevices' limited energy) and in such a challenging environment (e.g., extreme attenuation of in-body THz propagation). As such, these assessments feature low levels of realism and cannot be compared objectively. To address this issue, we account for the environmental and scale-related peculiarities of the scenario and assess the performance of two state-of-the-art flow-guided localization approaches along a set of heterogeneous performance metrics, such as the accuracy and reliability of localization.

Updated: 2024-08-02 13:16:48

Categories: cs.NI,cs.LG,eess.SP

Download: http://arxiv.org/abs/2305.18493v3

SeCritMass: Threshold Secret Petitions

We introduce the notion of an $n$-threshold secret petition, in which users add encrypted signatures to a petition, and the signatures are decrypted if and only if at least $n$ signatures have been gathered. This solves the coordination problem in which users wish to sign a petition or commit to a cause, but do not want to be identified as having signed it before enough others have signed it too. We present an implementation of such a petition based on the ElGamal cryptosystem. Applications include reporting misconduct in situations where complainants hesitate to come forward alone, such as in allegations of sexual harassment or police brutality.
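A minimal sketch of the threshold idea, assuming a hypothetical trusted dealer: the dealer creates an ElGamal key pair, Shamir-shares the secret key with threshold $n$, and hands one share to each signer along with the public key. Signatures are ElGamal-encrypted, so they can only be decrypted once $n$ shares are pooled. This is an illustration of the general technique, not the paper's construction, which may avoid the dealer altogether. The group parameters are deliberately tiny and insecure, and every name below is hypothetical.

```python
import random

# Toy group (far too small to be secure): P = 2Q + 1 is a safe prime,
# and G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def shamir_split(secret, threshold, count, rng):
    """Split `secret` (mod Q) into `count` shares; any `threshold` of them recover it."""
    coeffs = [secret] + [rng.randrange(Q) for _ in range(threshold - 1)]
    return [(i, sum(c * pow(i, e, Q) for e, c in enumerate(coeffs)) % Q)
            for i in range(1, count + 1)]


def shamir_recover(shares):
    """Lagrange interpolation of the sharing polynomial at x = 0 over GF(Q)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % Q
                den = den * (xi - xj) % Q
        secret = (secret + yi * num * pow(den, -1, Q)) % Q
    return secret


def encrypt(m, h, rng):
    """ElGamal encryption of an integer 1 <= m < P under public key h = G^x."""
    r = rng.randrange(1, Q)
    return pow(G, r, P), m * pow(h, r, P) % P


def decrypt(ct, x):
    c1, c2 = ct
    return c2 * pow(pow(c1, x, P), -1, P) % P


class ThresholdPetition:
    """Hypothetical trusted dealer: creates the key and hands one key share to each signer."""

    def __init__(self, n, max_signers, seed=0):
        self._rng = random.Random(seed)
        x = self._rng.randrange(1, Q)
        self.public_key = pow(G, x, P)
        self._shares = shamir_split(x, n, max_signers, self._rng)
        self.n = n
        self.entries = []  # (encrypted signature, signer's key share)

    def sign(self, signature):
        share = self._shares.pop()
        self.entries.append((encrypt(signature, self.public_key, self._rng), share))

    def reveal(self):
        """Return all decrypted signatures, or None while fewer than n have signed."""
        if len(self.entries) < self.n:
            return None
        x = shamir_recover([share for _, share in self.entries[: self.n]])
        return sorted(decrypt(ct, x) for ct, _ in self.entries)


pet = ThresholdPetition(n=3, max_signers=10)
pet.sign(101)
pet.sign(202)
print(pet.reveal())  # None
pet.sign(303)
print(pet.reveal())  # [101, 202, 303]
```

The cryptographic guarantee comes from Shamir sharing: fewer than $n$ shares reveal nothing about the secret key, so the early signatures stay hidden until the threshold is reached. The `None` check in `reveal` is only a convenience on top of that.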

Updated: 2024-08-02 13:15:25

Categories: cs.CR,math.NT,94A60,E.3; K.4.2

Download: http://arxiv.org/abs/2408.01255v1

TrIM: Triangular Input Movement Systolic Array for Convolutional Neural Networks -- Part I: Dataflow and Analytical Modelling

In order to keep up with the ever-growing computational complexity and data intensity of state-of-the-art AI models, new computing paradigms are being proposed. These paradigms aim at achieving high energy efficiency by mitigating the Von Neumann bottleneck, which relates to the energy cost of moving data between the processing cores and the memory. Convolutional Neural Networks (CNNs) are particularly susceptible to this bottleneck, given the massive amount of data they have to manage. Systolic Arrays (SAs) are promising architectures to mitigate the data transmission cost, thanks to the high data utilization carried out by an array of Processing Elements (PEs). These PEs continuously exchange and process data locally based on specific dataflows (like weight stationary and row stationary), in turn reducing the number of accesses to the main memory. The hardware specialization of SAs can meet different workloads, ranging from matrix multiplications to multi-dimensional convolutions. In this paper, we propose TrIM: a novel dataflow for SAs based on a Triangular Input Movement and compatible with CNN computing. When compared to state-of-the-art SA dataflows, like weight stationary and row stationary, the high data utilization offered by TrIM guarantees ~10x fewer memory accesses. Furthermore, considering that PEs continuously overlap multiplications and accumulations, TrIM achieves high throughput (up to 81.8% higher than row stationary), while requiring a limited number of registers (up to 15.6x fewer registers than row stationary).
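A back-of-the-envelope count helps explain why input reuse inside a systolic array matters. For a single-channel, valid $k \times k$ convolution on an $h \times w$ input, the sketch compares two idealized extremes: every multiply-accumulate fetching its input from main memory, versus every input being fetched exactly once and then passed between PEs. These are idealized bounds that ignore weights, partial sums, and channels. They are not TrIM's reported figures, and the function name is hypothetical.

```python
def conv_input_reads(h, w, k):
    """Idealized input-memory reads for a valid k x k convolution on an h x w input.

    Returns (no_reuse, full_reuse); weights and partial sums are ignored.
    """
    outputs = (h - k + 1) * (w - k + 1)
    no_reuse = outputs * k * k  # every MAC fetches its input from main memory
    full_reuse = h * w          # each input fetched once, then forwarded between PEs
    return no_reuse, full_reuse


print(conv_input_reads(32, 32, 3))  # (8100, 1024)
```

For a 3x3 kernel, the gap approaches a factor of $k^2 = 9$ as the input grows, which gives a sense of the headroom that dataflows such as weight stationary, row stationary, or TrIM try to capture in practice.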

Updated: 2024-08-02 13:15:17

Categories: cs.AI,cs.AR

Download: http://arxiv.org/abs/2408.01254v1

Metareasoning in uncertain environments: a meta-BAMDP framework

In decision-making scenarios, \textit{reasoning} can be viewed as an algorithm $P$ that makes a choice of an action $a^* \in \mathcal{A}$, aiming to optimize some outcome such as maximizing the value function of a Markov decision process (MDP). However, executing $P$ itself may incur costs (time, energy, limited capacity, etc.) that need to be considered alongside the explicit utility obtained by making the choice in the underlying decision problem. Such costs need to be taken into account in order to accurately model human behavior, as well as to optimize AI planning, as all physical systems are bound to face resource constraints. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems that humans and AI systems face. As a first step, we apply the framework to two-armed Bernoulli bandit (TABB) tasks, which have often been used to study human decision making. Owing to the meta problem's complexity, our solutions are necessarily approximate, but nevertheless robust within a range of assumptions that are arguably realistic for human decision-making scenarios. These results offer a normative framework for understanding human exploration under cognitive constraints. This integration of Bayesian adaptive strategies with metareasoning enriches both the theoretical landscape of decision-making research and practical applications in designing AI systems that plan under uncertainty and resource constraints.
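The object-level problem underneath the meta-BAMDP can be made concrete for the two-armed Bernoulli bandit. The sketch below computes the Bayes-optimal expected reward over a finite horizon by dynamic programming over belief states, assuming independent uniform Beta(1, 1) priors on both arms. It solves only the BAMDP, not the meta-level problem, and it does not model the cost of the computation itself. The function name `bayes_value` is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def bayes_value(belief, horizon):
    """Bayes-optimal expected total reward for a two-armed Bernoulli bandit.

    belief = ((s1, f1), (s2, f2)): successes/failures observed per arm, each arm
    starting from a uniform Beta(1, 1) prior (an assumption of this sketch).
    """
    if horizon == 0:
        return Fraction(0)
    best = Fraction(0)
    for arm in (0, 1):
        s, f = belief[arm]
        p = Fraction(s + 1, s + f + 2)  # posterior mean of Beta(s + 1, f + 1)
        win = tuple((s + 1, f) if a == arm else belief[a] for a in (0, 1))
        loss = tuple((s, f + 1) if a == arm else belief[a] for a in (0, 1))
        q = p * (1 + bayes_value(win, horizon - 1)) + (1 - p) * bayes_value(loss, horizon - 1)
        best = max(best, q)
    return best


print(bayes_value(((0, 0), (0, 0)), 1))  # 1/2
print(bayes_value(((0, 0), (0, 0)), 2))  # 13/12
```

Even this tiny problem requires planning effort that grows with the horizon. A metareasoning agent in the sense of the abstract would weigh that effort against the extra reward it buys, for example by planning only a few steps ahead.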

Updated: 2024-08-02 13:15:01

Categories: cs.AI,cs.SY,eess.SY,q-bio.NC

Download: http://arxiv.org/abs/2408.01253v1

AIM: Automated Input Set Minimization for Metamorphic Security Testing

Although the security testing of Web systems can be automated by generating crafted inputs, solutions to automate the test oracle, i.e., distinguishing correct from incorrect outputs, remain preliminary. Specifically, previous work has demonstrated the potential of metamorphic testing; indeed, security failures can be determined by metamorphic relations that turn valid inputs into malicious inputs. However, without further guidance, metamorphic relations are typically executed on a large set of inputs, which is time-consuming and thus makes metamorphic testing impractical. We propose AIM, an approach that automatically selects inputs to reduce testing costs while preserving vulnerability detection capabilities. AIM includes a clustering-based black-box approach to identify similar inputs based on their security properties. It also relies on a novel genetic algorithm able to efficiently select diverse inputs while minimizing their total cost. Further, it contains a problem-reduction component to reduce the search space and speed up the minimization process. We evaluated the effectiveness of AIM on two well-known Web systems, Jenkins and Joomla, with documented vulnerabilities. We compared AIM's results with four baselines. Overall, AIM reduced metamorphic testing time by 84% for Jenkins and 82% for Joomla, while preserving vulnerability detection. Furthermore, AIM outperformed all the considered baselines in terms of vulnerability coverage.
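The underlying selection problem (cover every cluster of security-relevant behavior while minimizing total execution cost) is a weighted set-cover problem. AIM solves it with a genetic algorithm. The sketch below swaps in the classic greedy heuristic instead, only to make the problem concrete. The input names, cluster ids, and costs are hypothetical.

```python
def greedy_min_cost_cover(inputs):
    """Greedy weighted set cover (a stand-in for AIM's genetic algorithm).

    inputs: name -> (set of cluster ids the input covers, execution cost).
    Repeatedly picks the input with the most newly covered clusters per unit cost.
    """
    uncovered = set().union(*(clusters for clusters, _ in inputs.values()))
    chosen = []
    while uncovered:
        name = max(
            (n for n in inputs if n not in chosen and inputs[n][0] & uncovered),
            key=lambda n: len(inputs[n][0] & uncovered) / inputs[n][1],
        )
        chosen.append(name)
        uncovered -= inputs[name][0]
    return chosen


# Hypothetical inputs: clusters they exercise and their execution cost.
inputs = {
    "a": ({1, 2}, 3.0),
    "b": ({2, 3}, 1.0),
    "c": ({3}, 0.6),
    "d": ({1}, 1.0),
}
print(greedy_min_cost_cover(inputs))  # ['b', 'd']
```

Greedy selection is fast but can get stuck in locally good choices. Population-based searches such as a genetic algorithm trade extra computation for a broader exploration of input subsets.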

Updated: 2024-08-02 13:11:06

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2402.10773v3

Deep progressive reinforcement learning-based flexible resource scheduling framework for IRS and UAV-assisted MEC system

The intelligent reflection surface (IRS) and unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system is widely used in temporary and emergency scenarios. Our goal is to minimize the energy consumption of the MEC system by jointly optimizing UAV locations, IRS phase shift, task offloading, and resource allocation with a variable number of UAVs. To this end, we propose a Flexible REsource Scheduling (FRES) framework that employs a novel deep progressive reinforcement learning approach, which includes the following innovations. Firstly, a novel multi-task agent is presented to deal with the mixed integer nonlinear programming (MINLP) problem. The multi-task agent has two output heads designed for different tasks: a classification head makes offloading decisions with integer variables, while a fitting head solves resource allocation with continuous variables. Secondly, a progressive scheduler is introduced to adapt the agent to the varying number of UAVs by progressively adjusting a part of the neurons in the agent. This structure can naturally accumulate experience and is immune to catastrophic forgetting. Finally, a light taboo search (LTS) is introduced to enhance the global search of FRES. The numerical results demonstrate the superiority of the FRES framework, which can perform real-time and optimal resource scheduling even in dynamic MEC systems.
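For readers unfamiliar with taboo (tabu) search, here is a generic sketch over binary offloading decisions, one bit per task. At each step it moves to the best single-bit flip that is not on the tabu list. A tabu move is still allowed if it would beat the best solution found so far (aspiration). The cost function is a hypothetical stand-in, and the sketch does not reproduce the "light" variant used in FRES, whose details the abstract does not give.

```python
from collections import deque


def tabu_search(cost, start, iters=50, tenure=3):
    """Generic tabu search over bit vectors using single-bit-flip moves."""
    current = list(start)
    best, best_cost = list(current), cost(current)
    tabu = deque(maxlen=tenure)  # recently flipped indices
    for _ in range(iters):
        candidates = []
        for i in range(len(current)):
            neighbor = current.copy()
            neighbor[i] ^= 1
            c = cost(neighbor)
            # Aspiration: a tabu move is allowed if it beats the best so far.
            if i not in tabu or c < best_cost:
                candidates.append((c, i, neighbor))
        if not candidates:
            break
        c, i, current = min(candidates)  # best admissible neighbor, even if worse
        tabu.append(i)
        if c < best_cost:
            best, best_cost = list(current), c
    return best, best_cost


# Hypothetical cost: distance to an (unknown to the search) best offloading pattern.
target = [1, 0, 1, 1, 0]
cost = lambda x: sum(a != b for a, b in zip(x, target))
print(tabu_search(cost, [0] * 5))  # ([1, 0, 1, 1, 0], 0)
```

Accepting the best non-tabu neighbor even when it is worse than the current solution is what lets tabu search escape local minima. In FRES, this kind of search refines the agent's discrete decisions.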

Updated: 2024-08-02 13:10:33

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.01248v1

MapComp: A Secure View-based Collaborative Analytics Framework for Join-Group-Aggregation

This paper introduces MapComp, a novel view-based framework that facilitates join-group-aggregation (JGA) queries for collaborative analytics. Through a specially crafted materialized view for joins and novel group-aggregation (GA) protocols, MapComp removes duplicated join workload and expedites the subsequent GA, improving the efficiency of JGA query execution. To support continuous data updates, the materialized view offers a payload-independence feature, which significantly speeds up view refreshing and incurs no MPC overhead. This feature also enables further acceleration of GA, for which we devise multiple novel protocols that outperform prior work. Notably, our work is the first to expedite secure collaborative JGA queries using materialized views. Our experiments demonstrate a significant advantage of MapComp, which achieves up to a 2189.9x efficiency improvement over the non-view-based baseline when executing queries eight times.

Updated: 2024-08-02 13:08:06

Categories: cs.CR

Download: http://arxiv.org/abs/2408.01246v1
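
The sketch below makes the JGA workload concrete. It evaluates a join-group-aggregation query in plaintext Python, caches the join result as a list of matching row-index pairs, and reuses that list for several group-aggregations. It deliberately omits everything that makes MapComp secure: there is no MPC, no secret sharing and no protocol messaging. The function names and the index-pair "view" are illustrative assumptions. Because the view stores only row indices and not payload values, changing a payload column leaves it valid, which loosely mirrors the payload-independence idea in the abstract.

```python
from collections import defaultdict

# Plaintext illustration only; names and the index-pair view are assumptions.


def build_join_view(left_keys, right_keys):
    """Hypothetical plaintext join view: (left_row, right_row) pairs with equal keys."""
    by_key = defaultdict(list)
    for j, key in enumerate(right_keys):
        by_key[key].append(j)
    return [(i, j) for i, key in enumerate(left_keys) for j in by_key.get(key, [])]


def group_sum(view, group_col, value_col):
    """SUM(value) GROUP BY group over the joined rows listed in the view."""
    totals = defaultdict(int)
    for i, j in view:
        totals[group_col[i]] += value_col[j]
    return dict(totals)


# Party R owns join keys and group labels; party S owns join keys and payloads.
r_keys, r_groups = [1, 2, 3, 2], ["a", "b", "a", "c"]
s_keys, s_amounts = [2, 3, 3, 4], [10, 5, 7, 100]

view = build_join_view(r_keys, s_keys)                 # join computed once
sums = group_sum(view, r_groups, s_amounts)            # first GA query
counts = group_sum(view, r_groups, [1] * len(s_keys))  # second GA query reuses the view
```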

Automated Classification of Dry Bean Varieties Using XGBoost and SVM Models

This paper presents a comparative study on the automated classification of seven different varieties of dry beans using machine learning models. Leveraging a dataset of 12,909 dry bean samples, reduced from an initial 13,611 through outlier removal and feature extraction, we applied Principal Component Analysis (PCA) for dimensionality reduction and trained two multiclass classifiers: XGBoost and Support Vector Machine (SVM). The models were evaluated using nested cross-validation to ensure robust performance assessment and hyperparameter tuning. The XGBoost and SVM models achieved overall correct classification rates of 94.00% and 94.39%, respectively. The results underscore the efficacy of these machine learning approaches in agricultural applications, particularly in enhancing the uniformity and efficiency of seed classification. This study contributes to the growing body of work on precision agriculture, demonstrating that automated systems can significantly support seed quality control and crop yield optimization. Future work will explore incorporating more diverse datasets and advanced algorithms to further improve classification accuracy.

Updated: 2024-08-02 13:05:33

Categories: cs.LG

Download: http://arxiv.org/abs/2408.01244v1
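
The dry-bean study uses nested cross-validation, which keeps hyperparameter tuning (inner folds) separate from performance estimation (outer folds). The plain-Python sketch below only generates the index splits. The fold counts, the contiguous (unshuffled, unstratified) folds and the function names are assumptions for illustration. The PCA, XGBoost and SVM steps are left out.

```python
# Illustrative sketch: fold counts and contiguous folds are assumptions.


def k_fold(indices, k):
    """Split a list of indices into k contiguous folds of near-equal size."""
    n = len(indices)
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def nested_cv_splits(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation."""
    all_idx = list(range(n_samples))
    for outer_test in k_fold(all_idx, outer_k):
        test_set = set(outer_test)
        outer_train = [i for i in all_idx if i not in test_set]
        inner_splits = []
        # Inner folds are carved only from outer_train, never from outer_test.
        for val in k_fold(outer_train, inner_k):
            val_set = set(val)
            inner_train = [i for i in outer_train if i not in val_set]
            inner_splits.append((inner_train, val))
        yield outer_train, outer_test, inner_splits


splits = list(nested_cv_splits(20, outer_k=5, inner_k=3))
```

In each outer iteration, candidate hyperparameters (for example, the SVM's C or XGBoost's tree depth) would be scored on the inner splits. The best setting is then refit on `outer_train` and scored once on `outer_test`, and averaging those outer scores gives the reported estimate. In practice, the indices would first be shuffled and stratified by bean variety.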

MSMA: Multi-agent Trajectory Prediction in Connected and Autonomous Vehicle Environment with Multi-source Data Integration

The prediction of surrounding vehicle trajectories is crucial for collision-free path planning. In this study, we focus on a scenario where a connected and autonomous vehicle (CAV) serves as the central agent, using both sensors and communication technologies to perceive surrounding traffic consisting of autonomous vehicles (AVs), connected vehicles (CVs), and human-driven vehicles (HDVs). We predict trajectories for all detected surrounding vehicles. To effectively integrate multi-source data from sensor and communication technologies, we propose a deep learning framework called MSMA that uses a cross-attention module for multi-source data fusion. Vector map data provide contextual information. The trajectory dataset is collected in the CARLA simulator, with synthesized data errors introduced. Numerical experiments demonstrate that in a mixed traffic flow scenario, integrating data from different sources enhances our understanding of the environment. This notably improves trajectory prediction accuracy, particularly at high CV market penetration rates. The code is available at: https://github.com/xichennn/MSMA.

Updated: 2024-08-02 13:03:00

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2407.21310v2
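
MSMA fuses sensor-based and communication-based observations with a cross-attention module. The sketch below shows plain scaled dot-product cross-attention, where features from one source (the queries) attend over features from another (the keys and values). It omits the learned query/key/value projections, multiple heads and everything else in MSMA's actual module. Treating sensor tracks as queries and V2X messages as keys and values is an illustrative assumption.

```python
import math

# Illustrative sketch: no learned projections or heads; source roles are assumptions.


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention of `queries` (source A) over `keys`/`values` (source B)."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # one weight per source-B element, summing to 1
        fused.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return fused


sensor_feats = [[1.0, 0.0], [0.0, 1.0]]           # e.g. two tracked vehicles (queries)
v2x_feats = [[1.0, 0.2], [0.1, 1.0], [0.5, 0.5]]  # e.g. three received messages
fused = cross_attention(sensor_feats, v2x_feats, v2x_feats)
```

Each fused row is a convex combination of the other source's values, weighted by how well each key matches the query.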

Real-time gravitational-wave inference for binary neutron stars using machine learning

Mergers of binary neutron stars (BNSs) emit signals in both the gravitational-wave (GW) and electromagnetic (EM) spectra. Famously, the 2017 multi-messenger observation of GW170817 led to scientific discoveries across cosmology, nuclear physics, and gravity. Central to these results were the sky localization and distance obtained from GW data, which, in the case of GW170817, helped to identify the associated EM transient, AT 2017gfo, 11 hours after the GW signal. Fast analysis of GW data is critical for directing time-sensitive EM observations; however, due to challenges arising from the length and complexity of signals, it is often necessary to make approximations that sacrifice accuracy. Here, we present a machine learning framework that performs complete BNS inference in just one second without making any such approximations. Our approach enhances multi-messenger observations by providing (i) accurate localization even before the merger; (ii) localization precision improved by ~30% compared to approximate low-latency methods; and (iii) detailed information on luminosity distance, inclination, and masses, which can be used to prioritize expensive telescope time. Additionally, the flexibility and reduced cost of our method open new opportunities for equation-of-state studies. Finally, we demonstrate that our method scales to extremely long signals, up to an hour in length, thus serving as a blueprint for data analysis in next-generation ground- and space-based detectors.

Updated: 2024-08-02 13:00:54

Categories: gr-qc,astro-ph.IM,cs.LG

Download: http://arxiv.org/abs/2407.09602v2

Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving

Current research on trajectory prediction primarily relies on data collected by the onboard sensors of an ego vehicle. With the rapid advancement of connected technologies, such as vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, valuable information from alternative views becomes accessible via wireless networks. Integrating information from alternative views has the potential to overcome the inherent limitations of a single viewpoint, such as occlusions and a limited field of view. In this work, we introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models. Unlike previous approaches, in which multi-view data are manually fused or handled in a separate training stage, our model supports end-to-end training, enhancing both flexibility and performance. Moreover, the predicted multimodal trajectories are calibrated by a post-hoc conformal prediction module to obtain valid and efficient confidence regions. We evaluated the entire framework on the real-world V2I dataset V2X-Seq. Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU. The code is publicly available at: https://github.com/xichennn/V2I_trajectory_prediction.

Updated: 2024-08-02 13:00:46

Categories: cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.00374v2
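
V2INet calibrates its predicted trajectories with a post-hoc conformal prediction module. The sketch below shows generic split conformal prediction applied to trajectory endpoints. Nonconformity scores are computed on a held-out calibration set, and their finite-sample-corrected quantile becomes the radius of a confidence region. Two choices are assumptions for illustration rather than V2INet's documented design: the score is the minimum final displacement error over predicted modes, and the region is a disc around each predicted endpoint.

```python
import math

# Generic split conformal prediction; score and region shape are assumptions.


def min_fde(pred_endpoints, true_endpoint):
    """Nonconformity score: distance from the true endpoint to the closest predicted mode."""
    return min(math.dist(p, true_endpoint) for p in pred_endpoints)


def conformal_radius(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # not enough calibration samples for this alpha
    return sorted(cal_scores)[rank - 1]


def in_region(pred_endpoints, point, radius):
    """The region is the union of discs of `radius` around each predicted endpoint."""
    return any(math.dist(p, point) <= radius for p in pred_endpoints)
```

Under the usual exchangeability assumption, a fresh trajectory's score falls at or below this radius with probability at least 1 - alpha. That is the validity guarantee conformal prediction provides. A smaller radius at the same coverage is what makes the regions efficient.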

Tailoring Graph Neural Network-based Flow-guided Localization to Individual Bloodstreams and Activities

Flow-guided localization using in-body nanodevices in the bloodstream is expected to be beneficial for early disease detection, continuous monitoring of biological conditions, and targeted treatment. The nanodevices face size and power constraints that produce erroneous raw data for localization purposes. On-body anchors receive this data and use it to derive the locations of diagnostic events of interest. Different Machine Learning (ML) approaches have recently been proposed for this task, yet they are currently restricted to a reference bloodstream of a resting patient. As such, they cannot deal with the physical diversity of patients' bloodstreams and cannot provide continuous monitoring as individual patients' activities change. Toward addressing these issues for the current State-of-the-Art (SotA) flow-guided localization approach based on Graph Neural Networks (GNNs), we propose a pipeline for GNN adaptation based on individual physiological indicators, including height, weight, and heart rate. Our results indicate that the proposed adaptations are beneficial in reconciling the individual differences between bloodstreams and activities.

Updated: 2024-08-02 12:58:08

Categories: cs.LG,cs.AI,cs.ET,cs.NI

Download: http://arxiv.org/abs/2408.01239v1

HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling

In the field of robotic control, designing individual controllers for each robot leads to high computational costs. Universal control policies, applicable across diverse robot morphologies, promise to mitigate this challenge. Models based on Graph Neural Networks (GNNs) and Transformers predominate, owing to their effectiveness in capturing relational dynamics across a robot's limbs. However, these models typically employ homogeneous graph structures that overlook the functional diversity of different limbs. To bridge this gap, we introduce HeteroMorpheus, a novel method based on a heterogeneous graph Transformer. This method uniquely addresses limb heterogeneity, fostering a better representation of the dynamics of robots with various morphologies. Through extensive experiments, we demonstrate the superiority of HeteroMorpheus over state-of-the-art methods in policy generalization, including zero-shot generalization and sample-efficient transfer to unfamiliar robot morphologies.

Updated: 2024-08-02 12:40:01

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2408.01230v1
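
The key difference from a homogeneous GNN is that messages from different kinds of limbs pass through different parameters. The sketch below shows a single message-passing step. Each source node's features are projected by a weight matrix selected by its limb type and then averaged at the receiving node. It is not a heterogeneous graph Transformer, since there is no attention. The limb types, square weight matrices and mean aggregation are all assumptions for illustration.

```python
# Illustrative sketch: limb types, square weights and mean aggregation are assumptions.


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def hetero_message_pass(features, node_types, edges, type_weights):
    """One step of type-aware message passing.

    features:     per-node feature vectors (all the same length d)
    node_types:   per-node type label, e.g. "torso" or "joint" (hypothetical)
    edges:        directed (src, dst) pairs
    type_weights: type label -> d x d matrix used for messages sent by that type
    """
    inbox = [[] for _ in features]
    for src, dst in edges:
        inbox[dst].append(matvec(type_weights[node_types[src]], features[src]))
    updated = []
    for i, msgs in enumerate(inbox):
        if not msgs:
            updated.append(list(features[i]))  # isolated node keeps its features
        else:
            updated.append([sum(m[c] for m in msgs) / len(msgs) for c in range(len(msgs[0]))])
    return updated


weights = {"torso": [[2.0, 0.0], [0.0, 2.0]], "joint": [[1.0, 0.0], [0.0, 1.0]]}
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
types = ["torso", "joint", "joint"]
chain = [(0, 1), (1, 0), (1, 2), (2, 1)]  # torso - joint - joint
out = hetero_message_pass(feats, types, chain, weights)
```

With a single shared matrix, the torso's message would be processed exactly like a joint's. Giving each limb type its own parameters is the simplest way to encode the functional heterogeneity the abstract refers to.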

Arithmetic with Language Models: from Memorization to Computation

A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model trained to predict the next token can perform arithmetic computations that generalize beyond the training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities that make smooth input interpolation ineffective for novel data. We successfully trained a lightweight language model to learn these tasks and ran a number of experiments to investigate its extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine, where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.

Updated: 2024-08-02 12:39:17

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2308.01154v4
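
The sketch below shows why binary arithmetic suits a next-token predictor by encoding problems as character-level token sequences over a five-symbol vocabulary. The exact formatting (most-significant bit first, no padding, '+' and '*' as operator tokens) is an assumption; the paper may order or pad digits differently. The examples 111+0=111 and 111+1=1000 show the kind of input/output discontinuity the abstract mentions. A one-unit change in the input changes every digit of the output and its length.

```python
import random

# Token format (MSB first, no padding) is an assumption for illustration.
VOCAB = ["0", "1", "+", "*", "="]


def make_example(a, b, op="+"):
    """Encode one problem as tokens, e.g. 5 + 3 -> 1 0 1 + 1 1 = 1 0 0 0."""
    result = a + b if op == "+" else a * b
    return list(f"{a:b}{op}{b:b}={result:b}")


def make_dataset(n, max_bits, op="+", seed=0):
    """Sample n random problems with operands of at most max_bits bits."""
    rng = random.Random(seed)
    hi = 2 ** max_bits - 1
    return [make_example(rng.randint(0, hi), rng.randint(0, hi), op) for _ in range(n)]


data = make_dataset(100, max_bits=8, op="*")
```

A model trained on such sequences would receive the prompt up to '=' and generate the result tokens. Extrapolation could then be probed by testing on operands longer than `max_bits`.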

Modelling Assessment Rubrics through Bayesian Networks: a Pragmatic Approach

Automatic assessment of learner competencies is a fundamental task in intelligent tutoring systems. An assessment rubric typically and effectively describes relevant competencies and competence levels. This paper presents an approach to deriving a learner model directly from an assessment rubric that defines some (partial) ordering of competence levels. The model is based on Bayesian networks and exploits logical gates with uncertainty (often referred to as noisy gates) to reduce the number of model parameters, so as to simplify their elicitation by experts and allow real-time inference in intelligent tutoring systems. We illustrate how the approach can be applied to automate the human assessment of an activity developed for testing computational thinking skills. The simple elicitation of the model starting from the assessment rubric opens up the possibility of quickly automating the assessment of several tasks, making them more easily exploitable in the context of adaptive assessment tools and intelligent tutoring systems.

Updated: 2024-08-02 12:27:17

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2209.05467v3
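
Noisy gates keep the parameter count small. With n binary parents, a full conditional probability table needs 2^n entries, whereas a noisy-OR gate needs one inhibition probability per parent plus an optional leak. The sketch below computes P(Y = 1 | parents) for such a gate. The abstract does not say how the rubric's competence levels map onto gate inputs, so the parent semantics and the numbers used here are assumptions.

```python
# Standard leaky noisy-OR; mapping rubric levels to parents is an assumption.


def noisy_or(active, inhibit, leak=0.0):
    """P(Y = 1 | parents) for a leaky noisy-OR gate.

    active[i]:  True if parent i is in its 'present' state
    inhibit[i]: probability that active parent i alone fails to turn Y on
    leak:       probability that Y is on even when no parent is active
    """
    p_off = 1.0 - leak
    for is_active, q in zip(active, inhibit):
        if is_active:
            p_off *= q  # each active parent independently fails with prob. q
    return 1.0 - p_off


inhibit = [0.2, 0.4]  # hypothetical values an expert might elicit
both = noisy_or([True, True], inhibit)     # 1 - 0.2 * 0.4
first = noisy_or([True, False], inhibit)   # 1 - 0.2
neither = noisy_or([False, False], inhibit, leak=0.1)
```

For two parents, an expert elicits two numbers (three with a leak) instead of the four rows of a full table, and the saving grows exponentially with the number of parents.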

Rubric-based Learner Modelling via Noisy Gates Bayesian Networks for Computational Thinking Skills Assessment

In modern and personalised education, there is a growing interest in developing learners' competencies and accurately assessing them. In a previous work, we proposed a procedure for deriving a learner model for automatic skill assessment from a task-specific competence rubric, thus simplifying the implementation of automated assessment tools. The previous approach, however, suffered from two main limitations: (i) the ordering between competencies defined by the assessment rubric was only indirectly modelled; (ii) supplementary skills, not under assessment but necessary for accomplishing the task, were not included in the model. In this work, we address issue (i) by introducing dummy observed nodes, strictly enforcing the skills ordering without changing the network's structure. For issue (ii), we design a network with two layers of gates, one performing disjunctive operations through noisy-OR gates and the other conjunctive operations through logical ANDs. These changes improve the coherence of the model's outcomes and the flexibility of the modelling tool without compromising the model's compact parametrisation, interpretability, or simple elicitation by experts. We used this approach to develop a learner model for Computational Thinking (CT) skills assessment. The CT-cube skills assessment framework and the Cross Array Task (CAT) are used to exemplify it and demonstrate its feasibility.

Updated: 2024-08-02 12:21:05

Categories: cs.AI,cs.ET

Download: http://arxiv.org/abs/2408.01221v1
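
The two-layer structure in this abstract uses noisy-OR gates for disjunctions and logical ANDs for conjunctions. It can be evaluated directly when each noisy-OR node has its own set of parent skills. A deterministic AND is on only if every intermediate node is on, and those nodes are conditionally independent given their parents, so the probability is a product. The grouping of skills below is hypothetical, and the sketch does not model the dummy observed nodes used to enforce skill ordering.

```python
# Illustrative sketch: skill grouping is hypothetical; assumes disjoint parent sets.


def noisy_or(active, inhibit, leak=0.0):
    """P(node = 1 | parents) for a leaky noisy-OR gate."""
    p_off = 1.0 - leak
    for is_active, q in zip(active, inhibit):
        if is_active:
            p_off *= q
    return 1.0 - p_off


def and_of_noisy_ors(groups):
    """P(AND = 1 | skills) when the AND's inputs are noisy-OR nodes with disjoint parents.

    groups: one (active, inhibit, leak) tuple per noisy-OR node.
    """
    p = 1.0
    for active, inhibit, leak in groups:
        p *= noisy_or(active, inhibit, leak)
    return p


# Hypothetical task: needs (skill A OR skill B) AND supplementary skill C.
p_task = and_of_noisy_ors([
    ([True, False], [0.2, 0.5], 0.0),  # A present, B absent
    ([True], [0.1], 0.0),              # C present
])
```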

Improving Retrieval Augmented Language Model with Self-Reasoning

The Retrieval-Augmented Language Model (RALM) has shown remarkable performance on knowledge-intensive tasks by incorporating external knowledge during inference, which mitigates the factual hallucinations inherent in large language models (LLMs). Despite these advancements, challenges persist in the implementation of RALMs, particularly concerning their reliability and traceability. Specifically, retrieving irrelevant documents may result in unhelpful responses or even degrade LLM performance, while the lack of proper citations in generated outputs complicates efforts to verify the trustworthiness of the models. To this end, we propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs, whose core idea is to leverage reasoning trajectories generated by the LLM itself. The framework constructs self-reasoning trajectories through three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process. We evaluated our framework on four public datasets (two short-form QA datasets, one long-form QA dataset, and one fact verification dataset) to demonstrate the superiority of our method, which outperforms existing state-of-the-art models and achieves performance comparable to GPT-4 while using only 2,000 training samples.

Updated: 2024-08-02 12:11:17

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.19813v2

ZNorm: Z-Score Gradient Normalization for Accelerating Neural Network Training

The rapid advancements in deep learning necessitate efficient training methods for deep neural networks (DNNs). As models grow in complexity, vanishing and exploding gradients impede convergence and degrade performance. We propose Z-Score Normalization for Gradient Descent (ZNorm), an innovative technique that adjusts only the gradients to enhance training efficiency and improve model performance. ZNorm normalizes the overall gradients, providing consistent gradient scaling across layers and thereby reducing the risk of vanishing and exploding gradients. Our extensive experiments on CIFAR-10 and medical datasets demonstrate that ZNorm not only accelerates convergence but also improves performance metrics. ZNorm consistently outperforms existing methods, achieving superior results under the same computational settings. In medical imaging applications, ZNorm improves tumor prediction and segmentation performance, underscoring its practical utility. These findings highlight ZNorm's potential as a robust and versatile tool for improving the efficiency and effectiveness of deep neural network training across a wide range of architectures and applications.

Updated: 2024-08-02 12:04:19

Categories: cs.LG

Download: http://arxiv.org/abs/2408.01215v1
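
The name implies a z-score applied to gradients: subtract the mean and divide by the standard deviation before the update. The plain-Python sketch below applies this per parameter tensor inside a plain gradient-descent step. Three details are assumptions rather than facts confirmed by the abstract: whether ZNorm normalises per layer or over all gradients jointly, the epsilon value, and the optimiser it is paired with.

```python
import math

# Illustrative sketch: per-tensor normalisation, eps and plain SGD are assumptions.


def znorm(grad, eps=1e-8):
    """Return (g - mean) / (std + eps) for one flattened gradient tensor."""
    n = len(grad)
    mean = sum(grad) / n
    std = math.sqrt(sum((g - mean) ** 2 for g in grad) / n)
    return [(g - mean) / (std + eps) for g in grad]


def sgd_step_with_znorm(params, grads, lr=0.01):
    """One gradient-descent step where every tensor's gradient is z-scored first."""
    return {
        name: [p - lr * g for p, g in zip(params[name], znorm(grads[name]))]
        for name in params
    }


# Gradients of very different magnitudes end up on the same scale.
small = znorm([1.0, 2.0, 3.0])
large = znorm([100.0, 200.0, 300.0])
params = sgd_step_with_znorm({"w": [0.5, 0.5, 0.5]}, {"w": [0.001, 0.002, 0.003]})
```

Z-scoring discards each tensor's raw gradient magnitude, so the learning rate alone sets the step size. This offers one way to see why such normalisation can guard against vanishing or exploding updates.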

Revisiting the Robust Alignment of Circuit Breakers

Over the past decade, adversarial training has emerged as one of the few reliable methods for enhancing model robustness against adversarial attacks [Szegedy et al., 2014, Madry et al., 2018, Xhonneux et al., 2024], while many alternative approaches have failed to withstand rigorous subsequent evaluations. Recently, an alternative defense mechanism, namely "circuit breakers" [Zou et al., 2024], has shown promising results for aligning LLMs. In this report, we show that the robustness claims of "Improving Alignment and Robustness with Circuit Breakers" [Zou et al., 2024] against unconstrained continuous attacks in the embedding space of the input tokens may be overestimated. Specifically, we demonstrate that by implementing a few simple changes to embedding space attacks [Schwinn et al., 2024a,b], we achieve a 100% attack success rate (ASR) against circuit breaker models. Without any further hyperparameter tuning, these adjustments increase the ASR by more than 80% compared to the original evaluation. Code is available at: https://github.com/SchwinnL/circuit-breakers-eval

Updated: 2024-08-02 12:02:47

Categories: cs.CR

Download: http://arxiv.org/abs/2407.15902v2

High-Throughput Phenotyping of Clinical Text Using Large Language Models

High-throughput phenotyping automates the mapping of patient signs to standardized ontology concepts and is essential for precision medicine. This study evaluates the automation of phenotyping of clinical summaries from the Online Mendelian Inheritance in Man (OMIM) database using large language models. Due to their rich phenotype data, these summaries can be surrogates for physician notes. We conduct a performance comparison of GPT-4 and GPT-3.5-Turbo. Our results indicate that GPT-4 surpasses GPT-3.5-Turbo in identifying, categorizing, and normalizing signs, achieving concordance with manual annotators comparable to inter-rater agreement. Despite some limitations in sign normalization, the extensive pre-training of GPT-4 results in high performance and generalizability across several phenotyping tasks while obviating the need for manually annotated training data. Large language models are expected to be the dominant method for automating high-throughput phenotyping of clinical text.

Updated: 2024-08-02 12:00:00

Categories: cs.CL,cs.AI,I.7; I.2

Download: http://arxiv.org/abs/2408.01214v1

Unsupervised Graph-based Learning Method for Sub-band Allocation in 6G Subnetworks

In this paper, we present an unsupervised approach for frequency sub-band allocation in wireless networks using graph-based learning. We consider a dense deployment of subnetworks in a factory environment with a limited number of sub-bands, which must be optimally allocated to manage inter-subnetwork interference. We model the subnetwork deployment as a conflict graph and propose an unsupervised learning approach, inspired by the graph colouring heuristic and the Potts model, that optimizes the sub-band allocation using graph neural networks. The numerical evaluation shows that the proposed method achieves performance close to that of the centralized greedy colouring sub-band allocation heuristic, with lower computational time complexity. In addition, it incurs less signalling overhead than iterative optimization heuristics that require all the mutually interfering channel information. We further demonstrate that the method is robust to different network settings.

Updated: 2024-08-02 11:54:57

Categories: cs.NI,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2401.00950v2
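
The centralized baseline in this abstract is a greedy graph-colouring heuristic on the conflict graph. Nodes are subnetworks, edges join mutually interfering subnetworks, and colours are sub-bands. The sketch below is a generic version of such a heuristic: nodes are visited highest-degree first, and each takes the sub-band least used by its already-assigned neighbours. It also computes a Potts-style energy that counts edges whose endpoints share a sub-band. The ordering and tie-breaking rules are assumptions, and the paper's GNN-based method is not shown.

```python
# Generic greedy colouring; ordering and tie-breaking rules are assumptions.


def greedy_subband_allocation(n_nodes, edges, n_subbands):
    """Greedy colouring of the conflict graph with a fixed number of sub-bands."""
    neighbours = [set() for _ in range(n_nodes)]
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    # Visit the most-constrained subnetworks (highest degree) first.
    order = sorted(range(n_nodes), key=lambda u: -len(neighbours[u]))
    band = [None] * n_nodes
    for u in order:
        used = [0] * n_subbands
        for v in neighbours[u]:
            if band[v] is not None:
                used[band[v]] += 1
        # Pick the sub-band that collides with the fewest assigned neighbours.
        band[u] = min(range(n_subbands), key=lambda b: used[b])
    return band


def potts_energy(band, edges):
    """Potts-style energy: number of conflict edges whose endpoints share a sub-band."""
    return sum(1 for u, v in edges if band[u] == band[v])


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
triangle = [(0, 1), (1, 2), (2, 0)]
ring_bands = greedy_subband_allocation(4, ring, n_subbands=2)
tri_bands = greedy_subband_allocation(3, triangle, n_subbands=2)
```

A triangle of mutually interfering subnetworks with only two sub-bands cannot reach zero energy. This is why, with limited sub-bands, the allocation is framed as minimising conflicts rather than eliminating them.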

A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications

Deep learning has revolutionized biomedical research by providing sophisticated methods to handle complex, high-dimensional data. Multimodal deep learning (MDL) further enhances this capability by integrating diverse data types such as imaging, textual data, and genetic information, leading to more robust and accurate predictive models. In MDL, unlike early and late fusion methods, intermediate fusion stands out for its ability to effectively combine modality-specific features during the learning process. This systematic review aims to comprehensively analyze and formalize current intermediate fusion methods in biomedical applications. We investigate the techniques employed, the challenges faced, and potential future directions for advancing intermediate fusion methods. Additionally, we introduce a structured notation to enhance the understanding and application of these methods beyond the biomedical domain. Our findings are intended to support researchers, healthcare professionals, and the broader deep learning community in developing more sophisticated and insightful multimodal models. Through this review, we aim to provide a foundational framework for future research and practical applications in the dynamic field of MDL.

Updated: 2024-08-02 11:48:04

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.02686v1
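
The review distinguishes early, intermediate and late fusion by where the modalities are combined. The toy sketch below contrasts the three on two hypothetical inputs, an "imaging" vector and a "text" vector. The linear-plus-ReLU encoders and the probability averaging used for late fusion are illustrative choices, not the review's notation.

```python
# Toy contrast of fusion stages; encoders and averaging rule are assumptions.


def encode(x, weights):
    """Toy modality-specific encoder: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def early_fusion(x_img, x_txt):
    # Combine raw inputs before any learned processing.
    return x_img + x_txt


def intermediate_fusion(x_img, x_txt, w_img, w_txt):
    # Encode each modality separately, then join the learned features;
    # a joint head trained together with the encoders would consume this vector.
    return encode(x_img, w_img) + encode(x_txt, w_txt)


def late_fusion(p_img, p_txt):
    # Combine per-modality predictions, here by averaging class probabilities.
    return [(a + b) / 2 for a, b in zip(p_img, p_txt)]


x_img, x_txt = [1.0, -2.0], [3.0]
mid = intermediate_fusion(x_img, x_txt, [[1.0, 0.0], [0.0, 1.0]], [[2.0]])
```

In intermediate fusion, the joint head sees learned, modality-specific features rather than raw inputs or final predictions, so gradients from the shared objective can shape each encoder. In this toy setting, that is what combining modality-specific features during the learning process amounts to.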

Efficient Test Data Generation for MC/DC with OCL and Search

System-level testing of avionics software systems requires compliance with different international safety standards such as DO-178C. An important consideration for the avionics industry is automated test data generation according to the criteria suggested by safety standards. One of the criteria recommended by DO-178C is the modified condition/decision coverage (MC/DC) criterion. Current model-based test data generation approaches use constraints written in the Object Constraint Language (OCL) and apply search techniques to generate test data. These approaches either do not support the MC/DC criterion or suffer from performance issues when generating test data for large-scale avionics systems. In this paper, we propose an effective way to automate MC/DC test data generation during model-based testing. We develop a strategy that utilizes case-based reasoning (CBR) and range reduction heuristics designed to solve MC/DC-tailored OCL constraints. We performed an empirical study comparing our MC/DC test data generation strategy, using CBR, range reduction, or both CBR and range reduction, against an original search algorithm and random search. We also empirically compared our strategy with existing constraint-solving approaches. The results show that both CBR and range reduction for MC/DC test data generation outperform the baseline approach. Moreover, combining CBR and range reduction for MC/DC test data generation is effective compared to existing constraint solvers.

Updated: 2024-08-02 11:39:03

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2401.03469v3
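
MC/DC requires showing that each condition in a decision independently affects the outcome. For every condition, there must be a pair of tests that differ only in that condition and produce different decision outcomes. The brute-force sketch below finds such independence pairs by enumerating all 2^n truth assignments. That enumeration becomes infeasible for large systems, which motivates search over OCL constraints. The example decision is hypothetical, and the sketch covers only the unique-cause form of MC/DC.

```python
from itertools import product

# Brute-force unique-cause MC/DC; the example decision is hypothetical.


def mcdc_pairs(decision, n_conditions):
    """Map each condition index to the (test_false, test_true) pairs that show its
    independent effect: the two tests differ only in that condition and the
    decision outcome flips."""
    tests = list(product([False, True], repeat=n_conditions))
    pairs = {}
    for i in range(n_conditions):
        pairs[i] = []
        for t in tests:
            if t[i]:
                continue  # build each pair once, starting from the False side
            flipped = t[:i] + (True,) + t[i + 1:]
            if decision(*t) != decision(*flipped):
                pairs[i].append((t, flipped))
    return pairs


def decision(a, b, c):
    return (a and b) or c


pairs = mcdc_pairs(decision, 3)
```

For this decision, choosing one pair per condition and merging shared tests gives four tests: (F, T, F), (T, T, F), (T, F, F) and (F, T, T). Together they cover all three conditions.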

Automated System-level Testing of Unmanned Aerial Systems

Unmanned aerial systems (UAS) rely on various avionics systems that are safety-critical and mission-critical. A major requirement of international safety standards is to perform rigorous system-level testing of avionics software systems. The current industrial practice is to manually create test scenarios, execute these scenarios manually or automatically using simulators, and manually evaluate outcomes. The test scenarios typically consist of setting certain flight or environment conditions and testing the system under test in these settings. State-of-the-art approaches for this purpose also require manual test scenario development and evaluation. In this paper, we propose a novel approach to automate the system-level testing of UAS. The proposed approach (AITester) utilizes model-based testing and artificial intelligence (AI) techniques to automatically generate, execute, and evaluate various test scenarios. The test scenarios are generated on the fly, i.e., during test execution, based on the environmental context at runtime. The approach is supported by a toolset. We empirically evaluate the proposed approach on two core components of UAS: an autopilot system of an unmanned aerial vehicle (UAV) and the cockpit display systems (CDS) of the ground control station (GCS). The results show that AITester effectively generates test scenarios causing deviations from the expected behavior of the UAV autopilot and reveals potential flaws in the GCS-CDS.

Updated: 2024-08-02 11:36:14

Fields: cs.SE,cs.AI,cs.RO

Download: http://arxiv.org/abs/2403.15857v2

Certifiably Robust Encoding Schemes

Quantum machine learning uses principles from quantum mechanics to process data, offering potential advances in speed and performance. However, previous work has shown that these models are susceptible to attacks that manipulate input data or exploit noise in quantum circuits. Following this, various studies have explored the robustness of these models, focusing on certifying robustness against manipulations of the quantum states. We extend this line of research by investigating robustness against perturbations of the classical data for a general class of data encoding schemes. We show that, for such schemes, adding suitable noise channels is equivalent to evaluating the mean value of the noiseless classifier at the smoothed data, akin to Randomized Smoothing in classical machine learning. Using our general framework, we show that suitably added phase-damping noise channels improve both empirical and provable robustness for the considered class of encoding schemes.
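
The abstract relates added noise channels to Randomized Smoothing. As a point of reference, the following is a minimal sketch of the classical Gaussian randomized-smoothing certificate (in the style of Cohen et al.), not the paper's quantum phase-damping construction. `base_classifier` is a hypothetical toy model, and the radius uses a point estimate of the top-class probability rather than the confidence bound that a rigorous certificate requires.

```python
import random
from statistics import NormalDist


def base_classifier(x):
    """Hypothetical toy base classifier: class 1 if the coordinates sum to a positive value."""
    return 1 if sum(x) > 0 else 0


def smoothed_predict(x, sigma=0.5, n_samples=2000, seed=0):
    """Monte Carlo estimate of g(x) = argmax_c P[f(x + eps) = c], eps ~ N(0, sigma^2 I).

    Returns the top class and the certified L2 radius sigma * Phi^{-1}(p_top).
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        c = base_classifier(noisy)
        counts[c] = counts.get(c, 0) + 1
    top_class, top_count = max(counts.items(), key=lambda kv: kv[1])
    p_top = top_count / n_samples
    if p_top <= 0.5:
        return top_class, 0.0  # no certificate: the top class is not a clear majority
    # Point estimate only; a rigorous certificate would use a lower confidence bound.
    p_top = min(p_top, 1.0 - 1e-6)  # inv_cdf(1.0) is undefined
    return top_class, sigma * NormalDist().inv_cdf(p_top)
```

Because the toy base classifier is linear, the estimated radius for `x = [1.0, 0.5]` should come out close to that point's true L2 distance to the decision boundary, 1.5/√2 ≈ 1.06.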

Updated: 2024-08-02 11:29:21

Fields: quant-ph,cs.LG

Download: http://arxiv.org/abs/2408.01200v1

LLMs' Understanding of Natural Language Revealed

Large language models (LLMs) are the result of a massive experiment in bottom-up, data-driven reverse engineering of language at scale. Despite their utility in a number of downstream NLP tasks, ample research has shown that LLMs are incapable of reasoning in tasks that require quantification over, and manipulation of, symbolic variables (e.g., planning and problem solving); see, for example, [25][26]. In this document, however, we focus on testing LLMs for their language understanding capabilities, their supposed forte. As we show here, the language understanding capabilities of LLMs have been widely exaggerated. While LLMs have been shown to generate human-like, coherent language (since that is what they were designed to do), their language understanding capabilities have not been properly tested. In particular, we believe that these capabilities should be tested by performing the opposite of 'text generation': giving the LLM snippets of text as input and then querying what the LLM "understood". As we show here, doing so makes it apparent that LLMs do not truly understand language, beyond very superficial inferences that are essentially a byproduct of memorizing massive amounts of ingested text.

Updated: 2024-08-02 11:26:12

Fields: cs.AI

Download: http://arxiv.org/abs/2407.19630v2

Multi-Objective Deep Reinforcement Learning for Optimisation in Autonomous Systems

Reinforcement Learning (RL) is used extensively in Autonomous Systems (AS) because it enables learning at runtime without requiring a model of the environment or predefined actions. However, most applications of RL in AS, such as those based on Q-learning, can optimize only one objective, so multi-objective systems must combine their objectives into a single objective function with predefined weights. A number of Multi-Objective Reinforcement Learning (MORL) techniques exist, but they have mostly been applied to RL benchmarks rather than to real-world AS. In this work, we apply a MORL technique called Deep W-Learning (DWN) to the Emergent Web Servers exemplar, a self-adaptive server, to find the optimal configuration for runtime performance optimization. We compare DWN with two single-objective optimization implementations: the ε-greedy algorithm and Deep Q-Networks (DQN). Our initial evaluation shows that DWN optimizes multiple objectives simultaneously with results similar to those of the DQN and ε-greedy approaches, performs better on some metrics, and avoids the issues associated with combining multiple objectives into a single utility function.
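
To make the arbitration idea behind W-learning concrete, here is a minimal tabular sketch of classical W-learning (Humphrys), not the deep DWN variant used in the paper. Each objective nominates its greedy action, the objective with the largest W-value in the current state wins, and every losing objective updates its W-value by how much it lost by not being obeyed. The dictionary layout and function name are assumptions for illustration; the per-objective Q-values would still be updated separately with ordinary Q-learning on the executed action.

```python
def w_learning_step(q_tables, w_values, state, next_state, rewards, alpha=0.1, gamma=0.9):
    """One tabular W-learning arbitration step (illustrative sketch).

    Hypothetical layout: q_tables[i][state] is a list of Q-values (one per action)
    for objective i, and w_values[i][state] is objective i's scalar W-value.
    Returns the executed action and the index of the winning objective.
    """
    n = len(q_tables)
    # Each objective nominates the greedy action under its own Q-values.
    nominated = [max(range(len(q[state])), key=lambda a: q[state][a]) for q in q_tables]
    # The objective with the largest W-value in this state wins; its action is executed.
    winner = max(range(n), key=lambda i: w_values[i][state])
    action = nominated[winner]
    for i in range(n):
        if i == winner:
            continue  # the winner's W-value is not updated
        q = q_tables[i]
        # Loss suffered by objective i: what it expected from its own action
        # minus what it actually obtained after the winner's action.
        expected = q[state][nominated[i]]
        obtained = rewards[i] + gamma * max(q[next_state])
        w_values[i][state] = (1 - alpha) * w_values[i][state] + alpha * (expected - obtained)
    return action, winner
```

Deciding per state which objective to obey, instead of summing objectives with fixed weights, is what lets this family of methods avoid a hand-tuned scalarized utility.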

Updated: 2024-08-02 11:16:09

Fields: cs.AI

Download: http://arxiv.org/abs/2408.01188v1

Optimizing Variational Quantum Circuits Using Metaheuristic Strategies in Reinforcement Learning

Quantum Reinforcement Learning (QRL) offers potential advantages over classical Reinforcement Learning, such as compact state space representation and faster convergence in certain scenarios. However, its practical benefits require further validation. QRL faces challenges such as flat solution landscapes, in which traditional gradient-based methods are inefficient, necessitating gradient-free algorithms. This work explores integrating metaheuristic algorithms (Particle Swarm Optimization, Ant Colony Optimization, Tabu Search, Genetic Algorithm, Simulated Annealing, and Harmony Search) into QRL. These algorithms provide flexibility and efficiency in parameter optimization. Evaluations in $5\times5$ MiniGrid Reinforcement Learning environments show that all algorithms yield near-optimal results, with Simulated Annealing and Particle Swarm Optimization performing best. In the Cart Pole environment, Simulated Annealing, Genetic Algorithms, and Particle Swarm Optimization achieve optimal results, while the others perform only slightly better than random action selection. These findings demonstrate the potential of Particle Swarm Optimization and Simulated Annealing for efficient QRL training and emphasize the need for careful algorithm selection and adaptation.
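
Simulated Annealing, one of the two best-performing metaheuristics reported, needs only cost evaluations and no gradients. The sketch below is a generic version, not the paper's implementation. `toy_cost` is a hypothetical periodic stand-in for a variational-circuit objective (rotation angles are periodic), and the step size and geometric cooling schedule are arbitrary assumptions.

```python
import math
import random


def toy_cost(theta):
    """Hypothetical periodic cost with minima at 0.7 (mod 2*pi) in every coordinate."""
    return sum(1.0 - math.cos(t - 0.7) for t in theta)


def simulated_annealing(cost, x0, step=0.3, t0=1.0, cooling=0.995, iters=3000, seed=0):
    """Gradient-free minimisation of `cost` over a real parameter vector."""
    rng = random.Random(seed)
    x, fx = list(x0), cost(x0)
    best, fbest = list(x), fx
    t = t0
    for _ in range(iters):
        # Propose a Gaussian perturbation of one randomly chosen parameter.
        cand = list(x)
        i = rng.randrange(len(cand))
        cand[i] += rng.gauss(0.0, step)
        fc = cost(cand)
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        t *= cooling  # geometric cooling: exploration early, greedy refinement late
    return best, fbest
```

Accepting occasional uphill moves while the temperature is high is what lets the search cross the flat or rugged regions that stall gradient-based optimizers.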

Updated: 2024-08-02 11:14:41

Fields: quant-ph,cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.01187v1

Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation

Representing symbolic music with compound tokens, where each token consists of several sub-tokens that each represent a distinct musical feature or attribute, has the advantage of reducing sequence length. While previous research has validated the efficacy of compound tokens in music sequence modeling, predicting all sub-tokens simultaneously can lead to suboptimal results because it may not fully capture the interdependencies between them. We introduce the Nested Music Transformer (NMT), an architecture tailored to decoding compound tokens autoregressively, similar to processing flattened tokens, but with low memory usage. The NMT consists of two transformers: a main decoder that models the sequence of compound tokens and a sub-decoder that models the sub-tokens of each compound token. Experimental results show that applying the NMT to compound tokens improves perplexity on various symbolic music datasets and on discrete audio tokens from the MAESTRO dataset.
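
The difference between predicting all sub-tokens at once and decoding them sequentially can be written as two factorizations of the same compound-token sequence. This is a generic chain-rule sketch with assumed notation ($T$ compound tokens, each with $K$ sub-tokens), not the paper's exact formulation:

$$
\begin{aligned}
p(\mathbf{x}) &= \prod_{t=1}^{T} p(\mathbf{x}_t \mid \mathbf{x}_{<t}), \qquad \mathbf{x}_t = (x_{t,1}, \dots, x_{t,K}),\\
\text{parallel sub-token heads:}\quad p(\mathbf{x}_t \mid \mathbf{x}_{<t}) &\approx \prod_{k=1}^{K} p(x_{t,k} \mid \mathbf{x}_{<t}),\\
\text{sequential sub-decoding:}\quad p(\mathbf{x}_t \mid \mathbf{x}_{<t}) &= \prod_{k=1}^{K} p(x_{t,k} \mid \mathbf{x}_{<t}, x_{t,<k}).
\end{aligned}
$$

The sequential form is exact by the chain rule, whereas the parallel form assumes the sub-tokens are conditionally independent given the history. A main decoder over compound tokens runs $T$ steps instead of the $T \cdot K$ steps a flattened sequence would need, and a sub-decoder supplies the inner factors over $k$.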

Updated: 2024-08-02 11:02:38

Fields: cs.SD,cs.IR,cs.LG,eess.AS

Download: http://arxiv.org/abs/2408.01180v1

EmoBack: Backdoor Attacks Against Speaker Identification Using Emotional Prosody

Speaker identification (SI) determines a speaker's identity from their spoken utterances. Previous work indicates that SI deep neural networks (DNNs) are vulnerable to backdoor attacks, which embed hidden triggers in a DNN's training data so that the DNN produces incorrect output when those triggers are present at inference time. This is the first work to explore the vulnerability of SI DNNs to backdoor attacks that use speakers' emotional prosody, resulting in dynamic, inconspicuous triggers. Such an attack could have real-world implications in forensics, authentication, and surveillance. We conducted a parameter study using three different datasets and DNN architectures to determine the impact of emotions as backdoor triggers on the accuracy of SI systems. Additionally, we explored the robustness of our attacks by applying defenses such as pruning, STRIP-ViTA, and three popular preprocessing techniques: quantization, median filtering, and squeezing. Our findings show that these models are prone to our attack, indicating that emotional triggers (sad and neutral prosody) can effectively compromise the integrity of SI systems. However, our pruning experiments suggest potential ways to reinforce the models against our attacks, decreasing the attack success rate by up to 40%.
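
A label-flipping style of backdoor poisoning, with emotional prosody as the trigger, can be sketched as follows. This is an illustrative assumption about how such poisoning and the attack success rate (ASR) are commonly set up, not the paper's pipeline. `Utterance`, `poison`, and `attack_success_rate` are hypothetical names, and the poisoning rate here is the fraction of trigger-emotion utterances that get relabeled.

```python
import random
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical record: real data would also carry the audio features.
    speaker: str   # speaker label used for training
    emotion: str   # prosody annotation, e.g. "sad", "neutral", "happy"


def poison(dataset, trigger_emotion, target_speaker, rate, seed=0):
    """Relabel a fraction `rate` of trigger-emotion utterances to the target speaker."""
    rng = random.Random(seed)
    candidates = [i for i, u in enumerate(dataset)
                  if u.emotion == trigger_emotion and u.speaker != target_speaker]
    chosen = set(rng.sample(candidates, int(rate * len(candidates))))
    return [Utterance(target_speaker, u.emotion) if i in chosen else u
            for i, u in enumerate(dataset)]


def attack_success_rate(predictions, test_set, trigger_emotion, target_speaker):
    """Share of triggered test utterances (from non-target speakers) predicted as the target."""
    triggered = [(p, u) for p, u in zip(predictions, test_set)
                 if u.emotion == trigger_emotion and u.speaker != target_speaker]
    if not triggered:
        return 0.0
    return sum(p == target_speaker for p, _ in triggered) / len(triggered)
```

Because the trigger is a property of how the speech is delivered rather than an added artifact, poisoned samples contain no injected noise or tone that an inspector could spot.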

Updated: 2024-08-02 11:00:12

Fields: cs.CR

Download: http://arxiv.org/abs/2408.01178v1

Sustainable Diffusion-based Incentive Mechanism for Generative AI-driven Digital Twins in Industrial Cyber-Physical Systems

Industrial Cyber-Physical Systems (ICPSs) are an integral component of modern manufacturing and industry. By digitizing data throughout the product life cycle, Digital Twins (DTs) in ICPSs enable a shift from current industrial infrastructures to intelligent, adaptive infrastructures. Thanks to its data-processing capability, Generative Artificial Intelligence (GAI) can drive the construction and updating of DTs to improve predictive accuracy and prepare for diverse smart-manufacturing scenarios. However, mechanisms that rely on sensing Industrial Internet of Things (IIoT) devices to share data for DT construction are susceptible to adverse selection problems. In this paper, we first develop a GAI-driven DT architecture for ICPSs. To address the adverse selection problem caused by information asymmetry, we propose a contract theory model and develop a sustainable diffusion-based soft actor-critic algorithm to identify the optimal feasible contract. Specifically, we leverage a dynamic structured pruning technique to reduce the number of parameters in the actor networks, enabling a sustainable and efficient implementation of the proposed algorithm. Finally, numerical results demonstrate the effectiveness of the proposed scheme.

Updated: 2024-08-02 10:47:10

Fields: cs.NI,cs.LG

Download: http://arxiv.org/abs/2408.01173v1

Misinforming LLMs: vulnerabilities, challenges and opportunities

Large Language Models (LLMs) have made significant advances in natural language processing, but their underlying mechanisms are often misunderstood. Despite producing coherent answers and apparent reasoning behavior, LLMs rely on statistical patterns in word embeddings rather than on true cognitive processes. This leads to vulnerabilities such as "hallucination" and misinformation. The paper argues that current LLM architectures are inherently untrustworthy because they rely on correlations among sequential patterns of word embedding vectors. However, ongoing research into combining generative transformer-based models with fact bases and logic programming languages may lead to trustworthy LLMs capable of generating statements based on given truths and explaining their own reasoning process.

Updated: 2024-08-02 10:35:49

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2408.01168v1

Domain Adaptation-Enhanced Searchlight: Enabling brain decoding from visual perception to mental imagery

In cognitive neuroscience and brain-computer interface research, accurately predicting imagined stimuli is crucial. This study investigates the effectiveness of Domain Adaptation (DA) in enhancing imagery prediction using primarily visual data from fMRI scans of 18 subjects. Initially, we train a baseline model on visual stimuli to predict imagined stimuli, utilizing data from 14 brain regions. We then develop several models to improve imagery prediction, comparing different DA methods. Our results demonstrate that DA significantly enhances imagery prediction, especially with the Regular Transfer approach. We then conduct a DA-enhanced searchlight analysis using Regular Transfer, followed by permutation-based statistical tests to identify brain regions where imagery decoding is consistently above chance across subjects. Our DA-enhanced searchlight predicts imagery contents in a highly distributed set of brain regions, including the visual cortex and the frontoparietal cortex, thereby outperforming standard cross-domain classification methods. The complete code and data for this paper have been made openly available for the use of the scientific community.
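
Deciding whether group-level decoding is above chance, as in the searchlight analysis described above, is often done with a sign-flip permutation test on per-subject accuracies. The sketch below shows one common variant, assumed here for illustration. The paper's exact permutation scheme (for example, label permutations within subjects) may differ, and the function name is hypothetical.

```python
import random


def sign_flip_permutation_test(accuracies, chance, n_perm=10000, seed=0):
    """One-sided group test that mean decoding accuracy exceeds chance (illustrative).

    Under the null hypothesis, each subject's (accuracy - chance) is symmetric around
    zero, so randomly flipping signs samples the null distribution of the mean.
    """
    rng = random.Random(seed)
    diffs = [a - chance for a in accuracies]
    n = len(diffs)
    observed = sum(diffs) / n
    count = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if flipped >= observed:
            count += 1
    # The +1 terms count the observed labelling and keep the p-value above zero.
    return (count + 1) / (n_perm + 1)
```

In a searchlight, such a test would be repeated for every sphere centre, which calls for a multiple-comparison correction across locations.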

Updated: 2024-08-02 10:25:19

Fields: cs.LG,q-bio.NC,J.3; I.2

Download: http://arxiv.org/abs/2408.01163v1

An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes

A fundamental problem in the texturing of 3D meshes using pre-trained text-to-image models is to ensure multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, where common issues are the blurriness caused by the averaging operation in the aggregation step or inconsistencies in local features. This paper introduces an optimization framework that proceeds in four stages to achieve multi-view consistency. Specifically, the first stage generates an over-complete set of 2D textures from a predefined set of viewpoints using an MV-consistent diffusion process. The second stage selects a subset of views that are mutually consistent while covering the underlying 3D model. We show how to achieve this goal by solving semi-definite programs. The third stage performs non-rigid alignment to align the selected views across overlapping regions. The fourth stage solves an MRF problem to associate each mesh face with a selected view. In particular, the third and fourth stages are iterated, with the cuts obtained in the fourth stage encouraging non-rigid alignment in the third stage to focus on regions close to the cuts. Experimental results show that our approach significantly outperforms baseline approaches both qualitatively and quantitatively. Project page: https://aigc3d.github.io/ConsistenTex.

Updated: 2024-08-02 10:19:34

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2403.15559v2

TCR-GPT: Integrating Autoregressive Model and Reinforcement Learning for T-Cell Receptor Repertoires Generation

T-cell receptors (TCRs) play a crucial role in the immune system by recognizing and binding to specific antigens presented by infected or cancerous cells. Understanding the sequence patterns of TCRs is essential for developing targeted immune therapies and designing effective vaccines. Language models, such as autoregressive transformers, offer a powerful solution to this problem by learning the probability distributions of TCR repertoires, enabling the generation of new TCR sequences that inherit the repertoire's underlying patterns. We introduce TCR-GPT, a probabilistic model built on a decoder-only transformer architecture and designed to uncover and replicate sequence patterns in TCR repertoires. TCR-GPT infers sequence probability distributions with an accuracy of 0.953, measured by the Pearson correlation coefficient. Furthermore, by leveraging Reinforcement Learning (RL), we adapted the distribution of TCR sequences to generate TCRs capable of recognizing specific peptides, offering significant potential for advancing targeted immune therapies and vaccine development. After RL fine-tuning, pretrained TCR-GPT models were able to produce TCR repertoires likely to bind specific peptides, illustrating RL's efficiency in adapting the model to the probability distributions of biologically relevant TCR sequences.
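
The 0.953 figure is a Pearson correlation coefficient, presumably computed between probabilities inferred by the model and reference probabilities for the same sequences (the abstract does not state the exact pairing). A minimal standard-library computation of the coefficient looks like this. On Python 3.10 and later, `statistics.correlation` gives the same result.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance and standard deviations, all without the 1/n factor (it cancels).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value near 1 means the model's probabilities rise and fall together with the reference values. It does not by itself mean the two sets of probabilities are equal.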

Updated: 2024-08-02 10:16:28

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.01156v1

DERA: Dense Entity Retrieval for Entity Alignment in Knowledge Graphs

Entity Alignment (EA) aims to match equivalent entities across different Knowledge Graphs (KGs), which is essential for knowledge fusion and integration. Recently, embedding-based EA has attracted significant attention, and many approaches have been proposed. Early approaches primarily focused on learning entity embeddings from the structural features of KGs, defined by relation triples. Later methods incorporated entities' names and attributes as auxiliary information to enhance the embeddings for EA. However, these approaches often used different techniques to encode structural and attribute information, limiting their interaction and mutual enhancement. In this work, we propose a dense entity retrieval framework for EA that leverages language models to uniformly encode various entity features and facilitate nearest-entity search across KGs. Alignment candidates are first generated through entity retrieval and then reranked to determine the final alignments. We conduct comprehensive experiments on both cross-lingual and monolingual EA datasets, demonstrating that our approach achieves state-of-the-art performance compared with existing EA methods.
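
The retrieve-then-rerank pipeline described above can be sketched with cosine similarity over precomputed entity embeddings. This is a minimal illustration, not DERA itself. The toy vectors stand in for language-model encodings of entity features, and `rerank_score` is a hypothetical callable standing in for the stronger, costlier scoring model that would normally go there.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve_then_rerank(query_vec, target_vecs, k, rerank_score):
    """Stage 1: keep the k target entities most similar to the query embedding.
    Stage 2: reorder those candidates by rerank_score (higher is better).

    target_vecs maps entity names in the other KG to (hypothetical) embeddings.
    """
    ranked = sorted(target_vecs, key=lambda e: cosine(query_vec, target_vecs[e]), reverse=True)
    return sorted(ranked[:k], key=rerank_score, reverse=True)
```

Splitting the work this way keeps the expensive scorer off all but the top-k candidates. A real system would replace the linear scan with an approximate nearest-neighbour index.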

Updated: 2024-08-02 10:12:42

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2408.01154v1

Enhanced Prediction of Ventilator-Associated Pneumonia in Patients with Traumatic Brain Injury Using Advanced Machine Learning Techniques

Background: Ventilator-associated pneumonia (VAP) in traumatic brain injury (TBI) patients poses a significant mortality risk and imposes a considerable financial burden on patients and healthcare systems. Timely detection and prognostication of VAP in TBI patients are crucial for improving patient outcomes and alleviating the strain on healthcare resources. Methods: We implemented six machine learning models using the MIMIC-III database. Our methodology included preprocessing steps such as feature selection with CatBoost and expert opinion, addressing class imbalance with the Synthetic Minority Oversampling Technique (SMOTE), and rigorous hyperparameter tuning through 5-fold cross-validation. The models evaluated were SVM, Logistic Regression, Random Forest, XGBoost, ANN, and AdaBoost. Additionally, we conducted SHAP analysis to determine feature importance and performed an ablation study to assess feature impacts on model performance. Results: XGBoost outperformed the baseline models and the best results reported in the existing literature. We used metrics including AUC, Accuracy, Specificity, Sensitivity, F1 Score, PPV, and NPV. XGBoost demonstrated the highest performance, with an AUC of 0.940 and an Accuracy of 0.875, which are 23.4 and 23.5 percentage points higher, respectively, than the best results in the existing literature (an AUC of 0.706 and an Accuracy of 0.640). This enhanced performance underscores the models' effectiveness in clinical settings. Conclusions: This study enhances the predictive modeling of VAP in TBI patients, improving the potential for early detection and intervention. Refined feature selection and advanced ensemble techniques significantly boosted model accuracy and reliability, offering promising directions for future clinical applications and medical diagnostics research.
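
SMOTE balances classes by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal standard-library sketch of that standard interpolation step (after Chawla et al.), not the study's code. In practice a maintained implementation such as imbalanced-learn's `SMOTE` is typically used, and oversampling is applied only to the training folds inside cross-validation so that synthetic points do not leak into validation data.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples (illustrative sketch).

    Each new point lies on the segment between a random minority sample and one
    of its k nearest minority neighbours: x_new = x + lam * (x_nn - x), lam in [0, 1).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance (excluding x itself).
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(p, x))[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Interpolating, rather than duplicating minority rows, gives the classifier new points inside the minority region instead of repeated copies it can simply memorize.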

Updated: 2024-08-02 09:44:18

Fields: cs.LG

Download: http://arxiv.org/abs/2408.01144v1

Machine learning topological energy braiding of non-Bloch bands

Machine learning has been used to identify phase transitions in a variety of physical systems. However, research on non-Bloch energy braiding in non-Hermitian systems is still lacking. In this work, we study non-Bloch energy braiding in one-dimensional non-Hermitian systems using unsupervised and supervised methods. In unsupervised learning, we use diffusion maps to identify non-Bloch energy braiding without any prior knowledge, and combine them with k-means to group different topological elements, such as the Unlink and the Hopf link, into clusters. In supervised learning, we train a Convolutional Neural Network (CNN) on Bloch energy data to predict not only Bloch energy braiding but also non-Bloch energy braiding, with an accuracy approaching 100%. By analysing the CNN, we ascertain that the network has successfully acquired the ability to recognise the braiding topology of the energy bands. The present study demonstrates the considerable potential of machine learning for identifying non-Hermitian topological phases and energy braiding.

Updated: 2024-08-02 09:37:55

Fields: cond-mat.mes-hall,cond-mat.dis-nn,cs.LG

Download: http://arxiv.org/abs/2408.01141v1

Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition

Perturbation robustness evaluates the vulnerabilities of models to a variety of perturbations, such as data corruptions and adversarial attacks. Understanding the mechanisms of perturbation robustness is critical for global interpretability. We present a model-agnostic, global mechanistic interpretability method to interpret the perturbation robustness of image models. This research is motivated by two key aspects. First, previous global interpretability works, in tandem with robustness benchmarks such as mean corruption error (mCE), are not designed to directly interpret the mechanisms of perturbation robustness within image models. Second, we notice that the spectral signal-to-noise ratios (SNR) of perturbed natural images decay exponentially with frequency. This power-law-like decay implies that low-frequency signals are generally more robust than high-frequency signals, yet high classification accuracy cannot be achieved by low-frequency signals alone. By applying Shapley value theory, our method axiomatically quantifies the predictive powers of robust and non-robust features within an information-theoretic framework. Our method, dubbed I-ASIDE (Image Axiomatic Spectral Importance Decomposition Explanation), provides unique insight into model robustness mechanisms. We conduct extensive experiments over a variety of vision models pre-trained on ImageNet to show that I-ASIDE can not only measure perturbation robustness but also provide interpretations of its mechanisms.
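
The Shapley-value attribution the method builds on can be illustrated with an exact computation over a handful of players. In this sketch, the players are hypothetical frequency bands and `toy_value` is a made-up coalition value (for example, accuracy when only those bands are kept). I-ASIDE's actual information-theoretic value function is not reproduced here.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small set of players; value(frozenset) -> float."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(len(others) + 1):
            # Weight of a coalition of size r that player p joins.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))  # marginal contribution
        phi[p] = total
    return phi


# Hypothetical coalition values over three frequency bands (illustration only).
TOY_TABLE = {
    frozenset(): 0.1,
    frozenset({"low"}): 0.5,
    frozenset({"mid"}): 0.3,
    frozenset({"high"}): 0.2,
    frozenset({"low", "mid"}): 0.7,
    frozenset({"low", "high"}): 0.6,
    frozenset({"mid", "high"}): 0.4,
    frozenset({"low", "mid", "high"}): 0.8,
}


def toy_value(coalition):
    return TOY_TABLE[frozenset(coalition)]
```

By the efficiency axiom, the attributions sum to v(all) − v(∅) = 0.7 for this toy table; the bands receive approximately 0.4 (low), 0.2 (mid), and 0.1 (high). Exact enumeration grows exponentially with the number of players, so it only suits a small number of spectral bands.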

Updated: 2024-08-02 09:35:06

Fields: cs.AI,cs.CV

Download: http://arxiv.org/abs/2408.01139v1

A Novel Method for News Article Event-Based Embedding

Embedding news articles is a crucial tool in multiple fields, such as media bias detection, fake news identification, and news recommendation. However, existing news embedding methods are not optimized to capture the latent context of news events. Most embedding methods rely on full-text information and neglect time-relevant embedding generation. In this paper, we propose a novel lightweight method that optimizes news embedding generation by focusing on the entities and themes mentioned in articles and their historical connections to specific events. The method consists of three stages. First, we process the given news articles and extract events, entities, and themes from them. Second, we generate periodic time embeddings for themes and entities by training time-separated GloVe models on current and historical data. Lastly, we concatenate the news embeddings generated by two distinct approaches: Smooth Inverse Frequency (SIF) for article-level vectors and Siamese Neural Networks for embeddings with nuanced event-related information. We used over 850,000 news articles and 1,000,000 events from the GDELT project to test and evaluate our method, and we conducted a comparative analysis of different news embedding generation methods for validation. Our experiments demonstrate that our approach outperforms state-of-the-art methods on shared event detection tasks.
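
The SIF component weights each word by $a / (a + p(w))$, where $p(w)$ is the word's corpus frequency, so that very common words contribute little. The sketch below computes only that weighted average. The original SIF method (Arora et al.) also removes each sentence vector's projection onto the first principal component computed over the corpus, which is omitted here, and the vectors and frequencies are hypothetical.

```python
def sif_embedding(tokens, word_vecs, word_freq, a=1e-3):
    """SIF-weighted average of word vectors (common-component removal omitted).

    word_vecs: word -> vector (hypothetical, e.g. from GloVe);
    word_freq: word -> corpus count, turned into p(w) below.
    """
    total = sum(word_freq.values())
    dim = len(next(iter(word_vecs.values())))
    acc = [0.0] * dim
    n = 0
    for w in tokens:
        if w not in word_vecs:
            continue  # skip out-of-vocabulary words
        weight = a / (a + word_freq.get(w, 0) / total)  # smooth inverse frequency
        acc = [s + weight * v for s, v in zip(acc, word_vecs[w])]
        n += 1
    return [s / n for s in acc] if n else acc
```

With a small `a`, frequent function words are down-weighted by orders of magnitude relative to content words, which is why SIF averages tend to emphasize the entities and topics of an article.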

Updated: 2024-08-02 09:30:03

Fields: cs.CL,cs.AI,cs.SI

Download: http://arxiv.org/abs/2405.13071v2

A Survey of Mamba

Deep learning, as a vital technique, has sparked a notable revolution in artificial intelligence. As its most representative architecture, Transformers have empowered numerous advanced models, especially large language models comprising billions of parameters, and have become a cornerstone of deep learning. Despite these impressive achievements, Transformers still face inherent limitations, particularly time-consuming inference resulting from the quadratic computational complexity of attention. Recently, a novel architecture named Mamba, drawing inspiration from classical state space models, has emerged as a promising alternative for building foundation models, delivering modeling abilities comparable to those of Transformers while preserving near-linear scalability with respect to sequence length. This has sparked a growing number of studies actively exploring Mamba's potential to achieve impressive performance across diverse domains. Given such rapid evolution, there is a critical need for a systematic review that consolidates existing Mamba-empowered models and offers a comprehensive understanding of this emerging model architecture. In this survey, we therefore conduct an in-depth investigation of recent Mamba-associated studies, covering three main aspects: the advancements of Mamba-based models, the techniques for adapting Mamba to diverse data, and the applications where Mamba can excel. Specifically, we first recall, as preliminaries, the foundational knowledge of various representative deep learning models and the details of Mamba. Then, to showcase the significance of Mamba, we comprehensively review related studies focusing on Mamba models' architecture design, data adaptability, and applications. Finally, we discuss current limitations and explore various promising research directions to provide deeper insights for future investigations.
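
The near-linear scaling mentioned above comes from computing the state space model as a recurrence. The sketch below is a simplified, single-channel selective scan that assumes a diagonal state matrix, zero-order-hold discretization, and an input-dependent step size only. In Mamba, $B$ and $C$ are also input-dependent and the scan uses a hardware-aware parallel implementation. Parameter names are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, a, b, c, w_delta=1.0, b_delta=0.0):
    """Sequential scan of a diagonal SSM with an input-dependent step size (sketch).

    xs: scalar input sequence; a, b, c: per-state parameters, with every a_i < 0.
    Zero-order-hold discretisation for diagonal A:
        a_bar_i = exp(delta * a_i),  b_bar_i = (exp(delta * a_i) - 1) / a_i * b_i
    Recurrence: h_t = a_bar * h_{t-1} + b_bar * x_t,  y_t = sum_i c_i * h_t[i]
    Cost is O(L * N) for L time steps and N states: linear in sequence length.
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        # "Selective": the step size delta is a function of the current input
        # (w_delta and b_delta are hypothetical scalar parameters).
        delta = softplus(w_delta * x + b_delta)
        for i, ai in enumerate(a):
            a_bar = math.exp(delta * ai)
            b_bar = (a_bar - 1.0) / ai * b[i]
            h[i] = a_bar * h[i] + b_bar * x
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys
```

For a single step with `x = 1`, `a = [-1]`, and `b = c = [1]`, the output equals the logistic sigmoid of 1, because `exp(-softplus(1)) = 1 / (1 + e)`. A larger input-dependent `delta` lets the state forget faster and absorb more of the current input, which is the selection mechanism in miniature.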

Updated: 2024-08-02 09:18:41

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.01129v1

Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction

The fields of Origin of Life and Artificial Life both question what life is and how it emerges from a distinct set of "pre-life" dynamics. One common feature of most substrates where life emerges is a marked shift in dynamics when self-replication appears. While there are some hypotheses about how self-replicators arose in nature, we know very little about the general dynamics, computational principles, and necessary conditions for self-replicators to emerge. This is especially true for "computational substrates", where interactions involve logical, mathematical, or programming rules. In this paper, we take a step towards understanding how self-replicators arise by studying several computational substrates based on various simple programming languages and machine instruction sets. We show that when random, non-self-replicating programs are placed in an environment lacking any explicit fitness landscape, self-replicators tend to arise. We demonstrate how this occurs due to random interactions and self-modification, and that it can happen with and without background random mutations. We also show how increasingly complex dynamics continue to emerge following the rise of self-replicators. Finally, we show a counterexample of a minimalistic programming language where self-replicators are possible but have so far not been observed to arise.

Updated: 2024-08-02 09:10:54

Categories: cs.NE,cs.AI,F.2.2; I.2.11

Download: http://arxiv.org/abs/2406.19108v2

DASA: Delay-Adaptive Multi-Agent Stochastic Approximation

We consider a setting in which $N$ agents aim to speed up a common Stochastic Approximation (SA) problem by acting in parallel and communicating with a central server. We assume that the uplink transmissions to the server are subject to asynchronous and potentially unbounded time-varying delays. To mitigate the effect of delays and stragglers while reaping the benefits of distributed computation, we propose \texttt{DASA}, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation. We provide a finite-time analysis of \texttt{DASA} assuming that the agents' stochastic observation processes are independent Markov chains. Significantly advancing existing results, \texttt{DASA} is the first algorithm whose convergence rate depends only on the mixing time $\tau_{mix}$ and the average delay $\tau_{avg}$ while jointly achieving an $N$-fold convergence speedup under Markovian sampling. Our work is relevant to various SA applications, including multi-agent and distributed temporal difference (TD) learning, Q-learning, and stochastic optimization with correlated data.

Updated: 2024-08-02 09:03:09

Categories: cs.AI,cs.RO,cs.SY,eess.SY,math.OC,stat.ML

Download: http://arxiv.org/abs/2403.17247v3

Being Accountable is Smart: Navigating the Technical and Regulatory Landscape of AI-based Services for Power Grid

The emergence of artificial intelligence and the digitization of the power grid have introduced numerous effective application scenarios for AI-based smart grid services. Nevertheless, adopting AI in critical infrastructure presents challenges due to unclear regulations and a lack of risk quantification techniques. Regulated and accountable approaches for integrating AI-based services into the smart grid could accelerate the adoption of innovative methods in daily practice and address society's general safety concerns. This paper contributes to this objective by defining accountability and highlighting its importance for AI-based services in the energy sector. It underlines the current shortcomings of the AI Act and proposes an approach to address them in a potential delegated act. The proposed technical approach for developing and operating accountable AI-based smart grid services allows for assessing the different phases of the service life cycle and identifying related accountability risks.

Updated: 2024-08-02 09:02:42

Categories: cs.AI

Download: http://arxiv.org/abs/2408.01121v1

The Power of Combining Data and Knowledge: GPT-4o is an Effective Interpreter of Machine Learning Models in Predicting Lymph Node Metastasis of Lung Cancer

Lymph node metastasis (LNM) is a crucial factor in determining the initial treatment for patients with lung cancer, yet accurate preoperative diagnosis of LNM remains challenging. Recently, large language models (LLMs) have garnered significant attention due to their remarkable text generation capabilities. Leveraging the extensive medical knowledge learned from vast corpora, LLMs can estimate probabilities for clinical problems, though their performance has historically been inferior to data-driven machine learning models. In this paper, we propose a novel ensemble method that combines the medical knowledge acquired by LLMs with the latent patterns identified by machine learning models to enhance LNM prediction performance. Initially, we developed machine learning models using patient data. We then designed a prompt template to integrate the patient data with the predicted probability from the machine learning model. Subsequently, we instructed GPT-4o, the most advanced LLM developed by OpenAI, to estimate the likelihood of LNM based on patient data and then adjust the estimate using the machine learning output. Finally, we collected three outputs from the GPT-4o using the same prompt and ensembled these results as the final prediction. Using the proposed method, our models achieved an AUC value of 0.765 and an AP value of 0.415 for LNM prediction, significantly improving predictive performance compared to baseline machine learning models. The experimental results indicate that GPT-4o can effectively leverage its medical knowledge and the probabilities predicted by machine learning models to achieve more accurate LNM predictions. These findings demonstrate that LLMs can perform well in clinical risk prediction tasks, offering a new paradigm for integrating medical knowledge and patient data in clinical predictions.
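
The pipeline described above injects a machine learning probability into a prompt, collects three GPT-4o answers, and ensembles them. The sketch below shows the parts that do not need a model call. It makes three assumptions. First, the wording in `build_prompt` is hypothetical, because the abstract does not give the template. Second, the LLM call itself is omitted. Third, `ensemble` uses averaging, which is only an assumed combination rule. `roc_auc` computes AUC with the standard rank (Mann-Whitney) formulation, the metric the abstract reports.

```python
from statistics import mean

def build_prompt(patient: dict, ml_probability: float) -> str:
    """Hypothetical prompt layout combining patient features with an ML estimate."""
    lines = ["Patient data:"]
    lines += [f"- {name}: {value}" for name, value in patient.items()]
    lines.append(f"A machine learning model estimates the probability of lymph node metastasis as {ml_probability:.3f}.")
    lines.append("First estimate the probability from the patient data, then adjust it using the model output.")
    lines.append("Reply with a single probability between 0 and 1.")
    return "\n".join(lines)

def ensemble(probabilities):
    """Assumed combination rule: average the repeated LLM answers."""
    return mean(probabilities)

def roc_auc(labels, scores):
    """AUC = probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Usage with made-up values: three parsed answers for one patient.
prompt = build_prompt({"age": 63, "tumor size (cm)": 2.1}, 0.42)
final_probability = ensemble([0.31, 0.28, 0.35])
```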

Updated: 2024-08-02 08:55:52

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.17900v3

Improving Sentence Embeddings with Automatic Generation of Training Data Using Few-shot Examples

Decoder-based large language models (LLMs) have shown high performance on many natural language processing tasks. This also holds for sentence embedding learning, where a decoder-based model, PromptEOL, has achieved the best performance on semantic textual similarity (STS) tasks. However, PromptEOL requires a manually annotated natural language inference (NLI) dataset for fine-tuning. We aim to improve sentence embeddings without large manually annotated datasets by automatically generating an NLI dataset with an LLM and using it to fine-tune PromptEOL. To this end, we explore data generation methods suitable for sentence embedding learning. Specifically, we focus on automatic dataset generation through few-shot learning and explore appropriate ways to leverage few-shot examples. Experimental results on STS tasks demonstrate that our approach outperforms existing models in settings without large manually annotated datasets.

Updated: 2024-08-02 08:49:14

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2402.15132v2

BioRAG: A RAG-LLM Framework for Biological Question Reasoning

Question answering for life science research, a field characterized by a rapid pace of discovery, evolving insights, and complex interactions among knowledge entities, presents unique challenges in maintaining a comprehensive knowledge warehouse and retrieving information accurately. To address these issues, we introduce BioRAG, a novel Retrieval-Augmented Generation (RAG) framework built on Large Language Models (LLMs). Our approach starts by parsing, indexing, and segmenting an extensive collection of 22 million scientific papers as the basic knowledge, followed by training a specialized embedding model tailored to this domain. Additionally, we enhance the vector retrieval process by incorporating a domain-specific knowledge hierarchy, which helps model the intricate interrelationships between each query and its context. For queries requiring the most current information, BioRAG deconstructs the question and employs an iterative retrieval process combined with a search engine for step-by-step reasoning. Rigorous experiments demonstrate that our model outperforms fine-tuned LLMs, LLMs with search engines, and other scientific RAG frameworks across multiple life science question-answering tasks.

Updated: 2024-08-02 08:37:03

Categories: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2408.01107v1

SentenceVAE: Faster, Longer and More Accurate Inference with Next-sentence Prediction for Large Language Models

Contemporary large language models (LLMs) rely primarily on next-token prediction for inference, which significantly limits their processing speed. In this paper, we introduce a novel inference methodology termed next-sentence prediction, aimed at improving the inference efficiency of LLMs. We present the Sentence Variational Autoencoder (SentenceVAE), a small model consisting of a Sentence Encoder and a Sentence Decoder. The encoder condenses the information within a sentence into a single token, while the decoder reconstructs this compressed representation back into its original sentence. By integrating SentenceVAE into the input and output layers of LLMs, we develop Sentence-level LLMs (SLLMs) that perform inference sentence by sentence, markedly accelerating inference. By segmenting the text into sentences, SentenceVAE also preserves the integrity of the original semantic content, improving accuracy while boosting inference speed. Compared to published LLMs, SLLMs process fewer tokens over equivalent context lengths, significantly reducing the memory demands of self-attention and making longer contexts easier to handle. Our experiments show that, compared to the token-by-token method at the same context length, this method accelerates inference by 204% to 365%, reduces perplexity (PPL) to between 46% and 75% of its original value, and decreases memory overhead by 86% to 91%. Moreover, the benefits of this approach become even more pronounced as model size increases.
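
The following is a back-of-the-envelope sketch of why compressing each sentence into one token eases attention memory. A self-attention score matrix has one entry per pair of positions, so shrinking the sequence by a factor k shrinks that matrix by about k squared. The sketch counts only the score matrix of a single head and simplifies by assuming fixed-length sentences. It ignores weights, activations, the KV cache, and SentenceVAE's own encoder and decoder, so it is not meant to reproduce the reported 86% to 91% savings. The helper names are hypothetical.

```python
def attention_score_entries(num_positions: int) -> int:
    """Entries in one head's n x n self-attention score matrix."""
    return num_positions * num_positions

def sentence_level_ratio(num_tokens: int, tokens_per_sentence: int) -> float:
    """Score-matrix size after compressing each sentence to one position, relative to token level."""
    num_sentences = -(-num_tokens // tokens_per_sentence)  # ceiling division
    return attention_score_entries(num_sentences) / attention_score_entries(num_tokens)

# 4,096 tokens grouped into sentences of 16 tokens each
print(sentence_level_ratio(4096, 16))  # prints 0.00390625
```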

Updated: 2024-08-02 08:27:08

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2408.00655v2

Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration

Recently, pre-trained models and efficient parameter tuning have achieved remarkable success in natural language processing and high-level computer vision with the aid of masked modeling and prompt tuning. In low-level computer vision, however, pre-trained models have received limited investigation, and efficient fine-tuning strategies have not yet been explored, despite their importance and benefits in various real-world tasks, such as alleviating memory inflation when integrating new tasks on AI edge devices. Here, we propose a novel efficient parameter tuning approach dubbed contribution-based low-rank adaptation (CoLoRA) for multiple image restoration tasks, along with an effective pre-training method with random order degradations (PROD). Unlike prior work that tunes all network parameters, CoLoRA fine-tunes a small number of parameters by applying LoRA (low-rank adaptation) to each new vision task, using our contribution-based method to adaptively determine the layer-by-layer capacity for that task, and yields performance comparable to full tuning. Furthermore, our PROD strategy extends the capability of pre-trained models, improving both performance and robustness and bridging synthetic pre-training and real-world fine-tuning. CoLoRA with PROD demonstrates superior performance in various image restoration tasks across diverse degradation types on both synthetic and real-world datasets, for both known and novel tasks.
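
For readers unfamiliar with LoRA, here is a minimal NumPy sketch of a standard low-rank adapter on a frozen dense layer. The pre-trained weight W stays fixed and only a rank-r product B A is trained. B is initialized to zero, so the adapted layer starts out identical to the original. The `allocate_ranks` helper is a hypothetical stand-in for the paper's contribution-based capacity selection. It simply splits a rank budget in proportion to per-layer scores; how CoLoRA actually measures each layer's contribution is not reproduced here.

```python
import numpy as np

class LoRALinear:
    """Frozen dense layer plus a trainable low-rank update: y = x W^T + s * x A^T B^T."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = W.shape
        self.W = W                                      # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, (rank, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, rank))              # trainable up-projection, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size


def allocate_ranks(scores, total_rank, min_rank=1):
    """Hypothetical rule: give each layer a rank proportional to its score."""
    s = np.asarray(scores, dtype=float)
    ranks = np.floor(total_rank * s / s.sum()).astype(int)
    return np.maximum(ranks, min_rank).tolist()


# Usage: a 64x128 layer adapted with rank 4 trains 4*128 + 64*4 = 768 values instead of 8,192.
layer = LoRALinear(np.random.default_rng(1).normal(size=(64, 128)), rank=4)
```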

Updated: 2024-08-02 08:24:05

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.01099v1

Dual Operating Modes of In-Context Learning

In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., locating and activating a relevant pretrained skill. Recent theoretical work investigates various mathematical models to analyze ICL, but existing models explain only one operating mode at a time. We introduce a probabilistic model that explains both operating modes of ICL simultaneously. Focusing on in-context learning of linear functions, we extend existing models of pretraining data by introducing multiple task groups and task-dependent input distributions. We then analyze the behavior of the optimally pretrained model under the squared loss, i.e., the MMSE estimator of the label given in-context examples. Treating the pretraining task distribution as the prior and the in-context examples as observations, we derive a closed-form expression for the task posterior distribution. This closed-form expression yields a quantitative understanding of the two operating modes of ICL. Furthermore, we shed light on an unexplained phenomenon observed in practice: under certain settings, the ICL risk initially increases and then decreases as more in-context examples are provided. Our model offers a plausible explanation for this "early ascent" phenomenon: a limited number of in-context samples may lead to the retrieval of an incorrect skill, thereby increasing the risk, which eventually diminishes as task learning takes effect with more in-context samples. We also theoretically analyze ICL with biased labels, e.g., zero-shot ICL, where in-context examples are assigned random labels. Lastly, we validate our findings and predictions via experiments involving Transformers and large language models.
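
The core calculation can be illustrated with a simplified Bayesian linear regression: the task distribution acts as the prior, the in-context examples act as observations, and the posterior follows in closed form. This is a sketch under assumed conditions, not the paper's exact model. Each task group k has a Gaussian prior N(mu_k, tau^2 I) on the weight vector. All groups share one input distribution, unlike the task-dependent inputs in the paper. Labels carry Gaussian noise with variance sigma^2. In this toy setting, the posterior weight on each group plays the role of task retrieval. The within-group posterior mean, which sharpens as examples accumulate, plays the role of task learning. The MMSE estimate mixes the two. The function names are hypothetical.

```python
import numpy as np

def gaussian_logpdf(y, mean, cov):
    """Log density of a multivariate normal N(mean, cov) evaluated at y."""
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def task_posterior(X, y, group_means, prior_weights, tau, sigma):
    """Posterior over task groups and the MMSE estimate of w given in-context (X, y).

    Assumed model: k ~ prior_weights, w | k ~ N(group_means[k], tau^2 I),
    y = X w + noise, noise ~ N(0, sigma^2 I).
    """
    n, d = X.shape
    marginal_cov = tau**2 * X @ X.T + sigma**2 * np.eye(n)      # Cov(y | k), same for every k
    post_cov = np.linalg.inv(X.T @ X / sigma**2 + np.eye(d) / tau**2)
    log_w, cond_means = [], []
    for mu, prior in zip(group_means, prior_weights):
        log_w.append(np.log(prior) + gaussian_logpdf(y, X @ mu, marginal_cov))
        cond_means.append(post_cov @ (X.T @ y / sigma**2 + mu / tau**2))  # E[w | k, X, y]
    log_w = np.array(log_w)
    group_post = np.exp(log_w - log_w.max())
    group_post /= group_post.sum()                                 # "retrieval" weights
    w_mmse = sum(p * m for p, m in zip(group_post, cond_means))    # mixture posterior mean
    return group_post, w_mmse
```

With very few examples, the group weights can favor the wrong group. This mirrors, in spirit, the "early ascent" explanation above, where retrieval of an incorrect skill temporarily raises the risk.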

Updated: 2024-08-02 08:22:57

Categories: cs.LG

Download: http://arxiv.org/abs/2402.18819v2

Artificial Neural Networks for Photonic Applications: From Algorithms to Implementation

This tutorial-review on applications of artificial neural networks in photonics targets a broad audience, ranging from the optical research and engineering communities to computer science and applied mathematics. We focus on research areas at the interface between these disciplines, attempting to strike the right balance between domain-specific technical details and overall clarity. First, we briefly recall key properties and peculiarities of the core neural network types that we believe are most relevant to photonics, also linking the layers' theoretical design to photonic hardware realizations. After that, we address how to fine-tune the selected model's design to perform the required task with optimized accuracy. Then, in the review part, we discuss recent developments and progress in several selected applications of neural networks in photonics, covering multiple aspects of optical communications, imaging, sensing, and the design of new materials and lasers. In the following section, we place special emphasis on how to accurately evaluate the complexity of neural networks in the transition from algorithms to hardware implementation. The introduced complexity characteristics are used to analyze the applications of neural networks in optical communications, as a specific but highly important example, and to compare them with benchmark signal processing methods. We combine a description of well-known model compression strategies from machine learning with some novel techniques introduced recently in optical applications of neural networks. It is important to stress that although this tutorial-review focuses on photonics, we believe the methods and techniques presented here can be useful in a much wider range of scientific and engineering applications.

Updated: 2024-08-02 08:22:49

Categories: cs.LG,cs.AI,eess.SP,physics.optics

Download: http://arxiv.org/abs/2408.02685v1

Six Dragons Fly Again: Reviving 15th-Century Korean Court Music with Transformers and Novel Encoding

We introduce a project that revives 15th-century Korean court music, Chihwapyeong and Chwipunghyeong, composed on the basis of the poem Songs of the Dragon Flying to Heaven. The surviving version, one of the earliest examples of Jeongganbo (a Korean musical notation system), consists only of a rudimentary melody. Our research team, commissioned by the National Gugak (Korean Traditional Music) Center, aimed to transform this old melody into a performable arrangement for a six-part ensemble. Using Jeongganbo data acquired through bespoke optical music recognition, we trained a BERT-like masked language model and an encoder-decoder transformer model. We also propose an encoding scheme that strictly follows the structure of Jeongganbo and denotes note durations as positions. The resulting machine-transformed versions of Chihwapyeong and Chwipunghyeong were evaluated by experts and performed by the Court Music Orchestra of the National Gugak Center. Our work demonstrates that generative models can be successfully applied to traditional music with limited training data when combined with careful design.

Updated: 2024-08-02 08:16:55

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2408.01096v1

An Encoding--Searching Separation Perspective on Bi-Encoder Neural Search

This paper reviews, analyzes, and proposes a new perspective on the bi-encoder architecture for neural search. While the bi-encoder architecture is widely used due to its simplicity and scalability at test time, it has some notable issues such as low performance on seen datasets and weak zero-shot performance on new datasets. In this paper, we analyze these issues and summarize two main critiques: the encoding information bottleneck problem and limitations of the basic assumption of embedding search. We then construct a thought experiment to logically analyze the encoding and searching operations and challenge the basic assumption of embedding search. Building on these observations, we propose a new perspective on the bi-encoder architecture called the \textit{encoding--searching separation} perspective, which conceptually and practically separates the encoding and searching operations. This new perspective is applied to explain the root cause of the identified issues and discuss ways to mitigate the problems. Finally, we discuss the implications of the ideas underlying the new perspective, the design surface that it exposes and the potential research directions arising from it.
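
To make the two operations concrete, here is a toy bi-encoder pipeline in which encoding and searching are separate functions. Encoding maps each text to a vector independently. Searching ranks precomputed vectors by inner product. The hashed bag-of-words `encode` is a hypothetical stand-in for a trained neural encoder, used only so the sketch runs without model weights. It uses `zlib.crc32` instead of Python's built-in `hash`, because `hash` is randomized per process for strings.

```python
import zlib
import numpy as np

DIM = 256  # embedding size of the toy encoder

def encode(text: str) -> np.ndarray:
    """Hypothetical stand-in encoder: hashed bag-of-words, L2-normalized."""
    v = np.zeros(DIM)
    for token in text.lower().split():
        v[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def build_index(docs):
    # Encoding: each document is embedded once, independently of any query.
    return np.stack([encode(d) for d in docs])

def search(index, query, k=1):
    # Searching: score the encoded query against precomputed embeddings.
    scores = index @ encode(query)
    return np.argsort(-scores)[:k].tolist()

docs = ["the cat sat on the mat", "stock prices fell sharply today", "a recipe for tomato soup"]
index = build_index(docs)
```

Documents are encoded once in `build_index`, so each new query costs one encoding plus a matrix-vector product. That is the test-time scalability the abstract mentions. Everything the search step can use must fit in the fixed-size vector returned by `encode`, and this constraint is the kind of bottleneck the abstract's first critique concerns.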

Updated: 2024-08-02 08:13:18

Categories: cs.LG,cs.IR

Download: http://arxiv.org/abs/2408.01094v1

Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions

Large multimodal models (LMMs) excel at following human instructions. However, self-contradictory instructions may arise as multimodal interaction becomes more common and context lengths grow, which is particularly challenging for language beginners and vulnerable populations. We introduce the Self-Contradictory Instructions benchmark to evaluate the ability of LMMs to recognize conflicting commands. It comprises 20,000 conflicts, evenly distributed between the language and vision paradigms. It is built with a novel automatic dataset creation framework, which speeds up the process and enables us to cover a wide range of instruction forms. Our comprehensive evaluation reveals that current LMMs consistently struggle to identify multimodal instruction discordance due to a lack of self-awareness. Hence, we propose Cognitive Awakening Prompting to inject cognition from external sources, substantially enhancing dissonance detection. The dataset and code are available at: https://selfcontradiction.github.io/.

Updated: 2024-08-02 08:11:11

Categories: cs.AI

Download: http://arxiv.org/abs/2408.01091v1

The EAP-AIAS: Adapting the AI Assessment Scale for English for Academic Purposes

The rapid advancement of Generative Artificial Intelligence (GenAI) presents both opportunities and challenges for English for Academic Purposes (EAP) instruction. This paper proposes an adaptation of the AI Assessment Scale (AIAS) tailored specifically to EAP contexts, termed the EAP-AIAS. This framework aims to provide a structured approach for integrating GenAI tools into EAP assessment practices while maintaining academic integrity and supporting language development. The EAP-AIAS consists of five levels, ranging from "No AI" to "Full AI", each delineating appropriate GenAI usage in EAP tasks. We discuss the rationale behind this adaptation, considering the unique needs of language learners and the dual focus of EAP on language proficiency and academic acculturation. This paper explores potential applications of the EAP-AIAS across various EAP assessment types, including writing tasks, presentations, and research projects. By offering a flexible framework, the EAP-AIAS seeks to help EAP practitioners navigate the complexities of GenAI integration in education and to prepare students for an AI-enhanced academic and professional future. This adaptation represents a step toward addressing the pressing need for ethical and pedagogically sound AI integration in language education.

Updated: 2024-08-02 07:51:29

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2408.01075v1

A Survey on Self-play Methods in Reinforcement Learning

Self-play, in which agents interact with copies or past versions of themselves, has recently gained prominence in reinforcement learning (RL). This paper first clarifies the preliminaries of self-play, including the multi-agent reinforcement learning framework and basic game theory concepts. It then provides a unified framework and classifies existing self-play algorithms within it. Moreover, the paper bridges the gap between the algorithms and their practical implications by illustrating the role of self-play in different scenarios. Finally, the survey highlights open challenges and future research directions in self-play. This paper serves as a guide to the multifaceted landscape of self-play in RL.
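
A classic instance of learning against past versions of oneself is fictitious self-play, in which the agent best-responds at each step to the average of all its earlier policies. The sketch below runs it on rock-paper-scissors, where the averaged policy approaches the uniform Nash equilibrium (1/3, 1/3, 1/3). It is an illustrative textbook example, not a specific algorithm from the survey's taxonomy. The payoff matrix encoding, the pure "rock" starting policy, and the function name are arbitrary choices made for the sketch.

```python
import numpy as np

# PAYOFF[i, j]: payoff for playing i against j (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)

def fictitious_self_play(iterations: int) -> np.ndarray:
    """Return the agent's average policy after repeatedly best-responding to its own history."""
    counts = np.zeros(3)
    counts[0] = 1.0                                    # start from a pure "rock" policy
    for _ in range(iterations):
        past_mix = counts / counts.sum()               # opponent = average of past versions
        best_response = int(np.argmax(PAYOFF @ past_mix))
        counts[best_response] += 1.0                   # add the new version to the pool
    return counts / counts.sum()

average_policy = fictitious_self_play(20_000)
```

Each individual policy here is pure and therefore exploitable, yet their average becomes hard to exploit. This is one reason many self-play methods keep a population or history of past agents instead of training only against the latest copy.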

Updated: 2024-08-02 07:47:51

Categories: cs.AI

Download: http://arxiv.org/abs/2408.01072v1

Differentiable Tree Search Network

In decision-making problems with limited training data, policy functions approximated by deep neural networks often perform suboptimally. An alternative approach is to learn a world model from the limited data and determine actions through online search. However, performance then suffers from compounding errors caused by inaccuracies in the learned world model. While methods like TreeQN have attempted to address these inaccuracies by incorporating algorithmic inductive biases into neural network architectures, the biases they introduce are often weak and insufficient for complex decision-making tasks. In this work, we introduce the Differentiable Tree Search Network (D-TSN), a novel neural network architecture that significantly strengthens the inductive bias by embedding the algorithmic structure of a best-first online search algorithm. D-TSN employs a learned world model to conduct a fully differentiable online search. The world model is jointly optimized with the search algorithm, enabling the learning of a robust world model and mitigating the effect of prediction inaccuracies. Further, we note that a naive incorporation of best-first search could lead to a discontinuous loss function in the parameter space. We address this issue by adopting a stochastic tree expansion policy, formulating search tree expansion as another decision-making task, and introducing an effective variance reduction technique for the gradient computation. We evaluate D-TSN in an offline RL setting with limited training data on Procgen games and a grid navigation task, and demonstrate that D-TSN outperforms popular model-free and model-based baselines.
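
Sampling which node to expand is not differentiable, so a stochastic expansion policy is typically trained with a score-function (REINFORCE) estimator, and subtracting a baseline is the textbook way to reduce that estimator's variance. The sketch below shows this generic technique for a softmax policy over candidate expansions. It is an assumption for illustration only: the variance reduction technique D-TSN actually uses may differ. The per-candidate rewards are made-up numbers, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def score_function_grads(logits, rewards, num_samples, baseline, rng):
    """Per-sample REINFORCE gradients of E[r(a)] w.r.t. softmax logits.

    grad = (r(a) - baseline) * (onehot(a) - pi). A constant baseline keeps the
    estimator unbiased because E[onehot(a) - pi] = 0.
    """
    pi = softmax(logits)
    actions = rng.choice(len(pi), size=num_samples, p=pi)   # sampled expansions
    onehots = np.eye(len(pi))[actions]
    advantages = rewards[actions] - baseline
    return advantages[:, None] * (onehots - pi)

rng = np.random.default_rng(0)
rewards = np.array([10.0, 11.0, 12.0])   # hypothetical returns for three candidate expansions
logits = np.zeros(3)                     # uniform policy, so E[r] = rewards.mean()
plain = score_function_grads(logits, rewards, 5000, baseline=0.0, rng=rng)
with_baseline = score_function_grads(logits, rewards, 5000, baseline=rewards.mean(), rng=rng)
```

Both estimators have the same expectation, here [-1/3, 0, 1/3], but the baseline version's per-sample spread is far smaller. Fewer samples are therefore needed for a usable gradient.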

Updated: 2024-08-02 07:42:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2401.11660v2

Universality of kernel random matrices and kernel regression in the quadratic regime

Kernel ridge regression (KRR) is a popular class of machine learning models that has become an important tool for understanding deep learning. Much of the focus has been on studying the proportional asymptotic regime, $n \asymp d$, where $n$ is the number of training samples and $d$ is the dimension of the data. In this regime, under certain conditions on the data distribution, the kernel random matrix involved in KRR exhibits behavior akin to that of a linear kernel. In this work, we extend the study of kernel regression to the quadratic asymptotic regime, where $n \asymp d^2$. In this regime, we demonstrate that a broad class of inner-product kernels exhibits behavior similar to a quadratic kernel. Specifically, we establish an operator norm approximation bound for the difference between the original kernel random matrix and a quadratic kernel random matrix that includes additional correction terms beyond those of the Taylor expansion of the kernel function. The approximation holds for general data distributions under a Gaussian-moment-matching assumption with a covariance structure. We use this new approximation to obtain the limiting spectral distribution of the original kernel matrix and to characterize the precise asymptotic training and generalization errors of KRR in the quadratic regime when $n/d^2$ converges to a non-zero constant. The generalization errors are obtained for both deterministic and random teacher models. Our proof techniques combine moment methods, Wick's formula, orthogonal polynomials, and resolvent analysis of random matrices with correlated entries.
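For reference, the KRR predictor with an inner-product kernel and its training error can be written as follows. The $1/d$ normalization inside $f$ is a common convention assumed here for illustration, not necessarily the paper's exact scaling. The second line uses the identity $y - K(K+\lambda I_n)^{-1}y = \lambda (K+\lambda I_n)^{-1}y$; the proportional regime corresponds to $n/d \to c$ and the quadratic regime studied here to $n/d^2 \to c$, for a constant $c > 0$.

```latex
\begin{aligned}
\hat f(x) &= k(x)^\top (K + \lambda I_n)^{-1} y,
  \qquad K_{ij} = f\!\left(\tfrac{\langle x_i, x_j\rangle}{d}\right),
  \quad k(x)_i = f\!\left(\tfrac{\langle x, x_i\rangle}{d}\right),\\
\mathcal{E}_{\mathrm{train}}
  &= \tfrac{1}{n}\bigl\lVert y - K (K + \lambda I_n)^{-1} y \bigr\rVert_2^2
   = \tfrac{\lambda^2}{n}\bigl\lVert (K + \lambda I_n)^{-1} y \bigr\rVert_2^2 .
\end{aligned}
```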

Updated: 2024-08-02 07:29:49

Categories: stat.ML,cs.LG,math.PR,math.ST,stat.TH

Download: http://arxiv.org/abs/2408.01062v1

Breaching the Bottleneck: Evolutionary Transition from Reward-Driven Learning to Reward-Agnostic Domain-Adapted Learning in Neuromodulated Neural Nets

Advanced biological intelligence learns efficiently from an information-rich stream of stimulus information, even when feedback on behaviour quality is sparse or absent. Such learning exploits implicit assumptions about task domains. We refer to such learning as Domain-Adapted Learning (DAL). In contrast, AI learning algorithms rely on explicit, externally provided measures of behaviour quality to acquire fit behaviour. This imposes an information bottleneck that precludes learning from diverse non-reward stimulus information, limiting learning efficiency. We consider the question of how biological evolution circumvents this bottleneck to produce DAL. We propose that species first evolve the ability to learn from reward signals, providing inefficient (bottlenecked) but broad adaptivity. From there, integration of non-reward information into the learning process can proceed via gradual accumulation of biases induced by such information on specific task domains. This scenario provides a biologically plausible pathway towards bottleneck-free, domain-adapted learning. Focusing on the second phase of this scenario, we set up a population of NNs with reward-driven learning modelled as Reinforcement Learning (A2C), and allow evolution to improve learning efficiency by integrating non-reward information into the learning process using a neuromodulatory update mechanism. On a navigation task in continuous 2D space, evolved DAL agents show a 300-fold increase in learning speed compared to pure RL agents. Evolution is found to eliminate reliance on reward information altogether, allowing DAL agents to learn from non-reward information exclusively, using local neuromodulation-based connection weight updates only. Code available at github.com/aislab/dal.
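A local, neuromodulated connection weight update can be sketched as a generic three-factor Hebbian rule. This rule is an illustrative assumption; the evolved update mechanism in the paper may differ, and the function and parameter names are hypothetical. The point it shows is that the update uses only pre- and postsynaptic activity plus a modulatory signal, which need not be derived from reward.

```python
def neuromodulated_update(weights, pre, post, modulation, lr=0.01):
    """Local three-factor update: w[i][j] += lr * m * post[i] * pre[j].

    `modulation` (m) is a scalar signal that may be computed from any stimulus,
    not necessarily reward; no backpropagated error is used.
    """
    return [
        [w + lr * modulation * post_i * pre_j for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


W = [[0.0, 0.0]]  # one postsynaptic unit, two presynaptic inputs
W = neuromodulated_update(W, pre=[1.0, 0.5], post=[2.0], modulation=1.0, lr=0.1)
# W is now [[0.2, 0.1]]
```

With `modulation=0.0` the weights stay unchanged, so the modulatory signal gates when learning happens.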

Updated: 2024-08-02 07:04:42

Categories: cs.NE,cs.AI,I.2.6

Download: http://arxiv.org/abs/2404.12631v2

LLM as Runtime Error Handler: A Promising Pathway to Adaptive Self-Healing of Software Systems

Unanticipated runtime errors, lacking predefined handlers, can abruptly terminate execution and lead to severe consequences, such as data loss or system crashes. Despite extensive efforts to identify potential errors during the development phase, such unanticipated errors remain difficult to eliminate entirely, making runtime mitigation measures indispensable for minimizing their impact. Automated self-healing techniques, such as reusing existing handlers, have been investigated to reduce the losses caused by execution termination. However, the usability of existing methods is limited by their predefined heuristic rules, and they fail to handle diverse runtime errors adaptively. Recently, the advent of Large Language Models (LLMs) has opened new avenues for addressing this problem. Inspired by their remarkable capabilities in understanding and generating code, we propose to handle runtime errors in real time using LLMs. Specifically, we propose Healer, the first LLM-assisted self-healing framework for handling runtime errors. When an unhandled runtime error occurs, Healer is activated to generate a piece of error-handling code with its internal LLM; the code is executed inside the runtime environment owned by the framework to obtain a rectified program state from which the program continues its execution. Our exploratory study evaluates the performance of Healer using four different code benchmarks and three state-of-the-art LLMs: GPT-3.5, GPT-4, and CodeQwen-7B. Results show that, without any fine-tuning, GPT-4 can successfully help programs recover from 72.8% of runtime errors, highlighting the potential of LLMs in handling runtime errors.
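The control flow described above can be sketched as a wrapper that catches an unhandled exception, asks a code generator for recovery code, executes it against the program state, and resumes from the rectified state. `generate_handler` is a hypothetical stand-in for the LLM call; Healer's actual prompting, state capture, and isolation are not reproduced, and executing generated code with `exec` is unsafe outside a sandbox.

```python
import traceback


def run_with_healer(step, state, generate_handler):
    """Run step(state); on an unhandled error, execute generated recovery code."""
    try:
        return step(state)
    except Exception as exc:
        context = traceback.format_exc()
        handler_code = generate_handler(context, exc)  # hypothetical LLM call
        namespace = {"state": dict(state), "error": exc}
        # WARNING: executing generated code is unsafe outside an isolated environment
        exec(handler_code, namespace)
        return namespace["state"]  # rectified state; execution continues from here


def compute_average(state):
    return {**state, "avg": state["total"] / state["count"]}


def fake_llm(context, error):
    # Stand-in for the LLM: returns recovery code as a string
    return "state['avg'] = 0.0"


healed = run_with_healer(compute_average, {"total": 10, "count": 0}, fake_llm)
normal = run_with_healer(compute_average, {"total": 10, "count": 2}, fake_llm)
```

Here the `ZeroDivisionError` in the first call is absorbed and the program continues with `avg` set by the generated handler, while the second call never reaches the handler.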

Updated: 2024-08-02 07:03:00

Categories: cs.SE,cs.AI,cs.CR

Download: http://arxiv.org/abs/2408.01055v1

Enhancing the MILP/MIQCP-based Automatic Search for Differential-Linear Distinguishers of Simon-Like Ciphers

In this paper, we propose an improved method based on Mixed-Integer Linear Programming/Mixed-Integer Quadratic Constraint Programming (MILP/MIQCP) to automatically find better differential-linear (DL) distinguishers for all members of the Simon and Simeck block cipher families. Specifically, we first give a fully precise MILP model to describe the linear part, and explain how to use the general expressions of the \textsf{Gurobi} solver to model the propagation of continuous differences in the middle part in a simple way. Second, to solve the MILP/MIQCP model in reasonable time, we propose two heuristic strategies based on the divide-and-conquer idea to speed up the search process. Third, we introduce a transforming technique, which exploits the clustering effect on DL trails, to improve the estimated correlation of the DL approximation. We apply our method to the Simon and Simeck block cipher families and find 14/17/21/26-round theoretical DL distinguishers for Simon32/48/64/96, which extend the previous longest ones for Simon32/48/96 by one round and for Simon64 by two rounds. For Simeck, we do not find distinguishers longer than the current best results, but we update all the results of Zhou et al. (the first work to automate the search for DL distinguishers of Simon-like ciphers using MILP/MIQCP). In addition, to validate the correctness of these distinguishers, we conduct experimental verifications on Simon32/Simeck32 and Simon48/Simeck48. The results show that our theoretical correlation estimates are very close to the experimental ones, which concretely supports the effectiveness of our method.
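For the experimental verification step, a DL correlation can be estimated empirically by sampling plaintext pairs with input difference $\Delta$ and averaging $(-1)^{\gamma \cdot (E(p) \oplus E(p \oplus \Delta))}$. The sketch below is a simplification and not the authors' verification code: it applies Simon32 rounds with independent random round keys instead of the real key schedule, and a 14-round distinguisher with a tiny correlation would need far more samples than the defaults here.

```python
import random

MASK16 = 0xFFFF


def rotl16(x, r):
    """Rotate a 16-bit word left by r bits (0 < r < 16)."""
    return ((x << r) | (x >> (16 - r))) & MASK16


def simon32_rounds(block, round_keys):
    """Apply len(round_keys) Simon32 rounds; the left word is the high 16 bits."""
    x, y = block >> 16, block & MASK16
    for k in round_keys:
        f = (rotl16(x, 1) & rotl16(x, 8)) ^ rotl16(x, 2)
        x, y = y ^ f ^ k, x
    return (x << 16) | y


def dl_correlation(n_rounds, delta_in, gamma_out, n_samples=1 << 14, seed=1):
    """Estimate E[(-1)^(gamma . (E(p) xor E(p xor delta)))] by random sampling."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        keys = [rng.getrandbits(16) for _ in range(n_rounds)]  # independent round keys
        p = rng.getrandbits(32)
        d = simon32_rounds(p, keys) ^ simon32_rounds(p ^ delta_in, keys)
        parity = bin(d & gamma_out).count("1") & 1
        total += 1 - 2 * parity
    return total / n_samples
```

As a sanity check, a difference in bit 0 of the right word moves deterministically to bit 0 of the left word after one round, so `dl_correlation(1, 0x1, 0x10000)` is exactly -1.0.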

Updated: 2024-08-02 06:59:03

Categories: cs.CR

Download: http://arxiv.org/abs/2408.01052v1

From Stem to Stern: Contestability Along AI Value Chains

This workshop will grow and consolidate a community of interdisciplinary CSCW researchers focusing on the topic of contestable AI. As an outcome of the workshop, we will synthesize the most pressing opportunities and challenges for contestability along AI value chains in the form of a research roadmap. This roadmap will help shape and inspire imminent work in this field. Considering the length and depth of AI value chains, it will especially spur discussions around the contestability of AI systems along various sites of such chains. The workshop will serve as a platform for dialogue and demonstrations of concrete, successful, and unsuccessful examples of AI systems that (could or should) have been contested, to identify requirements, obstacles, and opportunities for designing and deploying contestable AI in various contexts. This will be held primarily as an in-person workshop, with some hybrid accommodation. The day will consist of individual presentations and group activities to stimulate ideation and inspire broad reflections on the field of contestable AI. Our aim is to facilitate interdisciplinary dialogue by bringing together researchers, practitioners, and stakeholders to foster the design and deployment of contestable AI.

Updated: 2024-08-02 06:57:52

Categories: cs.AI,cs.CY,cs.HC

Download: http://arxiv.org/abs/2408.01051v1

The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines

The recent surge of open-source large language models (LLMs) enables developers to create AI-based solutions while maintaining control over aspects such as privacy and compliance, thereby providing governance and ownership of the model deployment process. To utilize these LLMs, inference engines are needed. These engines load the model's weights onto available resources, such as GPUs, and process queries to generate responses. The inference speed, or performance, of an LLM is critical for real-time applications, since each inference requires millions or billions of floating-point operations. Recently, advanced inference engines such as vLLM have emerged, incorporating novel mechanisms such as efficient memory management to achieve state-of-the-art performance. In this paper, we analyze the performance, particularly the throughput (tokens generated per unit of time), of 20 LLMs using two inference libraries: vLLM and HuggingFace's pipelines. We investigate how various hyperparameters, which developers must configure, influence inference performance. Our results reveal that throughput landscapes are irregular, with distinct peaks, highlighting the importance of hyperparameter optimization to achieve maximum performance. We also show that applying hyperparameter optimization when upgrading or downgrading the GPU model used for inference can improve throughput from HuggingFace pipelines by an average of 9.16% and 13.7%, respectively.
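Because the throughput landscape has distinct peaks, even a plain grid search can find a better configuration than a default. The sketch below assumes a hypothetical `run_benchmark(config) -> (tokens_generated, seconds)` function and a hypothetical `batch_size` hyperparameter; it does not call any vLLM or HuggingFace API, and `fake_benchmark` is an invented landscape used only for illustration.

```python
import itertools


def throughput(tokens_generated, seconds):
    """Tokens generated per second."""
    return tokens_generated / seconds


def grid_search(run_benchmark, grid):
    """Try every combination in `grid` and return (best_config, best_throughput)."""
    names = sorted(grid)
    best_cfg, best_tp = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        cfg = dict(zip(names, values))
        tokens, seconds = run_benchmark(cfg)
        tp = throughput(tokens, seconds)
        if tp > best_tp:
            best_cfg, best_tp = cfg, tp
    return best_cfg, best_tp


def improvement_pct(tuned, default):
    """Relative throughput gain of a tuned configuration, in percent."""
    return 100.0 * (tuned - default) / default


def fake_benchmark(cfg):
    # Hypothetical stand-in for a real inference run; throughput peaks at a mid-sized batch
    b = cfg["batch_size"]
    return 1000 * b, 1.0 + 0.01 * b * b


best, best_tp = grid_search(fake_benchmark, {"batch_size": [1, 8, 16, 32, 64]})
```

In this invented landscape the peak is at `batch_size=8`; larger batches are slower per token, which mirrors the non-monotonic behaviour the abstract reports.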

Updated: 2024-08-02 06:56:59

Categories: cs.SE,cs.CL,cs.LG

Download: http://arxiv.org/abs/2408.01050v1

Privacy-Preserving Split Learning with Vision Transformers using Patch-Wise Random and Noisy CutMix

In computer vision, the vision transformer (ViT) has increasingly superseded the convolutional neural network (CNN) owing to its improved accuracy and robustness. However, ViT's large model size and high sample complexity make it difficult to train on resource-constrained edge devices. Split learning (SL) has emerged as a viable solution, leveraging server-side resources to train ViTs while utilizing private data from distributed devices. However, SL requires additional information exchange between the device and the server for weight updates, and this exchanged information can be exposed to various attacks on private training data. To mitigate the risk of data breaches in classification tasks, and inspired by CutMix regularization, we propose a novel privacy-preserving SL framework, coined DP-CutMixSL, that injects Gaussian noise into smashed data and mixes randomly chosen patches of smashed data across clients. Our analysis demonstrates that DP-CutMixSL is a differentially private (DP) mechanism that strengthens privacy protection against membership inference attacks during forward propagation. Through simulations, we show that DP-CutMixSL improves privacy protection against membership inference attacks, reconstruction attacks, and label inference attacks, while also improving accuracy compared to DP-SL and DP-MixSL.
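The patch-wise mixing step can be sketched as follows, under stated assumptions: each patch position is filled by a client chosen uniformly at random, Gaussian noise with standard deviation `sigma` is added to every value, and the one-hot labels are mixed in proportion to each client's share of patches. The function name and data layout are hypothetical, and the noise scale required for a formal DP guarantee is not derived here.

```python
import random


def dp_cutmix(smashed, labels, sigma, rng):
    """smashed[c][p] is patch p (list of floats) from client c; labels[c] is one-hot."""
    n_clients, n_patches = len(smashed), len(smashed[0])
    mixed, counts = [], [0] * n_clients
    for p in range(n_patches):
        c = rng.randrange(n_clients)  # client whose patch fills position p
        counts[c] += 1
        mixed.append([v + rng.gauss(0.0, sigma) for v in smashed[c][p]])
    # Mixed label: each client's one-hot label weighted by its share of patches
    n_classes = len(labels[0])
    mixed_label = [
        sum(counts[c] * labels[c][k] for c in range(n_clients)) / n_patches
        for k in range(n_classes)
    ]
    return mixed, mixed_label


rng = random.Random(0)
smashed = [
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # client 0: three patches of width 2
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],  # client 1
]
labels = [[1.0, 0.0], [0.0, 1.0]]
mixed, mixed_label = dp_cutmix(smashed, labels, sigma=0.1, rng=rng)
```

Setting `sigma=0.0` recovers noiseless patch mixing, which makes the mixing logic easy to check in isolation.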

Updated: 2024-08-02 06:24:39

Categories: cs.DC,cs.CR,cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.01040v1

LightDE: A Lightweight Method for Eliminating Dangling Pointers

The widespread presence of Use-After-Free (UAF) vulnerabilities poses a serious threat to software security, and dangling pointers are considered the primary cause of these vulnerabilities. However, existing methods that defend against UAF vulnerabilities by eliminating dangling pointers must interrupt program execution at each pointer assignment to store the pointer's memory address in a specific data structure, which means they are not lightweight. To overcome this drawback, we propose a novel approach called LightDE, which does not need to store the memory addresses of pointers during program execution. LightDE uses our proposed structure-sensitive pointer analysis to determine which objects pointers point to and stores these points-to relationships in the program's data segment during compilation. Since LightDE only needs to verify whether the pointers identified by the pointer analysis point to released objects when eliminating dangling pointers, it is very lightweight. Our experimental results show that LightDE can effectively defend against UAF vulnerabilities with very low performance overhead.
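The key idea, checking only analysis-reported pointers at deallocation time instead of tracking every assignment at run time, can be illustrated with a toy Python simulation. This is not LightDE (which instruments compiled programs); `points_to` is a hypothetical stand-in for the result of the structure-sensitive pointer analysis, and any pointer the analysis misses would stay dangling.

```python
class ToyHeap:
    """Toy model: pointer variables are names; heap objects are integer ids."""

    def __init__(self, points_to):
        # object id -> pointer names that *may* point to it
        # (stands in for the compile-time analysis stored in the data segment)
        self.points_to = points_to
        self.ptrs = {}
        self.live = set()

    def malloc(self, obj_id, ptr):
        self.live.add(obj_id)
        self.ptrs[ptr] = obj_id

    def assign(self, dst, src):
        self.ptrs[dst] = self.ptrs[src]  # no runtime bookkeeping on assignment

    def free(self, ptr):
        obj = self.ptrs[ptr]
        self.live.discard(obj)
        # Check only the analysis-reported candidates; null those still aimed at obj
        for p in self.points_to.get(obj, ()):
            if self.ptrs.get(p) == obj:
                self.ptrs[p] = None


heap = ToyHeap({1: ["a", "b"]})
heap.malloc(1, "a")
heap.assign("b", "a")
heap.free("a")  # both "a" and "b" are nulled, so "b" cannot be used after free
```

The cost at `free` is proportional to the number of candidate pointers, while `assign` does no extra work, which is the trade-off the abstract describes.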

Updated: 2024-08-02 06:13:18

Categories: cs.CR

Download: http://arxiv.org/abs/2405.20697v2

On the Perturbed States for Transformed Input-robust Reinforcement Learning

Reinforcement Learning (RL) agents that are proficient in a training environment are vulnerable to adversarial perturbations of their input observations during deployment. This underscores the importance of building a robust agent before real-world deployment. To address this challenge, prior works focus on robust training-based procedures, including efforts to strengthen the robustness of the deep neural network component or to subject the agent to adversarial training against strong attacks. In this work, we propose a novel method called Transformed Input-robust RL (TIRL), which explores another avenue for mitigating the impact of adversaries: input transformation-based defenses. Specifically, we introduce two principles for applying transformation-based defenses when learning robust RL agents: (1) autoencoder-style denoising to reconstruct the original state and (2) bounded transformations (bit-depth reduction and vector quantization (VQ)) so that transformed inputs remain close. The transformations are applied to the state before it is fed into the policy network. Extensive experiments on multiple MuJoCo environments demonstrate that input transformation-based defenses, in particular VQ, defend against several adversaries in the state observations. The official code is available at https://github.com/tunglm2203/tirl
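The two bounded transformations can be sketched as functions applied to a state vector before it reaches the policy. This is an illustration, not the official TIRL code: the state range `[-1, 1]` is assumed, and the fixed `codebook` is hypothetical (a VQ codebook would normally be learned). Small perturbations that do not cross a quantization boundary map to the same transformed state.

```python
def bit_depth_reduce(state, bits, low=-1.0, high=1.0):
    """Clip each feature to [low, high] and snap it to one of 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    out = []
    for v in state:
        v = min(max(v, low), high)
        q = round((v - low) / (high - low) * levels)
        out.append(low + q * (high - low) / levels)
    return out


def vector_quantize(state, codebook):
    """Replace the state with its nearest codebook vector (squared Euclidean distance)."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(state, c)))


codebook = [[0.0, 0.0], [1.0, 1.0]]
clean = vector_quantize([0.10, 0.20], codebook)
perturbed = vector_quantize([0.15, 0.25], codebook)  # same code as the clean state
```

Both transformations are bounded in the sense that the output set is finite, which limits how far an adversarial perturbation can move the policy input.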

Updated: 2024-08-02 06:05:19

Categories: cs.LG

Download: http://arxiv.org/abs/2408.00023v2

An Adaptive Gradient Regularization Method

The optimizer plays an important role in efficient, high-performance neural network training, and the weight update based on the gradient is its central part. It has been shown that normalization and standardization operations on weights and gradients, such as weight standardization (WS), weight normalization (WN), and gradient normalization (GN), can accelerate training and improve performance; gradient centralization (GC) is another such technique. In this work, we introduce a new optimization technique based on the magnitudes of the entries of a gradient vector, named adaptive gradient regularization (AGR), which normalizes the gradient vector across all dimensions into a coefficient vector and subtracts the element-wise product of the gradient and its coefficient vector from the vanilla gradient. It can be viewed as an adaptive gradient clipping method. We show that AGR can improve the Lipschitzness of the loss function, yielding a more stable training process and better generalization performance. AGR can be embedded into vanilla optimizers such as Adan and AdamW with only three lines of code. Experiments on image generation, image classification, and language representation show that AGR improves training results.
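A minimal sketch of the AGR adjustment, assuming the coefficient vector is $\alpha_i = |g_i| / \sum_j |g_j|$ (an assumption based on the description above; check it against the paper). It is applied here to a single flattened gradient inside plain SGD; in practice it would be applied per parameter tensor before an optimizer such as AdamW consumes the gradient. Larger components are shrunk proportionally more, which is why it behaves like adaptive clipping.

```python
def agr(grad, eps=1e-12):
    """Return g - alpha * g element-wise, with alpha_i = |g_i| / sum_j |g_j| (assumed form)."""
    total = sum(abs(g) for g in grad) + eps  # eps guards against an all-zero gradient
    alpha = [abs(g) / total for g in grad]
    return [g - a * g for g, a in zip(grad, alpha)]


def sgd_step(weights, grad, lr=0.1):
    """Plain SGD step applied to the AGR-adjusted gradient."""
    return [w - lr * g for w, g in zip(weights, agr(grad))]


adjusted = agr([3.0, -1.0])        # alpha is about [0.75, 0.25] -> about [0.75, -0.75]
new_w = sgd_step([1.0, 1.0], [3.0, -1.0])
```

In this example the large component 3.0 is scaled by 0.25 while the small one keeps 75% of its size, equalizing the step.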

Updated: 2024-08-02 06:05:10

Categories: cs.LG

Download: http://arxiv.org/abs/2407.16944v3

Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments

In embodied instruction-following (EIF), the integration of pretrained language models (LMs) as task planners emerges as a significant branch, where tasks are planned at the skill level by prompting LMs with pretrained skills and user instructions. However, grounding these pretrained skills in different domains remains challenging due to their intricate entanglement with the domain-specific knowledge. To address this challenge, we present a semantic skill grounding (SemGro) framework that leverages the hierarchical nature of semantic skills. SemGro recognizes the broad spectrum of these skills, ranging from short-horizon low-semantic skills that are universally applicable across domains to long-horizon rich-semantic skills that are highly specialized and tailored for particular domains. The framework employs an iterative skill decomposition approach, starting from the higher levels of semantic skill hierarchy and then moving downwards, so as to ground each planned skill to an executable level within the target domain. To do so, we use the reasoning capabilities of LMs for composing and decomposing semantic skills, as well as their multi-modal extension for assessing the skill feasibility in the target domain. Our experiments in the VirtualHome benchmark show the efficacy of SemGro in 300 cross-domain EIF scenarios.
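The top-down grounding loop can be sketched as a recursion: keep a skill if the target domain can execute it, otherwise decompose it and ground each sub-skill in order. `is_feasible` and `decompose` are hypothetical stand-ins for the multimodal feasibility check and LM-based decomposition; SemGro's prompting and any upward skill composition are not reproduced, and the skill names below are invented.

```python
def ground_skill(skill, is_feasible, decompose, max_depth=5):
    """Return an ordered list of executable skills that realizes `skill`."""
    if is_feasible(skill):
        return [skill]
    sub_skills = decompose(skill) if max_depth > 0 else []
    if not sub_skills:
        raise ValueError(f"cannot ground skill: {skill!r}")
    plan = []
    for sub in sub_skills:
        plan.extend(ground_skill(sub, is_feasible, decompose, max_depth - 1))
    return plan


hierarchy = {
    "make coffee": ["go to kitchen", "brew coffee"],
    "brew coffee": ["grab mug", "press brew button"],
}
executable = {"go to kitchen", "grab mug", "press brew button"}
plan = ground_skill("make coffee", executable.__contains__, lambda s: hierarchy.get(s, []))
```

High-level skills such as "make coffee" are only decomposed as far as needed: "go to kitchen" is already executable and stays as one step.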

Updated: 2024-08-02 05:50:31

Categories: cs.AI

Download: http://arxiv.org/abs/2408.01024v1

Distilling interpretable causal trees from causal forests

Machine learning methods for estimating treatment effect heterogeneity promise greater flexibility than existing methods that test a few pre-specified hypotheses. However, it can be challenging to extract insights from complicated machine learning models. A high-dimensional distribution of conditional average treatment effects may give accurate individual-level estimates, but its underlying patterns can be hard to understand, making the implications of the analysis unclear. This paper proposes the Distilled Causal Tree, a method for distilling a single, interpretable causal tree from a causal forest. It compares well to existing methods of extracting a single tree, particularly with noisy or high-dimensional data that have many correlated features; in this setting it even outperforms the base causal forest in most simulations. Its estimates are doubly robust and asymptotically normal, just as those of the causal forest are.
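The generic distillation idea, fitting a small tree to a forest's predicted conditional average treatment effects, can be sketched with a depth-1 regression stump. This is a simplified illustration, not the Distilled Causal Tree procedure (which may use honest estimation, deeper trees, and doubly robust scores); `tau_hat` is assumed to come from an already fitted causal forest, and the numbers are invented.

```python
def fit_stump(X, tau_hat):
    """Fit a depth-1 regression tree to predicted CATEs by minimizing squared error.

    Returns (feature_index, threshold, left_mean, right_mean); rows with
    x[feature] <= threshold go left.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X})[:-1]:  # skip the max so the right side is nonempty
            left = [tau for x, tau in zip(X, tau_hat) if x[j] <= t]
            right = [tau for x, tau in zip(X, tau_hat) if x[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        raise ValueError("no valid split: every feature is constant")
    return best[1:]


X = [[0.1, 5.0], [0.4, 3.0], [0.6, 4.0], [0.9, 1.0]]
tau_hat = [1.0, 1.2, 3.0, 3.1]  # hypothetical forest predictions
feature, threshold, left_effect, right_effect = fit_stump(X, tau_hat)
```

The resulting split reads as an interpretable rule: units with feature 0 at most 0.4 have an estimated effect near 1.1, the rest near 3.05.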

Updated: 2024-08-02 05:48:15

Categories: econ.EM,cs.LG

Download: http://arxiv.org/abs/2408.01023v1

A Family of Distributions of Random Subsets for Controlling Positive and Negative Dependence

Positive and negative dependence are fundamental concepts that characterize the attractive and repulsive behavior of random subsets. Although some probabilistic models are known to exhibit positive or negative dependence, it is challenging to seamlessly bridge them with a practicable probabilistic model. In this study, we introduce a new family of distributions, named the discrete kernel point process (DKPP), which includes determinantal point processes and parts of Boltzmann machines. We also develop some computational methods for probabilistic operations and inference with DKPPs, such as calculating marginal and conditional probabilities and learning the parameters. Our numerical experiments demonstrate the controllability of positive and negative dependence and the effectiveness of the computational methods for DKPPs.
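Since DKPPs include determinantal point processes, the negative-dependence end of the family can be illustrated with an L-ensemble DPP, where $P(S) = \det(L_S) / \det(L + I)$. The general DKPP form and its Boltzmann-machine (positive-dependence) members are not reproduced here; the kernel `L` below is an invented example in which two very similar items repel each other.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    n = len(m)
    if n == 0:
        return 1.0  # determinant of the empty matrix
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def dpp_prob(L, subset):
    """P(S) = det(L_S) / det(L + I) for an L-ensemble DPP."""
    n = len(L)
    sub = [[L[i][j] for j in subset] for i in subset]
    l_plus_i = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return det(sub) / det(l_plus_i)


L = [[1.0, 0.9], [0.9, 1.0]]  # two highly similar items
probs = {s: dpp_prob(L, s) for k in range(3) for s in combinations(range(2), k)}
p0 = probs[(0,)] + probs[(0, 1)]  # marginal P(0 in S)
p1 = probs[(1,)] + probs[(0, 1)]  # marginal P(1 in S)
```

The joint probability `probs[(0, 1)]` (about 0.06) is well below `p0 * p1` (about 0.14), which is the repulsive behaviour described above.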

Updated: 2024-08-02 05:46:17

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2408.01022v1

GNN-MolKAN: Harnessing the Power of KAN to Advance Molecular Representation Learning with GNNs

Effective molecular representation learning is crucial for molecular property prediction and drug design. However, existing approaches are limited by insufficient annotations and suboptimal architecture design. For instance, Graph Neural Networks (GNNs) suffer from over-squashing, which causes the loss of important structural details in molecules and thus impairs molecular representations. In this work, we propose a new class of GNNs, GNN-MolKAN and its augmented variant GNN-MolKAN+, that integrate the Kolmogorov-Arnold Networks (KAN) architecture from AI + Science into GNNs to address these challenges. Additionally, we introduce Adaptive FastKAN (AdFastKAN), an advanced KAN that offers increased stability and speed, further enhancing the performance of standard GNNs. Notably, our approach offers three key benefits: 1) Superior Performance: GNN-MolKAN and GNN-MolKAN+ demonstrate superior prediction ability, robust generalization to unseen scaffolds, and versatile transferability across different GNN architectures. 2) Efficiency: These models require less computational time and fewer parameters while matching or surpassing the state-of-the-art (SOTA) self-supervised methods. 3) Few-shot Learning Ability: GNN-MolKAN demonstrates great potential in few-shot learning scenarios, achieving an average improvement of 6.97% across few-shot benchmarks. Overall, we validate our architecture on 6 classification datasets, 6 regression datasets, and 4 few-shot learning datasets, consistently achieving highly competitive results across all of them.
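A KAN layer replaces fixed activations with a learnable univariate function on every input-output edge. The sketch below uses Gaussian radial basis functions for those edge functions, in the spirit of FastKAN; the exact basis, normalization, and adaptive parts of AdFastKAN are not reproduced, and all names are hypothetical. In a GNN, such a layer would take the place of the MLP in the message or update function.

```python
import math


def rbf_basis(x, centers, width):
    """Gaussian radial basis values exp(-((x - c) / width)^2) for each center c."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]


def kan_layer(inputs, coeffs, centers, width):
    """output[o] = sum_i phi_{o,i}(inputs[i]), with each edge function
    phi_{o,i}(x) = sum_k coeffs[o][i][k] * exp(-((x - centers[k]) / width)^2)."""
    outputs = []
    for edge_coeffs in coeffs:  # one entry per output unit
        total = 0.0
        for x, w in zip(inputs, edge_coeffs):  # one learnable function per edge
            total += sum(wk * bk for wk, bk in zip(w, rbf_basis(x, centers, width)))
        outputs.append(total)
    return outputs


centers = [-1.0, 0.0, 1.0]
# One output unit, two inputs: edge 1 is a bump at 0, edge 2 is twice a bump at 1
coeffs = [[[0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]]
out = kan_layer([0.0, 1.0], coeffs, centers, width=1.0)
```

Here each edge evaluates its own function at its input (1.0 and 2.0 respectively), and the layer output is their sum.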

Updated: 2024-08-02 05:36:14

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.01018v1

IBB Traffic Graph Data: Benchmarking and Road Traffic Prediction Model

Road traffic congestion prediction is a crucial component of intelligent transportation systems, since it enables proactive traffic management, enhances suburban experience, reduces environmental impact, and improves overall safety and efficiency. Although there are several public datasets, especially for metropolitan areas, these datasets may not be applicable to practical scenarios because of their limited scale (i.e., the number of sensors and road links) and several external factors, such as the characteristics of the target area (e.g., urban roads or highways) and the data collection location. To address this, this paper introduces a novel IBB Traffic graph dataset as an alternative benchmark dataset to mitigate these limitations and enrich the literature with new geographical characteristics. The IBB Traffic graph dataset covers sensor data collected at 2451 distinct locations. Moreover, we propose a novel Road Traffic Prediction Model that strengthens temporal links through feature engineering, uses node embeddings from GLEE to represent inter-related relationships within the traffic network, and performs traffic prediction with ExtraTrees. The results indicate that the proposed model consistently outperforms the baseline models, demonstrating an average accuracy improvement of 4%.
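One common way to turn sensor time series into rows for a tree ensemble is to use lagged readings as features and append a per-node embedding. The sketch below assumes this layout; the paper's actual feature engineering is not specified in the abstract, the GLEE embedding is replaced by a fixed toy vector, and the readings are invented. The resulting rows could then be passed to a regressor such as scikit-learn's `ExtraTreesRegressor`.

```python
def lag_features(series, lags):
    """Build (X, y) where row t uses [series[t - l] for l in lags] to predict series[t]."""
    start = max(lags)
    X = [[series[t - l] for l in lags] for t in range(start, len(series))]
    y = [series[t] for t in range(start, len(series))]
    return X, y


def add_node_embedding(X, embedding):
    """Prepend the sensor's (hypothetical) node-embedding vector to every row."""
    return [list(embedding) + row for row in X]


speeds = [10, 12, 15, 14, 13]  # toy readings from one sensor
X, y = lag_features(speeds, lags=(1, 2))
X = add_node_embedding(X, [0.3, -0.1])
```

Each row combines where the sensor sits in the network (embedding) with its recent history (lags), which is what lets a non-sequential model like ExtraTrees use temporal structure.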

Updated: 2024-08-02 05:23:19

Categories: cs.LG,cs.AI,cs.IT,math.IT

Download: http://arxiv.org/abs/2408.01016v1

Weighed l1 on the simplex: Compressive sensing meets locality

Sparse manifold learning algorithms combine techniques in manifold learning and sparse optimization to learn features that could be utilized for downstream tasks. The standard setting of compressive sensing cannot be applied directly to this setup. Due to the intrinsic geometric structure of data, dictionary atoms might be redundant and fail to satisfy the restricted isometry property or coherence condition. In addition, manifold learning emphasizes learning local geometry, which is not reflected in a standard $\ell_1$ minimization problem. We propose weighted $\ell_0$ and weighted $\ell_1$ metrics that encourage representation via neighborhood atoms, suited for dictionary-based manifold learning. Assuming that the data are generated from a Delaunay triangulation, we show the equivalence of weighted $\ell_0$ and weighted $\ell_1$. We discuss an optimization program that learns the dictionaries and sparse coefficients and demonstrate the utility of our regularization on synthetic and real datasets.
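A weighted $\ell_1$ problem on the simplex can be written as below. The distance-based weights $w_i = \lVert x - d_i \rVert_2^2$ (with $d_i$ the $i$-th column of $D$) are an illustrative assumption that favors nearby atoms; the paper's exact weights are not stated in this abstract. The second line is the reason weights are needed: on the simplex the unweighted $\ell_1$ norm is constant, so it cannot prefer any representation, whereas the weighted objective becomes a linear program that does.

```latex
\begin{aligned}
&\min_{\alpha \in \mathbb{R}^m} \; \sum_{i=1}^{m} w_i \lvert \alpha_i \rvert
  \quad \text{s.t.} \quad x = D\alpha,\;\; \alpha \ge 0,\;\; \mathbf{1}^\top \alpha = 1,
  \qquad w_i = \lVert x - d_i \rVert_2^2, \\
&\alpha \ge 0 \;\Rightarrow\; \lvert \alpha_i \rvert = \alpha_i, \quad
  \lVert \alpha \rVert_1 = \mathbf{1}^\top \alpha = 1, \quad
  \textstyle\sum_i w_i \lvert \alpha_i \rvert = \sum_i w_i \alpha_i .
\end{aligned}
```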

Updated: 2024-08-02 05:05:48

Categories: eess.SP,cs.IT,cs.LG,math.IT,math.OC

Download: http://arxiv.org/abs/2104.13894v2

Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing (NLP) tasks, such as question answering, sentiment analysis, text summarization, and machine translation. However, the ever-growing complexity of LLMs demands immense computational resources, hindering broader research on and application of these models. To address this, various parameter-efficient fine-tuning strategies, such as Low-Rank Approximation (LoRA) and Adapters, have been developed. Despite their potential, these methods are often limited in how far they can compress the model. Specifically, LoRA struggles to scale effectively with the increasing number of trainable parameters in modern large-scale LLMs. Additionally, Low-Rank Economic Tensor-Train Adaptation (LoRETTA), which uses tensor train decomposition, has not yet achieved the level of compression needed to fine-tune very large-scale models with limited resources. This paper introduces Tensor Train Low-Rank Approximation (TT-LoRA), a novel parameter-efficient fine-tuning (PEFT) approach that extends LoRETTA with optimized tensor train (TT) decomposition integration. By eliminating Adapters and traditional LoRA-based structures, TT-LoRA achieves greater model compression without compromising downstream task performance, while also reducing inference latency and computational overhead. We conduct an exhaustive parameter search to establish benchmarks that highlight the trade-off between model compression and performance. Our results demonstrate significant compression of LLMs while maintaining performance comparable to larger models, facilitating their deployment on resource-constrained platforms.
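A quick parameter count shows why a tensor-train factorization can compress a weight update more aggressively than LoRA. The sketch below assumes a hypothetical 4096 x 4096 update, reshaped into eight modes of size 8, with every interior TT rank set to 8. These shapes and ranks are illustrative only and are not taken from the TT-LoRA experiments.

```python
from math import prod

def lora_params(d_out, d_in, rank):
    # LoRA update B @ A with B: d_out x rank and A: rank x d_in
    return rank * (d_out + d_in)

def tt_params(mode_sizes, ranks):
    """Parameters in a tensor train whose k-th core has shape (r_{k-1}, n_k, r_k).

    `ranks` lists the interior TT ranks; the two boundary ranks are fixed to 1.
    """
    full_ranks = [1] + list(ranks) + [1]
    assert len(full_ranks) == len(mode_sizes) + 1
    return sum(full_ranks[k] * n * full_ranks[k + 1] for k, n in enumerate(mode_sizes))

d = 4096
modes = [8] * 8                  # hypothetical reshaping: 8**8 == 4096 * 4096
assert prod(modes) == d * d
print(d * d)                     # 16777216 entries in a dense update
print(lora_params(d, d, rank=8))       # 65536
print(tt_params(modes, ranks=[8] * 7))  # 3200
```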

Updated: 2024-08-02 04:45:58

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.01008v1

A Tutorial on the Pretrain-Finetune Paradigm for Natural Language Processing

Given that natural language serves as the primary conduit for expressing thoughts and emotions, text analysis has become a key technique in psychological research. It enables the extraction of valuable insights from natural language, facilitating endeavors such as personality trait assessment, mental health monitoring, and sentiment analysis in interpersonal communication. In text analysis, existing studies often resort to one of three approaches: human coding, which is time-consuming; pre-built dictionaries, which often fail to cover all possible scenarios; or models trained from scratch, which require large amounts of labeled data. In this tutorial, we introduce the pretrain-finetune paradigm, a transformative approach in text analysis and natural language processing. This paradigm distinguishes itself through the use of large pretrained language models, which can be finetuned efficiently even with limited training data. This efficiency is especially beneficial for research in the social sciences, where the number of annotated samples is often quite limited. Our tutorial offers a comprehensive introduction to the pretrain-finetune paradigm. We first delve into the fundamental concepts of pretraining and finetuning, followed by practical exercises using real-world applications. We demonstrate the application of the paradigm across various tasks, including multi-class classification and regression. Emphasizing its efficacy and user-friendliness, the tutorial aims to encourage broader adoption of this paradigm. To this end, we provide open access to all our code and datasets. The tutorial is beneficial across various psychology disciplines, providing a comprehensive guide to employing text analysis in diverse research settings.

Updated: 2024-08-02 04:44:29

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2403.02504v3

Enhancing Financial Market Predictions: Causality-Driven Feature Selection

This paper introduces the FinSen dataset, which revolutionizes financial market analysis by integrating economic and financial news articles from 197 countries with stock market data. The dataset spans 15 years, from 2007 to 2023, includes temporal information, and offers a rich, global perspective with 160,000 records of financial market news. Our study leverages causally validated sentiment scores and LSTM models to enhance the accuracy and reliability of market forecasts. Using the FinSen dataset, we introduce an innovative Focal Calibration Loss that reduces the Expected Calibration Error (ECE) to 3.34 percent with the DAN 3 model. This not only improves prediction accuracy but also aligns probabilistic forecasts closely with real outcomes. That alignment is crucial in the financial sector, where the predicted probability is paramount. Our approach demonstrates the effectiveness of combining sentiment analysis with precise calibration techniques for trustworthy financial forecasting, where the cost of misinterpretation can be high. The FinSen data can be found at [this GitHub URL](https://github.com/EagleAdelaide/FinSen_Dataset.git).
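The reported 3.34 percent figure is an Expected Calibration Error, so the sketch below shows the standard binned ECE computation. It assumes 10 equal-width confidence bins, a common default rather than a setting stated in the abstract. It does not implement the proposed Focal Calibration Loss.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: sum over bins b of (|B_b| / N) * |accuracy(B_b) - mean_confidence(B_b)|.

    Confidences are assumed to lie in [0, 1].
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        accuracy = sum(ok for _, ok in b) / len(b)
        mean_conf = sum(c for c, _ in b) / len(b)
        ece += len(b) / n * abs(accuracy - mean_conf)
    return ece

confidences = [0.95] * 4 + [0.65] * 2               # hypothetical predicted probabilities
correct = [True, True, True, False, True, False]   # whether each prediction was right
print(expected_calibration_error(confidences, correct))  # about 0.1833
```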

Updated: 2024-08-02 04:40:15

Categories: cs.LG,cs.CE,cs.CL,cs.DB

Download: http://arxiv.org/abs/2408.01005v1

Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models

Multimodal Large Language Models (MLLMs) have made significant progress in bridging the gap between the visual and language modalities. However, hallucinations in MLLMs, where the generated text does not align with the image content, remain a major challenge. Existing methods for addressing hallucinations often rely on instruction tuning, which requires retraining the model on specific data and further increases the cost of using MLLMs. In this paper, we introduce Piculet, a novel training-free method for enhancing the input representation of MLLMs. Piculet leverages multiple specialized models to extract descriptions of visual information from the input image, and it combines these descriptions with the original image and query as input to the MLLM. We evaluate our method both quantitatively and qualitatively, and the results demonstrate that Piculet greatly decreases hallucinations in MLLMs. Our method is universal and can easily be extended to different MLLMs.
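The input-augmentation idea can be sketched as plain prompt assembly. The expert names, their outputs and the prompt template below are hypothetical stand-ins for real specialized models, such as a detector or an OCR model. In the described method, the original image would also be passed to the MLLM alongside this text.

```python
from typing import Callable, Dict

def build_augmented_prompt(query: str, image_id: str,
                           experts: Dict[str, Callable[[str], str]]) -> str:
    """Collect text descriptions from specialized models and place them before the user query."""
    lines = [f"[{name}] {describe(image_id)}" for name, describe in experts.items()]
    header = "Visual facts extracted by specialized models:\n"
    return header + "\n".join(lines) + f"\n\nQuestion: {query}"

# Hypothetical stand-ins for specialized models; real ones would analyze the image file.
experts = {
    "detector": lambda img: "2 dogs, 1 frisbee",
    "ocr": lambda img: "no text found",
}
prompt = build_augmented_prompt("How many dogs are in the picture?", "img_001.jpg", experts)
print(prompt)
```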

Updated: 2024-08-02 04:34:37

Categories: cs.AI

Download: http://arxiv.org/abs/2408.01003v1

Adaptive Two-Stage Cloud Resource Scaling via Hierarchical Multi-Indicator Forecasting and Bayesian Decision-Making

The surging demand for cloud computing resources, driven by the rapid growth of sophisticated large-scale models and data centers, underscores the critical importance of efficient and adaptive resource allocation. As major tech enterprises deploy massive infrastructures with thousands of GPUs, existing cloud platforms still suffer from low resource utilization because of three key challenges: capturing hierarchical indicator structures, modeling non-Gaussian distributions, and making decisions under uncertainty. To address these challenges, we propose HARMONY, an adaptive Hierarchical Attention-based Resource Modeling and Decision-Making System. HARMONY combines hierarchical multi-indicator distribution forecasting with uncertainty-aware Bayesian decision-making. It introduces a novel hierarchical attention mechanism that comprehensively models complex inter-indicator dependencies, enabling accurate predictions that adapt to evolving environment states. It also transforms Gaussian projections into adaptive non-Gaussian distributions via Normalizing Flows. Crucially, HARMONY leverages the full predictive distributions in an adaptive Bayesian process, proactively incorporating uncertainty to optimize resource allocation while robustly meeting SLA constraints under varying conditions. Extensive evaluations across four large-scale cloud datasets demonstrate HARMONY's state-of-the-art performance, significantly outperforming nine established methods. A month-long real-world deployment validated HARMONY's substantial practical impact, saving over 35,000 GPU hours (more than $100K in cost reduction) and showcasing the economic value of adaptive, uncertainty-aware scaling. Our code is available at https://github.com/Floating-LY/HARMONY1.
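The sketch below is a simplified illustration of why full predictive distributions matter for SLA-aware scaling. It provisions the smallest capacity whose empirical coverage of sampled demand meets a target SLA level. The demand samples and SLA levels are hypothetical. This quantile rule is a stand-in for HARMONY's adaptive Bayesian decision process, not a reproduction of it.

```python
import math

def provision(demand_samples, sla_level):
    """Smallest sampled capacity c such that the empirical P(demand <= c) >= sla_level."""
    ordered = sorted(demand_samples)
    k = math.ceil(sla_level * len(ordered))  # how many samples the capacity must cover
    return ordered[max(k, 1) - 1]

# Hypothetical samples drawn from a forecast of next-hour GPU demand (note the heavy tail).
demand_samples = [10, 12, 11, 30, 13, 14, 12, 11, 15, 13]
print(provision(demand_samples, 0.85))  # 15
print(provision(demand_samples, 0.99))  # 30
```

Raising the target from 0.85 to 0.99 moves the decision from 15 to the single tail sample of 30. This is why the shape of the predicted distribution, and not just its center, drives the outcome.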

Updated: 2024-08-02 04:19:25

Categories: cs.LG

Download: http://arxiv.org/abs/2408.01000v1

FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation

Large-scale text-to-image diffusion models have been a revolutionary milestone in the evolution of generative AI and multimodal technology, allowing extraordinary image generation from natural-language text prompts. However, the limited controllability of such models restricts their practical applicability for real-life content creation, which has focused attention on leveraging a reference image to control text-to-image synthesis. Because the reference image and the generated image are closely correlated, this problem can also be regarded as manipulating (or editing) the reference image according to the text, namely text-driven image-to-image translation. This paper contributes a novel, concise, and efficient approach that adapts a pre-trained large-scale text-to-image (T2I) diffusion model to the image-to-image (I2I) paradigm in a plug-and-play manner. It realizes high-quality and versatile text-driven I2I translation without any model training, model fine-tuning, or online optimization. To guide T2I generation with a reference image, we propose to model diverse guiding factors with correspondingly different frequency bands of diffusion features in the DCT spectral space. Accordingly, we devise a novel frequency band substitution layer that dynamically substitutes a certain DCT frequency band of the diffusion features with the corresponding band of the reference image along the reverse sampling process. We demonstrate that our method flexibly enables highly controllable text-driven I2I translation, in both the guiding factor and the guiding intensity of the reference image, simply by tuning the type and the bandwidth of the substituted frequency band, respectively. Extensive qualitative and quantitative experiments verify the superiority of our approach over related methods in I2I translation visual quality, versatility, and controllability.
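The frequency band substitution idea can be illustrated in one dimension. The pure-Python sketch below implements an orthonormal DCT-II and its inverse, then swaps the coefficients in a chosen index range with those of a reference signal. The signals and the band [0, 2) are hypothetical. The actual method applies this operation to 2-D diffusion features at each reverse sampling step.

```python
import math

def _scale(k, n):
    # Orthonormal DCT-II normalization factor for coefficient index k.
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def dct(x):
    """Orthonormal DCT-II of a 1-D sequence (O(n^2), for illustration only)."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def idct(coeffs):
    """Inverse of the orthonormal DCT-II (the orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]

def substitute_band(feature, reference, lo, hi):
    """Replace the feature's DCT coefficients with indices in [lo, hi) by the reference's."""
    f, r = dct(feature), dct(reference)
    mixed = [r[k] if lo <= k < hi else f[k] for k in range(len(f))]
    return idct(mixed)

feature = [1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.0, 0.0]    # hypothetical diffusion feature row
reference = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]    # hypothetical reference feature row
low_pass_guided = substitute_band(feature, reference, lo=0, hi=2)
```

Substituting only the lowest band copies the reference's coarse layout, here its mean (carried by the DC coefficient) and its slowest variation, while keeping the feature's finer detail. Widening the band increases the reference's influence.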

Updated: 2024-08-02 04:13:38

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.00998v1

A Safe Exploration Strategy for Model-free Task Adaptation in Safety-constrained Grid Environments

Training a model-free reinforcement learning agent requires allowing the agent to explore the environment sufficiently to find an optimal policy. In safety-constrained environments, unsupervised exploration or a non-optimal policy may lead the agent to undesirable states, with outcomes that are potentially costly or hazardous for both the agent and the environment. In this paper, we introduce a new exploration framework for navigating grid environments that enables model-free agents to interact with the environment while adhering to safety constraints. Our framework includes a pre-training phase, during which the agent learns to identify potentially unsafe states based on both observable features and the safety constraints specified in the environment. Subsequently, a binary classification model is trained to predict unsafe states in new environments that exhibit similar dynamics. This trained classifier lets model-free agents determine when random exploration or a suboptimal policy may pose safety risks. In such cases, our framework prompts the agent to follow a predefined safe policy to mitigate potentially hazardous consequences. We evaluated our framework on three randomly generated grid environments and demonstrated how model-free agents can safely adapt to new tasks and learn optimal policies for new environments. Our results indicate that, by defining an appropriate safe policy and using a well-trained model to detect unsafe states, our framework enables a model-free agent to adapt to new tasks and environments with significantly fewer safety violations.
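Below is a minimal sketch of the runtime safety check. It assumes a 4-connected grid, an epsilon-random exploration rule, and a classifier that scores the state an action would lead to. `predicts_unsafe`, `policy_action` and `safe_policy` are hypothetical placeholders. In the framework, the classifier would be the binary model trained during pre-training.

```python
import random

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Deterministic grid transition (boundaries ignored for brevity)."""
    dr, dc = MOVES[action]
    return (state[0] + dr, state[1] + dc)

def safe_action(state, policy_action, predicts_unsafe, safe_policy, epsilon, rng):
    # Exploration: with probability epsilon take a random move, otherwise follow the current policy.
    action = rng.choice(list(MOVES)) if rng.random() < epsilon else policy_action(state)
    # Shield: if the classifier flags the resulting state as unsafe, defer to the safe policy.
    if predicts_unsafe(step(state, action)):
        return safe_policy(state)
    return action

# Hypothetical stand-ins: a "classifier" flagging column 3 as hazardous,
# a policy that always heads right, and a safe policy that moves down instead.
predicts_unsafe = lambda s: s[1] == 3
policy_action = lambda s: "right"
safe_policy = lambda s: "down"
rng = random.Random(0)
print(safe_action((0, 2), policy_action, predicts_unsafe, safe_policy, epsilon=0.0, rng=rng))  # down
print(safe_action((0, 0), policy_action, predicts_unsafe, safe_policy, epsilon=0.0, rng=rng))  # right
```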

Updated: 2024-08-02 04:09:30

Categories: cs.AI

Download: http://arxiv.org/abs/2408.00997v1

IncidentNet: Traffic Incident Detection, Localization and Severity Estimation with Sparse Sensing

Prior art in traffic incident detection relies on high sensor coverage and is primarily based on decision-tree and random forest models. These models have limited representation capacity and, as a result, cannot detect incidents with high accuracy. This paper presents IncidentNet, a novel approach for classifying, localizing, and estimating the severity of traffic incidents using deep learning models trained on data captured from sparsely placed sensors in urban environments. Our model works on microscopic traffic data that can be collected with cameras installed at traffic intersections. Because no available datasets provide microscopic traffic details and traffic incident details simultaneously, we also present a methodology for generating a synthetic microscopic traffic dataset that matches given macroscopic traffic data. IncidentNet achieves a traffic incident detection rate of 98% with a false alarm rate below 7%, detecting incidents within 197 seconds on average, in urban environments where cameras cover fewer than 20% of the traffic intersections.

Updated: 2024-08-02 04:09:15

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.00996v1

Robust Millimeter Beamforming via Self-Supervised Hybrid Deep Learning

Beamforming with large-scale antenna arrays has been widely used in recent years and is acknowledged as an important part of 5G and the upcoming 6G. Accordingly, various techniques, such as deep learning and advanced optimization algorithms, have been leveraged to improve its performance. Deep learning-based beamforming has performed well in many previous research scenarios, but its performance usually drops rapidly when the environment or dataset changes. Designing an effective beamforming network with strong robustness is therefore an open issue for intelligent wireless communications. In this paper, we propose a robust self-supervised beamforming network and verify it on two different datasets with various scenarios. Simulation results show that the proposed self-supervised network with hybrid learning performs well on both the classic DeepMIMO dataset and the new WAIR-D dataset, with strong robustness across various environments. We also present the principle behind this kind of hybrid learning to explain why it works, which can guide its application to more kinds of datasets.
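For readers unfamiliar with the setup, the sketch below computes the array gain of a uniform linear array with half-wavelength spacing. It also defines a label-free loss equal to the negative gain. Treating gain maximization as the self-supervised objective is an assumption made for illustration. The abstract does not specify the network's loss or its hybrid learning scheme.

```python
import cmath
import math

def steering_vector(n_antennas, angle_rad):
    """Uniform linear array with half-wavelength spacing: a_m = exp(j * pi * m * sin(theta))."""
    return [cmath.exp(1j * math.pi * m * math.sin(angle_rad)) for m in range(n_antennas)]

def beamforming_gain(weights, angle_rad):
    """Received power |a(theta)^H w|^2 for a weight vector w (assumed unit-norm)."""
    a = steering_vector(len(weights), angle_rad)
    return abs(sum(ai.conjugate() * wi for ai, wi in zip(a, weights))) ** 2

def self_supervised_loss(weights, angle_rad):
    # Label-free objective: maximize gain toward the user, i.e. minimize its negative.
    return -beamforming_gain(weights, angle_rad)

n, theta = 16, math.radians(30)   # hypothetical array size and user direction
matched = [ai / math.sqrt(n) for ai in steering_vector(n, theta)]                 # ||w|| = 1
mismatched = [ai / math.sqrt(n) for ai in steering_vector(n, math.radians(-20))]  # wrong direction
print(round(beamforming_gain(matched, theta), 6))                                # 16.0, the full array gain N
print(beamforming_gain(mismatched, theta) < beamforming_gain(matched, theta))    # True
```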

Updated: 2024-08-02 04:02:26

Categories: cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2303.12653v3

Data Management For Training Large Language Models: A Survey

Data plays a fundamental role in training Large Language Models (LLMs). Efficient data management, particularly the formulation of a well-suited training dataset, is important for enhancing model performance and improving training efficiency during the pretraining and supervised fine-tuning stages. Despite the considerable importance of data management, the underlying mechanisms of current prominent practices remain unknown. Consequently, data management has attracted increasing attention in the research community. This survey provides a comprehensive overview of current research on data management in both the pretraining and supervised fine-tuning stages of LLMs, covering various aspects of data management strategy design. Looking to the future, we extrapolate existing challenges and outline promising directions for development in this field. This survey thus serves as a guiding resource for practitioners aspiring to construct powerful LLMs through efficient data management practices. The collection of the latest papers is available at https://github.com/ZigeW/data_management_LLM.

Updated: 2024-08-02 03:56:35

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2312.01700v3

Infrequent Resolving Algorithm for Online Linear Programming

Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auctions, network revenue management, and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former typically guarantee better performance, even offering constant regret, but require solving a large number of LPs, which can be computationally expensive. In contrast, LP-free algorithms require only first-order computations but perform worse and lack a constant regret bound. In this work, we bridge the gap between these two extremes by proposing an algorithm that achieves constant regret while solving LPs only $O(\log\log T)$ times over the time horizon $T$. Moreover, when we are allowed to solve LPs only $M$ times, we propose an algorithm that guarantees $O\left(T^{(1/2+\epsilon)^{M-1}}\right)$ regret. Furthermore, when the arrival probabilities are known at the beginning, our algorithm can guarantee constant regret by solving LPs $O(\log\log T)$ times, and $O\left(T^{(1/2+\epsilon)^{M}}\right)$ regret by solving LPs only $M$ times. Numerical experiments demonstrate the efficiency of the proposed algorithms.
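For contrast with the LP-based approach, the sketch below shows a generic LP-free method from the first-order, dual-price family, for a single resource. It is not the paper's infrequent-resolving algorithm and carries no constant-regret guarantee. The request stream, budget and step size are hypothetical.

```python
def dual_price_allocation(requests, budget, step_size):
    """LP-free online allocation for one resource, driven by a dual price.

    requests: list of (reward, resource_use) pairs arriving one at a time.
    A request is accepted if its reward beats the priced resource cost and capacity remains;
    the price is then updated by a projected subgradient step toward the per-period budget rate.
    """
    rate = budget / len(requests)  # average resource the algorithm may spend per period
    price, remaining, total_reward, decisions = 0.0, budget, 0.0, []
    for reward, use in requests:
        accept = reward > price * use and use <= remaining
        if accept:
            remaining -= use
            total_reward += reward
        decisions.append(accept)
        # Raise the price when spending faster than the rate; lower it (never below 0) otherwise.
        price = max(0.0, price + step_size * ((use if accept else 0.0) - rate))
    return decisions, total_reward, remaining

requests = [(5, 1), (1, 1), (4, 1), (3, 1), (2, 1), (6, 1)]
decisions, reward, remaining = dual_price_allocation(requests, budget=3, step_size=3.0)
print(decisions, reward, remaining)  # [True, False, True, True, False, False] 12.0 0
```

The rising price makes the algorithm reject the low-reward second request. However, the budget is used up before the most valuable final request arrives, so it collects 12 against an offline optimum of 15. That shortfall is what regret measures.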

Updated: 2024-08-02 03:56:14

Categories: cs.DS,cs.LG,math.OC

Download: http://arxiv.org/abs/2408.00465v2

ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models

This paper aims to extend the code generation capability of large language models (LLMs) to automatically manage comprehensive software requirements from given textual descriptions. Such requirements include both functional requirements (i.e., achieving the expected behavior for inputs) and non-functional requirements (e.g., time/space performance, robustness, maintainability). However, textual descriptions may express requirements verbosely or even omit some of them. We introduce ARCHCODE, a novel framework that leverages in-context learning to organize the requirements observed in descriptions and to extrapolate unexpressed requirements from them. ARCHCODE generates requirements from the given descriptions and conditions on them to produce code snippets and test cases. Each test case is tailored to one of the requirements, so code snippets can be ranked by how well their execution results comply with the requirements. Public benchmarks show that ARCHCODE better satisfies functional requirements, significantly improving Pass@k scores. Furthermore, we introduce HumanEval-NFR, the first evaluation of how well LLMs satisfy non-functional requirements in code generation, demonstrating ARCHCODE's superiority over baseline methods. The implementation of ARCHCODE and the HumanEval-NFR benchmark are both publicly accessible.
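The ranking step can be sketched as follows. Each candidate is run against requirement-specific checks, and candidates are ordered by the fraction of checks they pass. The candidates and checks below are hand-written, hypothetical examples. In ARCHCODE, both the requirements and the test cases are generated by the LLM.

```python
def rank_candidates(candidates, tests):
    """Rank candidate implementations by the fraction of requirement-specific tests they pass.

    candidates: dict name -> callable; tests: dict requirement -> check(candidate) -> bool.
    """
    scores = {}
    for name, fn in candidates.items():
        passed = 0
        for check in tests.values():
            try:
                passed += bool(check(fn))
            except Exception:
                pass  # a crash counts as a failed requirement
        scores[name] = passed / len(tests)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def sorted_copy(xs):
    return sorted(xs)

def identity(xs):
    return xs  # ignores the ordering requirement

def fragile(xs):
    return sorted(xs) if xs else xs[0]  # crashes on empty input

tests = {
    "functional: sorts integers": lambda f: f([3, 1, 2]) == [1, 2, 3],
    "functional: keeps duplicates": lambda f: f([2, 1, 2]) == [1, 2, 2],
    "robustness: handles empty input": lambda f: f([]) == [],
}
ranking = rank_candidates({"identity": identity, "fragile": fragile, "sorted_copy": sorted_copy}, tests)
print([name for name, _ in ranking])  # ['sorted_copy', 'fragile', 'identity']
```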

Updated: 2024-08-02 03:54:36

Categories: cs.SE,cs.AI,cs.CL

Download: http://arxiv.org/abs/2408.00994v1

Fairness in Large Language Models in Three Hours

Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. Unlike fairness in traditional machine learning, fairness in LLMs involves unique backgrounds, taxonomies, and fulfillment techniques. This tutorial provides a systematic overview of recent advances in the literature on fair LLMs. It begins with real-world case studies that introduce LLMs, followed by an analysis of the causes of bias in them. The concept of fairness in LLMs is then explored, summarizing strategies for evaluating bias and algorithms designed to promote fairness. Additionally, resources for assessing bias in LLMs, including toolkits and datasets, are compiled, and current research challenges and open questions in the field are discussed. The repository is available at https://github.com/LavinWong/Fairness-in-Large-Language-Models.

Updated: 2024-08-02 03:44:14

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2408.00992v1

Towards Model-Free LQR Control over Rate-Limited Channels

Given the success of model-free methods for control design in many problem settings, it is natural to ask how things change if realistic communication channels are used to transmit gradients or policies. The resulting problem has analogies with the formulations studied under the rubric of networked control systems, but the rich literature in that area has typically assumed that the model of the system is known. As a step towards bridging the fields of model-free control design and networked control systems, we ask: Is it possible to solve basic control problems, such as the linear quadratic regulator (LQR) problem, in a model-free manner over a rate-limited channel? Toward answering this question, we study a setting where a worker agent transmits quantized policy gradients (of the LQR cost) to a server over a noiseless channel with a finite bit-rate. We propose a new algorithm titled Adaptively Quantized Gradient Descent (AQGD). We prove that, above a certain finite threshold bit-rate, AQGD guarantees exponentially fast convergence to the globally optimal policy, with no deterioration of the exponent relative to the unquantized setting. More generally, our approach reveals the benefits of adaptive quantization in preserving fast linear convergence rates and, as such, may be of independent interest to the literature on compressed optimization.
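The intuition behind adaptive quantization can be seen on a toy problem. The sketch below replaces the LQR cost with a scalar quadratic and sends each gradient through a 4-bit uniform quantizer whose range shrinks geometrically. The shrink factor, bit budget and step size are hand-picked assumptions, not AQGD's actual update rule. This toy also does not reproduce the paper's no-deterioration guarantee.

```python
def quantize(value, center, radius, bits):
    """Uniform quantizer with 2**bits levels on [center - radius, center + radius]; values are clipped."""
    levels = 2 ** bits
    step = 2 * radius / (levels - 1)
    clipped = min(max(value, center - radius), center + radius)
    index = round((clipped - (center - radius)) / step)  # integer in [0, levels - 1] sent over the channel
    return center - radius + index * step

def adaptive_quantized_gd(x0, curvature, lr, bits, radius0, shrink, iters):
    """Minimize f(x) = 0.5 * curvature * x**2 with every gradient sent through a `bits`-bit channel.

    The quantizer range shrinks geometrically (radius_k = radius0 * shrink**k), so its
    resolution keeps pace with the shrinking gradients.
    """
    x, radius = x0, radius0
    for _ in range(iters):
        grad = curvature * x
        x -= lr * quantize(grad, 0.0, radius, bits)
        radius *= shrink
    return x

adaptive = adaptive_quantized_gd(1.0, 1.0, 0.5, bits=4, radius0=2.0, shrink=0.6, iters=30)
fixed = adaptive_quantized_gd(1.0, 1.0, 0.5, bits=4, radius0=2.0, shrink=1.0, iters=30)
print(abs(adaptive) < 1e-6, abs(fixed) > 1e-3)  # True True
```

With a fixed range (`shrink=1.0`), the grid of 16 levels does not contain zero, so the quantizer keeps injecting errors of a fixed size and the iterate stalls. With the shrinking range, the error scales with the gradient and the iterate keeps converging geometrically.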

Updated: 2024-08-02 03:39:18

Categories: math.OC,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2401.01258v2

Eliciting Informative Text Evaluations with Large Language Models

Peer prediction mechanisms motivate high-quality feedback with provable guarantees. However, current methods apply only to rather simple reports, such as multiple-choice answers or scalar numbers. We aim to extend these techniques to the broader domain of text-based reports, drawing on recent developments in large language models. This vastly increases the applicability of peer prediction mechanisms, as textual feedback is the norm in a wide variety of feedback channels: peer reviews, e-commerce customer reviews, and comments on social media. We introduce two mechanisms, the Generative Peer Prediction Mechanism (GPPM) and the Generative Synopsis Peer Prediction Mechanism (GSPPM). These mechanisms use LLMs as predictors, mapping one agent's report to a prediction of her peer's report. Theoretically, we show that when the LLM prediction is sufficiently accurate, our mechanisms can incentivize high effort and truth-telling as an (approximate) Bayesian Nash equilibrium. Empirically, we confirm the efficacy of our mechanisms through experiments on two real datasets: the Yelp review dataset and the ICLR OpenReview dataset. Notably, on the ICLR dataset, our mechanisms can differentiate three quality levels in terms of expected scores: human-written reviews, GPT-4-generated reviews, and GPT-3.5-generated reviews. Additionally, GSPPM penalizes LLM-generated reviews more effectively than GPPM.

Updated: 2024-08-02 03:38:58

Categories: cs.CL,cs.AI,cs.GT

Download: http://arxiv.org/abs/2405.15077v3

On the Resilience of Multi-Agent Systems with Malicious Agents

Multi-agent systems powered by large language models have shown great abilities across various tasks thanks to the collaboration of expert agents, each focusing on a specific domain. However, when agents are deployed separately, malicious users may introduce malicious agents that generate incorrect or irrelevant results too stealthy for other, non-specialized agents to identify. This paper therefore investigates two essential questions: (1) How resilient are various multi-agent system structures (e.g., A$\rightarrow$B$\rightarrow$C, A$\leftrightarrow$B$\leftrightarrow$C) to malicious agents on different downstream tasks? (2) How can we increase system resilience to defend against malicious agents? To simulate malicious agents, we devise two methods, AutoTransform and AutoInject, that transform any agent into a malicious one while preserving its functional integrity. We run comprehensive experiments on four downstream multi-agent system tasks, namely code generation, math problems, translation, and text evaluation. Results suggest that the "hierarchical" multi-agent structure, i.e., A$\rightarrow$(B$\leftrightarrow$C), exhibits superior resilience, with the lowest performance drop of $23.6\%$, compared with $46.4\%$ and $49.8\%$ for the other two structures. We also show the promise of two defense methods for improving multi-agent system resilience: introducing an additional agent that reviews and corrects messages, and adding mechanisms for each agent to challenge others' outputs. Both can enhance system resilience. Our code and data are available at https://github.com/CUHK-ARISE/MAS-Resilience.

Updated: 2024-08-02 03:25:20

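Title: On the Resilience of Multi-Agent Systems with Malicious Agents

Abstract: Multi-agent systems driven by large language models have shown excellent capabilities across a variety of tasks thanks to the collaboration of expert agents, each focusing on a specific domain. However, when agents are deployed separately, malicious users may introduce malicious agents. These agents generate incorrect or irrelevant results that are too stealthy for other, non-specialized agents to identify. This paper therefore studies two fundamental questions: (1) How resilient are various multi-agent system structures (e.g., A$\rightarrow$B$\rightarrow$C, A$\leftrightarrow$B$\leftrightarrow$C) to malicious agents on different downstream tasks? (2) How can we increase system resilience to defend against malicious agents? To simulate malicious agents, we design two methods, AutoTransform and AutoInject, that turn any agent into a malicious one while preserving its functional integrity. We conduct comprehensive experiments on four downstream multi-agent system tasks: code generation, math problems, translation, and text evaluation. The results show that the "hierarchical" multi-agent structure, i.e., A$\rightarrow$(B$\leftrightarrow$C), exhibits superior resilience, with a performance drop of only $23.6\%$, compared with $46.4\%$ and $49.8\%$ for the other two structures. In addition, we show that system resilience can be improved in two ways: by introducing an additional agent to review and correct messages, or by mechanisms that let each agent challenge the others' outputs. Our code and data are available at https://github.com/CUHK-ARISE/MAS-Resilience.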

Categories: cs.AI

Download: http://arxiv.org/abs/2408.00989v1

A Survey on LoRA of Large Language Models

Low-Rank Adaptation (LoRA), which updates dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. It also has significant advantages in cross-task generalization and privacy preservation. As a result, LoRA has gained much attention recently, and the related literature has grown exponentially, which calls for a comprehensive overview of current progress on LoRA. This survey categorizes and reviews the progress from five perspectives:

(1) downstream adaptation variants that improve LoRA's performance on downstream tasks;
(2) cross-task generalization methods that mix multiple LoRA plugins;
(3) efficiency-improving methods that boost the computational efficiency of LoRA;
(4) data privacy-preserving methods that use LoRA in federated learning;
(5) applications.

The survey also discusses future directions in this field. Finally, we provide a GitHub page (https://github.com/ZJU-LLMs/Awesome-LoRAs.git) where readers can check for updates and start discussions about this survey.
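
The core LoRA update can be sketched in a few lines of NumPy. This follows the commonly used formulation: a frozen weight $W_0$ plus a trainable product $BA$ of rank $r$, scaled by $\alpha / r$, with $B$ initialized to zero so that training starts from the pretrained model. The class name `LoRALinear`, the seed, and the initialization scale of `A` are illustrative assumptions, not any particular library's API.

```python
import numpy as np


class LoRALinear:
    """Frozen dense weight W0 plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, w0: np.ndarray, rank: int, alpha: float, seed: int = 0):
        d_out, d_in = w0.shape
        rng = np.random.default_rng(seed)
        self.w0 = w0                                       # pretrained weight, kept frozen
        self.a = rng.normal(0.0, 0.01, size=(rank, d_in))  # down-projection A (r x d_in)
        self.b = np.zeros((d_out, rank))                   # up-projection B, zero-init so the update starts at 0
        self.scale = alpha / rank

    def forward(self, x: np.ndarray) -> np.ndarray:
        """x: (batch, d_in) -> (batch, d_out); only A and B would be trained."""
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T

    def merged_weight(self) -> np.ndarray:
        """Fold the plug-in into W0 so inference needs a single matrix multiply."""
        return self.w0 + self.scale * self.b @ self.a


rng = np.random.default_rng(42)
layer = LoRALinear(rng.normal(size=(8, 16)), rank=2, alpha=4.0)
x = rng.normal(size=(3, 16))
print(layer.forward(x).shape)  # (3, 8)
print(layer.a.size + layer.b.size, "trainable vs", layer.w0.size, "frozen")
```

Because the update is a separate pair of small matrices, different task-specific `(A, B)` pairs can be swapped in or merged into the same frozen `W0`. This is the "pluggable" property the survey's cross-task and federated categories build on.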

Updated: 2024-08-02 03:22:22

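Title: A Survey on LoRA of Large Language Models

Abstract: Low-Rank Adaptation (LoRA), which updates dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. It also has significant advantages in cross-task generalization and privacy preservation. As a result, LoRA has recently received a great deal of attention, and the related literature has grown exponentially. A comprehensive review of current progress on LoRA is therefore needed. This survey categorizes and reviews the progress from the following perspectives: (1) downstream adaptation variants that improve LoRA's performance on downstream tasks; (2) cross-task generalization methods that mix multiple LoRA plugins; (3) efficiency-improving methods that boost the computational efficiency of LoRA; (4) data privacy-preserving methods that apply LoRA to federated learning; and (5) applications. The survey also discusses future directions in this field. Finally, we provide a GitHub page (https://github.com/ZJU-LLMs/Awesome-LoRAs.git) where readers can check for updates and start discussions about this survey.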

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2407.11046v2

SemiSFL: Split Federated Learning on Unlabeled and Non-IID Data

Federated Learning (FL) has emerged to allow multiple clients to collaboratively train machine learning models on their private data at the network edge. However, training and deploying large-scale models on resource-constrained devices is challenging. Fortunately, Split Federated Learning (SFL) offers a feasible solution by alleviating the computation and/or communication burden on clients. However, existing SFL work often assumes that clients have sufficient labeled data, which is usually impractical. In addition, non-IID data poses another challenge to efficient model training. To the best of our knowledge, these two issues have not been addressed simultaneously in SFL. Herein, we propose a novel semi-supervised SFL system, termed SemiSFL, which incorporates clustering regularization to perform SFL with unlabeled and non-IID client data. Moreover, our theoretical and experimental investigations into model convergence reveal that inconsistent training processes on labeled and unlabeled data affect the effectiveness of clustering regularization. To mitigate this training inconsistency, we develop an algorithm that dynamically adjusts the global updating frequency to improve training performance. Extensive experiments on benchmark models and datasets compare our system with state-of-the-art baselines. Our system provides a 3.8x speed-up in training time and reduces communication cost by about 70.3% while reaching the target accuracy. It also achieves up to a 5.8% improvement in accuracy under non-IID scenarios.

Updated: 2024-08-02 03:16:07

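Title: SemiSFL: Split Federated Learning on Unlabeled and Non-IID Data

Abstract: Federated learning (FL) has emerged to allow multiple clients to collaboratively train machine learning models at the network edge using their private data. However, training and deploying large-scale models on resource-constrained devices is challenging. Fortunately, split federated learning (SFL) offers a feasible solution by reducing the computation and/or communication burden on clients. However, existing SFL work usually assumes that clients have sufficient labeled data, which is often impractical. In addition, non-IID data poses another challenge to efficient model training. To the best of our knowledge, these two issues have not yet been addressed simultaneously in SFL. Here we propose a novel semi-supervised SFL system, called SemiSFL, which incorporates clustering regularization to perform SFL with unlabeled and non-IID client data. Furthermore, our theoretical and experimental study of model convergence reveals that inconsistent training processes on labeled and unlabeled data affect the effectiveness of clustering regularization. To mitigate this training inconsistency, we develop an algorithm that dynamically adjusts the global updating frequency to improve training performance. Extensive experiments on benchmark models and datasets compare our system with state-of-the-art baselines. Our system provides a 3.8x speed-up in training time and reduces communication cost by about 70.3% while reaching the target accuracy. It also achieves up to a 5.8% improvement in accuracy in non-IID scenarios.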

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2307.15870v5

A SAT-based approach to rigorous verification of Bayesian networks

Recent advancements in machine learning have accelerated its widespread adoption across various real-world applications. However, in safety-critical domains, the deployment of machine learning models is riddled with challenges due to their complexity, lack of interpretability, and absence of formal guarantees regarding their behavior. In this paper, we introduce a verification framework tailored for Bayesian networks, designed to address these drawbacks. Our framework comprises two key components: (1) a two-step compilation and encoding scheme that translates Bayesian networks into Boolean logic literals, and (2) formal verification queries that leverage these literals to verify various properties encoded as constraints. Specifically, we introduce two verification queries: if-then rules (ITR) and feature monotonicity (FMO). We benchmark the efficiency of our verification scheme and demonstrate its practical utility in real-world scenarios.
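
The property an if-then rule (ITR) query checks can be illustrated by brute force on a toy network. The rule reads: whenever some features take given values, the prediction is a given class, whatever the other features are. This sketch enumerates all assignments instead of compiling the network to Boolean literals and calling a SAT solver. The naive Bayes network, its probabilities, and the helpers `predict` and `check_if_then` are hypothetical.

```python
from itertools import product

# Hypothetical naive Bayes network: class C -> binary features F0, F1, F2.
PRIOR = {0: 0.6, 1: 0.4}
# P(F_k = 1 | C = c) for k = 0, 1, 2.
LIKELIHOOD = {0: (0.2, 0.5, 0.7), 1: (0.9, 0.4, 0.6)}


def predict(features):
    """Most probable class given all features (ties go to class 0)."""
    def joint(c):
        p = PRIOR[c]
        for theta, f in zip(LIKELIHOOD[c], features):
            p *= theta if f else 1.0 - theta
        return p
    return max((0, 1), key=joint)


def check_if_then(fixed, expected_class, n_features=3):
    """Check 'if the features in `fixed` take these values, the prediction is expected_class'.

    Enumerates every assignment of the remaining features and returns (True, None)
    if the rule holds, otherwise (False, counterexample).
    """
    free = [k for k in range(n_features) if k not in fixed]
    for values in product((0, 1), repeat=len(free)):
        features = [0] * n_features
        for k, v in fixed.items():
            features[k] = v
        for k, v in zip(free, values):
            features[k] = v
        if predict(features) != expected_class:
            return False, tuple(features)
    return True, None


print(check_if_then({0: 1}, expected_class=1))  # rule "F0 = 1 -> class 1"
print(check_if_then({2: 1}, expected_class=1))  # rule "F2 = 1 -> class 1"
```

The enumeration grows as $2^{\text{free features}}$. That cost is what makes an encoding into Boolean literals plus a SAT solver attractive for realistic networks.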

Updated: 2024-08-02 03:06:51

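Title: A SAT-Based Approach to Rigorous Verification of Bayesian Networks

Abstract: Recent advances in machine learning have accelerated its widespread adoption across a variety of real-world applications. However, in safety-critical domains, deploying machine learning models is fraught with challenges. These stem from the models' complexity, their lack of interpretability, and the absence of formal guarantees about their behavior. This paper introduces a verification framework designed specifically for Bayesian networks to address these shortcomings. Our framework consists of two key components: (1) a two-step compilation and encoding scheme that translates Bayesian networks into Boolean logic literals, and (2) formal verification queries that use these literals to verify various properties encoded as constraints. Specifically, we introduce two verification queries: if-then rules (ITR) and feature monotonicity (FMO). We benchmark the efficiency of our verification scheme and demonstrate its practical utility in real-world scenarios.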

Categories: cs.AI,cs.LO

Download: http://arxiv.org/abs/2408.00986v1

Reconstructing Richtmyer-Meshkov instabilities from noisy radiographs using low dimensional features and attention-based neural networks

A trained attention-based transformer network can robustly recover the complex topologies produced by the Richtmyer-Meshkov instability. It does so from a sequence of hydrodynamic features derived from radiographic images corrupted with blur, scatter, and noise. This approach is demonstrated on ICF-like double-shell hydrodynamic simulations. The key component of this network is a transformer encoder that acts on a sequence of features extracted from noisy radiographs. This encoder includes numerous self-attention layers that learn temporal dependencies in the input sequences and increase the expressiveness of the model. The approach is shown to accurately recover Richtmyer-Meshkov instability growth rates, even when the gas-metal interface is greatly obscured by radiographic noise.

Updated: 2024-08-02 03:02:39

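Title: Reconstructing Richtmyer-Meshkov Instabilities from Noisy Radiographs Using Low-Dimensional Features and Attention-Based Neural Networks

Abstract: A trained attention-based transformer network can robustly recover the complex topologies produced by the Richtmyer-Meshkov instability. Its input is a sequence of hydrodynamic features derived from radiographic images corrupted by blur, scatter, and noise. The approach is demonstrated on ICF-like double-shell hydrodynamic simulations. The key component of the network is a transformer encoder that operates on a sequence of features extracted from noisy radiographs. The encoder contains numerous self-attention layers that learn temporal dependencies in the input sequences and increase the expressiveness of the model. The approach is shown to accurately recover Richtmyer-Meshkov instability growth rates, even when the gas-metal interface is heavily obscured by radiographic noise.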

Categories: cs.LG,eess.IV

Download: http://arxiv.org/abs/2408.00985v1

META-ANOVA: Screening interactions for interpretable machine learning

Two things must be considered when evaluating predictive models: prediction accuracy and interpretability. Over recent decades, many high-performance prediction models, such as ensemble-based models and deep neural networks, have been developed. However, these models are often so complex that their predictions are difficult to interpret intuitively. This lack of interpretability limits their use in many real-world fields that require accountability, such as medicine, finance, and college admissions. In this study, we develop a novel method called Meta-ANOVA that provides an interpretable model for any given prediction model. The basic idea of Meta-ANOVA is to transform a given black-box prediction model into a functional ANOVA model. A novel technical contribution of Meta-ANOVA is a procedure for screening out unnecessary interactions before the black-box model is transformed into the functional ANOVA model. This screening procedure allows higher-order interactions to be included in the transformed functional ANOVA model without computational difficulties. We prove that the screening procedure is asymptotically consistent. Through various experiments with synthetic and real-world datasets, we empirically demonstrate the superiority of Meta-ANOVA.
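
To make the functional ANOVA target concrete, here is a minimal sketch for two inputs on a grid under a uniform, independent measure. It splits $f$ into a constant, two main effects, and an interaction term, and reports the share of variance carried by the interaction. A screening step could, for example, drop pairs with a small share. That criterion and the helper name `fanova_2d` are hypothetical and are not Meta-ANOVA's actual screening statistic.

```python
def fanova_2d(f, xs, ys):
    """Functional ANOVA of f(x, y) on a grid under a uniform, independent measure.

    Returns (f0, main_x, main_y, interaction, interaction_share), where
    f(x_i, y_j) = f0 + main_x[i] + main_y[j] + interaction[i][j] exactly on the grid.
    """
    table = [[f(x, y) for y in ys] for x in xs]
    nx, ny = len(xs), len(ys)
    f0 = sum(map(sum, table)) / (nx * ny)
    main_x = [sum(row) / ny - f0 for row in table]
    main_y = [sum(table[i][j] for i in range(nx)) / nx - f0 for j in range(ny)]
    inter = [[table[i][j] - f0 - main_x[i] - main_y[j] for j in range(ny)] for i in range(nx)]
    total_var = sum((v - f0) ** 2 for row in table for v in row) / (nx * ny)
    inter_var = sum(v ** 2 for row in inter for v in row) / (nx * ny)
    share = inter_var / total_var if total_var > 0 else 0.0
    return f0, main_x, main_y, inter, share


grid = [-1.0, 0.0, 1.0]
print(fanova_2d(lambda x, y: x + 2 * y, grid, grid)[-1])  # additive: no interaction
print(fanova_2d(lambda x, y: x * y, grid, grid)[-1])      # pure interaction
```

On a balanced grid the components are orthogonal, so the interaction share lies between 0 and 1. With $d$ inputs there are $\binom{d}{2}$ pairwise terms and many more higher-order ones, which is why screening before fitting matters.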

Updated: 2024-08-02 01:49:29

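Title: META-ANOVA: Screening Interactions for Interpretable Machine Learning

Abstract: Two things must be considered when evaluating predictive models: prediction accuracy and interpretability. Over recent decades, many high-performance prediction models, such as ensemble-based models and deep neural networks, have been developed. However, these models are often too complex for their predictions to be interpreted intuitively. This difficulty limits their use in many real-world fields that require accountability, such as medicine, finance, and college admissions. In this study, we develop a novel method called Meta-ANOVA that provides an interpretable model for any given prediction model. The basic idea of Meta-ANOVA is to transform a given black-box prediction model into a functional ANOVA model. A novel technical contribution of Meta-ANOVA is a procedure that screens out unnecessary interactions before the black-box model is transformed into the functional ANOVA model. This screening procedure allows higher-order interactions to be included in the transformed functional ANOVA model without computational difficulty. We prove that the screening procedure is asymptotically consistent. Through various experiments on synthetic and real-world datasets, we empirically demonstrate the superiority of Meta-ANOVA.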

Categories: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2408.00973v1

Rule-Based Error Detection and Correction to Operationalize Movement Trajectory Classification

Classification of movement trajectories has many applications in transportation. It is a key component of large-scale movement trajectory generation and anomaly detection, which have key safety applications in the aftermath of a disaster or other external shock. However, current state-of-the-art (SOTA) methods are based on supervised deep learning, which leads to challenges when the distribution of trajectories changes due to such a shock. We provide a neuro-symbolic, rule-based framework for error detection and correction of these models, designed for integration into our movement trajectory platform. Experiments on several recent SOTA models show highly accurate error detection, the ability to improve accuracy under a changing test distribution, and accuracy improvement for the base use case. We also present a suite of theoretical properties that informed algorithm development. Specifically, we show F1 scores of up to 0.984 for predicting errors and a significant increase in out-of-distribution accuracy (an 8.51% improvement over SOTA in zero-shot accuracy). We also show an accuracy improvement over the SOTA model.

Updated: 2024-08-02 01:38:16

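Title: Rule-Based Error Detection and Correction to Operationalize Movement Trajectory Classification

Abstract: Movement trajectory classification has many applications in transportation. It is a key component of large-scale movement trajectory generation and anomaly detection, which matter for safety applications in the aftermath of a disaster or other external shock. However, current state-of-the-art (SOTA) methods are based on supervised deep learning, which runs into difficulty when the distribution of trajectories changes because of such a shock. We provide a neuro-symbolic, rule-based framework for error detection and correction of these models, to be integrated into our movement trajectory platform. We run a series of experiments on several recent SOTA models, demonstrating highly accurate error detection, the ability to improve accuracy as the test distribution changes, and accuracy improvements for the base use case. We also provide a set of theoretical properties that guided algorithm development. Specifically, we show F1 scores of up to 0.984 for predicting errors and a significant gain in out-of-distribution accuracy (an 8.51% improvement over SOTA in zero-shot accuracy). We also show accuracy improvements over the SOTA model.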

Categories: cs.LG,cs.AI,cs.LO

Download: http://arxiv.org/abs/2308.14250v3

Multimodal Guidance Network for Missing-Modality Inference in Content Moderation

Multimodal deep learning, especially vision-language models, has gained significant traction in recent years, greatly improving performance on many downstream tasks, including content moderation and violence detection. However, standard multimodal approaches often assume that the same modalities are available during training and inference. This limits their use in many real-world cases, because some modalities may be missing at inference time. Existing research mitigates this problem by reconstructing the missing modalities, but doing so unavoidably adds computational cost. That cost can be just as critical, especially for large infrastructures deployed in industry. To this end, we propose a novel guidance network that promotes knowledge sharing during training, taking advantage of multimodal representations to train better single-modality models for inference. Real-world experiments in violence detection show that our proposed framework trains single-modality models that significantly outperform traditionally trained counterparts, without increasing inference cost.

Updated: 2024-08-02 01:33:25

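Title: Multimodal Guidance Network for Missing-Modality Inference in Content Moderation

Abstract: Multimodal deep learning, especially vision-language models, has made significant progress in recent years, substantially improving performance on many downstream tasks, including content moderation and violence detection. However, standard multimodal methods usually assume that the same modalities are available during training and inference. This limits their use in many real-world cases, because some modalities may be missing at inference time. Existing research mitigates this problem by reconstructing the missing modalities, but doing so unavoidably adds computational cost. That cost can be just as critical, especially for large-scale infrastructure deployed in industry. We therefore propose a novel guidance network that promotes knowledge sharing during training, leveraging multimodal representations to train better single-modality models for inference. In real-world violence detection experiments, our framework trains single-modality models that clearly outperform traditionally trained counterparts, while avoiding any increase in inference cost.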

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2309.03452v2

DNSSEC+: An Enhanced DNS Scheme Motivated by Benefits and Pitfalls of DNSSEC

The absence of security measures between DNS recursive resolvers and authoritative nameservers has been exploited by both inline and off-path attacks. Many security proposals have been made in practice and in previous literature, but they typically suffer from deployability barriers and/or inadequate security properties. Because no security solution between resolvers and nameservers has been broadly adopted, we are motivated to design a new scheme that mitigates the issues of previous proposals. We present DNSSEC+, which addresses the security and deployability downsides of DNSSEC while retaining its benefits. DNSSEC+ builds on the existing DNSSEC trust model and authorizes the nameservers within a zone, for short intervals, to serve the zone data securely. This provides real-time security properties for DNS responses without requiring long-term private keys to be duplicated (and thus put at risk) on authoritative nameservers. In terms of name resolution latency, DNSSEC+ offers performance comparable to less secure schemes. We define nine security, privacy, and deployability properties for name resolution and show how DNSSEC+ fulfills them.

Updated: 2024-08-02 01:25:14

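Title: DNSSEC+: An Enhanced DNS Scheme Motivated by the Benefits and Pitfalls of DNSSEC

Abstract: The lack of security measures between DNS recursive resolvers and authoritative nameservers has been exploited by both inline and off-path attacks. Although many security proposals have been made in practice and in prior literature, they typically suffer from deployability barriers and/or inadequate security properties. The absence of a broadly adopted security solution between resolvers and nameservers motivates a new scheme that mitigates these issues of previous proposals. We present DNSSEC+, which addresses the security and deployability shortcomings of DNSSEC while retaining its benefits. DNSSEC+ builds on the existing DNSSEC trust model and authorizes the nameservers within a zone, for short intervals, to serve the zone data securely. This provides real-time security properties for DNS responses without requiring long-term private keys to be duplicated (and thus put at risk) on authoritative nameservers. In terms of name resolution latency, DNSSEC+ offers performance comparable to less secure schemes. We define nine security, privacy, and deployability properties for name resolution and show how DNSSEC+ satisfies them.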

Categories: cs.CR

Download: http://arxiv.org/abs/2408.00968v1

Adaptive Self-training Framework for Fine-grained Scene Graph Generation

Scene graph generation (SGG) models suffer from problems inherent in the benchmark datasets, such as the long-tailed predicate distribution and missing annotations. In this work, we aim to alleviate the long-tailed problem of SGG by utilizing unannotated triplets. To this end, we introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels to unannotated triplets, on which the SGG models are then trained. Self-training has made significant progress in image recognition. Designing a self-training framework for SGG is more challenging, however, because of the task's inherent nature, such as semantic ambiguity and the long-tailed distribution of predicate classes. Hence, we propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM). CATM is a model-agnostic framework that can be applied to any existing SGG model. Furthermore, we devise a graph structure learner (GSL) that is beneficial when applying our self-training framework to state-of-the-art SGG models based on message-passing neural networks (MPNNs). Our extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in enhancing performance on fine-grained predicate classes.
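
A sketch of per-class adaptive thresholds with momentum follows. It assumes that each class threshold is an exponential moving average of the confidences predicted for that class. CATM's actual update rule is not given in the abstract, so the class `ClassAdaptiveThreshold` and its update are hypothetical. What the sketch illustrates is the structural idea: each predicate class gets its own bar for pseudo-labeling instead of one global threshold.

```python
from collections import defaultdict


class ClassAdaptiveThreshold:
    """Per-class pseudo-labeling thresholds updated with momentum (hypothetical update rule)."""

    def __init__(self, num_classes: int, init: float = 0.5, momentum: float = 0.9):
        self.tau = [init] * num_classes
        self.m = momentum

    def update(self, preds):
        """preds: list of (predicted_class, confidence) pairs for unannotated triplets in a batch."""
        by_class = defaultdict(list)
        for c, conf in preds:
            by_class[c].append(conf)
        for c, confs in by_class.items():
            batch_mean = sum(confs) / len(confs)
            # EMA: classes predicted confidently drift up, classes predicted with low confidence
            # drift down; classes absent from the batch keep their current threshold.
            self.tau[c] = self.m * self.tau[c] + (1 - self.m) * batch_mean

    def pseudo_label(self, preds):
        """Keep only predictions whose confidence clears the threshold of their own class."""
        return [(c, conf) for c, conf in preds if conf >= self.tau[c]]


thr = ClassAdaptiveThreshold(num_classes=3)
thr.update([(0, 0.9), (0, 0.7), (1, 0.2)])
print([round(t, 3) for t in thr.tau])
print(thr.pseudo_label([(0, 0.52), (1, 0.5), (2, 0.6)]))
```

Here a rare class predicted with low confidence (class 1) ends up with a lower bar than a frequent, confident class (class 0). That lower bar is the mechanism by which per-class thresholds can let more tail-class pseudo-labels through.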

Updated: 2024-08-02 01:22:46

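Title: Adaptive Self-Training Framework for Fine-Grained Scene Graph Generation

Abstract: Scene graph generation (SGG) models suffer from problems inherent in the benchmark datasets, such as the long-tailed predicate distribution and missing annotations. In this work, we aim to alleviate the long-tailed problem of SGG by using unannotated triplets. To this end, we introduce a self-training framework for SGG (ST-SGG) that assigns pseudo-labels to unannotated triplets, which are then used to train SGG models. Self-training has made significant progress in image recognition. Designing a self-training framework for the SGG task is more challenging, however, because of its inherent nature, such as semantic ambiguity and the long-tailed distribution of predicate classes. We therefore propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM). It is a model-agnostic framework that can be applied to any existing SGG model. Furthermore, we design a graph structure learner (GSL) that is beneficial when applying our self-training framework to state-of-the-art SGG models based on message-passing neural networks (MPNNs). Our extensive experiments confirm the effectiveness of ST-SGG on various SGG models, particularly in improving performance on fine-grained predicate classes.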

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2401.09786v5

Integrating ESG and AI: A Comprehensive Responsible AI Assessment Framework

Artificial Intelligence (AI) is a technology widely developed and adopted across industry sectors. Integrating environmental, social, and governance (ESG) considerations with AI investments is crucial for ensuring ethical and sustainable technological advancement. Particularly from an investor perspective, this integration not only mitigates risks but also enhances long-term value creation by aligning AI initiatives with broader societal goals. Yet this area has been little explored in both academia and industry. To bridge the gap, we introduce a novel ESG-AI framework with three key components. It was developed in collaboration with industry practitioners, based on insights from engagements with 28 companies, and provides a structured approach to this integration. The framework gives an overview of the environmental and social impacts of AI applications, helping users such as investors assess the materiality of AI use. Moreover, it enables investors to evaluate a company's commitment to responsible AI through structured engagements and thorough assessment of specific risk areas. We publicly released the framework and toolkit in April 2024, and they have received significant attention and positive feedback from the investment community. This paper details each component of the framework, demonstrating its applicability in real-world contexts and its potential to guide ethical AI investments.

Updated: 2024-08-02 00:58:01

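Title: Integrating ESG and AI: A Comprehensive Responsible AI Assessment Framework

Abstract: Artificial intelligence (AI) is a technology widely developed and adopted across industry sectors. Integrating environmental, social, and governance (ESG) considerations with AI investments is crucial for ensuring ethical and sustainable technological progress. Particularly from an investor's perspective, this integration not only mitigates risks but also enhances long-term value creation by aligning AI initiatives with broader societal goals. Yet this area has received little attention in both academia and industry. To bridge this gap, we introduce a novel ESG-AI framework that comprises three key components. It was developed from insights gained through engagements with 28 companies. The framework provides a structured approach to this integration and was developed in collaboration with industry practitioners. It gives an overview of the environmental and social impacts of AI applications, helping users such as investors assess the materiality of AI use. Moreover, it enables investors to evaluate a company's commitment to responsible AI through structured engagements and thorough assessment of specific risk areas. We publicly released the framework and toolkit in April 2024, and they have received significant attention and positive feedback from the investment community. This paper details each component of the framework, demonstrating its applicability in real-world contexts and its potential to guide ethical AI investment.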

Categories: cs.AI

Download: http://arxiv.org/abs/2408.00965v1

A Quantal Response Analysis of Defender-Attacker Sequential Security Games

We explore a scenario involving two sites and a sequential game between a defender and an attacker, where the defender is responsible for securing the sites while the attacker aims to attack them. Each site holds a loss value for the defender when compromised, along with a probability of successful attack. The defender can reduce these probabilities through security investments at each site. The attacker's objective is to target the site that maximizes the expected loss for the defender, taking into account the defender's security investments. While previous studies have examined security investments in such scenarios, our work investigates the impact of bounded rationality exhibited by the defender, as identified in behavioral economics. Specifically, we consider quantal behavioral bias, where humans make errors in selecting efficient (pure) strategies. We demonstrate the existence of a quantal response equilibrium in our sequential game and analyze how this bias affects the defender's choice of optimal security investments. Additionally, we quantify the inefficiency of equilibrium investments under quantal decision-making compared to an optimal solution devoid of behavioral biases. We provide numerical simulations to validate our main findings.
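
The effect of quantal (boundedly rational) choice can be sketched numerically with a logit quantal response. In this standard parametric form, each strategy is chosen with probability proportional to $\exp(\lambda \cdot \text{utility})$. In the sketch, the defender splits a budget between two sites, and the attacker then hits the site with the larger expected loss. The success-probability function, losses, and budget are assumed values, not the paper's model, and the budget is discretized into a grid of allocations.

```python
import math

# Hypothetical two-site instance (not from the paper).
LOSSES = (10.0, 6.0)  # defender's loss if each site is compromised
BASE_P = (0.8, 0.8)   # attack success probability with zero investment
BUDGET = 4.0


def attack_success(base: float, invest: float) -> float:
    """Assumed form: success probability decays exponentially with investment."""
    return base * math.exp(-invest)


def defender_loss(x1: float) -> float:
    """Expected loss when x1 goes to site 1 and the rest to site 2; the attacker picks the worse site."""
    invest = (x1, BUDGET - x1)
    return max(loss * attack_success(p, x) for loss, p, x in zip(LOSSES, BASE_P, invest))


def logit_quantal_choice(grid, lam: float):
    """Logit quantal response over allocations: P(x) is proportional to exp(-lam * loss(x))."""
    losses = [defender_loss(x) for x in grid]
    best = min(losses)
    # Shifting by the best loss keeps every exponent <= 0, avoiding overflow.
    weights = [math.exp(-lam * (v - best)) for v in losses]
    total = sum(weights)
    return [w / total for w in weights], losses


grid = [i * BUDGET / 40 for i in range(41)]
for lam in (0.5, 5.0, 50.0):
    probs, losses = logit_quantal_choice(grid, lam)
    expected = sum(p * v for p, v in zip(probs, losses))
    print(f"lambda={lam:5.1f}  expected loss={expected:.3f}  optimum={min(losses):.3f}")
```

As $\lambda$ grows, the expected loss falls monotonically toward the attacker-aware optimum. The gap at finite $\lambda$ is one simple way to express the kind of inefficiency the abstract quantifies.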

Updated: 2024-08-02 00:40:48

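Title: A Quantal Response Analysis of Defender-Attacker Sequential Security Games

Abstract: We explore a scenario involving two sites and a sequential game in which a defender is responsible for protecting the sites and an attacker aims to attack them. Each site carries a loss value for the defender if compromised, along with a probability of successful attack. The defender can reduce these probabilities through security investments at each site. The attacker's goal is to target the site that maximizes the defender's expected loss, taking the defender's security investments into account. Previous studies have examined security investments in such scenarios. Our work instead investigates the impact of bounded rationality on the part of the defender, as identified in behavioral economics. Specifically, we consider quantal behavioral bias, in which humans make errors when selecting efficient (pure) strategies. We prove the existence of a quantal response equilibrium in our sequential game and analyze how this bias affects the defender's choice of optimal security investments. In addition, we quantify the inefficiency of equilibrium investments under quantal decision-making relative to an optimal solution free of behavioral biases. We provide numerical simulations to validate our main findings.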

Categories: cs.GT,cs.CR,cs.SY,eess.SY

Download: http://arxiv.org/abs/2408.00964v1

MIS-ME: A Multi-modal Framework for Soil Moisture Estimation

Soil moisture estimation is an important task for precision agriculture, enabling optimal plans for irrigation, fertilization, and harvest. Statistical and machine learning models are commonly used to estimate soil moisture from traditional data sources such as weather forecasts, soil properties, and crop properties. However, there is growing interest in using aerial and geospatial imagery to estimate soil moisture. Although these images capture high-resolution crop details, they are expensive to curate and challenging to interpret. Imagine an AI-enhanced software tool that predicts soil moisture using visual cues captured by smartphones and statistical data from weather forecasts. This work is a first step toward that goal of developing a multi-modal approach to soil moisture estimation. In particular, we curate a dataset consisting of real-world images taken from ground stations and their corresponding weather data. We also propose MIS-ME (Meteorological & Image based Soil Moisture Estimator), a multi-modal framework for soil moisture estimation. Our extensive analysis shows that MIS-ME achieves a MAPE of 10.79%, outperforming traditional unimodal approaches. It reduces MAPE by 2.6% relative to meteorological data alone and by 1.5% relative to image data alone, highlighting the effectiveness of tailored multi-modal approaches.
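
MAPE, the metric reported above, averages the absolute error relative to the true value and expresses it as a percentage. The following is a minimal reference implementation. The function name and the sample readings are illustrative, not taken from the paper.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Undefined when any true value is zero."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty sequences of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical volumetric soil-moisture readings (%) and model predictions.
print(mape([20.0, 25.0, 40.0], [18.0, 27.0, 40.0]))
```

Because MAPE is itself a percentage, a reported "reduction of 2.6%" can mean 2.6 percentage points or a 2.6% relative decrease. The abstract does not say which.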

Updated: 2024-08-02 00:35:18

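Title: MIS-ME: A Multi-Modal Framework for Soil Moisture Estimation

Abstract: Soil moisture estimation is an important task for precision agriculture, enabling optimal plans for irrigation, fertilization, and harvest. Statistical and machine learning models are commonly used to estimate soil moisture from traditional data sources such as weather forecasts, soil properties, and crop properties. However, there is growing interest in using aerial and geospatial imagery to estimate soil moisture. Although these images capture high-resolution crop details, they are expensive to curate and difficult to interpret. Imagine an AI-enhanced software tool that predicts soil moisture using visual cues captured by smartphones and statistical data from weather forecasts. This work is a first step toward developing such a multi-modal approach to soil moisture estimation. Specifically, we curate a dataset of real-world images taken from ground stations together with their corresponding weather data. We also propose MIS-ME (Meteorological & Image based Soil Moisture Estimator), a multi-modal framework for soil moisture estimation. Our extensive analysis shows that MIS-ME achieves a MAPE of 10.79%, outperforming traditional unimodal approaches. It reduces MAPE by 2.6% for meteorological data and by 1.5% for image data, highlighting the effectiveness of tailored multi-modal approaches.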

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.00963v1

PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting

Understanding the nuances of a user's extensive interaction history is key to building accurate, personalized natural language systems that can adapt to evolving user preferences. To address this, we introduce PERSOMA, a Personalized Soft Prompt Adapter architecture. Unlike previous personalized prompting methods for large language models, PERSOMA offers a novel approach to efficiently capturing user history. It does so by resampling interactions, represented as free-form text, and compressing them into expressive soft prompt embeddings. This builds on recent research that uses embedding representations as input to LLMs. We rigorously validate our approach by evaluating various adapter architectures, first-stage sampling strategies, parameter-efficient tuning techniques such as LoRA, and other personalization methods. Our results demonstrate that PERSOMA handles large and complex user histories better than existing embedding-based and text-prompt-based techniques.

Updated: 2024-08-02 00:24:22

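Title: PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting

Abstract: Understanding the nuances of a user's extensive interaction history is key to building accurate, personalized natural language systems that can adapt to evolving user preferences. To address this, we introduce PERSOMA, a Personalized Soft Prompt Adapter architecture. Unlike previous personalized prompting methods for large language models, PERSOMA offers a novel approach to efficiently capturing user history. It resamples interactions, expressed as free-form text, and compresses them into expressive soft prompt embeddings. This builds on recent research that uses embedding representations as input to LLMs. We rigorously validate our approach by evaluating various adapter architectures, first-stage sampling strategies, parameter-efficient tuning techniques such as LoRA, and other personalization methods. Our results show that PERSOMA handles large and complex user histories better than existing embedding-based and text-prompt-based techniques.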

Categories: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2408.00960v1

Multi-State TD Target for Model-Free Reinforcement Learning

Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both immediate rewards and the estimated value of subsequent states. Traditionally, TD learning relies on the value of a single subsequent state. We propose an enhanced multi-state TD (MSTD) target that utilizes the estimated values of multiple subsequent states. Building on this new MSTD concept, we develop complete actor-critic algorithms that include management of replay buffers in two modes. We integrate them with deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Experimental results demonstrate that algorithms employing the MSTD target significantly improve learning performance compared to traditional methods. The code is provided on GitHub.
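
A standard one-step TD target is $r_0 + \gamma V(s_1)$. One simple way to involve several subsequent states is to average the 1-step through $k$-step bootstrapped targets, so that $V(s_1), \dots, V(s_k)$ all contribute. This averaging is assumed here purely for illustration. The abstract does not state how MSTD combines the states, so `multi_state_target` is a hypothetical sketch, not the authors' definition.

```python
def n_step_target(rewards, values, n, gamma):
    """Bootstrapped n-step target: r_0 + gamma r_1 + ... + gamma^(n-1) r_(n-1) + gamma^n V(s_n)."""
    discounted = sum((gamma ** i) * rewards[i] for i in range(n))
    return discounted + (gamma ** n) * values[n]


def multi_state_target(rewards, values, k, gamma=0.99):
    """Hypothetical multi-state target: the mean of the 1..k-step targets.

    rewards[i] is the reward received after step i and values[i] is the critic's estimate V(s_i);
    values[0] (the current state) is not used. Episode termination is ignored for brevity.
    """
    if k < 1 or len(rewards) < k or len(values) < k + 1:
        raise ValueError("need k >= 1, at least k rewards, and at least k + 1 value estimates")
    return sum(n_step_target(rewards, values, n, gamma) for n in range(1, k + 1)) / k


# With k = 1 this reduces to the usual one-step TD target r_0 + gamma * V(s_1).
print(multi_state_target([1.0, 1.0], [0.0, 10.0, 20.0], k=1, gamma=0.5))
print(multi_state_target([1.0, 1.0], [0.0, 10.0, 20.0], k=2, gamma=0.5))
```

Averaging several bootstrapped targets tends to reduce reliance on any single value estimate. The cost is that a replay buffer must store $k$-step trajectory segments rather than single transitions.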

Updated: 2024-08-02 00:21:41

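Title: Multi-State TD Target for Model-Free Reinforcement Learning

Abstract: Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target is an improved estimate of the true value, obtained by combining the immediate reward with the estimated value of subsequent states. Traditionally, TD learning relies on the value of a single subsequent state. We propose an enhanced multi-state TD (MSTD) target that uses the estimated values of multiple subsequent states. Building on this MSTD concept, we develop complete actor-critic algorithms that include replay buffer management in two modes. We integrate them with deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Experimental results show that algorithms using the MSTD target significantly improve learning performance compared with traditional methods. The code is available on GitHub.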

Categories: cs.LG,cs.AI,68T05(Primary)

Download: http://arxiv.org/abs/2405.16522v4