Arxiv Day: Article

Transcendence: Generative Models Can Outperform The Experts That Train Them

Generative models are trained with the simple objective of imitating the conditional probability distribution induced by the data they are trained on. Therefore, when trained on data generated by humans, we may not expect the artificial model to outperform the humans on their original objectives. In this work, we study the phenomenon of transcendence: when a generative model achieves capabilities that surpass the abilities of the experts generating its data. We demonstrate transcendence by training an autoregressive transformer to play chess from game transcripts, and show that the trained model can sometimes achieve better performance than all players in the dataset. We theoretically prove that transcendence can be enabled by low-temperature sampling, and rigorously assess this claim experimentally. Finally, we discuss other sources of transcendence, laying the groundwork for future investigation of this phenomenon in a broader setting.
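The low-temperature effect can be illustrated with a toy calculation. The sketch below is not the paper's chess setup: it assumes two hypothetical experts who each choose the best of three moves 60% of the time but err on different moves, averages their move distributions (roughly what an imitation model would learn), and then sharpens that average with a temperature. The names `sharpen`, `expert_a`, and `expert_b` and all probabilities are illustrative assumptions.

```python
def sharpen(probs, temperature):
    """Renormalise probabilities raised to 1/temperature (temperature-scaled softmax)."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

# Hypothetical experts; move 0 is the best move in this position.
expert_a = [0.6, 0.4, 0.0]  # errs only on move 1
expert_b = [0.6, 0.0, 0.4]  # errs only on move 2

# An imitation model trained on both experts' games learns their mixture.
mixture = [(a + b) / 2 for a, b in zip(expert_a, expert_b)]

for temperature in (1.0, 0.5, 0.1):
    p_best = sharpen(mixture, temperature)[0]
    print(f"T={temperature}: P(best move) = {p_best:.4f}")
```

Because the experts' errors fall on different moves, the mixture puts more mass on the best move than on either error. Lowering the temperature amplifies that gap, so the sampled policy picks the best move more often than either expert does.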

Updated: 2024-06-21 23:45:55

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.11741v2

The Case for Transport-Level Encryption in Datacenter Networks

Cloud applications need network data encryption to isolate themselves from other tenants and protect their data from potential eavesdroppers in the network infrastructure. This paper presents SDP, a protocol design for emerging datacenter transport protocols, such as pHost, NDP, and Homa, to integrate data encryption with the use of existing NIC offloading of cryptographic operations designed for TLS over TCP. Therefore, SDP could enable a deployment path of new transport protocols in datacenters without giving up hardware offloading support, which would otherwise make encryption on those protocols even slower than TLS over TCP. SDP is based on Homa, and outperforms TLS over TCP by up to 29% in throughput. SDP currently supports two real-world applications: Redis, improving throughput by up to 24%, and in-kernel NVMe-oF, cutting P99 latency by up to 21%.

Updated: 2024-06-21 23:28:36

Categories: cs.CR,cs.NI

Download: http://arxiv.org/abs/2406.15686v1

Credit Card Fraud Detection Using Advanced Transformer Model

With the proliferation of various online and mobile payment systems, credit card fraud has emerged as a significant threat to financial security. This study focuses on innovative applications of the latest Transformer models for more robust and precise fraud detection. To ensure the reliability of the data, we meticulously processed the data sources, balancing the dataset to substantially mitigate the issue of data sparsity. We also selected highly correlated vectors to strengthen the training process. To guarantee the reliability and practicality of the new Transformer model, we conducted performance comparisons with several widely adopted models, including Support Vector Machine (SVM), Random Forest, Neural Network, and Logistic Regression. We rigorously compared these models using metrics such as Precision, Recall, and F1 Score. Through these detailed analyses and comparisons, we present to the readers a highly efficient and powerful anti-fraud mechanism with promising prospects. The results demonstrate that the Transformer model not only excels in traditional applications but also shows great potential in niche areas like fraud detection, offering a substantial advancement in the field.

Updated: 2024-06-21 22:48:12

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.03733v2

MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time

Although Large Language Models (LLMs) achieve remarkable performance across various tasks, they often struggle with complex reasoning tasks, such as answering mathematical questions. Recent efforts to address this issue have primarily focused on leveraging mathematical datasets through supervised fine-tuning or self-improvement techniques. However, these methods often depend on high-quality datasets that are difficult to prepare, or they require substantial computational resources for fine-tuning. Inspired by findings that LLMs know how to produce the right answer but struggle to select the correct reasoning path, we propose a purely inference-based searching method -- MindStar (M*). This method formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths. We evaluate the M* framework on both the GSM8K and MATH datasets, comparing its performance with existing open and closed-source LLMs. Our results demonstrate that M* significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1, but with substantially reduced model size and computational costs.

Updated: 2024-06-21 22:41:08

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16265v3

Inferring Pluggable Types with Machine Learning

Pluggable type systems allow programmers to extend the type system of a programming language to enforce semantic properties defined by the programmer. Pluggable type systems are difficult to deploy in legacy codebases because they require programmers to write type annotations manually. This paper investigates how to use machine learning to infer type qualifiers automatically. We propose a novel representation, NaP-AST, that encodes minimal dataflow hints for the effective inference of type qualifiers. We evaluate several model architectures for inferring type qualifiers, including Graph Transformer Network (GTN), Graph Convolutional Network, and Large Language Model. We further validate these models by applying them to 12 open-source programs from a prior evaluation of the NullAway pluggable typechecker, lowering warnings in all but one unannotated project. We find that GTN shows the best performance, with a recall of 0.89 and a precision of 0.6. Furthermore, we conduct a study to estimate the number of Java classes needed for good performance of the trained model. For our feasibility study, performance improved around 16k classes, and deteriorated due to overfitting around 22k classes.
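For reference, the reported recall and precision can be combined into a single F1 score. The abstract does not report F1; the value computed below is derived here from the two reported numbers using the standard harmonic-mean definition, and the function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported in the abstract for the best model.
gtn_f1 = f1_score(precision=0.6, recall=0.89)
print(f"F1 = {gtn_f1:.3f}")
```

The harmonic mean sits closer to the lower of the two inputs, so the modest precision pulls the combined score well below the recall.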

Updated: 2024-06-21 22:32:42

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2406.15676v1

Combining Neural Networks and Symbolic Regression for Analytical Lyapunov Function Discovery

We propose CoNSAL (Combining Neural networks and Symbolic regression for Analytical Lyapunov function) to construct analytical Lyapunov functions for nonlinear dynamic systems. This framework contains a neural Lyapunov function and a symbolic regression component, where symbolic regression is applied to distill the neural network to precise analytical forms. Our approach utilizes symbolic regression not only as a tool for translation but also as a means to uncover counterexamples. This procedure terminates when no counterexamples are found in the analytical formulation. Compared with previous results, our algorithm directly produces an analytical form of the Lyapunov function with improved interpretability in both the learning process and the final results. We apply our algorithm to 2-D inverted pendulum, path following, Van Der Pol Oscillator, 3-D trig dynamics, 4-D rotating wheel pendulum, 6-D 3-bus power system, and demonstrate that our algorithm successfully finds their valid Lyapunov functions.

Updated: 2024-06-21 22:31:06

Categories: eess.SY,cs.AI,cs.SC,cs.SY

Download: http://arxiv.org/abs/2406.15675v1

Large Language Models have Intrinsic Self-Correction Ability

Large language models (LLMs) have attracted significant attention for their remarkable abilities in various natural language processing tasks, but they suffer from hallucinations that degrade performance. One promising solution to improve the LLMs' performance is to ask LLMs to revise their answer after generation, a technique known as self-correction. Among the two types of self-correction, intrinsic self-correction is considered a promising direction because it does not utilize external knowledge. However, recent works doubt that LLMs are able to conduct intrinsic self-correction. In this paper, we present a novel perspective on the intrinsic self-correction capabilities of LLMs through theoretical analyses and empirical experiments. In addition, we identify two critical factors for successful self-correction: zero temperature and fair prompts. Leveraging these factors, we demonstrate that intrinsic self-correction ability is exhibited across multiple existing LLMs. Our findings offer insights into the fundamental theories underlying the self-correction behavior of LLMs and remark on the importance of unbiased prompts and zero temperature settings in harnessing their full potential.

Updated: 2024-06-21 22:29:40

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.15673v1

LMDX: Language Model-based Document Information Extraction and Localization

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), improving state-of-the-art and exhibiting emergent capabilities across various tasks. However, their application in extracting information from visually rich documents, which is at the core of many document processing workflows and involves the extraction of key entities from semi-structured documents, has not yet been successful. The main obstacles to adopting LLMs for this task include the absence of layout encoding within LLMs, which is critical for high-quality extraction, and the lack of a grounding mechanism to localize the predicted entities within the document. In this paper, we introduce Language Model-based Document Information Extraction and Localization (LMDX), a methodology to reframe the document information extraction task for an LLM. LMDX enables extraction of singular, repeated, and hierarchical entities, both with and without training data, while providing grounding guarantees and localizing the entities within the document. Finally, we apply LMDX to the PaLM 2-S and Gemini Pro LLMs and evaluate it on VRDU and CORD benchmarks, setting a new state-of-the-art and showing how LMDX enables the creation of high-quality, data-efficient parsers.

Updated: 2024-06-21 21:55:07

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2309.10952v2

Flat Posterior Does Matter For Bayesian Transfer Learning

Large-scale pre-trained neural networks have achieved notable success in enhancing performance for downstream tasks. Another promising approach for generalization is the Bayesian Neural Network (BNN), which integrates Bayesian methods into neural network architectures, offering advantages such as Bayesian Model Averaging (BMA) and uncertainty quantification. Despite these benefits, transfer learning for BNNs has not been widely investigated and shows limited improvement. We hypothesize that this issue arises from the inability to find flat minima, which is crucial for generalization performance. To address this, we evaluate the sharpness of BNNs in various settings, revealing their insufficiency in seeking flat minima and the influence of flatness on BMA performance. Therefore, we propose Sharpness-aware Bayesian Model Averaging (SA-BMA), a Bayesian-fitting, flat-posterior-seeking optimizer integrated with Bayesian transfer learning. SA-BMA calculates the divergence between posteriors in the parameter space, aligning with the nature of BNNs, and serves as a generalized version of existing sharpness-aware optimizers. We validate that SA-BMA improves generalization performance in few-shot classification and distribution shift scenarios by ensuring flatness.

Updated: 2024-06-21 21:44:27

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2406.15664v1

Matching Problems to Solutions: An Explainable Way of Solving Machine Learning Problems

Domain experts from all fields are called upon, working with data scientists, to explore the use of ML techniques to solve their problems. Starting from a domain problem/question, ML-based problem-solving typically involves three steps: (1) formulating the business problem (problem domain) as a data analysis problem (solution domain), (2) sketching a high-level ML-based solution pattern, given the domain requirements and the properties of the available data, and (3) designing and refining the different components of the solution pattern. There has to be a substantial body of ML problem-solving knowledge that ML researchers agree on, and that ML practitioners routinely apply to solve the most common problems. Our work deals with capturing this body of knowledge and embodying it in an ML problem-solving workbench to help domain specialists who are not ML experts explore the ML solution space. This paper focuses on: (1) the representation of domain problems, ML problems, and the main ML solution artefacts, and (2) a heuristic matching function that helps identify the ML algorithm family that is most appropriate for the domain problem at hand, given the domain (expert) requirements and the characteristics of the training data. We review related work and outline our strategy for validating the workbench.

Updated: 2024-06-21 21:39:34

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.15662v1

The Stochastic Occupation Kernel Method for System Identification

The method of occupation kernels has been used to learn ordinary differential equations from data in a non-parametric way. We propose a two-step method for learning the drift and diffusion of a stochastic differential equation given snapshots of the process. In the first step, we learn the drift by applying the occupation kernel algorithm to the expected value of the process. In the second step, we learn the diffusion given the drift using a semi-definite program. Specifically, we learn the diffusion squared as a non-negative function in a RKHS associated with the square of a kernel. We present examples and simulations.

Updated: 2024-06-21 21:36:18

Categories: stat.ML,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2406.15661v1

Contextual Sprint Classification in Soccer Based on Deep Learning

The analysis of high-intensity runs (or sprints) in soccer has long been a topic of interest for sports science researchers and practitioners. In particular, recent studies suggested contextualizing sprints based on their tactical purposes to better understand the physical-tactical requirements of modern match-play. However, these approaches scale poorly, as human experts have to manually classify hundreds of sprints for every match. To address this challenge, this paper proposes a deep learning framework for automatically classifying sprints in soccer into contextual categories. The proposed model covers the permutation-invariant and sequential nature of multi-agent trajectories in soccer by deploying Set Transformers and a bidirectional GRU. We train the model with category labels made through the collaboration of human annotators and a rule-based classifier. Experimental results show that our model classifies sprints in the test dataset into 15 categories with an accuracy of 77.65%, implying the potential of the proposed framework for facilitating the integrated analysis of soccer sprints at scale.

Updated: 2024-06-21 21:33:51

Categories: cs.LG,cs.MA

Download: http://arxiv.org/abs/2406.15659v1

TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning

Spatial representation learning (SRL) aims at learning general-purpose neural network representations from various types of spatial data (e.g., points, polylines, polygons, networks, images, etc.) in their native formats. Learning good spatial representations is a fundamental problem for various downstream applications such as species distribution modeling, weather forecasting, trajectory generation, geographic question answering, etc. Even though SRL has become the foundation of almost all geospatial artificial intelligence (GeoAI) research, we have not yet seen significant efforts to develop an extensive deep learning framework and benchmark to support SRL model development and evaluation. To fill this gap, we propose TorchSpatial, a learning framework and benchmark for location (point) encoding, which is one of the most fundamental data types of spatial representation learning. TorchSpatial contains three key components: 1) a unified location encoding framework that consolidates 15 commonly recognized location encoders, ensuring scalability and reproducibility of the implementations; 2) the LocBench benchmark tasks encompassing 7 geo-aware image classification and 4 geo-aware image regression datasets; 3) a comprehensive suite of evaluation metrics to quantify geo-aware models' overall performance as well as their geographic bias, with a novel Geo-Bias Score metric. Finally, we provide a detailed analysis and insights into the model performance and geographic bias of different location encoders. We believe TorchSpatial will foster future advancement of spatial representation learning and spatial fairness in GeoAI research. The TorchSpatial model framework, LocBench, and Geo-Bias Score evaluation framework are available at https://github.com/seai-lab/TorchSpatial.

Updated: 2024-06-21 21:33:16

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.15658v1

A survey on fairness of large language models in e-commerce: progress, application, and challenge

This survey explores the fairness of large language models (LLMs) in e-commerce, examining their progress, applications, and the challenges they face. LLMs have become pivotal in the e-commerce domain, offering innovative solutions and enhancing customer experiences. This work presents a comprehensive survey on the applications and challenges of LLMs in e-commerce. The paper begins by introducing the key principles underlying the use of LLMs in e-commerce, detailing the processes of pretraining, fine-tuning, and prompting that tailor these models to specific needs. It then explores the varied applications of LLMs in e-commerce, including product reviews, where they synthesize and analyze customer feedback; product recommendations, where they leverage consumer data to suggest relevant items; product information translation, enhancing global accessibility; and product question and answer sections, where they automate customer support. The paper critically addresses the fairness challenges in e-commerce, highlighting how biases in training data and algorithms can lead to unfair outcomes, such as reinforcing stereotypes or discriminating against certain groups. These issues not only undermine consumer trust, but also raise ethical and legal concerns. Finally, the work outlines future research directions, emphasizing the need for more equitable and transparent LLMs in e-commerce. It advocates for ongoing efforts to mitigate biases and improve the fairness of these systems, ensuring they serve diverse global markets effectively and ethically. Through this comprehensive analysis, the survey provides a holistic view of the current landscape of LLMs in e-commerce, offering insights into their potential and limitations, and guiding future endeavors in creating fairer and more inclusive e-commerce environments.

Updated: 2024-06-21 21:26:03

Categories: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2405.13025v2

Testing the Feasibility of Linear Programs with Bandit Feedback

While the recent literature has seen a surge in the study of constrained bandit problems, all existing methods for these begin by assuming the feasibility of the underlying problem. We initiate the study of testing such feasibility assumptions, and in particular address the problem in the linear bandit setting, thus characterising the costs of feasibility testing for an unknown linear program using bandit feedback. Concretely, we test if $\exists x: Ax \ge 0$ for an unknown $A \in \mathbb{R}^{m \times d}$, by playing a sequence of actions $x_t\in \mathbb{R}^d$, and observing $Ax_t + \mathrm{noise}$ in response. By identifying the hypothesis as determining the sign of the value of a minimax game, we construct a novel test based on low-regret algorithms and a nonasymptotic law of iterated logarithms. We prove that this test is reliable, and adapts to the `signal level,' $\Gamma,$ of any instance, with mean sample costs scaling as $\widetilde{O}(d^2/\Gamma^2)$. We complement this by a minimax lower bound of $\Omega(d/\Gamma^2)$ for sample costs of reliable tests, dominating prior asymptotic lower bounds by capturing the dependence on $d$, and thus elucidating a basic insight missing in the extant literature on such problems.

Updated: 2024-06-21 20:56:35

Categories: cs.LG,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2406.15648v1

Generating Music with Structure Using Self-Similarity as Attention

Despite the innovations in deep learning and generative AI, creating long-term structure as well as the layers of repeated structure common in musical works remains an open challenge in music generation. We propose an attention layer that applies user-supplied self-similarity matrices to previous time steps, and demonstrate it in our Similarity Incentivized Neural Generator (SING) system, a deep learning autonomous music generation system with two layers. The first is a vanilla Long Short Term Memory layer, and the second is the proposed attention layer. During generation, this attention mechanism imposes a suggested structure from a template piece on the generated music. We train SING on the MAESTRO dataset using a novel variable batching method, and compare its performance to the same model without the attention mechanism. The addition of our proposed attention mechanism significantly improves the network's ability to replicate specific structures, and it performs better on an unseen test set than a model without the attention mechanism.
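One way to read "self-similarity as attention" is sketched below. This is an illustration under assumptions, not the SING implementation: it assumes the attention weights for step t come from normalising row t of the user-supplied self-similarity matrix over the previous steps, and that the layer outputs the weighted average of the earlier hidden states. The function name `ssm_attention` and the toy values are hypothetical.

```python
def ssm_attention(hidden_states, ssm, t):
    """Average the hidden states before step t, weighted by row t of the SSM."""
    dim = len(hidden_states[0])
    weights = ssm[t][:t]          # similarities between step t and earlier steps
    total = sum(weights)
    if total == 0:
        # No earlier steps, or none similar to step t: return a zero context.
        return [0.0] * dim
    return [
        sum(w * hidden_states[s][d] for s, w in enumerate(weights)) / total
        for d in range(dim)
    ]

# Toy template with an A-B-A structure: step 2 repeats step 0.
ssm = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
]
hidden_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

context = ssm_attention(hidden_states, ssm, t=2)
print(context)
```

In this toy case the template says step 2 resembles step 0 and not step 1, so the context at step 2 is drawn entirely from the hidden state of step 0. That is the sense in which a template's structure could be imposed on generation.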

Updated: 2024-06-21 20:56:12

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2406.15647v1

Dynamic Embeddings with Task-Oriented prompting

This paper introduces Dynamic Embeddings with Task-Oriented prompting (DETOT), a novel approach aimed at improving the adaptability and efficiency of machine learning models by implementing a flexible embedding layer. Unlike traditional static embeddings [14], DETOT dynamically adjusts embeddings based on task-specific requirements and performance feedback, optimizing input data representation for individual tasks [4]. This method enhances both accuracy and computational performance by tailoring the representation layer to meet the unique needs of each task. The structure of DETOT is detailed, highlighting its task-specific adaptation, continuous feedback loop, and mechanisms for preventing overfitting. Empirical evaluations demonstrate its superiority over existing methods.

Updated: 2024-06-21 20:51:59

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.11117v2

Root Cause Analysis of Anomalies in 5G RAN Using Graph Neural Network and Transformer

The emergence of 5G technology marks a significant milestone in developing telecommunication networks, enabling exciting new applications such as augmented reality and self-driving vehicles. However, these improvements bring an increased management complexity and a special concern in dealing with failures, as the applications 5G intends to support heavily rely on high network performance and low latency. Thus, automatic self-healing solutions have become effective in dealing with this requirement, allowing a learning-based system to automatically detect anomalies and perform Root Cause Analysis (RCA). However, there are inherent challenges to the implementation of such intelligent systems. First, there is a lack of suitable data for anomaly detection and RCA, as labelled data for failure scenarios is uncommon. Secondly, current intelligent solutions are tailored to LTE networks and do not fully capture the spatio-temporal characteristics present in the data. Considering this, we utilize a calibrated simulator, Simu5G, and generate open-source data for normal and failure scenarios. Using this data, we propose Simba, a state-of-the-art approach for anomaly detection and root cause analysis in 5G Radio Access Networks (RANs). We leverage Graph Neural Networks to capture spatial relationships while a Transformer model is used to learn the temporal dependencies of the data. We implement a prototype of Simba and evaluate it over multiple failures. The outcomes are compared against existing solutions to confirm the superiority of Simba.

Updated: 2024-06-21 20:34:08

Categories: cs.NI,cs.LG

Download: http://arxiv.org/abs/2406.15638v1

DataFreeShield: Defending Adversarial Attacks without Training Data

Recent advances in adversarial robustness rely on an abundant set of training data, where using external or additional datasets has become a common setting. However, in real life, the training data is often kept private for security and privacy issues, while only the pretrained weight is available to the public. In such scenarios, existing methods that assume accessibility to the original data become inapplicable. Thus we investigate the pivotal problem of data-free adversarial robustness, where we try to achieve adversarial robustness without accessing any real data. Through a preliminary study, we highlight the severity of the problem by showing that robustness without the original dataset is difficult to achieve, even with similar domain datasets. To address this issue, we propose DataFreeShield, which tackles the problem from two perspectives: surrogate dataset generation and adversarial training using the generated data. Through extensive validation, we show that DataFreeShield outperforms baselines, demonstrating that the proposed method sets the first entirely data-free solution for the adversarial robustness problem.

Updated: 2024-06-21 20:24:03

Categories: cs.LG,cs.CR,cs.CV

Download: http://arxiv.org/abs/2406.15635v1

Neural Koopman prior for data assimilation

With the increasing availability of large-scale datasets, computational power and tools like automatic differentiation and expressive neural network architectures, sequential data are now often treated in a data-driven way, with a dynamical model trained from the observation data. While neural networks are often seen as uninterpretable black-box architectures, they can still benefit from physical priors on the data and from mathematical knowledge. In this paper, we use a neural network architecture which leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, enabling a number of appealing features. We introduce methods that make it possible to train such a model for long-term continuous reconstruction, even in difficult contexts where the data comes in irregularly sampled time series. The potential for self-supervised learning is also demonstrated, as we show the promising use of trained dynamical models as priors for variational data assimilation techniques, with applications to e.g. time series interpolation and forecasting.

Updated: 2024-06-21 20:14:59

Categories: cs.LG

Download: http://arxiv.org/abs/2309.05317v3
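
As a rough illustration of why linear latent dynamics suit irregularly sampled series, the sketch below advances a latent state to an arbitrary real-valued time. It assumes a diagonal latent generator with hypothetical eigenvalues; the encoder and decoder networks are omitted, and none of the names below come from the paper.

```python
import cmath


def advance_latent(z0, eigenvalues, t):
    """Advance latent state z0 by time t under dz/dt = diag(eigenvalues) z.

    Assumption: the latent dynamics are linear and diagonal, so each
    coordinate evolves independently as z_i(t) = exp(lambda_i * t) * z_i(0).
    """
    return [z * cmath.exp(lam * t) for z, lam in zip(z0, eigenvalues)]


# Hypothetical latent state and eigenvalues (a damped oscillation and a decay).
z0 = [1.0 + 0.0j, 0.5 + 0.0j]
eigenvalues = [-0.1 + 2.0j, -0.3 + 0.0j]

# Irregular observation times need no fixed step size.
trajectory = [advance_latent(z0, eigenvalues, t) for t in (0.0, 0.3, 0.75, 2.1)]
```

Because the flow is linear, stepping by 0.3 and then by 0.45 gives the same latent state as a single step of 0.75, so predictions at unevenly spaced timestamps stay consistent with each other.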

Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds

Multi-label imbalanced classification poses a significant challenge in machine learning, particularly evident in bioacoustics where animal sounds often co-occur, and certain sounds are much less frequent than others. This paper focuses on the specific case of classifying anuran species sounds using the dataset AnuraSet, that contains both class imbalance and multi-label examples. To address these challenges, we introduce Mixture of Mixups (Mix2), a framework that leverages mixing regularization methods Mixup, Manifold Mixup, and MultiMix. Experimental results show that these methods, individually, may lead to suboptimal results; however, when applied randomly, with one selected at each training iteration, they prove effective in addressing the mentioned challenges, particularly for rare classes with few occurrences. Further analysis reveals that Mix2 is also proficient in classifying sounds across various levels of class co-occurrences.

Updated: 2024-06-21 20:09:05

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2403.09598v2
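
The per-iteration random choice described in the Mix2 abstract can be sketched as follows. Only input-level Mixup is implemented here; the other two strategies appear as names only, and the uniform draw, the `alpha` default, and all function names are assumptions for illustration, not the authors' implementation.

```python
import random

# Strategy names taken from the abstract; only "mixup" is implemented below.
STRATEGIES = ("mixup", "manifold_mixup", "multimix")


def mixup(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Convexly combine two examples and their multi-hot label vectors."""
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def pick_strategy(rng=random):
    """Draw one mixing strategy for the current training iteration.

    Assumption: the draw is uniform over the three strategies.
    """
    return rng.choice(STRATEGIES)
```

With multi-hot labels, a class present in both source clips keeps a target of 1, while a class present in only one clip receives a soft target of `lam` or `1 - lam`.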

Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph

Uncertainty quantification (UQ) is becoming increasingly recognized as a critical component of applications that rely on machine learning (ML). The rapid proliferation of large language models (LLMs) has stimulated researchers to seek efficient and effective approaches to UQ in text generation tasks, as in addition to their emerging capabilities, these models have introduced new challenges for building safe applications. As with other ML models, LLMs are prone to make incorrect predictions, "hallucinate" by fabricating claims, or simply generate low-quality output for a given input. UQ is a key element in dealing with these challenges. However, research to date on UQ methods for LLMs has been fragmented, with disparate evaluation methods. In this work, we tackle this issue by introducing a novel benchmark that implements a collection of state-of-the-art UQ baselines, and provides an environment for controllable and consistent evaluation of novel techniques by researchers in various text generation tasks. Our benchmark also supports the assessment of confidence normalization methods in terms of their ability to provide interpretable scores. Using our benchmark, we conduct a large-scale empirical investigation of UQ and normalization techniques across nine tasks and shed light on the most promising approaches.

Updated: 2024-06-21 20:06:31

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.15627v1

Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem

This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline. We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of information retrieved from a constrained database of digitized pedagogical materials (dictionaries and grammar lessons) and parallel corpora. Using both automatic and human evaluation of model output, we conduct ablation studies that manipulate (1) context type (morpheme translations, grammar descriptions, and corpus examples), (2) retrieval methods (automated vs. manual), and (3) model type. Our results suggest that even relatively small LLMs are capable of utilizing prompt context for zero-shot low-resource translation when provided a minimally sufficient amount of relevant linguistic information. However, the variable effects of prompt type, retrieval method, model type, and language-specific factors highlight the limitations of using even the best LLMs as translation systems for the majority of the world's 7,000+ languages and their speakers.

Updated: 2024-06-21 20:02:22

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.15625v1

Marrying Compressed Sensing and Deep Signal Separation

Blind signal separation (BSS) is an important and challenging signal processing task. Given an observed signal which is a superposition of a collection of unknown (hidden/latent) signals, BSS aims at recovering the separate, underlying signals from only the observed mixed signal. As an underdetermined problem, BSS is notoriously difficult to solve in general, and modern deep learning has provided engineers with an effective set of tools to solve this problem. For example, autoencoders learn a low-dimensional hidden encoding of the input data which can then be used to perform signal separation. In real-time systems, a common bottleneck is the transmission of data (communications) to a central command in order to await decisions. Bandwidth limits dictate the frequency and resolution of the data being transmitted. To overcome this, compressed sensing (CS) technology allows for the direct acquisition of compressed data with a near optimal reconstruction guarantee. This paper addresses the question: can compressive acquisition be combined with deep learning for BSS to provide a complete acquire-separate-predict pipeline? In other words, the aim is to perform BSS on a compressively acquired signal directly without ever having to decompress the signal. We consider image data (MNIST and E-MNIST) and show how our compressive autoencoder approach solves the problem of compressive BSS. We also provide some theoretical insights into the problem.

Updated: 2024-06-21 20:00:34

Categories: math.NA,cs.AI,cs.NA,68T07

Download: http://arxiv.org/abs/2406.15623v1

Efficient Adversarial Training in LLMs with Continuous Attacks

Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. In many domains, adversarial training has proven to be one of the most promising methods to reliably improve robustness against such attacks. Yet, in the context of LLMs, current methods for adversarial training are hindered by the high computational costs required to perform discrete adversarial attacks at each training iteration. We address this problem by instead calculating adversarial attacks in the continuous embedding space of the LLM, which is orders of magnitude more efficient. We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses: the first makes the model robust on continuous embedding attacks computed on an adversarial behaviour dataset; the second ensures the usefulness of the final model by fine-tuning on utility data. Moreover, we introduce C-AdvIPO, an adversarial variant of IPO that does not require utility data for adversarially robust alignment. Our empirical evaluation on four models from different families (Gemma, Phi3, Mistral, Zephyr) and at different scales (2B, 3.8B, 7B) shows that both algorithms substantially enhance LLM robustness against discrete attacks (GCG, AutoDAN, PAIR), while maintaining utility. Our results demonstrate that robustness to continuous perturbations can extrapolate to discrete threat models. Thereby, we present a path toward scalable adversarial training algorithms for robustly aligning LLMs.

Updated: 2024-06-21 19:59:31

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2405.15589v2

Physics Informed Machine Learning (PIML) methods for estimating the remaining useful lifetime (RUL) of aircraft engines

This paper is aimed at using the newly developing field of physics informed machine learning (PIML) to develop models for predicting the remaining useful lifetime (RUL) of aircraft engines. We consider the well-known benchmark NASA Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) data as the main data for this paper, which consists of sensor outputs in a variety of different operating modes. C-MAPSS is a well-studied dataset with much existing work in the literature that addresses RUL prediction with classical and deep learning methods. In the absence of published empirical physical laws governing the C-MAPSS data, our approach first uses stochastic methods to estimate the governing physics models from the noisy time series data. In our approach, we model the various sensor readings as being governed by stochastic differential equations, and we estimate the corresponding transition density mean and variance functions of the underlying processes. We then augment LSTM (long short-term memory) models with the learned mean and variance functions during training and inference. Our PIML-based approach differs from previous methods in that we use the data to first learn the physics. Our results indicate that PIML discovery and solution methods are well suited for this problem and outperform previous data-only deep learning methods for this data set and task. Moreover, the framework developed herein is flexible, and can be adapted to other situations (other sensor modalities or combined multi-physics environments), including cases where the underlying physics is only partially observed or known.

Updated: 2024-06-21 19:55:34

Categories: cs.LG,cs.AI,cs.NA,math.NA,65C20

Download: http://arxiv.org/abs/2406.15619v1

Stackelberg Games with $k$-Submodular Function under Distributional Risk-Receptiveness and Robustness

We study submodular optimization in an adversarial context, applicable to machine learning problems such as feature selection using data susceptible to uncertainties and attacks. We focus on Stackelberg games between an attacker (or interdictor) and a defender where the attacker aims to minimize the defender's objective of maximizing a $k$-submodular function. We allow uncertainties arising from the success of attacks and inherent data noise, and address challenges due to incomplete knowledge of the probability distribution of random parameters. Specifically, we introduce the Distributionally Risk-Averse $k$-Submodular Interdiction Problem (DRA $k$-SIP) and the Distributionally Risk-Receptive $k$-Submodular Interdiction Problem (DRR $k$-SIP) along with finitely convergent exact algorithms for solving them. The DRA $k$-SIP solution allows a risk-averse interdictor to develop robust strategies for real-world uncertainties. Conversely, the DRR $k$-SIP solution suggests aggressive tactics for attackers who are willing to embrace (distributional) risk to inflict maximum damage, identifying critical vulnerable components, which can be used for the defender's defensive strategies. The optimal values derived from both DRA $k$-SIP and DRR $k$-SIP offer a confidence interval-like range for the expected value of the defender's objective function, capturing distributional ambiguity. We conduct computational experiments using instances of feature selection and sensor placement problems, with Wisconsin breast cancer data and synthetic data, respectively.

Updated: 2024-06-21 19:51:28

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2406.13023v2

BrowNNe: Brownian Nonlocal Neurons & Activation Functions

It is generally thought that the use of stochastic activation functions in deep learning architectures yields models with superior generalization abilities. However, a sufficiently rigorous statement and theoretical proof of this heuristic is lacking in the literature. In this paper, we provide several novel contributions to the literature in this regard. First, defining a new notion of nonlocal directional derivative, we analyze its theoretical properties (existence and convergence). Second, using a probabilistic reformulation, we show that nonlocal derivatives are epsilon-subgradients, and derive sample complexity results for convergence of stochastic gradient descent-like methods using nonlocal derivatives. Finally, using our analysis of the nonlocal gradient of Hölder continuous functions, we observe that sample paths of Brownian motion admit nonlocal directional derivatives, and the nonlocal derivatives of Brownian motion are seen to be Gaussian processes with computable mean and standard deviation. Using the theory of nonlocal directional derivatives, we solve a highly nondifferentiable and nonconvex model problem of parameter estimation on image articulation manifolds. Using Brownian motion infused ReLU activation functions with the nonlocal gradient in place of the usual gradient during backpropagation, we also perform experiments on multiple well-studied deep learning architectures. Our experiments indicate the superior generalization capabilities of Brownian neural activation functions in low-training data regimes, where the use of stochastic neurons beats the deterministic ReLU counterpart.

Updated: 2024-06-21 19:40:30

Categories: cs.LG,cs.NA,math.NA,90C30

Download: http://arxiv.org/abs/2406.15617v1

ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs

Large Language Models (LLMs) still struggle with natural language reasoning tasks. Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents. ReConcile enhances collaborative reasoning between LLM agents via multiple rounds of discussion, learning to convince other agents to improve their answers, and employing a confidence-weighted voting mechanism that leads to a better consensus. In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their confidence scores, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents. Experiments on seven benchmarks demonstrate that ReConcile significantly improves LLMs' reasoning -- both individually and as a team -- surpassing prior single-agent and multi-agent baselines by up to 11.4% and even outperforming GPT-4 on three datasets. ReConcile also flexibly incorporates different combinations of agents, including API-based, open-source, and domain-specific models, leading to an 8% improvement on MATH. Finally, we analyze the individual components of ReConcile, demonstrating that the diversity originating from different models is critical to its superior performance. Code: https://github.com/dinobby/ReConcile

Updated: 2024-06-21 19:34:27

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2309.13007v3
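
The confidence-weighted voting step mentioned in the ReConcile abstract can be sketched roughly as below. This is a minimal illustration that sums raw confidence scores per answer; any rescaling or calibration of confidences used in the actual framework is not modelled, and the function name is hypothetical.

```python
from collections import defaultdict


def weighted_vote(answers, confidences):
    """Return the answer whose supporters have the largest total confidence."""
    totals = defaultdict(float)
    for answer, conf in zip(answers, confidences):
        totals[answer] += conf
    return max(totals, key=totals.get)


# Three hypothetical agents: one confident dissenter against two unsure agents.
final_answer = weighted_vote(["A", "B", "B"], [0.9, 0.3, 0.4])
```

In this example the single agent answering "A" wins because its confidence (0.9) exceeds the combined confidence of the two agents answering "B" (0.7); a plain majority vote would pick "B" instead.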

MOUNTAINEER: Topology-Driven Visual Analytics for Comparing Local Explanations

With the increasing use of black-box Machine Learning (ML) techniques in critical applications, there is a growing demand for methods that can provide transparency and accountability for model predictions. As a result, a large number of local explainability methods for black-box models have been developed and popularized. However, machine learning explanations are still hard to evaluate and compare due to the high dimensionality, heterogeneous representations, varying scales, and stochastic nature of some of these methods. Topological Data Analysis (TDA) can be an effective method in this domain since it can be used to transform attributions into uniform graph representations, providing a common ground for comparison across different explanation methods. We present a novel topology-driven visual analytics tool, Mountaineer, that allows ML practitioners to interactively analyze and compare these representations by linking the topological graphs back to the original data distribution, model predictions, and feature attributions. Mountaineer facilitates rapid and iterative exploration of ML explanations, enabling experts to gain deeper insights into the explanation techniques, understand the underlying data distributions, and thus reach well-founded conclusions about model behavior. Furthermore, we demonstrate the utility of Mountaineer through two case studies using real-world data. In the first, we show how Mountaineer enabled us to compare black-box ML explanations and discern regions of and causes of disagreements between different explanations. In the second, we demonstrate how the tool can be used to compare and understand ML models themselves. Finally, we conducted interviews with three industry experts to help us evaluate our work.

Updated: 2024-06-21 19:28:50

Categories: cs.LG,cs.GR,math.AT

Download: http://arxiv.org/abs/2406.15613v1

Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients

This paper tackles the problem of mitigating catastrophic risk (risk with very low frequency but very high severity) in the context of a sequential decision making process. This problem is particularly challenging due to the scarcity of observations in the far tail of the distribution of cumulative costs (negative rewards). We develop a policy gradient algorithm, which we call POTPG, based on approximations of the tail risk derived from extreme value theory. Numerical experiments show that our method outperforms common benchmarks that rely on the empirical distribution. An application to financial risk management, more precisely to the dynamic hedging of a financial option, is presented.

Updated: 2024-06-21 19:27:46

Categories: cs.LG,q-fin.RM

Download: http://arxiv.org/abs/2406.15612v1

Automated radiotherapy treatment planning guided by GPT-4Vision

Radiotherapy treatment planning is a time-consuming and potentially subjective process that requires the iterative adjustment of model parameters to balance multiple conflicting objectives. Recent advancements in large foundation models offer promising avenues for addressing the challenges in planning and clinical decision-making. This study introduces GPT-RadPlan, a fully automated treatment planning framework that harnesses prior radiation oncology knowledge encoded in multi-modal large language models, such as GPT-4Vision (GPT-4V) from OpenAI. GPT-RadPlan is made aware of planning protocols as context and acts as an expert human planner, capable of guiding a treatment planning process. Via in-context learning, we incorporate clinical protocols for various disease sites as prompts to enable GPT-4V to acquire treatment planning domain knowledge. The resulting GPT-RadPlan agent is integrated into our in-house inverse treatment planning system through an API. The efficacy of the automated planning system is showcased using multiple prostate and head & neck cancer cases, where we compared GPT-RadPlan results to clinical plans. In all cases, GPT-RadPlan either outperformed or matched the clinical plans, demonstrating superior target coverage and organ-at-risk sparing. Consistently satisfying the dosimetric objectives in the clinical protocol, GPT-RadPlan represents the first multimodal large language model agent that mimics the behaviors of human planners in radiation oncology clinics, achieving remarkable results in automating the treatment planning process without the need for additional training.

Updated: 2024-06-21 19:23:03

Categories: physics.med-ph,cs.AI

Download: http://arxiv.org/abs/2406.15609v1

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference

We study how to subvert language models from following the rules. We model rule-following as inference in propositional Horn logic, a mathematical system in which rules have the form "if $P$ and $Q$, then $R$" for some propositions $P$, $Q$, and $R$. We prove that although transformers can faithfully abide by such rules, maliciously crafted prompts can nevertheless mislead even theoretically constructed models. Empirically, we find that attacks on our theoretical models mirror popular attacks on large language models. Our work suggests that studying smaller theoretical models can help understand the behavior of large language models in rule-based settings like logical reasoning and jailbreak attacks.

Updated: 2024-06-21 19:18:16

Categories: cs.AI,cs.CL,cs.CR,cs.LG

Download: http://arxiv.org/abs/2407.00075v1
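
Inference in propositional Horn logic, as described in the Logicbreaks abstract, is commonly carried out by forward chaining. The sketch below is a generic textbook version, not the transformer construction studied in the paper; the rule encoding as `(body, head)` pairs is an assumption made for illustration.

```python
def forward_chain(facts, rules):
    """Derive every proposition entailed by Horn rules of the form body -> head.

    `rules` is an iterable of (body, head) pairs, where `body` is a tuple of
    propositions that must all be known before `head` can be concluded.
    """
    known = set(facts)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for body, head in rules:
            if head not in known and set(body) <= known:
                known.add(head)
                changed = True
    return known


# "if P and Q, then R" and "if R, then S"
rules = [(("P", "Q"), "R"), (("R",), "S")]
derived = forward_chain({"P", "Q"}, rules)
```

Starting from P and Q, the first rule yields R and the second then yields S. Starting from P alone, nothing new can be derived, because the first rule's body is not fully satisfied.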

How to train your VAE

Variational Autoencoders (VAEs) have become a cornerstone in generative modeling and representation learning within machine learning. This paper explores a nuanced aspect of VAEs, focusing on interpreting the Kullback-Leibler (KL) Divergence, a critical component within the Evidence Lower Bound (ELBO) that governs the trade-off between reconstruction accuracy and regularization. Meanwhile, the KL Divergence enforces alignment between latent variable distributions and a prior imposing a structure on the overall latent space but leaves individual variable distributions unconstrained. The proposed method redefines the ELBO with a mixture of Gaussians for the posterior probability, introduces a regularization term to prevent variance collapse, and employs a PatchGAN discriminator to enhance texture realism. Implementation details involve ResNetV2 architectures for both the Encoder and Decoder. The experiments demonstrate the ability to generate realistic faces, offering a promising solution for enhancing VAE-based generative models.

Updated: 2024-06-21 19:15:54

Categories: cs.LG,cs.AI,cs.CV,68T07,I.2.4; I.4.5

Download: http://arxiv.org/abs/2309.13160v3
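
For reference, the standard ELBO that the VAE abstract above builds on can be written as follows. This is the textbook form, not the mixture-of-Gaussians variant the paper proposes; the symbols (encoder $q_\phi(z \mid x)$, decoder $p_\theta(x \mid z)$, prior $p(z)$) are notational assumptions, since the abstract does not fix any.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
```

The expectation term rewards reconstruction accuracy, and the KL term acts as the regularizer that pulls the approximate posterior toward the prior, which is the trade-off the abstract refers to.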

QuADTool: Attack-Defense-Tree Synthesis, Analysis and Bridge to Verification

Ranking risks and countermeasures is one of the foremost goals of quantitative security analysis. One popular framework for this task, also used in industrial practice, is attack-defense trees. Standard quantitative analyses available for attack-defense trees can distinguish likely from unlikely vulnerabilities. We provide a tool that allows for easy synthesis and analysis of those models, also featuring probabilities, costs and time. Furthermore, it provides a variety of interfaces to existing model checkers and analysis tools. Unfortunately, currently available tools rely on precise quantitative inputs (probabilities, timing, or costs of attacks), which are rarely available. Instead, only statistical, imprecise information is typically available, leaving us with probably approximately correct (PAC) estimates of the real quantities. As a part of our tool, we extend the standard analysis techniques so they can handle the PAC input and yield rigorous bounds on the imprecision and uncertainty of the final result of the analysis.

Updated: 2024-06-21 19:14:27

Categories: cs.CR

Download: http://arxiv.org/abs/2406.15605v1

GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation

While text-to-visual models now produce photo-realistic images and videos, they struggle with compositional text prompts involving attributes, relationships, and higher-order reasoning such as logic and comparison. In this work, we conduct an extensive human study on GenAI-Bench to evaluate the performance of leading image and video generation models in various aspects of compositional text-to-visual generation. We also compare automated evaluation metrics against our collected human ratings and find that VQAScore -- a metric measuring the likelihood that a VQA model views an image as accurately depicting the prompt -- significantly outperforms previous metrics such as CLIPScore. In addition, VQAScore can improve generation in a black-box manner (without finetuning) via simply ranking a few (3 to 9) candidate images. Ranking by VQAScore is 2x to 3x more effective than other scoring methods like PickScore, HPSv2, and ImageReward at improving human alignment ratings for DALL-E 3 and Stable Diffusion, especially on compositional prompts that require advanced visio-linguistic reasoning. We will release a new GenAI-Rank benchmark with over 40,000 human ratings to evaluate scoring metrics on ranking images generated from the same prompt. Lastly, we discuss promising areas for improvement in VQAScore, such as addressing fine-grained visual details. We will release all human ratings (over 80,000) to facilitate scientific benchmarking of both generative models and automated metrics.

Updated: 2024-06-21 19:09:36

Categories: cs.CV,cs.AI,cs.CL,cs.LG,cs.MM

Download: http://arxiv.org/abs/2406.13743v2
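
The black-box improvement step described in the GenAI-Bench abstract (scoring a few candidates and keeping the best) can be sketched as follows. This is a minimal illustration only: `rank_candidates` and `toy_score` are hypothetical names, candidates are represented by caption strings instead of images, and `toy_score` is a crude word-overlap stand-in, not VQAScore.

```python
from typing import Callable, List, Sequence, Tuple


def rank_candidates(
    prompt: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every candidate against the prompt and sort best-first."""
    scored = [(candidate, score_fn(prompt, candidate)) for candidate in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(prompt: str, candidate: str) -> float:
    """Hypothetical scorer: fraction of prompt words found in the candidate."""
    prompt_words = set(prompt.lower().split())
    candidate_words = set(candidate.lower().split())
    return len(prompt_words & candidate_words) / len(prompt_words)


if __name__ == "__main__":
    prompt = "a red cube left of a blue sphere"
    candidates = ["blue sphere", "red cube left of blue sphere", "a cube"]
    best, score = rank_candidates(prompt, candidates, toy_score)[0]
    print(best, round(score, 2))
```

Because the generator is never touched, any scoring function with the same signature could be swapped in for `toy_score`.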

Conditional score-based diffusion models for solving inverse problems in mechanics

We propose a framework to perform Bayesian inference using conditional score-based diffusion models to solve a class of inverse problems in mechanics involving the inference of a specimen's spatially varying material properties from noisy measurements of its mechanical response to loading. Conditional score-based diffusion models are generative models that learn to approximate the score function of a conditional distribution using samples from the joint distribution. More specifically, the score functions corresponding to multiple realizations of the measurement are approximated using a single neural network, the so-called score network, which is subsequently used to sample the posterior distribution using an appropriate Markov chain Monte Carlo scheme based on Langevin dynamics. Training the score network only requires simulating the forward model. Hence, the proposed approach can accommodate black-box forward models and complex measurement noise. Moreover, once the score network has been trained, it can be re-used to solve the inverse problem for different realizations of the measurements. We demonstrate the efficacy of the proposed approach on a suite of high-dimensional inverse problems in mechanics that involve inferring heterogeneous material properties from noisy measurements. Some examples we consider involve synthetic data, while others include data collected from actual elastography experiments. Further, our applications demonstrate that the proposed approach can handle different measurement modalities, complex patterns in the inferred quantities, non-Gaussian and non-additive noise models, and nonlinear black-box forward models. The results show that the proposed framework can solve large-scale physics-based inverse problems efficiently.

Updated: 2024-06-21 19:01:31

Categories: stat.ML,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.13154v2
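
The sampling step mentioned in the abstract above, Markov chain Monte Carlo based on Langevin dynamics driven by a score function, can be sketched with the unadjusted Langevin update. This is an illustration under strong assumptions: the score is the analytic score of a one-dimensional Gaussian (a stand-in for the trained score network), and the names `langevin_sample` and `gaussian_score` as well as the step size are hypothetical choices, not the paper's settings.

```python
import math
import random
from typing import Callable


def langevin_sample(
    score: Callable[[float], float],
    x0: float,
    step: float,
    n_steps: int,
    rng: random.Random,
) -> float:
    """Run one unadjusted Langevin chain and return its final state.

    Update rule: x <- x + (step / 2) * score(x) + sqrt(step) * standard normal noise.
    """
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + 0.5 * step * score(x) + math.sqrt(step) * noise
    return x


def gaussian_score(mu: float, sigma: float) -> Callable[[float], float]:
    """Score (gradient of the log density) of N(mu, sigma^2); stands in for a learned network."""
    return lambda x: -(x - mu) / sigma**2


if __name__ == "__main__":
    rng = random.Random(0)
    score = gaussian_score(mu=2.0, sigma=1.0)
    samples = [langevin_sample(score, 0.0, 0.05, 200, rng) for _ in range(2000)]
    print(sum(samples) / len(samples))  # should land close to mu = 2.0
```

The sampler only ever calls `score(x)`, which is why a network trained from forward-model simulations could in principle be dropped in without changing the loop.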

Pareto-Optimal Learning from Preferences with Hidden Context

Ensuring AI models align with human values is essential for their safety and functionality. Reinforcement learning from human feedback (RLHF) uses human preferences to achieve this alignment. However, preferences sourced from diverse populations can result in point estimates of human values that may be sub-optimal or unfair to specific groups. We propose Pareto Optimal Preference Learning (POPL), which frames discrepant group preferences as objectives with potential trade-offs, aiming for policies that are Pareto-optimal on the preference dataset. POPL utilizes Lexicase selection, an iterative process to select diverse and Pareto-optimal solutions. Our empirical evaluations demonstrate that POPL surpasses baseline methods in learning sets of reward functions, effectively catering to distinct groups without access to group numbers or membership labels. Furthermore, we illustrate that POPL can serve as a foundation for techniques optimizing specific notions of group fairness, ensuring inclusive and equitable AI model alignment.

Updated: 2024-06-21 18:57:38

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.15599v1
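
Lexicase selection, which the POPL abstract names as its selection mechanism, can be sketched in its generic form. This is the standard algorithm from evolutionary computation, not necessarily the exact variant used in the paper; the function name `lexicase_select` and the error-matrix layout are assumptions, with rows playing the role of candidate reward functions and columns the role of individual preference cases.

```python
import random
from typing import Sequence


def lexicase_select(errors: Sequence[Sequence[float]], rng: random.Random) -> int:
    """Pick one candidate index by lexicase selection.

    errors[i][j] is the error of candidate i on case j (lower is better).
    Cases are visited in random order; after each case only the candidates
    that are best on that case survive.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)
    for case in cases:
        best = min(errors[i][case] for i in candidates)
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    # Any remaining ties are broken uniformly at random.
    return rng.choice(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    # Candidate 0 is best on case 0, candidate 1 on case 1, candidate 2 on neither.
    errors = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    print(sorted({lexicase_select(errors, rng) for _ in range(200)}))
```

Since each case can eliminate candidates on its own, specialists that serve one group well can survive even when their average error is not the lowest, which is what makes the selected set diverse.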

DiVerify: Diversifying Identity Verification in Next-Generation Software Signing

Code signing enables software developers to digitally sign their code using cryptographic keys, thereby associating the code to their identity. This allows users to verify the authenticity and integrity of the software, ensuring it has not been tampered with. Next-generation software signing such as Sigstore and OpenPubKey simplify code signing by providing streamlined mechanisms to verify and link signer identities to the public key. However, their designs have vulnerabilities: reliance on an identity provider introduces a single point of failure, and the failure to follow the principle of least privilege on the client side increases security risks. We introduce Diverse Identity Verification (DiVerify) scheme, which strengthens the security guarantees of next-generation software signing by leveraging threshold identity validations and scope mechanisms. We formalize a general definition of diverse verification scope and how it applies to next-generation software signing solutions, enabling clients to protect themselves from the impact of a compromised identity provider and help identity providers minimize the impact of compromised clients. As proof of concept, we implement DiVerify in the Sigstore ecosystem and evaluate the security improvements. By using fine-grained access control mechanisms and implementing threshold validations over account signing capabilities, we demonstrate that signing tools can protect themselves against threats from compromised identity providers and malicious signing clients.

Updated: 2024-06-21 18:53:52

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2406.15596v1

Detecting and Classifying Flares in High-Resolution Solar Spectra with Supervised Machine Learning

Flares are a well-studied aspect of the Sun's magnetic activity. Detecting and classifying solar flares can inform the analysis of contamination caused by stellar flares in exoplanet transmission spectra. In this paper, we present a standardized procedure to classify solar flares with the aid of supervised machine learning. Using flare data from the RHESSI mission and solar spectra from the HARPS-N instrument, we trained several supervised machine learning models, and found that the best performing algorithm is a C-Support Vector Machine (SVC) with non-linear kernels, specifically Radial Basis Functions (RBF). The best-trained model, SVC with RBF kernels, achieves an average aggregate accuracy score of 0.65, and categorical accuracy scores of over 0.70 for the no-flare and weak-flare classes, respectively. In comparison, a blind classification algorithm would have an accuracy score of 0.33. Testing showed that the model is able to detect and classify solar flares in entirely new data with different characteristics and distributions from those of the training set. Future efforts could focus on enhancing classification accuracy, investigating the efficacy of alternative models, particularly deep learning models, and incorporating more datasets to extend the application of this framework to stars that host exoplanets.

Updated: 2024-06-21 18:52:03

Categories: astro-ph.SR,astro-ph.EP,astro-ph.IM,cs.LG

Download: http://arxiv.org/abs/2406.15594v1

Ten Years of ZMap

Since ZMap's debut in 2013, networking and security researchers have used the open-source scanner to write hundreds of research papers that study Internet behavior. In addition, ZMap powers much of the attack-surface management and security ratings industries, and more than a dozen security companies have built products on top of ZMap. Behind the scenes, much of ZMap's behavior - ranging from its pseudorandom IP generation to its packet construction - has quietly evolved as we have learned more about how to scan the Internet. In this work, we quantify ZMap's adoption over the ten years since its release, describe its modern behavior (and the measurements that motivated those changes), and offer lessons from releasing and maintaining ZMap.

Updated: 2024-06-21 18:40:57

Categories: cs.CR,cs.NI

Download: http://arxiv.org/abs/2406.15585v1
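
The abstract above mentions pseudorandom IP generation without saying how it works. One well-known way to visit every address exactly once in a scrambled order is to walk a cyclic multiplicative group modulo a prime slightly larger than the address space. The sketch below shows that idea on a toy 8-bit address space; the function name, the prime 257, and the generator 3 are illustrative assumptions and say nothing about ZMap's current implementation.

```python
from typing import Iterator


def cyclic_permutation(
    n_addresses: int, prime: int, generator: int, start: int = 1
) -> Iterator[int]:
    """Yield every address in [0, n_addresses) exactly once, in scrambled order.

    Assumes prime > n_addresses, generator is a primitive root modulo prime,
    and 1 <= start < prime.
    """
    x = start
    for _ in range(prime - 1):  # the multiplicative group has prime - 1 elements
        if x - 1 < n_addresses:  # skip group elements outside the address space
            yield x - 1
        x = (x * generator) % prime


if __name__ == "__main__":
    # 257 is prime and 3 is a primitive root modulo 257, so all 256 addresses appear.
    order = list(cyclic_permutation(256, 257, 3))
    print(order[:8])
```

The scanner state is a single integer, so no per-address bookkeeping is needed to guarantee full coverage, and changing `start` gives a different traversal of the same cycle.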

Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

Graph Neural Networks (GNNs) are widely applied to graph learning problems such as node classification. When scaling up the underlying graphs of GNNs to a larger size, we are forced to either train on the complete graph and keep the full graph adjacency and node embeddings in memory (which is often infeasible) or mini-batch sample the graph (which results in exponentially growing computational complexities with respect to the number of GNN layers). Various sampling-based and historical-embedding-based methods are proposed to avoid this exponential growth of complexities. However, none of these solutions eliminates the linear dependence on graph size. This paper proposes a sketch-based algorithm whose training time and memory grow sublinearly with respect to graph size by training GNNs atop a few compact sketches of graph adjacency and node embeddings. Based on polynomial tensor-sketch (PTS) theory, our framework provides a novel protocol for sketching non-linear activations and graph convolution matrices in GNNs, as opposed to existing methods that sketch linear weights or gradients in neural networks. In addition, we develop a locality-sensitive hashing (LSH) technique that can be trained to improve the quality of sketches. Experiments on large-graph benchmarks demonstrate the scalability and competitive performance of our Sketch-GNNs versus their full-size GNN counterparts.

Updated: 2024-06-21 18:22:11

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2406.15575v1

Discovering influential text using convolutional neural networks

Experimental methods for estimating the impacts of text on human evaluation have been widely used in the social sciences. However, researchers in experimental settings are usually limited to testing a small number of pre-specified text treatments. While efforts to mine unstructured texts for features that causally affect outcomes have been ongoing in recent years, these models have primarily focused on the topics or specific words of text, which may not always be the mechanism of the effect. We connect these efforts with NLP interpretability techniques and present a method for flexibly discovering clusters of similar text phrases that are predictive of human reactions to texts using convolutional neural networks. When used in an experimental setting, this method can identify text treatments and their effects under certain assumptions. We apply the method to two datasets. The first enables direct validation of the model's ability to detect phrases known to cause the outcome. The second demonstrates its ability to flexibly discover text treatments with varying textual structures. In both cases, the model learns a greater variety of text treatments compared to benchmark methods, and these text features quantitatively meet or exceed the ability of benchmark methods to predict the outcome.

Updated: 2024-06-21 18:14:42

Categories: cs.CL,cs.LG,stat.ME

Download: http://arxiv.org/abs/2406.10086v2

DEM: Distribution Edited Model for Training with Mixed Data Distributions

Training with mixed data distributions is a common and important part of creating multi-task and instruction-following models. The diversity of the data distributions and the cost of joint training make the optimization procedure extremely challenging. Data mixing methods partially address this problem, but they perform sub-optimally across data sources and require multiple expensive training runs. In this paper, we propose a simple and efficient alternative for better optimization of the data sources by combining models individually trained on each data source with the base model using basic element-wise vector operations. The resulting model, namely Distribution Edited Model (DEM), is 11x cheaper than standard data mixing and outperforms strong baselines on a variety of benchmarks, yielding up to 6.2% improvement on MMLU, 11.5% on BBH, 16.1% on DROP, and 9.3% on HELM with models of size 3B to 13B. Notably, DEM does not require full re-training when modifying a single data-source, thus making it very flexible and scalable for training with diverse data sources.

Updated: 2024-06-21 18:07:46

Categories: cs.CL,cs.LG,68T50,F.2.2; I.2.7

Download: http://arxiv.org/abs/2406.15570v1
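
The "basic element-wise vector operations" in the DEM abstract can be illustrated with a weighted sum of parameter differences. The exact combination rule is an assumption here: the sketch adds, for each data source, a weighted difference between the model fine-tuned on that source and the base model. The names `distribution_edit` and `Params`, and the use of plain Python lists as parameter tensors, are hypothetical.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # parameter name -> flat list of weights


def distribution_edit(
    base: Params, finetuned: Sequence[Params], weights: Sequence[float]
) -> Params:
    """Return base + sum_i weights[i] * (finetuned[i] - base), element-wise."""
    if len(finetuned) != len(weights):
        raise ValueError("need exactly one weight per fine-tuned model")
    merged: Params = {}
    for name, base_values in base.items():
        out = list(base_values)
        for model, weight in zip(finetuned, weights):
            for k, (tuned, original) in enumerate(zip(model[name], base_values)):
                out[k] += weight * (tuned - original)
        merged[name] = out
    return merged


if __name__ == "__main__":
    base = {"w": [1.0, 2.0]}
    source_a = {"w": [2.0, 2.0]}  # model fine-tuned on data source A
    source_b = {"w": [1.0, 4.0]}  # model fine-tuned on data source B
    print(distribution_edit(base, [source_a, source_b], [0.5, 0.5]))
```

Under this formulation, changing one data source only means re-training that one model and recomputing the sum, which matches the flexibility the abstract claims.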

Robust Reinforcement Learning from Corrupted Human Feedback

Reinforcement learning from human feedback (RLHF) provides a principled framework for aligning AI systems with human preference data. For various reasons, e.g., personal bias, context ambiguity, or lack of training, human annotators may give incorrect or inconsistent preference labels. To tackle this challenge, we propose a robust RLHF approach -- $R^3M$, which models the potentially corrupted preference label as sparse outliers. Accordingly, we formulate the robust reward learning as an $\ell_1$-regularized maximum likelihood estimation problem. Computationally, we develop an efficient alternating optimization algorithm, which only incurs negligible computational overhead compared with the standard RLHF approach. Theoretically, we prove that under proper regularity conditions, $R^3M$ can consistently learn the underlying reward and identify outliers, provided that the number of outlier labels scales sublinearly with the preference sample size. Furthermore, we remark that $R^3M$ is versatile and can be extended to various preference optimization methods, including direct preference optimization (DPO). Our experiments on robotic control and natural language generation with large language models (LLMs) show that $R^3M$ improves robustness of the reward against several types of perturbations to the preference data.

Updated: 2024-06-21 18:06:30

Categories: cs.LG

Download: http://arxiv.org/abs/2406.15568v1
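
One plausible way to write the $\ell_1$-regularized maximum likelihood problem described in the abstract above is sketched below. It assumes a Bradley-Terry preference model with a per-sample perturbation $\delta_i$; the symbols $r_\theta$, $\delta_i$, $\lambda$, the sigmoid $\sigma$, and the placement of $\delta_i$ inside the sigmoid are assumptions, not quoted from the paper.

```latex
\min_{\theta,\,\delta}\;
  -\frac{1}{n}\sum_{i=1}^{n}
    \log \sigma\big(r_\theta(x_i, y_i^{w}) - r_\theta(x_i, y_i^{l}) + \delta_i\big)
  \;+\; \lambda\,\|\delta\|_1
```

Under this reading, alternating optimization would update $\theta$ with $\delta$ fixed and then $\delta$ with $\theta$ fixed. The $\ell_1$ penalty pushes most $\delta_i$ to exactly zero, so the remaining nonzero entries would flag the preference pairs suspected of being corrupted.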

SAIL: Self-Improving Efficient Online Alignment of Large Language Models

Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which can lead to sub-optimal performance. On the other hand, recent literature has focused on designing online RLHF methods but still lacks a unified conceptual formulation and suffers from distribution shift issues. To address this, we establish that online LLM alignment is underpinned by bilevel optimization. By reducing this formulation to an efficient single-level first-order method (using the reward-policy equivalence), our approach generates new samples and iteratively refines model alignment by exploring responses and regulating preference labels. In doing so, we permit alignment methods to operate in an online and self-improving manner, as well as generalize prior online RLHF methods as special cases. Compared to state-of-the-art iterative RLHF methods, our approach significantly improves alignment performance on open-sourced datasets with minimal computational overhead.

Updated: 2024-06-21 18:05:35

Categories: cs.LG,cs.AI,cs.CL,stat.ML

Download: http://arxiv.org/abs/2406.15567v1

Unseen Object Reasoning with Shared Appearance Cues

This paper introduces an innovative approach to open world recognition (OWR), where we leverage knowledge acquired from known objects to address the recognition of previously unseen objects. The traditional method of object modeling relies on supervised learning with strict closed-set assumptions, presupposing that objects encountered during inference are already known at the training phase. However, this assumption proves inadequate for real-world scenarios due to the impracticality of accounting for the immense diversity of objects. Our hypothesis posits that object appearances can be represented as collections of "shareable" mid-level features, arranged in constellations to form object instances. By adopting this framework, we can efficiently dissect and represent both known and unknown objects in terms of their appearance cues. Our paper introduces a straightforward yet elegant method for modeling novel or unseen objects, utilizing established appearance cues and accounting for inherent uncertainties. This representation not only enables the detection of out-of-distribution objects or novel categories among unseen objects but also facilitates a deeper level of reasoning, empowering the identification of the superclass to which an unknown instance belongs. This novel approach holds promise for advancing open world recognition in diverse applications.

Updated: 2024-06-21 18:04:13

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.15565v1

The Fragility of Optimized Bandit Algorithms

Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that designs that are optimal over certain exponential families can achieve expected regret that grows logarithmically in the number of arm plays, at a rate governed by the Lai-Robbins lower bound. In this paper, we show that when one uses such optimized designs, the regret distribution of the associated algorithms necessarily has a very heavy tail, specifically, that of a truncated Cauchy distribution. Furthermore, for $p>1$, the $p$'th moment of the regret distribution grows much faster than poly-logarithmically, in particular as a power of the total number of arm plays. We show that optimized UCB bandit designs are also fragile in an additional sense, namely when the problem is even slightly mis-specified, the regret can grow much faster than the conventional theory suggests. Our arguments are based on standard change-of-measure ideas, and indicate that the most likely way that regret becomes larger than expected is when the optimal arm returns below-average rewards in the first few arm plays, thereby causing the algorithm to believe that the arm is sub-optimal. To alleviate the fragility issues exposed, we show that UCB algorithms can be modified so as to ensure a desired degree of robustness to mis-specification. In doing so, we also show a sharp trade-off between the amount of UCB exploration and the heaviness of the resulting regret distribution tail.

Updated: 2024-06-21 18:01:17

Categories: cs.LG,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2109.13595v7
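
For readers unfamiliar with UCB designs, the classical UCB1 rule on Bernoulli arms is sketched below. It is a generic textbook variant with an exploration bonus of $\sqrt{2\ln t / n_a}$, not the optimized designs whose regret tails the paper analyzes; the function name `ucb1` and the Bernoulli reward model are assumptions made for the illustration.

```python
import math
import random
from typing import List, Sequence


def ucb1(arm_means: Sequence[float], horizon: int, rng: random.Random) -> List[int]:
    """Run UCB1 on Bernoulli arms and return how many times each arm was played."""
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once before using the index
        else:
            # Index = empirical mean + exploration bonus that shrinks with plays.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.8], horizon=2000, rng=random.Random(0)))
```

The size of the bonus controls how quickly an arm with unlucky early rewards gets revisited, which is the exploration knob behind the trade-off with tail heaviness that the abstract describes.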

NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking

Benchmarking vision-based driving policies is challenging. On one hand, open-loop evaluation with real data is easy, but these results do not reflect closed-loop performance. On the other, closed-loop evaluation is possible in simulation, but is hard to scale due to its significant computational demands. Further, the simulators available today exhibit a large domain gap to real data. This has resulted in an inability to draw clear conclusions from the rapidly growing body of research on end-to-end autonomous driving. In this paper, we present NAVSIM, a middle ground between these evaluation paradigms, where we use large datasets in combination with a non-reactive simulator to enable large-scale real-world benchmarking. Specifically, we gather simulation-based metrics, such as progress and time to collision, by unrolling bird's eye view abstractions of the test scenes for a short simulation horizon. Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other. As we demonstrate empirically, this decoupling allows open-loop metric computation while being better aligned with closed-loop evaluations than traditional displacement errors. NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights. On a large set of challenging scenarios, we observe that simple methods with moderate compute requirements such as TransFuser can match recent large-scale end-to-end driving architectures such as UniAD. Our modular framework can potentially be extended with new datasets, data curation strategies, and metrics, and will be continually maintained to host future challenges. Our code is available at https://github.com/autonomousvision/navsim.

Updated: 2024-06-21 17:59:02

Categories: cs.CV,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2406.15349v1

Provable Guarantees for Model Performance via Mechanistic Interpretability

In this work, we propose using mechanistic interpretability -- techniques for reverse engineering model weights into human-interpretable algorithms -- to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-$K$ task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless noise as a key challenge for using mechanistic interpretability to generate compact proofs on model performance.

Updated: 2024-06-21 17:58:28

Categories: cs.LG,cs.LO

Download: http://arxiv.org/abs/2406.11779v4

Privacy Preserved Blood Glucose Level Cross-Prediction: An Asynchronous Decentralized Federated Learning Approach

Newly diagnosed Type 1 Diabetes (T1D) patients often struggle to obtain effective Blood Glucose (BG) prediction models due to the lack of sufficient BG data from Continuous Glucose Monitoring (CGM), presenting a significant "cold start" problem in patient care. Utilizing population models to address this challenge is a potential solution, but collecting patient data for training population models in a privacy-conscious manner is challenging, especially given that such data is often stored on personal devices. To protect privacy and address the "cold start" problem in diabetes care, we propose "GluADFL", blood Glucose prediction by Asynchronous Decentralized Federated Learning. We compared GluADFL with eight baseline methods using four distinct T1D datasets, comprising 298 participants, which demonstrated its superior performance in accurately predicting BG levels for cross-patient analysis. Furthermore, patients' data might be stored and shared across various communication networks in GluADFL, ranging from highly interconnected topologies (e.g., random, which performs best) to more structured ones (e.g., cluster and ring), suitable for various social networks. The asynchronous training framework supports flexible participation. By adjusting the ratio of inactive participants, we found that it remains stable if fewer than 70% are inactive. Our results confirm that GluADFL offers a practical, privacy-preserving solution for BG prediction in T1D, significantly enhancing the quality of diabetes management.

Updated: 2024-06-21 17:57:39

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.15346v1

GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians

Recent advancements in machine learning have significantly improved the identification of disease-associated genes from gene expression datasets. However, these processes often require extensive expertise and manual effort, limiting their scalability. Large Language Model (LLM)-based agents have shown promise in automating these tasks due to their increasing problem-solving abilities. To support the evaluation and development of such methods, we introduce GenoTEX, a benchmark dataset for the automatic exploration of gene expression data, involving the tasks of dataset selection, preprocessing, and statistical analysis. GenoTEX provides annotated code and results for solving a wide range of gene identification problems, in a full analysis pipeline that follows the standard of computational genomics. These annotations are curated by human bioinformaticians who carefully analyze the datasets to ensure accuracy and reliability. To provide baselines for these tasks, we present GenoAgents, a team of LLM-based agents designed with context-aware planning, iterative correction, and domain expert consultation to collaboratively explore gene datasets. Our experiments with GenoAgents demonstrate the potential of LLM-based approaches in genomics data analysis, while error analysis highlights the challenges and areas for future improvement. We propose GenoTEX as a promising resource for benchmarking and enhancing AI-driven methods for genomics data analysis. We make our benchmark publicly available at https://github.com/Liu-Hy/GenoTex.

Updated: 2024-06-21 17:55:24

Categories: cs.LG,cs.AI,q-bio.GN

Download: http://arxiv.org/abs/2406.15341v1

Image Conductor: Precision Control for Interactive Video Synthesis

Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. A well-designed training strategy is proposed to separate distinct camera and object motion by camera LoRA weights and object LoRA weights. To further address cinematographic variations from ill-posed trajectories, we introduce a camera-free guidance technique during inference, enhancing object movements while eliminating camera transitions. Additionally, we develop a trajectory-oriented video motion data curation pipeline for training. Quantitative and qualitative experiments demonstrate our method's precision and fine-grained control in generating motion-controllable videos from images, advancing the practical application of interactive video synthesis. Project webpage available at https://liyaowei-stu.github.io/project/ImageConductor/

Updated: 2024-06-21 17:55:05

Categories: cs.CV,cs.AI,cs.MM

Download: http://arxiv.org/abs/2406.15339v1

Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

The recent success of interleaved Large Multimodal Models (LMMs) in few-shot learning suggests that in-context learning (ICL) with many examples can be promising for learning new tasks. However, this many-shot multimodal ICL setting has one crucial problem: it is fundamentally limited by the model's context length set at pretraining. The problem is especially prominent in the multimodal domain, which processes both text and images, requiring additional tokens. This motivates the need for a multimodal method to compress many shots into fewer tokens without finetuning. In this work, we enable LMMs to perform multimodal, many-shot in-context learning by leveraging Multimodal Task Vectors (MTV)--compact implicit representations of in-context examples compressed in the model's attention heads. Specifically, we first demonstrate the existence of such MTV in LMMs and then leverage these extracted MTV to enable many-shot in-context learning for various vision-and-language tasks. Our experiments suggest that MTV can scale in performance with the number of compressed shots and generalize to similar out-of-domain tasks without additional context length for inference.

Updated: 2024-06-21 17:50:02

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.15334v1

Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild

Virtual Try-On (VTON) is a highly active line of research, with increasing demand. It aims to replace a garment in an image with one from another image, while preserving person and garment characteristics as well as image fidelity. Current literature takes a supervised approach to the task, impairing generalization and imposing heavy computation. In this paper, we present a novel zero-shot, training-free method for inpainting a clothing garment by reference. Our approach employs the prior of a diffusion model with no additional training, fully leveraging its native generalization capabilities. The method employs extended attention to transfer image information from reference to target images, overcoming two significant challenges. We first warp the reference garment over the target human using deep features, alleviating "texture sticking". We then leverage the extended attention mechanism with careful masking, eliminating leakage of reference background and unwanted influence. Through a user study and qualitative and quantitative comparisons with state-of-the-art approaches, we demonstrate superior image quality and garment preservation for unseen clothing pieces or human figures.

Updated: 2024-06-21 17:45:37

Categories: cs.CV,cs.GR,cs.LG

Download: http://arxiv.org/abs/2406.15331v1

Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance

Large language models (LLMs) have revolutionized many fields of research. Although it is well-known that fine-tuning is essential for enhancing the capabilities of LLMs, existing research suggests that there is potential redundancy in the fine-tuning process and therefore proposes to update only a subset of parameters. However, these methods fail to leverage the task-specific information to identify important parameters during training. Based on the insight that gradients inherently contain information on task-specific data, we propose Gradient-Mask Tuning (GMT), a method that selectively updates parameters during training based on their gradient information. Specifically, we compute the absolute values of the gradients and apply masking to those with relatively smaller magnitudes. Our empirical results across various tasks demonstrate that GMT not only outperforms traditional fine-tuning methods but also elevates the upper limits of LLM performance. Further analysis indicates that GMT exhibits insensitivity to mask ratio and possesses computational efficiency comparable to vanilla SFT.
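
The masking step described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `gradient_mask_update`, the plain SGD update, and the use of a quantile over a single flat gradient vector as the masking threshold are all assumptions made for illustration.

```python
import numpy as np

def gradient_mask_update(params, grads, lr=0.1, mask_ratio=0.5):
    """One SGD-style step that skips the smallest-magnitude gradient entries.

    Hypothetical helper: mask_ratio is the fraction of entries to mask, and the
    threshold is taken as a quantile of the absolute gradient values.
    """
    magnitudes = np.abs(grads)
    threshold = np.quantile(magnitudes, mask_ratio)
    mask = magnitudes >= threshold        # True where the update is kept
    return params - lr * grads * mask, mask

params = np.zeros(4)
grads = np.array([0.1, -2.0, 0.05, 3.0])
new_params, mask = gradient_mask_update(params, grads, lr=0.1, mask_ratio=0.5)
print(mask, new_params)
```

In this toy run only the two largest-magnitude gradient entries change their parameters; the other two entries are left untouched for this step.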

Updated: 2024-06-21 17:42:52

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.15330v1

Trading Devil: Robust backdoor attack via Stochastic investment models and Bayesian approach

With the growing use of voice-activated systems and speech recognition technologies, the danger of backdoor attacks on audio data has grown significantly. This research looks at a specific type of attack, known as a stochastic investment-based backdoor attack (MarketBack), in which adversaries strategically manipulate the stylistic properties of audio to fool speech recognition systems. Backdoor attacks seriously threaten the security and integrity of machine learning models; to maintain the reliability of audio applications and systems, identifying such attacks is crucial in the context of audio data. Experimental results demonstrate that MarketBack can achieve an average attack success rate close to 100% on seven victim models when poisoning less than 1% of the training data.

Updated: 2024-06-21 17:42:32

Categories: cs.CR,cs.LG,q-fin.CP,q-fin.ST,stat.ML

Download: http://arxiv.org/abs/2406.10719v3

An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT

An end-to-end, segmentation-free deep learning model trained from scratch is proposed, leveraging a DCNN for feature extraction, alongside Bidirectional Long Short-Term Memory (BLSTM) for sequence recognition and the Connectionist Temporal Classification (CTC) loss function, on the KHATT database. Training yields an 84% recognition rate on the test dataset at the character level and 71% at the word level, establishing an image-based sequence recognition framework that requires segmentation only at the line level. The analysis and preprocessing of the KFUPM Handwritten Arabic TexT (KHATT) database are also presented. Finally, advanced image processing techniques, including filtering, transformation, and line segmentation, are implemented. The importance of this work is highlighted by its wide-ranging applications, including digitizing, documentation, archiving, and text translation in fields such as banking. Moreover, Arabic handwritten recognition (AHR) serves as a pivotal tool for making images searchable, enhancing information retrieval capabilities, and enabling effortless editing. This functionality significantly reduces the time and effort required for tasks such as Arabic data organization and manipulation.

Updated: 2024-06-21 17:42:07

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.15329v1

Fine-grained Attention in Hierarchical Transformers for Tabular Time-series

Tabular data is ubiquitous in many real-life systems. In particular, time-dependent tabular data, where rows are chronologically related, is typically used for recording historical events, e.g., financial transactions, healthcare records, or stock history. Recently, hierarchical variants of the attention mechanism of transformer architectures have been used to model tabular time-series data. First, rows (or columns) are encoded separately by computing attention between their fields. Subsequently, encoded rows (or columns) are attended to one another to model the entire tabular time-series. While efficient, this approach constrains the attention granularity and limits its ability to learn patterns at the field level across separate rows or columns. We take a first step to address this gap by proposing Fieldy, a fine-grained hierarchical model that contextualizes fields at both the row and column levels. We compare our proposal against state-of-the-art models on regression and classification tasks using public tabular time-series datasets. Our results show that combining row-wise and column-wise attention improves performance without increasing model size. Code and data are available at https://github.com/raphaaal/fieldy.

Updated: 2024-06-21 17:40:46

Categories: cs.LG,I.2.6

Download: http://arxiv.org/abs/2406.15327v1

Specify What? Enhancing Neural Specification Synthesis by Symbolic Methods

We investigate how combinations of Large Language Models (LLMs) and symbolic analyses can be used to synthesise specifications of C programs. The LLM prompts are augmented with outputs from two formal methods tools in the Frama-C ecosystem, Pathcrawler and EVA, to produce C program annotations in the specification language ACSL. We demonstrate how the addition of symbolic analysis to the workflow impacts the quality of annotations: information about input/output examples from Pathcrawler produces more context-aware annotations, while the inclusion of EVA reports yields annotations more attuned to runtime errors. In addition, we show that the method infers the program's intent rather than its behaviour, by generating specifications for buggy programs and observing the robustness of the result against bugs.

Updated: 2024-06-21 17:39:57

Categories: cs.SE,cs.FL,cs.LG

Download: http://arxiv.org/abs/2406.15540v1

Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks

Recent research in Needle-in-a-Haystack (NIAH) benchmarks has explored the capabilities of Large Language Models (LLMs) in retrieving contextual information from large text documents. However, as LLMs become increasingly integrated into software development processes, it is crucial to evaluate their performance in code-based environments. As LLMs are further developed for program synthesis, we need to ensure that LLMs can understand syntax and write syntactically correct code. As a step toward this, LLMs can be evaluated on their ability to find and detect syntax bugs. Our benchmark, Bug In The Code Stack (BICS), is designed to assess the ability of LLMs to identify simple syntax bugs within large source code. Our findings reveal three key insights: (1) code-based environments are significantly more challenging than text-based environments for retrieval tasks, (2) there is a substantial performance disparity among different models, and (3) there is a notable correlation between longer context lengths and performance degradation, though the extent of this degradation varies between models.

Updated: 2024-06-21 17:37:10

Categories: cs.AI,cs.SE,68T50,I.2.7; D.2.5

Download: http://arxiv.org/abs/2406.15325v1

Large Reasoning Models for 3D Floorplanning in EDA: Learning from Imperfections

In this paper, we introduce Dreamweaver, which belongs to a new class of auto-regressive decision-making models known as large reasoning models (LRMs). Dreamweaver is designed to improve 3D floorplanning in electronic design automation (EDA) via an architecture that melds advancements in sequence-to-sequence reinforcement learning algorithms. A significant advantage of our approach is its ability to effectively reason over large discrete action spaces, which is essential for handling the numerous potential positions for various functional blocks in floorplanning. Additionally, Dreamweaver demonstrates strong performance even when trained on entirely random trajectories, showcasing its capacity to leverage sub-optimal or non-expert trajectories to enhance its results. This innovative approach contributes to streamlining the integrated circuit (IC) design flow and reducing the high computational costs typically associated with floorplanning. We evaluate its performance against a current state-of-the-art method, highlighting notable improvements.

Updated: 2024-06-21 17:36:12

Categories: cs.LG,cs.CE

Download: http://arxiv.org/abs/2406.10538v2

Large language models surpass human experts in predicting neuroscience results

Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs were confident in their predictions, they were more likely to be correct, which presages a future where humans and LLMs team together to make discoveries. Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.

Updated: 2024-06-21 17:35:46

Categories: q-bio.NC,cs.AI

Download: http://arxiv.org/abs/2403.03230v3

MTUncertainty: Assessing the Need for Post-editing of Machine Translation Outputs by Fine-tuning OpenAI LLMs

Translation Quality Evaluation (TQE) is an essential step of the modern translation production process. TQE is critical in assessing both machine translation (MT) and human translation (HT) quality without reference translations. The ability to evaluate or even simply estimate the quality of translation automatically may open significant efficiency gains through process optimisation. This work examines whether the state-of-the-art large language models (LLMs) can be used for this purpose. We take OpenAI models as the best state-of-the-art technology and approach TQE as a binary classification task. On eight language pairs, from English to Italian, German, French, Japanese, Dutch, Portuguese, Turkish, and Chinese, our experimental results show that fine-tuned gpt3.5 can demonstrate good performance on translation quality prediction tasks, i.e. whether the translation needs to be edited. Another finding, from comparing three versions of OpenAI models (curie, davinci, and gpt3.5, with 13B, 175B, and 175B parameters, respectively), is that simply increasing the size of LLMs does not lead to clearly better performance on this task.

Updated: 2024-06-21 17:34:47

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2308.00158v6

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) are facing a prevalent problem with object hallucinations, where the generated textual responses are inconsistent with ground-truth objects in the given image. This paper investigates various LVLMs and pinpoints attention deficiency toward discriminative local image features as one root cause of object hallucinations. Specifically, LVLMs predominantly attend to prompt-independent global image features, while failing to capture prompt-relevant local features, consequently undermining the visual grounding capacity of LVLMs and leading to hallucinations. To this end, we propose Assembly of Global and Local Attention (AGLA), a training-free and plug-and-play approach that mitigates object hallucinations by exploring an ensemble of global features for response generation and local features for visual discrimination simultaneously. Our approach exhibits an image-prompt matching scheme that captures prompt-relevant local features from images, leading to an augmented view of the input image where prompt-relevant content is preserved while irrelevant distractions are masked. With the augmented view, a calibrated decoding distribution can be derived by integrating generative global features from the original image and discriminative local features from the augmented image. Extensive experiments show that AGLA consistently mitigates object hallucinations and enhances general perception capability for LVLMs across various discriminative and generative benchmarks. Our code will be released at https://github.com/Lackel/AGLA.

Updated: 2024-06-21 17:33:21

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.12718v2

Testing Calibration in Nearly-Linear Time

In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model calibration have remained relatively underexplored. Motivated by [BGHN23], which proposed a rigorous framework for measuring distances to calibration, we initiate the algorithmic study of calibration through the lens of property testing. We define the problem of calibration testing from samples: given $n$ draws from a distribution $\mathcal{D}$ on (prediction, binary outcome) pairs, our goal is to distinguish between the case where $\mathcal{D}$ is perfectly calibrated, and the case where $\mathcal{D}$ is $\varepsilon$-far from calibration. We make the simple observation that the empirical smooth calibration linear program can be reformulated as an instance of minimum-cost flow on a highly-structured graph, and design an exact dynamic programming-based solver for it which runs in time $O(n\log^2(n))$, and solves the calibration testing problem information-theoretically optimally in the same time. This improves upon state-of-the-art black-box linear program solvers requiring $\Omega(n^\omega)$ time, where $\omega > 2$ is the exponent of matrix multiplication. We also develop algorithms for tolerant variants of our testing problem improving upon black-box linear program solvers, and give sample complexity lower bounds for alternative calibration measures to the one considered in this work. Finally, we present experiments showing the testing problem we define faithfully captures standard notions of calibration, and that our algorithms scale efficiently to accommodate large sample sizes.

Updated: 2024-06-21 17:27:22

Categories: cs.LG,cs.DS,stat.CO,stat.ML

Download: http://arxiv.org/abs/2402.13187v2

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

In the traditional RAG framework, the basic retrieval units are normally short. Common retrievers like DPR normally work with 100-word Wikipedia paragraphs. Such a design forces the retriever to search over a large corpus to find the 'needle' unit. In contrast, the readers only need to extract answers from the short retrieved units. Such an imbalanced 'heavy' retriever and 'light' reader design can lead to sub-optimal performance. In order to alleviate the imbalance, we propose a new framework LongRAG, consisting of a 'long retriever' and a 'long reader'. LongRAG processes the entire Wikipedia into 4K-token units, which is 30x longer than before. By increasing the unit size, we significantly reduce the total units from 22M to 700K. This significantly lowers the burden on the retriever, which leads to a remarkable retrieval score: answer recall@1=71% on NQ (previously 52%) and answer recall@2=72% (previously 47%) on HotpotQA (full-wiki). Then we feed the top-k retrieved units ($\approx$ 30K tokens) to an existing long-context LLM to perform zero-shot answer extraction. Without requiring any training, LongRAG achieves an EM of 62.7% on NQ, which is the best known result. LongRAG also achieves 64.3% on HotpotQA (full-wiki), which is on par with the SoTA model. Our study offers insights into the future roadmap for combining RAG with long-context LLMs.
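
As a rough illustration of how short passages could be merged into longer retrieval units, the sketch below greedily packs consecutive passages under a token budget. It is only a sketch under stated assumptions: the helper `pack_passages` is hypothetical, whitespace word counts stand in for real tokenization, and the abstract does not say how LongRAG actually decides which passages belong in the same unit.

```python
def pack_passages(passages, max_tokens=4000):
    """Greedily merge consecutive passages into units of at most max_tokens.

    Assumption: a whitespace-separated word is used as a crude proxy for a token.
    """
    units, current, current_len = [], [], 0
    for text in passages:
        n = len(text.split())
        # Close the current unit if adding this passage would exceed the budget.
        if current and current_len + n > max_tokens:
            units.append(" ".join(current))
            current, current_len = [], 0
        current.append(text)
        current_len += n
    if current:
        units.append(" ".join(current))
    return units

# Toy corpus: 100 passages of 100 words each.
passages = ["word " * 100 for _ in range(100)]
units = pack_passages(passages, max_tokens=4000)
print(len(passages), "passages ->", len(units), "units")
```

Fewer, longer units shrink the search space for the retriever, at the cost of handing the reader much more text per retrieved unit.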

Updated: 2024-06-21 17:23:21

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.15319v1

Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models

Humor is a fundamental facet of human cognition and interaction. Yet, despite recent advances in natural language processing, humor detection remains a challenging task that is complicated by the scarcity of datasets that pair humorous texts with similar non-humorous counterparts. In our work, we investigate whether large language models (LLMs) can generate synthetic data for humor detection via editing texts. We benchmark LLMs on an existing human dataset and show that current LLMs display an impressive ability to 'unfun' jokes, as judged by humans and as measured on the downstream task of humor detection. We extend our approach to a code-mixed English-Hindi humor dataset, where we find that GPT-4's synthetic data is highly rated by bilingual annotators and provides challenging adversarial examples for humor classifiers.

Updated: 2024-06-21 17:12:35

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2403.00794v2

R&B -- Rhythm and Brain: Cross-subject Decoding of Music from Human Brain Activity

Music is a universal phenomenon that profoundly influences human experiences across cultures. This study investigates whether music can be decoded from human brain activity measured with functional MRI (fMRI) during its perception. Leveraging recent advancements in extensive datasets and pre-trained computational models, we construct mappings between neural data and latent representations of musical stimuli. Our approach integrates functional and anatomical alignment techniques to facilitate cross-subject decoding, addressing the challenges posed by the low temporal resolution and signal-to-noise ratio (SNR) in fMRI data. Starting from the GTZan fMRI dataset, where five participants listened to 540 musical stimuli from 10 different genres while their brain activity was recorded, we used the CLAP (Contrastive Language-Audio Pretraining) model to extract latent representations of the musical stimuli and developed voxel-wise encoding models to identify brain regions responsive to these stimuli. By applying a threshold to the association between predicted and actual brain activity, we identified specific regions of interest (ROIs) which can be interpreted as key players in music processing. Our decoding pipeline, primarily retrieval-based, employs a linear map to project brain activity to the corresponding CLAP features. This enables us to predict and retrieve the musical stimuli most similar to those that originated the fMRI data. Our results demonstrate state-of-the-art identification accuracy, with our methods significantly outperforming existing approaches. Our findings suggest that neural-based music retrieval systems could enable personalized recommendations and therapeutic applications. Future work could use higher temporal resolution neuroimaging and generative models to improve decoding accuracy and explore the neural underpinnings of music perception and emotion.
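
The retrieval-based decoding idea (a linear map from brain activity to stimulus features, followed by a nearest-candidate lookup) can be sketched on synthetic data. Everything below is an assumption for illustration: the data are random stand-ins rather than fMRI recordings or CLAP embeddings, ridge regression is one plausible choice of linear map, and `fit_ridge` and `retrieve` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_voxels, n_feat = 200, 20, 50, 8

# Synthetic stand-ins: stimulus features and noisy linear "brain" responses.
W_true = rng.normal(size=(n_feat, n_voxels))
feats_train = rng.normal(size=(n_train, n_feat))
feats_test = rng.normal(size=(n_test, n_feat))
brain_train = feats_train @ W_true + 0.1 * rng.normal(size=(n_train, n_voxels))
brain_test = feats_test @ W_true + 0.1 * rng.normal(size=(n_test, n_voxels))

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: (X^T X + alpha I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def retrieve(pred, candidates):
    """Return, for each prediction, the index of the most cosine-similar candidate."""
    pred_n = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    cand_n = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return np.argmax(pred_n @ cand_n.T, axis=1)

B = fit_ridge(brain_train, feats_train)   # maps brain activity to feature space
pred = brain_test @ B                     # predicted features for held-out scans
retrieved = retrieve(pred, feats_test)
accuracy = float(np.mean(retrieved == np.arange(n_test)))
print("top-1 identification accuracy:", accuracy)
```

Identification accuracy here simply counts how often the retrieved candidate is the stimulus that generated the held-out response; real fMRI data would be far noisier than this toy setup.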

Updated: 2024-06-21 17:11:45

Categories: q-bio.NC,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2406.15537v1

Impact of Decentralized Learning on Player Utilities in Stackelberg Games

When deployed in the world, a learning agent such as a recommender system or a chatbot often repeatedly interacts with another learning agent (such as a user) over time. In many such two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned. To better understand such cases, we examine the learning dynamics of the two-agent system and the implications for each agent's objective. We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks (such as Stackelberg equilibrium payoffs) result in worst-case linear regret for at least one player. To better capture these systems, we construct a relaxed regret benchmark that is tolerant to small learning errors by agents. We show that standard learning algorithms fail to provide sublinear regret, and we develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks. We further design relaxed environments under which faster learning ($O(\sqrt{T})$) is possible. Altogether, our results take a step towards assessing how two-agent interactions in sequential and decentralized learning environments affect the utility of both agents.

Updated: 2024-06-21 17:11:10

Categories: cs.LG,cs.GT

Download: http://arxiv.org/abs/2403.00188v2

DASB -- Discrete Audio and Speech Benchmark

Discrete audio tokens have recently gained considerable attention for their potential to connect audio and language processing, enabling the creation of modern multimodal large language models. Ideal audio tokens must effectively preserve phonetic and semantic content along with paralinguistic information, speaker identity, and other details. While several types of audio tokens have been recently proposed, identifying the optimal tokenizer for various tasks is challenging due to the inconsistent evaluation settings in existing studies. To address this gap, we release the Discrete Audio and Speech Benchmark (DASB), a comprehensive leaderboard for benchmarking discrete audio tokens across a wide range of discriminative tasks, including speech recognition, speaker identification and verification, emotion recognition, keyword spotting, and intent classification, as well as generative tasks such as speech enhancement, separation, and text-to-speech. Our results show that, on average, semantic tokens outperform compression tokens across most discriminative and generative tasks. However, the performance gap between semantic tokens and standard continuous representations remains substantial, highlighting the need for further research in this field.

Updated: 2024-06-21 17:07:17

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2406.14294v2

The Privacy-Utility Trade-off in the Topics API

The ongoing deprecation of third-party cookies by web browser vendors has sparked the proposal of alternative methods to support more privacy-preserving personalized advertising on web browsers and applications. The Topics API is being proposed by Google to provide third parties with "coarse-grained advertising topics that the page visitor might currently be interested in". In this paper, we analyze the re-identification risks for individual Internet users and the utility provided to advertising companies by the Topics API, i.e., learning the most popular topics and distinguishing between real and random topics. We provide theoretical results that depend only on the API parameters and can be readily applied to evaluate the privacy and utility implications of future API updates. Our results include novel general upper bounds that account for adversaries with access to unknown, arbitrary side information, the value of the differential privacy parameter $\epsilon$, and experimental results on real-world data that validate our theoretical model.

Updated: 2024-06-21 17:01:23

Categories: cs.CR

Download: http://arxiv.org/abs/2406.15309v1

Offline Diversity Maximization Under Imitation Constraints

There has been significant recent progress in the area of unsupervised skill discovery, utilizing various information-theoretic objectives as measures of diversity. Despite these advances, challenges remain: current methods require significant online interaction, fail to leverage vast amounts of available task-agnostic data and typically lack a quantitative measure of skill utility. We address these challenges by proposing a principled offline algorithm for unsupervised skill discovery that, in addition to maximizing diversity, ensures that each learned skill imitates state-only expert demonstrations to a certain degree. Our main analytical contribution is to connect Fenchel duality, reinforcement learning, and unsupervised skill discovery to maximize a mutual information objective subject to KL-divergence state occupancy constraints. Furthermore, we demonstrate the effectiveness of our method on the standard offline benchmark D4RL and on a custom offline dataset collected from a 12-DoF quadruped robot for which the policies trained in simulation transfer well to the real robotic system.

Updated: 2024-06-21 16:59:57

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2307.11373v3

The Normal Distributions Indistinguishability Spectrum and its Application to Privacy-Preserving Machine Learning

Differential Privacy (DP) (and its variants) is the most common method for machine learning (ML) on privacy-sensitive data. In big data analytics, one often uses randomized sketching/aggregation algorithms to make processing high-dimensional data tractable. Intuitively, such ML algorithms should provide some inherent privacy, yet most existing DP mechanisms either do not leverage or under-utilize this inherent randomness, resulting in potentially redundant noising. The motivating question of our work is: (How) can we improve the utility of DP mechanisms for randomized ML queries, by leveraging the randomness of the query itself? Towards a (positive) answer, our key contribution is (proving) what we call the NDIS theorem, a theoretical result with several practical implications. In a nutshell, NDIS is a closed-form analytic computation for the $(\varepsilon,\delta)$-indistinguishability spectrum (IS) of two arbitrary normal distributions $N_1$ and $N_2$, i.e., the optimal $\delta$ (for any given $\varepsilon$) such that $N_1$ and $N_2$ are $(\varepsilon,\delta)$-close according to the DP distance. The importance of the NDIS theorem lies in that (1) it yields efficient estimators for IS, and (2) it allows us to analyze DP mechanisms with normally-distributed outputs, as well as more general mechanisms by leveraging their behavior on large inputs. We apply the NDIS theorem to derive DP mechanisms for queries with normally-distributed outputs--i.e., Gaussian Random Projections (RP)--and for more general queries--i.e., Ordinary Least Squares (OLS). Compared to existing techniques, our new DP mechanisms achieve superior privacy/utility trade-offs by leveraging the randomness of the underlying algorithms. We then apply the NDIS theorem to a data-driven DP notion--in particular relative DP introduced by Lu et al. [S&P 2024]. Our method identifies the range of $(\varepsilon,\delta)$ for which no additional noising is needed.
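As a rough illustration of what an indistinguishability spectrum looks like, the Python sketch below evaluates the optimal delta for a given epsilon in the simplest special case: two one-dimensional normal distributions with the same standard deviation. The formula used is the well-known closed form from the analysis of the Gaussian mechanism, not the NDIS theorem itself, which covers arbitrary normal distributions and is not reproduced here. The names `phi` and `delta_equal_variance` are assumptions made for this example.

```python
import math


def phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def delta_equal_variance(eps: float, mu1: float, mu2: float, sigma: float) -> float:
    """Smallest delta such that N(mu1, sigma^2) and N(mu2, sigma^2) are
    (eps, delta)-close. Equal-variance, one-dimensional special case only."""
    gap = abs(mu1 - mu2)
    if gap == 0.0:
        return 0.0  # identical distributions are (0, 0)-close
    a = gap / (2.0 * sigma)
    b = eps * sigma / gap
    return phi(a - b) - math.exp(eps) * phi(-a - b)


# Sweep epsilon to trace out the spectrum for N(0, 1) versus N(1, 1).
for eps in (0.0, 0.5, 1.0, 2.0):
    print(eps, delta_equal_variance(eps, 0.0, 1.0, 1.0))
```

At epsilon = 0 the value reduces to the total variation distance between the two distributions, and it decreases as epsilon grows, which is the trade-off the spectrum captures.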

Updated: 2024-06-21 16:54:57

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2309.01243v3

DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms

Recently, large language models (LLMs) enhanced by self-reflection have achieved promising performance on machine translation. The key idea is guiding LLMs to generate translation with human-like feedback. However, existing self-reflection methods lack effective feedback information, limiting the translation performance. To address this, we introduce a DUAL-REFLECT framework, leveraging the dual learning of translation tasks to provide effective feedback, thereby enhancing the models' self-reflective abilities and improving translation performance. The application of this method across various translation tasks has proven its effectiveness in improving translation accuracy and eliminating ambiguities, especially in translation tasks with low-resource language pairs.

Updated: 2024-06-21 16:49:33

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.07232v2

Deep hybrid models: infer and plan in the real world

Determining an optimal plan to accomplish a goal is a hard problem in realistic scenarios, which often comprise dynamic and causal relationships between several entities. Although traditionally such problems have been tackled with optimal control and reinforcement learning, a recent biologically-motivated proposal casts planning and control as an inference process. Among these new approaches, one is particularly promising: active inference. This new paradigm assumes that action and perception are two complementary aspects of life whereby the role of the former is to fulfill the predictions inferred by the latter. In this study, we present an effective solution, based on active inference, to complex control tasks. The proposed architecture exploits hybrid (discrete and continuous) processing to construct a hierarchical and dynamic representation of the self and the environment, which is then used to produce a flexible plan consisting of subgoals at different temporal scales. We evaluate this deep hybrid model on a non-trivial task: reaching a moving object after having picked a moving tool. This study extends past work on planning as inference and advances an alternative direction to optimal control and reinforcement learning.

Updated: 2024-06-21 16:46:55

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2402.10088v2

BliMe Linter

Outsourced computation presents a risk to the confidentiality of clients' sensitive data since they have to trust that the service providers will not mishandle this data. Blinded Memory (BliMe) is a set of hardware extensions that addresses this problem by using hardware-based taint tracking to keep track of sensitive client data and enforce a security policy that prevents software from leaking this data, either directly or through side channels. Since programs can leak sensitive data through timing channels and memory access patterns when this data is used in control-flow or memory access instructions, BliMe prohibits such unsafe operations and only allows constant-time code to operate on sensitive data. The question is how a developer can confirm that their code will run correctly on BliMe. While a program can be manually checked to see if it is constant-time, this process is tedious and error-prone. In this paper, we introduce the BliMe linter, a set of compiler extensions built on top of SVF that analyze LLVM bitcode to identify possible BliMe violations. We evaluate the BliMe linter analytically and empirically and show that it is sound.
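The kind of rewrite that a constant-time policy demands can be illustrated with a small Python sketch. It contrasts a selection that branches on a secret bit with a branchless, mask-based selection. The function names and the 32-bit width are assumptions for this example. Python itself offers no constant-time guarantees, so the sketch only shows the shape of the code and is not something the BliMe linter analyzes (the linter works on LLVM bitcode).

```python
MASK32 = 0xFFFFFFFF  # assumed word width for the illustration


def select_branching(secret_bit: int, a: int, b: int) -> int:
    """Control flow depends on the secret: the pattern a constant-time
    policy prohibits."""
    if secret_bit:
        return a
    return b


def select_branchless(secret_bit: int, a: int, b: int) -> int:
    """Same result without branching on the secret.

    secret_bit must be 0 or 1; a and b are treated as 32-bit values.
    """
    mask = (-secret_bit) & MASK32  # all ones if secret_bit == 1, else zero
    return (a & mask) | (b & ~mask & MASK32)


for bit in (0, 1):
    print(bit, select_branching(bit, 7, 9), select_branchless(bit, 7, 9))
```

Both functions return the same value for every input, but only the second avoids using the secret in a control-flow decision.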

Updated: 2024-06-21 16:46:45

Categories: cs.CR

Download: http://arxiv.org/abs/2406.15302v1

Directly Fine-Tuning Diffusion Models on Differentiable Rewards

We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models. We first show that it is possible to backpropagate the reward function gradient through the full sampling procedure, and that doing so achieves strong performance on a variety of rewards, outperforming reinforcement learning-based approaches. We then propose more efficient variants of DRaFT: DRaFT-K, which truncates backpropagation to only the last K steps of sampling, and DRaFT-LV, which obtains lower-variance gradient estimates for the case when K=1. We show that our methods work well for a variety of reward functions and can be used to substantially improve the aesthetic quality of images generated by Stable Diffusion 1.4. Finally, we draw connections between our approach and prior work, providing a unifying perspective on the design space of gradient-based fine-tuning algorithms.

Updated: 2024-06-21 16:45:11

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2309.17400v2

Learning Spatio-Temporal Patterns of Polar Ice Layers With Physics-Informed Graph Neural Network

Learning spatio-temporal patterns of polar ice layers is crucial for monitoring the change in ice sheet balance and evaluating ice dynamic processes. While a few researchers focus on learning ice layer patterns from echogram images captured by airborne snow radar sensors via different convolutional neural networks, the noise in the echogram images proves to be a major obstacle. Instead, we focus on geometric deep learning based on graph neural networks to learn the spatio-temporal patterns from thickness information of shallow ice layers and make predictions for deep layers. In this paper, we propose a physics-informed hybrid graph neural network that combines the GraphSAGE framework for graph feature learning with the long short-term memory (LSTM) structure for learning temporal changes, and introduce measurements of physical ice properties from the Model Atmospheric Regional (MAR) weather model as physical node features. We found that our proposed network can consistently outperform the current non-inductive or non-physical models in predicting deep ice layer thickness.

Updated: 2024-06-21 16:41:02

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2406.15299v1

Grants4Companies: Applying Declarative Methods for Recommending and Reasoning About Business Grants in the Austrian Public Administration (System Description)

We describe the methods and technologies underlying the application Grants4Companies. The application uses a logic-based expert system to display a list of business grants suitable for the logged-in business. To evaluate suitability of the grants, formal representations of their conditions are evaluated against properties of the business, taken from the registers of the Austrian public administration. The logical language for the representations of the grant conditions is based on S-expressions. We further describe a Proof of Concept implementation of reasoning over the formalised grant conditions. The proof of concept is implemented in Common Lisp and interfaces with a reasoning engine implemented in Scryer Prolog. The application has recently gone live and is provided as part of the Business Service Portal by the Austrian Federal Ministry of Finance.
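The idea of evaluating S-expression grant conditions against business properties can be sketched in a few lines of Python. Everything specific here is hypothetical: the operators, the property names (`employees`, `is_startup`, `founded_year`), and the sample condition are invented for illustration. The actual system is described as using Common Lisp and Scryer Prolog, and its logical language is not reproduced here.

```python
import re
from typing import Any, Dict, List, Union

SExpr = Union[str, int, List[Any]]


def parse(text: str) -> SExpr:
    """Parse a single S-expression made of symbols, integers and lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read() -> SExpr:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing parenthesis
            return items
        return int(tok) if tok.lstrip("-").isdigit() else tok

    return read()


def evaluate(expr: SExpr, business: Dict[str, Any]) -> Any:
    """Evaluate a condition; bare symbols are looked up as business properties."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return business[expr]
    op, *args = expr
    if op == "and":
        return all(evaluate(a, business) for a in args)
    if op == "or":
        return any(evaluate(a, business) for a in args)
    if op == "not":
        return not evaluate(args[0], business)
    left, right = (evaluate(a, business) for a in args)
    if op == "<=":
        return left <= right
    if op == ">=":
        return left >= right
    if op == "=":
        return left == right
    raise ValueError(f"unknown operator: {op}")


# Hypothetical grant condition and business record.
GRANT = "(and (<= employees 50) (or is_startup (>= founded_year 2020)))"
business = {"employees": 12, "is_startup": False, "founded_year": 2021}
print(evaluate(parse(GRANT), business))
```

Keeping the conditions as data rather than code is what lets the same representation be both evaluated for recommendations and handed to a reasoning engine.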

Updated: 2024-06-21 16:38:02

Categories: cs.LO,cs.AI

Download: http://arxiv.org/abs/2406.15293v1

Pessimistic asynchronous sampling in high-cost Bayesian optimization

Asynchronous Bayesian optimization is a recently implemented technique that allows for parallel operation of experimental systems and disjointed workflows. In contrast to serial Bayesian optimization, which selects experiments one at a time after conducting a measurement for each, asynchronous policies sequentially assign multiple experiments before measurements can be taken and evaluate new measurements continuously as they are made available. This technique allows for faster data generation and therefore faster optimization of an experimental space. This work extends the capabilities of asynchronous optimization methods beyond prior studies by evaluating four additional policies that incorporate pessimistic predictions in the training data set. Combined with a conventional greedy policy, the five total policies were evaluated in a simulated environment and benchmarked against serial sampling. Under some conditions and parameter space dimensionalities, the pessimistic asynchronous policy reached optimum experimental conditions in significantly fewer experiments than equivalent serial policies and proved to be less susceptible to convergence onto local optima at higher dimensions. Without accounting for the faster sampling rate, the pessimistic asynchronous algorithm presented in this work could result in more efficient algorithm-driven optimization of high-cost experimental spaces. Accounting for sampling rate, the presented asynchronous algorithm could allow for faster optimization in experimental spaces where multiple experiments can be run before results are collected.

Updated: 2024-06-21 16:35:27

Categories: cs.LG,J.m

Download: http://arxiv.org/abs/2406.15291v1

FT-AED: Benchmark Dataset for Early Freeway Traffic Anomalous Event Detection

Early and accurate detection of anomalous events on the freeway, such as accidents, can improve emergency response and clearance. However, existing delays and errors in event identification and reporting make it a difficult problem to solve. Current large-scale freeway traffic datasets are not designed for anomaly detection and ignore these challenges. In this paper, we introduce the first large-scale lane-level freeway traffic dataset for anomaly detection. Our dataset consists of a month of weekday radar detection sensor data collected in 4 lanes along an 18-mile stretch of Interstate 24 heading toward Nashville, TN, comprising over 3.7 million sensor measurements. We also collect official crash reports from the Nashville Traffic Management Center and manually label all other potential anomalies in the dataset. To show the potential for our dataset to be used in future machine learning and traffic research, we benchmark numerous deep learning anomaly detection models on our dataset. We find that unsupervised graph neural network autoencoders are a promising solution for this problem and that ignoring spatial relationships leads to decreased performance. We demonstrate that our methods can reduce reporting delays by over 10 minutes on average while detecting 75% of crashes. Our dataset and all preprocessing code needed to get started are publicly released at https://vu.edu/ft-aed/ to facilitate future research.

Updated: 2024-06-21 16:27:17

Categories: cs.LG

Download: http://arxiv.org/abs/2406.15283v1

Computing Optimal Manipulations in Cryptographic Self-Selection Proof-of-Stake Protocols

Cryptographic Self-Selection is a paradigm employed by modern Proof-of-Stake consensus protocols to select a block-proposing "leader." Algorand [Chen and Micali, 2019] proposes a canonical protocol, and Ferreira et al. [2022] establish bounds $f(\alpha,\beta)$ on the maximum fraction of rounds a strategic player can lead as a function of their stake $\alpha$ and a network connectivity parameter $\beta$. While both their lower and upper bounds are non-trivial, there is a substantial gap between them (for example, they establish $f(10\%,1) \in [10.08\%, 21.12\%]$), leaving open the question of how significant a concern these manipulations are. We develop computational methods to provably nail $f(\alpha,\beta)$ for any desired $(\alpha,\beta)$ up to arbitrary precision, and implement our method on a wide range of parameters (for example, we confirm $f(10\%,1) \in [10.08\%, 10.15\%]$). Methodologically, estimating $f(\alpha,\beta)$ can be phrased as estimating to high precision the value of a Markov Decision Process whose states are countably-long lists of real numbers. Our methodological contributions involve (a) reformulating the question instead as computing to high precision the expected value of a distribution that is a fixed-point of a non-linear sampling operator, and (b) provably bounding the error induced by various truncations and sampling estimations of this distribution (which appears intractable to solve in closed form). One technical challenge, for example, is that natural sampling-based estimates of the mean of our target distribution are not unbiased estimators, and therefore our methods necessarily go beyond claiming sufficiently-many samples to be close to the mean.

Updated: 2024-06-21 16:20:39

Categories: cs.GT,cs.CR,econ.TH,G.3

Download: http://arxiv.org/abs/2406.15282v1

Cross-Modality Safety Alignment

As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies primarily focus on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. Specifically, it considers cases where single modalities are safe independently but could potentially lead to unsafe or unethical outputs when combined. To empirically investigate this problem, we developed the SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, such as GPT-4V and LLaVA, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.

Updated: 2024-06-21 16:14:15

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.15279v1

Towards Robust Training Datasets for Machine Learning with Ontologies: A Case Study for Emergency Road Vehicle Detection

Countless domains rely on Machine Learning (ML) models, including safety-critical domains, such as autonomous driving, which this paper focuses on. While the black box nature of ML is simply a nuisance in some domains, in safety-critical domains, this makes ML models difficult to trust. To fully utilize ML models in safety-critical domains, it would be beneficial to have a method to improve trust in model robustness and accuracy without human experts checking each decision. This research proposes a method to increase trust in ML models used in safety-critical domains by ensuring the robustness and completeness of the model's training dataset. Because ML models embody what they are trained with, ensuring the completeness of training datasets can help to increase the trust in the training of ML models. To this end, this paper proposes the use of a domain ontology and an image quality characteristic ontology to validate the domain completeness and image quality robustness of a training dataset. This research also presents an experiment as a proof of concept for this method, where ontologies are built for the emergency road vehicle domain.

Updated: 2024-06-21 16:03:38

Categories: cs.AI

Download: http://arxiv.org/abs/2406.15268v1

A Tiny Transformer for Low-Power Arrhythmia Classification on Microcontrollers

Wearable systems for the continuous and real-time monitoring of cardiovascular diseases are becoming widespread and valuable assets in diagnosis and therapy. A promising approach for real-time analysis of the electrocardiographic (ECG) signal and the detection of heart conditions, such as arrhythmia, is represented by the transformer machine learning model. Transformers are powerful models for the classification of time series, although efficient implementation in the wearable domain raises significant design challenges in combining adequate accuracy with suitable complexity. In this work, we present a tiny transformer model for the analysis of the ECG signal, requiring only 6k parameters and reaching 98.97% accuracy in the recognition of the 5 most common arrhythmia classes from the MIT-BIH Arrhythmia database, assessed considering 8-bit integer inference as required for efficient execution on low-power microcontroller-based devices. We explored an augmentation-based training approach for improving the robustness against electrode motion artifact noise, resulting in a worst-case post-deployment performance assessment of 98.36% accuracy. Suitability for wearable monitoring solutions is finally demonstrated through efficient deployment on the parallel ultra-low-power GAP9 processor, where inference execution requires 4.28ms and 0.09mJ.
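The deployment figures quoted in this abstract can be turned into a rough budget with a few lines of Python. This is back-of-the-envelope arithmetic under stated assumptions: the 0.09 mJ and 4.28 ms are taken to describe the same single inference, "6k parameters" is read as exactly 6000, and each weight is assumed to occupy one byte (8-bit). Activations, code size and sensor power are ignored. The function name `inference_budget` is invented for the example.

```python
from typing import Tuple


def inference_budget(energy_mj: float, latency_ms: float,
                     n_params: int, bits_per_param: int = 8) -> Tuple[float, float, float]:
    """Return (average power in mW, max inferences per second, weight bytes)."""
    # mJ / ms equals J / s, i.e. watts; multiply by 1000 for milliwatts.
    avg_power_mw = energy_mj / latency_ms * 1000.0
    max_rate_hz = 1000.0 / latency_ms
    weight_bytes = n_params * bits_per_param / 8
    return avg_power_mw, max_rate_hz, weight_bytes


power_mw, rate_hz, weight_bytes = inference_budget(0.09, 4.28, 6000)
print(f"{power_mw:.2f} mW, {rate_hz:.1f} inferences/s, {weight_bytes:.0f} bytes")
```

Under these assumptions the processor draws roughly 21 mW while an inference is running, and the weights fit in about 6 kB, which is consistent with the abstract's claim of suitability for microcontroller-class devices.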

Updated: 2024-06-21 15:55:13

Categories: eess.SP,cs.HC,cs.LG

Download: http://arxiv.org/abs/2402.10748v2

Fast sampling from constrained spaces using the Metropolis-adjusted Mirror Langevin algorithm

We propose a new method called the Metropolis-adjusted Mirror Langevin algorithm for approximate sampling from distributions whose support is a compact and convex set. This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the Mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the Mirror Langevin dynamics. Due to the inclusion of this filter, our method is unbiased relative to the target, while known discretisations of the Mirror Langevin dynamics including the Mirror Langevin algorithm have an asymptotic bias. For this algorithm, we also give upper bounds for the number of iterations taken to mix to a constrained distribution whose potential is relatively smooth, convex, and Lipschitz continuous with respect to a self-concordant mirror function. As a consequence of the reversibility of the Markov chain induced by the inclusion of the Metropolis-Hastings filter, we obtain an exponentially better dependence on the error tolerance for approximate constrained sampling. We also present numerical experiments that corroborate our theoretical findings.
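The construction described in this abstract, a Mirror Langevin proposal followed by an accept-reject filter, can be sketched in one dimension with the Python standard library. The concrete choices below are assumptions made for illustration and are not taken from the paper: the domain is the interval (0, 1), the mirror function is the entropic one, phi(x) = x log x + (1 - x) log(1 - x), the target density is proportional to x(1 - x)^2 (a Beta(2, 3) distribution), and the step size is h = 0.05. The conditions the paper's mixing bounds require are not checked here.

```python
import math
import random
from typing import List


def grad_phi(x: float) -> float:          # mirror map: primal -> dual
    return math.log(x / (1.0 - x))


def inv_grad_phi(y: float) -> float:      # inverse mirror map (stable sigmoid)
    if y >= 0.0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


def hess_phi(x: float) -> float:
    return 1.0 / (x * (1.0 - x))


def potential(x: float) -> float:         # f(x) = -log of the unnormalised target
    return -math.log(x) - 2.0 * math.log(1.0 - x)


def grad_potential(x: float) -> float:
    return -1.0 / x + 2.0 / (1.0 - x)


def log_proposal(x_from: float, x_to: float, h: float) -> float:
    """Log density of proposing x_to from x_from: Gaussian in the dual space,
    plus the change-of-variables term for mapping back to the primal space."""
    mean = grad_phi(x_from) - h * grad_potential(x_from)
    var = 2.0 * h * hess_phi(x_from)
    y_to = grad_phi(x_to)
    log_gauss = -0.5 * math.log(2.0 * math.pi * var) - (y_to - mean) ** 2 / (2.0 * var)
    return log_gauss + math.log(hess_phi(x_to))


def mamla(n: int, h: float = 0.05, x0: float = 0.5, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n):
        # One Mirror Langevin step, taken in the dual space.
        mean = grad_phi(x) - h * grad_potential(x)
        y = mean + math.sqrt(2.0 * h * hess_phi(x)) * rng.gauss(0.0, 1.0)
        x_new = inv_grad_phi(y)
        if 0.0 < x_new < 1.0:  # reject proposals that round onto the boundary
            log_alpha = (potential(x) - potential(x_new)
                         + log_proposal(x_new, x, h) - log_proposal(x, x_new, h))
            # Metropolis-Hastings accept-reject filter.
            if rng.random() < math.exp(min(0.0, log_alpha)):
                x = x_new
        samples.append(x)
    return samples


samples = mamla(20000)
print(sum(samples) / len(samples))  # compare with the Beta(2, 3) mean of 0.4
```

Because the proposal is filtered by the exact acceptance ratio, the chain leaves the target invariant for any step size; the step size only affects how often proposals are accepted and how quickly the chain mixes.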

Updated: 2024-06-21 15:52:52

Categories: stat.CO,cs.DS,cs.LG,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2312.08823v3

V-RECS, a Low-Cost LLM4VIS Recommender with Explanations, Captioning and Suggestions

NL2VIS (natural language to visualization) is a promising and recent research area that involves interpreting natural language queries and translating them into visualizations that accurately represent the underlying data. As we navigate the era of big data, NL2VIS holds considerable application potential since it greatly facilitates data exploration by non-expert users. Following the increasingly widespread usage of generative AI in NL2VIS applications, in this paper we present V-RECS, the first LLM-based Visual Recommender augmented with explanations (E), captioning (C), and suggestions (S) for further data exploration. V-RECS' visualization narratives facilitate both response verification and data exploration by non-expert users. Furthermore, our proposed solution mitigates computational, controllability, and cost issues associated with using powerful LLMs by leveraging a methodology to effectively fine-tune small models. To generate insightful visualization narratives, we use Chain-of-Thoughts (CoT), a prompt engineering technique that helps an LLM identify and generate the logical steps to produce a correct answer. Since CoT is reported to perform poorly with small LLMs, we adopted a strategy in which a large LLM (GPT-4), acting as a Teacher, generates CoT-based instructions to fine-tune a small model, Llama-2-7B, which plays the role of a Student. Extensive experiments, based on a framework for the quantitative evaluation of AI-based visualizations and on manual assessment by a group of participants, show that V-RECS achieves performance scores comparable to GPT-4, at a much lower cost. The efficacy of the V-RECS teacher-student paradigm is also demonstrated by the fact that the un-tuned Llama fails to perform the task in the vast majority of test cases. We release V-RECS for the visualization community to assist visualization designers throughout the entire visualization generation process.

Updated: 2024-06-21 15:50:10

标题: V-RECS，一个具有解释、字幕和建议功能的低成本LLM4VIS推荐系统

摘要: NL2VIS（自然语言到可视化）是一个有前景的、最近的研究领域，涉及解释自然语言查询并将其翻译成准确表示基础数据的可视化。在我们进入大数据时代的同时，NL2VIS具有相当大的应用潜力，因为它极大地便利了非专家用户的数据探索。随着生成式人工智能在NL2VIS应用中的越来越广泛的使用，在本文中我们提出了V-RECS，这是第一个基于LLM的可视化推荐系统，增加了解释（E）、字幕（C）和进一步数据探索的建议（S）。V-RECS的可视化叙述有助于非专家用户进行响应验证和数据探索。此外，我们提出的解决方案通过利用一种有效微调小型模型的方法，缓解了使用强大LLM所带来的计算、可控性和成本问题。为了生成富有洞见的可视化叙述，我们使用了Chain-of-Thoughts（CoT），这是一种提示工程技术，帮助LLM识别和生成产生正确答案所需的逻辑步骤。由于CoT在小型LLM上表现不佳，我们采用了一种策略，即一个大型LLM（GPT-4）作为教师生成基于CoT的指导，来微调一个小模型Llama-2-7B，扮演学生的角色。基于一个用于定量评估基于人工智能的可视化和由一组参与者进行手动评估的框架的大量实验表明，V-RECS在成本大大降低的情况下实现了与GPT-4相当的性能得分。V-RECS师生范式的有效性也通过未微调的Llama在绝大多数测试案例中无法完成任务的事实得到证明。我们发布V-RECS供可视化社区使用，以协助可视化设计师在整个可视化生成过程中。

更新时间: 2024-06-21 15:50:10

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2406.15259v1

MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Recent years have witnessed great advances in video generation. However, the development of automatic video metrics is lagging significantly behind. None of the existing metrics is able to provide reliable scores over generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores over 37.6K synthesized videos from 11 existing video generative models. We train MantisScore (initialized from Mantis) based on VideoFeedback to enable automatic video quality assessment. Experiments show that the Spearman correlation between MantisScore and humans can reach 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points. Further results on the held-out EvalCrafter, GenAI-Bench, and VBench show that MantisScore has consistently much higher correlation with human judges than other metrics. Given these results, we believe MantisScore can serve as a great proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning with Human Feedback (RLHF) to improve current video generation models.

Updated: 2024-06-21 15:43:46

标题: MantisScore：构建用于视频生成的自动度量标准以模拟精细的人类反馈

摘要: 近年来，视频生成技术取得了巨大进展。然而，自动生成视频质量评估指标的发展明显滞后。目前没有任何现有的指标能够在生成的视频上提供可靠的评分。主要障碍是缺乏大规模的人工注释数据集。本文发布了VideoFeedback，这是第一个包含人类提供的多方面评分的大规模数据集，涵盖了来自11种现有视频生成模型的37.6K合成视频。我们基于VideoFeedback训练了MantisScore（从Mantis初始化），以实现自动视频质量评估。实验表明，MantisScore与人类之间的Spearman相关性在VideoFeedback-test上可以达到77.1，比之前最佳指标高出约50个点。在其他保留数据集EvalCrafter、GenAI-Bench和VBench上的进一步结果显示，MantisScore与人类评委之间的相关性始终比其他指标高得多。基于这些结果，我们相信MantisScore可以作为人类评分者的一个很好的代理，用于（1）评价不同的视频模型以跟踪进展，（2）在强化学习与人类反馈（RLHF）中模拟细粒度的人类反馈，以改进当前的视频生成模型。

更新时间: 2024-06-21 15:43:46

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.15252v1
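
The headline number in the MantisScore abstract is a Spearman correlation between metric scores and human ratings. As a reminder of what that statistic measures, here is a minimal pure-Python sketch: it ranks both score lists (ties get the average rank) and takes the Pearson correlation of the ranks. The function names and the toy inputs are illustrative assumptions, not taken from the paper; the result lies in [-1, 1], so the reported 77.1 is presumably the same quantity scaled by 100.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores, human_scores):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

Because only ranks matter, a metric that orders videos exactly as the human raters do scores 1.0 regardless of how its raw values are scaled.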

Equivariance via Minimal Frame Averaging for More Symmetries and Efficiency

We consider achieving equivariance in machine learning systems via frame averaging. Current frame averaging methods involve a costly sum over large frames or rely on sampling-based approaches that only yield approximate equivariance. Here, we propose Minimal Frame Averaging (MFA), a mathematical framework for constructing provably minimal frames that are exactly equivariant. The general foundations of MFA also allow us to extend frame averaging to more groups than previously considered, including the Lorentz group for describing symmetries in space-time, and the unitary group for complex-valued domains. Results demonstrate the efficiency and effectiveness of encoding symmetries via MFA across a diverse range of tasks, including $n$-body simulation, top tagging in collider physics, and relaxed energy prediction. Our code is available at https://github.com/divelab/MFA.

Updated: 2024-06-21 15:43:36

标题: 通过最小帧平均实现等变性，实现更多对称性和效率

摘要: 我们考虑通过帧平均实现机器学习系统中的等变性。当前的帧平均方法涉及对大帧的昂贵求和，或依赖于仅产生近似等变性的基于采样的方法。在这里，我们提出了最小帧平均（MFA），这是一个数学框架，用于构建可证明的最小帧，这些帧完全等变。MFA的一般基础还使我们能够将帧平均扩展到比以前考虑的更多的群体，包括用于描述时空对称性的洛伦兹群，以及用于复值域的酉群。结果表明，通过MFA对各种任务（包括n体模拟、对撞机物理中的顶标记和松弛能量预测）进行对称性编码的效率和有效性。我们的代码可在https://github.com/divelab/MFA上找到。

更新时间: 2024-06-21 15:43:36

领域: cs.LG

下载: http://arxiv.org/abs/2406.07598v4

Open Problem: Order Optimal Regret Bounds for Kernel-Based Reinforcement Learning

Reinforcement Learning (RL) has shown great empirical success in various application domains. The theoretical aspects of the problem have been extensively studied over the past decades, particularly under tabular and linear Markov Decision Process structures. Recently, non-linear function approximation using kernel-based prediction has gained traction. This approach is particularly interesting as it naturally extends the linear structure, and helps explain the behavior of neural-network-based models at their infinite width limit. The analytical results, however, do not adequately address the performance guarantees for this case. We will highlight this open problem, overview existing partial results, and discuss related challenges.

Updated: 2024-06-21 15:43:02

标题: 开放问题：基于核的强化学习的最佳次序遗憾界限

摘要: 强化学习（RL）在各种应用领域取得了巨大的经验成功。过去几十年来，已经对问题的理论方面进行了广泛研究，特别是在表格化和线性马尔可夫决策过程结构下。最近，使用基于核的预测进行非线性函数逼近已经引起了关注。这种方法特别有趣，因为它自然地扩展了线性结构，并有助于解释神经网络模型在其无限宽度限制下的行为。然而，分析结果并没有充分解决这种情况下的性能保证问题。我们将重点介绍这个未解决的问题，概述现有的部分结果，并讨论相关的挑战。

更新时间: 2024-06-21 15:43:02

领域: cs.LG

下载: http://arxiv.org/abs/2406.15250v1

Unsupervised Morphological Tree Tokenizer

As a cornerstone in language modeling, tokenization involves segmenting text inputs into pre-defined atomic units. Conventional statistical tokenizers often disrupt constituent boundaries within words, thereby corrupting semantic information. To address this drawback, we introduce morphological structure guidance to tokenization and propose a deep model to induce character-level structures of words. Specifically, the deep model jointly encodes internal structures and representations of words with a mechanism named $\textit{MorphOverriding}$ to ensure the indecomposability of morphemes. By training the model with self-supervised objectives, our method is capable of inducing character-level structures that align with morphological rules without annotated training data. Based on the induced structures, our algorithm tokenizes words through vocabulary matching in a top-down manner. Empirical results indicate that the proposed method effectively retains complete morphemes and outperforms widely adopted methods such as BPE and WordPiece on both morphological segmentation tasks and language modeling tasks. The code will be released later.

Updated: 2024-06-21 15:35:49

标题: 无监督形态树分词器

摘要: 作为语言建模的基石，分词涉及将文本输入分割成预定义的原子单位。传统的统计分词器经常破坏单词内部的组成边界，从而破坏语义信息。为了解决这个缺点，我们引入了形态结构指导到分词，并提出了一个深度模型来诱导单词的字符级结构。具体地，深度模型联合编码单词的内部结构和表示，通过一种名为MorphOverriding的机制来确保形态素的不可分解性。通过使用自监督目标训练模型，我们的方法能够诱导与形态规则一致的字符级结构，而无需注释的训练数据。基于诱导的结构，我们的算法通过自上而下的方式通过词汇匹配来分词。实证结果表明，所提出的方法有效地保留完整的形态素，并在形态分割任务和语言建模任务上优于广泛采用的方法，如BPE和WordPiece。代码将稍后发布。

更新时间: 2024-06-21 15:35:49

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.15245v1
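
The tokenizer abstract above describes tokenizing a word "through vocabulary matching in a top-down manner" over an induced character-level tree. A minimal sketch of one way such a traversal could work follows. The tree encoding (nested tuples whose leaves are single characters), the fallback to single characters, and the example vocabulary are all assumptions made for illustration; the paper's actual algorithm and the model that induces the tree are not reproduced here.

```python
def span_text(node):
    """Concatenate the characters covered by a tree node."""
    if isinstance(node, str):
        return node
    return "".join(span_text(child) for child in node)


def tokenize_top_down(node, vocab):
    """Emit the largest subtrees whose text is in the vocabulary.

    Starting at the root, a node becomes one token if its full span is in
    `vocab`; otherwise the search descends into its children. Single
    characters are always emitted so every word can be tokenized
    (an assumption of this sketch).
    """
    text = span_text(node)
    if text in vocab or isinstance(node, str):
        return [text]
    tokens = []
    for child in node:
        tokens.extend(tokenize_top_down(child, vocab))
    return tokens


# Hypothetical induced structure for "unkindly": [un [[ki nd] ly]]
tree = (("u", "n"), ((("k", "i"), ("n", "d")), ("l", "y")))
tokens = tokenize_top_down(tree, {"un", "kind", "ly"})
```

With the vocabulary shown, the traversal stops at the subtrees spanning "un", "kind", and "ly", so token boundaries follow the tree's constituents rather than raw frequency statistics.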

Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models

Do large language models (LLMs) know the law? These models are increasingly being used to augment legal practice, education, and research, yet their revolutionary potential is threatened by the presence of hallucinations -- textual output that is not consistent with legal facts. We present the first systematic evidence of these hallucinations, documenting LLMs' varying performance across jurisdictions, courts, time periods, and cases. Our work makes four key contributions. First, we develop a typology of legal hallucinations, providing a conceptual framework for future research in this area. Second, we find that legal hallucinations are alarmingly prevalent, occurring between 58% of the time with ChatGPT 4 and 88% with Llama 2, when these models are asked specific, verifiable questions about random federal court cases. Third, we illustrate that LLMs often fail to correct a user's incorrect legal assumptions in a contra-factual question setup. Fourth, we provide evidence that LLMs cannot always predict, or do not always know, when they are producing legal hallucinations. Taken together, our findings caution against the rapid and unsupervised integration of popular LLMs into legal tasks. Even experienced lawyers must remain wary of legal hallucinations, and the risks are highest for those who stand to benefit from LLMs the most -- pro se litigants or those without access to traditional legal resources.

Updated: 2024-06-21 15:32:27

标题: 大型法律虚构：剖析大型语言模型中的法律幻觉

摘要: 大型语言模型（LLMs）是否了解法律？这些模型越来越被用来增强法律实践、教育和研究，然而它们的革命潜力受到幻觉的威胁——文本输出与法律事实不一致。我们提出了第一个系统性证据，记录了LLMs在不同司法管辖区、法院、时间段和案例中的表现。我们的工作做出了四项关键贡献。首先，我们制定了法律幻觉的分类学，为今后在这一领域进行研究提供了概念框架。其次，我们发现法律幻觉异常普遍，当这些模型被问及关于随机联邦法院案例的具体、可验证问题时，ChatGPT 4的发生率为58%，Llama 2的发生率为88%。第三，我们说明LLMs经常未能纠正用户在反事实问题设置中的错误法律假设。第四，我们提供证据表明LLMs并不总是能够预测或知道自己何时产生法律幻觉。总的来说，我们的发现警告不要迅速和无监督地将流行的LLMs整合到法律任务中。即使是经验丰富的律师也必须对法律幻觉保持警惕，而对于那些最有可能从LLMs中受益的人来说，风险最高——如无律师代理人或无法接触传统法律资源的人。

更新时间: 2024-06-21 15:32:27

领域: cs.CL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2401.01301v2

Reinforcement Learning with Latent State Inference for Autonomous On-ramp Merging under Observation Delay

This paper presents a novel approach to address the challenging problem of autonomous on-ramp merging, where a self-driving vehicle needs to seamlessly integrate into a flow of vehicles on a multi-lane highway. We introduce the Lane-keeping, Lane-changing with Latent-state Inference and Safety Controller (L3IS) agent, designed to perform the on-ramp merging task safely without comprehensive knowledge about surrounding vehicles' intents or driving styles. We also present an augmentation of this agent called AL3IS that accounts for observation delays, allowing the agent to make more robust decisions in real-world environments with vehicle-to-vehicle (V2V) communication delays. By modeling the unobservable aspects of the environment through latent states, such as other drivers' intents, our approach enhances the agent's ability to adapt to dynamic traffic conditions, optimize merging maneuvers, and ensure safe interactions with other vehicles. We demonstrate the effectiveness of our method through extensive simulations generated from real traffic data and compare its performance with existing approaches. L3IS shows a 99.90% success rate in a challenging on-ramp merging case generated from the real US Highway 101 data. We further perform a sensitivity analysis on AL3IS to evaluate its robustness against varying observation delays, which demonstrates an acceptable success rate of 93.84% under a 1-second V2V communication delay.

Updated: 2024-06-21 15:31:50

标题: 使用潜在状态推断的强化学习在观察延迟下自主匝道合并

摘要: 本文提出了一种新颖的方法来解决自主匝道合并的难题，其中自动驾驶车辆需要无缝地融入多车道高速公路上的车流。我们引入了Lane-keeping, Lane-changing with Latent-state Inference and Safety Controller (L3IS)代理，旨在在没有关于周围车辆意图或驾驶风格的全面知识的情况下安全地执行匝道合并任务。我们还介绍了这个代理的增强版AL3IS，考虑了观测延迟，使代理能够在具有车辆之间通信延迟的真实环境中做出更加稳健的决策。通过通过潜在状态建模环境的不可观察方面，如其他驾驶员的意图，我们的方法增强了代理适应动态交通条件、优化合并操纵并确保与其他车辆的安全交互的能力。我们通过从真实交通数据生成的大量模拟来展示我们方法的有效性，并将其性能与现有方法进行比较。L3IS在从真实美国101号高速公路数据生成的具有挑战性的匝道合并案例中显示出99.90%的成功率。我们进一步对AL3IS进行了敏感性分析，以评估其对不同观测延迟的稳健性，结果显示在1秒车辆之间通信延迟下93.84%的成功率表现是可接受的。

更新时间: 2024-06-21 15:31:50

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2403.11852v3

Large Batch Analysis for Adagrad Under Anisotropic Smoothness

Adaptive gradient algorithms have been widely adopted in training large-scale deep neural networks, especially large foundation models. Despite their huge success in practice, their theoretical advantages over stochastic gradient descent (SGD) have not been fully understood, especially in the large batch-size setting commonly used in practice. This is because the only theoretical result that can demonstrate the benefit of Adagrad over SGD was obtained in the original Adagrad paper for nonsmooth objective functions. However, for nonsmooth objective functions, there can be a linear slowdown of convergence when batch size increases, and thus a convergence analysis based on the nonsmooth assumption cannot be used for large batch algorithms. In this work, we resolve this gap between theory and practice by providing a new analysis of Adagrad on both convex and nonconvex smooth objectives suitable for the large batch setting. It is shown that under the anisotropic smoothness and noise conditions, increased batch size does not slow down convergence for Adagrad, and thus it can still achieve a faster convergence guarantee over SGD even in the large batch setting. We present detailed comparisons between SGD and Adagrad to provide a better understanding of the benefits of adaptive gradient methods. Experiments in logistic regression and instruction-following fine-tuning tasks provide strong evidence to support our theoretical analysis.

Updated: 2024-06-21 15:29:31

标题: Adagrad在各向异性平滑度下的大批量分析

摘要: 自适应梯度算法在训练大规模深度神经网络中得到了广泛应用，尤其是大型基础模型。尽管在实践中取得了巨大成功，但它们相对于随机梯度下降（SGD）的理论优势尚未被完全理解，特别是在实践中常用的大批量设置中。这是因为唯一能够证明Adagrad相对于SGD的好处的理论结果是在Adagrad的原始论文中获得的，用于非光滑目标函数。然而，对于非光滑目标函数，当批量大小增加时，可能会出现收敛速度减慢，因此基于非光滑假设的收敛分析无法用于大批量算法。在这项工作中，我们通过对适用于大批量设置的凸和非凸光滑目标的Adagrad进行新的分析，解决了理论和实践之间的差距。结果表明，在各向异性光滑性和噪声条件下，增加批量大小不会减慢Adagrad的收敛速度，因此即使在大批量设置下，它仍然可以实现比SGD更快的收敛保证。我们对SGD和Adagrad进行了详细的比较，以更好地理解自适应梯度方法的好处。逻辑回归和指令跟踪微调任务的实验提供了强有力的证据支持我们的理论分析。

更新时间: 2024-06-21 15:29:31

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2406.15244v1
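
For readers unfamiliar with the update rule the Adagrad abstract above analyzes, here is a minimal sketch of diagonal Adagrad on a toy anisotropic quadratic. Everything in it is an illustrative assumption: the objective, the curvature values, the step size, and the use of exact (noise-free) gradients. It shows only the per-coordinate scaling and does not reproduce the paper's large-batch stochastic analysis.

```python
import math


def adagrad(grad_fn, x0, lr=0.5, steps=50, eps=1e-8):
    """Diagonal Adagrad: each coordinate's step is divided by the root of
    that coordinate's accumulated squared gradients."""
    x = list(x0)
    accum = [0.0] * len(x)
    for _ in range(steps):
        g = grad_fn(x)
        for i, g_i in enumerate(g):
            accum[i] += g_i * g_i
            x[i] -= lr * g_i / (math.sqrt(accum[i]) + eps)
    return x


# Toy objective f(x) = 0.5 * sum_i h_i * x_i^2 with very different
# curvature per coordinate (hypothetical values).
CURVATURE = [100.0, 1.0]


def loss(x):
    return 0.5 * sum(h * x_i * x_i for h, x_i in zip(CURVATURE, x))


def grad(x):
    return [h * x_i for h, x_i in zip(CURVATURE, x)]


x_final = adagrad(grad, [1.0, 1.0])
```

On this quadratic the curvature `h_i` cancels between the gradient and its accumulated norm, so both coordinates follow essentially the same trajectory even though one direction is 100 times steeper. That coordinate-wise normalization is the kind of adaptivity that an anisotropic smoothness assumption is meant to capture.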

Understanding Ethereum Mempool Security under Asymmetric DoS by Symbolized Stateful Fuzzing

In blockchains, the mempool controls transaction flow before consensus, and denial of its service hurts the health and security of blockchain networks. This paper presents MPFUZZ, the first mempool fuzzer to find asymmetric DoS bugs by symbolically exploring the mempool state space and optimistically estimating how promising an intermediate state is in reaching bug oracles. Compared to the baseline blockchain fuzzers, MPFUZZ achieves a > 100x speedup in finding known DETER exploits. Running MPFUZZ on six major Ethereum clients leads to the discovery of new mempool vulnerabilities, which exhibit a wide variety of sophisticated patterns including stealthy mempool eviction and mempool locking. Rule-based mitigation schemes are proposed against the newly discovered vulnerabilities.

Updated: 2024-06-21 15:24:33

标题: 理解以符号化有状态模糊测试为基础的不对称DoS情况下的以太坊内存池安全

摘要: 在区块链中，mempool在共识之前控制交易流动，其服务的拒绝会损害区块链网络的健康和安全。本文介绍了MPFUZZ，这是第一个通过符号探索mempool状态空间并乐观估计中间状态在达到漏洞神谕方面的潜在性来发现不对称DoS漏洞的mempool fuzzer。与基准区块链fuzzer相比，MPFUZZ在发现已知DETER漏洞方面实现了>100倍的加速。在六个主要以太坊客户端上运行MPFUZZ导致发现新的mempool漏洞，这些漏洞展示了各种复杂的模式，包括隐密的mempool驱逐和mempool锁定。针对新发现的漏洞提出了基于规则的缓解方案。

更新时间: 2024-06-21 15:24:33

领域: cs.CR

下载: http://arxiv.org/abs/2312.02642v3

Detecting Synthetic Lyrics with Few-Shot Inference

In recent years, generated content in music has gained significant popularity, with large language models being effectively utilized to produce human-like lyrics in various styles, themes, and linguistic structures. This technological advancement supports artists in their creative processes but also raises issues of authorship infringement, consumer satisfaction and content spamming. To address these challenges, methods for detecting generated lyrics are necessary. However, existing works on machine-generated content detection methods and datasets have not yet focused on this specific modality or on creative text in general. In response, we have curated the first dataset of high-quality synthetic lyrics and conducted a comprehensive quantitative evaluation of various few-shot content detection approaches, testing their generalization capabilities and complementing this with a human evaluation. Our best few-shot detector, based on LLM2Vec, surpasses stylistic and statistical methods, which have been shown to be competitive in other domains at distinguishing human-written from machine-generated content. It also shows good generalization capabilities to new artists and models, and effectively detects post-generation paraphrasing. This study emphasizes the need for further research on creative content detection, particularly in terms of generalization and scalability with larger song catalogs. All datasets, pre-processing scripts, and code are available publicly on GitHub and Hugging Face under the Apache 2.0 license.

Updated: 2024-06-21 15:19:21

标题: 用少样本推断检测合成歌词

摘要: 近年来，音乐中生成的内容已经获得了显着的流行度，大型语言模型被有效利用来产生各种风格、主题和语言结构的类人歌词。这种技术进步支持艺术家在其创作过程中，但也引发了关于版权侵权、消费者满意度和内容垃圾邮件等问题。为了解决这些挑战，检测生成歌词的方法是必要的。然而，现有的研究尚未专注于这种特定模式或机器生成内容检测方法和数据集方面的创意文本。为此，我们精选了第一个高质量合成歌词数据集，并对各种少样本内容检测方法进行了全面的定量评估，测试它们的泛化能力，并结合人工评估。我们基于LLM2Vec的最佳少样本检测器超越了在其他领域中显示出竞争力的风格和统计方法，可以区分人写的内容和机器生成的内容。它还表现出对新艺术家和模型的良好泛化能力，并有效地检测后生成的改写。这项研究强调了对创意内容检测的进一步研究的必要性，特别是在泛化和与更大歌曲目录的可扩展性方面。所有数据集、预处理脚本和代码都在GitHub和Hugging Face上以Apache 2.0许可证公开可用。

更新时间: 2024-06-21 15:19:21

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.15231v1

ExDAG: Exact learning of DAGs

There has been a growing interest in causal learning in recent years. Commonly used representations of causal structures, including Bayesian networks and structural equation models (SEM), take the form of directed acyclic graphs (DAGs). We provide a novel mixed-integer quadratic programming formulation and associated algorithm that identifies DAGs on up to 50 vertices, where these are identifiable. We call this method ExDAG, which stands for Exact learning of DAGs. Although there is a superexponential number of constraints that prevent the formation of cycles, the algorithm adds constraints violated by solutions found, rather than imposing all constraints in each continuous-valued relaxation. Our empirical results show that ExDAG outperforms local state-of-the-art solvers in terms of precision and outperforms state-of-the-art global solvers with respect to scaling, when considering Gaussian noise. We also provide validation with respect to other noise distributions.

Updated: 2024-06-21 15:15:38

标题: ExDAG：DAGs的精确学习

摘要: 近年来，对因果学习的兴趣日益增长。常用的因果结构表示，包括贝叶斯网络和结构方程模型（SEM），采用有向无环图（DAG）的形式。我们提供了一种新颖的混合整数二次规划公式和相关算法，可以识别包含最多50个顶点的DAG，只要这些DAG是可识别的。我们将这种方法称为ExDAG，即精确学习DAG。尽管存在一种超指数数量的约束阻止了循环的形成，但该算法会添加被发现的解违反的约束，而不是在每个连续值放松中强加所有约束。我们的实验结果表明，在考虑高斯噪声时，ExDAG在精度方面优于当地最先进的解算器，且在扩展性方面优于最先进的全局解算器。我们还提供了关于其他噪声分布的验证。

更新时间: 2024-06-21 15:15:38

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.15229v1
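
The ExDAG abstract above says that the algorithm "adds constraints violated by solutions found" instead of imposing every cycle-preventing constraint up front. The sketch below illustrates only the separation step of such a scheme: given the edges selected by a candidate solution, it finds one directed cycle, which could then be forbidden by a new constraint. The function name, the edge-list representation, and the suggested form of the cut are assumptions for illustration; the paper's mixed-integer quadratic formulation is not reproduced.

```python
def find_cycle(num_vertices, edges):
    """Return the edges of one directed cycle, or None if the graph is a DAG.

    `edges` is a list of (u, v) pairs with vertices numbered from 0.
    Uses a depth-first search that tracks the current path.
    """
    adjacency = {v: [] for v in range(num_vertices)}
    for u, v in edges:
        adjacency[u].append(v)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = [WHITE] * num_vertices
    path = []

    def visit(u):
        color[u] = GRAY
        path.append(u)
        for v in adjacency[u]:
            if color[v] == GRAY:
                # v is on the current path, so path[v..u] plus (u, v) is a cycle.
                cycle_nodes = path[path.index(v):]
                return list(zip(cycle_nodes, cycle_nodes[1:] + [v]))
            if color[v] == WHITE:
                found = visit(v)
                if found:
                    return found
        path.pop()
        color[u] = BLACK
        return None

    for start in range(num_vertices):
        if color[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None
```

In a lazy-constraint loop, a returned cycle with k edges would typically be cut off by requiring that at most k - 1 of those edges be selected, after which the relaxation is solved again; the loop ends when `find_cycle` returns `None`. That inequality is a standard cycle-elimination form and is an assumption here, not a detail confirmed by the abstract.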

Fine-grained analysis of non-parametric estimation for pairwise learning

In this paper, we are concerned with the generalization performance of non-parametric estimation for pairwise learning. Most of the existing work requires the hypothesis space to be convex or a VC-class, and the loss to be convex. However, these restrictive assumptions limit the applicability of the results in studying many popular methods, especially kernel methods and neural networks. We significantly relax these restrictive assumptions and establish a sharp oracle inequality of the empirical minimizer with a general hypothesis space for the Lipschitz continuous pairwise losses. Our results can be used to handle a wide range of pairwise learning problems including ranking, AUC maximization, pairwise regression, and metric and similarity learning. As an application, we apply our general results to study pairwise least squares regression and derive an excess generalization bound that matches the minimax lower bound for pointwise least squares regression up to a logarithmic term. The key novelty here is to construct a structured deep ReLU neural network as an approximation of the true predictor and design the targeted hypothesis space consisting of the structured networks with controllable complexity. This successful application demonstrates that the obtained general results indeed help us to explore the generalization performance on a variety of problems that cannot be handled by existing approaches.

Updated: 2024-06-21 15:10:29

标题: 细粒度的非参数估计对成对学习的分析

摘要: 在这篇论文中，我们关注非参数估计在配对学习中的泛化性能。大部分现有研究要求假设空间是凸的或者是VC类，并且损失是凸的。然而，这些限制性假设限制了结果在研究许多流行方法，特别是核方法和神经网络时的适用性。我们显著放宽了这些限制性假设，并为Lipschitz连续的配对损失建立了经验最小化器的明晰Oracle不等式，适用于一般假设空间。我们的结果可以用于处理一系列配对学习问题，包括排名、AUC最大化、配对回归以及度量和相似性学习。作为一个应用，我们将我们的一般结果应用于研究配对最小二乘回归，并推导出一个超出泛化界限，与点对最小二乘回归的极小下界相匹配，多出一个对数项。关键的创新点在于构建一个结构化的深度ReLU神经网络作为真实预测器的近似，并设计一个由具有可控复杂性的结构化网络组成的目标假设空间。这个成功的应用表明，得到的一般结果确实帮助我们探索一系列问题的泛化性能，这些问题无法通过现有方法处理。

更新时间: 2024-06-21 15:10:29

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2305.19640v2

Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting

Unmanned Aerial Vehicle (UAV) services with 5G connectivity are an emerging field with numerous applications. Operator-controlled UAV flights and manual static flight configurations are major limitations for the wide adoption and scalability of UAV services. Several services depend on excellent UAV connectivity with a cellular network, and maintaining it is challenging in predetermined flight paths. This paper addresses these limitations by proposing a Deep Reinforcement Learning (DRL) framework for UAV path planning with assured connectivity (DUPAC). During UAV flight, DUPAC determines the best route from a defined source to the destination in terms of distance and signal quality. The viability and performance of DUPAC are evaluated under simulated real-world urban scenarios using the Unity framework. The results confirm that DUPAC achieves an autonomous UAV flight path similar to the base method with only a 2% increment while maintaining an average 9% better connection quality throughout the flight.

Updated: 2024-06-21 15:10:25

标题: 在密集城市环境中具有可靠连接性的深度无人机路径规划

摘要: 无人机（UAV）服务与5G连接是一个新兴领域，具有许多应用。操作员控制的UAV飞行和手动静态飞行配置是广泛采用UAV服务的扩展性的主要限制。几项服务依赖于与蜂窝网络的良好UAV连接，并在预定的飞行路径中保持连接是具有挑战性的。本文通过提出一种深度强化学习（DRL）框架，用于确保连接性的UAV路径规划（DUPAC），解决了这些限制。在UAV飞行过程中，DUPAC确定从定义的源到目的地的最佳路线，考虑距离和信号质量。使用Unity框架在模拟的真实世界城市场景下评估了DUPAC的可行性和性能。结果证实，DUPAC实现了类似基本方法的自主UAV飞行路径，仅增加了2％，同时在整个飞行过程中保持平均9％更好的连接质量。

更新时间: 2024-06-21 15:10:25

领域: cs.AI,cs.RO,eess.SP

下载: http://arxiv.org/abs/2406.15225v1

Sound and Fury, Signifying Nothing? Impact of Data Breach Disclosure Laws

Data breach disclosure (DBD) is presumed to improve firms' cybersecurity practices by inducing fear of subsequent revenue loss. This revenue loss, the theory goes, will occur if customers punish an offending firm by refusing to buy from them and is assumed to be the primary mechanism through which DBD laws will change firm behavior ex ante. However, our analysis of a large-scale data breach at a US retailer reveals no evidence of a decline in revenue. Using a difference-in-difference design on revenue data from 302 stores over a 20-week period around the breach disclosure, we found no evidence of a decline either across all stores or when sub-sampling by prior revenue size (to account for any heterogeneity in prior revenue size). Therefore, we posit that the presumed primary mechanism of DBD laws, and thus these laws themselves, may be ineffective and merely a lot of "sound and fury, signifying nothing."

Updated: 2024-06-21 14:57:49

标题: 声音和狂怒，意味着什么？数据泄露披露法律的影响

摘要: 数据泄露披露（DBD）被认为可以通过引发对随后收入损失的恐惧来改善公司的网络安全实践。理论认为，如果客户惩罚一家违法的公司拒绝购买他们的产品，将会导致收入损失，并假定这是DBD法律将在公司行为改变之前改变公司行为的主要机制。然而，我们对美国一家零售商的一次大规模数据泄露的分析未发现收入下降的证据。通过对泄露周围20周内302家门店的收入数据进行差异分析设计，我们发现无论是在所有门店还是在按先前收入规模进行子采样时（以解释先前收入规模的异质性），都没有发现下降的证据。因此，我们认为DBD法律的假定主要机制可能是无效的，因此这些法律可能只是“声音和怒火，意味着什么也没有”。

更新时间: 2024-06-21 14:57:49

领域: cs.CR,cs.CY,econ.GN,q-fin.EC,K.4; K.5; K.6

下载: http://arxiv.org/abs/2406.15215v1
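
The data breach abstract above relies on a difference-in-difference design. As a reminder of the basic arithmetic, the sketch below computes the simplest two-group, two-period estimate: the change in the affected group minus the change in the comparison group. The abstract does not say what the comparison group is or how the model is specified, so the function, its arguments, and the numbers in the usage line are purely illustrative and are not the paper's data.

```python
def mean(values):
    return sum(values) / len(values)


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate.

    Each argument is a list of outcomes (e.g. weekly store revenue).
    The estimate is the treated group's change minus the control
    group's change over the same period.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up numbers: both groups rise by 1, so the estimated effect is 0.
effect = diff_in_diff([100, 102], [101, 103], [90, 92], [91, 93])
```

An estimate near zero, as in this synthetic example, is the pattern a "no evidence of a decline" finding corresponds to; a real analysis would also need standard errors and a check that the two groups followed parallel trends before the event.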

Large Language Model-Enabled Multi-Agent Manufacturing Systems

Traditional manufacturing faces challenges adapting to dynamic environments and quickly responding to manufacturing changes. The use of multi-agent systems has improved adaptability and coordination but requires further advancements in rapid human instruction comprehension, operational adaptability, and coordination through natural language integration. Large language models like GPT-3.5 and GPT-4 enhance multi-agent manufacturing systems by enabling agents to communicate in natural language and interpret human instructions for decision-making. This research introduces a novel framework where large language models enhance the capabilities of agents in manufacturing, making them more adaptable and capable of processing context-specific instructions. A case study demonstrates the practical application of this framework, showing how agents can effectively communicate, understand tasks, and execute manufacturing processes, including precise G-code allocation among agents. The findings highlight the importance of continuous large language model integration into multi-agent manufacturing systems and the development of sophisticated agent communication protocols for a more flexible manufacturing system.

Updated: 2024-06-21 14:54:46

标题: 大型语言模型支持的多智能体制造系统

摘要: 传统制造业面临着适应动态环境和快速响应制造变化的挑战。多智能体系统的使用已经提高了适应性和协调性，但需要在快速理解人类指令、操作适应性和通过自然语言整合协调方面取得进一步进展。像GPT-3.5和GPT-4这样的大语言模型通过使智能体能够用自然语言交流并解释人类指令来增强多智能体制造系统。本研究引入了一个新颖的框架，其中大语言模型增强了制造中智能体的能力，使它们更具适应性，并能够处理特定上下文的指令。一项案例研究展示了该框架的实际应用，展示了智能体如何有效地交流、理解任务并执行制造过程，包括在智能体之间对精确的G代码分配。研究结果突显了将大语言模型持续整合到多智能体制造系统中的重要性，以及为更灵活的制造系统开发复杂的智能体通信协议。

更新时间: 2024-06-21 14:54:46

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2406.01893v2

Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors

Recent advances in large text-conditional image generative models such as Stable Diffusion, Midjourney, and DALL-E 3 have revolutionized the field of image generation, allowing users to produce high-quality, realistic images from textual prompts. While these developments have enhanced artistic creation and visual communication, they also present an underexplored attack opportunity: the possibility of inducing biases by an adversary into the generated images for malicious intentions, e.g., to influence society and spread propaganda. In this paper, we demonstrate the possibility of such a bias injection threat by an adversary who backdoors such models with a small number of malicious data samples; the implemented backdoor is activated when special triggers exist in the input prompt of the backdoored models. On the other hand, the model's utility is preserved in the absence of the triggers, making the attack highly undetectable. We present a novel framework that enables efficient generation of poisoning samples with composite (multi-word) triggers for such an attack. Our extensive experiments using over 1 million generated images and against hundreds of fine-tuned models demonstrate the feasibility of the presented backdoor attack. We illustrate how these biases can bypass conventional detection mechanisms, highlighting the challenges in proving the existence of biases within operational constraints. Our cost analysis confirms the low financial barrier to executing such attacks, underscoring the need for robust defensive strategies against such vulnerabilities in text-to-image generation models.

Updated: 2024-06-21 14:53:19

标题: 通过复合触发后门在文本到图像模型中注入偏见

摘要: 最近在大型文本条件图像生成模型方面取得的进展，如稳定扩散、Midjourney和DALL-E 3已经彻底改变了图像生成领域，使用户能够从文本提示中生成高质量、逼真的图像。尽管这些发展增强了艺术创作和视觉交流，但它们也提供了一个未被充分探索的攻击机会：通过对生成的图像注入偏见，从而达到恶意意图，例如影响社会和传播宣传。在本文中，我们展示了通过少量恶意数据样本向这类模型后门注入偏见的可能性；当后门模型的输入提示中存在特殊触发器时，实施的后门将被激活。另一方面，在没有触发器的情况下，模型的实用性是得到保留的，使攻击高度不易检测。我们提出了一个新颖的框架，可以有效生成具有复合（多词）触发器的毒化样本，用于此类攻击。我们进行了广泛的实验，使用超过100万个生成的图像，针对数百个经过微调的模型，证明了所提出的后门攻击的可行性。我们阐明了这些偏见如何绕过传统的检测机制，突显了在运营约束条件下证明偏见存在的挑战。我们的成本分析证实了执行此类攻击的低财务壁垒，强调了需要针对文本到图像生成模型中的此类漏洞采取强有力的防御策略。

更新时间: 2024-06-21 14:53:19

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2406.15213v1

How Effective is GPT-4 Turbo in Generating School-Level Questions from Textbooks Based on Bloom's Revised Taxonomy?

We evaluate the effectiveness of GPT-4 Turbo in generating educational questions from NCERT textbooks in zero-shot mode. Our study highlights GPT-4 Turbo's ability to generate questions that require higher-order thinking skills, especially at the "understanding" level according to Bloom's Revised Taxonomy. While we find a notable consistency between questions generated by GPT-4 Turbo and those assessed by humans in terms of complexity, there are occasional differences. Our evaluation also uncovers variations in how humans and machines evaluate question quality, with a trend inversely related to Bloom's Revised Taxonomy levels. These findings suggest that while GPT-4 Turbo is a promising tool for educational question generation, its efficacy varies across different cognitive levels, indicating a need for further refinement to fully meet educational standards.

Updated: 2024-06-21 14:52:37

标题: GPT-4 Turbo在基于Bloom修订后的分类法从教科书中生成学校级问题中的效果如何？

摘要: 我们评估了GPT-4 Turbo在零次试验模式下从NCERT教科书中生成教育问题的有效性。我们的研究突出了GPT-4 Turbo生成需要高阶思维技能的问题的能力，特别是根据布鲁姆修订的分类法的“理解”水平。虽然我们发现GPT-4 Turbo生成的问题与人类评估的问题在复杂性方面存在显著一致性，但偶尔也会有差异。我们的评估还揭示了人类和机器评估问题质量的差异，这种趋势与布鲁姆修订的分类法水平呈反比。这些发现表明，虽然GPT-4 Turbo是一个有前途的教育问题生成工具，但其效力在不同认知水平上存在差异，这表明需要进一步完善以完全符合教育标准。

更新时间: 2024-06-21 14:52:37

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.15211v1

Assessing Effectiveness of Cyber Essentials Technical Controls

Cyber Essentials (CE) comprise a set of controls designed to protect organisations, irrespective of their size, against cyber attacks. The controls are firewalls, secure configuration, user access control, malware protection & security update management. In this work, we explore the extent to which CE remains robust against an ever-evolving threat landscape. To that end, we reconstruct 45 breaches mapped to MITRE ATT&CK using an Incident Fault Tree (IFT) approach. Our method reveals the intersections where the placement of controls could have protected organisations. Then we identify appropriate Cyber Essentials controls and/or Additional Controls for these vulnerable intersections. Our results show that CE controls can effectively protect against most attacks during the initial attack phase. However, they may need to be complemented with Additional Controls if the attack proceeds further into organisational systems & networks. The Additional Controls (AC) we identify include back-ups, security awareness, logging and monitoring. Our analysis brings to the fore a foundational issue as to whether controls should exclude recovery and focus only on pre-emption. The latter makes the strong assumption that a prior identification of all controls in a dynamic threat landscape is indeed possible. Furthermore, any potential broadening of technical controls entails re-scoping the skills that are required for a Cyber Essentials (CE) assessor. To that end, we suggest human factors and security operations and incident management as two potential knowledge areas from the Cyber Security Body of Knowledge (CyBOK) if there is any broadening of CE based on these findings.

Updated: 2024-06-21 14:52:36

标题: 评估网络基本技术控制的有效性

摘要: 网络安全基本要求（CE）包括一套旨在保护组织免受网络攻击的控制措施，无论其规模大小。这些控制措施包括防火墙、安全配置、用户访问控制、恶意软件防护和安全更新管理。在这项工作中，我们探讨CE在不断演变的威胁环境中的稳固程度。为此，我们使用故障树（IFT）方法重建了45起违规事件，并将其映射到MiTRE ATT&CK。我们的方法揭示了控制措施的布置可以保护组织的交叉点。然后我们确定了适当的网络安全基本控制和/或附加控制以应对这些脆弱的交叉点。我们的结果表明，CE控制在初始攻击阶段可以有效防御大多数攻击。然而，如果攻击进一步在组织系统和网络中进行，可能需要补充额外的控制措施。我们确定的附加控制措施（AC）包括备份、安全意识、日志记录和监控。我们的分析凸显了一个基础问题，即控制措施是否应排除恢复，而仅专注于预防。后者假设在动态威胁环境中事先识别所有控制措施的可能性。此外，任何可能扩大技术控制措施都需要重新定义网络安全基本要求（CE）评估员所需的技能范围。因此，根据这些发现，我们建议如果CE有任何扩展，可考虑人因素、安全运营和事件管理作为网络安全知识体系（CyBOK）中的两个潜在知识领域。

更新时间: 2024-06-21 14:52:36

领域: cs.CR

下载: http://arxiv.org/abs/2406.15210v1

A Low-Overhead Incorporation-Extrapolation based Few-Shot CSI Feedback Framework for Massive MIMO Systems

Accurate channel state information (CSI) is essential for downlink precoding in frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems with orthogonal frequency-division multiplexing (OFDM). However, obtaining CSI through feedback from the user equipment (UE) becomes challenging with the increasing scale of antennas and subcarriers and leads to extremely high CSI feedback overhead. Deep learning-based methods have emerged for compressing CSI but these methods generally require substantial collected samples and thus pose practical challenges. Moreover, existing deep learning methods also suffer from dramatically growing feedback overhead owing to their focus on full-dimensional CSI feedback. To address these issues, we propose a low-overhead Incorporation-Extrapolation based Few-Shot CSI feedback Framework (IEFSF) for massive MIMO systems. An incorporation-extrapolation scheme for eigenvector-based CSI feedback is proposed to reduce the feedback overhead. Then, to alleviate the necessity of extensive collected samples and enable few-shot CSI feedback, we further propose a knowledge-driven data augmentation (KDDA) method and an artificial intelligence-generated content (AIGC)-based data augmentation method by exploiting the domain knowledge of wireless channels and by exploiting a novel generative model, respectively. Experimental results based on the DeepMIMO dataset demonstrate that the proposed IEFSF significantly reduces CSI feedback overhead by 64 times compared with existing methods while maintaining higher feedback accuracy using only several hundred collected samples.

Updated: 2024-06-21 14:51:24

标题: 基于低开销的融合-外推的少样本CSI反馈框架，用于大规模MIMO系统

摘要: 准确的信道状态信息（CSI）对于频分双工（FDD）大规模多输入多输出（MIMO）系统中使用正交频分复用（OFDM）的下行预编码至关重要。然而，随着天线和子载波规模的增加，通过从用户设备（UE）的反馈获取CSI变得具有挑战性，并导致极高的CSI反馈开销。基于深度学习的方法已经出现用于压缩CSI，但这些方法通常需要大量收集样本，因此存在实际挑战。此外，现有的深度学习方法也受到反馈开销急剧增长的困扰，因为它们专注于全维度CSI反馈。为了解决这些问题，我们提出了一种基于低开销的融合外推的少样本CSI反馈框架（IEFSF）用于大规模MIMO系统。提出了一种基于特征向量的CSI反馈的融合外推方案，以减少反馈开销。然后，为了减轻对广泛收集的样本的必要性并实现少样本CSI反馈，我们进一步提出了一种基于知识驱动的数据增强（KDDA）方法和一种基于人工智能生成内容（AIGC）的数据增强方法，分别利用无线信道的领域知识和一种新颖的生成模型。基于DeepMIMO数据集的实验结果表明，所提出的IEFSF与现有方法相比，将CSI反馈开销显著减少64倍，同时仅使用几百个收集样本就能保持更高的反馈准确性。

更新时间: 2024-06-21 14:51:24

领域: cs.IT,cs.AI,eess.SP,math.IT

下载: http://arxiv.org/abs/2312.04062v2

Explainable Online Unsupervised Anomaly Detection for Cyber-Physical Systems via Causal Discovery from Time Series

Online unsupervised detection of anomalies is crucial to guarantee the correct operation of cyber-physical systems and the safety of humans interacting with them. State-of-the-art approaches based on deep learning via neural networks achieve outstanding performance at anomaly recognition, evaluating the discrepancy between a normal model of the system (with no anomalies) and the real-time stream of sensor time series. However, large training data and time are typically required, and explainability remains a challenge when identifying the root of the anomaly and implementing predictive maintenance. In this paper, we use causal discovery to learn a normal causal graph of the system, and we evaluate the persistency of causal links during real-time acquisition of sensor data to promptly detect anomalies. On two benchmark anomaly detection datasets, we show that our method has higher training efficiency, outperforms the accuracy of state-of-the-art neural architectures and correctly identifies the sources of $>10$ different anomalies. The code for experimental replication is at http://tinyurl.com/case24causal.

Updated: 2024-06-21 14:48:45

标题: 通过时间序列的因果发现解释在线无监督异常检测在网络物理系统中的应用

摘要: 在线无监督检测异常对于保证网络物理系统的正确运行和与之交互的人类的安全至关重要。基于深度学习和神经网络的最新方法在异常识别方面表现出色，评估了系统的正常模型（没有异常）与实时传感器时间序列之间的差异。然而，通常需要大量的训练数据和时间，并且解释性仍然是识别异常根源并实施预测性维护的挑战。在本文中，我们使用因果发现来学习系统的正常因果图，并在实时传感器数据采集过程中评估因果链接的持久性，以及时检测异常。在两个基准异常检测数据集上，我们展示了我们的方法具有更高的训练效率，优于最先进的神经网络架构的准确性，并正确识别了超过10种不同异常的来源。实验复制的代码位于http://tinyurl.com/case24causal。

更新时间: 2024-06-21 14:48:45

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2404.09871v2

Landscape More Secure Than Portrait? Zooming Into the Directionality of Digital Images With Security Implications

The orientation in which a source image is captured can affect the resulting security in downstream applications. One reason for this is that many state-of-the-art methods in media security assume that image statistics are similar in the horizontal and vertical directions, allowing them to reduce the number of features (or trainable weights) by merging coefficients. We show that this artificial symmetrization tends to suppress important properties of natural images and common processing operations, causing a loss of performance. We also observe the opposite problem, where unaddressed directionality causes learning-based methods to overfit to a single orientation. These are vulnerable to manipulation if an adversary chooses inputs with the less common orientation. This paper takes a comprehensive approach, identifies and systematizes causes of directionality at several stages of a typical acquisition pipeline, measures their effect, and demonstrates for three selected security applications (steganalysis, forensic source identification, and the detection of synthetic images) how the performance of state-of-the-art methods can be improved by properly accounting for directionality.

Updated: 2024-06-21 14:48:25

标题: 风景比肖像更安全吗？聚焦具有安全影响的数字图像方向性

摘要: 捕捉源图像的方向会影响下游应用中的安全性。其中一个原因是许多媒体安全领域的最新方法假设图像统计在水平和垂直方向相似，从而能够通过合并系数来减少特征（或可训练权重）的数量。我们发现这种人为对称化往往会抑制自然图像和常见处理操作的重要特性，导致性能下降。我们还观察到相反的问题，即未解决的方向性会导致基于学习的方法过拟合于单一方向。如果对手选择具有较少常见方向的输入，则这些方法容易受到操纵。本文采用综合方法，识别和系统化典型采集管道的几个阶段中方向性的原因，衡量其影响，并且通过适当考虑方向性，展示了三个选定安全应用（隐写分析、法证源识别和合成图像检测）的最新方法性能如何得以提高。

更新时间: 2024-06-21 14:48:25

领域: cs.CR,cs.CV

下载: http://arxiv.org/abs/2406.15206v1

A Survey on Intelligent Internet of Things: Applications, Security, Privacy, and Future Directions

The rapid advances in the Internet of Things (IoT) have promoted a revolution in communication technology and offered various customer services. Artificial intelligence (AI) techniques have been exploited to facilitate IoT operations and maximize their potential in modern application scenarios. In particular, the convergence of IoT and AI has led to a new networking paradigm called Intelligent IoT (IIoT), which has the potential to significantly transform businesses and industrial domains. This paper presents a comprehensive survey of IIoT by investigating its significant applications in mobile networks, as well as its associated security and privacy issues. Specifically, we explore and discuss the roles of IIoT in a wide range of key application domains, from smart healthcare and smart cities to smart transportation and smart industries. Through such extensive discussions, we investigate important security issues in IIoT networks, where network attacks, confidentiality, integrity, and intrusion are analyzed, along with a discussion of potential countermeasures. Privacy issues in IIoT networks are also surveyed and discussed, including data, location, and model privacy leakage. Finally, we outline several key challenges and highlight potential research directions in this important area.

Updated: 2024-06-21 14:43:41

标题: 一份关于智能物联网的调查：应用、安全、隐私和未来方向

摘要: 物联网（IoT）的快速发展推动了通信技术的革命，并提供了各种客户服务。人工智能（AI）技术已被利用来促进物联网操作，并最大限度地发挥其在现代应用场景中的潜力。特别是，物联网和人工智能的融合导致了一种称为智能物联网（IIoT）的新型网络范式，这种范式有可能显著改变企业和工业领域。本文通过对IIoT在移动网络中的重要应用及相关安全和隐私问题进行调查，全面调查了IIoT。具体地，我们探讨了IIoT在各种关键应用领域的角色，从智能医疗和智能城市到智能交通和智能工业。通过这些广泛的讨论，我们调查了IIoT网络中的重要安全问题，包括网络攻击、保密性、完整性和入侵，以及潜在对策的讨论。物联网网络中的隐私问题也进行了调查和讨论，包括数据、位置和模型隐私泄露。最后，我们概述了这一重要领域中的若干关键挑战，并突出了潜在的研究方向。

更新时间: 2024-06-21 14:43:41

领域: cs.NI,cs.AI,cs.CR,cs.ET,cs.LG

下载: http://arxiv.org/abs/2406.03820v2

Incentivizing High-Quality Content in Online Recommender Systems

In content recommender systems such as TikTok and YouTube, the platform's recommendation algorithm shapes content producer incentives. Many platforms employ online learning, which generates intertemporal incentives, since content produced today affects recommendations of future content. We study the game between producers and analyze the content created at equilibrium. We show that standard online learning algorithms, such as Hedge and EXP3, unfortunately incentivize producers to create low-quality content, where producers' effort approaches zero in the long run for typical learning rate schedules. Motivated by this negative result, we design learning algorithms that incentivize producers to invest high effort and achieve high user welfare. At a conceptual level, our work illustrates the unintended impact that a platform's learning algorithm can have on content quality and introduces algorithmic approaches to mitigating these effects.

Updated: 2024-06-21 14:39:07

标题: 在线推荐系统中激励高质量内容

摘要: 在像TikTok和YouTube这样的内容推荐系统中，平台的推荐算法塑造了内容生产者的激励机制。许多平台采用在线学习，这产生了跨期激励，因为今天生产的内容会影响未来内容的推荐。我们研究了生产者之间的游戏，并分析了均衡时创建的内容。我们发现，标准的在线学习算法，如Hedge和EXP3，不幸地激励生产者创建低质量内容，其中生产者的努力在典型的学习速率计划下最终趋近于零。受到这一负面结果的启发，我们设计了激励生产者投入高努力并实现高用户福利的学习算法。在概念层面上，我们的工作说明了平台学习算法对内容质量可能产生的意外影响，并引入了算法方法来减轻这些影响。

更新时间: 2024-06-21 14:39:07

领域: cs.GT,cs.IR,cs.LG,stat.ML

下载: http://arxiv.org/abs/2306.07479v3
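
For readers unfamiliar with Hedge, one of the standard online learning algorithms named in the abstract above, the following is a minimal sketch of its exponential-weights update. It does not reproduce the paper's producer game or its proposed algorithms; treating each producer's content as an "arm" with a per-round loss, and the names `hedge_weights` and `run_hedge`, are assumptions made for illustration only.

```python
import math


def hedge_weights(cumulative_losses, eta):
    """Hedge distribution: probability proportional to exp(-eta * cumulative loss)."""
    # Subtracting the minimum loss keeps the exponentials numerically stable
    # and does not change the normalised probabilities.
    base = min(cumulative_losses)
    scores = [math.exp(-eta * (loss - base)) for loss in cumulative_losses]
    total = sum(scores)
    return [s / total for s in scores]


def run_hedge(loss_rounds, eta):
    """Play Hedge over a sequence of per-arm loss vectors.

    Returns the distribution used in each round and the distribution
    after the final update.
    """
    cumulative = [0.0] * len(loss_rounds[0])
    history = []
    for losses in loss_rounds:
        history.append(hedge_weights(cumulative, eta))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return history, hedge_weights(cumulative, eta)
```

With a fixed learning rate `eta`, probability mass concentrates on the arm with the lowest cumulative loss; the abstract's negative result concerns how producers respond strategically to this kind of update, which the sketch does not model.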

Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms

Attention Deficit Hyperactivity Disorder (ADHD) is a neurodevelopmental condition characterized by inattention, hyperactivity, and impulsivity, which can significantly impact an individual's daily functioning and quality of life. Occupational therapy plays a crucial role in managing ADHD by fostering the development of skills needed for daily living and enhancing an individual's ability to participate fully in school, home, and social situations. Recent studies highlight the potential of integrating Large Language Models (LLMs) like ChatGPT and Socially Assistive Robots (SAR) to improve psychological treatments. This integration aims to overcome existing limitations in mental health therapy by providing tailored support and adapting to the unique needs of this sensitive group. However, there remains a significant gap in research exploring the combined use of these advanced technologies in ADHD therapy, suggesting an opportunity for novel therapeutic approaches. Thus, we integrated two advanced language models, ChatGPT-4 Turbo and Claude-3 Opus, into a robotic assistant to explore how well each model performs in robot-assisted interactions. Additionally, we have compared their performance in a simulated therapy scenario to gauge their effectiveness against a clinically validated customized model. The results of this study show that ChatGPT-4 Turbo excelled in performance and responsiveness, making it suitable for time-sensitive applications. Claude-3 Opus, on the other hand, showed strengths in understanding, coherence, and ethical considerations, prioritizing safe and engaging interactions. Both models demonstrated innovation and adaptability, but ChatGPT-4 Turbo offered greater ease of integration and broader language support. The selection between them hinges on the specific demands of ADHD therapy.

Updated: 2024-06-21 14:38:25

标题: 探究机器人助手与ChatGPT和Claude在增强ADHD治疗中的功效：创新治疗范式

摘要: 注意力缺陷多动障碍（ADHD）是一种神经发育疾病，其特征是注意力缺陷、多动和冲动，这可能会显著影响个体的日常功能和生活质量。职业治疗在管理ADHD方面发挥着至关重要的作用，通过培养日常生活所需的技能和提高个体参与学校、家庭和社交场合的能力。最近的研究突出了整合ChatGPT和社交辅助机器人（SAR）等大型语言模型（LLMs）以改善心理治疗的潜力。此整合旨在通过提供量身定制的支持并适应这一敏感群体的独特需求，克服心理健康治疗中现有的局限性。然而，目前仍存在研究探索这些先进技术在ADHD治疗中综合使用的显著差距，这表明了一种新型治疗方法的机会。因此，我们将两种先进语言模型ChatGPT-4 Turbo和Claude-3 Opus整合到一个机器人助手中，以探索每个模型在机器人辅助交互中的表现。此外，我们还比较了它们在模拟治疗场景中的表现，以评估它们针对临床验证的定制模型的有效性。本研究结果显示，ChatGPT-4 Turbo在性能和响应性方面表现出色，适用于对时间敏感的应用。另一方面，Claude-3 Opus在理解、连贯性和伦理考虑方面表现出优势，优先考虑安全和引人入胜的互动。这两种模型都展示了创新和适应性，但ChatGPT-4 Turbo提供了更大的整合便利性和更广泛的语言支持。选择它们之间取决于ADHD治疗的具体需求。

更新时间: 2024-06-21 14:38:25

领域: cs.AI,cs.HC,cs.SE

下载: http://arxiv.org/abs/2406.15198v1

Reinforcement-Learning based routing for packet-optical networks with hybrid telemetry

This article provides a methodology and open-source implementation of Reinforcement Learning algorithms for finding optimal routes in a packet-optical network scenario. The algorithm uses measurements provided by the physical layer (pre-FEC bit error rate and propagation delay) and the link layer (link load) to configure a set of latency-based rewards and penalties based on such measurements. Then, the algorithm executes Q-learning based on this set of rewards for finding the optimal routing strategies. It is further shown that the algorithm dynamically adapts to changing network conditions by re-calculating optimal policies upon either link load changes or link degradation as measured by pre-FEC BER.

Updated: 2024-06-21 14:35:08

标题: 基于强化学习的混合遥测分组光网络路由

摘要: 本文提供了一种方法论和开源实现的强化学习算法，用于在数据包光网络场景中寻找最佳路由。该算法利用物理层提供的测量数据（前FEC比特错误率和传播延迟）和链路层（链路负载），根据这些数据配置一组基于延迟的奖励和惩罚。然后，该算法基于这组奖励执行Q学习，以找到最佳的路由策略。进一步显示，该算法通过根据链路负载变化或根据前FEC BER测量的链路退化重新计算最佳策略，动态适应不断变化的网络条件。

更新时间: 2024-06-21 14:35:08

领域: cs.NI,cs.LG

下载: http://arxiv.org/abs/2406.12602v2
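
The routing idea in the abstract above (latency-based rewards, then Q-learning over next-hop choices) can be sketched on a toy topology. This is an illustrative sketch, not the article's open-source implementation: the four-node graph, the use of negative latency as the reward, and the deterministic sweep over all links (instead of sampled episodes) are assumptions, and link degradation is modelled simply as a larger latency value.

```python
def q_learning_routes(links, dest, alpha=0.5, gamma=1.0, sweeps=200):
    """Tabular Q-learning where state = current node and action = next hop.

    links maps node -> {neighbour: latency}; the reward for a hop is -latency.
    """
    q = {node: {nbr: 0.0 for nbr in nbrs} for node, nbrs in links.items()}
    for _ in range(sweeps):
        for node, nbrs in links.items():
            for nbr, latency in nbrs.items():
                # No future reward once the destination (or a dead end) is reached.
                future = 0.0 if nbr == dest or not q[nbr] else max(q[nbr].values())
                target = -latency + gamma * future
                q[node][nbr] += alpha * (target - q[node][nbr])
    return q


def greedy_route(q, src, dest):
    """Follow the highest-valued next hop from src until dest is reached."""
    route = [src]
    while route[-1] != dest:
        if len(route) > len(q):
            raise RuntimeError("no loop-free route found")
        node = route[-1]
        route.append(max(q[node], key=q[node].get))
    return route


# Hypothetical topology: latencies in arbitrary units.
LINKS = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"C": 1.0, "D": 6.0},
    "C": {"D": 1.0},
    "D": {},
}
```

Re-running `q_learning_routes` after changing a link's latency mimics the re-calculation of policies on link load changes or degradation that the abstract describes.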

Causal Learning in Biomedical Applications

We present a benchmark for methods in causal learning. Specifically, we consider training a rich class of causal models from time-series data, and we suggest the use of the Krebs cycle and models of metabolism more broadly.

Updated: 2024-06-21 14:31:45

标题: 医学应用中的因果学习

摘要: 我们提出了一个因果学习方法的基准。具体来说，我们考虑从时间序列数据中训练丰富的因果模型，建议使用克雷布斯循环和更广泛的新陈代谢模型。

更新时间: 2024-06-21 14:31:45

领域: cs.LG

下载: http://arxiv.org/abs/2406.15189v1

UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis

The use of Retrieval-Augmented Generation (RAG) has improved Large Language Models (LLMs) in collaborating with external data, yet significant challenges exist in real-world scenarios. In areas such as academic literature and finance question answering, data are often found in raw text and tables in HTML or PDF formats, which can be lengthy and highly unstructured. In this paper, we introduce a benchmark suite, namely Unstructured Document Analysis (UDA), that involves 2,965 real-world documents and 29,590 expert-annotated Q&A pairs. We revisit popular LLM- and RAG-based solutions for document analysis and evaluate the design choices and answer qualities across multiple document domains and diverse query types. Our evaluation yields interesting findings and highlights the importance of data parsing and retrieval. We hope our benchmark can shed light and better serve real-world document analysis applications. The benchmark suite and code can be found at https://github.com/qinchuanhui/UDA-Benchmark.

Updated: 2024-06-21 14:29:39

标题: UDA：真实世界文档分析中的检索增强生成基准套件

摘要: 检索增强生成（RAG）的使用已经改进了大型语言模型（LLMs）在与外部数据合作方面的表现，但在现实场景中仍存在重大挑战。在学术文献和金融问答等领域，数据通常以HTML或PDF格式的原始文本和表格形式存在，这些数据可能非常冗长且高度非结构化。在本文中，我们介绍了一个基准套件，即非结构化文档分析（UDA），其中包含2,965个真实世界文档和29,590个专家注释的问答对。我们重新审视了用于文档分析的流行LLM和RAG解决方案，并评估了跨多个文档领域和不同查询类型的设计选择和答案质量。我们的评估得出了有趣的发现，并强调了数据解析和检索的重要性。我们希望我们的基准可以为现实世界的文档分析应用提供启示。基准套件和代码可以在https://github.com/qinchuanhui/UDA-Benchmark找到。

更新时间: 2024-06-21 14:29:39

领域: cs.AI,cs.IR

下载: http://arxiv.org/abs/2406.15187v1

Enhancing Idiomatic Representation in Multiple Languages via an Adaptive Contrastive Triplet Loss

Accurately modeling idiomatic or non-compositional language has been a longstanding challenge in Natural Language Processing (NLP). This is partly because these expressions do not derive their meanings solely from their constituent words, but also due to the scarcity of relevant data resources, and their impact on the performance of downstream tasks such as machine translation and simplification. In this paper, we propose an approach to model idiomaticity effectively using a triplet loss that incorporates the asymmetric contribution of component words to an idiomatic meaning. Language models are trained with adaptive contrastive learning and resampling miners to build an idiomatic-aware learning objective. Our proposed method is evaluated on a SemEval challenge and outperforms previous alternatives significantly in many metrics.

Updated: 2024-06-21 14:21:41

标题: 通过自适应对比三元损失增强多语言中的惯用表达

摘要: 准确建模成语或非组合性语言一直是自然语言处理（NLP）中长期存在的挑战。这部分是因为这些表达不仅仅从其构成词语中获得其含义，还因为相关数据资源的稀缺以及它们对机器翻译和简化等下游任务性能的影响。在本文中，我们提出了一种有效建模成语性的方法，该方法利用三元损失函数，通过使用自适应对比学习和重新抽样挖掘器来构建一个成语感知学习目标，有效地建模成语性。我们的提出的方法在SemEval挑战赛上进行了评估，在许多指标上显著优于先前的替代方案。

更新时间: 2024-06-21 14:21:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.15175v1
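
As background for the abstract above, here is a minimal sketch of the standard triplet margin loss on which such objectives build. It is the plain, non-adaptive form; the paper's adaptive weighting of component words and its resampling miners are not reproduced, and the function names and example vectors are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

In an idiomaticity setting, the anchor could be an idiom embedding, the positive a paraphrase of its figurative meaning, and the negative a literal reading; that mapping is an assumption here, not a detail taken from the abstract.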

Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d'historiens

Large Language Models (LLMs) like ChatGPT or Bard have revolutionized information retrieval and captivated the audience with their ability to generate custom responses in record time, regardless of the topic. In this article, we assess the capabilities of various LLMs in producing reliable, comprehensive, and sufficiently relevant responses about historical facts in French. To achieve this, we constructed a testbed comprising numerous history-related questions of varying types, themes, and levels of difficulty. Our evaluation of responses from ten selected LLMs reveals numerous shortcomings in both substance and form. Beyond an overall insufficient accuracy rate, we highlight uneven treatment of the French language, as well as issues related to verbosity and inconsistency in the responses provided by LLMs.

Updated: 2024-06-21 14:19:57

标题: Evaluation of the response capabilities of large language models (LLM) for historians' questions

摘要: 大型语言模型（LLMs）如ChatGPT或Bard已经彻底改变了信息检索，并以其在记录时间内生成定制响应的能力而吸引了观众，无论主题如何。在本文中，我们评估了各种LLMs在法语历史事实方面产生可靠、全面和足够相关响应的能力。为了实现这一目标，我们构建了一个测试基准，包括各种类型、主题和难度级别的历史相关问题。我们对十个选定的LLM的响应进行评估，发现在内容和形式上存在许多缺陷。除了总体不足的准确率外，我们还强调了法语语言的不均匀处理，以及LLMs提供的响应中的冗长性和不一致性问题。

更新时间: 2024-06-21 14:19:57

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2406.15173v1

Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research

The applications of large language models (LLMs) are promising for biomedical and healthcare research. Despite the availability of open-source LLMs trained using a wide range of biomedical data, current research on the applications of LLMs to genomics and proteomics is still limited. To fill this gap, we propose a collection of finetuned LLMs and multimodal LLMs (MLLMs), known as Geneverse, for three novel tasks in genomic and proteomic research. The models in Geneverse are trained and evaluated based on domain-specific datasets, and we use advanced parameter-efficient finetuning techniques to achieve the model adaptation for tasks including the generation of descriptions for gene functions, protein function inference from its structure, and marker gene selection from spatial transcriptomic data. We demonstrate that adapted LLMs and MLLMs perform well for these tasks and may outperform closed-source large-scale models based on our evaluations focusing on both truthfulness and structural correctness. All of the training strategies and base models we used are freely accessible.

Updated: 2024-06-21 14:19:10

标题: 基因宇宙：用于基因组和蛋白质组研究的开源多模态大型语言模型集合

摘要: 大型语言模型（LLMs）在生物医学和医疗保健研究中的应用前景广阔。尽管有许多使用各种生物医学数据训练的开源LLMs可用，但目前关于LLMs在基因组学和蛋白质组学应用方面的研究仍然有限。为填补这一空白，我们提出了一组微调的LLMs和多模式LLMs（MLLMs），称为Geneverse，用于基因组学和蛋白质组学研究中的三项新任务。Geneverse中的模型基于领域特定数据集进行训练和评估，我们使用先进的参数高效微调技术来实现模型适应性，包括生成基因功能描述、从蛋白质结构推断功能以及从空间转录组数据中选择标记基因等任务。我们展示了适应的LLMs和MLLMs在这些任务中表现良好，并且在着重考虑真实性和结构正确性的评估中可能胜过基于闭源大规模模型。我们所使用的所有训练策略和基础模型都是免费可访问的。

更新时间: 2024-06-21 14:19:10

领域: cs.LG,cs.AI,cs.CL,q-bio.QM

下载: http://arxiv.org/abs/2406.15534v1

Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras

The performance of image-based Reinforcement Learning (RL) agents can vary depending on the position of the camera used to capture the images. Training on multiple cameras simultaneously, including a first-person egocentric camera, can leverage information from different camera perspectives to improve the performance of RL. However, hardware constraints may limit the availability of multiple cameras in real-world deployment. Additionally, cameras may become damaged in the real world, preventing access to all cameras that were used during training. To overcome these hardware constraints, we propose Multi-View Disentanglement (MVD), which uses multiple cameras to learn a policy that is robust to a reduction in the number of cameras and generalises to any single camera from the training set. Our approach is a self-supervised auxiliary task for RL that learns a disentangled representation from multiple cameras, with a shared representation that is aligned across all cameras to allow generalisation to a single camera, and a private representation that is camera-specific. We show experimentally that an RL agent trained on a single third-person camera is unable to learn an optimal policy in many control tasks, whereas our approach, benefiting from multiple cameras during training, is able to solve the task using only the same single third-person camera.

Updated: 2024-06-21 14:12:54

标题: 多视图分离的强化学习与多摄像头

摘要: 基于图像的强化学习（RL）代理的性能可能会因拍摄图像的摄像机位置而变化。同时训练多个摄像头，包括第一人称主观摄像头，可以利用不同摄像机视角的信息来提高RL的性能。然而，硬件限制可能会限制在真实世界部署中多个摄像头的可用性。此外，摄像头在现实世界中可能会损坏，从而无法访问在训练期间使用的所有摄像头。为了克服这些硬件限制，我们提出了多视角解缠（MVD），利用多个摄像头学习一种对减少摄像头数量具有鲁棒性的策略，以便将泛化到训练集中的任何单个摄像头。我们的方法是RL的自监督辅助任务，它从多个摄像头中学习出解缠表示，其中共享表示在所有摄像头上对齐，以允许泛化到单个摄像头，私有表示是特定于摄像头的。我们通过实验证明，训练在单个第三人称摄像头上的RL代理无法在许多控制任务中学习最佳策略；但是，我们的方法在训练期间受益于多个摄像头，可以仅使用相同单个第三人称摄像头解决任务。

更新时间: 2024-06-21 14:12:54

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2404.14064v2

This actually looks like that: Proto-BagNets for local and global interpretability-by-design

Interpretability is a key requirement for the use of machine learning models in high-stakes applications, including medical diagnosis. Explaining black-box models mostly relies on post-hoc methods that do not faithfully reflect the model's behavior. As a remedy, prototype-based networks have been proposed, but their interpretability is limited as they have been shown to provide coarse, unreliable, and imprecise explanations. In this work, we introduce Proto-BagNets, an interpretable-by-design prototype-based model that combines the advantages of bag-of-local-feature models and prototype learning to provide meaningful, coherent, and relevant prototypical parts needed for accurate and interpretable image classification tasks. We evaluated the Proto-BagNet for drusen detection on publicly available retinal OCT data. The Proto-BagNet performed comparably to the state-of-the-art interpretable and non-interpretable models while providing faithful, accurate, and clinically meaningful local and global explanations. The code is available at https://github.com/kdjoumessi/Proto-BagNets.

Updated: 2024-06-21 14:12:15

标题: 这实际上看起来像这样：设计局部和全局可解释性的原型BagNets

摘要: 可解释性是在高风险应用中使用机器学习模型的关键要求，包括医学诊断。解释黑匣子模型主要依赖于事后方法，这些方法不能忠实地反映模型的行为。为此，提出了基于原型的网络，但它们的可解释性有限，因为已经证明它们提供的解释粗糙、不可靠且不精确。在这项工作中，我们介绍了Proto-BagNets，一种通过设计可解释的基于原型的模型，它结合了局部特征模型和原型学习的优势，提供了用于准确和可解释的图像分类任务所需的有意义、连贯和相关的原型部分。我们在公开可用的视网膜OCT数据上评估了Proto-BagNet在脓疱检测方面的性能。Proto-BagNet在提供忠实、准确和临床相关的局部和全局解释的同时，表现出与最先进的可解释和不可解释模型相当。代码可在https://github.com/kdjoumessi/Proto-BagNets 上找到。

更新时间: 2024-06-21 14:12:15

领域: cs.AI

下载: http://arxiv.org/abs/2406.15168v1

ApiQ: Finetuning of 2-Bit Quantized Large Language Model

Memory-efficient finetuning of large language models (LLMs) has recently attracted huge attention with the increasing size of LLMs, primarily due to the constraints posed by GPU memory limitations and the effectiveness of these methods compared to full finetuning. Despite the advancements, current strategies for memory-efficient finetuning, such as QLoRA, exhibit inconsistent performance across diverse bit-width quantizations and multifaceted tasks. This inconsistency largely stems from the detrimental impact of the quantization process on preserved knowledge, leading to catastrophic forgetting and undermining the utilization of pretrained models for finetuning purposes. In this work, we introduce a novel quantization framework, ApiQ, designed to restore the lost information from quantization by concurrently initializing the LoRA components and quantizing the weights of LLMs. This approach ensures the maintenance of the original LLM's activation precision while mitigating the error propagation from shallower into deeper layers. Through comprehensive evaluations conducted on a spectrum of language tasks with various LLMs, ApiQ demonstrably minimizes activation error during quantization. Consequently, it consistently achieves superior finetuning results across various bit-widths.

Updated: 2024-06-21 14:03:48

标题: ApiQ：对2比特量化的大型语言模型进行微调

摘要: 最近，由于大型语言模型（LLMs）的增大，内存高效微调引起了极大关注，主要是由于GPU内存限制以及与完整微调相比这些方法的有效性。尽管取得了进展，但目前的内存高效微调策略，如QLoRA，在不同位宽量化和多方面任务中表现出不一致的性能。这种不一致主要源自量化过程对保存知识的有害影响，导致灾难性遗忘并削弱了预训练模型用于微调目的的利用。在这项工作中，我们引入了一个新颖的量化框架ApiQ，旨在通过同时初始化LoRA组件并量化LLMs的权重来恢复量化中丢失的信息。这种方法确保了原始LLM的激活精度的保持，同时减轻了错误从浅层传播到深层的过程。通过对各种LLMs进行的一系列语言任务的全面评估，ApiQ显著减少了量化过程中的激活错误。因此，它在各种位宽上始终实现了优越的微调结果。

更新时间: 2024-06-21 14:03:48

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2402.05147v3
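
To make the notion of activation error in the abstract above concrete, the sketch below quantizes a small weight vector to 2 bits with plain uniform round-to-nearest quantization and measures how much a layer's output changes. This is not the ApiQ procedure: the joint initialization of LoRA components and quantized weights is not reproduced, and the quantizer, function names, and example values are assumptions chosen for illustration.

```python
def quantize_uniform(weights, bits=2):
    """Uniform round-to-nearest quantization to 2**bits levels.

    Returns the integer codes and the dequantized (reconstructed) weights.
    """
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    dequantized = [lo + c * scale for c in codes]
    return codes, dequantized


def activation_error(weights, dequantized, inputs):
    """Absolute difference between the full-precision and quantized dot products."""
    full = sum(w * x for w, x in zip(weights, inputs))
    quant = sum(w * x for w, x in zip(dequantized, inputs))
    return abs(full - quant)
```

The residual `weights - dequantized` is the information lost to quantization; methods in this family use low-rank adapters to compensate for it, so that the error seen at a layer's output, rather than only the per-weight error, stays small.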

Perks and Pitfalls of Faithfulness in Regular, Self-Explainable and Domain Invariant GNNs

As Graph Neural Networks (GNNs) become more pervasive, it becomes paramount to build robust tools for computing explanations of their predictions. A key desideratum is that these explanations are faithful, i.e., that they portray an accurate picture of the GNN's reasoning process. A number of different faithfulness metrics exist, begging the question of what faithfulness is exactly, and what its properties are. We begin by showing that existing metrics are not interchangeable -- i.e., explanations attaining high faithfulness according to one metric may be unfaithful according to others -- and can be systematically insensitive to important properties of the explanation, and suggest how to address these issues. We proceed to show that, surprisingly, optimizing for faithfulness is not always a sensible design goal. Specifically, we show that for injective regular GNN architectures, perfectly faithful explanations are completely uninformative. The situation is different for modular GNNs, such as self-explainable and domain-invariant architectures, where optimizing faithfulness does not compromise informativeness, and is also unexpectedly tied to out-of-distribution generalization.

Updated: 2024-06-21 14:01:23

标题: 忠实性在常规、自解释和领域不变的GNN中的优点和缺点

摘要: 随着图神经网络（GNNs）变得更加普遍，构建计算其预测解释的稳健工具变得至关重要。一个关键的要求是这些解释是忠实的，即它们描绘了GNN推理过程的准确图像。存在许多不同的忠实度指标，这引发了一个问题，即忠实度究竟是什么，以及它的特性是什么。我们首先展示现有的指标并不可互换 - 即，根据一个指标获得高忠实度的解释可能在其他指标下是不忠实的，并且可能对解释的重要特性系统地不敏感，建议如何解决这些问题。我们继续展示，令人惊讶的是，优化忠实度并不总是一个明智的设计目标。具体地，我们展示对于可逆正规GNN架构，完全忠实的解释是完全无信息的。对于模块化的GNNs，如可自解释和域不变的架构，情况有所不同，优化忠实度不会损害信息量，并且出乎意料地与超出分布的泛化能力相关联。

更新时间: 2024-06-21 14:01:23

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.15156v1

Are LLMs Naturally Good at Synthetic Tabular Data Generation?

Large language models (LLMs) have demonstrated their prowess in generating synthetic text and images; however, their potential for generating tabular data -- arguably the most common data type in business and scientific applications -- is largely underexplored. This paper demonstrates that LLMs, used as-is, or after traditional fine-tuning, are severely inadequate as synthetic table generators. Due to the autoregressive nature of LLMs, fine-tuning with random order permutation runs counter to the importance of modeling functional dependencies, and renders LLMs unable to model conditional mixtures of distributions (key to capturing real world constraints). We showcase how LLMs can be made to overcome some of these deficiencies by making them permutation-aware.

Updated: 2024-06-21 14:00:02

标题: LLM是否天生擅长合成表格数据生成？

摘要: 大型语言模型（LLMs）已经展示出它们在生成合成文本和图像方面的实力；然而，它们在生成表格数据方面的潜力--可以说是商业和科学应用中最常见的数据类型--在很大程度上尚未被充分探索。本文证明，LLMs，无论是原样使用还是经过传统微调后，作为合成表格生成器是严重不足的。由于LLMs的自回归性质，随机顺序排列的微调与建模功能依赖性的重要性相悖，并使LLMs无法对条件混合分布进行建模（捕捉真实世界约束的关键）。我们展示了如何使LLMs能够克服一些这些不足之处，通过使它们意识到排列。

更新时间: 2024-06-21 14:00:02

领域: cs.LG

下载: http://arxiv.org/abs/2406.14541v2

Generative Topological Networks

Generative models have seen significant advancements in recent years, yet often remain challenging and costly to train and use. We introduce Generative Topological Networks (GTNs) -- a new class of generative models that addresses these shortcomings. GTNs are trained deterministically using a simple supervised learning approach grounded in topology theory. GTNs are fast to train, and require only a single forward pass in a standard feedforward neural network to generate samples. We demonstrate the strengths of GTNs in several datasets, including MNIST, celebA and the Hands and Palm Images dataset. Finally, the theory behind GTNs offers insights into how to train generative models for improved performance.

Updated: 2024-06-21 13:55:34

标题: 生成式拓扑网络

摘要: 生成模型在近年来取得了显著进展，但通常仍然具有挑战性且成本高昂。我们引入了生成拓扑网络（GTNs）-一种新型的生成模型，解决了这些缺点。GTNs通过一种简单的基于拓扑理论的监督学习方法进行确定性训练。GTNs训练速度快，仅需要在标准前馈神经网络中进行一次前向传播来生成样本。我们在几个数据集中展示了GTNs的优势，包括MNIST、celebA和Hands and Palm Images数据集。最后，GTNs背后的理论提供了关于如何训练生成模型以提高性能的见解。

更新时间: 2024-06-21 13:55:34

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.15152v1

Speech foundation models in healthcare: Effect of layer selection on pathological speech feature prediction

Accurately extracting clinical information from speech is critical to the diagnosis and treatment of many neurological conditions. As such, there is interest in leveraging AI for automatic, objective assessments of clinical speech to facilitate diagnosis and treatment of speech disorders. We explore transfer learning using foundation models, focusing on the impact of layer selection for the downstream task of predicting pathological speech features. We find that selecting an optimal layer can greatly improve performance (~15.8% increase in balanced accuracy per feature as compared to worst layer, ~13.6% increase as compared to final layer), though the best layer varies by predicted feature and does not always generalize well to unseen data. A learned weighted sum offers comparable performance to the average best layer in-distribution (only ~1.2% lower) and had strong generalization for out-of-distribution data (only 1.5% lower than the average best layer).

Updated: 2024-06-21 13:49:56

标题: 在医疗保健中的语音基础模型：层选择对病理性语音特征预测的影响

摘要: 从语音中准确提取临床信息对于许多神经系统疾病的诊断和治疗至关重要。因此，人们对利用人工智能进行自动、客观的临床语音评估以促进言语障碍的诊断和治疗表示兴趣。我们探讨了使用基础模型的迁移学习，重点关注了层选择对预测病理性语音特征的下游任务的影响。我们发现，选择最佳层可以极大地提高性能（与最差层相比，每个特征的平衡准确性增加约15.8％，与最终层相比增加约13.6％），尽管最佳层因预测特征而异，并且并不总是很好地推广到未见数据。学习的加权总和在分布中表现出与平均最佳层相当的性能（仅比平均最佳层低约1.2％），并且对于分布外数据有很强的泛化能力（仅比平均最佳层低1.5％）。

更新时间: 2024-06-21 13:49:56

领域: eess.AS,cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.01796v2
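
The "learned weighted sum" mentioned in the abstract above is commonly implemented as a softmax over one learnable scalar per layer, followed by a weighted average of the layer features. The sketch below shows only that forward computation, under the assumption that this common form is what is meant; the abstract does not specify the parameterization, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scalars."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_features, layer_logits):
    """Combine per-layer feature vectors using softmax-normalised layer weights.

    layer_features: one feature vector per layer, all of equal length.
    layer_logits: one scalar per layer (these would be learned during training).
    """
    weights = softmax(layer_logits)
    dim = len(layer_features[0])
    return [
        sum(w * feats[i] for w, feats in zip(weights, layer_features))
        for i in range(dim)
    ]
```

With all logits equal, the result is the plain average of the layers; training shifts weight toward the layers that are most useful for the downstream feature, without committing to a single layer.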

Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks

Simulators are powerful tools for autonomous robot learning as they offer scalable data generation, flexible design, and optimization of trajectories. However, transferring behavior learned from simulation data into the real world proves to be difficult, usually mitigated with compute-heavy domain randomization methods or further model fine-tuning. We present a method to improve generalization and robustness to distribution shifts in sim-to-real visual quadrotor navigation tasks. To this end, we first build a simulator by integrating Gaussian Splatting with quadrotor flight dynamics, and then, train robust navigation policies using Liquid neural networks. In this way, we obtain a full-stack imitation learning protocol that combines advances in 3D Gaussian splatting radiance field rendering, crafty programming of expert demonstration training data, and the task understanding capabilities of Liquid networks. Through a series of quantitative flight tests, we demonstrate the robust transfer of navigation skills learned in a single simulation scene directly to the real world. We further show the ability to maintain performance beyond the training environment under drastic distribution and physical environment changes. Our learned Liquid policies, trained on single target manoeuvres curated from a photorealistic simulated indoor flight only, generalize to multi-step hikes onboard a real hardware platform outdoors.

Updated: 2024-06-21 13:48:37

标题: 高斯点云在液体网络中实现现实世界飞行导航传输

摘要: 模拟器是自主机器人学习的强大工具，因为它们提供可扩展的数据生成、灵活的设计和轨迹优化。然而，将从模拟数据中学到的行为转移到现实世界通常是困难的，通常需要使用计算密集型的领域随机化方法或进一步的模型微调来缓解。我们提出了一种方法，以改善在从模拟到真实视觉四旋翼导航任务中的分布转移中的泛化性和鲁棒性。为此，我们首先通过将高斯喷涂与四旋翼飞行动力学相结合来构建一个模拟器，然后使用液体神经网络来训练稳健的导航策略。通过这种方式，我们获得了一个完整的模仿学习协议，结合了3D高斯喷涂辐射场渲染、专家演示训练数据的巧妙编程以及液体网络的任务理解能力的进步。通过一系列定量飞行测试，我们展示了在单一模拟场景中学习的导航技能直接转移到现实世界的鲁棒性转移。我们进一步展示了在剧烈分布和物理环境变化下保持性能的能力。我们在仅从逼真的室内模拟飞行中策划的单一目标机动训练数据上训练的液体策略，可以推广到户外真实硬件平台上的多步行程。

更新时间: 2024-06-21 13:48:37

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.15149v1

Branches: A Fast Dynamic Programming and Branch & Bound Algorithm for Optimal Decision Trees

Decision Tree Learning is a fundamental problem for Interpretable Machine Learning, yet it poses a formidable optimization challenge. Despite numerous efforts dating back to the early 1990s, practical algorithms have only recently emerged, primarily leveraging Dynamic Programming (DP) and Branch & Bound (B&B) techniques. These breakthroughs led to the development of two distinct approaches. Algorithms like DL8.5 and MurTree operate on the space of nodes (or branches); they are very fast, but do not penalise complex Decision Trees, i.e. they do not solve for sparsity. On the other hand, algorithms like OSDT and GOSDT operate on the space of Decision Trees; they solve for sparsity, but at the expense of speed. In this work, we introduce Branches, a novel algorithm that integrates the strengths of both paradigms. Leveraging DP and B&B, Branches achieves exceptional speed while also solving for sparsity. Central to its efficiency is a novel analytical bound enabling substantial pruning of the search space. Furthermore, Branches does not necessitate binary features. Theoretical analysis demonstrates that Branches has a lower complexity bound compared to state-of-the-art methods, a claim validated through extensive empirical evaluation. Our results illustrate that Branches outperforms the state of the art in terms of speed and number of iterations while consistently yielding optimal Decision Trees.

Updated: 2024-06-21 13:45:51

标题: 分支：一种用于最优决策树的快速动态规划和分支定界算法

摘要: 决策树学习是可解释机器学习的一个基本问题，然而它提出了一个艰巨的优化挑战。尽管自上世纪90年代初以来进行了许多努力，但实用算法直到最近才出现，主要利用动态规划（DP）和分支与界限（B&B）技术。这些突破导致了两种不同的方法的发展。像DL8.5和MurTree这样的算法在节点（或分支）空间上运行，它们非常快，但不惩罚复杂的决策树，即它们不解决稀疏性问题。另一方面，像OSDT和GOSDT这样的算法在决策树空间上运行，它们解决稀疏性问题，但以速度为代价。在这项工作中，我们介绍了Branches，这是一种集成了两种范例优势的新算法。利用DP和B&B，Branches实现了异常的速度，同时也解决了稀疏性问题。其效率的关键在于一种新颖的分析边界，可以大幅修剪搜索空间。此外，Branches不需要二进制特征。理论分析表明，与最先进的方法相比，Branches具有更低的复杂性边界，这一主张通过广泛的经验评估得到验证。我们的结果表明，Branches在速度和迭代次数方面优于最先进技术，同时始终产生最优的决策树。

更新时间: 2024-06-21 13:45:51

领域: cs.LG

下载: http://arxiv.org/abs/2406.02175v2

Uncertainty-Aware Probabilistic Graph Neural Networks for Road-Level Traffic Accident Prediction

Traffic accidents present substantial challenges to human safety and socioeconomic development in urban areas. Developing a reliable and responsible traffic accident prediction model is crucial to addressing growing public safety concerns and enhancing the safety of urban mobility systems. Traditional methods face limitations at fine spatiotemporal scales due to the sporadic nature of high-risk accidents and the predominance of non-accident characteristics. Furthermore, while most current models show promising occurrence prediction, they overlook the uncertainties arising from the inherent nature of accidents, and thus fail to adequately map the hierarchical ranking of accident risk values for more precise insights. To address these issues, we introduce the Spatiotemporal Zero-Inflated Tweedie Graph Neural Network (STZITDGNN), the first uncertainty-aware probabilistic graph deep learning model for multi-step road-level traffic accident prediction. This model integrates the interpretability of the statistical Tweedie family model and the expressive power of graph neural networks. Its decoder innovatively employs a compound Tweedie model, in which a Poisson distribution models the frequency of accident occurrences and a Gamma distribution assesses injury severity, supplemented by a zero-inflated component to effectively identify excessive non-incident instances. Empirical tests using real-world traffic data from London, UK, demonstrate that the STZITDGNN surpasses other baseline models across multiple benchmarks and metrics, including accident risk value prediction, uncertainty minimisation, non-accident road identification and accident occurrence accuracy. Our study demonstrates that STZITDGNN can effectively inform targeted road monitoring, thereby improving urban road safety strategies.
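
To make the decoder's distributional assumption concrete, the following is a minimal sketch of sampling from a zero-inflated compound Poisson-Gamma model, which is one standard reading of the Poisson, Gamma and zero-inflated components named in the abstract. The function names and the parameters `pi_zero`, `rate`, `shape` and `scale` are illustrative assumptions; in the paper these quantities would presumably be produced per road segment by the graph neural network, which is not modelled here.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson count with Knuth's multiplication method (fine for small rates)."""
    limit = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def sample_zero_inflated_tweedie(pi_zero, rate, shape, scale, rng):
    """Draw one risk value from a zero-inflated compound Poisson-Gamma model.

    pi_zero: probability of a structural zero (hypothetical name).
    rate:    Poisson mean for the number of accidents.
    shape, scale: Gamma parameters for the severity of each accident.
    """
    # Zero-inflated component: some segments are non-incident by construction.
    if rng.random() < pi_zero:
        return 0.0
    # Frequency: how many accidents occur.
    count = sample_poisson(rate, rng)
    # Severity: one Gamma draw per accident, summed into a single risk value.
    return sum(rng.gammavariate(shape, scale) for _ in range(count))
```

With `pi_zero = 0` the expected value is `rate * shape * scale`; raising `pi_zero` adds extra mass at exactly zero without changing the shape of the positive part.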

Updated: 2024-06-21 13:45:44

标题: 不确定性感知的概率图神经网络用于道路级交通事故预测

摘要: Traffic accidents are a significant challenge for human safety and urban development. Developing a reliable prediction model is essential for addressing public safety concerns and improving urban mobility systems. Traditional methods have limitations in predicting high-risk accidents due to their sporadic nature and the prevalence of non-accident characteristics. Current models may overlook uncertainties and fail to accurately rank accident risks. To address these issues, we introduce the Spatiotemporal Zero-Inflated Tweedie Graph Neural Network (STZITDGNN), the first probabilistic graph deep learning model for road-level traffic accident prediction. This model combines the interpretability of the Tweedie family model with the power of graph neural networks. It uses a compound Tweedie model with Poisson and Gamma distributions to predict accident frequency and severity, along with a zero-inflated component to identify non-incident instances. Empirical tests with real-world data from London show that the STZITDGNN outperforms baseline models in various benchmarks and metrics, such as accident risk prediction, uncertainty reduction, non-accident road identification, and accident occurrence accuracy. Our study demonstrates that the STZITDGNN can enhance urban road safety strategies through targeted road monitoring.

更新时间: 2024-06-21 13:45:44

领域: cs.LG,cs.AI,cs.CY

下载: http://arxiv.org/abs/2309.05072v2

Chain-of-Thought Unfaithfulness as Disguised Accuracy

Understanding the extent to which Chain-of-Thought (CoT) generations align with a large language model's (LLM) internal computations is critical for deciding whether to trust an LLM's output. As a proxy for CoT faithfulness, Lanham et al. (2023) propose a metric that measures a model's dependence on its CoT for producing an answer. Within a single family of proprietary models, they find that LLMs exhibit a scaling-then-inverse-scaling relationship between model size and their measure of faithfulness, and that a 13 billion parameter model exhibits increased faithfulness compared to models ranging from 810 million to 175 billion parameters in size. We evaluate whether these results generalize as a property of all LLMs. We replicate the experimental setup in their section focused on scaling experiments with three different families of models and, under specific conditions, successfully reproduce the scaling trends for CoT faithfulness they report. However, after normalizing the metric to account for a model's bias toward certain answer choices, unfaithfulness drops significantly for smaller less-capable models. This normalized faithfulness metric is also strongly correlated ($R^2$=0.74) with accuracy, raising doubts about its validity for evaluating faithfulness.

Updated: 2024-06-21 13:39:14

标题: 思维链的不忠诚之处隐含着准确性

摘要: 理解链式思维（CoT）生成与大型语言模型（LLM）内部计算的一致程度对于决定是否信任LLM的输出至关重要。作为CoT忠诚度的代理，Lanham等人（2023年）提出了一种衡量模型依赖其CoT产生答案的度量标准。在一个专有模型系列中，他们发现LLMs展示了模型大小和其忠诚度测量之间的缩放-反向缩放关系，并且一个130亿参数模型相比于大小范围在8.1亿到1750亿参数的模型表现出了增加的忠诚度。我们评估这些结果是否作为所有LLMs的一个属性而泛化。我们复制了他们关注缩放实验的部分中针对三个不同系列模型的实验设置，并在特定条件下成功地重现了他们报告的CoT忠诚度的缩放趋势。然而，在对度量标准进行归一化以考虑模型对某些答案选择的偏好后，对于较小的能力较低的模型，不忠诚度显著下降。这种归一化的忠诚度度量也与准确性强相关（$R^2$=0.74），对其用于评估忠诚度的有效性产生了疑问。

更新时间: 2024-06-21 13:39:14

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.14897v3

KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty

Probabilistic State Space Models (SSMs) are essential for Reinforcement Learning (RL) from high-dimensional, partial information as they provide concise representations for control. Yet, they lack the computational efficiency of their recent deterministic counterparts such as S4 or Mamba. We propose KalMamba, an efficient architecture to learn representations for RL that combines the strengths of probabilistic SSMs with the scalability of deterministic SSMs. KalMamba leverages Mamba to learn the dynamics parameters of a linear Gaussian SSM in a latent space. Inference in this latent space amounts to standard Kalman filtering and smoothing. We realize these operations using parallel associative scanning, similar to Mamba, to obtain a principled, highly efficient, and scalable probabilistic SSM. Our experiments show that KalMamba competes with state-of-the-art SSM approaches in RL while significantly improving computational efficiency, especially on longer interaction sequences.
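
For readers unfamiliar with the inference step mentioned above, here is a minimal sketch of standard Kalman filtering for a scalar linear Gaussian state space model. It runs sequentially, whereas the abstract describes a parallel associative scan, and the dynamics parameters `a`, `q`, `h` and `r` are fixed hypothetical constants here rather than values learned by Mamba.

```python
def kalman_step(mean, var, obs, a, q, h, r):
    """One predict/update step for z_t = a*z_{t-1} + N(0, q), o_t = h*z_t + N(0, r)."""
    # Predict: push the previous belief through the linear dynamics.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Update: correct the prediction with the new observation.
    innovation = obs - h * pred_mean
    innovation_var = h * h * pred_var + r
    gain = pred_var * h / innovation_var
    new_mean = pred_mean + gain * innovation
    new_var = (1.0 - gain * h) * pred_var
    return new_mean, new_var


def kalman_filter(observations, a, q, h, r, mean0, var0):
    """Filter a whole sequence and return the posterior means and variances."""
    means, variances = [], []
    mean, var = mean0, var0
    for obs in observations:
        mean, var = kalman_step(mean, var, obs, a, q, h, r)
        means.append(mean)
        variances.append(var)
    return means, variances
```

Because each step is a composition of affine-Gaussian updates, the same recursion can be rewritten with an associative operator and evaluated by a parallel scan, which is the property the abstract relies on for efficiency.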

Updated: 2024-06-21 13:27:36

标题: KalMamba：面向不确定性下高效概率状态空间模型的RL

摘要: 概率状态空间模型（SSMs）对于从高维度、部分信息中进行强化学习（RL）至关重要，因为它们为控制提供了简洁的表示。然而，它们缺乏最近的确定性对应物（如S4或Mamba）的计算效率。我们提出了KalMamba，这是一种有效的架构，用于学习强化学习的表示，结合了概率SSMs的优势和确定性SSMs的可扩展性。KalMamba利用Mamba在潜在空间中学习线性高斯SSM的动态参数。在这个潜在空间中的推断等同于标准的卡尔曼滤波和平滑。我们使用并行联想扫描来实现这些操作，类似于Mamba，以获得一种基于原则、高效且可扩展的概率SSM。我们的实验表明，KalMamba在RL中与最先进的SSM方法竞争，同时在特别是在更长的交互序列上显着提高了计算效率。

更新时间: 2024-06-21 13:27:36

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.15131v1

Assessing Good, Bad and Ugly Arguments Generated by ChatGPT: a New Dataset, its Methodology and Associated Tasks

The recent success of Large Language Models (LLMs) has sparked concerns about their potential to spread misinformation. As a result, there is a pressing need for tools to identify "fake arguments" generated by such models. To create these tools, examples of texts generated by LLMs are needed. This paper introduces a methodology to obtain good, bad and ugly arguments from argumentative essays produced by ChatGPT, OpenAI's LLM. We then describe a novel dataset containing a set of diverse arguments, ArGPT. We assess the effectiveness of our dataset and establish baselines for several argumentation-related tasks. Finally, we show that the artificially generated data relates well to human argumentation and thus is useful as a tool to train and test systems for the defined tasks.

Updated: 2024-06-21 13:27:10

标题: 评估由ChatGPT生成的好坏和丑陋论点：一个新数据集、其方法论和相关任务

摘要: 最近大型语言模型（LLMs）的成功引发了对它们传播错误信息潜力的担忧。因此，迫切需要工具来识别由这些模型生成的“假论点”。为了创建这些工具，需要示例文本由LLMs生成。本文介绍了一种方法论，以获取由ChatGPT生成的辩论性论文中的好、坏和丑陋的论点。然后我们描述了一个包含一组多样化论点的新颖数据集ArGPT。我们评估了我们的数据集的有效性，并为几个与辩论相关的任务建立了基线。最后，我们展示了人工生成的数据与人类辩论之间的关联，并因此可用作训练和测试为定义任务的系统的工具。

更新时间: 2024-06-21 13:27:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.15130v1

A Wavelet Guided Attention Module for Skin Cancer Classification with Gradient-based Feature Fusion

Skin cancer is a highly dangerous type of cancer that requires an accurate diagnosis from experienced physicians. To help physicians diagnose skin cancer more efficiently, a computer-aided diagnosis (CAD) system can be very helpful. In this paper, we propose a novel model, which uses a novel attention mechanism to pinpoint the differences in features across the spatial dimensions and symmetry of the lesion, thereby focusing on the dissimilarities of various classes based on symmetry, uniformity in texture and color, etc. Additionally, to take into account the variations in the boundaries of the lesions for different classes, we employ a gradient-based fusion of wavelet and soft attention-aided features to extract boundary information of skin lesions. We have tested our model on the multi-class and highly class-imbalanced dataset, called HAM10000, and achieved promising results, with a 91.17% F1-score and 90.75% accuracy. The code is made available at: https://github.com/AyushRoy2001/WAGF-Fusion.

Updated: 2024-06-21 13:21:44

标题: 基于梯度特征融合的小波引导注意力模块用于皮肤癌分类

摘要: 皮肤癌是一种非常危险的癌症类型，需要经验丰富的医生进行准确诊断。为了帮助医生更有效地诊断皮肤癌，计算机辅助诊断（CAD）系统可以非常有帮助。在本文中，我们提出了一种新颖的模型，该模型使用了一种新颖的注意机制来准确指出在空间维度和病变的对称性上的特征差异，从而侧重于根据对称性、纹理和颜色的均匀性等各种类别的差异。此外，为了考虑不同类别病变边界的变化，我们采用了基于梯度的小波和软注意辅助特征的融合来提取皮肤病变的边界信息。我们在名为HAM10000的多类别和高度类别不平衡的数据集上测试了我们的模型，并取得了令人满意的结果，F1分数达到91.17％，准确率为90.75％。该代码可在以下链接找到：https://github.com/AyushRoy2001/WAGF-Fusion。

更新时间: 2024-06-21 13:21:44

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.15128v1

Embracing Federated Learning: Enabling Weak Client Participation via Partial Model Training

In Federated Learning (FL), clients may have weak devices that cannot train the full model or even hold it in their memory space. Thus, to implement large-scale FL applications, it is crucial to develop a distributed learning method that enables the participation of such weak clients. We propose EmbracingFL, a general FL framework that allows all available clients to join the distributed training regardless of their system resource capacity. The framework is built upon a novel form of partial model training method in which each client trains as many consecutive output-side layers as its system resources allow. Our study demonstrates that EmbracingFL encourages each layer to have similar data representations across clients, improving FL efficiency. The proposed partial model training method guarantees convergence to a neighborhood of stationary points for non-convex and smooth problems. We evaluate the efficacy of EmbracingFL under a variety of settings with a mixed number of strong, moderate (~40% memory), and weak (~15% memory) clients, datasets (CIFAR-10, FEMNIST, and IMDB), and models (ResNet20, CNN, and LSTM). Our empirical study shows that EmbracingFL consistently achieves accuracy as high as if all clients were strong, outperforming the state-of-the-art width reduction methods (i.e. HeteroFL and FjORD).
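
The layer-selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: `layer_costs` is a hypothetical per-layer memory cost ordered from input side to output side, `budget` is a hypothetical client memory limit in the same units, and the abstract does not say how the framework itself measures either quantity.

```python
def trainable_output_side_layers(layer_costs, budget):
    """Return indices of the longest run of consecutive output-side layers that fits the budget.

    layer_costs: cost of each layer, index 0 = input side, last index = output side.
    budget:      total cost this client can afford (hypothetical unit, e.g. MB).
    """
    selected = []
    used = 0.0
    # Walk from the output layer towards the input and stop at the first layer that does not fit,
    # so the selected layers always form one consecutive block ending at the output.
    for index in range(len(layer_costs) - 1, -1, -1):
        if used + layer_costs[index] > budget:
            break
        used += layer_costs[index]
        selected.append(index)
    return sorted(selected)
```

For example, with costs `[4, 3, 2, 1]` a client with budget 3 would train layers 2 and 3 only, while a client with budget 10 would train the full model.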

Updated: 2024-06-21 13:19:29

标题: 拥抱联邦学习：通过部分模型训练实现弱客户端参与

摘要: 在联邦学习（FL）中，客户端可能拥有性能较弱的设备，无法训练完整模型甚至无法将其保存在内存空间中。因此，为了实现大规模FL应用，开发一种分布式学习方法至关重要，使得这些性能较弱的客户端能够参与其中。我们提出了EmbracingFL，这是一个通用的FL框架，允许所有可用的客户端参与分布式训练，而不受其系统资源容量的限制。该框架建立在一种新颖的部分模型训练方法之上，其中每个客户端训练尽可能多的连续输出端层，以其系统资源允许。我们的研究表明，EmbracingFL鼓励每个层在不同客户端之间具有相似的数据表示，提高了FL的效率。提出的部分模型训练方法确保了对于非凸和平滑问题的收敛到稳定点的邻域。我们在各种设置下评估了EmbracingFL的有效性，包括混合数量的强（Strong）、中等（约40%内存）和弱（约15%内存）客户端、数据集（CIFAR-10、FEMNIST和IMDB）以及模型（ResNet20、CNN和LSTM）。我们的实证研究表明，EmbracingFL始终以高准确率实现，就像所有客户端都很强一样，表现优于最先进的宽度减少方法（即HeteroFL和FjORD）。

更新时间: 2024-06-21 13:19:29

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2406.15125v1

A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning

Hierarchical Reinforcement Learning (HRL) approaches have shown successful results in solving a large variety of complex, structured, long-horizon problems. Nevertheless, a full theoretical understanding of this empirical evidence is currently missing. In the context of the option framework, prior research has devised efficient algorithms for scenarios where options are fixed, and only the high-level policy selecting among options has to be learned. However, the fully realistic scenario in which both the high-level and the low-level policies are learned is surprisingly disregarded from a theoretical perspective. This work makes a step towards the understanding of this latter scenario. Focusing on the finite-horizon problem, we present a meta-algorithm alternating between regret minimization algorithms instantiated at different (high and low) temporal abstractions. At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP), with fixed low-level policies, while at a lower level, inner option policies are learned with a fixed high-level policy. The bounds derived are compared with the lower bound for non-hierarchical finite-horizon problems, allowing us to characterize when a hierarchical approach is provably preferable, even without pre-trained options.

Updated: 2024-06-21 13:17:33

标题: 一个可证明高效的基于选项的算法，适用于高层和低层学习

摘要: 层次强化学习（HRL）方法已经在解决各种复杂、结构化和长期问题方面取得了成功的结果。然而，对这一经验证据的完整理论理解目前尚未形成。在“选项”框架的背景下，先前的研究已经设计出了有效的算法，用于固定选项的情况，高层策略仅需学习选择选项。然而，在高级和低级策略都需要学习的完全现实情境在理论上却被忽视。本文向着理解后一种情况迈出了一步。针对有限时间问题，我们提出了一种元算法，交替使用在不同（高和低）时间抽象水平上实例化的遗憾最小化算法。在更高层次上，我们将问题视为半马尔可夫决策过程（SMDP），其中低级策略固定，而在较低水平上，内部选项策略则与固定高级策略一起学习。推导出的界限与非层次有限时间问题的下界进行了比较，这样可以确定何时层次方法在没有经过预训练选项的情况下可以被证明为更优。

更新时间: 2024-06-21 13:17:33

领域: cs.LG

下载: http://arxiv.org/abs/2406.15124v1

Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning

Offline pretraining with a static dataset followed by online fine-tuning (offline-to-online, or OtO) is a paradigm well matched to a real-world RL deployment process. In this scenario, we aim to find the best-performing policy within a limited budget of online interactions. Previous work in the OtO setting has focused on correcting for bias introduced by the policy-constraint mechanisms of offline RL algorithms. Such constraints keep the learned policy close to the behavior policy that collected the dataset, but we show this can unnecessarily limit policy performance if the behavior policy is far from optimal. Instead, we forgo constraints and frame OtO RL as an exploration problem that aims to maximize the benefit of online data-collection. We first study the major online RL exploration methods based on intrinsic rewards and UCB in the OtO setting, showing that intrinsic rewards add training instability through reward-function modification, and UCB methods are myopic and it is unclear which learned-component's ensemble to use for action selection. We then introduce an algorithm for planning to go out-of-distribution (PTGOOD) that avoids these issues. PTGOOD uses a non-myopic planning procedure that targets exploration in relatively high-reward regions of the state-action space unlikely to be visited by the behavior policy. By leveraging concepts from the Conditional Entropy Bottleneck, PTGOOD encourages data collected online to provide new information relevant to improving the final deployment policy without altering rewards. We show empirically in several continuous control tasks that PTGOOD significantly improves agent returns during online fine-tuning and avoids the suboptimal policy convergence that many of our baselines exhibit in several environments.

Updated: 2024-06-21 13:13:15

标题: 计划在离线到在线强化学习中进行超出分布的行动

摘要: 离线预训练与静态数据集，随后进行在线微调（或称为OtO）是与真实世界RL部署流程相匹配的范式。在这种情况下，我们的目标是在有限的在线交互预算内找到表现最佳的策略。以往在OtO设置中的工作着重于纠正由离线RL算法的策略约束机制引入的偏差。这些约束使得学到的策略与收集数据集的行为策略保持接近，但我们表明如果行为策略远离最优，则这可能会不必要地限制策略性能。相反，我们放弃了约束，将OtO RL框架视为一个旨在最大化在线数据收集利益的探索问题。我们首先在OtO设置中研究基于内在奖励和UCB的主要在线RL探索方法，显示内在奖励通过修改奖励函数增加了训练不稳定性，而UCB方法是短视的，不清楚要用于动作选择的哪个学习组件的合奏。然后，我们介绍了一种用于计划超出分布范围的算法（PTGOOD），避免了这些问题。PTGOOD使用非短视的计划程序，旨在探索在状态-动作空间的相对高奖励区域中的数据采集，这些区域不太可能被行为策略访问。通过利用条件熵瓶颈的概念，PTGOOD鼓励在线收集的数据提供与改进最终部署策略相关的新信息，而不改变奖励。我们在几个连续控制任务的实证研究中表明，PTGOOD在在线微调过程中显著提高了代理回报，并避免了我们许多基线在几个环境中表现出的次优策略收敛问题。

更新时间: 2024-06-21 13:13:15

领域: cs.LG

下载: http://arxiv.org/abs/2310.05723v3

Speech Emotion Recognition under Resource Constraints with Data Distillation

Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment of SER models. To address these challenges, we propose a data distillation framework to facilitate efficient development of SER models in IoT applications using a synthesised, smaller, and distilled dataset. Our experiments demonstrate that the distilled dataset can be effectively utilised to train SER models with fixed initialisation, achieving performances comparable to those developed using the original full emotional speech dataset.

Updated: 2024-06-21 13:10:46

标题: 在资源约束下进行语音情感识别并进行数据精炼

摘要: 言语情感识别（SER）在人机交互中起着至关重要的作用。物联网（IoT）中边缘设备的出现给构建复杂的深度学习模型带来了挑战，因为内存和计算资源受到限制。此外，情感言语数据通常包含私人信息，在部署SER模型时引发对隐私泄露的担忧。为了解决这些挑战，我们提出了一个数据精炼框架，以促进在IoT应用中使用合成、更小、更精炼的数据集高效开发SER模型。我们的实验表明，精炼数据集可以有效地用于训练具有固定初始化的SER模型，其性能与使用原始完整情感言语数据集开发的模型相当。

更新时间: 2024-06-21 13:10:46

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2406.15119v1

Attention as a Hypernetwork

Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training but whose compositions have not. What mechanisms underlie this ability for compositional generalization? By reformulating multi-head attention as a hypernetwork, we reveal that a low-dimensional latent code specifies key-query specific operations. We find empirically that this latent code is highly structured, capturing information about the subtasks performed by the network. Using the framework of attention as a hypernetwork we further propose a simple modification of multi-head linear attention that strengthens the ability for compositional generalization on a range of abstract reasoning tasks. In particular, we introduce a symbolic version of the Raven Progressive Matrices human intelligence test on which we demonstrate how scaling model size and data enables compositional generalization and gives rise to a functionally structured latent code in the transformer.
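
One way to read the reformulation is sketched below for multi-head linear attention (no softmax, no masking). The vector of per-head attention scores for a key-query pair acts as a low-dimensional latent code, and that code mixes fixed per-head matrices into a key-query specific linear map. All function and variable names are illustrative assumptions, and the paper's own notation and proposed modification may differ.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def matmul(left, right):
    columns = list(zip(*right))
    return [[dot(row, column) for column in columns] for row in left]


def add_scaled(total, scale, vector):
    return [t + scale * v for t, v in zip(total, vector)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def latent_code(tokens, w_query, w_key, q, k):
    """Latent code for the pair (q, k): one unnormalised attention score per head."""
    return [dot(matvec(wq, tokens[q]), matvec(wk, tokens[k]))
            for wq, wk in zip(w_query, w_key)]


def per_head_output(tokens, w_query, w_key, w_value, w_out, q):
    """Usual computation: attend within each head, then sum the projected heads."""
    output = [0.0] * len(tokens[0])
    for h in range(len(w_value)):
        mixed = [0.0] * len(w_value[h])
        for k in range(len(tokens)):
            score = latent_code(tokens, w_query, w_key, q, k)[h]
            mixed = add_scaled(mixed, score, matvec(w_value[h], tokens[k]))
        output = add_scaled(output, 1.0, matvec(w_out[h], mixed))
    return output


def hypernetwork_output(tokens, w_query, w_key, w_value, w_out, q):
    """Same result: the latent code generates a linear map applied to each key token."""
    d_model = len(tokens[0])
    head_maps = [matmul(wo, wv) for wo, wv in zip(w_out, w_value)]
    output = [0.0] * d_model
    for k in range(len(tokens)):
        code = latent_code(tokens, w_query, w_key, q, k)
        # Key-query specific weights: a code-weighted sum of the fixed per-head maps.
        generated = [[sum(c * m[i][j] for c, m in zip(code, head_maps))
                      for j in range(d_model)] for i in range(d_model)]
        output = add_scaled(output, 1.0, matvec(generated, tokens[k]))
    return output
```

The two functions agree up to floating-point error because the sums over heads and keys can be exchanged; the second form simply makes explicit that the number of heads bounds the dimensionality of the code that selects the operation.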

Updated: 2024-06-21 13:09:43

标题: 注意力作为一个超网络

摘要: 在某些情况下，Transformer可以推广到在训练过程中可能遇到其组成部分，但其组合尚未遇到的新问题实例。是什么机制支持这种组合泛化能力？通过将多头注意力重新构建为超网络，我们发现一个低维潜在代码指定了关键查询特定操作。我们实证发现这个潜在代码高度结构化，捕捉了网络执行的子任务信息。利用将注意力作为超网络的框架，我们进一步提出了一个简单的多头线性注意力的修改，增强了在一系列抽象推理任务中的组合泛化能力。特别是，我们引入了一个象征性版本的拉文渐进矩阵人类智力测试，通过该测试展示了如何通过扩展模型大小和数据实现组合泛化，并使Transformer中出现功能结构化的潜在代码。

更新时间: 2024-06-21 13:09:43

领域: cs.LG

下载: http://arxiv.org/abs/2406.05816v2

Two Complementary Perspectives to Continual Learning: Ask Not Only What to Optimize, But Also How

Recent years have seen considerable progress in the continual training of deep neural networks, predominantly thanks to approaches that add replay or regularization terms to the loss function to approximate the joint loss over all tasks so far. However, we show that even with a perfect approximation to the joint loss, these approaches still suffer from temporary but substantial forgetting when starting to train on a new task. Motivated by this 'stability gap', we propose that continual learning strategies should focus not only on the optimization objective, but also on the way this objective is optimized. While there is some continual learning work that alters the optimization trajectory (e.g., using gradient projection techniques), this line of research is positioned as an alternative to improving the optimization objective, while we argue it should be complementary. In search of empirical support for our proposition, we perform a series of pre-registered experiments combining replay-approximated joint objectives with gradient projection-based optimization routines. However, this first experimental attempt fails to show clear and consistent benefits. Nevertheless, our conceptual arguments, as well as some of our empirical results, demonstrate the distinctive importance of the optimization trajectory in continual learning, thereby opening up a new direction for continual learning research.
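
As an illustration of what a gradient projection step can look like, here is a minimal sketch in the style of A-GEM: when the gradient of the current update conflicts with a reference gradient computed on replayed data, the conflicting component is removed. The abstract does not say which projection routine the authors used, so this particular rule is an assumption chosen for illustration.

```python
def project_gradient(gradient, reference):
    """Remove from `gradient` the component that points against `reference`.

    gradient:  gradient of the objective being optimised now (list of floats).
    reference: gradient on replayed samples from earlier tasks (list of floats).
    """
    overlap = sum(g * r for g, r in zip(gradient, reference))
    # No conflict (or a zero reference): leave the update untouched.
    if overlap >= 0.0:
        return list(gradient)
    ref_norm_sq = sum(r * r for r in reference)
    # Conflict: subtract the projection onto the reference direction,
    # so the resulting step no longer increases the replay loss to first order.
    return [g - (overlap / ref_norm_sq) * r for g, r in zip(gradient, reference)]
```

The objective being minimised is unchanged by this step; only the direction taken at each update differs, which is the distinction between "what to optimize" and "how" that the abstract draws.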

Updated: 2024-06-21 13:09:07

标题: 持续学习的两种互补视角：不仅要问要优化什么，还要问如何优化

摘要: 近年来，在深度神经网络的持续训练方面取得了相当大的进展，主要得益于采用将重播或正则化项添加到损失函数中以近似迄今为止所有任务的联合损失的方法。然而，我们发现，即使对联合损失进行了完美的近似，这些方法在开始对新任务进行训练时仍然会出现临时但相当大的遗忘。受到这种“稳定性差距”的启发，我们建议持续学习策略不仅应关注优化目标，还应关注优化这一目标的方式。虽然有一些持续学习工作改变了优化轨迹（例如，使用梯度投影技术），这一研究方向被定位为改善优化目标的替代方法，而我们认为它应该是互补的。为了寻找对我们的主张的经验支持，我们进行了一系列预先注册的实验，结合了重播近似的联合目标和基于梯度投影的优化程序。然而，这第一次实验尝试未能显示出明显和一致的好处。尽管如此，我们的概念论证以及我们的一些实证结果表明，在持续学习中，优化轨迹的独特重要性，从而开辟了持续学习研究的新方向。

更新时间: 2024-06-21 13:09:07

领域: cs.LG,cs.AI,cs.CV,stat.ML

下载: http://arxiv.org/abs/2311.04898v2

FA-Net: A Fuzzy Attention-aided Deep Neural Network for Pneumonia Detection in Chest X-Rays

Pneumonia is a respiratory infection caused by bacteria, fungi, or viruses. It affects many people, particularly those in developing or underdeveloped nations with high pollution levels, unhygienic living conditions, overcrowding, and insufficient medical infrastructure. Pneumonia can cause pleural effusion, where fluids fill the lungs, leading to respiratory difficulty. Early diagnosis is crucial to ensure effective treatment and increase survival rates. Chest X-ray imaging is the most commonly used method for diagnosing pneumonia. However, visual examination of chest X-rays can be difficult and subjective. In this study, we have developed a computer-aided diagnosis system for automatic pneumonia detection using chest X-ray images. We have used DenseNet-121 and ResNet50 as the backbone for the binary class (pneumonia and normal) and multi-class (bacterial pneumonia, viral pneumonia, and normal) classification tasks, respectively. We have also implemented a channel-specific spatial attention mechanism, called Fuzzy Channel Selective Spatial Attention Module (FCSSAM), to highlight the specific spatial regions of relevant channels while removing the irrelevant channels of the extracted features by the backbone. We evaluated the proposed approach on a publicly available chest X-ray dataset, using binary and multi-class classification setups. Our proposed method achieves accuracy rates of 97.15% and 79.79% for the binary and multi-class classification setups, respectively. The results of our proposed method are superior to state-of-the-art (SOTA) methods. The code of the proposed model will be available at: https://github.com/AyushRoy2001/FA-Net.

Updated: 2024-06-21 13:08:40

标题: FA-Net：一种模糊注意力辅助深度神经网络，用于胸部X射线检测肺炎

摘要: 肺炎是一种由细菌、真菌或病毒引起的呼吸道感染。它影响许多人，特别是那些生活在污染严重、卫生条件不佳、人口拥挤和医疗基础设施不足的发展中国家或不发达国家的人群。肺炎可能导致胸膜积液，使肺部充满液体，导致呼吸困难。及早诊断对确保有效治疗和提高存活率至关重要。胸部X射线成像是诊断肺炎最常用的方法。然而，对胸部X射线的视觉检查可能会困难且主观。在这项研究中，我们开发了一种计算机辅助诊断系统，用于使用胸部X射线图像自动检测肺炎。我们分别将DenseNet-121和ResNet50作为二元类（肺炎和正常）和多类（细菌性肺炎、病毒性肺炎和正常）分类任务的骨干。我们还实施了一个通道特定的空间注意机制，称为模糊通道选择性空间注意模块（FCSSAM），以突出相关通道的特定空间区域，同时通过骨干去除提取特征的无关通道。我们在公开可用的胸部X射线数据集上评估了所提出的方法，使用二元和多类分类设置。我们所提出的方法在二元和多类分类设置中分别达到了97.15%和79.79%的准确率。我们所提出的方法的结果优于最先进的方法。所提出模型的代码将在以下网址提供：https://github.com/AyushRoy2001/FA-Net。

更新时间: 2024-06-21 13:08:40

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.15117v1

Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology, mechanics, and finance. The primary objective in these applications is to maximize the average reward. Real-world scenarios often necessitate adherence to specific constraints during the learning process. This monograph focuses on the exploration of various model-based and model-free approaches for Constrained RL within the context of average reward Markov Decision Processes (MDPs). The investigation commences with an examination of model-based strategies, delving into two foundational methods - optimism in the face of uncertainty and posterior sampling. Subsequently, the discussion transitions to parametrized model-free approaches, where the primal-dual policy gradient-based algorithm is explored as a solution for constrained MDPs. The monograph provides regret guarantees and analyzes constraint violation for each of the discussed setups. For the above exploration, we assume the underlying MDP to be ergodic. Further, this monograph extends its discussion to encompass results tailored for weakly communicating MDPs, thereby broadening the scope of its findings and their relevance to a wider range of practical scenarios.

Updated: 2024-06-21 13:04:50

标题: 受限制的强化学习与平均奖励目标：基于模型和无模型算法

摘要: 强化学习（RL）作为一个用于顺序决策的通用框架，在机器人技术、自动驾驶、推荐系统、供应链优化、生物学、力学和金融等各个领域都有应用。这些应用的主要目标是最大化平均奖励。现实世界中的场景通常需要在学习过程中遵守特定的约束条件。本专著重点探讨了在平均奖励马尔可夫决策过程（MDPs）的背景下，针对受限RL的各种基于模型和无模型方法。调查始于对基于模型的策略的研究，深入探讨了两种基础方法 - 在面对不确定性时的乐观主义和后验抽样。随后，讨论转向参数化的无模型方法，其中以原始-对偶策略梯度算法被探讨作为受限MDPs的解决方案。本专著提供了后悔保证，并分析了每个讨论设置的约束违规情况。在上述探索中，我们假设基础MDP是遍历的。此外，本专著将讨论扩展到适用于弱通信MDPs的结果，从而扩大了其发现的范围及其与更广泛实际场景的相关性。

更新时间: 2024-06-21 13:04:50

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.11481v2

A Dual Attention-aided DenseNet-121 for Classification of Glaucoma from Fundus Images

Deep learning and computer vision methods are nowadays predominantly used in the field of ophthalmology. In this paper, we present an attention-aided DenseNet-121 for classifying normal and glaucomatous eyes from fundus images. It involves the convolutional block attention module to highlight relevant spatial and channel features extracted by DenseNet-121. The channel recalibration module further enriches the features by utilizing edge information along with the statistical features of the spatial dimension. For the experiments, two standard datasets, namely RIM-ONE and ACRIMA, have been used. Our method has shown results superior to those of state-of-the-art models. An ablation study has also been conducted to show the effectiveness of each of the components. The code of the proposed work is available at: https://github.com/Soham2004GitHub/DADGC.

Updated: 2024-06-21 13:00:46

标题: 一种双重关注辅助的DenseNet-121用于视网膜底像中青光眼的分类

摘要: 深度学习和计算机视觉方法如今主要用于眼科领域。在本文中，我们提出了一种辅助关注的DenseNet-121模型，用于从眼底图像中对正常眼和青光眼眼进行分类。它涉及卷积块关注模块，用于突出由DenseNet-121提取的相关空间和通道特征。通道重新校准模块通过利用边缘信息以及空间维度的统计特征进一步丰富了特征。在实验中，使用了两个标准数据集，即RIM-ONE和ACRIMA。我们的方法表现出比现有模型更好的结果。还进行了消融研究，展示了每个组件的有效性。提出工作的代码可在以下网址找到：https://github.com/Soham2004GitHub/DADGC。

更新时间: 2024-06-21 13:00:46

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.15113v1

Investigating the impact of 2D gesture representation on co-speech gesture generation

Co-speech gestures play a crucial role in the interactions between humans and embodied conversational agents (ECA). Recent deep learning methods enable the generation of realistic, natural co-speech gestures synchronized with speech, but such approaches require large amounts of training data. "In-the-wild" datasets, which compile videos from sources such as YouTube through human pose detection models, offer a solution by providing 2D skeleton sequences that are paired with speech. Concurrently, innovative lifting models have emerged, capable of transforming these 2D pose sequences into their 3D counterparts, leading to large and diverse datasets of 3D gestures. However, the derived 3D pose estimation is essentially a pseudo-ground truth, with the actual ground truth being the 2D motion data. This distinction raises questions about the impact of gesture representation dimensionality on the quality of generated motions, a topic that, to our knowledge, remains largely unexplored. In this work, we evaluate the impact of the dimensionality of the training data, 2D or 3D joint coordinates, on the performance of a multimodal speech-to-gesture deep generative model. We use a lifting model to convert 2D-generated sequences of body pose to 3D. Then, we compare the sequence of gestures generated directly in 3D to the gestures generated in 2D and lifted to 3D as post-processing.

Updated: 2024-06-21 12:59:20

标题: 研究二维手势表示对语言协同手势生成的影响

摘要: 手语在人类和具身体对话代理（ECA）之间的互动中起着至关重要的作用。最近的深度学习方法使得能够生成与语音同步的逼真、自然的共语手势，但这些方法需要大量的训练数据。通过人体姿势检测模型从YouTube等来源编译视频的“野外”数据集提供了一种解决方案，其中包含与语音配对的2D骨架序列。同时，出现了创新的抬升模型，能够将这些2D姿势序列转换为它们的3D对应物，从而产生大量丰富的3D手势数据集。然而，所得的3D姿势估计实质上是一种伪地面实况，实际地面实况是2D运动数据。这种区别引发了有关手势表示维度对生成动作质量的影响的问题，这是一个据我们所知，尚未被广泛探讨的话题。在这项工作中，我们评估了训练数据维度（2D或3D关节坐标）对多模式语音到手势深度生成模型性能的影响。我们使用一个抬升模型将2D生成的身体姿势序列转换为3D。然后，我们将直接生成的3D手势序列与在2D中生成并抬升至3D进行后处理的手势序列进行比较。

更新时间: 2024-06-21 12:59:20

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2406.15111v1

Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network

Large Language Models (LLMs) have been shown to be effective models of the human language system, with some models predicting most explainable variance of brain activity in current datasets. Even in untrained models, the representations induced by architectural priors can exhibit reasonable alignment to brain data. In this work, we investigate the key architectural components driving the surprising alignment of untrained models. To estimate LLM-to-brain similarity, we first select language-selective units within an LLM, similar to how neuroscientists identify the language network in the human brain. We then benchmark the brain alignment of these LLM units across five different brain recording datasets. By isolating critical components of the Transformer architecture, we identify tokenization strategy and multihead attention as the two major components driving brain alignment. A simple form of recurrence further improves alignment. We further demonstrate this quantitative brain alignment of our model by reproducing landmark studies in the language neuroscience field, showing that localized model units -- just like language voxels measured empirically in the human brain -- discriminate more reliably between lexical than syntactic differences, and exhibit similar response profiles under the same experimental conditions. Finally, we demonstrate the utility of our model's representations for language modeling, achieving improved sample and parameter efficiency over comparable architectures. Our model's estimates of surprisal set a new state-of-the-art in the behavioral alignment to human reading times. Taken together, we propose a highly brain- and behaviorally-aligned model that conceptualizes the human language system as an untrained shallow feature encoder, with structural priors, combined with a trained decoder to achieve efficient and performant language processing.

Updated: 2024-06-21 12:54:03

标题: 通过浅层未经训练的多头注意力网络实现类似大脑的语言处理

摘要: 大型语言模型（LLMs）已经被证明是人类语言系统的有效模型，一些模型能够预测当前数据集中大部分可解释的脑活动变化。即使在未经训练的模型中，由架构先验引发的表示可以与脑数据合理对齐。在这项工作中，我们调查了驱动未经训练模型惊人对齐的关键架构组件。为了估计LLM与脑的相似性，我们首先在LLM中选择语言选择性单元，类似于神经科学家在人类大脑中识别语言网络的方式。然后，我们在五个不同的脑记录数据集中基准测试这些LLM单元的脑对齐。通过隔离Transformer架构的关键组件，我们确定了分词策略和多头注意力作为驱动脑对齐的两个主要组件。简单的循环形式进一步改善了对齐。我们通过重现语言神经科学领域的重要研究，展示了我们模型的定量脑对齐，表明定位的模型单元 - 就像在人类大脑中经验性测量的语言体素一样 - 在词汇而不是句法差异之间更可靠地区分，并在相同实验条件下呈现类似的响应特征。最后，我们展示了我们模型表示在语言建模中的实用性，实现了比可比架构更好的样本和参数效率。我们模型对惊讶度的估计在与人类阅读时间的行为对齐方面设立了新的技术水平。综上所述，我们提出了一个高度与大脑和行为对齐的模型，将人类语言系统构想为一个未经训练的浅特征编码器，具有结构先验，结合训练后的解码器实现高效和高性能的语言处理。

更新时间: 2024-06-21 12:54:03

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.15109v1

NeuroCUT: A Neural Approach for Robust Graph Partitioning

Graph partitioning aims to divide a graph into disjoint subsets while optimizing a specific partitioning objective. The majority of formulations related to graph partitioning exhibit NP-hardness due to their combinatorial nature. Conventional methods, like approximation algorithms or heuristics, are designed for distinct partitioning objectives and fail to generalize to other important partitioning objectives. Recently, machine learning-based methods have been developed that learn directly from data. Further, these methods have the distinct advantage of utilizing node features that carry additional information. However, these methods assume differentiability of the target partitioning objective functions and cannot generalize to an unknown number of partitions, i.e., they assume the number of partitions is provided in advance. In this study, we develop NeuroCUT with two key innovations over previous methodologies. First, by leveraging a reinforcement learning-based framework over node representations derived from a graph neural network and positional features, NeuroCUT can accommodate any optimization objective, even those with non-differentiable functions. Second, we decouple the parameter space and the partition count, making NeuroCUT inductive to any unseen number of partitions, which is provided at query time. Through empirical evaluation, we demonstrate that NeuroCUT excels in identifying high-quality partitions, showcases strong generalization across a wide spectrum of partitioning objectives, and exhibits strong generalization to unseen partition counts.
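To make the notion of a "partitioning objective" in the abstract above concrete, here is a minimal sketch of two common objectives, edge cut and normalized cut, evaluated for a given assignment of nodes to parts. The function names and the edge-list representation are assumptions chosen for illustration; this is not NeuroCUT's code, only the kind of quantity a partitioner of this type would try to minimize.

```python
def edge_cut(edges, assignment):
    """Total weight of edges whose endpoints lie in different parts."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


def normalized_cut(edges, assignment):
    """Sum over parts of cut(part) / volume(part).

    volume(part) is the sum of weighted degrees of the nodes in the part.
    """
    cut = {}
    volume = {}
    for u, v, w in edges:
        pu, pv = assignment[u], assignment[v]
        # Each edge contributes its weight to the degree of both endpoints.
        volume[pu] = volume.get(pu, 0.0) + w
        volume[pv] = volume.get(pv, 0.0) + w
        if pu != pv:
            cut[pu] = cut.get(pu, 0.0) + w
            cut[pv] = cut.get(pv, 0.0) + w
    return sum(cut.get(part, 0.0) / vol for part, vol in volume.items())


# Hypothetical toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
]
toy_assignment = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Both functions take the number of parts implicitly from the assignment, so the same evaluation works for any partition count.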

Updated: 2024-06-21 12:53:43

标题: NeuroCUT：一种用于稳健图分区的神经方法

摘要: 图分区旨在将图分割为不相交的子集，同时优化特定的分区目标。与图分区相关的大多数公式由于其组合性质而表现出NP难度。传统方法，如近似算法或启发式方法，旨在针对不同的分区目标进行设计，无法实现对其他重要分区目标的泛化。最近发展了基于机器学习的方法，可以直接从数据中学习。此外，这些方法具有利用携带附加信息的节点特征的明显优势。然而，这些方法假设目标分区目标函数可微分，并且无法推广到未知数量的分区，即假设分区的数量提前提供。在这项研究中，我们开发了NeuroCUT，相较于先前的方法具有两个关键创新。首先，通过利用基于强化学习的框架，可以从图神经网络和位置特征导出的节点表示中，NeuroCUT可以适应任何优化目标，甚至那些具有不可微分函数的目标。其次，我们解耦了参数空间和分区计数，使NeuroCUT适用于任何未知数量的分区，这在查询时提供。通过实证评估，我们证明NeuroCUT在识别高质量分区方面表现出色，展示了对一系列分区目标的强大泛化能力，并且对未知分区数量也具有强大的泛化能力。

更新时间: 2024-06-21 12:53:43

领域: cs.LG

下载: http://arxiv.org/abs/2310.11787v3

Deciphering the Definition of Adversarial Robustness for post-hoc OOD Detectors

Detecting out-of-distribution (OOD) inputs is critical for safely deploying deep learning models in real-world scenarios. In recent years, many OOD detectors have been developed, and even the benchmarking has been standardized, e.g., in OpenOOD. The number of post-hoc detectors is growing fast; they offer an option to protect a pre-trained classifier against natural distribution shifts and claim to be ready for real-world scenarios. However, their efficacy in handling adversarial examples has been neglected in the majority of studies. This paper investigates the adversarial robustness of 16 post-hoc detectors against several evasion attacks and discusses a roadmap towards adversarial defense in OOD detectors.

Updated: 2024-06-21 12:45:07

标题: 解密后处理OOD检测器的对抗性鲁棒性定义

摘要: 在真实世界场景中安全部署深度学习模型的关键是检测到分布外（OOD）输入。近年来，许多OOD检测器已经被开发出来，甚至已经标准化了基准测试，即OpenOOD。后续检测器的数量正在迅速增长，并提供了一种保护预训练分类器免受自然分布变化影响的选择，声称已经准备好应对真实世界场景。然而，大多数研究中忽视了对抗性示例的处理效果。本文探讨了16种后续检测器在几种规避攻击中的对抗性鲁棒性，并讨论了在OOD检测器中朝向对抗性防御的路线图。

更新时间: 2024-06-21 12:45:07

领域: cs.CR,cs.CV

下载: http://arxiv.org/abs/2406.15104v1

Federated Learning over Connected Modes

Statistical heterogeneity in federated learning poses two major challenges: slow global training due to conflicting gradient signals, and the need for personalization to local distributions. In this work, we tackle both challenges by leveraging recent advances in linear mode connectivity -- identifying a linearly connected low-loss region in the weight space of neural networks, which we call the solution simplex. We propose federated learning over connected modes (Floco), where clients are assigned local subregions in this simplex based on their gradient signals, and together learn the shared global solution simplex. This allows personalization of the client models to fit their local distributions within the degrees of freedom in the solution simplex and homogenizes the update signals for the global simplex training. Our experiments show that Floco accelerates the global training process, and significantly improves the local accuracy with minimal computational overhead.

Updated: 2024-06-21 12:43:12

标题: 连接模式下的联邦学习

摘要: 在联邦学习中的统计异质性存在两个主要挑战：由于梯度信号冲突导致全局训练缓慢，以及需要个性化处理本地分布。在这项工作中，我们通过利用最近在线性模式连接方面的进展来解决这两个挑战 - 在神经网络的权重空间中识别出一个线性连接的低损失区域，我们称之为解决方案简单形。我们提出了基于连接模式的联邦学习（Floco），其中根据它们的梯度信号将客户端分配到这个简单形中的本地子区域，并一起学习共享的全局解决方案简单形。这允许客户端模型个性化地适应其本地分布在解决方案简单形中的自由度范围内，并使全局简单形训练的更新信号同质化。我们的实验表明，Floco加速了全局训练过程，并显著提高了本地准确性，且计算开销最小。

更新时间: 2024-06-21 12:43:12

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2403.03333v2

Finding (and exploiting) vulnerabilities on IP Cameras: the Tenda CP3 case study

Consumer IP cameras are now the most widely adopted solution for remote monitoring in various contexts, such as private homes or small offices. While the security of these devices has been scrutinized, most approaches are limited to relatively shallow network-based analyses. In this paper, we discuss a methodology for the security analysis and identification of remotely exploitable vulnerabilities in IP cameras, which includes static and dynamic analyses of executables extracted from IP camera firmware. Compared to existing methodologies, our approach leverages the context of the target device to focus on the identification of malicious invocation sequences that could lead to exploitable vulnerabilities. We demonstrate the application of our methodology by using the Tenda CP3 IP camera as a case study. We identified five novel CVEs, with CVSS scores ranging from 7.5 to 9.8. To partially automate our analysis, we also developed a custom tool based on Ghidra and rhabdomancer.

Updated: 2024-06-21 12:41:48

标题: 发现（和利用）IP相机的漏洞：Tenda CP3案例研究

摘要: 消费者IP摄像头现在是远程监控在各种情境中最广泛采用的解决方案，如私人住宅或小型办公室。尽管这些设备的安全性受到了审查，但大多数方法都限于相对浅层的基于网络的分析。在本文中，我们讨论了一种用于安全分析和识别IP摄像头中存在远程利用漏洞的方法论，其中包括从IP摄像头固件中提取的可执行文件的静态和动态分析。与现有方法论相比，我们的方法利用目标设备的上下文，侧重于识别可能导致可利用漏洞的恶意调用序列。我们通过以Tenda CP3 IP摄像头为案例研究来展示我们方法的应用。我们识别出五个新的CVE，CVSS评分范围从7.5到9.8不等。为了部分自动化我们的分析，我们还开发了一个基于Ghidra和rhabdomancer的自定义工具。

更新时间: 2024-06-21 12:41:48

领域: cs.CR

下载: http://arxiv.org/abs/2406.15103v1

HLQ: Fast and Efficient Backpropagation via Hadamard Low-rank Quantization

With the rapid increase in model size and the growing importance of various fine-tuning applications, lightweight training has become crucial. Since the backward pass is twice as expensive as the forward pass, optimizing backpropagation is particularly important. However, modifications to this process can lead to suboptimal convergence, so training optimization should minimize perturbations, which is a highly challenging task. In this study, we introduce a novel optimization strategy called Hadamard Low-rank Quantization (HLQ), focusing on reducing the cost of backpropagation in convolutional and linear layers. We first analyze the sensitivity of gradient computation with respect to activation and weight, and judiciously design the HLQ pipeline to apply 4-bit Hadamard quantization to the activation gradient and Hadamard low-rank approximation to the weight gradient. This combination was found to be the best for maximizing benefits, and our extensive experiments demonstrate the outstanding performance of HLQ in both training from scratch and fine-tuning, achieving significant memory savings and acceleration on real GPUs with negligible quality degradation.
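The 4-bit Hadamard quantization mentioned in the abstract above can be illustrated with a generic sketch: rotate a vector with an orthonormal Walsh-Hadamard transform, quantize the rotated values to a small signed integer range, and rotate back. Everything below (function names, the symmetric [-7, 7] code range, the single shared scale) is an assumption made for illustration and is not taken from the HLQ paper; the low-rank part of the method is not shown.

```python
import math


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (its own inverse).

    The input length must be a power of two.
    """
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def quantize_4bit(values):
    """Symmetric uniform quantization to integer codes in [-7, 7]."""
    peak = max(abs(v) for v in values)
    scale = peak / 7.0 if peak > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale


def hadamard_quantize_roundtrip(values):
    """Rotate, quantize, dequantize, and rotate back."""
    rotated = fwht(values)
    codes, scale = quantize_4bit(rotated)
    restored = fwht([c * scale for c in codes])
    return codes, restored
```

The rotation spreads a single large entry across all coordinates, which is the usual motivation for applying a Hadamard transform before low-bit quantization.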

Updated: 2024-06-21 12:41:41

标题: HLQ：通过Hadamard低秩量化实现快速高效的反向传播

摘要: 随着模型规模的快速增加和各种微调应用的日益重要，轻量级训练变得至关重要。由于反向传播比正向传播昂贵一倍，优化反向传播尤为重要。然而，对这一过程的修改可能导致次优收敛，因此训练优化应该最小化扰动，这是一个极具挑战性的任务。在本研究中，我们介绍了一种新颖的优化策略，称为Hadamard低秩量化（HLQ），重点是减少卷积和线性层中反向传播的成本。我们首先分析了梯度计算对激活和权重的敏感性，并精心设计了HLQ流水线，将4位Hadamard量化应用于激活梯度，并将Hadamard低秩逼近应用于权重梯度。这种组合被发现对最大化收益最为有效，我们的大量实验表明了HLQ在从头开始训练和微调方面的出色表现，实现了在真实GPU上显著的内存节省和加速，质量降低可忽略不计。

更新时间: 2024-06-21 12:41:41

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.15102v1

Information Guided Regularization for Fine-tuning Language Models

The pretraining-fine-tuning paradigm has been the de facto strategy for transfer learning in modern language modeling. With the understanding that task adaptation in LMs is often a function of parameters shared across tasks, we argue that a more surgical approach to regularization needs to exist for smoother transfer learning. Towards this end, we investigate how the pretraining loss landscape is affected by these task-sensitive parameters through an information-theoretic lens. We then leverage the findings from our investigations to devise a novel approach to dropout for improved model regularization and better downstream generalization. This approach, named guided dropout, is both task- and architecture-agnostic and adds no computational overhead to the fine-tuning process. Through empirical evaluations, we showcase that our approach to regularization yields consistently better performance, even in scenarios of data paucity, compared to standardized baselines.

Updated: 2024-06-21 12:41:17

标题: 细调语言模型的信息引导正则化

摘要: 预训练-微调范式一直是现代语言建模中迁移学习的事实战略。鉴于语言模型中任务适应通常是跨任务共享参数的功能，我们认为需要存在一种更精细的正则化方法，以实现更顺畅的迁移学习。为此，我们通过信息论视角研究了预训练损失景观如何受到这些任务敏感参数的影响。然后，我们利用我们的研究结果，设计了一种用于改进模型正则化和改善下游泛化性能的新方法。这种方法被称为引导式dropout，既不受任务和架构的影响，也不会给微调过程增加计算开销。通过实证评估，我们展示了我们的正则化方法相对于标准基线在数据匮乏场景中都能持续提供更好的性能。

更新时间: 2024-06-21 12:41:17

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.14005v2

How Intermodal Interaction Affects the Performance of Deep Multimodal Fusion for Mixed-Type Time Series

Mixed-type time series (MTTS) is a bimodal data type that is common in many domains, such as healthcare, finance, environmental monitoring, and social media. It consists of regularly sampled continuous time series and irregularly sampled categorical event sequences. The integration of both modalities through multimodal fusion is a promising approach for processing MTTS. However, the question of how to effectively fuse both modalities remains open. In this paper, we present a comprehensive evaluation of several deep multimodal fusion approaches for MTTS forecasting. Our comparison includes three fusion types (early, intermediate, and late) and five fusion methods (concatenation, weighted mean, weighted mean with correlation, gating, and feature sharing). We evaluate these fusion approaches on three distinct datasets, one of which was generated using a novel framework. This framework allows for the control of key data properties, such as the strength and direction of intermodal interactions, modality imbalance, and the degree of randomness in each modality, providing a more controlled environment for testing fusion approaches. Our findings show that the performance of different fusion approaches can be substantially influenced by the direction and strength of intermodal interactions. The study reveals that early and intermediate fusion approaches excel at capturing fine-grained and coarse-grained cross-modal features, respectively. These findings underscore the crucial role of intermodal interactions in determining the most effective fusion strategy for MTTS forecasting.

Updated: 2024-06-21 12:26:48

标题: 跨模态交互如何影响深度多模态融合在混合类型时间序列中的性能

摘要: 混合型时间序列（MTTS）是一种双峰数据类型，在许多领域中很常见，如医疗保健、金融、环境监测和社交媒体。它由定期采样的连续时间序列和不定期采样的分类事件序列组成。通过多模态融合整合两种模态是处理MTTS的一种有前途的方法。然而，如何有效地融合两种模态的问题仍然悬而未决。在本文中，我们对几种深度多模态融合方法进行了全面评估，用于MTTS预测。我们的比较包括三种融合类型（早期、中间和晚期）和五种融合方法（串联、加权平均、带相关性的加权平均、门控和特征共享）。我们在三个不同的数据集上评估了这些融合方法，其中一个是使用新框架生成的。该框架允许控制关键数据属性，如模态间相互作用的强度和方向、模态不平衡以及每个模态中随机性的程度，为测试融合方法提供了更受控制的环境。我们的研究结果表明，不同的融合方法的性能可以受到模态间相互作用的方向和强度的显著影响。研究揭示了早期和中间融合方法在捕捉细粒度和粗粒度跨模态特征方面表现卓越。这些发现强调了模态间相互作用在确定最有效的MTTS预测融合策略中的关键作用。

更新时间: 2024-06-21 12:26:48

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.15098v1

Towards General Negotiation Strategies with End-to-End Reinforcement Learning

The research field of automated negotiation has a long history of designing agents that can negotiate with other agents. Such negotiation strategies are traditionally based on manual design and heuristics. More recently, reinforcement learning approaches have also been used to train agents to negotiate. However, negotiation problems are diverse, causing observation and action dimensions to change, which cannot be handled by default linear policy networks. Previous work on this topic has circumvented this issue either by fixing the negotiation problem, causing policies to be non-transferable between negotiation problems, or by abstracting the observations and actions into fixed-size representations, causing loss of information and expressiveness due to feature design. We developed an end-to-end reinforcement learning method for diverse negotiation problems by representing observations and actions as a graph and applying graph neural networks in the policy. With empirical evaluations, we show that our method is effective and that we can learn to negotiate with other agents on never-before-seen negotiation problems. Our result opens up new opportunities for reinforcement learning in negotiation agents.

Updated: 2024-06-21 12:24:36

标题: 朝向利用端到端强化学习实现一般谈判策略

摘要: 自动化谈判领域的研究历史悠久，设计能够与其他代理商进行谈判的代理商。这种谈判策略传统上基于手工设计和启发式。最近，强化学习方法也被用来训练代理商进行谈判。然而，谈判问题多样，导致观察和行动维度发生变化，这不能通过默认的线性策略网络处理。先前关于这个主题的工作要么通过固定谈判问题来绕过这个问题，导致策略在不同谈判问题之间不可转移，要么通过将观察和行动抽象成固定大小的表示来绕过这个问题，导致由于特征设计而导致信息丢失和表达能力不足。我们开发了一种端到端的强化学习方法，通过将观察和行动表示为图，并在策略中应用图神经网络，来解决多样的谈判问题。通过实证评估，我们展示了我们的方法是有效的，我们可以学会与其他代理商在以前未见过的谈判问题上进行谈判。我们的结果为强化学习在谈判代理商中开辟了新的机会。

更新时间: 2024-06-21 12:24:36

领域: cs.MA,cs.LG,I.2.11; I.2.6

下载: http://arxiv.org/abs/2406.15096v1

ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

Clean-label indiscriminate poisoning attacks add invisible perturbations to correctly labeled training images, thus dramatically reducing the generalization capability of the victim models. Recently, some defense mechanisms have been proposed such as adversarial training, image transformation techniques, and image purification. However, these schemes are either susceptible to adaptive attacks, built on unrealistic assumptions, or only effective against specific poison types, limiting their universal applicability. In this research, we propose a more universally effective, practical, and robust defense scheme called ECLIPSE. We first investigate the impact of Gaussian noise on the poisons and theoretically prove that any kind of poison will be largely assimilated when imposing sufficient random noise. In light of this, we assume the victim has access to an extremely limited number of clean images (a more practical scene) and subsequently enlarge this sparse set for training a denoising probabilistic model (a universal denoising tool). We then begin by introducing Gaussian noise to absorb the poisons and then apply the model for denoising, resulting in a roughly purified dataset. Finally, to address the trade-off of the inconsistency in the assimilation sensitivity of different poisons by Gaussian noise, we propose a lightweight corruption compensation module to effectively eliminate residual poisons, providing a more universal defense approach. Extensive experiments demonstrate that our defense approach outperforms 10 state-of-the-art defenses. We also propose an adaptive attack against ECLIPSE and verify the robustness of our defense scheme. Our code is available at https://github.com/CGCL-codes/ECLIPSE.

Updated: 2024-06-21 12:14:24

标题: ECLIPSE: 通过稀疏扩散净化清洁标签不加选择性毒物

摘要: 干净标签的不加选择性中毒攻击向正确标记的训练图像添加了看不见的扰动，从而大大降低了受害模型的泛化能力。最近，一些防御机制被提出，如对抗训练、图像转换技术和图像净化。然而，这些方案要么容易受到自适应攻击的影响，要么建立在不切实际的假设基础上，要么只对特定毒素类型有效，限制了它们的普适性。在这项研究中，我们提出了一种更具普遍有效性、实用性和强大防御方案，称为ECLIPSE。我们首先研究了高斯噪声对毒素的影响，并理论证明了当施加足够的随机噪声时，任何类型的毒素都将被大量同化。基于此，我们假设受害者只能访问极少量的清洁图像（更实际的情况），随后扩大这个稀疏集合以训练一个去噪概率模型（一种通用去噪工具）。我们首先引入高斯噪声来吸收毒素，然后应用该模型进行去噪，从而得到一个大致净化的数据集。最后，为了解决高斯噪声对不同毒素同化敏感性的不一致性的权衡，我们提出了一个轻量级的损坏补偿模块，有效消除残留毒素，提供了一种更普遍的防御方法。大量实验表明我们的防御方法优于10种最先进的防御方法。我们还提出了一种针对ECLIPSE的自适应攻击，并验证了我们防御方案的鲁棒性。我们的代码可在https://github.com/CGCL-codes/ECLIPSE 上找到。

更新时间: 2024-06-21 12:14:24

领域: cs.CR,cs.CV,eess.IV

下载: http://arxiv.org/abs/2406.15093v1

Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities

Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $\pi$ as a perturbation of a given reference measure $\mu$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Gaussian, as commonly arising in generative modeling. Our method extends prior work on minimizing majorizations of the Kullback--Leibler divergence to identify optimal approximations within this class of measures. Our main contribution unveils a connection between the dimensional logarithmic Sobolev inequality (LSI) and approximations with this ansatz. Specifically, when the target and reference are both Gaussian, we show that minimizing the dimensional LSI is equivalent to minimizing the KL divergence restricted to this ansatz. For general non-Gaussian measures, the dimensional LSI produces majorants that uniformly improve on previous majorants for gradient-based dimension reduction. We further demonstrate the applicability of this analysis to the squared Hellinger distance, where analogous reasoning shows that the dimensional Poincaré inequality offers improved bounds.

Updated: 2024-06-21 12:09:38

标题: 通过维数对数Sobolev不等式对概率测度中的低维结构进行尖锐检测

摘要: 在高维概率测度中识别低维结构是高效抽样的关键预处理步骤。我们介绍了一种方法，用于识别并逼近目标测度$\pi$作为给定参考测度$\mu$在$\mathbb{R}^{d}$的几个重要方向上的扰动。参考测度可以是高斯分布或高斯分布的非线性变换，通常出现在生成模型中。我们的方法扩展了先前关于最小化Kullback-Leibler散度的主导方法，以识别在这类测度中的最佳逼近。我们的主要贡献揭示了\emph{维度}对数Sobolev不等式（LSI）与这种假设的逼近之间的联系。具体来说，当目标和参考都是高斯分布时，我们表明最小化维度LSI等同于最小化限制在这种假设上的KL散度。对于一般的非高斯测度，维度LSI产生的主导函数在梯度为基础的降维中统一改进之前的主导函数。我们进一步证明了这种分析对于平方Hellinger距离的适用性，类似推理表明维度Poincar\'e不等式提供了改进的界限。

更新时间: 2024-06-21 12:09:38

领域: stat.ML,cs.LG,math.PR,math.ST,stat.CO,stat.TH

下载: http://arxiv.org/abs/2406.13036v2

Transferability of Graph Neural Networks using Graphon and Sampling Theories

Graph neural networks (GNNs) have become powerful tools for processing graph-based information in various domains. A desirable property of GNNs is transferability, where a trained network can swap in information from a different graph without retraining and retain its accuracy. A recent method of capturing transferability of GNNs is through the use of graphons, which are symmetric, measurable functions representing the limit of large dense graphs. In this work, we contribute to the application of graphons to GNNs by presenting an explicit two-layer graphon neural network (WNN) architecture. We prove its ability to approximate bandlimited graphon signals within a specified error tolerance using a minimal number of network weights. We then leverage this result to establish the transferability of an explicit two-layer GNN over all sufficiently large graphs in a convergent sequence. Our work addresses transferability between both deterministic weighted graphs and simple random graphs and overcomes issues related to the curse of dimensionality that arise in other GNN results. The proposed WNN and GNN architectures offer practical solutions for handling graph data of varying sizes while maintaining performance guarantees without extensive retraining.

Updated: 2024-06-21 11:56:45

标题: 使用图论和抽样理论的图神经网络的可转移性

摘要: 图神经网络（GNNs）已经成为各个领域处理基于图的信息的强大工具。 GNNs的一个理想特性是可转移性，即经过训练的网络可以在不重新训练的情况下交换来自不同图的信息并保持其准确性。最近一种捕获GNNs可转移性的方法是通过使用图渐近，这些图渐近是对大型密集图的极限表示的对称可测函数。在这项工作中，我们通过提出一个显式的两层图渐近神经网络（WNN）架构，为图渐近在GNNs中的应用做出贡献。我们证明了它能够使用最少数量的网络权重来近似有限带宽的图渐近信号，并在指定的误差容限内实现。然后，我们利用这一结果，在一个收敛序列中建立一个显式的两层GNN在所有足够大的图之间的可转移性。我们的工作解决了在其他GNN结果中出现的与维度诅咒相关的问题，从而实现了确定性加权图和简单随机图之间的可转移性。所提出的WNN和GNN架构为处理各种大小的图数据提供了实用解决方案，同时保持性能保证，无需进行大量的重新训练。

更新时间: 2024-06-21 11:56:45

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2307.13206v2

GOAL: A Generalist Combinatorial Optimization Agent Learner

Machine Learning-based heuristics have recently shown impressive performance in solving a variety of hard combinatorial optimization problems (COPs). However, they generally rely on a separate neural model, specialized and trained for each single problem. Any variation of a problem requires adjustment of its model and re-training from scratch. In this paper, we propose GOAL (for Generalist combinatorial Optimization Agent Learning), a generalist model capable of efficiently solving multiple COPs and which can be fine-tuned to solve new COPs. GOAL consists of a single backbone plus light-weight problem-specific adapters, mostly for input and output processing. The backbone is based on a new form of mixed-attention blocks which allow it to handle problems defined on graphs with arbitrary combinations of node, edge and instance-level features. Additionally, problems which involve heterogeneous nodes or edges, such as in multi-partite graphs, are handled through a novel multi-type transformer architecture, where the attention blocks are duplicated to attend only to the relevant combination of types while relying on the same shared parameters. We train GOAL on a set of routing, scheduling and classic graph problems and show that it is only slightly inferior to the specialized baselines while being the first multi-task model that solves a variety of COPs. Finally, we showcase the strong transfer learning capacity of GOAL by fine-tuning or learning the adapters for new problems, with only few shots and little data.

Updated: 2024-06-21 11:55:20

标题: 目标：通用组合优化智能学习代理

摘要: 基于机器学习的启发式方法最近在解决各种难以优化的组合问题（COPs）中表现出令人印象深刻的性能。然而，它们通常依赖于单独的神经模型，专门针对每个单独的问题进行专门训练。问题的任何变化都需要调整其模型并从头开始重新训练。在本文中，我们提出了GOAL（通用组合优化代理学习）模型，它能够高效地解决多个COPs，并且可以进行微调以解决新的COPs。GOAL由一个单一的主干加上轻量级的问题特定适配器组成，主要用于输入和输出处理。主干基于一种新形式的混合注意力块，这种块可以处理在图上定义的具有节点、边和实例级特征任意组合的问题。此外，涉及异构节点或边的问题，例如在多部分图中，通过一种新颖的多类型转换器架构来处理，其中注意力块被复制以仅关注相关类型的组合，同时依赖于相同的共享参数。我们在一组路由、调度和经典图问题上训练GOAL，并展示它与专门基线模型略有劣势，同时也是第一个解决各种COPs的多任务模型。最后，我们展示了GOAL强大的迁移学习能力，通过少量数据和少量数据进行微调或学习适配器来解决新问题。

更新时间: 2024-06-21 11:55:20

领域: cs.LG

下载: http://arxiv.org/abs/2406.15079v1

Supersonic OT: Fast Unconditionally Secure Oblivious Transfer

Oblivious Transfer (OT) is a fundamental cryptographic protocol with applications in secure Multi-Party Computation, Federated Learning, and Private Set Intersection. With the advent of quantum computing, it is crucial to develop unconditionally secure core primitives like OT to ensure their continued security in the post-quantum era. Despite over four decades since OT's introduction, the literature has predominantly relied on computational assumptions, except in cases using unconventional methods like noisy channels or a fully trusted party. Introducing "Supersonic OT", a highly efficient and unconditionally secure OT scheme that avoids public-key-based primitives, we offer an alternative to traditional approaches. Supersonic OT enables a receiver to obtain a response of size O(1). Its simple (yet non-trivial) design facilitates easy security analysis and implementation. The protocol employs a basic secret-sharing scheme, controlled swaps, the one-time pad, and a third-party helper who may be corrupted by a semi-honest adversary. Our implementation and runtime analysis indicate that a single instance of Supersonic OT completes in 0.35 milliseconds, making it up to 2000 times faster than the state-of-the-art base OT.

Updated: 2024-06-21 11:50:57

标题: 超音速OT：快速无条件安全的遗忘传输

摘要: 遗忘传输（OT）是一种具有基础密码协议的重要协议，在安全多方计算、联邦学习和私有集合相交等领域具有应用。随着量子计算的出现，开发无条件安全的核心原语如OT以确保它们在后量子时代的持续安全性至关重要。尽管自OT引入以来已有四十多年，文献主要依赖于计算假设，除非使用噪声信道或完全信任的第三方等非常规方法。引入了“超音速OT”，这是一种高效且无条件安全的OT方案，避免了基于公钥的原语，为传统方法提供了替代选择。超音速OT使接收方可以获得大小为O（1）的响应。其简单（但非平凡）的设计有助于易于进行安全分析和实现。该协议采用基本的秘密共享方案、受控交换、一次性密码本和可能被半诚实对手损害的第三方助手。我们的实现和运行时间分析表明，单个超音速OT实例在0.35毫秒内完成，比最先进的基本OT快2000倍。

更新时间: 2024-06-21 11:50:57

领域: cs.CR,cs.DB,cs.LG

下载: http://arxiv.org/abs/2406.15529v1

CORM: Cache Optimization with Recent Message for Large Language Model Inference

Large Language Models (LLMs), despite their remarkable performance across a wide range of tasks, necessitate substantial GPU memory and consume significant computational resources. Beyond the memory taken up by model weights, the memory used by the KV cache rises linearly with sequence length, becoming a primary bottleneck for inference. In this paper, we introduce an innovative method for optimizing the KV cache, which considerably reduces its memory footprint. Upon thorough investigation, we discover that in most Transformer models, (i) there is a striking similarity between adjacent tokens' query vectors, and (ii) the attention calculation of the current query can rely exclusively on the attention information of a small fraction of preceding queries. Based on these observations, we present CORM, a KV cache eviction policy that dynamically retains essential key-value pairs for inference without the need for model fine-tuning. Our validation shows that CORM reduces the inference memory usage of the KV cache by up to 70% with negligible performance degradation across six tasks in LongBench. Furthermore, we demonstrate that CORM is compatible with GQA for further compression.
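The eviction idea in the abstract above, keeping only the key-value pairs that recent queries actually attend to, can be sketched as follows. The abstract does not give the exact retention rule, so the rule used here (keep the most recent `window` positions, plus any position that received at least `threshold` attention from one of the last `window` queries) is an assumption, as are the function and parameter names.

```python
def select_kv_to_keep(recent_attention, num_cached, window, threshold):
    """Return the sorted cache positions to retain.

    recent_attention: one row per recent query, oldest first; each row holds
        attention probabilities over the cached positions.
    num_cached: number of positions currently in the KV cache.
    window: how many recent queries and recent positions to consider.
    threshold: minimum attention probability that marks a position as needed.
    """
    # Always keep the most recent positions themselves.
    keep = set(range(max(0, num_cached - window), num_cached))
    # Keep any older position that a recent query still attends to.
    for row in recent_attention[-window:]:
        for pos, weight in enumerate(row):
            if weight >= threshold:
                keep.add(pos)
    return sorted(keep)


# Hypothetical attention rows for the two most recent queries over six positions.
toy_attention = [
    [0.60, 0.05, 0.05, 0.30, 0.00, 0.00],
    [0.50, 0.02, 0.03, 0.05, 0.40, 0.00],
]
kept_positions = select_kv_to_keep(toy_attention, num_cached=6, window=2, threshold=0.25)
```

In this toy case positions 1 and 2 would be evicted, since neither recent query gave them meaningful attention and they fall outside the recent window.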

Updated: 2024-06-21 11:44:17

标题: CORM：利用最近消息进行大型语言模型推理的缓存优化

摘要: 大型语言模型（LLMs）尽管在各种任务中表现出色，但需要大量GPU内存并消耗大量计算资源。除了模型权重占用的内存外，KV缓存使用的内存随着序列长度呈线性增长，成为推断的主要瓶颈。在本文中，我们介绍了一种创新的优化KV缓存的方法，大大减少了其内存占用。经过深入调查，我们发现在大多数Transformer模型中，（i）相邻标记的查询向量之间存在显著的相似性，（ii）当前查询的注意力计算可以完全依赖于先前查询的一小部分注意力信息。基于这些观察结果，我们提出了CORM，一种KV缓存驱逐策略，动态保留推断所需的关键值对，无需进行模型微调。我们的验证显示，CORM将LongBench中六项任务中KV缓存的推断内存使用量减少了高达70％，并且几乎没有性能下降。此外，我们证明了CORM与GQA兼容，进一步压缩比率。

更新时间: 2024-06-21 11:44:17

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.15949v2

Neural Incremental Data Assimilation

Data assimilation is a central problem in many geophysical applications, such as weather forecasting. It aims to estimate the state of a potentially large system, such as the atmosphere, from sparse observations, supplemented by prior physical knowledge. The size of the systems involved and the complexity of the underlying physical equations make it a challenging task from a computational point of view. Neural networks represent a promising method of emulating the physics at low cost, and therefore have the potential to considerably improve and accelerate data assimilation. In this work, we introduce a deep learning approach where the physical system is modeled as a sequence of coarse-to-fine Gaussian prior distributions parametrized by a neural network. This allows us to define an assimilation operator, which is trained in an end-to-end fashion to minimize the reconstruction error on a dataset with different observation processes. We illustrate our approach on chaotic dynamical physical systems with sparse observations, and compare it to traditional variational data assimilation methods.

Updated: 2024-06-21 11:42:55

标题: 神经增量数据同化

摘要: 数据同化是许多地球物理应用中的一个核心问题，如天气预报。它旨在从稀疏观测和先前的物理知识补充中估计潜在大系统的状态，例如大气。所涉系统的规模和基础物理方程的复杂性使得从计算角度来看，这是一项具有挑战性的任务。神经网络代表了一种有望以较低成本模拟物理过程的方法，因此有潜力显著改进和加速数据同化。在这项工作中，我们引入了一种深度学习方法，其中物理系统被建模为由神经网络参数化的一系列粗到精的高斯先验分布。这使我们能够定义一个同化算子，通过端到端训练来最小化在具有不同观测过程的数据集上的重建误差。我们在具有稀疏观测的混沌动力学物理系统上演示了我们的方法，并将其与传统的变分数据同化方法进行了比较。

更新时间: 2024-06-21 11:42:55

领域: cs.LG

下载: http://arxiv.org/abs/2406.15076v1

KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning

Databases are fundamental to contemporary information systems, yet traditional rule-based configuration methods struggle to manage the complexity of real-world applications with hundreds of tunable parameters. Deep reinforcement learning (DRL), which combines perception and decision-making, presents a potential solution for intelligent database configuration tuning. However, due to the black-box nature of RL-based methods, the generated database tuning strategies still lack explainability. In addition, redundant parameters in large-scale databases often make strategy learning unstable. This paper proposes KnobTree, an interpretable framework designed for the optimization of database parameter configuration. In this framework, an interpretable database tuning algorithm based on an RL-based differential tree is proposed, which builds a transparent tree-based model to generate explainable database tuning strategies. To address the problem of large-scale parameters, we also introduce an explainable method for parameter importance assessment that uses Shapley values to identify parameters with a significant impact on database performance. Experiments conducted on MySQL and Gbase8s databases have verified the exceptional transparency and interpretability of the KnobTree model. As a result, the generated strategies can offer practical guidance to algorithm designers and database administrators. Moreover, our approach also slightly outperforms existing RL-based tuning algorithms in aspects such as throughput, latency, and processing time.
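As an illustration of the Shapley-value importance assessment mentioned in the abstract above, the sketch below computes exact Shapley values for a tiny set of database knobs. The knob names and the toy gain function are hypothetical and chosen only for the example. The abstract does not say how KnobTree estimates Shapley values; exact enumeration is used here because it is the textbook definition and is feasible only for a handful of parameters.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a set function `value`."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that p then joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


def toy_throughput_gain(knobs):
    """Hypothetical gain from tuning a subset of knobs."""
    gain = 0.0
    if "buffer_pool_size" in knobs:
        gain += 10.0
    if "log_file_size" in knobs:
        gain += 2.0
    if "buffer_pool_size" in knobs and "log_file_size" in knobs:
        gain += 4.0  # interaction effect, split evenly by the Shapley value
    return gain


knob_names = ["buffer_pool_size", "log_file_size", "unused_knob"]
importance = shapley_values(knob_names, toy_throughput_gain)
```

A knob whose Shapley value is zero, like `unused_knob` here, contributes nothing in any coalition and is a natural candidate for pruning from the tuning space.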

Updated: 2024-06-21 11:40:55

标题: 旋钮树：通过可解释的强化学习智能配置数据库参数

摘要: 数据库是当代信息系统的基础，然而传统基于规则的配置方法难以处理具有数百个可调参数的现实应用程序的复杂性。深度强化学习（DRL）结合了感知和决策，为智能数据库配置调优提供了潜在解决方案。然而，由于基于RL的方法的黑盒特性，生成的数据库调优策略仍然面临缺乏可解释性的紧迫问题。此外，大规模数据库中的冗余参数总是使策略学习变得不稳定。本文提出了KnobTree，一个专为数据库参数配置优化设计的可解释框架。在这个框架中，提出了一种基于RL的差分树的可解释数据库调优算法，该算法构建了一个透明的基于树的模型来生成可解释的数据库调优策略。为了解决大规模参数的问题，我们还引入了一种可解释的参数重要性评估方法，通过利用Shapley Values来识别对数据库性能有显著影响的参数。在MySQL和Gbase8s数据库上进行的实验验证了KnobTree模型的异常透明性和可解释性。这种良好的性质使生成的策略能为算法设计人员和数据库管理员提供实际指导。此外，我们的方法在吞吐量、延迟和处理时间等方面也略优于现有的基于RL的调优算法。

更新时间: 2024-06-21 11:40:55

领域: cs.AI,cs.DB

下载: http://arxiv.org/abs/2406.15073v1

SoK: Attacks on DAOs

Decentralized Autonomous Organizations (DAOs) are blockchain-based organizations that facilitate decentralized governance. Today, DAOs not only hold billions of dollars in their treasury but also govern many of the most popular Decentralized Finance (DeFi) protocols. This paper systematically analyses security threats to DAOs, focusing on the types of attacks they face. We study attacks on DAOs that took place in the past, attacks that have been theorized to be possible, and potential attacks that were uncovered and prevented in audits. For each of these (potential) attacks, we describe and categorize the attack vectors utilized into four categories. This reveals that while many attacks on DAOs take advantage of the less tangible and more complex human nature involved in governance, audits tend to focus on code and protocol vulnerabilities. Thus, additionally, the paper examines empirical data on DAO vulnerabilities, outlines risk factors contributing to these attacks, and suggests mitigation strategies to safeguard against such vulnerabilities.

Updated: 2024-06-21 11:40:11

标题: SoK: DAO的攻击

摘要: 去中心化自治组织（DAO）是基于区块链的组织，促进去中心化治理。今天，DAO不仅在其资金库中持有数十亿美元，还管理着许多最受欢迎的去中心化金融（DeFi）协议。本文系统地分析了针对DAO的安全威胁，重点关注它们面临的攻击类型。我们研究了过去发生的针对DAO的攻击，理论上可能发生的攻击，以及在审计中发现并阻止的潜在攻击。对于每一种（潜在的）攻击，我们描述并将利用的攻击向量分类为四类。这揭示了尽管许多对DAO的攻击利用了治理过程中涉及的不太明显且更复杂的人类因素，但审计往往集中在代码和协议漏洞上。因此，此外，本文还分析了DAO漏洞的实证数据，概述了导致这些攻击的风险因素，并提出了减轻此类漏洞威胁的策略。

更新时间: 2024-06-21 11:40:11

领域: cs.CR,cs.CY

下载: http://arxiv.org/abs/2406.15071v1

Tempora-Fusion: Time-Lock Puzzle with Efficient Verifiable Homomorphic Linear Combination

To securely transmit sensitive information into the future, Time-Lock Puzzles (TLPs) have been developed. Their applications include scheduled payments, timed commitments, e-voting, and sealed-bid auctions. Homomorphic TLP is a key variant of TLP that enables computation on puzzles from different clients. This allows a solver/server to tackle only a single puzzle encoding the computation's result. However, existing homomorphic TLPs lack support for verifying the correctness of the computation results. We address this limitation by introducing Tempora-Fusion, a TLP that allows a server to perform homomorphic linear combinations of puzzles from different clients while ensuring verification of computation correctness. This scheme avoids asymmetric-key cryptography for verification, thus paving the way for efficient implementations. We discuss our scheme's application in various domains, such as federated learning, scheduled payments in online banking, and e-voting.
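For readers unfamiliar with time-lock puzzles, the following sketch shows the classic repeated-squaring construction (Rivest, Shamir, and Wagner) that TLPs are commonly built on. It is not the Tempora-Fusion scheme and includes neither the homomorphic combination nor the verification step; the function names and the tiny parameters are assumptions for illustration, and the moduli are far too small for real use.

```python
def make_puzzle(message, p, q, t, base=2):
    """Create a puzzle hiding `message` behind t sequential squarings.

    The creator knows the primes p and q, so it can reduce the exponent
    2**t modulo phi(n) and compute the mask with one fast exponentiation.
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    exponent = pow(2, t, phi)
    mask = pow(base, exponent, n)
    return n, base, t, (message + mask) % n


def solve_puzzle(n, base, t, ciphertext):
    """Recover the message without knowing p and q: t squarings in sequence."""
    x = base % n
    for _ in range(t):
        x = x * x % n
    return (ciphertext - x) % n


# Toy parameters: two Mersenne primes, far too small for real security.
P, Q = 2**31 - 1, 2**61 - 1
puzzle = make_puzzle(message=123456789, p=P, q=Q, t=1000)
recovered = solve_puzzle(*puzzle)
```

The asymmetry between `make_puzzle` (one modular exponentiation) and `solve_puzzle` (t squarings that cannot obviously be parallelized) is what lets the creator choose how long the message stays hidden.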

Updated: 2024-06-21 11:40:01

Categories: cs.CR,cs.CE,cs.LG

Download: http://arxiv.org/abs/2406.15070v1
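
For readers unfamiliar with time-lock puzzles, the following is a minimal sketch of the classic repeated-squaring construction, not of Tempora-Fusion itself: it omits the homomorphic linear combination and the verification step that the abstract describes. The primes, base, secret, and function names are illustrative assumptions, and the parameters are far too small for real use.

```python
# Toy repeated-squaring time-lock puzzle (illustrative parameters only).

def create_puzzle(secret: int, p: int, q: int, base: int, t: int):
    """Creator side: uses the trapdoor phi(n) to mask `secret` quickly."""
    n = p * q
    phi = (p - 1) * (q - 1)
    # Shortcut: base^(2^t) mod n via the reduced exponent 2^t mod phi(n).
    # This relies on gcd(base, n) == 1 (Euler's theorem).
    exponent = pow(2, t, phi)
    mask = pow(base, exponent, n)
    masked_secret = (secret + mask) % n
    return n, base, t, masked_secret


def solve_puzzle(n: int, base: int, t: int, masked_secret: int) -> int:
    """Solver side: no trapdoor, so it performs t sequential squarings."""
    value = base % n
    for _ in range(t):
        value = (value * value) % n
    return (masked_secret - value) % n


# Two known Mersenne primes stand in for a real RSA-sized modulus.
P, Q = 2**31 - 1, 2**61 - 1
puzzle = create_puzzle(secret=424242, p=P, q=Q, base=3, t=10_000)
recovered = solve_puzzle(*puzzle)
```

The creator's cost is a single modular exponentiation, while the solver's cost grows linearly with `t`; that asymmetry is what "sends" the secret into the future.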

Delegated-Query Oblivious Transfer and its Practical Applications

Databases play a pivotal role in the contemporary World Wide Web and the world of cloud computing. Unfortunately, numerous privacy violations have recently garnered attention in the news. To enhance database privacy, we consider Oblivious Transfer (OT), an elegant cryptographic technology. Our observation reveals that existing research in this domain primarily concentrates on theoretical cryptographic applications, overlooking various practical aspects:
- OTs assume parties have direct access to databases. Our "1-out-of-2 Delegated-Query OT" enables parties to privately query a database, without direct access.
- With the rise of cloud computing, physically separated databases may no longer remain so. Our "1-out-of-2 Delegated-Query Multi-Receiver OT" protects privacy in such evolving scenarios.
- Research often ignores the limitations of thin clients, e.g., Internet of Things devices. To address this, we propose a compiler that transforms any 1-out-of-n OT into a thin client version.

Updated: 2024-06-21 11:27:29

Categories: cs.CR

Download: http://arxiv.org/abs/2406.15063v1

Straight-Through meets Sparse Recovery: the Support Exploration Algorithm

The straight-through estimator (STE) is commonly used to optimize quantized neural networks, yet the contexts in which it performs effectively remain unclear despite its empirical successes. To make a step forward in this understanding, we apply STE to a well-understood problem: sparse support recovery. We introduce the Support Exploration Algorithm (SEA), a novel algorithm promoting sparsity, and we analyze its performance in support recovery (a.k.a. model selection) problems. SEA explores more supports than the state-of-the-art, leading to superior performance in experiments, especially when the columns of $A$ are strongly coherent. The theoretical analysis considers recovery guarantees when the linear measurements matrix $A$ satisfies the Restricted Isometry Property (RIP). The sufficient conditions of recovery are comparable to, but more stringent than, those of the state-of-the-art in sparse support recovery. Their significance lies mainly in their applicability to an instance of the STE.

Updated: 2024-06-21 11:25:06

Categories: cs.LG,cs.AI,math.OC,math.ST,stat.TH

Download: http://arxiv.org/abs/2301.13584v2
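
To make the idea of applying a straight-through estimator to sparse recovery concrete, here is a generic sketch; it is not SEA as specified in the paper. It assumes a hard top-k thresholding step as the non-differentiable "forward pass" and applies the least-squares gradient, evaluated at the sparse point, directly to a dense exploration variable, as if thresholding were the identity. All function names, the step size, and the toy problem are hypothetical.

```python
# Straight-through iteration with a top-k "forward pass" (illustrative sketch).

def top_k(z, k):
    """Keep the k largest-magnitude entries of z and zero out the rest."""
    keep = sorted(range(len(z)), key=lambda i: abs(z[i]), reverse=True)[:k]
    return [z[i] if i in keep else 0.0 for i in range(len(z))]


def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def matvec_t(a, r):
    """Multiply the transpose of a by vector r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]


def ste_sparse_recovery(a, y, k, step=0.5, iters=100):
    z = [0.0] * len(a[0])             # dense exploration variable
    for _ in range(iters):
        x = top_k(z, k)               # forward pass: hard thresholding
        residual = [p - q for p, q in zip(matvec(a, x), y)]
        grad = matvec_t(a, residual)  # gradient of 0.5*||Ax - y||^2 at x
        # Straight-through step: treat top_k as the identity in the backward
        # pass and apply the gradient to the dense variable z.
        z = [z_i - step * g_i for z_i, g_i in zip(z, grad)]
    return top_k(z, k)


# Toy problem: identity measurements and a 2-sparse target signal.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x_hat = ste_sparse_recovery(identity, y=[0.0, 3.0, 0.0, -2.0], k=2)
support = [i for i, v in enumerate(x_hat) if v != 0.0]
```

Because the dense variable keeps accumulating gradient information for entries outside the current support, the selected support can change between iterations, which is the exploration behaviour the abstract alludes to.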

HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models

The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying optimal model configurations under specific hardware constraints is becoming essential but remains challenging due to the computational load of exhaustive training and evaluation on multiple devices. To address this, we introduce HW-GPT-Bench, a hardware-aware benchmark that utilizes surrogate predictions to approximate various hardware metrics across 13 devices for architectures in the GPT-2 family, with architectures containing up to 774M parameters. Our surrogates, via calibrated predictions and reliable uncertainty estimates, faithfully model the heteroscedastic noise inherent in the energy and latency measurements. To estimate perplexity, we employ weight-sharing techniques from Neural Architecture Search (NAS), inheriting pretrained weights from the largest GPT-2 model. Finally, we demonstrate the utility of HW-GPT-Bench by simulating optimization trajectories of various multi-objective optimization algorithms in just a few seconds.

Updated: 2024-06-21 11:21:01

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.10299v2

Trust the Model Where It Trusts Itself -- Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption

Dyna-style model-based reinforcement learning (MBRL) combines model-free agents with predictive transition models through model-based rollouts. This combination raises a critical question: 'When to trust your model?'; i.e., which rollout length results in the model providing useful data? Janner et al. (2019) address this question by gradually increasing rollout lengths throughout the training. While theoretically tempting, uniform model accuracy is a fallacy that collapses at the latest when extrapolating. Instead, we propose asking the question 'Where to trust your model?'. Using inherent model uncertainty to consider local accuracy, we obtain the Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption (MACURA) algorithm. We propose an easy-to-tune rollout mechanism and demonstrate substantial improvements in data efficiency and performance compared to state-of-the-art deep MBRL methods on the MuJoCo benchmark.

Updated: 2024-06-21 11:12:23

Categories: cs.LG

Download: http://arxiv.org/abs/2405.19014v3

Latent Space Translation via Inverse Relative Projection

The emergence of similar representations between independently trained neural models has sparked significant interest in the representation learning community, leading to the development of various methods to obtain communication between latent spaces. "Latent space communication" can be achieved in two ways: i) by independently mapping the original spaces to a shared or relative one; ii) by directly estimating a transformation from a source latent space to a target one. In this work, we combine the two into a novel method to obtain latent space translation through the relative space. By formalizing the invertibility of angle-preserving relative representations and assuming the scale invariance of decoder modules in neural models, we can effectively use the relative space as an intermediary, independently projecting onto and from other semantically similar spaces. Extensive experiments over various architectures and datasets validate our scale invariance assumption and demonstrate the high accuracy of our method in latent space translation. We also apply our method to zero-shot stitching between arbitrary pre-trained text and image encoders and their classifiers, even across modalities. Our method has significant potential for facilitating the reuse of models in a practical manner via compositionality.

Updated: 2024-06-21 11:11:46

Categories: cs.LG

Download: http://arxiv.org/abs/2406.15057v1
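
As a rough illustration of the relative space mentioned in the abstract, the sketch below assumes the common formulation in which a sample is encoded by its cosine similarity to a set of anchor embeddings. The anchors, the sample, and the function names are made up for the example. Since cosine similarity ignores vector norms, rescaling the sample leaves its relative representation unchanged, which suggests why inverting the projection can only recover an embedding up to scale and why a scale-invariance assumption on decoders is needed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relative_projection(x, anchors):
    """Represent x by its cosine similarity to each anchor embedding."""
    return [cosine(x, a) for a in anchors]


# Hypothetical anchors and sample in a 3-dimensional latent space.
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
x = [0.5, 2.0, -1.0]

rel = relative_projection(x, anchors)
rel_scaled = relative_projection([3.0 * v for v in x], anchors)
```

Here `rel` and `rel_scaled` agree up to floating-point error, even though the two inputs differ in norm by a factor of three.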

SaTor: Satellite Routing in Tor to Reduce Latency

High latency is a critical limitation within the Tor network. A key factor exacerbating Tor latency is the creation of lengthy circuits that span geographically distant regions, causing significant transmission delays. To address this issue, a common strategy involves modifying Tor's circuit building process to reduce the likelihood of selecting lengthy circuits. However, this strategy compromises the randomness of Tor's routing, thereby increasing the risk of deanonymization. Improving Tor's latency performance while minimizing security degradation presents a critical challenge. This paper proposes SaTor, a latency-improving scheme for Tor using satellite routing technology. SaTor proposes equipping a targeted subset of Tor relays with satellite network access, utilizing long-distance satellite transmission to accelerate slow circuits, without biasing the existing path selection process. Our SaTor performance evaluation, using a simulator we developed coupled with real-world measurements, demonstrates that over the long term, SaTor offers an expected speed-up of roughly 40 ms for over 70% of circuits under common conditions. This improvement requires outfitting roughly the top 30-40% of relays with satellite access. Our research uncovers a viable way to overcome Tor's latency bottleneck, serving as a practical reference for its future enhancement.

Updated: 2024-06-21 11:03:28

Categories: cs.CR

Download: http://arxiv.org/abs/2406.15055v1

Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis

Medical Visual Question Answering (Med-VQA) is a challenging research topic with advantages including patient engagement and clinical expert involvement for second opinions. However, existing Med-VQA methods based on joint embedding fail to explain whether their provided results are based on correct reasoning or coincidental answers, which undermines the credibility of VQA answers. In this paper, we investigate the construction of a more cohesive and stable Med-VQA structure. Motivated by causal effect, we propose a novel Triangular Reasoning VQA (Tri-VQA) framework, which constructs reverse causal questions from the perspective of "Why this answer?" to elucidate the source of the answer and stimulate more reasonable forward reasoning processes. We evaluate our method on the Endoscopic Ultrasound (EUS) multi-attribute annotated dataset from five centers, and test it on medical VQA datasets. Experimental results demonstrate the superiority of our approach over existing methods. Our codes and pre-trained models are available at https://anonymous.4open.science/r/Tri_VQA.

Updated: 2024-06-21 10:50:55

Categories: cs.LG,cs.AI,cs.CL,cs.CV,I.2.7; I.2.10; J.3

Download: http://arxiv.org/abs/2406.15050v1

From Overfitting to Robustness: Quantity, Quality, and Variety Oriented Negative Sample Selection in Graph Contrastive Learning

Graph contrastive learning (GCL) aims to contrast positive-negative counterparts to learn the node embeddings, whereas graph data augmentation methods are employed to generate these positive-negative samples. The variation, quantity, and quality of negative samples compared to positive samples play crucial roles in learning meaningful embeddings for node classification downstream tasks. Negative samples with little variation, excessive quantity, or low quality cause the model to overfit to particular nodes, resulting in less robust models. To solve the overfitting problem in the GCL paradigm, this study proposes a novel Cumulative Sample Selection (CSS) algorithm by comprehensively considering negative samples' quality, variations, and quantity. Initially, three negative sample pools are constructed: easy, medium, and hard negative samples, which contain 25%, 50%, and 25% of the total available negative samples, respectively. Then, 10% of the negative samples are selected from each of these three pools for training the model. After that, a decision agent module evaluates model training results and decides whether to explore more negative samples from the three pools by increasing the ratio or to keep exploiting the current sampling ratio. The proposed algorithm is integrated into a proposed graph contrastive learning framework named NegAmplify. NegAmplify is compared with the SOTA methods on nine graph node classification datasets and achieves better node classification accuracy on seven of them, with up to 2.86% improvement.

Updated: 2024-06-21 10:47:26

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.15044v1
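
The pool construction and initial sampling step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how hardness is measured, so the sketch uses a made-up similarity score, and the function names, the 200-node setup, and the fixed random seed are all hypothetical. The decision agent that adjusts the ratio is not modelled.

```python
import random


def build_pools(negatives, hardness):
    """Split negatives into easy/medium/hard pools holding 25%/50%/25%."""
    ranked = sorted(negatives, key=hardness)  # easiest first
    n = len(ranked)
    first_cut, second_cut = n // 4, (3 * n) // 4
    return {
        "easy": ranked[:first_cut],
        "medium": ranked[first_cut:second_cut],
        "hard": ranked[second_cut:],
    }


def sample_from_pools(pools, percent, rng):
    """Draw `percent`% of each pool without replacement."""
    return {
        name: rng.sample(pool, len(pool) * percent // 100)
        for name, pool in pools.items()
    }


# Hypothetical setup: 200 negative node ids, with hardness given by a
# made-up similarity score to the anchor node.
rng = random.Random(0)
scores = {node: rng.random() for node in range(200)}
pools = build_pools(list(scores), hardness=scores.get)
selected = sample_from_pools(pools, percent=10, rng=rng)
```

With 200 negatives the pools hold 50, 100, and 50 nodes, and the initial 10% draw selects 5, 10, and 5 of them. A decision step of the kind the abstract describes would then either keep `percent` fixed or raise it for the next round.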

Discovering Common Information in Multi-view Data

We introduce an innovative and mathematically rigorous definition for computing common information from multi-view data, drawing inspiration from Gács-Körner common information in information theory. Leveraging this definition, we develop a novel supervised multi-view learning framework to capture both common and unique information. By explicitly minimizing a total correlation term, the extracted common information and the unique information from each view are forced to be independent of each other, which, in turn, theoretically guarantees the effectiveness of our framework. To estimate information-theoretic quantities, our framework employs matrix-based Rényi's $\alpha$-order entropy functional, which forgoes the need for variational approximation and distributional estimation in high-dimensional space. Theoretical proof is provided that our framework can faithfully discover both common and unique information from multi-view data. Experiments on synthetic and seven benchmark real-world datasets demonstrate the superior performance of our proposed framework over state-of-the-art approaches.

Updated: 2024-06-21 10:47:06

Categories: cs.LG

Download: http://arxiv.org/abs/2406.15043v1

Behaviour Distillation

Dataset distillation aims to condense large datasets into a small number of synthetic examples that can be used as drop-in replacements when training new models. It has applications to interpretability, neural architecture search, privacy, and continual learning. Despite strong successes in supervised domains, such methods have not yet been extended to reinforcement learning, where the lack of a fixed dataset renders most distillation methods unusable. Filling the gap, we formalize behaviour distillation, a setting that aims to discover and then condense the information required for training an expert policy into a synthetic dataset of state-action pairs, without access to expert data. We then introduce Hallucinating Datasets with Evolution Strategies (HaDES), a method for behaviour distillation that can discover datasets of just four state-action pairs which, under supervised learning, train agents to competitive performance levels in continuous control tasks. We show that these datasets generalize out of distribution to training policies with a wide range of architectures and hyperparameters. We also demonstrate application to a downstream task, namely training multi-task agents in a zero-shot fashion. Beyond behaviour distillation, HaDES provides significant improvements in neuroevolution for RL over previous approaches and achieves SoTA results on one standard supervised dataset distillation task. Finally, we show that visualizing the synthetic datasets can provide human-interpretable task insights.

Updated: 2024-06-21 10:45:43

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.15042v1

Benchmarking Pathology Feature Extractors for Whole Slide Image Classification

Weakly supervised whole slide image classification is a key task in computational pathology, which involves predicting a slide-level label from a set of image patches constituting the slide. Constructing models to solve this task involves multiple design choices, often made without robust empirical or conclusive theoretical justification. To address this, we conduct a comprehensive benchmarking of feature extractors to answer three critical questions: 1) Is stain normalisation still a necessary preprocessing step? 2) Which feature extractors are best for downstream slide-level classification? 3) How does magnification affect downstream performance? Our study constitutes the most comprehensive evaluation of publicly available pathology feature extractors to date, involving more than 10,000 training runs across 14 feature extractors, 9 tasks, 5 datasets, 3 downstream architectures, 2 levels of magnification, and various preprocessing setups. Our findings challenge existing assumptions: 1) We observe empirically, and by analysing the latent space, that skipping stain normalisation and image augmentations does not degrade performance, while significantly reducing memory and computational demands. 2) We develop a novel evaluation metric to compare relative downstream performance, and show that the choice of feature extractor is the most consequential factor for downstream performance. 3) We find that lower-magnification slides are sufficient for accurate slide-level classification. Contrary to previous patch-level benchmarking studies, our approach emphasises clinical relevance by focusing on slide-level biomarker prediction tasks in a weakly supervised setting with external validation cohorts. Our findings stand to streamline digital pathology workflows by minimising preprocessing needs and informing the selection of feature extractors.

Updated: 2024-06-21 10:43:34

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2311.11772v5

Secure Composition of Robust and Optimising Compilers

To ensure that secure applications do not leak their secrets, they are required to uphold several security properties such as spatial and temporal memory safety as well as cryptographic constant time. Existing work shows how to enforce these properties individually, in an architecture-independent way, by using secure compiler passes that each focus on an individual property. Unfortunately, given two secure compiler passes that each preserve a possibly different security property, it is unclear what kind of security property is preserved by the composition of those secure compiler passes. This paper is the first to study what security properties are preserved across the composition of different secure compiler passes. Starting from a general theory of property composition for security-relevant properties (such as the aforementioned ones), this paper formalises a theory of composition of secure compilers. Then, it showcases this theory with a secure multi-pass compiler that preserves the aforementioned security-relevant properties. Crucially, this paper derives the security of the multi-pass compiler from the composition of the security properties preserved by its individual passes, which include security-preserving as well as optimisation passes. From an engineering perspective, this is the desirable approach to building secure compilers.

Updated: 2024-06-21 10:41:25

Categories: cs.CR,cs.PL

Download: http://arxiv.org/abs/2307.08681v2

Online detection and infographic explanation of spam reviews with data drift adaptation

Spam reviews are a pervasive problem on online platforms due to their significant impact on reputation. However, research into spam detection in data streams is scarce. Another concern is the need for transparency in spam detection. Consequently, this paper addresses those problems by proposing an online solution for identifying and explaining spam reviews, incorporating data drift adaptation. It integrates (i) incremental profiling, (ii) data drift detection & adaptation, and (iii) identification of spam reviews employing Machine Learning. The explainable mechanism displays a visual and textual prediction explanation in a dashboard. The best results obtained reached up to 87% spam F-measure.

Updated: 2024-06-21 10:35:46

Categories: cs.LG,cs.AI,cs.CL,cs.SI

Download: http://arxiv.org/abs/2406.15038v1

Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data

Generative, multimodal artificial intelligence (GenAI) offers transformative potential across industries, but its misuse poses significant risks. Prior research has shed light on the potential of advanced AI systems to be exploited for malicious purposes. However, we still lack a concrete understanding of how GenAI models are specifically exploited or abused in practice, including the tactics employed to inflict harm. In this paper, we present a taxonomy of GenAI misuse tactics, informed by existing academic literature and a qualitative analysis of approximately 200 observed incidents of misuse reported between January 2023 and March 2024. Through this analysis, we illuminate key and novel patterns in misuse during this time period, including potential motivations, strategies, and how attackers leverage and abuse system capabilities across modalities (e.g. image, text, audio, video) in the wild.

Updated: 2024-06-21 10:27:11

Categories: cs.AI

Download: http://arxiv.org/abs/2406.13843v2

GiusBERTo: A Legal Language Model for Personal Data De-identification in Italian Court of Auditors Decisions

Recent advances in Natural Language Processing have demonstrated the effectiveness of pretrained language models like BERT for a variety of downstream tasks. We present GiusBERTo, the first BERT-based model specialized for anonymizing personal data in Italian legal documents. GiusBERTo is trained on a large dataset of Court of Auditors decisions to recognize entities to anonymize, including names, dates, and locations, while retaining contextual relevance. We evaluate GiusBERTo on a held-out test set and achieve 97% token-level accuracy. GiusBERTo provides the Italian legal community with an accurate and tailored BERT model for de-identification, balancing privacy and data protection.

Updated: 2024-06-21 10:25:26

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.15032v1

Using Neural Networks for Data Cleaning in Weather Datasets

In climate science, we often want to compare across different datasets. Difficulties can arise in doing this due to inevitable mismatches that arise between observational and reanalysis data, or even between different reanalyses. This misalignment can raise problems for any work that seeks to make inferences about one dataset from another. We considered tropical cyclone location as an example task with one dataset providing atmospheric conditions (ERA5) and another providing storm tracks (IBTrACS). We found that while the examples often aligned well, there were a considerable proportion (around 25%) which were not well aligned. We trained a neural network to map from the wind field to the storm location; in this setting misalignment in the datasets appears as "label noise" (i.e. the labelled storm location does not correspond to the underlying wind field). We found that this neural network trained only on the often noisy labels from IBTrACS had a denoising effect, and performed better than the IBTrACS labels themselves, as measured by human preferences. Remarkably, this even held true for training points, on which we might have expected the network to overfit to the IBTrACS predictions.

Updated: 2024-06-21 10:09:42

Categories: cs.LG

Download: http://arxiv.org/abs/2406.15027v1

RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark

Deep reinforcement learning (RL) has recently shown significant benefits in solving combinatorial optimization (CO) problems, reducing reliance on domain expertise, and improving computational efficiency. However, the field lacks a unified benchmark for easy development and standardized comparison of algorithms across diverse CO problems. To fill this gap, we introduce RL4CO, a unified and extensive benchmark with in-depth library coverage of 23 state-of-the-art methods and more than 20 CO problems. Built on efficient software libraries and best practices in implementation, RL4CO features modularized implementation and flexible configuration of diverse RL algorithms, neural network architectures, inference techniques, and environments. RL4CO allows researchers to seamlessly navigate existing successes and develop their unique designs, facilitating the entire research process by decoupling science from heavy engineering. We also provide extensive benchmark studies to inspire new insights and future work. RL4CO has attracted numerous researchers in the community and is open-sourced at https://github.com/ai4co/rl4co.

Updated: 2024-06-21 10:05:39

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2306.17100v4

A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

The infinite horizon setting is widely adopted for problems of reinforcement learning (RL). These invariably result in stationary policies that are optimal. In many situations, finite horizon control problems are of interest, and for such problems the optimal policies are time-varying in general. Another setting that has become popular in recent times is Constrained Reinforcement Learning, where the agent maximizes its rewards while also aiming to satisfy some given constraint criteria. However, this setting has only been studied in the context of infinite horizon MDPs where stationary policies are optimal. We present an algorithm for constrained RL in the Finite Horizon Setting where the horizon terminates after a fixed (finite) time. We use function approximation in our algorithm, which is essential when the state and action spaces are large or continuous, and use the policy gradient method to find the optimal policy. The optimal policy that we obtain depends on the stage and so is non-stationary in general. To the best of our knowledge, our paper presents the first policy gradient algorithm for the finite horizon setting with constraints. We show the convergence of our algorithm to a constrained optimal policy. We also compare and analyze the performance of our algorithm through experiments and show that our algorithm performs better than some other well known algorithms.

Updated: 2024-06-21 10:05:36

Categories: cs.LG

Download: http://arxiv.org/abs/2210.04527v3

Deobfuscation of Semi-Linear Mixed Boolean-Arithmetic Expressions

Mixed Boolean-Arithmetic (MBA) obfuscation is a common technique used to transform simple expressions into semantically equivalent but more complex combinations of boolean and arithmetic operators. Its widespread usage in DRM systems, malware, and software protectors is well documented. In 2021, Liu et al. proposed a groundbreaking method of simplifying linear MBAs, utilizing a hidden two-way transformation between 1-bit and n-bit variables. In 2022, Reichenwallner et al. proposed a similar but more effective method of simplifying linear MBAs, SiMBA, relying on a similar but more involved theorem. However, because current linear MBA simplifiers operate in 1-bit space, they cannot handle expressions which utilize constants inside of their bitwise operands, e.g. (x&1), (x&1111) + (y&1111). We propose an extension to SiMBA that enables simplification of this broader class of expressions. It surpasses peer tools, achieving efficient simplification of a class of MBAs that current simplifiers struggle with.
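To make the limitation concrete, here is a small sketch, not taken from the paper or from SiMBA, that brute-forces equivalence over 8-bit values. The two expressions and the helper `equivalent` are hypothetical examples. The first pair is a classic linear MBA identity; the second pair contains a constant inside a bitwise operand, and checking it only on 1-bit inputs gives a misleading answer.

```python
from itertools import product

BITS = 8                      # assumed word size for the brute-force check
MASK = (1 << BITS) - 1


def equivalent(f, g, domain):
    """True if f and g agree modulo 2**BITS on every pair from `domain`."""
    return all(
        (f(x, y) & MASK) == (g(x, y) & MASK)
        for x, y in product(domain, repeat=2)
    )


full_domain = range(1 << BITS)   # every 8-bit value
one_bit_domain = (0, 1)          # the 1-bit space


# Linear MBA: x + y rewritten as (x ^ y) + 2 * (x & y).
def linear_obfuscated(x, y):
    return (x ^ y) + 2 * (x & y)


def linear_simple(x, y):
    return x + y


# Semi-linear MBA: the constant 2 sits inside a bitwise operand.
def semi_linear(x, y):
    return (x & 2) + y


def wrong_candidate(x, y):
    return y


linear_holds = equivalent(linear_obfuscated, linear_simple, full_domain)
agree_on_one_bit = equivalent(semi_linear, wrong_candidate, one_bit_domain)
agree_on_full = equivalent(semi_linear, wrong_candidate, full_domain)
print(linear_holds, agree_on_one_bit, agree_on_full)
```

On inputs 0 and 1 the term `(x & 2)` is always zero, so the 1-bit check cannot tell `(x & 2) + y` apart from `y`, although the two differ as soon as bit 1 of `x` is set.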

Updated: 2024-06-21 10:04:01

Categories: cs.CR

Download: http://arxiv.org/abs/2406.10016v2

SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning

An open challenge in reinforcement learning (RL) is the effective deployment of a trained policy to new or slightly different situations as well as semantically-similar environments. We introduce Symmetry-Invariant Transformer (SiT), a scalable vision transformer (ViT) that leverages both local and global data patterns in a self-supervised manner to improve generalisation. Central to our approach is Graph Symmetric Attention, which refines the traditional self-attention mechanism to preserve graph symmetries, resulting in invariant and equivariant latent representations. We showcase SiT's superior generalisation over ViTs on MiniGrid and Procgen RL benchmarks, and its sample efficiency on Atari 100k and CIFAR10.

Updated: 2024-06-21 10:03:14

Categories: cs.LG

Download: http://arxiv.org/abs/2406.15025v1

Efficient Perception, Planning, and Control Algorithm for Vision-Based Automated Vehicles

Autonomous vehicles have limited computational resources and thus require efficient control systems. The cost and size of sensors have limited the development of self-driving cars. To overcome these restrictions, this study proposes an efficient framework for the operation of vision-based automatic vehicles; the framework requires only a monocular camera and a few inexpensive radars. The proposed algorithm comprises a multi-task UNet (MTUNet) network for extracting image features and constrained iterative linear quadratic regulator (CILQR) and vision predictive control (VPC) modules for rapid motion planning and control. MTUNet is designed to simultaneously solve lane line segmentation, the ego vehicle's heading angle regression, road type classification, and traffic object detection tasks at approximately 40 FPS for 228 x 228 pixel RGB input images. The CILQR controllers then use the MTUNet outputs and radar data as inputs to produce driving commands for lateral and longitudinal vehicle guidance within only 1 ms. In particular, the VPC algorithm is included to reduce steering command latency to below actuator latency, preventing performance degradation during tight turns. The VPC algorithm uses road curvature data from MTUNet to estimate the appropriate correction for the current steering angle at a look-ahead point to adjust the turning amount. The inclusion of the VPC algorithm in a VPC-CILQR controller leads to higher performance on curvy roads than the use of CILQR alone. Our experiments demonstrate that the proposed autonomous driving system, which does not require high-definition maps, can be applied in current autonomous vehicles.

Updated: 2024-06-21 10:00:44

Categories: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2209.07042v6

Random Pareto front surfaces

The goal of multi-objective optimisation is to identify the Pareto front surface, which is the set obtained by connecting the best trade-off points. Typically, this surface is computed by evaluating the objectives at different points and then interpolating between the subset of the best evaluated trade-off points. In this work, we propose to parameterise the Pareto front surface using polar coordinates. More precisely, we show that any Pareto front surface can be equivalently represented using a scalar-valued length function which returns the projected length along any positive radial direction. We then use this representation in order to rigorously develop the theory and applications of stochastic Pareto front surfaces. In particular, we derive many Pareto front surface statistics of interest such as the expectation, covariance and quantiles. We then discuss how these can be used in practice within a design of experiments setting, where the goal is to both infer and use the Pareto front surface distribution in order to make effective decisions. Our framework allows for clear uncertainty quantification and we also develop advanced visualisation techniques for this purpose. Finally, we discuss the applicability of our ideas within multivariate extreme value theory and illustrate our methodology in a variety of numerical examples, including a case study with a real-world air pollution data set.
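As a rough illustration of the polar view, the sketch below computes a length along a positive direction for a finite set of points under maximisation. The formula used here (the largest, over points, of the smallest component-wise ratio to the unit direction, measured from a reference point) is an assumption of this sketch, and the function names and sample points are hypothetical; the paper's own definition may differ in details.

```python
import math


def unit_direction(direction):
    """Normalise a strictly positive direction to unit Euclidean length."""
    if any(d <= 0 for d in direction):
        raise ValueError("direction must be strictly positive")
    norm = math.sqrt(sum(d * d for d in direction))
    return [d / norm for d in direction]


def length_along(direction, points, reference):
    """Distance from `reference` to the front surface along `direction`.

    Assumes maximisation: a point y attains every vector that lies
    component-wise between `reference` and y.
    """
    unit = unit_direction(direction)
    return max(
        min(max(y - r, 0.0) / u for y, r, u in zip(point, reference, unit))
        for point in points
    )


def front_point(direction, points, reference):
    """Point on the front surface hit by the ray from `reference`."""
    unit = unit_direction(direction)
    length = length_along(direction, points, reference)
    return [r + length * u for r, u in zip(reference, unit)]


points = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]   # made-up trade-off points
origin = (0.0, 0.0)

diagonal_length = length_along((1.0, 1.0), points, origin)
diagonal_hit = front_point((1.0, 1.0), points, origin)
steep_hit = front_point((1.0, 3.0), points, origin)
print(diagonal_length, diagonal_hit, steep_hit)
```

Sweeping the direction over the positive quadrant traces the whole staircase-shaped surface, so a single scalar function of the direction is enough to describe it.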

Updated: 2024-06-21 09:58:51

Categories: stat.ML,cs.LG,math.OC,stat.ME

Download: http://arxiv.org/abs/2405.01404v2

Latent Functional Maps

Neural models learn data representations that lie on low-dimensional manifolds, yet modeling the relation between these representational spaces is an ongoing challenge. By integrating spectral geometry principles into neural modeling, we show that this problem can be better addressed in the functional domain, mitigating complexity while enhancing interpretability and performance on downstream tasks. To this end, we introduce a multi-purpose framework to the representation learning community, which allows one to: (i) compare different spaces in an interpretable way and measure their intrinsic similarity; (ii) find correspondences between them, both in unsupervised and weakly supervised settings; and (iii) effectively transfer representations between distinct spaces. We validate our framework on various applications, ranging from stitching to retrieval tasks, demonstrating that latent functional maps can serve as a Swiss Army knife for representation alignment.

Updated: 2024-06-21 09:57:50

Categories: cs.LG

Download: http://arxiv.org/abs/2406.14183v2

AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding

AI personal assistants deployed via robots or wearables require embodied understanding to collaborate with humans effectively. However, current Vision-Language Models (VLMs) primarily focus on third-person view videos, neglecting the richness of egocentric perceptual experience. To address this gap, we propose three key contributions. First, we introduce the Egocentric Video Understanding Dataset (EVUD) for training VLMs on video captioning and question answering tasks specific to egocentric videos. Second, we present AlanaVLM, a 7B parameter VLM trained using parameter-efficient methods on EVUD. Finally, we evaluate AlanaVLM's capabilities on OpenEQA, a challenging benchmark for embodied video question answering. Our model achieves state-of-the-art performance, outperforming open-source models including strong Socratic models using GPT-4 as a planner by 3.6%. Additionally, we outperform Claude 3 and Gemini Pro Vision 1.0 and showcase competitive results compared to Gemini Pro 1.5 and GPT-4V, even surpassing the latter in spatial reasoning. This research paves the way for building efficient VLMs that can be deployed in robots or wearables, leveraging embodied video understanding to collaborate seamlessly with humans in everyday tasks, contributing to the next generation of Embodied AI.

Updated: 2024-06-21 09:53:41

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.13807v2

Accelerating Complex Disease Treatment through Network Medicine and GenAI: A Case Study on Drug Repurposing for Breast Cancer

The objective of this research is to introduce a network specialized in predicting drugs that can be repurposed by investigating real-world evidence sources, such as clinical trials and biomedical literature. Specifically, it aims to generate drug combination therapies for complex diseases (e.g., cancer, Alzheimer's). We present a multilayered network medicine approach, empowered by a highly configured ChatGPT prompt engineering system, which is constructed on the fly to extract drug mentions in clinical trials. Additionally, we introduce a novel algorithm that connects real-world evidence with disease-specific signaling pathways (e.g., KEGG database). This sheds light on the repurposability of drugs if they are found to bind with one or more protein constituents of a signaling pathway. To demonstrate, we instantiated the framework for breast cancer and found that, out of 46 breast cancer signaling pathways, the framework identified 38 pathways that were covered by at least two drugs. This evidence signals the potential for combining those drugs. Specifically, the most covered signaling pathway, ID hsa:2064, was covered by 108 drugs, some of which can be combined. Conversely, the signaling pathway ID hsa:1499 was covered by only two drugs, indicating a significant gap for further research. Our network medicine framework, empowered by GenAI, shows promise in identifying drug combinations with a high degree of specificity, knowing the exact signaling pathways and proteins that serve as targets. It is noteworthy that ChatGPT successfully accelerated the process of identifying drug mentions in clinical trials, though further investigations are required to determine the relationships among the drug mentions.

Updated: 2024-06-21 09:52:55

Categories: cs.AI,cs.CL,cs.IR,I.2; I.2.6

Download: http://arxiv.org/abs/2406.13106v2

Provably Secure Non-interactive Key Exchange Protocol for Group-Oriented Applications in Scenarios with Low-Quality Networks

Non-interactive key exchange (NIKE) enables two or more parties, knowing only the public system parameters and each other's public keys, to derive a (group) session key without the need for interaction. Recently, NIKE in multi-party settings has received growing attention. However, we note that most existing multi-party NIKE protocols rely on costly cryptographic techniques (i.e., multilinear maps and indistinguishability obfuscation), which leads to high computational costs once they are employed in practice. Therefore, it is a challenging task to achieve multi-party NIKE protocols by using more practical cryptographic primitives. In this paper, we propose a secure and efficient NIKE protocol for secure communications in dynamic groups, whose construction is based only on bilinear maps. This protocol allows multiple parties to negotiate asymmetric group keys (a public group encryption key and each party's decryption key) without any interaction among one another. Additionally, the protocol supports updating group keys in an efficient and non-interactive way whenever a party outside the group joins or a group member leaves. Further, any party acting as a sender (even one outside the group) that intends to connect with some or all of the group members, called receivers, just needs to generate a constant-size ciphertext under the public group encryption key, and only the group members who are the intended receivers can decrypt the ciphertext to obtain the session key. We prove the correctness of our protocol and the indistinguishability of the session key under the k-Bilinear Diffie-Hellman exponent (k-BDHE) assumption. Our efficiency evaluation confirms that the protocol is efficient.
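For readers new to the idea, the simplest two-party instance of non-interactive key exchange is static Diffie-Hellman: once public keys are published, each side derives the same key with no further messages. The sketch below shows only that two-party idea. It is not the group protocol proposed in the paper, it does not use bilinear maps, and the modulus, generator and secrets are toy values picked for illustration that offer no real security.

```python
# Toy two-party NIKE via static Diffie-Hellman (illustration only).
# All parameters below are assumptions of this sketch and are NOT secure.

P = 2**127 - 1   # a Mersenne prime used as a toy modulus
G = 3            # toy base element


def public_key(secret):
    """Public key published once by a party."""
    return pow(G, secret, P)


def shared_key(own_secret, peer_public):
    """Key derived locally from one's own secret and the peer's public key."""
    return pow(peer_public, own_secret, P)


# Hypothetical fixed secrets; a real system would draw them at random.
alice_secret = 0x1234567890ABCDEF
bob_secret = 0x0FEDCBA987654321

alice_public = public_key(alice_secret)
bob_public = public_key(bob_secret)

# No messages are exchanged beyond the published public keys.
alice_key = shared_key(alice_secret, bob_public)
bob_key = shared_key(bob_secret, alice_public)
print(alice_key == bob_key)
```

Extending this to many parties without interaction is the hard part that the abstract refers to, and it is where heavier tools such as multilinear maps or, in this paper, bilinear maps come in.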

Updated: 2024-06-21 09:49:29

Categories: cs.CR

Download: http://arxiv.org/abs/2407.00073v1

Evolution of Rewards for Food and Motor Action by Simulating Birth and Death

The reward system is one of the fundamental drivers of animal behaviors and is critical for survival and reproduction. Despite its importance, the problem of how the reward system has evolved is underexplored. In this paper, we try to replicate the evolution of biologically plausible reward functions and investigate how environmental conditions affect evolved rewards' shape. For this purpose, we developed a population-based decentralized evolutionary simulation framework, where agents maintain their energy level to live longer and produce more children. Each agent inherits its reward function from its parent subject to mutation and learns to get rewards via reinforcement learning throughout its lifetime. Our results show that biologically reasonable positive rewards for food acquisition and negative rewards for motor action can evolve from randomly initialized ones. However, we also find that the rewards for motor action diverge into two modes: largely positive and slightly negative. The emergence of positive motor action rewards is surprising because it can make agents too active and inefficient in foraging. In environments with poor and poisonous foods, the evolution of rewards for less important foods tends to be unstable, while rewards for normal foods are still stable. These results demonstrate the usefulness of our simulation environment and energy-dependent birth and death model for further studies of the origin of reward systems.

Updated: 2024-06-21 09:44:56

Categories: cs.NE,cs.AI

Download: http://arxiv.org/abs/2406.15016v1

GraLMatch: Matching Groups of Entities with Graphs and Language Models

In this paper, we present an end-to-end multi-source Entity Matching problem, which we call entity group matching, where the goal is to assign to the same group, records originating from multiple data sources but representing the same real-world entity. We focus on the effects of transitively matched records, i.e. the records connected by paths in the graph G = (V,E) whose nodes and edges represent the records and whether they are a match or not. We present a real-world instance of this problem, where the challenge is to match records of companies and financial securities originating from different data providers. We also introduce two new multi-source benchmark datasets that present similar matching challenges as real-world records. A distinctive characteristic of these records is that they are regularly updated following real-world events, but updates are not applied uniformly across data sources. This phenomenon makes the matching of certain groups of records only possible through the use of transitive information. In our experiments, we illustrate how considering transitively matched records is challenging since a limited amount of false positive pairwise match predictions can throw off the group assignment of large quantities of records. Thus, we propose GraLMatch, a method that can partially detect and remove false positive pairwise predictions through graph-based properties. Finally, we showcase how fine-tuning a Transformer-based model (DistilBERT) on a reduced number of labeled samples yields a better final entity group matching than training on more samples and/or incorporating fine-tuning optimizations, illustrating how precision becomes the deciding factor in the entity group matching of large volumes of records.
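The effect of transitive matching can be sketched with connected components over pairwise match predictions. This is a generic union-find illustration with made-up record names, not the GraLMatch method itself, which the abstract describes as detecting and removing false positive predictions through graph-based properties.

```python
def group_records(records, matches):
    """Group records into connected components of the match graph."""
    parent = {record: record for record in records}

    def find(record):
        # Follow parent links to the representative, halving paths on the way.
        while parent[record] != record:
            parent[record] = parent[parent[record]]
            record = parent[record]
        return record

    for left, right in matches:
        parent[find(left)] = find(right)

    groups = {}
    for record in records:
        groups.setdefault(find(record), []).append(record)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical records of two companies from data sources A, B and C.
records = ["acme@A", "acme@B", "acme@C", "apex@A", "apex@B"]
true_matches = [("acme@A", "acme@B"), ("acme@B", "acme@C"), ("apex@A", "apex@B")]

clean_groups = group_records(records, true_matches)

# A single false positive pairwise prediction merges both companies.
noisy_groups = group_records(records, true_matches + [("acme@C", "apex@A")])
print(clean_groups)
print(noisy_groups)
```

Here `acme@A` and `acme@C` end up together only through `acme@B`, which is the benefit of transitivity, while one wrong edge is enough to collapse two entities into a single group, which is its cost.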

Updated: 2024-06-21 09:44:16

Categories: cs.DB,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.15015v1

Jellyfish: A Large Language Model for Data Preprocessing

This paper explores the utilization of LLMs for data preprocessing (DP), a crucial step in the data mining pipeline that transforms raw data into a clean format conducive to easy processing. Whereas the use of LLMs has sparked interest in devising universal solutions to DP, recent initiatives in this domain typically rely on GPT APIs, raising inevitable data breach concerns. Unlike these approaches, we consider instruction-tuning local LLMs (7B to 13B models) as universal DP task solvers that operate on a local, single, and low-priced GPU, ensuring data security and enabling further customization. We select a collection of datasets across four representative DP tasks and construct instruction tuning data using data configuration, knowledge injection, and reasoning data distillation techniques tailored to DP. By tuning Mistral-7B, Llama 3-8B, and OpenOrca-Platypus2-13B, our models, namely Jellyfish-7B/8B/13B, are competitive with GPT-3.5/4 models and generalize strongly to unseen tasks while barely compromising the base models' abilities in NLP tasks. Meanwhile, Jellyfish offers enhanced reasoning capabilities compared to GPT-3.5. Our models are available at: https://huggingface.co/NECOUDBFM/Jellyfish . Our instruction dataset is available at: https://huggingface.co/datasets/NECOUDBFM/Jellyfish-Instruct .

Updated: 2024-06-21 09:39:31

Categories: cs.AI,cs.CL,cs.DB,cs.LG

Download: http://arxiv.org/abs/2312.01678v5

Fair, Manipulation-Robust, and Transparent Sortition

Sortition, the random selection of political representatives, is increasingly being used around the world to choose participants of deliberative processes like Citizens' Assemblies. Motivated by sortition's practical importance, there has been a recent flurry of research on sortition algorithms, whose task it is to select a panel from among a pool of volunteers. This panel must satisfy quotas enforcing representation of key population subgroups. Past work has contributed an algorithmic approach for fulfilling this task while ensuring that volunteers' chances of selection are maximally equal, as measured by any convex equality objective. The question, then, is: which equality objective is the right one? Past work has mainly studied the objectives Minimax and Leximin, which respectively minimize the maximum and maximize the minimum chance of selection given to any volunteer. Recent work showed that both of these objectives have key weaknesses: Minimax is highly robust to manipulation but is arbitrarily unfair; conversely, Leximin is highly fair but arbitrarily manipulable. In light of this gap, we propose a new equality objective, Goldilocks, that aims to achieve these ideals simultaneously by ensuring that no volunteer receives too little or too much chance of selection. We theoretically bound the extent to which Goldilocks achieves these ideals, finding that in an important sense, Goldilocks recovers among the best available solutions in a given instance. We then extend our bounds to the case where the output of Goldilocks is transformed to achieve a third goal, Transparency. Our empirical analysis of Goldilocks on real data is even more promising: we find that this objective achieves nearly instance-optimal minimum and maximum selection probabilities simultaneously in most real instances -- an outcome not even guaranteed to be possible for any algorithm.
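The quantities that Minimax and Leximin look at can be computed by hand on a tiny instance. The sketch below is a hypothetical toy pool with exact quotas: it enumerates the feasible panels, evaluates two candidate lotteries over them, and reports each volunteer's selection probability. It does not implement Goldilocks or any optimiser from the literature; it only shows what the minimum and maximum selection probabilities are.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical pool: volunteer name -> subgroup.
pool = {"w1": "woman", "w2": "woman", "w3": "woman", "m1": "man"}
quotas = {"woman": 1, "man": 1}   # assumed exact quotas per subgroup
panel_size = 2


def feasible_panels():
    """All panels of the right size that meet every quota exactly."""
    panels = []
    for panel in combinations(sorted(pool), panel_size):
        counts = {}
        for volunteer in panel:
            counts[pool[volunteer]] = counts.get(pool[volunteer], 0) + 1
        if all(counts.get(group, 0) == q for group, q in quotas.items()):
            panels.append(panel)
    return panels


def selection_probabilities(lottery):
    """Per-volunteer chance of selection under a lottery over panels."""
    probabilities = {volunteer: Fraction(0) for volunteer in pool}
    for panel, weight in lottery.items():
        for volunteer in panel:
            probabilities[volunteer] += weight
    return probabilities


panels = feasible_panels()

# Lottery 1: every feasible panel is equally likely.
uniform = {panel: Fraction(1, len(panels)) for panel in panels}
# Lottery 2: always pick the first feasible panel.
fixed = {panels[0]: Fraction(1)}

uniform_probs = selection_probabilities(uniform)
fixed_probs = selection_probabilities(fixed)
print(min(uniform_probs.values()), max(uniform_probs.values()))
print(min(fixed_probs.values()), max(fixed_probs.values()))
```

Because the only man must sit on every panel, the maximum probability is 1 under any lottery, while the minimum depends on how the lottery spreads its weight; an objective in the spirit of Leximin would prefer the uniform lottery here.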

Updated: 2024-06-21 09:38:03

Categories: cs.AI

Download: http://arxiv.org/abs/2406.15009v1

RouteFinder: Towards Foundation Models for Vehicle Routing Problems

Vehicle Routing Problems (VRPs) are optimization problems with significant real-world implications in logistics, transportation, and supply chain management. Despite the recent progress made in learning to solve individual VRP variants, there is a lack of a unified approach that can effectively tackle a wide range of tasks, which is crucial for real-world impact. This paper introduces RouteFinder, a framework for developing foundation models for VRPs. Our key idea is that a foundation model for VRPs should be able to model variants by treating each variant as a subset of a larger VRP problem, equipped with different attributes. We introduce a parallelized environment that can handle any combination of attributes at the same time in a batched manner, and an efficient sampling procedure to train on a mix of problems at each optimization step that can greatly improve convergence robustness. We also introduce novel Global Feature Embeddings that project instance-wise attributes efficiently onto the latent space and help the model understand different VRP variants. Finally, we introduce Efficient Adapter Layers, a simple yet effective technique to finetune pre-trained RouteFinder models to solve novel variants with previously unseen attributes outside of the original feature space. We validate our approach through extensive experiments on 24 VRP variants, demonstrating competitive results over recent multi-task learning models. We make our code openly available at https://github.com/ai4co/routefinder.

Updated: 2024-06-21 09:34:26

Categories: cs.AI

Download: http://arxiv.org/abs/2406.15007v1

Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA

Large Multimodal Models (LMMs) have shown remarkable progress in medical Visual Question Answering (Med-VQA), achieving high accuracy on existing benchmarks. However, their reliability under robust evaluation is questionable. This study reveals that when subjected to simple probing evaluation, state-of-the-art models perform worse than random guessing on medical diagnosis questions. To address this critical evaluation problem, we introduce the Probing Evaluation for Medical Diagnosis (ProbMed) dataset to rigorously assess LMM performance in medical imaging through probing evaluation and procedural diagnosis. In particular, probing evaluation pairs original questions with negation questions containing hallucinated attributes, while procedural diagnosis requires reasoning across various diagnostic dimensions for each image, including modality recognition, organ identification, clinical findings, abnormalities, and positional grounding. Our evaluation reveals that top-performing models like GPT-4o, GPT-4V, and Gemini Pro perform worse than random guessing on specialized diagnostic questions, indicating significant limitations in handling fine-grained medical inquiries. Besides, models like LLaVA-Med struggle even with more general questions, and results from CheXagent demonstrate the transferability of expertise across different modalities of the same organ, showing that specialized domain knowledge is still crucial for improving performance. This study underscores the urgent need for more robust evaluation to ensure the reliability of LMMs in critical fields like medical diagnosis, and shows that current LMMs are still far from applicable to those fields.

Updated: 2024-06-21 09:32:19

Categories: cs.AI

Download: http://arxiv.org/abs/2405.20421v2

Dislocation cartography: Representations and unsupervised classification of dislocation networks with unique fingerprints

Detecting structure in data is the first step toward meaningful representations of systems. This is particularly challenging for dislocation networks evolving as a consequence of plastic deformation of crystalline systems. Our study employs Isomap, a manifold learning technique, to unveil the intrinsic structure of high-dimensional density field data of dislocation structures from different compression axes. The resulting maps provide a systematic framework for quantitatively comparing dislocation structures, offering unique fingerprints based on density fields. Our novel, unbiased approach contributes to the quantitative classification of dislocation structures, which can be systematically extended.

Updated: 2024-06-21 09:32:09

Categories: cond-mat.mtrl-sci,cs.LG

Download: http://arxiv.org/abs/2406.15004v1

Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We conduct a comprehensive analysis using a diverse set of chatbots and real-user interaction data, employing metrics such as retention rate and conversation length to evaluate user engagement. Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues. Notably, the incorporation of a third modality significantly amplifies engagement beyond the benefits observed with just two modalities. These results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension. This study underscores the importance of multi-modality in chatbot design, offering valuable insights for creating more engaging and immersive AI communication experiences and informing the broader AI community about the benefits of multi-modal interactions in enhancing user engagement.

Updated: 2024-06-21 09:26:55

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.15000v1

Multi-Objective Quality-Diversity for Crystal Structure Prediction

Crystal structures are indispensable across various domains, from batteries to solar cells, and extensive research has been dedicated to predicting their properties based on their atomic configurations. However, prevailing Crystal Structure Prediction methods focus on identifying the most stable solutions that lie at the global minimum of the energy function. This approach overlooks other potentially interesting materials that lie in neighbouring local minima and have different material properties such as conductivity or resistance to deformation. By contrast, Quality-Diversity algorithms provide a promising avenue for Crystal Structure Prediction as they aim to find a collection of high-performing solutions that have diverse characteristics. However, it may also be valuable to optimise for the stability of crystal structures alongside other objectives such as magnetism or thermoelectric efficiency. Therefore, in this work, we harness the power of Multi-Objective Quality-Diversity algorithms in order to find crystal structures which have diverse features and achieve different trade-offs of objectives. We analyse our approach on 5 crystal systems and demonstrate that it is not only able to re-discover known real-life structures, but also find promising new ones. Moreover, we propose a method for illuminating the objective space to gain an understanding of what trade-offs can be achieved.

Updated: 2024-06-21 09:26:34

Categories: cs.NE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2403.17164v2

The ULS23 Challenge: a Baseline Model and Benchmark Dataset for 3D Universal Lesion Segmentation in Computed Tomography

Size measurements of tumor manifestations on follow-up CT examinations are crucial for evaluating treatment outcomes in cancer patients. Efficient lesion segmentation can speed up these radiological workflows. While numerous benchmarks and challenges address lesion segmentation in specific organs like the liver, kidneys, and lungs, the larger variety of lesion types encountered in clinical practice demands a more universal approach. To address this gap, we introduced the ULS23 benchmark for 3D universal lesion segmentation in chest-abdomen-pelvis CT examinations. The ULS23 training dataset contains 38,693 lesions across this region, including challenging pancreatic, colon and bone lesions. For evaluation purposes, we curated a dataset comprising 775 lesions from 284 patients. Each of these lesions was identified as a target lesion in a clinical context, ensuring diversity and clinical relevance within this dataset. The ULS23 benchmark is publicly accessible via uls23.grand-challenge.org, enabling researchers worldwide to assess the performance of their segmentation methods. Furthermore, we have developed and publicly released our baseline semi-supervised 3D lesion segmentation model. This model achieved an average Dice coefficient of 0.703 $\pm$ 0.240 on the challenge test set. We invite ongoing submissions to advance the development of future ULS models.
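
The Dice coefficient reported for the baseline model measures the overlap between a predicted and a reference segmentation. A minimal sketch of the standard definition follows; the flat 0/1 mask representation, the function name, and the empty-mask convention are illustrative assumptions and are not taken from the ULS23 code.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Voxels marked as lesion in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total number of lesion voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical six-voxel masks: two voxels overlap, three are marked in each mask.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 0, 1, 1, 1, 0]
print(dice_coefficient(predicted, reference))
```

A value of 1 means the masks coincide and 0 means they share no voxel; a per-lesion average such as the one quoted above would be the mean of this quantity over all test lesions.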

Updated: 2024-06-21 09:23:17

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.05231v2

A Biomechatronic Approach to Evaluating the Security of Wearable Devices in the Internet of Medical Things

The Internet of Medical Things (IoMT) has the potential to revolutionize healthcare by reducing human error and improving patient health. For instance, wearable smart infusion pumps can accurately administer medication and integrate with electronic health records. These pumps can alert healthcare professionals or remote servers when an operation fails, preventing distressing incidents. However, as the number of connected medical devices increases, so does the risk of cyber threats. Wearable medication devices based on IoT attached to patients' bodies are particularly vulnerable to significant cyber threats. Since they are connected to the internet, these devices can be exposed to potential harm, which can disrupt or degrade device performance and harm patients. Therefore, it is crucial to establish secure data authentication for internet-connected medical devices to ensure patient safety and well-being. It is also important to note that the wearability option of such devices might downgrade the computational resources, making them more susceptible to security risks. We propose implementing a security approach for a wearable infusion pump to mitigate cyber threats. We evaluated the proposed architecture with 20, 50, and 100 users for 10 minutes and repeated the evaluation 10 times with two infusion settings, each repeated five times. The desired volumes and rates for the two settings were 2 ml and 4 ml/hr and 5 ml and 5 ml/hr, respectively. The maximum error in infusion rate was measured to be 2.5%. We discuss the practical challenges of implementing such a security-enabled device and suggest initial solutions.

Updated: 2024-06-21 09:17:51

Categories: cs.CR,cs.SY,eess.SY

Download: http://arxiv.org/abs/2406.14996v1

Probabilistic and Differentiable Wireless Simulation with Geometric Transformers

Modelling the propagation of electromagnetic signals is critical for designing modern communication systems. While there are precise simulators based on ray tracing, they do not lend themselves to solving inverse problems or the integration in an automated design loop. We propose to address these challenges through differentiable neural surrogates that exploit the geometric aspects of the problem. We first introduce the Wireless Geometric Algebra Transformer (Wi-GATr), a generic backbone architecture for simulating wireless propagation in a 3D environment. It uses versatile representations based on geometric algebra and is equivariant with respect to E(3), the symmetry group of the underlying physics. Second, we study two algorithmic approaches to signal prediction and inverse problems based on differentiable predictive modelling and diffusion models. We show how these let us predict received power, localize receivers, and reconstruct the 3D environment from the received signal. Finally, we introduce two large, geometry-focused datasets of wireless signal propagation in indoor scenes. In experiments, we show that our geometry-forward approach achieves higher-fidelity predictions with less data than various baselines.

Updated: 2024-06-21 09:14:11

Categories: cs.LG,cs.NI,eess.SP,stat.ML

Download: http://arxiv.org/abs/2406.14995v1

Discovering Dynamic Symbolic Policies with Genetic Programming

Artificial intelligence (AI) techniques are increasingly being applied to solve control problems. However, control systems developed in AI are often black-box methods, in that it is not clear how and why they generate their outputs. A lack of transparency can be problematic for control tasks in particular, because it complicates the identification of biases or errors, which in turn negatively influences the user's confidence in the system. To improve the interpretability and transparency in control systems, the black-box structure can be replaced with white-box symbolic policies described by mathematical expressions. Genetic programming offers a gradient-free method to optimise the structure of non-differentiable mathematical expressions. In this paper, we show that genetic programming can be used to discover symbolic control systems. This is achieved by learning a symbolic representation of a function that transforms observations into control signals. We consider both systems that implement static control policies without memory and systems that implement dynamic memory-based control policies. In the latter case, the discovered function becomes the state equation of a differential equation, which allows for evidence integration. Our results show that symbolic policies are discovered that perform comparably with black-box policies on a variety of control tasks. Furthermore, the additional value of the memory capacity in the dynamic policies is demonstrated in experiments where static policies fall short. Overall, we demonstrate that white-box symbolic policies can be optimised with genetic programming, while offering the interpretability and transparency that black-box models lack.

Updated: 2024-06-21 09:14:03

Categories: cs.NE,cs.LG

Download: http://arxiv.org/abs/2406.02765v2

MRHER: Model-based Relay Hindsight Experience Replay for Sequential Object Manipulation Tasks with Sparse Rewards

Sparse rewards pose a significant challenge to achieving high sample efficiency in goal-conditioned reinforcement learning (RL). Specifically, in sequential manipulation tasks, the agent receives failure rewards until it successfully completes the entire manipulation task, which leads to low sample efficiency. To tackle this issue and improve sample efficiency, we propose a novel model-based RL framework called Model-based Relay Hindsight Experience Replay (MRHER). MRHER breaks down a continuous task into subtasks with increasing complexity and utilizes the previous subtask to guide the learning of the subsequent one. Instead of using Hindsight Experience Replay (HER) in every subtask, we design a new robust model-based relabeling method called Foresight relabeling (FR). FR predicts the future trajectory of the hindsight state and relabels the expected goal as a goal achieved on the virtual future trajectory. By incorporating FR, MRHER effectively captures more information from historical experiences, leading to improved sample efficiency, particularly in object-manipulation environments. Experimental results demonstrate that MRHER exhibits state-of-the-art sample efficiency in benchmark tasks, outperforming RHER by 13.79% and 14.29% in the FetchPush-v1 environment and FetchPickandPlace-v1 environment, respectively.
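
The relabeling idea described above can be illustrated with a toy example. This is only a sketch under loud assumptions, not the authors' implementation: the one-dimensional state, the `model` and `policy` callables, the `horizon` parameter, and the 0 / -1 reward convention are all hypothetical stand-ins chosen to keep the example self-contained.

```python
def sparse_reward(achieved, goal, tol=0.05):
    """Assumed sparse-reward convention: 0 on success, -1 otherwise."""
    return 0.0 if abs(achieved - goal) <= tol else -1.0


def foresight_relabel(transition, model, policy, horizon=3):
    """Relabel a transition with a goal reached on a model-predicted (virtual) future trajectory."""
    state = transition["next_state"]
    # Roll the (hypothetical) learned dynamics model forward from the hindsight state.
    for _ in range(horizon):
        state = model(state, policy(state, transition["goal"]))
    virtual_goal = state  # the state reached at the end of the virtual rollout
    relabeled = dict(transition)
    relabeled["goal"] = virtual_goal
    # Recompute the sparse reward against the new goal.
    relabeled["reward"] = sparse_reward(transition["next_state"], virtual_goal)
    return relabeled


# Toy 1-D dynamics and a clipped proportional policy, both purely illustrative.
toy_model = lambda s, a: s + a
toy_policy = lambda s, g: max(-0.1, min(0.1, g - s))

transition = {"state": 0.0, "action": 0.1, "next_state": 0.1, "goal": 1.0, "reward": -1.0}
print(foresight_relabel(transition, toy_model, toy_policy, horizon=3))
```

With `horizon=0` the virtual goal is simply the state already reached, which corresponds to ordinary hindsight relabeling with the achieved state as the goal; longer horizons use the model to propose goals further along the predicted trajectory.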

Updated: 2024-06-21 09:11:05

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2306.16061v2

Efficient Data Generation for Source-grounded Information-seeking Dialogs: A Use Case for Meeting Transcripts

Automating data generation with Large Language Models (LLMs) has become increasingly popular. In this work, we investigate the feasibility and effectiveness of LLM-based data generation in the challenging setting of source-grounded information-seeking dialogs, with response attribution, over long documents. Our source texts consist of long and noisy meeting transcripts, adding to the task complexity. Since automating attribution remains difficult, we propose a semi-automatic approach: dialog queries and responses are generated with LLMs, followed by human verification and identification of attribution spans. Using this approach, we created MISeD -- Meeting Information Seeking Dialogs dataset -- a dataset of information-seeking dialogs focused on meeting transcripts. Models finetuned with MISeD demonstrate superior performance compared to off-the-shelf models, even those of larger size. Finetuning on MISeD gives comparable response generation quality to finetuning on fully manual data, while improving attribution quality and reducing time and effort.

Updated: 2024-06-21 09:10:28

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.01121v2

Learning Variable Compliance Control From a Few Demonstrations for Bimanual Robot with Haptic Feedback Teleoperation System

Automating dexterous, contact-rich manipulation tasks using rigid robots is a significant challenge in robotics. Rigid robots, defined by their actuation through position commands, face issues of excessive contact forces due to their inability to adapt to contact with the environment, potentially causing damage. While compliance control schemes have been introduced to mitigate these issues by controlling forces via external sensors, they are hampered by the need for fine-tuning task-specific controller parameters. Learning from Demonstrations (LfD) offers an intuitive alternative, allowing robots to learn manipulations through observed actions. In this work, we introduce a novel system to enhance the teaching of dexterous, contact-rich manipulations to rigid robots. Our system is twofold: firstly, it incorporates a teleoperation interface utilizing Virtual Reality (VR) controllers, designed to provide an intuitive and cost-effective method for task demonstration with haptic feedback. Secondly, we present Comp-ACT (Compliance Control via Action Chunking with Transformers), a method that leverages the demonstrations to learn variable compliance control from a few demonstrations. Our methods have been validated across various complex contact-rich manipulation tasks using single-arm and bimanual robot setups in simulated and real-world environments, demonstrating the effectiveness of our system in teaching robots dexterous manipulations with enhanced adaptability and safety.

Updated: 2024-06-21 09:03:37

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.14990v1

Introducing the Biomechanics-Function Relationship in Glaucoma: Improved Visual Field Loss Predictions from intraocular pressure-induced Neural Tissue Strains

Objective. (1) To assess whether neural tissue structure and biomechanics could predict functional loss in glaucoma; (2) To evaluate the importance of biomechanics in making such predictions. Design, Setting and Participants. We recruited 238 glaucoma subjects. For one eye of each subject, we imaged the optic nerve head (ONH) using spectral-domain OCT under the following conditions: (1) primary gaze and (2) primary gaze with acute IOP elevation. Main Outcomes: We utilized automatic segmentation of ONH tissues and digital volume correlation (DVC) analysis to compute intraocular pressure (IOP)-induced neural tissue strains. A robust geometric deep learning approach, known as Point-Net, was employed to predict the full Humphrey 24-2 pattern standard deviation (PSD) maps from ONH structural and biomechanical information. For each point in each PSD map, we predicted whether it exhibited no defect or a PSD value of less than 5%. Predictive performance was evaluated using 5-fold cross-validation and the F1-score. We compared the model's performance with and without the inclusion of IOP-induced strains to assess the impact of biomechanics on prediction accuracy. Results: Integrating biomechanical (IOP-induced neural tissue strains) and structural (tissue morphology and neural tissue thickness) information yielded a significantly better predictive model (F1-score: 0.76 ± 0.02) across validation subjects, as opposed to relying only on structural information, which resulted in a significantly lower F1-score of 0.71 ± 0.02 (p < 0.05). Conclusion: Our study has shown that the integration of biomechanical data can significantly improve the accuracy of visual field loss predictions. This highlights the importance of the biomechanics-function relationship in glaucoma, and suggests that biomechanics may serve as a crucial indicator for the development and progression of glaucoma.
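
Since each visual field point is classified as defect or no defect, the F1-score used above is the harmonic mean of precision and recall over those binary predictions. A minimal sketch of that standard computation follows; the label encoding (1 = defect, 0 = no defect) and the example labels are hypothetical and do not come from the study.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for six visual field points.
truth = [1, 1, 1, 0, 0, 1]
preds = [1, 0, 1, 1, 0, 1]
print(f1_score(truth, preds))
```

In a 5-fold cross-validation setting, this score would be computed once per held-out fold and then summarised as a mean and standard deviation.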

Updated: 2024-06-21 09:00:46

Categories: eess.IV,cs.AI

Download: http://arxiv.org/abs/2406.14988v1

Graph Neural Networks in Histopathology: Emerging Trends and Future Directions

Histopathological analysis of Whole Slide Images (WSIs) has seen a surge in the utilization of deep learning methods, particularly Convolutional Neural Networks (CNNs). However, CNNs often fall short in capturing the intricate spatial dependencies inherent in WSIs. Graph Neural Networks (GNNs) present a promising alternative, adept at directly modeling pairwise interactions and effectively discerning the topological tissue and cellular structures within WSIs. Recognizing the pressing need for deep learning techniques that harness the topological structure of WSIs, the application of GNNs in histopathology has experienced rapid growth. In this comprehensive review, we survey GNNs in histopathology, discuss their applications, and explore emerging trends that pave the way for future advancements in the field. We begin by elucidating the fundamentals of GNNs and their potential applications in histopathology. Leveraging quantitative literature analysis, we identify four emerging trends: Hierarchical GNNs, Adaptive Graph Structure Learning, Multimodal GNNs, and Higher-order GNNs. Through an in-depth exploration of these trends, we offer insights into the evolving landscape of GNNs in histopathological analysis. Based on our findings, we propose future directions to propel the field forward. Our analysis serves to guide researchers and practitioners towards innovative approaches and methodologies, fostering advancements in histopathological analysis through the lens of graph neural networks.

Updated: 2024-06-21 08:57:40

Categories: cs.CV,cs.AI,cs.LG,q-bio.TO,I.2.10; I.4.10; J.3

Download: http://arxiv.org/abs/2406.12808v3

Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers

Prompting and Multiple-Choice Questions (MCQ) have become the preferred approach to assess the capabilities of Large Language Models (LLMs), due to their ease of manipulation and evaluation. Such experimental appraisals have pointed toward the LLMs' apparent ability to perform causal reasoning or to grasp uncertainty. In this paper, we investigate whether these abilities are measurable outside of tailored prompting and MCQ by reformulating these issues as direct text completion - the foundation of LLMs. To achieve this goal, we define scenarios with multiple possible outcomes and we compare the prediction made by the LLM through prompting (their Stated Answer) to the probability distributions they compute over these outcomes during next token prediction (their Revealed Belief). Our findings suggest that the Revealed Belief of LLMs significantly differs from their Stated Answer and hint at multiple biases and misrepresentations that their beliefs may yield in many scenarios and outcomes. As text completion is at the core of LLMs, these results suggest that common evaluation methods may only provide a partial picture and that more research is needed to assess the extent and nature of their capabilities.
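
One way to picture the comparison described above is to turn the log-probabilities a model assigns to each outcome's continuation into a distribution and check it against the answer given when prompted. The sketch below assumes this simple setup; the outcome names, the log-probability values, and both function names are hypothetical, and the paper may measure the difference differently.

```python
import math


def revealed_belief(logprobs):
    """Normalise per-outcome log-probabilities into a probability distribution over the outcomes."""
    top = max(logprobs.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {outcome: math.exp(lp - top) for outcome, lp in logprobs.items()}
    total = sum(weights.values())
    return {outcome: w / total for outcome, w in weights.items()}


def agrees(stated_answer, belief):
    """True when the stated answer is also the most probable outcome under the revealed belief."""
    return stated_answer == max(belief, key=belief.get)


# Hypothetical next-token log-probabilities for two possible outcomes of a scenario.
logprobs = {"heads": math.log(0.2), "tails": math.log(0.6)}
belief = revealed_belief(logprobs)
print(belief)
print(agrees("heads", belief))
```

Renormalising over the listed outcomes discards probability mass the model places on other continuations, which is a simplification made here for illustration.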

Updated: 2024-06-21 08:56:35

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.14986v1

On the Worst Prompt Performance of Large Language Models

The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts, which raises significant concerns about their reliability in real-world scenarios. Existing studies often divide prompts into task-level instructions and case-level inputs and primarily focus on evaluating and improving robustness against variations in task-level instructions. However, this setup fails to fully address the diversity of real-world user queries and assumes the existence of task-specific datasets. To address these limitations, we introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries and emphasizes the importance of using the worst prompt performance to gauge the lower bound of model performance. Extensive experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance; for instance, a difference of 45.48% between the worst and best performance for the Llama-2-70B-chat model, with its worst performance dipping as low as 9.38%. We further illustrate the difficulty in identifying the worst prompt from both model-agnostic and model-dependent perspectives, emphasizing the absence of a shortcut to characterize the worst prompt. We also attempt to enhance the worst prompt performance using existing prompt engineering and prompt consistency methods, but find that their impact is limited. These findings underscore the need to create more resilient LLMs that can maintain high performance across diverse prompts. Data and code are available at https://github.com/cbwbuaa/On-the-Worst-Prompt-Performance-of-LLMs.
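
The contrast between worst, best, and average prompt performance can be sketched as follows. The aggregation shown (per-query minimum, maximum, and mean over paraphrases, each then averaged across queries) is an assumption made for illustration, as are the query identifiers and scores; the benchmark's exact metric definition is not given in the abstract.

```python
def prompt_robustness(scores_by_query):
    """Summarise scores over paraphrases: per-query worst, best and mean, averaged across queries.

    scores_by_query maps a query id to a list of scores, one per semantically
    equivalent paraphrase of that query.
    """
    n = len(scores_by_query)
    worst = sum(min(s) for s in scores_by_query.values()) / n
    best = sum(max(s) for s in scores_by_query.values()) / n
    average = sum(sum(s) / len(s) for s in scores_by_query.values()) / n
    return {"worst": worst, "best": best, "average": average}


# Hypothetical scores for two queries, each phrased in three equivalent ways.
scores = {"q1": [1.0, 0.0, 1.0], "q2": [0.5, 0.5, 1.0]}
print(prompt_robustness(scores))
```

A large gap between the "worst" and "best" entries indicates that the outcome depends heavily on phrasing, which is the kind of variability the abstract reports.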

Updated: 2024-06-21 08:55:37

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.10248v2

ExcelFormer: Can a DNN be a Sure Bet for Tabular Prediction?

Data organized in tabular format is ubiquitous in real-world applications, and users often craft tables with biased feature definitions and flexibly set prediction targets of their interests. Thus, a rapid development of a robust, effective, dataset-versatile, user-friendly tabular prediction approach is highly desired. While Gradient Boosting Decision Trees (GBDTs) and existing deep neural networks (DNNs) have been extensively utilized by professional users, they present several challenges for casual users, particularly: (i) the dilemma of model selection due to their different dataset preferences, and (ii) the need for heavy hyperparameter searching, failing which their performances are deemed inadequate. In this paper, we delve into this question: Can we develop a deep learning model that serves as a "sure bet" solution for a wide range of tabular prediction tasks, while also being user-friendly for casual users? We delve into three key drawbacks of deep tabular models, encompassing: (P1) lack of rotational variance property, (P2) large data demand, and (P3) over-smooth solution. We propose ExcelFormer, addressing these challenges through a semi-permeable attention module that effectively constrains the influence of less informative features to break the DNNs' rotational invariance property (for P1), data augmentation approaches tailored for tabular data (for P2), and attentive feedforward network to boost the model fitting capability (for P3). These designs collectively make ExcelFormer a "sure bet" solution for diverse tabular datasets. Extensive and stratified experiments conducted on real-world datasets demonstrate that our model outperforms previous approaches across diverse tabular data prediction tasks, and this framework can be friendly to casual users, offering ease of use without the heavy hyperparameter tuning.

Updated: 2024-06-21 08:52:06

Categories: cs.LG

Download: http://arxiv.org/abs/2301.02819v6

Provable Privacy with Non-Private Pre-Processing

When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP and the bounded sensitivity of the pre-processing algorithms. In addition to the generic framework, we provide explicit overall privacy guarantees for multiple data-dependent pre-processing algorithms, such as data imputation, quantization, deduplication and PCA, when used in combination with several DP algorithms. Notably, this framework is also simple to implement, allowing direct integration into existing DP pipelines.

Updated: 2024-06-21 08:51:29

Categories: cs.CR,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2403.13041v4

Hierarchical thematic classification of major conference proceedings

In this paper, we develop a decision support system for hierarchical text classification. We consider text collections with a fixed hierarchical structure of topics given by experts in the form of a tree. The system sorts the topics by relevance to a given document. The experts choose one of the most relevant topics to finish the classification. We propose a weighted hierarchical similarity function to calculate topic relevance. The function calculates the similarity of a document and a tree branch. The weights in this function determine word importance. We use the entropy of words to estimate the weights. The proposed hierarchical similarity function formulates a joint hierarchical thematic classification probability model of the document topics, parameters, and hyperparameters. The variational Bayesian inference gives a closed-form EM algorithm. The EM algorithm estimates the parameters and calculates the probability of a topic for a given document. Compared to hierarchical multiclass SVM, hierarchical PLSA with adaptive regularization, and hierarchical naive Bayes, the weighted hierarchical similarity function achieves a larger improvement in ranking accuracy on a collection of abstracts from the major conference EURO and on a collection of industrial company websites.

Updated: 2024-06-21 08:48:57

领域: cs.LG,cs.IR,stat.ML

下载: http://arxiv.org/abs/2406.14983v1

Active Few-Shot Fine-Tuning

We study the question: How can we select the right data for fine-tuning to a specific task? We call this data selection problem active fine-tuning and show that it is an instance of transductive active learning, a novel generalization of classical active learning. We propose ITL, short for information-based transductive learning, an approach which samples adaptively to maximize information gained about the specified task. We are the first to show, under general regularity assumptions, that such decision rules converge uniformly to the smallest possible uncertainty obtainable from the accessible data. We apply ITL to the few-shot fine-tuning of large neural networks and show that fine-tuning with ITL learns the task with significantly fewer examples than the state-of-the-art.

Updated: 2024-06-21 08:48:18

标题: 主动式少样本微调

摘要: 我们研究了一个问题：我们如何选择合适的数据进行特定任务的微调？我们称这个数据选择问题为主动微调，并展示它是转导主动学习（transductive active learning）的一个实例，这是对经典主动学习的一种新的推广。我们提出了ITL，即基于信息的转导学习，这是一种自适应采样的方法，旨在最大化关于指定任务的信息增益。我们首次在一般的正则性假设下证明，这样的决策规则会一致收敛到从可访问数据中可获得的最小不确定性。我们将ITL应用于大型神经网络的少样本微调，并展示使用ITL进行微调学习任务所需的样本明显少于当前最先进的方法。

更新时间: 2024-06-21 08:48:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.15441v4

Human-AI collectives produce the most accurate differential diagnoses

Artificial intelligence systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased - shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the-art LLMs across 2,133 medical cases. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience, and can be attributed to humans' and LLMs' complementary contributions that lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.

Updated: 2024-06-21 08:46:30

标题: 人机集体产生最准确的鉴别诊断

摘要: 人工智能系统，尤其是大型语言模型（LLMs），越来越多地被用于影响个人和整个社会的高风险决策，往往缺乏足够的保障来确保安全、质量和公平。然而，LLMs会产生幻觉，缺乏常识，并且存在偏见 - 这些缺点可能反映了LLMs固有的局限性，因此可能无法通过更复杂的架构、更多的数据或更多的人类反馈来解决。因此，仅仅依赖LLMs进行复杂的高风险决策是有问题的。在这里，我们提出了一种混合集体智能系统，通过利用人类经验和LLMs处理的大量信息的互补优势来减轻这些风险。我们将这种方法应用于开放式医学诊断，将医生做出的40,762个鉴别诊断与五种最先进的LLMs在2,133个病例上的诊断相结合。我们展示了医生和LLMs的混合集体胜过单个医生和医生集体，以及单个LLMs和LLM集成。这一结果适用于各种医学专业和专业经验，并且可以归因于人类和LLMs的互补贡献，它们导致不同类型的错误。我们的方法突显了集体人类和机器智能在医学诊断等复杂、开放领域中提高准确性的潜力。

更新时间: 2024-06-21 08:46:30

领域: cs.AI,cs.HC

下载: http://arxiv.org/abs/2406.14981v1

Predictions Based on Pixel Data: Insights from PDEs and Finite Differences

As supported by abundant experimental evidence, neural networks are state-of-the-art for many approximation tasks in high-dimensional spaces. Still, there is a lack of a rigorous theoretical understanding of what they can approximate, at which cost, and at which accuracy. One network architecture of practical use, especially for approximation tasks involving images, is (residual) convolutional networks. However, due to the locality of the linear operators involved in these networks, their analysis is more complicated than that of fully connected neural networks. This paper deals with approximation of time sequences where each observation is a matrix. We show that with relatively small networks, we can represent exactly a class of numerical discretizations of PDEs based on the method of lines. We constructively derive these results by exploiting the connections between discrete convolution and finite difference operators. Our network architecture is inspired by those typically adopted in the approximation of time sequences. We support our theoretical results with numerical experiments simulating the linear advection, heat, and Fisher equations.
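
The connection between discrete convolution and finite difference operators that the abstract relies on can be illustrated with a minimal sketch. It assumes a 1-D periodic grid, the heat equation u_t = nu * u_xx, and an explicit Euler time step; the function names `conv1d_periodic` and `heat_step` are hypothetical and are not taken from the paper.

```python
import math

def conv1d_periodic(u, kernel):
    """Cross-correlate u with an odd-length kernel using periodic (wrap-around) padding."""
    n, r = len(u), len(kernel) // 2
    return [sum(kernel[j] * u[(i + j - r) % n] for j in range(len(kernel)))
            for i in range(n)]

def heat_step(u, dx, dt, nu=1.0):
    """One explicit Euler step of u_t = nu * u_xx (method of lines, central differences)."""
    # The stencil (1, -2, 1) / dx^2 is the second-order finite difference for u_xx,
    # written here as a convolution kernel.
    kernel = [1.0 / dx**2, -2.0 / dx**2, 1.0 / dx**2]
    laplacian = conv1d_periodic(u, kernel)
    # u + dt * conv(u) has the shape of a residual block with a linear convolution.
    return [ui + dt * nu * li for ui, li in zip(u, laplacian)]

n = 64
dx = 1.0 / n
u0 = [math.sin(2 * math.pi * i * dx) for i in range(n)]  # smooth periodic initial state
u1 = heat_step(u0, dx, dt=1e-5)                          # dt is below the dx^2 / 2 stability limit
```

The update has the form "input plus a convolution of the input", which is why a small residual convolutional layer can reproduce such a scheme exactly once its kernel equals the stencil.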

Updated: 2024-06-21 08:45:24

标题: 基于像素数据的预测：来自PDE和有限差分方法的见解

摘要: 根据大量实验证据支持，神经网络在高维空间中的许多逼近任务中处于最前沿。然而，对于它们能够以何种代价和准确度逼近的严格理论理解仍然缺乏。一种实用的网络架构，特别适用于涉及图像的逼近任务，是（残差）卷积网络。然而，由于这些网络中涉及的线性算子的局部性，它们的分析比全连接神经网络更复杂。本文涉及每个观测值为矩阵的时间序列的逼近。我们表明，通过使用相对较小的网络，我们可以精确表示基于线方法的PDE的一类数值离散化。我们通过利用离散卷积和有限差分算子之间的联系来建设性地推导这些结果。我们的网络架构受到通常用于逼近时间序列的那些启发。我们通过模拟线性平流、热传导和Fisher方程的数值实验支持我们的理论结果。

更新时间: 2024-06-21 08:45:24

领域: math.NA,cs.LG,cs.NA

下载: http://arxiv.org/abs/2305.00723v2

Trustworthy Enhanced Multi-view Multi-modal Alzheimer's Disease Prediction with Brain-wide Imaging Transcriptomics Data

Brain transcriptomics provides insights into the molecular mechanisms by which the brain coordinates its functions and processes. However, existing multimodal methods for predicting Alzheimer's disease (AD) primarily rely on imaging and sometimes genetic data, often neglecting the transcriptomic basis of the brain. Furthermore, while striving to integrate complementary information between modalities, most studies overlook the informativeness disparities between modalities. Here, we propose TMM, a trusted multiview multimodal graph attention framework for AD diagnosis, using extensive brain-wide transcriptomics and imaging data. First, we construct view-specific brain regional co-function networks (RRIs) from transcriptomics and multimodal radiomics data to incorporate interaction information from both biomolecular and imaging perspectives. Next, we apply graph attention (GAT) processing to each RRI network to produce graph embeddings and employ cross-modal attention to fuse the transcriptomics-derived embedding with each imaging-derived embedding. Finally, a novel true-false-harmonized class probability (TFCP) strategy is designed to assess and adaptively adjust the prediction confidence of each modality for AD diagnosis. We evaluate TMM using the AHBA database with brain-wide transcriptomics data and the ADNI database with three imaging modalities (AV45-PET, FDG-PET, and VBM-MRI). The results demonstrate the superiority of our method in identifying AD, EMCI, and LMCI compared to state-of-the-art methods. Code and data are available at https://github.com/Yaolab-fantastic/TMM.

Updated: 2024-06-21 08:39:24

标题: 可靠的增强多视角多模式阿尔茨海默病预测与全脑成像转录组数据

摘要: 脑转录组学为我们提供了洞察脑部协调其功能和过程的分子机制。然而，现有的预测阿尔茨海默病（AD）的多模态方法主要依赖于影像学和有时遗传数据，往往忽视了大脑的转录组基础。此外，虽然努力整合各种模态之间的互补信息，但大多数研究忽视了各模态之间信息的差异性。在这里，我们提出了TMM，一种可信的多视图多模态图注意框架，用于AD诊断，利用广泛的全脑转录组和影像数据。首先，我们从转录组和多模态放射组学数据构建特定视图的脑区域共功能网络（RRIs），以融合来自生物分子和影像两个角度的交互信息。接下来，我们对每个RRIs网络应用图注意力（GAT）处理，以生成图嵌入，并采用跨模态注意力将转录组衍生的嵌入与每个影像衍生的嵌入融合。最后，设计了一种新颖的真假和谐类概率（TFCP）策略，用于评估和自适应调整每种模态对于AD诊断的预测置信度。我们使用AHBA数据库的全脑转录组数据和ADNI数据库的三种影像模态（AV45-PET，FDG-PET和VBM-MRI）对TMM进行评估。结果显示，与现有技术相比，我们的方法在识别AD、EMCI和LMCI方面具有优越性。代码和数据可在https://github.com/Yaolab-fantastic/TMM 获取。

更新时间: 2024-06-21 08:39:24

领域: cs.AI,eess.IV

下载: http://arxiv.org/abs/2406.14977v1

On the composable security of weak coin flipping

Weak coin flipping is a cryptographic primitive in which two mutually distrustful parties generate a shared random bit to agree on a winner via remote communication. While a stand-alone secure weak coin flipping protocol can be constructed from noiseless communication channels, its composability has not been explored. In this work, we demonstrate that no weak coin flipping protocol can be abstracted into a black box resource with composable security. Despite this, we also establish the overall stand-alone security of weak coin flipping protocols under sequential composition.

Updated: 2024-06-21 08:37:59

标题: 弱硬币翻转的可组合安全性

摘要: 弱币翻转是一种密码学原语，两个互不信任的参与方通过远程通信生成共享的随机比特，以便就赢家达成一致。虽然可以从无噪声通信信道构建独立安全的弱币翻转协议，但其可组合性尚未被探索。在这项工作中，我们证明没有弱币翻转协议可以被抽象为具有可组合安全性的黑盒资源。尽管如此，我们也建立了弱币翻转协议在顺序组合下的整体独立安全性。

更新时间: 2024-06-21 08:37:59

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2402.15233v2

Contextual Knowledge Graph

Knowledge Graphs (KGs) are foundational structures in many AI applications, representing entities and their interrelations through triples. However, triple-based KGs lack the contextual information of relational knowledge, like temporal dynamics and provenance details, which are crucial for comprehensive knowledge representation and effective reasoning. In contrast, Contextual Knowledge Graphs (CKGs) expand upon the conventional structure by incorporating additional information such as time validity, geographic location, and source provenance. This integration provides a more nuanced and accurate understanding of knowledge, enabling KGs to offer richer insights and support more sophisticated reasoning processes. In this work, we first discuss the inherent limitations of triple-based KGs and introduce the concept of contextual KGs, highlighting their advantages in knowledge representation and reasoning. We then present KGR$^3$, a context-enriched KG reasoning paradigm that leverages large language models (LLMs) to retrieve candidate entities and related contexts, rank them based on the retrieved information, and reason whether sufficient information has been obtained to answer a query. Our experimental results demonstrate that KGR$^3$ significantly improves performance on KG completion (KGC) and KG question answering (KGQA) tasks, validating the effectiveness of incorporating contextual information in KG representation and reasoning.

Updated: 2024-06-21 08:33:10

标题: 上下文知识图

摘要: 知识图谱（KGs）是许多人工智能应用中的基本结构，通过三元组表示实体及其相互关系。然而，基于三元组的知识图谱缺乏关系知识的背景信息，如时间动态和来源细节，这些对于全面知识表示和有效推理至关重要。相比之下，上下文知识图谱（CKGs）通过整合额外信息，如时间有效性、地理位置和来源出处，扩展了传统结构。这种整合提供了对知识更细致和准确的理解，使知识图谱能够提供更丰富的见解并支持更复杂的推理过程。在这项工作中，我们首先讨论基于三元组的知识图谱的固有局限性，并介绍上下文知识图谱的概念，强调其在知识表示和推理中的优势。然后，我们提出KGR$^3$，一个富含上下文的知识图谱推理范式，利用大型语言模型（LLMs）检索候选实体和相关上下文，根据检索到的信息对它们进行排名，并推理是否已获得足够信息来回答查询。我们的实验结果表明，KGR$^3$在知识图谱补全（KGC）和知识图谱问答（KGQA）任务上显著提高了性能，验证了在知识图谱表示和推理中整合上下文信息的有效性。

更新时间: 2024-06-21 08:33:10

领域: cs.AI

下载: http://arxiv.org/abs/2406.11160v2

Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation

We conducted extensive experiments on domain adaptation of the Meta-Llama-3-70B-Instruct model on SEC data, exploring its performance on both general and domain-specific benchmarks. Our focus included continual pre-training (CPT) and model merging, aiming to enhance the model's domain-specific capabilities while mitigating catastrophic forgetting. Through this study, we evaluated the impact of integrating financial regulatory data into a robust language model and examined the effectiveness of our model merging techniques in preserving and improving the model's instructive abilities. The model is accessible on Hugging Face: https://huggingface.co/arcee-ai/Llama-3-SEC-Base. This is an intermediate checkpoint of our final model, which has seen 20B tokens so far. The full model is still in the process of training. This is a preprint technical report with thorough evaluations to understand the entire process.

Updated: 2024-06-21 08:29:31

标题: Llama3-70B-Instruct领域自适应：通过持续预训练和模型合并进行综合评估

摘要: 我们对Meta-Llama-3-70B-Instruct模型在SEC数据上的领域适应性进行了大量实验，探讨其在通用和特定领域基准测试中的表现。我们的重点包括持续预训练（CPT）和模型合并，旨在增强模型的特定领域能力，同时减轻灾难性遗忘。通过这项研究，我们评估了将金融监管数据整合到一个强大的语言模型中的影响，并检验了我们的模型合并技术在保留和提高模型教导能力方面的有效性。该模型可在hugging face上访问：https://huggingface.co/arcee-ai/Llama-3-SEC-Base，arcee-ai/Llama-3-SEC-Base。这是我们最终模型的中间检查点，迄今已见过20B个令牌。完整模型仍在训练过程中。这是一份预印技术报告，具有全面的评估，以了解整个过程。

更新时间: 2024-06-21 08:29:31

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.14971v1

Uni-Mol2: Exploring Molecular Pretraining Model at Scale

In recent years, pretraining models have made significant advancements in the fields of natural language processing (NLP), computer vision (CV), and life sciences. The significant advancements in NLP and CV are predominantly driven by the expansion of model parameters and data size, a phenomenon now recognized as the scaling laws. However, scaling laws in molecular pretraining models remain unexplored. In this work, we present Uni-Mol2, an innovative molecular pretraining model that leverages a two-track transformer to effectively integrate features at the atomic level, graph level, and geometry structure level. Along with this, we systematically investigate the scaling law within molecular pretraining models, characterizing the power-law correlations between validation loss and model size, dataset size, and computational resources. Consequently, we successfully scale Uni-Mol2 to 1.1 billion parameters through pretraining on 800 million conformations, making it the largest molecular pretraining model to date. Extensive experiments show consistent improvement in the downstream tasks as the model size grows. Uni-Mol2 with 1.1B parameters also outperforms existing methods, achieving an average 27% improvement on the QM9 dataset and 14% on the COMPAS-1D dataset.
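
A power-law relation between validation loss and model size, as characterized in the abstract, is usually estimated by a straight-line fit in log-log space. The sketch below assumes the simple form loss = a * size^(-b) with no irreducible-loss term; the data points are synthetic placeholders generated from known constants and are not results from the paper, and `fit_power_law` is a hypothetical helper name.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic example: losses generated from a = 5.0, b = 0.1 (illustrative values only).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * s ** -0.1 for s in sizes]
a, b = fit_power_law(sizes, losses)
```

On noise-free synthetic data the fit recovers the generating constants; with measured losses one would also inspect the residuals before extrapolating to larger models.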

Updated: 2024-06-21 08:28:54

标题: Uni-Mol2：探索大规模分子预训练模型

摘要: 近年来，预训练模型在自然语言处理（NLP）、计算机视觉（CV）和生命科学领域取得了显著进展。在NLP和CV领域的显著进展主要是由模型参数和数据规模的扩展驱动的，这一现象现在被称为缩放定律（scaling laws）。然而，分子预训练模型中的缩放定律尚未得到探索。在这项工作中，我们提出了Uni-Mol2，这是一种创新的分子预训练模型，利用双轨Transformer有效地整合了原子级、图级和几何结构级别的特征。除此之外，我们系统地研究了分子预训练模型内的缩放定律，描述了验证损失与模型大小、数据集大小和计算资源之间的幂律相关性。因此，我们通过在8亿个构象上进行预训练，成功地将Uni-Mol2扩展到了11亿个参数，使其成为迄今为止最大的分子预训练模型。大量实验证明，随着模型规模的增长，下游任务的性能不断提高。具有11亿个参数的Uni-Mol2还优于现有方法，在QM9数据集上平均提高27%，在COMPAS-1D数据集上提高14%。

更新时间: 2024-06-21 08:28:54

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.14969v1

Eight challenges in developing theory of intelligence

A good theory of mathematical beauty is more practical than any current observation, as new predictions of physical reality can be verified self-consistently. This belief applies to the current status of understanding deep neural networks including large language models and even biological intelligence. Toy models provide a metaphor of physical reality, allowing mathematically formulating that reality (i.e., the so-called theory), which can be updated as more conjectures are justified or refuted. One does not need to pack all details into a model, but rather, more abstract models are constructed, as complex systems like brains or deep networks have many sloppy dimensions but far fewer stiff dimensions that strongly impact macroscopic observables. This kind of bottom-up mechanistic modeling is still promising in the modern era of understanding natural or artificial intelligence. Here, we shed light on eight challenges in developing a theory of intelligence following this theoretical paradigm. These challenges are representation learning, generalization, adversarial robustness, continual learning, causal learning, internal model of the brain, next-token prediction, and finally the mechanics of subjective experience.

Updated: 2024-06-21 08:26:30

标题: 发展智力理论中的八大挑战

摘要: 数学美学的一个好理论比任何当前的观察更实用，因为可以通过验证自洽的新的物理现实预测。这种信念适用于理解深度神经网络的当前状态，包括大型语言模型甚至生物智能。玩具模型提供了物理现实的隐喻，允许数学地表达这种现实（即所谓的理论），可以在更多的猜想被证实或被推翻时进行更新。人们不需要将所有细节都包含在模型中，而是构建更抽象的模型，因为像大脑或深度网络这样的复杂系统具有许多松散的维度，但对宏观可观测量产生强烈影响的刚性维度较少。这种自下而上的机械建模在理解自然或人工智能的现代时代仍然很有前景。在这里，我们揭示了按照这种理论范式发展智能理论时面临的八个挑战。这些挑战包括表示学习、泛化、对抗性稳健性、持续学习、因果学习、大脑的内部模型、下一个标记的预测，最后是主观体验的机制。

更新时间: 2024-06-21 08:26:30

领域: q-bio.NC,cond-mat.stat-mech,cs.AI,cs.CL

下载: http://arxiv.org/abs/2306.11232v2

AIGC-Chain: A Blockchain-Enabled Full Lifecycle Recording System for AIGC Product Copyright Management

As artificial intelligence technology becomes increasingly prevalent, Artificial Intelligence Generated Content (AIGC) is being adopted across various sectors. Although AIGC is playing an increasingly significant role in business and culture, questions surrounding its copyright have sparked widespread debate. The current legal framework for copyright and intellectual property is grounded in the concept of human authorship, but in the creation of AIGC, human creators primarily provide conceptual ideas, with AI independently responsible for the expressive elements. This disconnect creates complexity and difficulty in determining copyright ownership under existing laws. Consequently, it is imperative to reassess the intellectual contributions of all parties involved in the creation of AIGC to ensure a fair allocation of copyright ownership. To address this challenge, we introduce AIGC-Chain, a blockchain-enabled full lifecycle recording system designed to manage the copyright of AIGC products. It is engineered to meticulously document the entire lifecycle of AIGC products, providing a transparent and dependable platform for copyright management. Furthermore, we propose a copyright tracing method based on an Indistinguishable Bloom Filter, named IBFT, which enhances the efficiency of blockchain transaction queries and significantly reduces the risk of fraudulent copyright claims for AIGC products. In this way, auditors can analyze the copyright of AIGC products by reviewing all relevant information retrieved from the blockchain.

Updated: 2024-06-21 08:22:39

标题: AIGC-Chain：一种用于AIGC产品版权管理的区块链启用的全生命周期记录系统

摘要: 随着人工智能技术日益普及，人工智能生成内容（AIGC）正在被各个领域采用。尽管AIGC在商业和文化领域发挥着日益重要的作用，围绕其版权的问题引发了广泛的争论。当前的版权和知识产权法律框架基于人类创作的概念，但在AIGC的创作过程中，人类创作者主要提供概念性想法，而人工智能则独立负责表现元素。这种脱节在现有法律下导致了版权归属的复杂性和困难。因此，有必要重新评估所有参与AIGC创作的各方的知识贡献，以确保公平分配版权所有权。为了解决这一挑战，我们引入了AIGC-Chain，这是一个基于区块链的全生命周期记录系统，旨在管理AIGC产品的版权。它被设计为精心记录AIGC产品的整个生命周期，为版权管理提供一个透明可靠的平台。此外，我们提出了一种基于不可区分的布隆过滤器的版权追踪方法，称为IBFT，它提高了区块链交易查询的效率，并显著降低了AIGC产品的虚假版权索赔风险。通过这种方式，审计员可以通过查看从区块链检索的所有相关信息来分析AIGC产品的版权。

更新时间: 2024-06-21 08:22:39

领域: cs.CY,cs.CR

下载: http://arxiv.org/abs/2406.14966v1

Enhancing Actuarial Non-Life Pricing Models via Transformers

Currently, there is a lot of research in the field of neural networks for non-life insurance pricing. The usual goal is to improve the predictive power via neural networks while building upon the generalized linear model, which is the current industry standard. Our paper contributes to this current journey via novel methods to enhance actuarial non-life models with transformer models for tabular data. We build here upon the foundation laid out by the combined actuarial neural network as well as the LocalGLMnet and enhance those models via the feature tokenizer transformer. The manuscript demonstrates the performance of the proposed methods on a real-world claim frequency dataset and compares them with several benchmark models such as generalized linear models, feed-forward neural networks, combined actuarial neural networks, LocalGLMnet, and the pure feature tokenizer transformer. The paper shows that the new methods can achieve better results than the benchmark models while preserving certain generalized linear model advantages. The paper also discusses the practical implications and challenges of applying transformer models in actuarial settings.

Updated: 2024-06-21 08:20:20

标题: 通过Transformer提升精算非寿险定价模型

摘要: 目前，在非寿险定价领域有许多关于神经网络的研究。通常的目标是通过神经网络提高预测能力，同时建立在广义线性模型的基础上，这是当前行业标准。我们的论文提出了新颖的方法，利用面向表格数据的Transformer模型来增强精算非寿险模型，为这一研究领域做出了贡献。我们在组合精算神经网络以及LocalGLMnet所奠定的基础上构建模型，并通过特征标记器Transformer（feature tokenizer transformer）来增强这些模型。该论文展示了所提出的方法在真实索赔频率数据集上的性能，并与几种基准模型进行了比较，如广义线性模型、前馈神经网络、组合精算神经网络、LocalGLMnet和纯特征标记器Transformer。论文表明，新方法可以比基准模型取得更好的结果，同时保留了一定的广义线性模型优势。论文还讨论了在精算领域应用Transformer模型的实际影响和挑战。

更新时间: 2024-06-21 08:20:20

领域: cs.LG,cs.AI,q-fin.ST,stat.AP,62 68

下载: http://arxiv.org/abs/2311.07597v2

Optimised Grouped-Query Attention Mechanism for Transformers

Grouped-query attention (GQA) has been widely adopted in LLMs to mitigate the complexity of multi-head attention (MHA). To transform an MHA to a GQA, neighbour queries in MHA are evenly split into groups where each group shares the value and key layers. In this work, we propose AsymGQA, an activation-informed approach to asymmetrically grouping an MHA to a GQA for better model performance. Our AsymGQA outperforms the GQA within the same model size budget. For example, AsymGQA LLaMA-2-7B has an accuracy increase of 7.5% on MMLU compared to neighbour grouping. Our approach addresses the GQA's trade-off problem between model performance and hardware efficiency.
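
The baseline the abstract compares against, neighbour grouping, can be sketched as follows. The sketch assumes that the key (or value) projection of each head is given as a flat list of weights and that heads in a group are merged by mean pooling, which is the conversion commonly used when turning an MHA checkpoint into GQA; that choice is an assumption here. The activation-informed grouping of AsymGQA itself is not reproduced, and the function names are hypothetical.

```python
def neighbour_groups(num_heads, num_groups):
    """Evenly split consecutive head indices into groups (the neighbour-grouping baseline)."""
    if num_heads % num_groups != 0:
        raise ValueError("num_heads must be divisible by num_groups")
    size = num_heads // num_groups
    return [list(range(g * size, (g + 1) * size)) for g in range(num_groups)]

def mean_pool_heads(head_weights, groups):
    """Merge per-head key or value weights into one shared set of weights per group."""
    pooled = []
    for group in groups:
        dim = len(head_weights[group[0]])
        pooled.append([sum(head_weights[h][d] for h in group) / len(group)
                       for d in range(dim)])
    return pooled

# Four heads with toy 2-element weight vectors, merged into two groups.
heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
even = mean_pool_heads(heads, neighbour_groups(4, 2))
# An uneven grouping is expressed simply by passing groups of different sizes.
uneven = mean_pool_heads(heads, [[0], [1, 2, 3]])
```

Because `mean_pool_heads` accepts arbitrary index lists, the same routine serves both the even neighbour split and an asymmetric split; only the way the groups are chosen differs.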

Updated: 2024-06-21 08:20:06

标题: 优化的分组查询注意力机制用于Transformers

摘要: 组合查询关注（GQA）已被广泛采用在LLMs中，以减轻多头注意力（MHA）的复杂性。为了将MHA转化为GQA，MHA中的邻近查询被均匀分成组，每个组共享值和键层。在这项工作中，我们提出了AsymGQA，一种激活信息驱动的方法，将MHA不对称地分组为GQA，以获得更好的模型性能。我们的AsymGQA在相同的模型尺寸预算下优于GQA。例如，与邻居分组相比，AsymGQA LLaMA-2-7B在MMLU上的准确度提高了7.5%。我们的方法解决了GQA在模型性能和硬件效率之间的权衡问题。

更新时间: 2024-06-21 08:20:06

领域: cs.LG

下载: http://arxiv.org/abs/2406.14963v1

Unlocking the Global Synergies in Low-Rank Adapters

Low-Rank Adaptation (LoRA) has been the de facto parameter-efficient fine-tuning technique for large language models. We present HeteroLoRA, a lightweight search algorithm that leverages zero-cost proxies to allocate the limited LoRA trainable parameters across the model for better fine-tuned performance. In addition to the allocation for the standard LoRA-adapted models, we also demonstrate the efficacy of HeteroLoRA by performing the allocation in a more challenging search space that includes LoRA modules and LoRA-adapted shortcut connections. Experiments show that HeteroLoRA enables improvements in model performance given the same parameter budget. For example, on MRPC, we see an improvement of 1.6% in accuracy with a similar training parameter budget. We will open-source our algorithm once the paper is accepted.
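
To make the notion of a LoRA parameter budget concrete, here is a minimal sketch of the standard LoRA update y = W x + (alpha / r) * B (A x), where W is frozen and only the low-rank factors A and B are trained. It uses plain Python lists to stay self-contained; the helper names are hypothetical, and the zero-cost-proxy allocation performed by HeteroLoRA is not shown.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(x, W, A, B, alpha, rank):
    """Frozen weight W (d_out x d_in) plus scaled low-rank update B (d_out x r) @ A (r x d_in)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]

def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of one LoRA module: A has r * d_in entries, B has d_out * r."""
    return rank * (d_in + d_out)

W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # rank 1, d_in = 2
B = [[1.0], [2.0]]    # d_out = 2, rank 1
y = lora_forward([1.0, 2.0], W, A, B, alpha=2.0, rank=1)
budget = lora_param_count(768, 768, 8)
```

Since the cost of a module grows linearly in its rank, a fixed budget can be spent as the same rank everywhere or unevenly across modules, which is the degree of freedom an allocation search explores.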

Updated: 2024-06-21 08:10:03

标题: 解锁低秩适配器中的全局协同效应

摘要: 低秩适应（LoRA）已成为大型语言模型的实际参数高效微调技术。我们提出了HeteroLoRA，这是一种轻量级搜索算法，利用零成本代理来分配有限的LoRA可训练参数到模型中，以提高微调性能。除了为标准LoRA适应模型分配外，我们还通过在更具挑战性的搜索空间中执行分配，包括LoRA模块和LoRA适应的快捷连接，展示了HeteroLoRA的有效性。实验证明，HeteroLoRA在相同参数预算的情况下能够改善模型性能。例如，在MRPC上，我们看到准确度提高了1.6%，而训练参数预算相似。一旦论文被接受，我们将开源我们的算法。

更新时间: 2024-06-21 08:10:03

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2406.14956v1

From Words to Worlds: Transforming One-line Prompt into Immersive Multi-modal Digital Stories with Communicative LLM Agent

Digital storytelling, essential in entertainment, education, and marketing, faces challenges in production scalability and flexibility. The StoryAgent framework, introduced in this paper, utilizes Large Language Models and generative tools to automate and refine digital storytelling. Employing a top-down story drafting and bottom-up asset generation approach, StoryAgent tackles key issues such as manual intervention, interactive scene orchestration, and narrative consistency. This framework enables efficient production of interactive and consistent narratives across multiple modalities, democratizing content creation and enhancing engagement. Our results demonstrate the framework's capability to produce coherent digital stories without reference videos, marking a significant advancement in automated digital storytelling.

Updated: 2024-06-21 08:09:17

标题: 从词语到世界：利用交流型LLM智能体将一行提示转化为沉浸式多模态数字故事

摘要: 数字叙事在娱乐、教育和营销中至关重要，但在生产规模和灵活性方面面临挑战。本文介绍的StoryAgent框架利用大型语言模型和生成工具来自动化和优化数字叙事。采用自上而下的故事起草和自下而上的资产生成方法，StoryAgent解决了手动干预、互动场景编排和叙事一致性等关键问题。该框架实现了跨多种形式的交互和一致叙事的高效生产，使内容创作民主化并增强参与度。我们的结果表明，该框架能够在没有参考视频的情况下产生连贯的数字故事，标志着自动化数字叙事的重大进步。

更新时间: 2024-06-21 08:09:17

领域: cs.CL,cs.AI,cs.GR

下载: http://arxiv.org/abs/2406.10478v2

Reduction of finite sampling noise in quantum neural networks

Quantum neural networks (QNNs) use parameterized quantum circuits with data-dependent inputs and generate outputs through the evaluation of expectation values. Calculating these expectation values necessitates repeated circuit evaluations, thus introducing fundamental finite-sampling noise even on error-free quantum computers. We reduce this noise by introducing variance regularization, a technique for reducing the variance of the expectation value during the quantum model training. This technique requires no additional circuit evaluations if the QNN is properly constructed. Our empirical findings demonstrate that the reduced variance speeds up the training, lowers the output noise, and decreases the number of necessary evaluations of gradient circuits. This regularization method is benchmarked on the regression of multiple functions and the potential energy surface of water. We show that in our examples, it lowers the variance by an order of magnitude on average and leads to a significantly reduced noise level of the QNN. We finally demonstrate QNN training on a real quantum device and evaluate the impact of error mitigation. Here, the optimization is feasible only due to the reduced number of necessary shots in the gradient evaluation resulting from the reduced variance.
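
The role of the variance can be made concrete with a small sketch. It assumes the QNN output is the expectation value of a single observable with eigenvalues +1 and -1 (for example a Pauli operator), so that the single-shot variance is 1 - <O>^2 and the standard error after N shots is sqrt((1 - <O>^2) / N). The loss form "mean squared error plus alpha times mean variance" is an illustrative assumption, not necessarily the exact regularizer of the paper, and all function names are hypothetical.

```python
import math

def shot_variance(expval):
    """Single-shot variance of an observable with eigenvalues +1 and -1."""
    return 1.0 - expval ** 2

def standard_error(expval, shots):
    """Finite-sampling noise of the estimated expectation value after a given number of shots."""
    return math.sqrt(shot_variance(expval) / shots)

def regularized_loss(predictions, targets, alpha):
    """Mean squared error plus alpha times the mean single-shot variance (assumed form)."""
    n = len(predictions)
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n
    variance = sum(shot_variance(p) for p in predictions) / n
    return mse + alpha * variance

noise_mid = standard_error(0.0, 100)   # largest noise: expectation value at zero
noise_edge = standard_error(0.6, 64)   # smaller variance closer to the eigenvalues
```

For a fixed target precision the required number of shots scales linearly with the variance, so an order-of-magnitude reduction in variance translates into roughly ten times fewer shots.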

Updated: 2024-06-21 08:08:06

标题: 量子神经网络中有限采样噪声的减少

摘要: 量子神经网络（QNNs）使用具有数据相关输入的参数化量子电路，并通过评估期望值生成输出。计算这些期望值需要重复电路评估，因此即使在无误差的量子计算机上也会引入基本的有限采样噪声。我们通过引入方差正则化来减少这种噪声，这是一种在量子模型训练期间减少期望值方差的技术。如果QNN被正确构建，这种技术不需要额外的电路评估。我们的实证结果表明，降低方差可以加快训练速度，降低输出噪声并减少梯度电路的必要评估次数。这种正则化方法在多个函数的回归和水的势能面上进行了基准测试。我们展示，在我们的示例中，它平均将方差降低了一个数量级，并显著降低了QNN的噪声水平。最后，我们展示了在真实量子设备上进行的QNN训练，并评估了误差缓解的影响。在这里，正是由于方差减少使梯度评估所需的测量次数（shots）减少，优化才变得可行。

更新时间: 2024-06-21 08:08:06

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2306.01639v3

Deep Imbalanced Regression to Estimate Vascular Age from PPG Data: a Novel Digital Biomarker for Cardiovascular Health

Photoplethysmography (PPG) is emerging as a crucial tool for monitoring human hemodynamics, with recent studies highlighting its potential in assessing vascular aging through deep learning. However, real-world age distributions are often imbalanced, posing significant challenges for deep learning models. In this paper, we introduce a novel, simple, and effective loss function named the Dist Loss to address deep imbalanced regression tasks. We trained a one-dimensional convolutional neural network (Net1D) incorporating the Dist Loss on the extensive UK Biobank dataset (n=502,389) to estimate vascular age from PPG signals and validate its efficacy in characterizing cardiovascular health. The model's performance was validated on a 40% held-out test set, achieving state-of-the-art results, especially in regions with small sample sizes. Furthermore, we divided the population into three subgroups based on the difference between predicted vascular age and chronological age: less than -10 years, between -10 and 10 years, and greater than 10 years. We analyzed the relationship between predicted vascular age and several cardiovascular events over a follow-up period of up to 10 years, including death, coronary heart disease, and heart failure. Our results indicate that the predicted vascular age has significant potential to reflect an individual's cardiovascular health status. Our code will be available at https://github.com/Ngk03/AI-vascular-age.

Updated: 2024-06-21 08:04:12

标题: 深度不平衡回归用于从PPG数据估计血管年龄：心血管健康的新型数字生物标志物

摘要: 光电容积脉搏波描记法 (PPG) 正在成为监测人体血液动力学的关键工具，最近的研究突出了它在通过深度学习评估血管老化方面的潜力。然而，现实世界中的年龄分布往往不平衡，给深度学习模型带来了重大挑战。在本文中，我们介绍了一种名为 Dist Loss 的新颖、简单且有效的损失函数，用于处理深度不平衡的回归任务。我们在大规模的 UK Biobank 数据集 (n=502,389) 上训练了一个包含 Dist Loss 的一维卷积神经网络 (Net1D)，以从 PPG 信号估计血管年龄，并验证其在表征心血管健康方面的有效性。该模型的性能在一个 40% 的保留测试集上进行了验证，取得了最先进的结果，尤其是在样本量较小的区间。此外，我们根据预测的血管年龄与实际年龄之间的差异将人群分为三个亚组：小于 -10 岁、-10 至 10 岁之间和大于 10 岁。我们分析了在长达 10 年的随访期内，包括死亡、冠心病和心力衰竭在内的多种心血管事件与预测的血管年龄之间的关系。我们的结果表明，预测的血管年龄具有显著的潜力反映个体的心血管健康状况。我们的代码将在 https://github.com/Ngk03/AI-vascular-age 上提供。

更新时间: 2024-06-21 08:04:12

领域: cs.CV,cs.AI,cs.LG,eess.SP

下载: http://arxiv.org/abs/2406.14953v1

An Idiosyncrasy of Time-discretization in Reinforcement Learning

Many reinforcement learning algorithms are built on an assumption that an agent interacts with an environment over fixed-duration, discrete time steps. However, physical systems are continuous in time, requiring a choice of time-discretization granularity when digitally controlling them. Furthermore, such systems do not wait for decisions to be made before advancing the environment state, necessitating the study of how the choice of discretization may affect a reinforcement learning algorithm. In this work, we consider the relationship between the definitions of the continuous-time and discrete-time returns. Specifically, we acknowledge an idiosyncrasy with naively applying a discrete-time algorithm to a discretized continuous-time environment, and note how a simple modification can better align the return definitions. This observation is of practical consideration when dealing with environments where time-discretization granularity is a choice, or situations where such granularity is inherently stochastic.
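
The mismatch between the two return definitions can be seen numerically. The continuous-time return is the integral of gamma^t * r(t), while a discrete-time algorithm applied naively sums the per-step rewards, so its return grows as the step size shrinks. The sketch below assumes that the alignment is obtained by scaling each reward by the step duration dt, i.e. a Riemann sum of the integral; this is one natural reading of the simple modification mentioned in the abstract and is stated here as an assumption. The function name is hypothetical.

```python
import math

def discrete_return(reward_fn, gamma, dt, horizon, scale_by_dt):
    """Discounted sum over steps of length dt, optionally weighting each reward by dt."""
    steps = round(horizon / dt)
    total = 0.0
    for k in range(steps):
        reward = reward_fn(k * dt)
        if scale_by_dt:
            reward *= dt  # treat the reward as a rate integrated over the step
        total += gamma ** (k * dt) * reward
    return total

gamma, horizon = 0.9, 10.0
constant_reward = lambda t: 1.0
# Closed form of the continuous-time return for a constant unit reward rate.
exact = (gamma ** horizon - 1.0) / math.log(gamma)
scaled = discrete_return(constant_reward, gamma, 0.01, horizon, scale_by_dt=True)
naive = discrete_return(constant_reward, gamma, 0.01, horizon, scale_by_dt=False)
naive_finer = discrete_return(constant_reward, gamma, 0.005, horizon, scale_by_dt=False)
```

With the scaling, the discrete return approaches the continuous one as dt decreases; without it, halving dt roughly doubles the return, so value estimates depend on the chosen granularity.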

Updated: 2024-06-21 08:03:25

标题: 一种强化学习中时间离散化的特异性

摘要: 许多强化学习算法建立在一个假设之上，即代理与环境在固定持续时间的离散时间步内进行交互。然而，物理系统在时间上是连续的，需要在数字控制它们时选择时间离散化的粒度。此外，这样的系统不会等待决策被做出再推进环境状态，这需要研究离散化选择可能如何影响强化学习算法。在这项工作中，我们考虑了连续时间和离散时间回报的定义之间的关系。具体来说，我们承认一个问题，即将离散时间算法朴素地应用于离散化的连续时间环境，并指出如何通过简单修改可以更好地使回报定义保持一致。当处理时间离散化粒度是一个选择的环境，或者这种粒度本质上是随机的情况下，这一观察是一个实际考虑的问题。

更新时间: 2024-06-21 08:03:25

领域: cs.LG,cs.AI,I.2.6; I.2.9

下载: http://arxiv.org/abs/2406.14951v1

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

The ever-growing complexity of reinforcement learning (RL) tasks demands a distributed system to efficiently generate and process a massive amount of data. However, existing open-source libraries suffer from various limitations, which impede their practical use in challenging scenarios where large-scale training is necessary. In this paper, we present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework. Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLlyScalableRL (SRL), which allows efficient and massively parallelized training and easy development of customized algorithms. Our evaluation shows that SRL outperforms existing academic libraries, reaching up to 21x higher training throughput in a distributed setting. On learning performance, beyond performing and scaling well on common RL benchmarks with different RL algorithms, SRL can reproduce the same solution in the challenging hide-and-seek environment as reported by OpenAI with up to 5x speedup in wall-clock time. Notably, SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores. SRL source code is available at: https://github.com/openpsi-project/srl .

Updated: 2024-06-21 08:02:57

标题: SRL：将分布式强化学习扩展到超过一万个核心

摘要: 随着强化学习（RL）任务日益复杂，需要一个分布式系统以高效地生成和处理大量数据。然而，现有的开源库存在各种限制，阻碍了它们在需要大规模训练的挑战性场景中的实际应用。本文提出了一种对RL训练数据流的新抽象，将不同的RL训练应用统一到一个通用框架中。根据这种抽象，我们开发了一个可扩展、高效和可扩展的分布式RL系统，称为ReaLlyScalableRL，它允许高效并行化训练和轻松开发定制算法。我们的评估结果显示，在分布式环境中，SRL的训练吞吐量最高可达现有学术库的21倍。在学习性能方面，除了在不同RL算法的常见RL基准测试中表现良好和扩展之外，SRL还可以在挑战性的藏匿和寻找环境中复现OpenAI报告的相同解决方案，并且在挂钟时间上实现了高达5倍的加速。值得注意的是，SRL是学术界第一个在超过15k个CPU核心上进行大规模RL实验的系统。SRL的源代码可在以下链接找到：https://github.com/openpsi-project/srl。

更新时间: 2024-06-21 08:02:57

领域: cs.DC,cs.AI,cs.LG

下载: http://arxiv.org/abs/2306.16688v3

CEASEFIRE: An AI-powered system for combatting illicit firearms trafficking

Modern technologies have led illicit firearms trafficking to partially merge with cybercrime, while simultaneously permitting its off-line aspects to become more sophisticated. Law enforcement officers face difficult challenges that require hi-tech solutions. This article presents a real-world system, powered by advanced Artificial Intelligence, to assist them in their everyday work.

Updated: 2024-06-21 08:02:25

标题: CEASEFIRE：一种用于打击非法枪支走私的人工智能系统

摘要: 现代技术已经导致非法枪支走私部分与网络犯罪融合，同时也使其线下方面变得更加复杂。执法人员面临着困难挑战，需要高科技解决方案。本文介绍了一个由先进人工智能驱动的现实世界系统，以帮助他们在日常工作中更加高效。

更新时间: 2024-06-21 08:02:25

领域: cs.AI

下载: http://arxiv.org/abs/2406.14949v1

Towards Retrieval Augmented Generation over Large Video Libraries

Video content creators need efficient tools to repurpose content, a task that often requires complex manual or automated searches. Crafting a new video from large video libraries remains a challenge. In this paper we introduce the task of Video Library Question Answering (VLQA) through an interoperable architecture that applies Retrieval Augmented Generation (RAG) to video libraries. We propose a system that uses large language models (LLMs) to generate search queries, retrieving relevant video moments indexed by speech and visual metadata. An answer generation module then integrates user queries with this metadata to produce responses with specific video timestamps. This approach shows promise in multimedia content retrieval, and AI-assisted video content creation.

Updated: 2024-06-21 07:52:01

标题: 朝向在大型视频库中实现检索增强生成

摘要: 视频内容创作者需要有效的工具来重新利用内容，这项任务通常需要复杂的手动或自动搜索。从大型视频库中制作新视频仍然是一个挑战。在本文中，我们通过一个可互操作的架构介绍了视频库问答（VLQA）的任务，该架构应用了检索增强生成（RAG）技术来处理视频库。我们提出了一个系统，利用大型语言模型（LLMs）生成搜索查询，检索由语音和视觉元数据索引的相关视频片段。然后，一个答案生成模块将用户查询与这些元数据整合，产生具有特定视频时间戳的响应。这种方法在多媒体内容检索和AI辅助视频内容创建方面显示出潜力。

更新时间: 2024-06-21 07:52:01

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.14938v1

Long-time asymptotics of noisy SVGD outside the population limit

Stein Variational Gradient Descent (SVGD) is a widely used sampling algorithm that has been successfully applied in several areas of Machine Learning. SVGD operates by iteratively moving a set of interacting particles (which represent the samples) to approximate the target distribution. Despite recent studies on the complexity of SVGD and its variants, their long-time asymptotic behavior (i.e., after numerous iterations $k$) is still not understood in the finite number of particles regime. We study the long-time asymptotic behavior of a noisy variant of SVGD. First, we establish that the limit set of noisy SVGD for large $k$ is well-defined. We then characterize this limit set, showing that it approaches the target distribution as the number of particles $n$ increases. In particular, noisy SVGD provably avoids the variance collapse observed for SVGD. Our approach involves demonstrating that the trajectories of noisy SVGD closely resemble those described by a McKean-Vlasov process.

Updated: 2024-06-21 07:45:55

标题: 噪声SVGD在种群极限之外的长时间渐近行为

摘要: Stein变分梯度下降（SVGD）是一种广泛应用的采样算法，在多个机器学习领域取得成功。SVGD通过迭代地移动一组相互作用的粒子（代表样本）来逼近目标分布。尽管最近对SVGD及其变体的复杂性进行了研究，但在有限数量的粒子制度下，它们的长时间渐近行为（即大量迭代后）仍然不为人所理解。本文研究了一种噪声变体的SVGD的长时间渐近行为。首先，我们确立了噪声SVGD的极限集是良定义的。然后我们表征了这个极限集，显示它随着增加逼近目标分布。特别是，噪声SVGD可以证明避免了SVGD中观察到的方差崩溃。我们的方法涉及证明噪声SVGD的轨迹与McKean-Vlasov过程描述的轨迹非常相似。

更新时间: 2024-06-21 07:45:55

领域: cs.LG,math.PR

下载: http://arxiv.org/abs/2406.11929v2
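
To make the particle update in the SVGD entry above concrete, here is a minimal sketch of one SVGD step with an optional noise term. It is an illustration under stated assumptions, not the paper's algorithm: the RBF kernel, the fixed bandwidth `h`, and the Langevin-style scaling `noise_scale * sqrt(2 * step)` are choices made for this example only, since the abstract does not specify how the noise is injected. The function name `noisy_svgd_step` is hypothetical.

```python
import numpy as np

def noisy_svgd_step(x, grad_log_p, step, h=1.0, noise_scale=0.0, rng=None):
    """One SVGD update on particles x of shape (n, d), with optional Gaussian noise.

    Assumptions (not from the paper): RBF kernel with fixed bandwidth h,
    and additive noise scaled as noise_scale * sqrt(2 * step).
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]                       # diff[i, j] = x_i - x_j
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * h ** 2))   # kernel matrix, shape (n, n)
    # Driving term: kernel-weighted average of the score at every particle.
    drive = k @ grad_log_p(x)
    # Repulsive term: gradient of the kernel, pushes particles apart.
    repulse = np.sum(k[:, :, None] * diff, axis=1) / h ** 2
    x_new = x + step * (drive + repulse) / n
    if noise_scale > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        x_new = x_new + noise_scale * np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x_new
```

For a standard normal target, `grad_log_p` is simply `lambda z: -z`. Iterating the step from particles initialized far from the origin moves their mean toward zero, while the repulsive term keeps them spread out.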

On the growth of the parameters of approximating ReLU neural networks

This work focuses on the analysis of fully connected feedforward ReLU neural networks as they approximate a given, smooth function. In contrast to conventionally studied universal approximation properties under increasing architectures, e.g., in terms of width or depth of the networks, we are concerned with the asymptotic growth of the parameters of approximating networks. Such results are of interest, e.g., for error analysis or consistency results for neural network training. The main result of our work is that, for a ReLU architecture with state-of-the-art approximation error, the realizing parameters grow at most polynomially. The obtained rate with respect to a normalized network size is compared to existing results and is shown to be superior in most cases, in particular for high-dimensional input.

Updated: 2024-06-21 07:45:28

标题: 关于逼近ReLU神经网络参数增长的研究

摘要: 本文重点研究全连接前馈ReLU神经网络在逼近给定平滑函数时的分析。与传统研究网络结构增加（如宽度或深度）下的通用逼近性质不同，我们关注逼近网络参数的渐进增长。这样的结果对于神经网络训练的误差分析或一致性结果具有重要意义。我们的主要结果是，对于具有最先进逼近误差的ReLU架构，实现参数的增长最多是多项式的。与现有结果相比，得到的与标准化网络规模相关的速率在大多数情况下都优于现有结果，特别是对于高维输入。

更新时间: 2024-06-21 07:45:28

领域: cs.LG,cs.NA,math.NA,41A25, 41A65

下载: http://arxiv.org/abs/2406.14936v1

Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling

Evaluating LLMs and text-to-image models is a computationally intensive and often overlooked task. Efficient evaluation is crucial for understanding the diverse capabilities of these models and enabling comparisons across a growing number of new models and benchmarks. To address this, we introduce SubLIME, a data-efficient evaluation framework that employs adaptive sampling techniques, such as clustering and quality-based methods, to create representative subsets of benchmarks. Our approach ensures statistically aligned model rankings compared to full datasets, evidenced by high Pearson correlation coefficients. Empirical analysis across six NLP benchmarks reveals that: (1) quality-based sampling methods, such as Quality SE and Quality CPD, consistently achieve strong correlations (0.85 to 0.95) with full datasets at a 10% sampling rate; (2) clustering methods excel in specific benchmarks such as MMLU; (3) no single method universally outperforms others across all metrics. Extending this framework, we leverage the HEIM leaderboard to cover 25 text-to-image models on 17 different benchmarks. SubLIME dynamically selects the optimal technique for each benchmark, significantly reducing evaluation costs while preserving ranking integrity and score distribution. Notably, a minimal sampling rate of 1% proves effective for benchmarks like MMLU. Additionally, we demonstrate that employing difficulty-based sampling to target more challenging benchmark segments enhances model differentiation with broader score distributions. We also combine semantic search, tool use, and GPT-4 review to identify redundancy across benchmarks within specific LLM categories, such as coding benchmarks. This allows us to further reduce the number of samples needed to maintain targeted rank preservation. Overall, SubLIME offers a versatile and cost-effective solution for the robust evaluation of LLMs and text-to-image models.

Updated: 2024-06-21 07:38:55

标题: 通过自适应抽样实现大型语言模型和文本到图像模型的高效数据评估

摘要: 评估LLM和文本到图像模型是一个计算密集型的任务，往往被忽视。高效的评估对于理解这些模型的多样化能力并允许在日益增多的新模型和基准测试之间进行比较至关重要。为了解决这个问题，我们引入了SubLIME，这是一个数据高效的评估框架，采用自适应采样技术，如聚类和基于质量的方法，来创建基准测试的代表性子集。我们的方法确保了与完整数据集相比具有统计对齐的模型排名，证明了高皮尔逊相关系数。通过对六个NLP基准测试的实证分析，我们发现：(1)基于质量的采样在10%的采样率下如Quality SE和Quality CPD一直能够与完整数据集保持强相关性(0.85到0.95)；(2)聚类方法在特定基准测试中表现出色，如MMLU；(3)没有一种单一方法在所有指标上普遍优于其他方法。扩展这个框架，我们利用HEIM排行榜来覆盖17个不同基准测试上的25个文本到图像模型。SubLIME动态选择每个基准测试的最佳技术，显著降低评估成本同时保持排名完整性和得分分布。值得注意的是，对于像MMLU这样的基准测试，1%的最小采样率已经证明是有效的。此外，我们展示了采用基于困难程度的采样来针对更具挑战性的基准测试部分，增强了模型的区分度并拓宽了得分分布。我们还结合语义搜索、工具使用和GPT-4审查，来识别特定LLM类别内基准测试之间的冗余，从而进一步减少需要维持目标排名保存所需的样本数量。总的来说，SubLIME为LLM和文本到图像模型的稳健评估提供了一种多功能且具有成本效益的解决方案。

更新时间: 2024-06-21 07:38:55

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2406.15527v1
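
The evaluation criterion in the SubLIME entry above (Pearson correlation between model scores on a benchmark subset and on the full benchmark) can be sketched as follows. This is only an illustration: uniform random sampling stands in for the paper's clustering, quality-based, and difficulty-based samplers, and the function name `subset_score_correlation` and the score-matrix layout are assumptions made for this example.

```python
import numpy as np

def subset_score_correlation(scores, rate, rng):
    """Pearson correlation between per-model mean scores on a sampled
    subset of benchmark items and on the full benchmark.

    scores: array of shape (n_models, n_items), one score per model and item.
    rate:   fraction of items to keep (e.g. 0.1 for a 10% sampling rate).
    Assumption: items are drawn uniformly at random, which is a stand-in
    for the adaptive samplers described in the abstract.
    """
    n_items = scores.shape[1]
    k = max(1, int(round(rate * n_items)))
    idx = rng.choice(n_items, size=k, replace=False)   # sampled item indices
    full_means = scores.mean(axis=1)
    subset_means = scores[:, idx].mean(axis=1)
    return float(np.corrcoef(full_means, subset_means)[0, 1])
```

A correlation close to 1 indicates that the subset orders the models much as the full benchmark does, which is the property the abstract reports at 10% and 1% sampling rates.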

Efficient Graph Similarity Computation with Alignment Regularization

We consider the graph similarity computation (GSC) task based on graph edit distance (GED) estimation. State-of-the-art methods treat GSC as a learning-based prediction task using Graph Neural Networks (GNNs). To capture fine-grained interactions between pair-wise graphs, these methods mostly contain a node-level matching module in the end-to-end learning pipeline, which causes high computational costs in both the training and inference stages. We show that the expensive node-to-node matching module is not necessary for GSC, and high-quality learning can be attained with a simple yet powerful regularization technique, which we call the Alignment Regularization (AReg). In the training stage, the AReg term imposes a node-graph correspondence constraint on the GNN encoder. In the inference stage, the graph-level representations learned by the GNN encoder are used directly to compute the similarity score; AReg is not applied again, which speeds up inference. We further propose a multi-scale GED discriminator to enhance the expressive ability of the learned representations. Extensive experiments on real-world datasets demonstrate the effectiveness, efficiency and transferability of our approach.

Updated: 2024-06-21 07:37:28

标题: 使用对齐正则化的高效图相似性计算

摘要: 我们考虑基于图编辑距离（GED）估计的图相似性计算（GSC）任务。现有技术方法将GSC视为使用图神经网络（GNNs）的基于学习的预测任务。为了捕捉成对图之间的细粒度交互，这些方法主要包含一个节点级匹配模块在端到端学习管道中，这导致了在训练和推断阶段的高计算成本。我们表明，昂贵的节点对节点匹配模块对于GSC并非必要，可以通过一种简单而强大的正则化技术来获得高质量的学习，我们称之为对齐正则化（AReg）。在训练阶段，AReg项对GNN编码器施加节点-图对应约束。在推断阶段，GNN编码器学习的图级表示直接用于计算相似性得分，而不再使用AReg以加速推断。我们进一步提出了一个多尺度GED鉴别器来增强学习表示的表达能力。在真实数据集上的大量实验表明了我们方法的有效性、效率和可迁移性。

更新时间: 2024-06-21 07:37:28

领域: cs.LG

下载: http://arxiv.org/abs/2406.14929v1

Autonomous Agents for Collaborative Task under Information Asymmetry

Large Language Model Multi-Agent Systems (LLM-MAS) have achieved great progress in solving complex tasks. Such systems rely on communication among agents to solve tasks collaboratively, under the premise of shared information. However, when agents' communication is leveraged to enhance human cooperation, a new challenge arises due to information asymmetry, since each agent can only access the information of its human user. Previous MAS struggle to complete tasks under this condition. To address this, we propose a new MAS paradigm termed iAgents, which denotes Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, where agents proactively exchange human information necessary for task resolution, thereby overcoming information asymmetry. iAgents employs a novel agent reasoning mechanism, InfoNav, to navigate agents' communication towards effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory to provide agents with accurate and comprehensive information for exchange. Additionally, we introduce InformativeBench, the first benchmark tailored for evaluating LLM agents' task-solving ability under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.

Updated: 2024-06-21 07:37:19

标题: 信息不对称下协作任务的自主代理

摘要: 大型语言模型多智能体系统(LLM-MAS)在解决复杂任务方面取得了巨大进展。在共享信息的前提下，它在系统内的智能体之间进行通信，共同解决任务。然而，当智能体的通信被利用来增强人类合作时，由于信息不对称，出现了新的挑战，因为每个智能体只能访问其人类用户的信息。以往的多智能体系统在这种情况下很难完成任务。为了解决这个问题，我们提出了一种新的多智能体系统范式，称为iAgents，即信息多智能体系统。在iAgents中，人类社交网络在智能体网络中得到了反映，智能体主动交换为任务解决所必需的人类信息，从而克服了信息不对称的问题。iAgents采用一种新颖的智能体推理机制InfoNav，将智能体的通信导向有效的信息交换。与InfoNav一起，iAgents利用混合内存组织人类信息，为智能体提供准确和全面的信息以进行交换。此外，我们引入了InformativeBench，这是专门用于评估LLM智能体在信息不对称条件下解决任务能力的第一个基准。实验结果表明，iAgents可以在一个包括140个个体和588个关系的社交网络中合作，自主进行30轮的通信，并从将近70,000条消息中检索信息，在3分钟内完成任务。

更新时间: 2024-06-21 07:37:19

领域: cs.AI,cs.CL,cs.HC,cs.MA,cs.SI

下载: http://arxiv.org/abs/2406.14928v1

TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement

Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT). However, careful evaluations by humans reveal that the translations produced by LLMs still contain multiple errors. Importantly, feeding back such error information into the LLMs can lead to self-refinement and result in improved translation performance. Motivated by these insights, we introduce a systematic LLM-based self-refinement translation framework, named TEaR, which stands for Translate, Estimate, and Refine, marking a significant step forward in this direction. Our findings demonstrate that 1) our self-refinement framework successfully assists LLMs in improving their translation quality across a wide range of languages, whether it's from high-resource languages to low-resource ones or whether it's English-centric or centered around other languages; 2) TEaR exhibits superior systematicity and interpretability; 3) different estimation strategies yield varied impacts, directly affecting the effectiveness of the final corrections. Additionally, traditional neural translation models and evaluation models operate separately, often focusing on singular tasks due to their limited capabilities, while general-purpose LLMs possess the capability to undertake both tasks simultaneously. We further conduct cross-model correction experiments to investigate the potential relationship between the translation and evaluation capabilities of general-purpose LLMs. Our code and data are available at https://github.com/fzp0424/self_correct_mt

Updated: 2024-06-21 07:35:53

标题: TEaR：利用系统自我改进提高基于LLM的机器翻译

摘要: 大型语言模型（LLMs）在机器翻译（MT）中取得了令人印象深刻的成果。然而，人类进行的细致评估揭示出LLMs产生的翻译仍然包含多个错误。重要的是，将这种错误信息反馈到LLMs中可以导致自我改进，并实现翻译性能的提高。受到这些见解的启发，我们引入了一种系统化的基于LLM的自我改进翻译框架，命名为\textbf{TEaR}，代表着在这个方向上的重大进展。我们的研究结果表明：1）我们的自我改进框架成功地帮助LLMs提高其在各种语言中的翻译质量，无论是从高资源语言到低资源语言，还是以英语为中心或以其他语言为中心；2）TEaR表现出卓越的系统性和可解释性；3）不同的估计策略产生不同的影响，直接影响最终修正的有效性。此外，传统的神经翻译模型和评估模型操作分开，通常由于其有限的能力而专注于单一任务，而通用LLMs具有同时执行两个任务的能力。我们进一步进行跨模型校正实验，以研究通用LLMs的翻译和评估能力之间的潜在关系。我们的代码和数据可在https://github.com/fzp0424/self_correct_mt 上找到。

更新时间: 2024-06-21 07:35:53

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.16379v3
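
The Translate, Estimate, and Refine stages named in the TEaR entry above suggest a simple control loop, sketched below. This is an assumed structure for illustration, not the authors' implementation: the three callables `translate`, `estimate`, and `refine` are hypothetical placeholders for LLM calls, and the stopping rule (stop when no errors are reported, or after `max_rounds`) is an assumption of this example.

```python
from typing import Callable, List

def translate_estimate_refine(
    source: str,
    translate: Callable[[str], str],
    estimate: Callable[[str, str], List[str]],
    refine: Callable[[str, str, List[str]], str],
    max_rounds: int = 1,
) -> str:
    """Run a translate -> estimate -> refine loop.

    translate(source)               -> initial draft translation
    estimate(source, draft)         -> list of error descriptions (empty if none)
    refine(source, draft, errors)   -> corrected translation
    All three callables are hypothetical stand-ins for model calls.
    """
    draft = translate(source)
    for _ in range(max_rounds):
        errors = estimate(source, draft)
        if not errors:          # nothing to fix, keep the current draft
            break
        draft = refine(source, draft, errors)
    return draft
```

Passing the stages in as callables keeps the loop independent of any particular model, which also makes it easy to use one model for estimation and another for refinement, in the spirit of the cross-model correction experiments the abstract mentions.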

Extraction of 3D trajectories of mandibular condyles from 2D real-time MRI

Computing the trajectories of mandibular condyles directly from MRI could provide a comprehensive examination, allowing for the extraction of both anatomical and kinematic details. This study aimed to investigate the feasibility of extracting 3D condylar trajectories from 2D real-time MRI and to assess their precision. Twenty healthy subjects underwent real-time MRI while opening and closing their jaws. One axial and two sagittal slices were segmented using a U-Net-based algorithm. The centers of mass of the resulting masks were projected onto the coordinate system based on anatomical markers and temporally adjusted using a common projection. The quality of the computed trajectories was evaluated using metrics designed to estimate movement reproducibility, head motion, and slice placement symmetry. The segmentation of the axial slices demonstrated good-to-excellent quality; however, the segmentation of the sagittal slices required some fine-tuning. The movement reproducibility was acceptable for most cases; nevertheless, head motion displaced the trajectories by 1 mm on average. The difference in the superior-inferior coordinate of the condyles in the closed jaw position was 1.7 mm on average. Despite limitations in precision, real-time MRI enables the extraction of condylar trajectories with sufficient accuracy for evaluating clinically relevant parameters such as condyle displacement, trajectories aspect, and symmetry.

Updated: 2024-06-21 07:35:40

标题: 提取下颌骨髁突在2D实时MRI中的3D轨迹

摘要: 直接从MRI计算下颌骨髁突的轨迹可能提供全面的检查，允许提取解剖和运动学细节。本研究旨在探讨从2D实时MRI中提取3D髁突轨迹的可行性，并评估其精度。二十名健康受试者在张合下颌时进行了实时MRI检查。使用基于U-Net的算法对一个轴向和两个矢状切片进行分割。所得掩模的质心投影到基于解剖标记的坐标系上，并通过常规投影进行时间调整。使用设计用于估计运动重现性、头部运动和切片放置对称性的度量标准评估计算的轨迹质量。轴向切片的分割表现出良好至优秀的质量；然而，矢状切片的分割需要一些微调。大多数情况下的运动重现性是可接受的；然而，头部运动平均使轨迹偏移1mm。闭合下颌位置的髁突的上下坐标的差异平均为1.7mm。尽管存在精度限制，实时MRI能够提取足够准确的髁突轨迹，用于评估临床相关参数，如髁突位移、轨迹方面和对称性。

更新时间: 2024-06-21 07:35:40

领域: eess.IV,cs.AI

下载: http://arxiv.org/abs/2406.14925v1

The Impact of AI on Perceived Job Decency and Meaningfulness: A Case Study

The proliferation of Artificial Intelligence (AI) in workplaces stands to change the way humans work, with job satisfaction intrinsically linked to work life. Existing research on human-AI collaboration tends to prioritize performance over the experiential aspects of work. In contrast, this paper explores the impact of AI on job decency and meaningfulness in workplaces. Through interviews in the Information Technology (IT) domain, we not only examined the current work environment, but also explored the perceived evolution of the workplace ecosystem with the introduction of an AI. Findings from the preliminary exploratory study reveal that respondents tend to visualize a workplace where humans continue to play a dominant role, even with the introduction of advanced AIs. In this prospective scenario, AI is seen as serving as a complement rather than replacing the human workforce. Furthermore, respondents believe that the introduction of AI will maintain or potentially increase overall job satisfaction.

Updated: 2024-06-21 07:31:56

标题: 人工智能对工作体面性和意义感知的影响：案例研究

摘要: 人工智能在工作场所的激增将改变人类工作方式，工作满意度与工作生活密切相关。现有研究人工智能与人类合作往往更注重绩效而非工作体验方面。相反，本文探讨了人工智能对工作场所中工作的体面性和意义性的影响。通过在信息技术领域的访谈，我们不仅考察了当前的工作环境，还探讨了引入人工智能后工作生态系统的变化。初步探索性研究的结果显示，受访者倾向于想象一个工作场所，在这里人类继续扮演主导角色，即使引入先进的人工智能。在这种未来情景中，人工智能被视为一种补充，而非替代人类劳动力。此外，受访者相信引入人工智能将维持或潜在增加整体工作满意度。

更新时间: 2024-06-21 07:31:56

领域: cs.AI

下载: http://arxiv.org/abs/2406.14273v2

LLM2FEA: Discover Novel Designs with Generative Evolutionary Multitasking

The rapid research and development of generative artificial intelligence has enabled the generation of high-quality images, text, and 3D models from text prompts. This advancement impels an inquiry into whether these models can be leveraged to create digital artifacts for both creative and engineering applications. Drawing on innovative designs from other domains may be one answer to this question, much like the historical practice of "bionics", where humans have sought inspiration from nature's exemplary designs. This raises the intriguing possibility of using generative models to simultaneously tackle design tasks across multiple domains, facilitating cross-domain learning and resulting in a series of innovative design solutions. In this paper, we propose LLM2FEA as the first attempt to discover novel designs in generative models by transferring knowledge across multiple domains. By utilizing a multi-factorial evolutionary algorithm (MFEA) to drive a large language model, LLM2FEA integrates knowledge from various fields to generate prompts that guide the generative model in discovering novel and practical objects. Experimental results in the context of 3D aerodynamic design verify the discovery capabilities of the proposed LLM2FEA. The designs generated by LLM2FEA not only satisfy practicality requirements to a certain degree but also feature novel and aesthetically pleasing shapes, demonstrating the potential applications of LLM2FEA in discovery tasks.

Updated: 2024-06-21 07:20:51

标题: LLM2FEA：通过生成式进化多任务学习发现新颖设计

摘要: 快速发展的生成式人工智能使得可以从文本提示中生成高质量的图像、文本和3D模型。这一进展引发了一个问题，即这些模型是否可以被利用来为创意和工程应用创造数字化作品。借鉴其他领域的创新设计可能是这个问题的答案之一，就像人类以往从自然的典范设计中寻求灵感的历史做法“仿生学”一样。这引发了使用生成式模型同时处理多个领域设计任务的有趣可能性，促进跨领域学习，并产生一系列创新设计解决方案。本文提出LLM2FEA作为首次尝试，通过跨领域知识转移来发现生成式模型中的新颖设计。通过利用多因素进化算法(MFEA)驱动大型语言模型，LLM2FEA整合了来自各个领域的知识，生成引导生成模型发现新颖和实用对象的提示。在3D空气动力设计的背景下进行的实验结果验证了提出的LLM2FEA的发现能力。LLM2FEA生成的设计不仅在一定程度上满足了实用性要求，还具有新颖和美观的形状，展示了LLM2FEA在发现任务中的潜在应用。

更新时间: 2024-06-21 07:20:51

领域: cs.AI,cs.CL,cs.CV,cs.LG,cs.NE

下载: http://arxiv.org/abs/2406.14917v1

Demonstrating the Efficacy of Kolmogorov-Arnold Networks in Vision Tasks

In the realm of deep learning, the Kolmogorov-Arnold Network (KAN) has emerged as a potential alternative to multilayer perceptrons (MLPs). However, its applicability to vision tasks has not been extensively validated. In our study, we demonstrated the effectiveness of KAN for vision tasks through multiple trials on the MNIST, CIFAR10, and CIFAR100 datasets, using a training batch size of 32. Our results showed that while KAN outperformed the original MLP-Mixer on CIFAR10 and CIFAR100, it performed slightly worse than the state-of-the-art ResNet-18. These findings suggest that KAN holds significant promise for vision tasks, and further modifications could enhance its performance in future evaluations. Our contributions are threefold: first, we showcase the efficiency of KAN-based algorithms for visual tasks; second, we provide extensive empirical assessments across various vision benchmarks, comparing KAN's performance with MLP-Mixer, CNNs, and Vision Transformers (ViT); and third, we pioneer the use of natural KAN layers in visual tasks, addressing a gap in previous research. This paper lays the foundation for future studies on KANs, highlighting their potential as a reliable alternative for image classification tasks.

Updated: 2024-06-21 07:20:34

标题: 展示Kolmogorov-Arnold网络在视觉任务中的有效性

摘要: 在深度学习领域，科尔莫哥洛夫-阿诺德网络（KAN）已经成为多层投影（MLPs）的潜在替代方案。然而，其在视觉任务中的适用性尚未得到广泛验证。在我们的研究中，我们通过在MNIST、CIFAR10和CIFAR100数据集上进行多次试验，使用32的训练批次大小，展示了KAN在视觉任务中的有效性。我们的结果显示，尽管KAN在CIFAR10和CIFAR100上优于原始的MLP-Mixer，但与最先进的ResNet-18相比略逊一筹。这些发现表明，KAN在视觉任务中具有显著的潜力，进一步的修改可能会提高其在未来评估中的性能。我们的贡献有三个方面：首先，我们展示了基于KAN的算法在视觉任务中的效率；其次，我们在各种视觉基准测试中进行了广泛的实证评估，将KAN的性能与MLP-Mixer、CNNs和Vision Transformers（ViT）进行了比较；第三，我们开创了在视觉任务中使用自然KAN层的先河，弥补了以前研究中的空白。本文为未来关于KAN的研究奠定了基础，凸显了其作为图像分类任务可靠替代方案的潜力。

更新时间: 2024-06-21 07:20:34

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.14916v1

Large Language Model-Driven Curriculum Design for Mobile Networks

This study introduces an innovative framework that employs large language models (LLMs) to automate the design and generation of curricula for reinforcement learning (RL). As mobile networks evolve towards the 6G era, managing their increasing complexity and dynamic nature poses significant challenges. Conventional RL approaches often suffer from slow convergence and poor generalization due to conflicting objectives and the large state and action spaces associated with mobile networks. To address these shortcomings, we introduce curriculum learning, a method that systematically exposes the RL agent to progressively challenging tasks, improving convergence and generalization. However, curriculum design typically requires extensive domain knowledge and manual human effort. Our framework mitigates this by utilizing the generative capabilities of LLMs to automate the curriculum design process, significantly reducing human effort while improving the RL agent's convergence and performance. We deploy our approach within a simulated mobile network environment and demonstrate improved RL convergence rates, generalization to unseen scenarios, and overall performance enhancements. As a case study, we consider autonomous coordination and user association in mobile networks. Our obtained results highlight the potential of combining LLM-based curriculum generation with RL for managing next-generation wireless networks, marking a significant step towards fully autonomous network operations.

Updated: 2024-06-21 07:06:30

标题: 大型语言模型驱动的移动网络课程设计

摘要: 这项研究介绍了一种创新框架，利用大型语言模型（LLM）自动化设计和生成强化学习（RL）课程。随着移动网络向6G时代发展，管理其不断增加的复杂性和动态性带来重大挑战。传统的RL方法通常由于冲突的目标和与移动网络相关的大状态和动作空间而导致收敛缓慢和泛化能力差。为了解决这些缺点，我们引入了课程学习，一种系统地使RL代理逐渐面对具有挑战性的任务，改善收敛和泛化能力。然而，课程设计通常需要广泛的领域知识和人工努力。我们的框架通过利用LLM的生成能力自动化课程设计过程，显著减少人力努力同时改善RL代理的收敛和性能。我们将我们的方法部署在模拟移动网络环境中，并展示了改进的RL收敛速度、对未见场景的泛化能力以及整体性能提升。作为案例研究，我们考虑了移动网络中的自主协调和用户关联。我们获得的结果突显了将基于LLM的课程生成与RL相结合，以管理下一代无线网络的潜力，标志着朝着完全自主网络运营迈出了重要一步。

更新时间: 2024-06-21 07:06:30

领域: cs.LG,cs.NI

下载: http://arxiv.org/abs/2405.18039v2

Twin Transformer using Gated Dynamic Learnable Attention mechanism for Fault Detection and Diagnosis in the Tennessee Eastman Process

Fault detection and diagnosis (FDD) is a crucial task for ensuring the safety and efficiency of industrial processes. We propose a novel FDD methodology for the Tennessee Eastman Process (TEP), a widely used benchmark for chemical process control. The model employs two separate Transformer branches, enabling independent processing of input data and potential extraction of diverse information. A novel attention mechanism, Gated Dynamic Learnable Attention (GDLAttention), is introduced which integrates a gating mechanism and dynamic learning capabilities. The gating mechanism modulates the attention weights, allowing the model to focus on the most relevant parts of the input. The dynamic learning approach adapts the attention strategy during training, potentially leading to improved performance. The attention mechanism uses a bilinear similarity function, providing greater flexibility in capturing complex relationships between query and key vectors. In order to assess the effectiveness of our approach, we tested it against 21 and 18 distinct fault scenarios in TEP, and compared its performance with several established FDD techniques. The outcomes indicate that the method outperforms others in terms of accuracy, false alarm rate, and misclassification rate. This underscores the robustness and efficacy of the approach for FDD in intricate industrial processes.

Updated: 2024-06-21 07:04:49

标题: 使用门控动态可学习注意机制的双变压器在田纳西伊斯曼过程中的故障检测和诊断

摘要: 故障检测与诊断（FDD）是确保工业过程安全和效率的关键任务。我们提出了一种新颖的FDD方法，适用于Tennessee Eastman Process（TEP），这是化工过程控制中广泛使用的基准。该模型采用两个独立的Transformer分支，能够独立处理输入数据并提取多样化信息。引入了一种新颖的注意机制，即门控动态可学习注意力（GDLAttention），它整合了门控机制和动态学习能力。门控机制调节注意力权重，使模型能够专注于输入的最相关部分。动态学习方法在训练过程中调整注意力策略，可能导致性能提升。注意机制使用双线性相似性函数，提供了更大的灵活性，捕捉查询和关键向量之间的复杂关系。为了评估我们的方法的有效性，我们在TEP中测试了21个和18个不同的故障场景，并将其性能与几种已建立的FDD技术进行了比较。结果表明，该方法在准确性、误警率和误分类率方面优于其他方法。这突显了该方法在复杂工业过程中的FDD中的鲁棒性和有效性。

更新时间: 2024-06-21 07:04:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.10842v3

Towards Dynamic Resource Allocation and Client Scheduling in Hierarchical Federated Learning: A Two-Phase Deep Reinforcement Learning Approach

Federated learning (FL) is a viable technique to train a shared machine learning model without sharing data. Hierarchical FL (HFL) systems have yet to be studied regarding their multiple levels of energy, computation, communication, and client scheduling, especially when it comes to clients relying on energy harvesting to power their operations. This paper presents a new two-phase deep deterministic policy gradient (DDPG) framework, referred to as "TP-DDPG", to balance online the learning delay and model accuracy of an FL process in an energy harvesting-powered HFL system. The key idea is that we divide optimization decisions into two groups, and employ DDPG to learn one group in the first phase, while interpreting the other group as part of the environment to provide rewards for training the DDPG in the second phase. Specifically, the DDPG learns the selection of participating clients, their CPU configurations, and their transmission powers. A new straggler-aware client association and bandwidth allocation (SCABA) algorithm efficiently optimizes the other decisions and evaluates the reward for the DDPG. Experiments demonstrate that, with a substantially reduced number of learnable parameters, the TP-DDPG can quickly converge to effective policies that can shorten the training time of HFL by 39.4% compared to its benchmarks, when the required test accuracy of HFL is 0.9.

Updated: 2024-06-21 07:01:23

标题: 走向分层联邦学习中的动态资源分配和客户端调度：一种两阶段深度强化学习方法

摘要: 联邦学习（FL）是一种可行的技术，可以在不共享数据的情况下训练共享的机器学习模型。至今尚未研究层次化FL（HFL）系统，特别是在涉及依靠能量收集来为其运行提供动力的客户端时，其多个能量、计算、通信和客户端调度级别。本文提出了一种新的两阶段深度确定性策略梯度（DDPG）框架，称为“TP-DDPG”，以在线平衡能量收集驱动的HFL系统中的FL进程的学习延迟和模型准确性。关键思想是将优化决策分为两组，并在第一阶段使用DDPG学习一组，同时将另一组解释为环境的一部分，为训练第二阶段的DDPG提供奖励。具体而言，DDPG学习参与客户端的选择，以及他们的CPU配置和传输功率。一种新的故障感知客户端关联和带宽分配（SCABA）算法有效优化其他决策并评估DDPG的奖励。实验证明，与其基准相比，TP-DDPG可以迅速收敛到有效策略，将HFL的训练时间缩短39.4％，当HFL所需的测试准确性为0.9时，可降低学习参数的数量。

更新时间: 2024-06-21 07:01:23

领域: cs.LG,cs.DC,math.OC

下载: http://arxiv.org/abs/2406.14910v1

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

Sparse attention can effectively mitigate the significant memory and throughput demands of Large Language Models (LLMs) in long contexts. Existing methods typically employ a uniform sparse attention mask, applying the same sparse pattern across different attention heads and input lengths. However, this uniform approach fails to capture the diverse attention patterns inherent in LLMs, ignoring their distinct accuracy-latency trade-offs. To address this challenge, we propose the Mixture of Attention (MoA), which automatically tailors distinct sparse attention configurations to different heads and layers. MoA constructs and navigates a search space of various attention patterns and their scaling rules relative to input sequence lengths. It profiles the model, evaluates potential configurations, and pinpoints the optimal sparse attention compression plan. MoA adapts to varying input sizes, revealing that some attention heads expand their focus to accommodate longer sequences, while other heads consistently concentrate on fixed-length local contexts. Experiments show that MoA increases the effective context length by $3.9\times$ with the same average attention span, boosting retrieval accuracy by $1.5-7.1\times$ over the uniform-attention baseline across Vicuna-7B, Vicuna-13B, and Llama3-8B models. Moreover, MoA narrows the capability gaps between sparse and dense models, reducing the maximum relative performance drop from $9\%-36\%$ to within $5\%$ across two long-context understanding benchmarks. MoA achieves a $1.2-1.4\times$ GPU memory reduction and boosts decode throughput by $5.5-6.7 \times$ for 7B and 13B dense models on a single GPU, with minimal impact on performance.

Updated: 2024-06-21 06:58:37

标题: MoA: 自动大型语言模型压缩的稀疏注意力混合

摘要: 稀疏注意力可以有效地缓解大型语言模型（LLMs）在长上下文中对内存和吞吐量的显著需求。现有方法通常采用统一的稀疏注意力掩码，将相同的稀疏模式应用于不同的注意力头和输入长度。然而，这种统一方法无法捕捉LLMs固有的多样化注意力模式，忽略了它们不同的准确性与延迟之间的权衡。为了解决这一挑战，我们提出了混合注意力（MoA），它可以自动为不同的头部和层自动调整不同的稀疏注意力配置。MoA构建并导航各种注意力模式和它们相对于输入序列长度的缩放规则的搜索空间。它对模型进行分析，评估潜在的配置，并找出最佳的稀疏注意力压缩计划。MoA适应不同的输入大小，揭示了一些注意力头将其焦点扩展以适应更长序列，而其他头部则始终集中于固定长度的局部上下文。实验表明，MoA在保持相同平均注意力范围的情况下，使有效上下文长度提高了3.9倍，将检索准确性提高了1.5-7.1倍，优于Vicuna-7B、Vicuna-13B和Llama3-8B模型的统一注意力基线。此外，MoA缩小了稀疏和密集模型之间的能力差距，将最大相对性能下降从9%-36%减少到5%以内，在两个长上下文理解基准测试中。MoA实现了GPU内存减少1.2-1.4倍，将7B和13B密集模型在单个GPU上的解码吞吐量提高了5.5-6.7倍，对性能影响很小。

更新时间: 2024-06-21 06:58:37

领域: cs.LG,cs.AI,cs.CL,I.2.7

下载: http://arxiv.org/abs/2406.14909v1
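
The idea in the MoA entry above, that each attention head gets its own sparse pattern whose span may scale with the input length, can be sketched with causal sliding-window masks. This is a simplified illustration under assumptions: the linear rule `span = alpha + beta * seq_len`, the per-head `(alpha, beta)` pairs, and all function names are hypothetical choices for this example, and the search procedure that MoA uses to pick configurations is not shown.

```python
import numpy as np

def head_span(alpha, beta, seq_len):
    """Attention span for one head under an assumed linear scaling rule."""
    return max(1, min(seq_len, int(round(alpha + beta * seq_len))))

def sliding_window_mask(seq_len, span):
    """Causal mask where each query attends to itself and the span-1 previous tokens."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    return (j <= i) & (j > i - span)

def build_head_masks(rules, seq_len):
    """Stack one boolean mask per head; rules is a list of (alpha, beta) pairs."""
    return np.stack(
        [sliding_window_mask(seq_len, head_span(a, b, seq_len)) for a, b in rules]
    )
```

With rules such as `[(4, 0.0), (0, 0.5)]`, the first head keeps a fixed local window of 4 tokens at any length, while the second widens its window to half the sequence. This mirrors the abstract's observation that some heads expand their focus for longer inputs while others stay local.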

Enhancing reliability in prediction intervals using point forecasters: Heteroscedastic Quantile Regression and Width-Adaptive Conformal Inference

Building prediction intervals for time series forecasting problems presents a complex challenge, particularly when relying solely on point predictors, a common scenario for practitioners in the industry. While research has primarily focused on achieving increasingly efficient valid intervals, we argue that, when evaluating a set of intervals, traditional measures alone are insufficient. There are additional crucial characteristics: the intervals must vary in length, with this variation directly linked to the difficulty of the prediction, and the coverage of the interval must remain independent of the difficulty of the prediction for practical utility. We propose the Heteroscedastic Quantile Regression (HQR) model and the Width-Adaptive Conformal Inference (WACI) method, providing theoretical coverage guarantees, to overcome those issues, respectively. The methodologies are evaluated in the context of Electricity Price Forecasting and Wind Power Forecasting, representing complex scenarios in time series forecasting. The results demonstrate that HQR and WACI not only improve or achieve typical measures of validity and efficiency but also successfully fulfil the aforementioned, commonly ignored characteristics.

Updated: 2024-06-21 06:51:13

标题: 利用点预测器增强预测区间的可靠性：异方差分位回归和宽度自适应符合推理

摘要: 为时间序列预测问题构建预测区间提出了复杂的挑战，特别是当仅依赖点预测器时，这是行业从业者常见的情况。虽然研究主要集中在实现越来越有效的有效区间，但我们认为，在评估一组区间时，仅传统指标是不足的。还有其他关键特征：区间的长度必须变化，这种变化直接与预测的困难程度相关，而区间的覆盖范围必须保持独立于预测的困难程度以实现实用性。我们提出了Heteroscedastic Quantile Regression（HQR）模型和Width-Adaptive Conformal Inference（WACI）方法，分别提供理论覆盖保证，以解决这些问题。这些方法在电力价格预测和风力预测的背景下进行评估，代表了时间序列预测中的复杂场景。结果表明，HQR和WACI不仅改进或实现了有效性和效率的典型指标，而且成功地满足了常常被忽视的特征。

更新时间: 2024-06-21 06:51:13

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.14904v1
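
The requirement in the entry above that interval length should track prediction difficulty can be illustrated with a normalized split conformal procedure built on top of a point forecaster. This sketch is neither HQR nor WACI; it is a standard, simpler technique named here plainly, and the per-sample difficulty estimate `sigma` (for example, a predicted residual scale) is an assumed input.

```python
import math
import numpy as np

def normalized_conformal_interval(y_cal, pred_cal, sigma_cal,
                                  pred_new, sigma_new, alpha=0.1):
    """Split conformal intervals whose width scales with a difficulty estimate.

    y_cal, pred_cal, sigma_cal: targets, point forecasts, and positive
        difficulty estimates on a held-out calibration set.
    pred_new, sigma_new: point forecasts and difficulty estimates for new inputs.
    Returns (lower, upper) arrays targeting coverage of at least 1 - alpha
    under exchangeability.
    """
    scores = np.abs(y_cal - pred_cal) / sigma_cal      # normalized residuals
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))               # rank of the conformal quantile
    q = np.inf if k > n else np.sort(scores)[k - 1]    # k-th smallest score
    return pred_new - q * sigma_new, pred_new + q * sigma_new
```

Because the calibrated quantile `q` multiplies `sigma_new`, harder inputs (larger `sigma_new`) receive proportionally wider intervals, while the point forecast stays at the center of each interval.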

GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

As large language models (LLMs) continue to develop and gain widespread application, the ability of LLMs to exhibit empathy towards diverse group identities and understand their perspectives is increasingly recognized as critical. Most existing benchmarks for empathy evaluation of LLMs focus primarily on universal human emotions, such as sadness and pain, often overlooking the context of individuals' group identities. To address this gap, we introduce GIEBench, a comprehensive benchmark that includes 11 identity dimensions, covering 97 group identities with a total of 999 single-choice questions related to specific group identities. GIEBench is designed to evaluate the empathy of LLMs when presented with specific group identities such as gender, age, occupation, and race, emphasizing their ability to respond from the standpoint of the identified group. This supports the ongoing development of empathetic LLM applications tailored to users with different identities. Our evaluation of 23 LLMs revealed that while these LLMs understand different identity standpoints, they fail to consistently exhibit equal empathy across these identities without explicit instructions to adopt those perspectives. This highlights the need for improved alignment of LLMs with diverse values to better accommodate the multifaceted nature of human identities. Our datasets are available at https://github.com/GIEBench/GIEBench.

Updated: 2024-06-21 06:50:42

标题: GIEBench: 面向大型语言模型的群体身份为基础的共情综合评估

摘要: 随着大型语言模型（LLMs）的不断发展和广泛应用，LLMs展现出对不同群体身份的共情能力并理解他们的视角的能力日益受到重视。大多数现有的用于评估LLMs共情能力的基准主要关注普遍的人类情感，如悲伤和痛苦，往往忽视了个体群体身份的背景。为了弥补这一空白，我们介绍了GIEBench，这是一个包括11个身份维度的综合基准，涵盖了97个群体身份，总共有999个与特定群体身份相关的单选题。GIEBench旨在评估LLMs在面对特定群体身份（如性别、年龄、职业和种族）时的共情能力，强调它们从被确认的群体的立场出发做出回应的能力。这支持着针对具有不同身份的用户定制的富有同情心的LLMs应用的持续发展。我们对23个LLMs进行的评估显示，尽管这些LLMs理解不同的身份立场，但它们在没有明确指示采纳这些视角的情况下无法一致展现出对这些身份的平等共情。这突显了需要改进LLMs与多样化价值观的对齐以更好地适应人类身份多面性的需求。我们的数据集可在https://github.com/GIEBench/GIEBench 上找到。

更新时间: 2024-06-21 06:50:42

领域: cs.AI

下载: http://arxiv.org/abs/2406.14903v1

Safely Learning with Private Data: A Federated Learning Framework for Large Language Model

Private data, being larger and of higher quality than public data, can greatly improve large language models (LLMs). However, due to privacy concerns, this data is often dispersed across multiple silos, making its secure utilization for LLM training a challenge. Federated learning (FL) is an ideal solution for training models with distributed private data, but traditional frameworks like FedAvg are unsuitable for LLMs due to their high computational demands on clients. An alternative, split learning, offloads most training parameters to the server while training embedding and output layers locally, making it more suitable for LLMs. Nonetheless, it faces significant challenges in security and efficiency. First, the gradients of embeddings are prone to attacks, leading to potential reverse engineering of private data. Second, the server can handle only one client's training request at a time, which hinders parallel training and severely impacts training efficiency. In this paper, we propose a federated learning framework for LLMs, named FL-GLM, which prevents data leakage caused by both server-side and peer-client attacks while improving training efficiency. Specifically, we first place the input block and output block on the local client to prevent embedding gradient attacks from the server. Second, we employ key encryption during client-server communication to prevent reverse engineering attacks from peer clients. Lastly, we employ optimization methods like client-batching or server-hierarchical, adopting different acceleration methods based on the actual computational capabilities of the server. Experimental results on NLU and generation tasks demonstrate that FL-GLM achieves metrics comparable to the centralized ChatGLM model, validating the effectiveness of our federated learning framework.

Updated: 2024-06-21 06:43:15

标题: 在私有数据中安全学习：用于大型语言模型的联邦学习框架

摘要: 私人数据比公共数据更大且质量更高，可以极大地改善大型语言模型（LLM）。然而，由于隐私问题，这些数据通常分散在多个数据孤岛中，使得其安全利用于LLM训练成为一项挑战。联邦学习（FL）是训练具有分布式私人数据的模型的理想解决方案，但传统框架如FedAvg并不适合LLM，因为它们对客户端的计算需求很高。另一种替代方案，分割学习，将大部分训练参数卸载到服务器，同时在本地进行嵌入和输出层的训练，使其更适合LLM。然而，它面临着安全性和效率方面的重大挑战。首先，嵌入的梯度容易受到攻击，导致私人数据的潜在反向工程。此外，服务器一次只能处理一个客户端的训练请求的限制阻碍了并行训练，严重影响了训练效率。在本文中，我们提出了一个针对LLM的联邦学习框架，名为FL-GLM，可以防止由服务器端和对等客户端攻击引起的数据泄漏，同时提高训练效率。具体来说，我们首先将输入块和输出块放在本地客户端上，以防止来自服务器的嵌入梯度攻击。其次，在客户端与服务器之间的通信过程中使用密钥加密，以防止来自对等客户端的反向工程攻击。最后，我们采用像客户端批处理或服务器分层这样的优化方法，根据服务器的实际计算能力采用不同的加速方法。NLU和生成任务上的实验结果表明，FL-GLM实现了与集中式chatGLM模型相当的指标，验证了我们的联邦学习框架的有效性。

更新时间: 2024-06-21 06:43:15

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2406.14898v1

Talking the Talk Does Not Entail Walking the Walk: On the Limits of Large Language Models in Lexical Entailment Recognition

Verbs form the backbone of language, providing structure and meaning to sentences. Yet their intricate semantic nuances pose a longstanding challenge. Understanding verb relations through the concept of lexical entailment is crucial for comprehending sentence meanings and grasping verb dynamics. This work investigates the capabilities of eight Large Language Models in recognizing lexical entailment relations among verbs through differently devised prompting strategies and zero-/few-shot settings over verb pairs from two lexical databases, namely WordNet and HyperLex. Our findings reveal that the models can tackle the lexical entailment recognition task with moderately good performance, although with varying degrees of effectiveness and under different conditions. Also, few-shot prompting can enhance the models' performance. However, perfectly solving the task remains an unmet challenge for all examined LLMs, which calls for further research on this topic.

Updated: 2024-06-21 06:30:16

标题: "说话并不意味着付诸行动：关于大型语言模型在词汇蕴涵识别中的局限性"

摘要: 动词构成了语言的支柱，为句子提供结构和意义。然而，它们复杂的语义细微差别构成了长期的挑战。通过词汇蕴涵的概念理解动词之间的关系对于理解句子的含义和把握动词的动态至关重要。本研究通过不同设计的提示策略和零/少样本设置，调查了八个大型语言模型在识别动词之间的词汇蕴涵关系方面的能力，涉及来自两个词汇数据库（WordNet和HyperLex）的动词对。我们的研究结果揭示了模型可以以适度良好的性能处理词汇蕴涵识别任务，尽管在不同条件下和不同效率程度下。此外，利用少样本提示可以提高模型的性能。然而，完美解决这一任务对于所有检验的LLM来说仍然是一个未解决的挑战，这引发了有关这一主题进一步研究发展的需求。

更新时间: 2024-06-21 06:30:16

领域: cs.CL,cs.AI,cs.CY,cs.IR,physics.soc-ph

下载: http://arxiv.org/abs/2406.14894v1

Metric Space Magnitude for Evaluating the Diversity of Latent Representations

The magnitude of a metric space is a novel invariant that provides a measure of the 'effective size' of a space across multiple scales, while also capturing numerous geometrical properties, such as curvature, density, or entropy. We develop a family of magnitude-based measures of the intrinsic diversity of latent representations, formalising a novel notion of dissimilarity between magnitude functions of finite metric spaces. Our measures are provably stable under perturbations of the data, can be efficiently calculated, and enable a rigorous multi-scale characterisation and comparison of latent representations. We show their utility and superior performance across different domains and tasks, including (i) the automated estimation of diversity, (ii) the detection of mode collapse, and (iii) the evaluation of generative models for text, image, and graph data.
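
As a rough illustration of the quantity this abstract builds on, the sketch below computes the magnitude of a finite metric space from its distance matrix, using the standard definition: form the similarity matrix Z with entries exp(-t * d(i, j)), solve Z w = 1 for the weighting w, and sum the weights. The function names `solve_linear` and `magnitude` are ours, and the paper's diversity measures (which compare whole magnitude functions across scales t) are not reproduced here.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("similarity matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def magnitude(dist, t=1.0):
    """Magnitude of the finite metric space with distance matrix `dist`, at scale t."""
    n = len(dist)
    z = [[math.exp(-t * dist[i][j]) for j in range(n)] for i in range(n)]
    weights = solve_linear(z, [1.0] * n)  # weighting w with Z w = 1
    return sum(weights)
```

For two points at distance 1, `magnitude([[0.0, 1.0], [1.0, 0.0]])` equals 2 / (1 + e^-1), roughly 1.46: the space "looks like" between one and two points at this scale, and approaches 2 as t grows.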

Updated: 2024-06-21 06:25:49

标题: 度量空间幅度用于评估潜在表示的多样性

摘要: 度量空间的量级是一种新颖的不变量，提供了一个跨多个尺度衡量空间“有效大小”的方法，同时还捕捉了许多几何属性，如曲率、密度或熵。我们开发了一系列基于量级的隐式表示的内在多样性度量，形式化了有限度量空间的量函数之间的一种新颖的不相似性概念。我们的度量在数据扰动下被证明是稳定的，可以高效计算，并能够对隐式表示进行严格的多尺度表征和比较。我们展示了它们在不同领域和任务中的效用和卓越性能，包括（i）自动估计多样性，（ii）检测模式坍塌，以及（iii）评估文本、图像和图数据的生成模型。

更新时间: 2024-06-21 06:25:49

领域: cs.LG,math.GT,stat.ML

下载: http://arxiv.org/abs/2311.16054v3

What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models

This paper presents a way of enhancing the reliability of Large Multi-modal Models (LMMs) in addressing hallucination, where the models generate cross-modal inconsistent responses. Without additional training, we propose Counterfactual Inception, a novel method that implants counterfactual thinking into LMMs using self-generated counterfactual keywords. Our method is grounded in the concept of counterfactual thinking, a cognitive process in which humans consider alternative realities, enabling more extensive context exploration. Bringing this human cognitive mechanism into LMMs, we aim for the models to engage with and generate responses that span a wider contextual scene understanding, mitigating hallucinatory outputs. We further introduce Plausibility Verification Process (PVP), a simple yet robust keyword constraint that effectively filters out sub-optimal keywords to enable the consistent triggering of counterfactual thinking in the model responses. Comprehensive analyses across various LMMs, including both open-source and proprietary models, corroborate that counterfactual thinking significantly reduces hallucination and helps to broaden contextual understanding based on true visual clues.

Updated: 2024-06-21 06:11:25

标题: 如果……？：思考反事实关键词有助于减轻大型多模态模型中的幻觉

摘要: 本文提出了一种增强大型多模型（LMMs）在处理幻觉时可靠性的方法，其中模型会生成跨模态不一致的响应。我们提出了反事实启示法，这是一种新颖的方法，通过使用自动生成的反事实关键词将反事实思维植入到LMMs中。我们的方法基于反事实思维的概念，这是一种认知过程，人类在其中考虑替代的现实，从而实现更广泛的上下文探索。将人类认知机制与LMMs联系起来，我们的目标是使模型参与并生成涵盖更广泛上下文场景理解的响应，从而减轻幻觉输出。我们进一步引入了可信性验证过程（PVP），这是一个简单而强大的关键词约束，能够有效地过滤出次优的关键词，以实现在模型响应中一致地触发反事实思维。对包括开源和专有模型在内的各种LMMs进行的全面分析证实，反事实思维显著减少了幻觉，并帮助基于真实视觉线索扩展上下文理解。

更新时间: 2024-06-21 06:11:25

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2403.13513v2

Hinge-Wasserstein: Estimating Multimodal Aleatoric Uncertainty in Regression Tasks

Computer vision systems that are deployed in safety-critical applications need to quantify their output uncertainty. We study regression from images to parameter values, where it is common to detect uncertainty by predicting probability distributions. In this context, we investigate the regression-by-classification paradigm, which can represent multimodal distributions without a prior assumption on the number of modes. Through experiments on a specifically designed synthetic dataset, we demonstrate that traditional loss functions lead to poor probability distribution estimates and severe overconfidence in the absence of full ground truth distributions. To alleviate these issues, we propose hinge-Wasserstein -- a simple improvement of the Wasserstein loss that reduces the penalty for weak secondary modes during training. This enables prediction of complex distributions with multiple modes, and allows training on datasets where full ground truth distributions are not available. In extensive experiments, we show that the proposed loss leads to substantially better uncertainty estimation on two challenging computer vision tasks: horizon line detection and stereo disparity estimation.

Updated: 2024-06-21 06:09:50

标题: 铰链-瓦瑟斯坦：在回归任务中估计多模态随机不确定性

摘要: 在安全关键应用中部署的计算机视觉系统需要量化其输出的不确定性。我们研究从图像到参数值的回归，在这种情况下，通常通过预测概率分布来检测不确定性。在这种情况下，我们研究了回归分类范式，它可以表示多模态分布，而不需要对模态数量进行先验假设。通过对一个特别设计的合成数据集进行实验，我们证明传统的损失函数在没有完整的真实分布的情况下会导致概率分布估计不准确和过度自信。为了缓解这些问题，我们提出了hinge-Wasserstein - 一种简单的改进Wasserstein损失函数，它在训练过程中减少了对弱次要模态的惩罚。这使得可以预测具有多个模态的复杂分布，并允许在没有完整真实分布的数据集上进行训练。在广泛的实验中，我们展示了所提出的损失函数在两个具有挑战性的计算机视觉任务中（地平线检测和立体视差估计）能够显著改善不确定性估计。

更新时间: 2024-06-21 06:09:50

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2306.00560v4

Empowering Multi-step Reasoning across Languages via Tree-of-Thoughts

Reasoning methods, best exemplified by the well-known Chain-of-Thought (CoT), empower the reasoning abilities of Large Language Models (LLMs) by eliciting them to solve complex tasks in a step-by-step manner. Although they are achieving significant success, the ability to deliver multi-step reasoning remains largely limited to English because of the imbalance in the distribution of pre-training data, which makes other languages a barrier. In this paper, we propose Cross-lingual Tree-of-Thoughts (Cross-ToT), a method for aligning CoT reasoning across languages. The proposed method, through a self-consistent cross-lingual prompting mechanism inspired by the Tree-of-Thoughts approach, provides multi-step reasoning paths in different languages that, step by step, lead to the final solution. Experimental evaluations show that our method significantly outperforms existing prompting methods by reducing the number of interactions and achieving state-of-the-art performance.

Updated: 2024-06-21 06:06:51

标题: 通过思维树实现跨语言的多步推理赋能

摘要: 本文提出了跨语言思维树（Cross-ToT），一种用于跨语言CoT推理的方法。通过受“思维树”方法启发的自洽跨语言提示机制，该方法在不同语言中提供多步推理路径，这些路径在各个步骤中都能引导到最终解决方案。实验评估显示，我们的方法通过减少交互次数并实现最先进的性能，明显优于现有提示方法。

更新时间: 2024-06-21 06:06:51

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2311.08097v4

Pathformer: Recursive Path Query Encoding for Complex Logical Query Answering

Complex Logical Query Answering (CLQA) over incomplete knowledge graphs is a challenging task. Recently, Query Embedding (QE) methods have been proposed to solve CLQA by performing multi-hop logical reasoning. However, most of them only consider historical query context information while ignoring future information, which leads to their failure to capture the complex dependencies behind the elements of a query. In recent years, the transformer architecture has shown a strong ability to model long-range dependencies between words. The bidirectional attention mechanism of the transformer can overcome the limitation of these QE methods regarding query context. Still, as a sequence model, the transformer has difficulty directly modeling complex logical queries whose computation graphs have a branching structure. To this end, we propose a neural one-point embedding method called Pathformer based on the tree-like computation graph, i.e., the query computation tree. Specifically, Pathformer decomposes the query computation tree into path query sequences by branches and then uses the transformer encoder to recursively encode these path query sequences to obtain the final query embedding. This allows Pathformer to fully utilize future context information to explicitly model the complex interactions between various parts of the path query. Experimental results show that Pathformer outperforms existing competitive neural QE methods, and we found that Pathformer has the potential to be applied to non-one-point embedding spaces.
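
To make the "decompose the tree into path query sequences" step concrete, here is a minimal sketch that enumerates root-to-leaf paths of a tree stored as a child-list dictionary. The tree encoding, the node names, and the function `root_to_leaf_paths` are illustrative assumptions; the abstract does not specify Pathformer's actual decomposition rule or its recursive encoding, so this only shows the general idea of turning a branching structure into sequences.

```python
def root_to_leaf_paths(tree, root):
    """Return every root-to-leaf path of `tree` as a list of node names.

    `tree` maps a node to the list of its children; nodes that are absent
    from the mapping (or map to an empty list) are treated as leaves.
    """
    children = tree.get(root, [])
    if not children:
        return [[root]]
    paths = []
    for child in children:
        # Each path through a child is extended by the current node.
        for sub_path in root_to_leaf_paths(tree, child):
            paths.append([root] + sub_path)
    return paths


# Hypothetical query tree: an intersection of two one-hop projections.
query_tree = {
    "answer": ["and"],
    "and": ["relation_1", "relation_2"],
    "relation_1": ["entity_a"],
    "relation_2": ["entity_b"],
}
paths = root_to_leaf_paths(query_tree, "answer")
```

Each resulting path is a plain sequence, which is the kind of input a sequence encoder such as a transformer can consume directly.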

Updated: 2024-06-21 06:02:58

标题: 路径形成器：用于复杂逻辑查询回答的递归路径查询编码

摘要: 复杂逻辑查询回答（CLQA）在不完整知识图上是一项具有挑战性的任务。最近，提出了查询嵌入（QE）方法来解决CLQA，通过进行多跳逻辑推理。然而，大多数方法仅考虑历史查询上下文信息，而忽略未来信息，导致它们无法捕捉查询元素背后的复杂依赖关系。近年来，变压器架构已经展示了模拟单词之间长距离依赖关系的强大能力。变压器提出的双向注意机制可以解决这些QE方法在查询上下文方面的局限性。然而，作为一个序列模型，变压器难以直接模拟具有分支结构计算图的复杂逻辑查询。为此，我们提出了一种基于树状计算图的神经单点嵌入方法，称为Pathformer，即查询计算树。具体而言，Pathformer通过分支将查询计算树分解为路径查询序列，然后使用变压器编码器递归地编码这些路径查询序列，以获得最终的查询嵌入。这使得Pathformer能够充分利用未来上下文信息来显式模拟路径查询各部分之间的复杂交互。实验结果表明，Pathformer优于现有竞争性的神经QE方法，我们发现Pathformer有潜力应用于非单点嵌入空间。

更新时间: 2024-06-21 06:02:58

领域: cs.LG,cs.LO

下载: http://arxiv.org/abs/2406.14880v1

MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection

LiDAR-based 3D object detection is pivotal across many applications, yet the performance of such detection systems often degrades after deployment, especially when faced with unseen test point clouds originating from diverse locations or subjected to corruption. In this work, we introduce a new online adaptation framework for detectors named Model Synergy (MOS). Specifically, MOS dynamically assembles best-fit supermodels for each test batch from a bank of historical checkpoints, leveraging long-term knowledge to guide model updates without forgetting. The model assembly is directed by the proposed synergy weights (SW), employed for weighted averaging of the selected checkpoints to minimize redundancy in the composite supermodel. These weights are calculated by evaluating the similarity of predicted bounding boxes on test data and the feature independence among model pairs in the bank. To maintain an informative yet compact model bank, we pop out checkpoints with the lowest average SW scores and insert newly updated model weights. Our method was rigorously tested against prior test-time domain adaptation strategies on three datasets and under eight types of corruptions, demonstrating its superior adaptability to changing scenes and conditions. Remarkably, our approach achieved a 67.3% increase in performance in a complex "cross-corruption" scenario, which involves cross-dataset inconsistencies and real-world scene corruptions, providing a more realistic testbed of adaptation capabilities. The code is available at https://github.com/zhuoxiao-chen/MOS.
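
The weighted averaging of checkpoints mentioned above can be sketched as follows. This is only the averaging step under simplifying assumptions: checkpoints are dictionaries of flat parameter lists, and the weights are passed in directly. How MOS actually computes its synergy weights (from box similarity and feature independence) is not shown, and `average_checkpoints` is a name chosen for this example.

```python
def average_checkpoints(checkpoints, weights):
    """Weighted average of parameter dictionaries.

    `checkpoints` is a list of dicts mapping a parameter name to a flat list
    of floats; `weights` holds one non-negative weight per checkpoint and is
    normalized here so that the weights sum to 1.
    """
    if len(checkpoints) != len(weights) or not checkpoints:
        raise ValueError("need one weight per checkpoint")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    norm = [w / total for w in weights]
    merged = {}
    for name, first_values in checkpoints[0].items():
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(len(first_values))
        ]
    return merged


supermodel = average_checkpoints(
    [{"w": [0.0, 2.0]}, {"w": [4.0, 6.0]}],
    weights=[1.0, 3.0],
)
```

With weights 1 and 3, the second checkpoint contributes three quarters of each averaged parameter.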

Updated: 2024-06-21 05:58:19

标题: MOS: 利用模型协同性进行基于LiDAR的3D目标检测测试时间适应

摘要: 基于LiDAR的3D物体检测在许多应用中至关重要，然而，这种检测系统的性能经常在部署后下降，特别是当面临来自不同位置或遭受破坏的未知测试点云时。在本研究中，我们引入了一种名为模型协同（MOS）的新的在线适应框架，用于检测器。具体而言，MOS动态地从历史检查点库中为每个测试批次组装最佳拟合的超级模型，利用长期知识来引导模型更新而不会遗忘。模型组装由提出的协同权重（SW）指导，用于加权平均选择的检查点，以最小化复合超级模型中的冗余。这些权重是通过评估在测试数据上预测的边界框的相似性和银行中模型对之间的特征独立性来计算的。为了保持一个信息丰富而紧凑的模型库，我们弹出具有最低平均SW分数的检查点，并插入新更新的模型权重。我们的方法在三个数据集和八种类型的破坏下进行了严格测试，展示了其对不断变化的场景和条件的优越适应能力。值得注意的是，我们的方法在复杂的“跨破坏”情景中性能提高了67.3％，涉及跨数据集的不一致性和真实世界场景的破坏，为适应能力提供了更加现实的测试基准。该代码可在https://github.com/zhuoxiao-chen/MOS上找到。

更新时间: 2024-06-21 05:58:19

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2406.14878v1

Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization

Active learning is increasingly adopted for expensive multi-objective combinatorial optimization problems, but it involves a challenging subset selection problem: optimizing the batch acquisition score that quantifies the goodness of a batch for evaluation. Due to the excessively large search space of the subset selection problem, prior methods either optimize the batch acquisition on the latent space, which has discrepancies with the actual space, or optimize individual acquisition scores without considering the dependencies among candidates in a batch, instead of directly optimizing the batch acquisition. To manage the vast search space, a simple and effective approach is the greedy method, which decomposes the problem into smaller subproblems, yet it is difficult to parallelize since each subproblem depends on the outcome of the previous ones. To this end, we introduce a novel greedy-style subset selection algorithm that optimizes batch acquisition directly on the combinatorial space by sequential greedy sampling from the greedy policy, specifically trained to address all greedy subproblems concurrently. Notably, our experiments on the red fluorescent protein design task show that our proposed method achieves the baseline performance with 1.69x fewer queries, demonstrating its efficiency.
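
For reference, the plain sequential greedy method that the abstract starts from can be sketched as below: at each step, add the candidate that maximizes the score of the batch built so far. The toy coverage score and the names `greedy_batch` and `coverage` are assumptions for illustration; the paper's contribution, a trained greedy policy that addresses all subproblems concurrently, is not implemented here.

```python
def greedy_batch(candidates, batch_score, batch_size):
    """Build a batch by repeatedly adding the candidate with the best batch score."""
    batch = []
    remaining = list(candidates)
    for _ in range(min(batch_size, len(remaining))):
        # Each step depends on the batch chosen so far, which is why the
        # plain greedy method is hard to parallelize.
        best = max(remaining, key=lambda c: batch_score(batch + [c]))
        batch.append(best)
        remaining.remove(best)
    return batch


def coverage(batch):
    """Toy batch acquisition score: number of distinct items covered."""
    covered = set()
    for item in batch:
        covered |= item
    return len(covered)


candidates = [frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({1, 2}), frozenset({5})]
chosen = greedy_batch(candidates, coverage, batch_size=2)
```

Because the score is evaluated on the whole batch, a candidate that overlaps with earlier picks (here `{1, 2}`) is not selected even though its individual score is high.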

Updated: 2024-06-21 05:57:08

标题: 在昂贵的多目标组合优化中训练贪婪策略进行提案批量选择

摘要: 主动学习越来越多地被用于昂贵的多目标组合优化问题，但它涉及一个具有挑战性的子集选择问题，即优化批量获取分数，该分数量化了用于评估的批量的好坏。由于子集选择问题的搜索空间过大，先前的方法在潜在空间上优化批量获取，这与实际空间存在差异，或者优化个别获取分数而不考虑批量中候选项之间的依赖关系，而不是直接优化批量获取。为了管理庞大的搜索空间，一种简单有效的方法是贪心算法，它将问题分解为较小的子问题，但由于每个子问题依赖于先前子问题的结果，因此很难并行化。为此，我们引入了一种新颖的贪心风格子集选择算法，通过从贪心策略中顺序贪心抽样，直接在组合空间上优化批量获取。具体训练以同时解决所有贪心子问题。值得注意的是，我们在红色荧光蛋白设计任务上的实验表明，我们提出的方法在查询数量少了1.69倍的情况下实现了基准性能，证明了其效率。

更新时间: 2024-06-21 05:57:08

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.14876v1

Automatic Counting and Classification of Mosquito Eggs in Field Traps

The analysis of the field traps where the mosquitoes lay their eggs is vital to check that the sterile insect technique (SIT) is working properly. This is because the number of hatched eggs may indicate that the sterile males are not competing with the wild ones. Nowadays, the study of the traps is done manually by microscope and is very time-consuming and prone to human error. This paper presents an automatic trap survey. For this purpose, a device has been designed that automatically scans the slat, obtaining different overlapping photos. Subsequently, the images are analyzed by a Mask-RCNN neural network that segments the eggs and classifies them into 2 classes: full or hatched.

Updated: 2024-06-21 05:56:22

标题: 野外陷阱中蚊子卵的自动计数和分类

摘要: 蚊子产卵的野外陷阱分析对于检查无菌昆虫技术（SIT）是否正常运作至关重要。这是因为孵化的卵数量可能表明无菌雄性昆虫与野生昆虫之间不存在竞争。如今，陷阱的研究是通过显微镜手动进行的，非常耗时且容易出现人为错误。本文提出了一种自动陷阱调查方法。为此，设计了一种自动扫描栅条的设备，获取不同重叠的照片。随后，通过Mask-RCNN神经网络分析图像，将卵分割并分类为两类：完整或孵化。

更新时间: 2024-06-21 05:56:22

领域: cs.AI

下载: http://arxiv.org/abs/2405.20656v3

Nearly Minimax Optimal Regret for Multinomial Logistic Bandit

In this paper, we study the contextual multinomial logit (MNL) bandit problem in which a learning agent sequentially selects an assortment based on contextual information, and user feedback follows an MNL choice model. There has been a significant discrepancy between lower and upper regret bounds, particularly regarding the maximum assortment size $K$. Additionally, the variation in reward structures between these bounds complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of $\Omega(d\sqrt{\smash[b]{T/K}})$ and propose a constant-time algorithm, OFU-MNL+, that achieves a matching upper bound of $\tilde{O}(d\sqrt{\smash[b]{T/K}})$. Under non-uniform rewards, we prove a lower bound of $\Omega(d\sqrt{T})$ and an upper bound of $\tilde{O}(d\sqrt{T})$, also achievable by OFU-MNL+. Our empirical studies support these theoretical findings. To the best of our knowledge, this is the first work in the contextual MNL bandit literature to prove minimax optimality -- for either uniform or non-uniform reward setting -- and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
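
The MNL choice model referred to above assigns each offered item a probability proportional to the exponential of its utility, with an extra "no purchase" option of utility zero. The sketch below assumes linear utilities (feature vector dotted with a parameter vector `theta`), which is the usual contextual setup; the function name and argument layout are ours, and nothing here reproduces the OFU-MNL+ algorithm itself.

```python
import math


def mnl_choice_probs(features, theta):
    """Choice probabilities under a multinomial logit model with an outside option.

    `features` holds one feature vector per item in the offered assortment.
    Returns (item_probs, outside_prob), where
    item_probs[i] = exp(u_i) / (1 + sum_j exp(u_j)) and u_i = <features[i], theta>.
    """
    utilities = [sum(x * t for x, t in zip(f, theta)) for f in features]
    # Shift by the largest utility (including 0 for the outside option)
    # so that the exponentials cannot overflow.
    shift = max([0.0] + utilities)
    exp_items = [math.exp(u - shift) for u in utilities]
    exp_outside = math.exp(-shift)
    denom = exp_outside + sum(exp_items)
    return [e / denom for e in exp_items], exp_outside / denom


item_probs, outside_prob = mnl_choice_probs([[1.0, 0.0], [0.0, 1.0]], theta=[0.0, 0.0])
```

With all utilities equal to zero, the two items and the outside option are each chosen with probability one third.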

Updated: 2024-06-21 05:55:23

标题: 多项Logit赌博机的近似极小极大最优遗憾

摘要: 在这篇论文中，我们研究了上下文多项式Logit（MNL）赌博问题，其中学习代理根据上下文信息顺序选择一组，并且用户反馈遵循MNL选择模型。尤其是关于最大组合大小$K$，较低和较高遗憾界之间存在显著差异。此外，在这些界之间的奖励结构变化使得寻求最优性变得复杂。在均匀奖励下，其中所有项目具有相同的预期奖励，我们建立了一个$\Omega(d\sqrt{\smash[b]{T/K}})$的遗憾下界，并提出了一个常数时间算法，OFU-MNL+，它实现了一个匹配的上界$\tilde{O}(d\sqrt{\smash[b]{T/K}})$。在非均匀奖励下，我们证明了一个$\Omega(d\sqrt{T})$的下界和一个$\tilde{O}(d\sqrt{T})$的上界，OFU-MNL+也可以实现这个上界。我们的实证研究支持这些理论发现。据我们所知，这是上下文MNL赌博文献中第一部证明极小极值优化的工作，无论是在均匀还是非均匀奖励设置下，并且提出了一个计算效率高的算法，可以达到这种最优性，直到对数因子。

更新时间: 2024-06-21 05:55:23

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.09831v5

Data Augmentation on Graphs: A Technical Survey

In recent years, graph representation learning has achieved remarkable success while suffering from low-quality data problems. As a mature technology to improve data quality in computer vision, data augmentation has also attracted increasing attention in the graph domain. To advance research in this emerging direction, this survey provides a comprehensive review and summary of existing graph data augmentation (GDAug) techniques. Specifically, this survey first provides an overview of various feasible taxonomies and categorizes existing GDAug studies based on multi-scale graph elements. Subsequently, for each type of GDAug technique, this survey formalizes a standardized technical definition, discusses the technical details, and provides a schematic illustration. The survey also reviews domain-specific graph data augmentation techniques, including those for heterogeneous graphs, temporal graphs, spatio-temporal graphs, and hypergraphs. In addition, this survey provides a summary of available evaluation metrics and design guidelines for graph data augmentation. Lastly, it outlines the applications of GDAug at both the data and model levels, discusses open issues in the field, and looks forward to future directions. The latest advances in GDAug are summarized on GitHub.

Updated: 2024-06-21 05:50:54

标题: 图数据增强：技术调查

摘要: 最近几年，图表示学习在取得显著成功的同时也遭受了低质量数据问题。作为一种改善计算机视觉中数据质量的成熟技术，数据增强也在图领域引起了越来越多的关注。为了推动这一新兴方向的研究，本调查提供了对现有图数据增强（GDAug）技术的全面回顾和总结。具体而言，本调查首先概述了各种可行的分类法，并根据多尺度图元素对现有的GDAug研究进行分类。随后，针对每种类型的GDAug技术，本调查形式化了标准化的技术定义，讨论了技术细节，并提供了示意图。调查还审查了特定领域的图数据增强技术，包括异构图、时态图、时空图和超图。此外，本调查总结了可用的评估指标和图数据增强的设计指南。最后，它概述了GDAug在数据和模型层面的应用，讨论了该领域的未解问题，并展望了未来的发展方向。最新的GDAug进展在GitHub上总结。

更新时间: 2024-06-21 05:50:54

领域: cs.LG

下载: http://arxiv.org/abs/2212.09970v3

I don't trust you (anymore)! -- The effect of students' LLM use on Lecturer-Student-Trust in Higher Education

Trust plays a pivotal role in Lecturer-Student-Collaboration, encompassing teaching and research aspects. The advent of Large Language Models (LLMs) in platforms like OpenAI's ChatGPT, coupled with their cost-effectiveness and high-quality results, has led to their rapid adoption among university students. However, discerning genuine student input from LLM-generated output poses a challenge for lecturers. This dilemma jeopardizes the trust relationship between lecturers and students, potentially impacting university downstream activities, particularly collaborative research initiatives. Despite attempts to establish guidelines for student LLM use, a clear framework mutually beneficial for lecturers and students in higher education remains elusive. This study addresses the research question: How does the use of LLMs by students impact Informational and Procedural Justice, influencing Team Trust and Expected Team Performance? Methodologically, we applied a quantitative construct-based survey, evaluated using techniques of Structural Equation Modelling (PLS-SEM) to examine potential relationships among these constructs. Our findings, based on 23 valid respondents from Ndejje University, indicate that lecturers are less concerned about the fairness of LLM use per se but are more focused on the transparency of student utilization, which significantly and positively influences Team Trust. This research contributes to the global discourse on integrating and regulating LLMs and subsequent models in education. We propose that guidelines should support LLM use while enforcing transparency in Lecturer-Student-Collaboration to foster Team Trust and Performance. The study contributes valuable insights for shaping policies enabling ethical and transparent LLM usage in education to ensure the effectiveness of collaborative learning environments.

Updated: 2024-06-21 05:35:57

标题: 我不再信任你！-- 学生在高等教育中使用LLM对讲师-学生信任的影响

摘要: 信任在讲师-学生-合作中起着关键作用，涵盖教学和研究方面。大型语言模型（LLMs）如Open AI的ChatGPT等平台的出现，加上它们的成本效益和高质量结果，导致它们在大学生中迅速被采用。然而，区分真实的学生输入和LLM生成的输出对讲师来说是一种挑战。这种困境危及了讲师和学生之间的信任关系，可能影响大学的下游活动，特别是协作研究计划。尽管已经尝试制定学生LLM使用的指导方针，但在高等教育中对讲师和学生都有益的清晰框架仍然难以捉摸。本研究探讨了一个研究问题：学生如何使用LLMs影响信息和程序公正，从而影响团队信任和预期团队绩效？在方法上，我们应用了基于量化构建的调查，采用结构方程建模技术（PLS-SEM）评估这些构建之间潜在的关系。我们的研究结果基于来自Ndejje大学的23名有效受访者，表明讲师对LLM使用的公平性本身并不太关注，而更关注学生利用的透明度，这对团队信任产生了积极影响。这项研究为整合和规范LLMs以及随后的教育模型的全球讨论做出了贡献。我们建议指导方针应支持LLM的使用，并在讲师-学生-合作中强调透明度，以促进团队信任和绩效。该研究为塑造政策提供了有价值的见解，使在教育中使用道德和透明的LLMs成为可能，以确保协作学习环境的有效性。

更新时间: 2024-06-21 05:35:57

领域: cs.CY,cs.AI,cs.ET,cs.HC,cs.LG,K.3.1; K.4.2; K.4.3; J.4; H.0; I.2.0

下载: http://arxiv.org/abs/2406.14871v1

Deep learning empowered sensor fusion to improve infant movement classification

There is a recent boom in the development of AI solutions to facilitate and enhance diagnostic procedures for established clinical tools. To assess the integrity of the developing nervous system, the Prechtl general movement assessment (GMA) is recognized for its clinical value in diagnosing neurological impairments in early infancy. GMA has been increasingly augmented through machine learning approaches intending to scale up its application, circumvent costs in the training of human assessors and further standardize classification of spontaneous motor patterns. Available deep learning tools, all of which are based on single sensor modalities, however, still perform considerably worse than well-trained human assessors. These approaches are hardly comparable, as all models are designed, trained and evaluated on proprietary/siloed data sets. With this study we propose a sensor fusion approach for assessing fidgety movements (FMs), comparing three different sensor modalities (pressure, inertial, and visual sensors). Various combinations and two sensor fusion approaches (late and early fusion) for infant movement classification were tested to evaluate whether a multi-sensor system outperforms single modality assessments. The performance of the three-sensor fusion (classification accuracy of 94.5%) was significantly higher than that of any single modality evaluated, suggesting the sensor fusion approach is a promising avenue for automated classification of infant motor patterns. The development of a robust sensor fusion system may significantly enhance AI-based early recognition of neurofunctions, ultimately facilitating automated early detection of neurodevelopmental conditions.
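
The difference between the two fusion styles named above can be sketched in a few lines. Early fusion merges the per-sensor features before a single classifier sees them; late fusion runs one classifier per sensor and merges their outputs. The averaging rule used for late fusion and the function names are assumptions chosen for illustration, since the abstract does not describe the study's actual fusion architecture.

```python
def early_fusion(features_per_sensor):
    """Concatenate per-sensor feature vectors into one joint input vector."""
    return [value for features in features_per_sensor for value in features]


def late_fusion(probs_per_sensor):
    """Average per-sensor class probabilities; return (fused_probs, predicted_class)."""
    n_sensors = len(probs_per_sensor)
    n_classes = len(probs_per_sensor[0])
    fused = [
        sum(probs[c] for probs in probs_per_sensor) / n_sensors
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted


# Hypothetical outputs for pressure, inertial, and visual sensors.
joint_input = early_fusion([[0.1, 0.2], [0.3], [0.4, 0.5]])
fused_probs, predicted_class = late_fusion([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
```

In this toy case the pressure sensor alone would vote for class 0, but the averaged probabilities favor class 1.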

Updated: 2024-06-21 05:24:28

标题: 深度学习赋能传感器融合以提高婴儿运动分类

摘要: 最近，人工智能解决方案的发展在促进和增强已建立的临床工具的诊断程序方面出现了繁荣。为了评估发育中的神经系统的完整性，Prechtl一般运动评估（GMA）被认为在早期婴儿神经损伤诊断中具有临床价值。通过机器学习方法不断增强GMA，旨在扩大其应用范围，避免培训人类评估员的成本，并进一步标准化自发运动模式的分类。然而，目前可用的基于单传感器模态的深度学习工具仍然明显逊于经过良好训练的人类评估员。这些方法几乎无法比较，因为所有模型都是在专有/孤立数据集上设计、训练和评估的。本研究提出了一种传感器融合方法，用于评估不安静运动（FMs），比较了三种不同的传感器模态（压力、惯性和视觉传感器）。测试了各种组合和两种传感器融合方法（后期融合和前期融合）以评估多传感器系统是否优于单模态评估。三传感器融合的性能（分类准确率为94.5％）显著高于任何单一模态评估，表明传感器融合方法是自动分类婴儿运动模式的一个有前途的途径。建立稳健的传感器融合系统可能显著增强基于人工智能的早期神经功能识别，最终促进神经发育状况的自动早期检测。

更新时间: 2024-06-21 05:24:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.09014v3

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Recently, considerable efforts have been directed towards compressing Large Language Models (LLMs), which showcase groundbreaking capabilities across diverse applications but entail significant deployment costs due to their large sizes. Meanwhile, much less attention has been given to mitigating the costs associated with deploying multiple LLMs of varying sizes, despite its practical significance. Thus, this paper introduces any-precision LLM, extending the concept of any-precision DNN to LLMs. Addressing challenges in any-precision LLM, we propose a lightweight method for any-precision quantization of LLMs, leveraging a post-training quantization framework, and develop a specialized software engine for its efficient serving. As a result, our solution significantly reduces the high costs of deploying multiple, different-sized LLMs by overlaying LLMs quantized to varying bit-widths, such as 3, 4, ..., $n$ bits, into a memory footprint comparable to a single $n$-bit LLM. All the supported LLMs with varying bit-widths demonstrate state-of-the-art model quality and inference throughput, making our solution a compelling option for deployment of multiple, different-sized LLMs. Our code is open-sourced and available online.
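
One way several bit-widths can share the memory of a single $n$-bit model is for each lower-precision code to be a prefix of the higher-precision one. The sketch below illustrates that idea by keeping only the most significant bits of each integer weight code. This prefix-sharing layout is an assumption made for illustration; the abstract only states that the quantized models are overlaid, and `truncate_codes` is a hypothetical helper.

```python
def truncate_codes(codes, n_bits, k_bits):
    """Derive k-bit codes from n-bit codes by keeping the k most significant bits."""
    if not 1 <= k_bits <= n_bits:
        raise ValueError("k_bits must be between 1 and n_bits")
    shift = n_bits - k_bits
    # Dropping the low-order bits leaves the shared high-order prefix.
    return [code >> shift for code in codes]


codes_8bit = [0b10110101, 0b00001111]
codes_4bit = truncate_codes(codes_8bit, n_bits=8, k_bits=4)
codes_3bit = truncate_codes(codes_8bit, n_bits=8, k_bits=3)
```

Under this layout, only the 8-bit codes need to be stored; the 4-bit and 3-bit views are read from the same memory.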

Updated: 2024-06-21 05:20:56

标题: Any-Precision LLM: 多个不同尺寸的低成本部署

摘要: 最近，人们已经付出了相当多的努力来压缩大型语言模型（LLMs），这些模型展示了突破性的能力，适用于各种应用，但由于其庞大的尺寸而导致部署成本显著增加。与此同时，尽管部署多个不同尺寸的LLMs的成本在实际中具有重要意义，但却没有得到足够的关注。因此，本文介绍了“任意精度LLM”，将任意精度深度神经网络（DNN）的概念扩展到LLMs。针对任意精度LLM面临的挑战，我们提出了一种轻量级的任意精度LLM的量化方法，利用后训练量化框架，并开发了一个专门的软件引擎用于高效地提供服务。因此，我们的解决方案通过将LLMs叠加到不同比特宽度（如3、4、...、n比特）的量化LLMs中，将部署多个不同尺寸的LLMs的高成本显著降低到与单个n比特LLM相当的内存占用。所有支持的具有不同比特宽度的LLMs展示了最先进的模型质量和推理吞吐量，证明了它是部署多个不同尺寸的LLMs的一个引人注目的选择。我们的代码是开源的，并且在线提供。

更新时间: 2024-06-21 05:20:56

领域: cs.LG

下载: http://arxiv.org/abs/2402.10517v4

Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization

This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). The current practice is divide and conquer: split the model into submodels, sequentially prune them, and reconstruct predictions of the dense counterparts on small calibration data one at a time; the final model is obtained simply by putting the resulting sparse submodels together. While this approach enables pruning under memory constraints, it generates high reconstruction errors. In this work, we first present an array of reconstruction techniques that can reduce this error by more than 90%. Unexpectedly, however, we discover that minimizing reconstruction error is not always ideal and can overfit the given calibration data, resulting in increased language perplexity and poor performance on downstream tasks. We find that a strategy of self-generating calibration data can mitigate this trade-off between reconstruction and generalization, suggesting new directions in the presence of both benefits and pitfalls of reconstruction for pruning LLMs.

Updated: 2024-06-21 05:13:34

标题: 重新思考修剪大型语言模型：重建误差最小化的好处和风险

摘要: 这项工作建议从根本上重新思考当前修剪大型语言模型（LLMs）的做法。这样做的方法是通过分而治之：将模型分成子模型，依次修剪它们，并逐个在小的校准数据上重建密集对应模型的预测；最终模型是通过简单地将得到的稀疏子模型组合在一起获得的。虽然这种方法能够在内存限制下进行修剪，但会产生较高的重建误差。在这项工作中，我们首先提出了一系列重建技术，可以将这种错误显著减少超过90%。然而，不知不觉中，我们发现最小化重建误差并不总是理想的，可能会过度拟合给定的校准数据，导致语言困惑度增加，下游任务表现不佳。我们发现一种自动生成校准数据的策略可以缓解重建和泛化之间的权衡，为修剪LLMs提供了新的方向，同时也提出了有关重建的利与弊。

更新时间: 2024-06-21 05:13:34

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.15524v1

Direct Multi-Turn Preference Optimization for Language Agents

Adapting Large Language Models (LLMs) for agent tasks is critical in developing language agents. Direct Preference Optimization (DPO) is a promising technique for this adaptation with the alleviation of compounding errors, offering a means to directly optimize Reinforcement Learning (RL) objectives. However, applying DPO to multi-turn tasks presents challenges due to the inability to cancel the partition function. Overcoming this obstacle involves making the partition function independent of the current state and addressing length disparities between preferred and dis-preferred trajectories. In this light, we replace the policy constraint with the state-action occupancy measure constraint in the RL objective and add length normalization to the Bradley-Terry model, yielding a novel loss function named DMPO for multi-turn agent tasks with theoretical explanations. Extensive experiments on three multi-turn agent task datasets confirm the effectiveness and superiority of the DMPO loss.

Updated: 2024-06-21 05:13:20

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.14868v1

LayerMatch: Do Pseudo-labels Benefit All Layers?

Deep neural networks have achieved remarkable performance across various tasks when supplied with large-scale labeled data. However, the collection of labeled data can be time-consuming and labor-intensive. Semi-supervised learning (SSL), particularly through pseudo-labeling algorithms that iteratively assign pseudo-labels for self-training, offers a promising solution to mitigate the dependency on labeled data. Previous research generally applies a uniform pseudo-labeling strategy across all model layers, assuming that pseudo-labels exert uniform influence throughout. In contrast, our theoretical analysis and empirical experiments demonstrate that the feature extraction layer and the linear classification layer have distinct learning behaviors in response to pseudo-labels. Based on these insights, we develop two layer-specific pseudo-label strategies, termed Grad-ReLU and Avg-Clustering. Grad-ReLU mitigates the impact of noisy pseudo-labels by removing the detrimental gradient effects of pseudo-labels in the linear classification layer. Avg-Clustering accelerates the convergence of the feature extraction layer towards stable clustering centers by integrating consistent outputs. Our approach, LayerMatch, which integrates these two strategies, can avoid the severe interference of noisy pseudo-labels in the linear classification layer while accelerating the clustering capability of the feature extraction layer. Through extensive experimentation, our approach consistently demonstrates exceptional performance on standard semi-supervised learning benchmarks, achieving a significant improvement of 10.38% over the baseline method and a 2.44% increase compared to state-of-the-art methods.

Updated: 2024-06-21 05:09:28

Categories: cs.LG

Download: http://arxiv.org/abs/2406.14207v2

DistiLRR: Transferring Code Repair for Low-Resource Programming Languages

Large language models (LLMs) have shown remarkable performance on code generation tasks. A recent application of LLMs for code generation is iterative code repair, where a model fixes an incorrect program by rationalizing about errors and generating a new program. However, code repair is primarily studied on high-resource languages like Python, and the framework's efficacy is under-explored on low-resource languages. To apply code repair for low-resource languages, we propose Distilling Low-Resource Repairs (DistiLRR), an approach that transfers the reasoning and code generation ability from a teacher model to a student model. Our results show that DistiLRR consistently outperforms baselines on low-resource languages, but has similar performance on high-resource languages. To investigate this behavior, we perform a further analysis and find that the correlation between rationale quality and code correctness is weaker than previously perceived. We hypothesize this weakness is magnified in low-resource settings where base models lack deep knowledge of a programming language, leading to wavering benefits of code repair between high-resource and low-resource languages.

Updated: 2024-06-21 05:05:39

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.14867v1

AI-based Anomaly Detection for Clinical-Grade Histopathological Diagnostics

While previous studies have demonstrated the potential of AI to diagnose diseases in imaging data, clinical implementation is still lagging behind. This is partly because AI models require training with large numbers of examples only available for common diseases. In clinical reality, however, only a few diseases are common, whereas the majority of diseases are less frequent (long-tail distribution). Current AI models overlook or misclassify these diseases. We propose a deep anomaly detection approach that only requires training data from common diseases to also detect all less frequent diseases. We collected two large real-world datasets of gastrointestinal biopsies, which are prototypical of the problem. Herein, the ten most common findings account for approximately 90% of cases, whereas the remaining 10% contained 56 disease entities, including many cancers. 17 million histological images from 5,423 cases were used for training and evaluation. Without any specific training for the diseases, our best-performing model reliably detected a broad spectrum of infrequent ("anomalous") pathologies with 95.0% (stomach) and 91.0% (colon) AUROC and generalized across scanners and hospitals. By design, the proposed anomaly detection can be expected to detect any pathological alteration in the diagnostic tail of gastrointestinal biopsies, including rare primary or metastatic cancers. This study establishes the first effective clinical application of AI-based anomaly detection in histopathology that can flag anomalous cases, facilitate case prioritization, reduce missed diagnoses and enhance the general safety of AI models, thereby driving AI adoption and automation in routine diagnostics and beyond.

Updated: 2024-06-21 04:59:19

Categories: cs.AI,eess.IV

Download: http://arxiv.org/abs/2406.14866v1

REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability

Understanding the agent's learning process, particularly the factors that contribute to its success or failure post-training, is crucial for comprehending the rationale behind the agent's decision-making process. Prior methods clarify the learning process by creating a structural causal model (SCM) or visually representing the distribution of value functions. Nevertheless, these approaches are limited because they function only in 2D environments or with uncomplicated transition dynamics. Understanding the agent's learning process in complicated environments or tasks is more challenging. In this paper, we propose REVEAL-IT, a novel framework for explaining the learning process of an agent in complex environments. Initially, we visualize the policy structure and the agent's learning process for various training tasks. By visualizing these findings, we can understand how much a particular training task or stage affects the agent's performance at test time. Then, a GNN-based explainer learns to highlight the most important section of the policy, providing a clearer and more robust explanation of the agent's learning process. The experiments demonstrate that explanations derived from this framework can effectively help in the optimization of the

Updated: 2024-06-21 04:58:39

Categories: cs.AI

Download: http://arxiv.org/abs/2406.14214v2

A review of feature selection strategies utilizing graph data structures and knowledge graphs

Feature selection in Knowledge Graphs (KGs) is increasingly utilized in diverse domains, including biomedical research, Natural Language Processing (NLP), and personalized recommendation systems. This paper delves into the methodologies for feature selection within KGs, emphasizing their roles in enhancing machine learning (ML) model efficacy, hypothesis generation, and interpretability. Through this comprehensive review, we aim to catalyze further innovation in feature selection for KGs, paving the way for more insightful, efficient, and interpretable analytical models across various domains. Our exploration reveals the critical importance of scalability, accuracy, and interpretability in feature selection techniques, advocating for the integration of domain knowledge to refine the selection process. We highlight the burgeoning potential of multi-objective optimization and interdisciplinary collaboration in advancing KG feature selection, underscoring the transformative impact of such methodologies on precision medicine, among other fields. The paper concludes by charting future directions, including the development of scalable, dynamic feature selection algorithms and the integration of explainable AI principles to foster transparency and trust in KG-driven models.

Updated: 2024-06-21 04:50:02

Categories: cs.LG,stat.AP,stat.ML

Download: http://arxiv.org/abs/2406.14864v1

Older and Wiser: The Marriage of Device Aging and Intellectual Property Protection of Deep Neural Networks

Deep neural networks (DNNs), such as the widely-used GPT-3 with billions of parameters, are often kept secret due to high training costs and privacy concerns surrounding the data used to train them. Previous approaches to securing DNNs typically require expensive circuit redesign, resulting in additional overheads such as increased area, energy consumption, and latency. To address these issues, we propose a novel hardware-software co-design approach for DNN intellectual property (IP) protection that capitalizes on the inherent aging characteristics of circuits and a novel differential orientation fine-tuning (DOFT) to ensure effective protection. Hardware-wise, we employ random aging to produce authorized chips. This process circumvents the need for chip redesign, thereby eliminating any additional hardware overhead during the inference procedure of DNNs. Moreover, the authorized chips demonstrate a considerable disparity in DNN inference performance when compared to unauthorized chips. Software-wise, we propose a novel DOFT, which allows pre-trained DNNs to maintain their original accuracy on authorized chips with minimal fine-tuning, while the model's performance on unauthorized chips is reduced to random guessing. Extensive experiments on various models, including MLP, VGG, ResNet, Mixer, and SwinTransformer, with lightweight binary and practical multi-bit weights demonstrate that the proposed method achieves effective IP protection, with only 10% accuracy on unauthorized chips, while preserving nearly the original accuracy on authorized ones.

Updated: 2024-06-21 04:49:17

Categories: cs.CR,cs.AR

Download: http://arxiv.org/abs/2406.14863v1

DiffTOP: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning

This paper introduces DiffTOP, which utilizes Differentiable Trajectory OPtimization as the policy representation to generate actions for deep reinforcement and imitation learning. Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost and a dynamics function. The key to our approach is to leverage the recent progress in differentiable trajectory optimization, which enables computing the gradients of the loss with respect to the parameters of trajectory optimization. As a result, the cost and dynamics functions of trajectory optimization can be learned end-to-end. DiffTOP addresses the "objective mismatch" issue of prior model-based RL algorithms, as the dynamics model in DiffTOP is learned to directly maximize task performance by differentiating the policy gradient loss through the trajectory optimization process. We further benchmark DiffTOP for imitation learning on standard robotic manipulation task suites with high-dimensional sensory observations and compare our method to feed-forward policy classes as well as Energy-Based Models (EBM) and Diffusion. Across 15 model-based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTOP outperforms prior state-of-the-art methods in both domains.

Updated: 2024-06-21 04:46:15

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2402.05421v2

LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multi-modal Foundation Models

Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces LatentExplainer, a framework for automatically generating semantically meaningful explanations of latent variables in deep generative models. LatentExplainer tackles three main challenges: inferring the meaning of latent variables, aligning explanations with inductive biases, and handling varying degrees of explainability. By perturbing latent variables and interpreting changes in generated data, the framework provides a systematic approach to understanding and controlling the data generation process, enhancing the transparency and interpretability of deep generative models. We evaluate our proposed method on several real-world and synthetic datasets, and the results demonstrate superior performance in generating high-quality explanations of latent variables.

Updated: 2024-06-21 04:39:03

Categories: cs.LG,cs.CL,cs.CV

Download: http://arxiv.org/abs/2406.14862v1

From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking

The rapid development of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has exposed vulnerabilities to various adversarial attacks. This paper provides a comprehensive overview of jailbreaking research targeting both LLMs and MLLMs, highlighting recent advancements in evaluation benchmarks, attack techniques and defense strategies. Compared to the more advanced state of unimodal jailbreaking, the multimodal domain remains underexplored. We summarize the limitations and potential research directions of multimodal jailbreaking, aiming to inspire future research and further enhance the robustness and security of MLLMs.

Updated: 2024-06-21 04:33:48

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.14859v1

Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles

A popular approach for solving zero-sum games is to maintain populations of policies to approximate the Nash Equilibrium (NE). Previous studies have shown that the Policy Space Response Oracle (PSRO) algorithm is an effective multi-agent reinforcement learning framework for solving such games. However, repeatedly training new policies from scratch to approximate the Best Response (BR) to opponents' mixed policies at each iteration is both inefficient and costly. While some PSRO variants initialize a new policy by inheriting from past BR policies, this approach limits the exploration of new policies, especially against challenging opponents. To address this issue, we propose Fusion-PSRO, which employs policy fusion to initialize policies for better approximation to BR. By selecting high-quality base policies from meta-NE, policy fusion fuses the base policies into a new policy through model averaging. This approach allows the initialized policies to incorporate multiple expert policies, making it easier to handle difficult opponents compared to inheriting from past BR policies or initializing from scratch. Moreover, our method only modifies the policy initialization phase, allowing its application to nearly all PSRO variants without additional training overhead. Our experiments on non-transitive matrix games, Leduc Poker, and the more complex Liar's Dice demonstrate that Fusion-PSRO enhances the performance of nearly all PSRO variants, achieving lower exploitability.
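The fusion step described in this abstract can be pictured with a small sketch. The abstract only states that high-quality base policies are selected from the meta-NE and fused through model averaging; the selection rule (top-k by meta-NE probability), the weighting (renormalized meta-NE probabilities), and the names `fuse_policies`, `meta_ne_probs` and `top_k` are all assumptions made here for illustration, not the authors' implementation.

```python
from typing import Dict, List

# A policy is represented here as a mapping from parameter name to a flat
# list of weights. This representation is a simplification for the sketch.
Params = Dict[str, List[float]]


def fuse_policies(base_policies: List[Params],
                  meta_ne_probs: List[float],
                  top_k: int = 2) -> Params:
    """Average the parameters of the top_k policies, weighted by meta-NE mass.

    Assumption: "high-quality" is approximated by a high meta-NE probability.
    """
    if len(base_policies) != len(meta_ne_probs):
        raise ValueError("one meta-NE probability is needed per policy")

    # Pick the indices of the top_k policies by meta-NE probability.
    ranked = sorted(range(len(base_policies)),
                    key=lambda i: meta_ne_probs[i],
                    reverse=True)[:top_k]
    total = sum(meta_ne_probs[i] for i in ranked)
    if total <= 0:
        raise ValueError("selected policies carry no probability mass")

    fused: Params = {}
    reference = base_policies[ranked[0]]
    for name, values in reference.items():
        # Weighted element-wise average of each parameter vector.
        fused[name] = [
            sum(meta_ne_probs[i] / total * base_policies[i][name][j]
                for i in ranked)
            for j in range(len(values))
        ]
    return fused


policies = [{"w": [0.0, 2.0]}, {"w": [4.0, 6.0]}, {"w": [100.0, 100.0]}]
fused = fuse_policies(policies, meta_ne_probs=[0.25, 0.75, 0.0], top_k=2)
print(fused)
```

The fused parameters would then serve only as an initialization; training the best response proceeds as in the underlying PSRO variant.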

Updated: 2024-06-21 04:28:53

Categories: cs.GT,cs.AI,cs.LG,cs.MA

Download: http://arxiv.org/abs/2405.21027v4

Concept Prerequisite Relation Prediction by Using Permutation-Equivariant Directed Graph Neural Networks

This paper studies the problem of CPRP, concept prerequisite relation prediction, which is a fundamental task in using AI for education. CPRP is usually formulated as a link-prediction task on a relationship graph of concepts and solved by training a graph neural network (GNN) model. However, current directed GNNs fail to manage graph isomorphism, which refers to the invariance of non-isomorphic graphs, reducing the expressivity of the resulting representations. We present a permutation-equivariant directed GNN model by introducing the Weisfeiler-Lehman test into directed GNN learning. Our method is then used for CPRP and evaluated on three public datasets. The experimental results show that our model delivers better prediction performance than the state-of-the-art methods.

Updated: 2024-06-21 04:12:56

Categories: cs.LG,cs.AI,68T07,I.2.6

Download: http://arxiv.org/abs/2312.09802v2

Composite Concept Extraction through Backdooring

Learning composite concepts, such as "red car", from individual examples -- like a white car representing the concept of "car" and a red strawberry representing the concept of "red" -- is inherently challenging. This paper introduces a novel method called Composite Concept Extractor (CoCE), which leverages techniques from traditional backdoor attacks to learn these composite concepts in a zero-shot setting, requiring only examples of individual concepts. By repurposing the trigger-based model backdooring mechanism, we create a strategic distortion in the manifold of the target object (e.g., "car") induced by example objects with the target property (e.g., "red") from objects "red strawberry", ensuring the distortion selectively affects the target objects with the target property. Contrastive learning is then employed to further refine this distortion, and a method is formulated for detecting objects that are influenced by the distortion. Extensive experiments with in-depth analysis across different datasets demonstrate the utility and applicability of our proposed approach.

Updated: 2024-06-21 04:11:33

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.13411v2

Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

To build safe and reliable graph machine learning systems, unsupervised graph-level anomaly detection (GLAD) and unsupervised graph-level out-of-distribution (OOD) detection (GLOD) have received significant attention in recent years. Though those two lines of research indeed share the same objective, they have been studied independently in the community due to distinct evaluation setups, creating a gap that hinders the application and evaluation of methods from one to the other. To bridge the gap, in this work, we present a Unified Benchmark for unsupervised Graph-level OOD and anomaly Detection (UB-GOLD), a comprehensive evaluation framework that unifies GLAD and GLOD under the concept of generalized graph-level OOD detection. Our benchmark encompasses 35 datasets spanning four practical anomaly and OOD detection scenarios, facilitating the comparison of 16 representative GLAD/GLOD methods. We conduct multi-dimensional analyses to explore the effectiveness, generalizability, robustness, and efficiency of existing methods, shedding light on their strengths and limitations. Furthermore, we provide an open-source codebase (https://github.com/UB-GOLD/UB-GOLD) of UB-GOLD to foster reproducible research and outline potential directions for future investigations based on our insights.

Updated: 2024-06-21 04:07:43

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.15523v1

Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis

Limited access to neurological care leads to missed diagnoses of Parkinson's disease (PD), leaving many individuals unidentified and untreated. We trained a novel neural network-based fusion architecture to detect Parkinson's disease (PD) by analyzing features extracted from webcam recordings of three tasks: finger tapping, facial expression (smiling), and speech (uttering a sentence containing all letters of the alphabet). Additionally, the model incorporated Monte Carlo Dropout to improve prediction accuracy by considering uncertainties. The study participants (n = 845, 272 with PD) were randomly split into three sets: 60% for training, 20% for model selection (hyper-parameter tuning), and 20% for final performance evaluation. The dataset consists of 1102 sessions, each session containing videos of all three tasks. Our proposed model achieved significantly better accuracy, area under the ROC curve (AUROC), and sensitivity at non-inferior specificity compared to any single-task model. Withholding uncertain predictions further boosted the performance, achieving 88.0% (95% CI: 87.7% - 88.4%) accuracy, 93.0% (92.8% - 93.2%) AUROC, 79.3% (78.4% - 80.2%) sensitivity, and 92.6% (92.3% - 92.8%) specificity, at the expense of not being able to predict for 2.3% (2.0% - 2.6%) data. Further analysis suggests that the trained model does not exhibit any detectable bias across sex and ethnic subgroups and is most effective for individuals aged between 50 and 80. This accessible, low-cost approach requiring only an internet-enabled device with a webcam and microphone paves the way for convenient PD screening at home, particularly in regions with limited access to clinical specialists.
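The idea of withholding uncertain predictions can be sketched as follows. The abstract says only that Monte Carlo Dropout is used to account for uncertainty and that uncertain predictions are withheld; the uncertainty measure (standard deviation over stochastic forward passes), the cut-off `max_std`, the decision threshold, and the function name `selective_predict` are assumptions for illustration, not the study's actual procedure.

```python
import statistics
from typing import List, Optional


def selective_predict(mc_probs: List[float],
                      max_std: float = 0.1,
                      threshold: float = 0.5) -> Optional[int]:
    """Return 1 (PD) or 0 (non-PD), or None when the model is too uncertain.

    mc_probs holds the predicted PD probability from several forward passes
    of the same input with dropout kept active (Monte Carlo Dropout).
    """
    mean_prob = statistics.fmean(mc_probs)
    spread = statistics.pstdev(mc_probs)  # disagreement between passes

    # Assumption: a large spread across passes means "do not predict".
    if spread > max_std:
        return None
    return int(mean_prob >= threshold)


print(selective_predict([0.90, 0.92, 0.88]))  # consistent passes
print(selective_predict([0.10, 0.90, 0.50]))  # passes disagree strongly
```

Raising `max_std` yields predictions for more sessions at the cost of including less certain ones, which mirrors the trade-off the abstract reports between performance and the share of data left unpredicted.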

Updated: 2024-06-21 04:02:19

Categories: cs.CV,cs.HC,cs.LG

Download: http://arxiv.org/abs/2406.14856v1

Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations

Models trained on a labeled source domain (e.g., labeled images from wildlife camera traps) often generalize poorly when deployed on an out-of-distribution (OOD) target domain (e.g., images from new camera trap locations). In the domain adaptation setting where unlabeled target data is available, self-supervised pretraining (e.g., masked autoencoding or contrastive learning) is a promising method to mitigate this performance drop. Pretraining improves OOD error when the generic data augmentations used (e.g., masking or cropping) connect the source and target domains, which may be far apart in the input space. In this paper, we show on real-world tasks that standard fine-tuning after pretraining does not consistently improve OOD error over simply training from scratch on labeled source data. To better leverage pretraining for distribution shifts, we propose Connect Later: after pretraining with generic augmentations, fine-tune with targeted augmentations designed with knowledge of the distribution shift. Pretraining learns good representations within the source and target domains, while targeted augmentations connect the domains better during fine-tuning. Connect Later improves average OOD error over standard fine-tuning and supervised learning with targeted augmentations on 4 real-world datasets: Connect Later achieves the state-of-the-art on astronomical time-series classification (AstroClassification) by 2.5%, wildlife species identification (iWildCam-WILDS) with ResNet-50 by 0.9%, and tumor identification (Camelyon17-WILDS) with DenseNet121 by 1.1%; as well as best performance on a new dataset for astronomical time-series redshift prediction (Redshifts) by 0.03 RMSE (11% relative). Code and datasets are available at https://github.com/helenqu/connect-later.

Updated: 2024-06-21 04:01:26

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2402.03325v2

Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models

Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, the advancement of T2I diffusion models presents significant risks, as the models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate contexts. To mitigate these risks, concept removal methods have been proposed. These methods aim to modify diffusion models to prevent the generation of malicious and unwanted concepts. Despite these efforts, existing research faces several challenges: (1) a lack of consistent comparisons on a comprehensive dataset, (2) ineffective prompts in harmful and nudity concepts, (3) overlooked evaluation of the ability to generate the benign part within prompts containing malicious concepts. To address these gaps, we propose to benchmark the concept removal methods by introducing a new dataset, Six-CD, along with a novel evaluation metric. In this benchmark, we conduct a thorough evaluation of concept removals, with the experimental observations and discussions offering valuable insights in the field.

Updated: 2024-06-21 03:58:44

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2406.14855v1

PEANO-ViT: Power-Efficient Approximations of Non-Linearities in Vision Transformers

The deployment of Vision Transformers (ViTs) on hardware platforms, especially Field-Programmable Gate Arrays (FPGAs), presents many challenges, which are mainly due to the substantial computational and power requirements of their non-linear functions, notably layer normalization, softmax, and Gaussian Error Linear Unit (GELU). These critical functions pose significant obstacles to efficient hardware implementation due to their complex mathematical operations and the inherent resource count and architectural limitations of FPGAs. PEANO-ViT offers a novel approach to streamlining the implementation of the layer normalization layer by introducing a division-free technique that simultaneously approximates the division and square root function. Additionally, PEANO-ViT provides a multi-scale division strategy to eliminate division operations in the softmax layer, aided by a Padé-based approximation for the exponential function. Finally, PEANO-ViT introduces a piece-wise linear approximation for the GELU function, carefully designed to bypass the computationally intensive operations associated with GELU. In our comprehensive evaluations, PEANO-ViT exhibits minimal accuracy degradation (<= 0.5% for DeiT-B) while significantly enhancing power efficiency, achieving improvements of 1.91x, 1.39x, 8.01x for layer normalization, softmax, and GELU, respectively. This improvement is achieved through substantial reductions in DSP, LUT, and register counts for these non-linear operations. Consequently, PEANO-ViT enables efficient deployment of Vision Transformers on resource- and power-constrained FPGA platforms.
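
To make the idea of a piece-wise linear GELU concrete, here is a minimal Python sketch. It is not PEANO-ViT's design: the abstract does not give the segment boundaries, so the uniform knots on [-4, 4] with spacing 0.5, the clamped tails, and all function names below are assumptions chosen only for illustration.

```python
import math
from bisect import bisect_right


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Assumed breakpoints (not from the paper): 17 uniform knots on [-4, 4].
KNOTS = [-4.0 + 0.5 * i for i in range(17)]
VALUES = [gelu(k) for k in KNOTS]  # table that hardware would store


def gelu_pwl(x: float) -> float:
    """Piece-wise linear GELU: table lookup plus one multiply-add."""
    if x <= KNOTS[0]:
        return 0.0  # left tail: GELU(x) is close to 0
    if x >= KNOTS[-1]:
        return x  # right tail: GELU(x) is close to x
    i = bisect_right(KNOTS, x) - 1  # index of the segment containing x
    x0, x1 = KNOTS[i], KNOTS[i + 1]
    t = (x - x0) / (x1 - x0)
    return VALUES[i] + t * (VALUES[i + 1] - VALUES[i])


def max_abs_error(lo: float = -6.0, hi: float = 6.0, steps: int = 2401) -> float:
    """Largest absolute gap between exact and approximate GELU on a grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(gelu(x) - gelu_pwl(x)) for x in grid)
```

With this assumed knot spacing, the interpolation error is bounded by h^2/8 times the largest second derivative of GELU, which works out to roughly 0.025; a finer or non-uniform grid would trade table size for accuracy.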

Updated: 2024-06-21 03:54:10

Categories: cs.CV,cs.AI,eess.IV

Download: http://arxiv.org/abs/2406.14854v1

Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models

Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance across a wide range of tasks and domains. Despite this promise, spatial understanding and reasoning -- a fundamental component of human cognition -- remains under-explored. We develop novel benchmarks that cover diverse aspects of spatial reasoning such as relationship understanding, navigation, and counting. We conduct a comprehensive evaluation of competitive language and vision-language models. Our findings reveal several counter-intuitive insights that have been overlooked in the literature: (1) Spatial reasoning poses significant challenges where competitive models can fall behind random guessing; (2) Despite additional visual input, VLMs often under-perform compared to their LLM counterparts; (3) When both textual and visual information is available, multi-modal language models become less reliant on visual information if sufficient textual clues are provided. Additionally, we demonstrate that leveraging redundancy between vision and text can significantly enhance model performance. We hope our study will inform the development of multimodal models to improve spatial intelligence and further close the gap with human intelligence.

Updated: 2024-06-21 03:53:37

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.14852v1

Graph Edge Representation via Tensor Product Graph Convolutional Representation

Graph Convolutional Networks (GCNs) have been widely studied. The core of GCNs is the definition of convolution operators on graphs. However, existing Graph Convolution (GC) operators are mainly defined on adjacency matrix and node features and generally focus on obtaining effective node embeddings which cannot be utilized to address the graphs with (high-dimensional) edge features. To address this problem, by leveraging tensor contraction representation and tensor product graph diffusion theories, this paper analogously defines an effective convolution operator on graphs with edge features which is named as Tensor Product Graph Convolution (TPGC). The proposed TPGC aims to obtain effective edge embeddings. It provides a complementary model to traditional graph convolutions (GCs) to address the more general graph data analysis with both node and edge features. Experimental results on several graph learning tasks demonstrate the effectiveness of the proposed TPGC.

Updated: 2024-06-21 03:21:26

Categories: cs.LG

Download: http://arxiv.org/abs/2406.14846v1

SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering

The Audio-Visual Question Answering (AVQA) task holds significant potential for applications. Compared to traditional unimodal approaches, the multi-modal input of AVQA makes feature extraction and fusion processes more challenging. Euclidean space struggles to represent multi-dimensional relationships in data effectively. In particular, when extracting and processing data with a tree or hierarchical structure, Euclidean space is not suitable as an embedding space. Additionally, the self-attention mechanism in Transformers is effective in capturing the dynamic relationships between elements in a sequence. However, the self-attention mechanism's limitations in window modeling and quadratic computational complexity reduce its effectiveness in modeling long sequences. To address these limitations, we propose SHMamba: Structured Hyperbolic State Space Model to integrate the advantages of hyperbolic geometry and state space models. Specifically, SHMamba leverages the intrinsic properties of hyperbolic space to represent hierarchical structures and complex relationships in audio-visual data. Meanwhile, the state space model captures dynamic changes over time by globally modeling the entire sequence. Furthermore, we introduce an adaptive curvature hyperbolic alignment module and a cross fusion block to enhance the understanding of hierarchical structures and the dynamic exchange of cross-modal information, respectively. Extensive experiments demonstrate that SHMamba outperforms previous methods with fewer parameters and computational costs. Our learnable parameters are reduced by 78.12%, while the average performance improves by 2.53%. Experiments show that our method demonstrates superiority among all current major methods and is more suitable for practical application scenarios.

Updated: 2024-06-21 03:13:45

Categories: cs.AI,cs.MM,cs.SD,eess.AS

Download: http://arxiv.org/abs/2406.09833v2

DN-CL: Deep Symbolic Regression against Noise via Contrastive Learning

Noise ubiquitously exists in signals due to numerous factors including physical, electronic, and environmental effects. Traditional methods of symbolic regression, such as genetic programming or deep learning models, aim to find the most fitting expressions for these signals. However, these methods often overlook the noise present in real-world data, leading to reduced fitting accuracy. To tackle this issue, we propose Deep Symbolic Regression against Noise via Contrastive Learning (DN-CL). DN-CL employs two parameter-sharing encoders to embed data points from various data transformations into feature shields against noise. This model treats noisy data and clean data as different views of the ground-truth mathematical expressions. Distances between these features are minimized, utilizing contrastive learning to distinguish between 'positive' noise-corrected pairs and 'negative' contrasting pairs. Our experiments indicate that DN-CL demonstrates superior performance in handling both noisy and clean data, presenting a promising method of symbolic regression.

Updated: 2024-06-21 03:13:40

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.14844v1

Image anomaly detection and prediction scheme based on SSA optimized ResNet50-BiGRU model

Image anomaly detection is a popular research direction, with many methods emerging in recent years due to rapid advancements in computing. The use of artificial intelligence for image anomaly detection has been widely studied. By analyzing images of athlete posture and movement, it is possible to predict injury status and suggest necessary adjustments. Most existing methods rely on convolutional networks to extract information from irrelevant pixel data, limiting model accuracy. This paper introduces a network combining Residual Network (ResNet) and Bidirectional Gated Recurrent Unit (BiGRU), which can predict potential injury types and provide early warnings by analyzing changes in muscle and bone poses from video images. To address the high complexity of this network, the Sparrow search algorithm was used for optimization. Experiments conducted on four datasets demonstrated that our model has the smallest error in image anomaly detection compared to other models, showing strong adaptability. This provides a new approach for anomaly detection and predictive analysis in images, contributing to the sustainable development of human health and performance.

Updated: 2024-06-21 03:11:38

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.13987v2

Optimization Techniques for Unsupervised Complex Table Reasoning via Self-Training Framework

Structured tabular data is a fundamental data type in numerous fields, and the capacity to reason over tables is crucial for answering questions and validating hypotheses. However, constructing labeled data for complex reasoning tasks is labor-intensive, and the quantity of annotated data remains insufficient to support the intricate demands of real-world applications. To address the insufficient annotation challenge, we present a self-training framework for unsupervised complex tabular reasoning (UCTR-ST) by generating diverse synthetic data with complex logic. Specifically, UCTR-ST incorporates several essential techniques: we aggregate diverse programs and execute them on tables based on a "Program-Management" component, and we bridge the gap between programs and text with a powerful "Program-Transformation" module that generates natural language sentences with complex logic. Furthermore, we optimize the procedure using a "Table-Text Manipulator" to handle joint table-text reasoning scenarios. The entire framework utilizes self-training techniques to leverage the unlabeled training data, which results in significant performance improvements when tested on real-world data. Experimental results demonstrate that UCTR-ST achieves above 90% of the supervised model performance on different tasks and domains, reducing the dependence on manual annotation. Additionally, our approach can serve as a data augmentation technique, significantly boosting the performance of supervised models in low-resourced domains.

Updated: 2024-06-21 03:06:36

Categories: cs.CL,cs.AI,cs.DB

Download: http://arxiv.org/abs/2212.10097v2

TabularMark: Watermarking Tabular Datasets for Machine Learning

Watermarking is broadly utilized to protect ownership of shared data while preserving data utility. However, existing watermarking methods for tabular datasets fall short on the desired properties (detectability, non-intrusiveness, and robustness) and only preserve data utility from the perspective of data statistics, ignoring the performance of downstream ML models trained on the datasets. Can we watermark tabular datasets without significantly compromising their utility for training ML models while preventing attackers from training usable ML models on attacked datasets? In this paper, we propose a hypothesis testing-based watermarking scheme, TabularMark. Data noise partitioning is utilized for data perturbation during embedding, which is adaptable for numerical and categorical attributes while preserving the data utility. For detection, a custom-threshold one proportion z-test is employed, which can reliably determine the presence of the watermark. Experiments on real-world and synthetic datasets demonstrate the superiority of TabularMark in detectability, non-intrusiveness, and robustness.
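
The detection step mentioned above rests on a one-proportion z-test, which can be sketched in a few lines of Python. This is only an illustration of the statistical test, not TabularMark's detector: the notion of a "hit" (a checked cell whose deviation lands in the watermark's chosen domain), the null proportion `p0 = 0.5`, and the threshold `z_threshold = 3.0` are assumptions introduced here.

```python
import math


def one_proportion_z(hits: int, n: int, p0: float = 0.5) -> float:
    """z-statistic for observing `hits` successes in `n` trials under H0: p = p0."""
    if n <= 0 or not 0.0 < p0 < 1.0:
        raise ValueError("need n > 0 and 0 < p0 < 1")
    return (hits - n * p0) / math.sqrt(n * p0 * (1.0 - p0))


def watermark_detected(hits: int, n: int, p0: float = 0.5,
                       z_threshold: float = 3.0) -> bool:
    """Reject H0 (no watermark) when the hit rate is far above chance.

    z_threshold is a hypothetical custom threshold; a larger value
    lowers the false-positive rate at the cost of sensitivity.
    """
    return one_proportion_z(hits, n, p0) > z_threshold
```

For example, 80 hits out of 100 checked cells gives z = (80 - 50) / 5 = 6, well above the assumed threshold, whereas 52 hits gives z = 0.4 and no detection.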

Updated: 2024-06-21 02:58:45

Categories: cs.CR,cs.DB,cs.LG

Download: http://arxiv.org/abs/2406.14841v1

FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

Formal verification (FV) has witnessed growing significance with current emerging program synthesis by the evolving large language models (LLMs). However, current formal verification mainly resorts to symbolic verifiers or hand-crafted rules, resulting in limitations for extensive and flexible verification. On the other hand, formal languages for automated theorem proving, such as Isabelle, as another line of rigorous verification, are maintained with comprehensive rules and theorems. In this paper, we propose FVEL, an interactive Formal Verification Environment with LLMs. Specifically, FVEL transforms a given code to be verified into Isabelle, and then conducts verification via neural automated theorem proving with an LLM. The joint paradigm leverages the rigorous yet abundant formulated and organized rules in Isabelle and is also convenient for introducing and adjusting cutting-edge LLMs. To achieve this goal, we extract a large-scale dataset, FVELER. The FVELER dataset includes code dependencies and verification processes that are formulated in Isabelle, containing 758 theories, 29,125 lemmas, and 200,646 proof steps in total with in-depth dependencies. We benchmark FVELER in the FVEL environment by first fine-tuning LLMs with FVELER and then evaluating them on Code2Inv and SV-COMP. The results show that FVEL with FVELER fine-tuned Llama3-8B solves 17.39% (69 -> 81) more problems, and Mistral-7B 12% (75 -> 84) more problems in SV-COMP. The proportion of proof errors is also reduced. Project page: https://fveler.github.io/.
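
The reported percentages are relative gains over the number of problems solved before fine-tuning, which a short Python check makes explicit. The function name is hypothetical; the problem counts are the ones quoted in the abstract.

```python
def relative_gain_percent(before: int, after: int) -> float:
    """Relative increase from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return 100.0 * (after - before) / before


# Counts quoted in the abstract for SV-COMP.
llama_gain = relative_gain_percent(69, 81)    # 12 extra problems over 69
mistral_gain = relative_gain_percent(75, 84)  # 9 extra problems over 75
```

In absolute terms the gains are 12 and 9 additional problems solved, respectively.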

Updated: 2024-06-21 02:51:41

Categories: cs.AI,cs.CL,cs.LG,cs.MS

Download: http://arxiv.org/abs/2406.14408v2

Automated architectural space layout planning using a physics-inspired generative design framework

The determination of space layout is one of the primary activities in the schematic design stage of an architectural project. The initial layout planning defines the shape, dimension, and circulation pattern of internal spaces, which can also affect the performance and cost of the construction. When carried out manually, space layout planning can be complicated, repetitive and time-consuming. In this work, a generative design framework for the automatic generation of spatial architectural layout has been developed. The proposed approach integrates a novel physics-inspired parametric model for space layout planning and an evolutionary optimisation metaheuristic. Results revealed that such a generative design framework can generate a wide variety of design suggestions at the schematic design stage, applicable to complex design problems.

Updated: 2024-06-21 02:50:52

Categories: cs.AI

Download: http://arxiv.org/abs/2406.14840v1

QxEAI: Quantum-like evolutionary algorithm for automated probabilistic forecasting

Forecasting, the estimation of future events, is crucial for business and decision-making. This paper proposes QxEAI, a methodology that produces a probabilistic forecast that utilizes a quantum-like evolutionary algorithm based on training a quantum-like logic decision tree and a classical value tree on a small number of related time series. We demonstrate how the application of our quantum-like evolutionary algorithm to forecasting can overcome the challenges faced by classical and other machine learning approaches. By using three real-world datasets (Dow Jones Index, retail sales, gas consumption), we show how our methodology produces accurate forecasts while requiring little to no manual work.

Updated: 2024-06-21 02:45:04

Categories: physics.soc-ph,cs.AI,cs.LG,cs.NE,econ.GN,q-fin.EC

Download: http://arxiv.org/abs/2405.03701v2

Bayesian neural networks for predicting uncertainty in full-field material response

Stress and material deformation field predictions are among the most important tasks in computational mechanics. These predictions are typically made by solving the governing equations of continuum mechanics using finite element analysis, which can become computationally prohibitive considering complex microstructures and material behaviors. Machine learning (ML) methods offer potentially cost effective surrogates for these applications. However, existing ML surrogates are either limited to low-dimensional problems and/or do not provide uncertainty estimates in the predictions. This work proposes an ML surrogate framework for stress field prediction and uncertainty quantification for diverse materials microstructures. A modified Bayesian U-net architecture is employed to provide a data-driven image-to-image mapping from initial microstructure to stress field with prediction (epistemic) uncertainty estimates. The Bayesian posterior distributions for the U-net parameters are estimated using three state-of-the-art inference algorithms: the posterior sampling-based Hamiltonian Monte Carlo method and two variational approaches, the Monte-Carlo Dropout method and the Bayes by Backprop algorithm. A systematic comparison of the predictive accuracy and uncertainty estimates for these methods is performed for a fiber reinforced composite material and polycrystalline microstructure application. It is shown that the proposed methods yield predictions of high accuracy compared to the FEA solution, while uncertainty estimates depend on the inference approach. Generally, the Hamiltonian Monte Carlo and Bayes by Backprop methods provide consistent uncertainty estimates. Uncertainty estimates from Monte Carlo Dropout, on the other hand, are more difficult to interpret and depend strongly on the method's design.

Updated: 2024-06-21 02:43:25

Categories: stat.ML,cond-mat.mtrl-sci,cs.LG,stat.AP

Download: http://arxiv.org/abs/2406.14838v1

Hierarchical Path-planning from Speech Instructions with Spatial Concept-based Topometric Semantic Mapping

Assisting individuals in their daily activities through autonomous mobile robots, especially for users without specialized knowledge, is crucial. Specifically, the capability of robots to navigate to destinations based on human speech instructions is essential. While robots can take different paths to the same goal, the shortest path is not always the best. A preferred approach is to accommodate waypoint specifications flexibly, planning an improved alternative path, even with detours. Additionally, robots require real-time inference capabilities. This study aimed to realize a hierarchical spatial representation using a topometric semantic map and path planning with speech instructions, including waypoints. This paper presents Spatial Concept-based Topometric Semantic Mapping for Hierarchical Path Planning (SpCoTMHP), integrating place connectivity. This approach offers a novel integrated probabilistic generative model and fast approximate inference across hierarchy levels. A formulation based on control as probabilistic inference theoretically supports the proposed path planning algorithm. We conducted experiments in home environments using the Toyota Human Support Robot on the SIGVerse simulator and in a lab-office environment with the real robot, Albert. Users issued speech commands specifying the waypoint and goal, such as "Go to the bedroom via the corridor." Navigation experiments using speech instructions with a waypoint demonstrated a performance improvement of SpCoTMHP over the baseline hierarchical path planning method with heuristic path costs (HPP-I), in terms of the weighted success rate at which the robot reaches the closest target and passes the correct waypoints, by 0.590. The computation time was significantly accelerated by 7.14 seconds with SpCoTMHP compared to baseline HPP-I in advanced tasks.

Updated: 2024-06-21 02:41:16

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2203.10820v3

SPL: A Socratic Playground for Learning Powered by Large Language Model

Dialogue-based Intelligent Tutoring Systems (ITSs) have significantly advanced adaptive and personalized learning by automating sophisticated human tutoring strategies within interactive dialogues. However, replicating the nuanced patterns of expert human communication remains a challenge in Natural Language Processing (NLP). Recent advancements in NLP, particularly Large Language Models (LLMs) such as OpenAI's GPT-4, offer promising solutions by providing human-like and context-aware responses based on extensive pre-trained knowledge. Motivated by the effectiveness of LLMs in various educational tasks (e.g., content creation and summarization, problem-solving, and automated feedback provision), our study introduces the Socratic Playground for Learning (SPL), a dialogue-based ITS powered by the GPT-4 model, which employs the Socratic teaching method to foster critical thinking among learners. Through extensive prompt engineering, SPL can generate specific learning scenarios and facilitate efficient multi-turn tutoring dialogues. The SPL system aims to enhance personalized and adaptive learning experiences tailored to individual needs, specifically focusing on improving critical thinking skills. Our pilot experimental results from essay writing tasks demonstrate SPL has the potential to improve tutoring interactions and further enhance dialogue-based ITS functionalities. Our study, exemplified by SPL, demonstrates how LLMs enhance dialogue-based ITSs and expand the accessibility and efficacy of educational technologies.

Updated: 2024-06-21 02:36:10

Categories: cs.AI

Download: http://arxiv.org/abs/2406.13919v2

ToVo: Toxicity Taxonomy via Voting

Existing toxic detection models face significant limitations, such as lack of transparency, customization, and reproducibility. These challenges stem from the closed-source nature of their training data and the paucity of explanations for their evaluation mechanism. To address these issues, we propose a dataset creation mechanism that integrates voting and chain-of-thought processes, producing a high-quality open-source dataset for toxic content detection. Our methodology ensures diverse classification metrics for each sample and includes both classification scores and explanatory reasoning for the classifications. We utilize the dataset created through our proposed mechanism to train our model, which is then compared against existing widely-used detectors. Our approach not only enhances transparency and customizability but also facilitates better fine-tuning for specific use cases. This work contributes a robust framework for developing toxic content detection models, emphasizing openness and adaptability, thus paving the way for more effective and user-specific content moderation solutions.

Updated: 2024-06-21 02:35:30

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.14835v1

Measuring Sample Importance in Data Pruning for Training LLMs from a Data Compression Perspective

Compute-efficient training of large language models (LLMs) has become an important research problem. In this work, we consider data pruning as a method of data-efficient training of LLMs, where we take a data compression view on data pruning. We argue that the amount of information of a sample, or the achievable compression on its description length, represents its sample importance. The key idea is that less informative samples are likely to contain redundant information and thus should be pruned first. We leverage the log-likelihood function of trained models as a surrogate to measure the information content of samples. Experiments reveal a surprising insight: information-based pruning can enhance the generalization capability of the model, improving upon language modeling and downstream tasks compared to a model trained on the entire dataset.
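
As a rough illustration of information-based pruning, the Python sketch below scores each sample by its description length under a model (the negative log-likelihood, converted to bits) and drops the lowest-scoring samples first. The input format (per-token natural-log probabilities keyed by a sample id), the use of total rather than per-token length, and the function names are assumptions made here; the abstract does not specify the paper's exact scoring rule.

```python
import math
from typing import Dict, List


def description_length_bits(token_logprobs: List[float]) -> float:
    """Code length of a sample in bits: -sum(ln p) / ln 2."""
    return -sum(token_logprobs) / math.log(2.0)


def prune_least_informative(sample_logprobs: Dict[str, List[float]],
                            keep_fraction: float) -> List[str]:
    """Return ids of the samples to keep, most informative first.

    sample_logprobs maps a hypothetical sample id to the per-token
    natural-log probabilities assigned by some already-trained model.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(
        sample_logprobs,
        key=lambda sid: description_length_bits(sample_logprobs[sid]),
        reverse=True,  # longest description (most information) first
    )
    n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    return ranked[:n_keep]
```

A sample whose four tokens each have probability 0.5 costs 4 bits, while one whose tokens each have probability 0.9 costs about 0.6 bits and would be pruned earlier under this ranking.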

Updated: 2024-06-21 02:30:32

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.14124v2

Can Low-Rank Knowledge Distillation in LLMs be Useful for Microelectronic Reasoning?

In this work, we present empirical results regarding the feasibility of using offline large language models (LLMs) in the context of electronic design automation (EDA). The goal is to investigate and evaluate a contemporary language model's (Llama-2-7B) ability to function as a microelectronic Q & A expert as well as its reasoning and generation capabilities in solving microelectronic-related problems. Llama-2-7B was tested across a variety of adaptation methods, including a novel low-rank knowledge distillation (LoRA-KD) scheme. Our experiments produce both qualitative and quantitative results.

Updated: 2024-06-21 02:25:57

Categories: cs.LG

Download: http://arxiv.org/abs/2406.13808v2

Byzantine-Robust Decentralized Federated Learning

Federated learning (FL) enables multiple clients to collaboratively train machine learning models without revealing their private training data. In conventional FL, the system follows the server-assisted architecture (server-assisted FL), where the training process is coordinated by a central server. However, the server-assisted FL framework suffers from poor scalability due to a communication bottleneck at the server, and from trust dependency issues. To address these challenges, the decentralized federated learning (DFL) architecture has been proposed to allow clients to train models collaboratively in a serverless and peer-to-peer manner. However, due to its fully decentralized nature, DFL is highly vulnerable to poisoning attacks, where malicious clients could manipulate the system by sending carefully crafted local models to their neighboring clients. To date, only a limited number of Byzantine-robust DFL methods have been proposed, most of which are either communication-inefficient or remain vulnerable to advanced poisoning attacks. In this paper, we propose a new algorithm called BALANCE (Byzantine-robust averaging through local similarity in decentralization) to defend against poisoning attacks in DFL. In BALANCE, each client leverages its own local model as a similarity reference to determine whether a received model is malicious or benign. We establish the theoretical convergence guarantee for BALANCE under poisoning attacks in both strongly convex and non-convex settings. Furthermore, the convergence rate of BALANCE under poisoning attacks matches that of the state-of-the-art counterparts in Byzantine-free settings. Extensive experiments also demonstrate that BALANCE outperforms existing DFL methods and effectively defends against poisoning attacks.
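
The "own model as similarity reference" idea can be illustrated with a small sketch. The abstract does not state the acceptance rule, so the one below is an assumption: a received model is accepted when its L2 distance to the client's own model is at most `gamma` times the norm of the own model, and the accepted models are then averaged and blended with the own model using weight `alpha`. Both parameters and all function names are hypothetical.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two flat weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l2_norm(a):
    """Euclidean norm of a flat weight vector."""
    return math.sqrt(sum(x * x for x in a))

def filter_and_aggregate(own, received, gamma=0.5, alpha=0.5):
    """Accept neighbor models close to the client's own model, then blend.

    gamma (acceptance radius relative to the own-model norm) and alpha
    (weight kept on the own model) are hypothetical parameters for this sketch.
    """
    accepted = [w for w in received if l2_distance(own, w) <= gamma * l2_norm(own)]
    if not accepted:
        return list(own), accepted  # nothing trustworthy received: keep own model
    mean = [sum(col) / len(accepted) for col in zip(*accepted)]
    blended = [alpha * o + (1 - alpha) * m for o, m in zip(own, mean)]
    return blended, accepted

own = [1.0, 1.0]
neighbors = [[1.2, 0.8], [0.8, 1.2], [10.0, -10.0]]  # the last one mimics a poisoned model
new_model, accepted = filter_and_aggregate(own, neighbors)
```

In this toy run the two nearby neighbor models are accepted and the distant one is filtered out before averaging.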

Updated: 2024-06-21 02:17:50

Categories: cs.CR,cs.DC,cs.LG

Download: http://arxiv.org/abs/2406.10416v3

L-DIT: A dApp for Live Detectability, Identifiability and Trackability for ASOs on the Behavioral Dynamics Blockchain

As the number of Anthropogenic Space Objects (ASOs) grows, there is an urgent need to ensure space safety, security, and sustainability (S3) for long-term space use. Currently, no globally effective method can quantify the safety, security, and sustainability of all ASOs in orbit. Existing methods such as the Space Sustainability Rating (SSR) rely on volunteering private information to provide sustainability ratings. However, the need for such sensitive data might prove to be a barrier to adoption for space entities. For effective comparison of ASOs, the rating mechanism should apply to all ASOs, even retroactively, so that the sustainability of a single ASO can be assessed holistically. Lastly, geopolitical boundaries and alignments play a crucial and limiting role in a volunteered rating system, which constrains space safety, security, and sustainability. This work presents a Live Detectability, Identifiability, and Trackability (L-DIT) score through a distributed app (dApp) built on top of the Behavioral Dynamics blockchain (BDB). The BDB chain is a space situational awareness (SSA) chain that provides verified and cross-checked ASO data from multiple sources. This unique combination of consensus-based information from BDB and permissionless access to data allows the DIT scoring method presented here to be applied to all ASOs. While the underlying BDB chain collects, filters, and validates SSA data from various open (and closed, if available) sources, the L-DIT dApp consumes the data from the chain to provide an L-DIT score that can contribute towards an operator's, manufacturer's, or owner's sustainability practices. Our dApp provides data for all ASOs, allowing their sustainability score to be compared against other ASOs, regardless of geopolitical alignments, providing business value to entities such as space insurance providers and enabling compliance validation and enforcement.

Updated: 2024-06-21 02:02:33

Categories: cs.CR,astro-ph.IM

Download: http://arxiv.org/abs/2404.18350v2

A Comparative Study of Deep Learning and Iterative Algorithms for Joint Channel Estimation and Signal Detection in OFDM Systems

Joint channel estimation and signal detection (JCESD) is crucial in orthogonal frequency division multiplexing (OFDM) systems, but traditional algorithms perform poorly in low signal-to-noise ratio (SNR) scenarios. Deep learning (DL) methods have been investigated, but concerns regarding computational expense and lack of validation in low-SNR settings remain. Hence, the development of a robust and low-complexity model that can deliver excellent performance across a wide range of SNRs is highly desirable. In this paper, we aim to establish a benchmark where traditional algorithms and DL methods are validated on different channel models, Doppler, and SNR settings, particularly focusing on the semi-blind setting. In particular, we propose a new DL model where the backbone network is formed by unrolling the iterative algorithm, and the hyperparameters are estimated by hypernetworks. Additionally, we adapt a lightweight DenseNet to the task of JCESD for comparison. We evaluate different methods in three aspects: generalization in terms of bit error rate (BER), robustness, and complexity. Our results indicate that DL approaches outperform traditional algorithms in the challenging low-SNR setting, while the iterative algorithm performs better in high-SNR settings. Furthermore, the iterative algorithm is more robust in the presence of carrier frequency offset, whereas DL methods excel when signals are corrupted by asymmetric Gaussian noise.

Updated: 2024-06-21 02:02:09

Categories: eess.SP,cs.LG

Download: http://arxiv.org/abs/2303.03678v3

Accelerating Approximate Thompson Sampling with Underdamped Langevin Monte Carlo

Approximate Thompson sampling with Langevin Monte Carlo broadens its reach from Gaussian posterior sampling to encompass more general smooth posteriors. However, it still encounters scalability issues in high-dimensional problems when demanding high accuracy. To address this, we propose an approximate Thompson sampling strategy, utilizing underdamped Langevin Monte Carlo, where the latter is the go-to workhorse for simulations of high-dimensional posteriors. Based on the standard smoothness and log-concavity conditions, we study the accelerated posterior concentration and sampling using a specific potential function. This design improves the sample complexity for realizing logarithmic regrets from $\mathcal{\tilde O}(d)$ to $\mathcal{\tilde O}(\sqrt{d})$. The scalability and robustness of our algorithm are also empirically validated through synthetic experiments in high-dimensional bandit problems.

Updated: 2024-06-21 01:54:15

Categories: stat.ML,cs.AI,cs.LG

Download: http://arxiv.org/abs/2401.11665v3

Self-supervised Brain Lesion Generation for Effective Data Augmentation of Medical Images

Accurate brain lesion delineation is important for planning neurosurgical treatment. Automatic brain lesion segmentation methods based on convolutional neural networks have demonstrated remarkable performance. However, neural network performance is constrained by the lack of large-scale well-annotated training datasets. In this manuscript, we propose a comprehensive framework to efficiently generate new, realistic samples for training a brain lesion segmentation model. We first train a lesion generator, based on an adversarial autoencoder, in a self-supervised manner. Next, we utilize a novel image composition algorithm, Soft Poisson Blending, to seamlessly combine synthetic lesions and brain images to obtain training samples. Finally, to effectively train the brain lesion segmentation model with augmented images we introduce a new prototype consistence regularization to align real and synthetic features. Our framework is validated by extensive experiments on two public brain lesion segmentation datasets: ATLAS v2.0 and Shift MS. Our method outperforms existing brain image data augmentation schemes. For instance, our method improves the Dice from 50.36% to 60.23% compared to the U-Net with conventional data augmentation techniques for the ATLAS v2.0 dataset.

Updated: 2024-06-21 01:53:12

Categories: eess.IV,cs.AI

Download: http://arxiv.org/abs/2406.14826v1

Latent diffusion models for parameterization and data assimilation of facies-based geomodels

Geological parameterization entails the representation of a geomodel using a small set of latent variables and a mapping from these variables to grid-block properties such as porosity and permeability. Parameterization is useful for data assimilation (history matching), as it maintains geological realism while reducing the number of variables to be determined. Diffusion models are a new class of generative deep-learning procedures that have been shown to outperform previous methods, such as generative adversarial networks, for image generation tasks. Diffusion models are trained to "denoise", which enables them to generate new geological realizations from input fields characterized by random noise. Latent diffusion models, which are the specific variant considered in this study, provide dimension reduction through use of a low-dimensional latent variable. The model developed in this work includes a variational autoencoder for dimension reduction and a U-net for the denoising process. Our application involves conditional 2D three-facies (channel-levee-mud) systems. The latent diffusion model is shown to provide realizations that are visually consistent with samples from geomodeling software. Quantitative metrics involving spatial and flow-response statistics are evaluated, and general agreement between the diffusion-generated models and reference realizations is observed. Stability tests are performed to assess the smoothness of the parameterization method. The latent diffusion model is then used for ensemble-based data assimilation. Two synthetic "true" models are considered. Significant uncertainty reduction, posterior P$_{10}$-P$_{90}$ forecasts that generally bracket observed data, and consistent posterior geomodels, are achieved in both cases.

Updated: 2024-06-21 01:32:03

Categories: cs.CV,cs.AI,cs.CE,cs.LG,physics.geo-ph

Download: http://arxiv.org/abs/2406.14815v1

On the estimation rate of Bayesian PINN for inverse problems

Solving partial differential equations (PDEs) and their inverse problems using physics-informed neural networks (PINNs) is a rapidly growing approach in the physics and machine learning communities. Although several architectures exist for PINNs that work remarkably well in practice, our theoretical understanding of their performance is somewhat limited. In this work, we study the behavior of a Bayesian PINN estimator of the solution of a PDE from $n$ independent noisy measurements of the solution. We focus on a class of equations that are linear in their parameters (with unknown coefficients $\theta_\star$). We show that when the partial differential equation admits a classical solution (say $u_\star$), differentiable to order $\beta$, the mean square error of the Bayesian posterior mean is at least of order $n^{-2\beta/(2\beta + d)}$. Furthermore, we establish a convergence rate of the linear coefficients of $\theta_\star$ depending on the order of the underlying differential operator. Last but not least, our theoretical results are validated through extensive simulations.
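
To get a feel for the rate $n^{-2\beta/(2\beta + d)}$ stated in this abstract, it can help to plug in numbers. The values of $\beta$, $d$, and $n$ below are arbitrary choices for illustration and do not come from the paper.

```latex
\[
\beta = 2,\ d = 1:\quad \frac{2\beta}{2\beta + d} = \frac{4}{5},
\qquad n = 10^{4} \;\Rightarrow\; n^{-4/5} = 10^{-3.2} \approx 6.3 \times 10^{-4}
\]
\[
\beta = 2,\ d = 3:\quad \frac{2\beta}{2\beta + d} = \frac{4}{7},
\qquad n = 10^{4} \;\Rightarrow\; n^{-4/7} = 10^{-16/7} \approx 5.2 \times 10^{-3}
\]
```

For a fixed smoothness $\beta$, a larger dimension $d$ gives a smaller exponent, so the same number of measurements yields a slower rate.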

Updated: 2024-06-21 01:13:18

Categories: math.ST,cs.LG,stat.ME,stat.ML,stat.TH

Download: http://arxiv.org/abs/2406.14808v1

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

Speculative decoding reduces the inference latency of a target large language model via utilizing a smaller and faster draft model. Its performance depends on a hyperparameter K -- the candidate length, i.e., the number of candidate tokens for the target model to verify in each round. However, previous methods often use simple heuristics to choose K, which may result in sub-optimal performance. We study the choice of the candidate length K and formulate it as a Markov Decision Process. We theoretically show that the optimal policy of this Markov decision process takes the form of a threshold policy, i.e., the current speculation should stop and be verified when the probability of getting a rejection exceeds a threshold value. Motivated by this theory, we propose SpecDec++, an enhanced version of speculative decoding that adaptively determines the candidate length on the fly. We augment the draft model with a trained acceptance prediction head to predict the conditional acceptance probability of the candidate tokens. SpecDec++ will stop the current speculation when the predicted probability that at least one token gets rejected exceeds a threshold. We implement SpecDec++ and apply it to the llama-2-chat 7B & 70B model pair. Our adaptive method achieves a 2.04x speedup on the Alpaca dataset (an additional 7.2% improvement over the baseline speculative decoding). On the GSM8K and HumanEval datasets, our method achieves a 2.26x speedup (9.4% improvement) and 2.23x speedup (11.1% improvement), respectively.
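
The threshold rule described in this abstract can be sketched as follows. This is a simplified illustration rather than the SpecDec++ implementation: it assumes the per-token conditional acceptance probabilities are given up front as a list (in practice the prediction head would produce them one draft token at a time), and the function name, `threshold`, and `max_len` are hypothetical. The probability that at least one of the first k tokens is rejected is computed as one minus the product of their conditional acceptance probabilities.

```python
def candidate_length(accept_probs, threshold=0.7, max_len=16):
    """Number of draft tokens to propose before handing over to the target model.

    accept_probs[i] is the predicted probability that draft token i is accepted,
    conditioned on all earlier draft tokens having been accepted.
    """
    p_all_accepted = 1.0
    for k, p in enumerate(accept_probs[:max_len], start=1):
        p_all_accepted *= p
        # P(at least one of the first k tokens is rejected) = 1 - P(all accepted)
        if 1.0 - p_all_accepted > threshold:
            return k  # stop speculating and verify the k tokens drafted so far
    return min(len(accept_probs), max_len)

# After three tokens, P(all accepted) = 0.9 * 0.8 * 0.5 = 0.36, so the
# rejection probability 0.64 exceeds the threshold 0.6 and drafting stops.
k = candidate_length([0.9, 0.8, 0.5, 0.9], threshold=0.6)
```

Whether the token that crosses the threshold is itself included in the verified batch is a design choice made here for simplicity; the abstract does not specify it.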

Updated: 2024-06-21 01:01:42

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.19715v2

Securing the Future: Proactive Threat Hunting for Sustainable IoT Ecosystems

In the rapidly evolving landscape of the IoT, the security of connected devices has become a paramount concern. This paper explores the concept of proactive threat hunting as a pivotal strategy for enhancing the security and sustainability of IoT systems. Proactive threat hunting is an alternative to traditional reactive security measures: it analyses IoT networks continuously and in advance to find and eliminate threats before they occur. By improving the security posture of IoT devices, this approach significantly contributes to extending the operational lifespan of IoT devices and reduces their environmental impact. By integrating security metrics similar to the Common Vulnerability Scoring System (CVSS) into consumer platforms, this paper argues that proactive threat hunting can elevate user awareness about the security of IoT devices. This has the potential to impact consumer choices and encourage a security-conscious mindset in both the manufacturing and user communities. Through a comprehensive analysis, this study demonstrates how proactive threat hunting can contribute to the development of a more secure, sustainable, and user-aware IoT ecosystem.

Updated: 2024-06-21 00:44:17

Categories: cs.CR,cs.AI,cs.CY

Download: http://arxiv.org/abs/2406.14804v1

MeGA: Merging Multiple Independently Trained Neural Networks Based on Genetic Algorithm

In this paper, we introduce MeGA, a novel method for merging the weights of multiple pre-trained neural networks using a genetic algorithm. Traditional techniques, such as weight averaging and ensemble methods, often fail to fully harness the capabilities of pre-trained networks. Our approach leverages a genetic algorithm with tournament selection, crossover, and mutation to optimize weight combinations, creating a more effective fusion. This technique allows the merged model to inherit advantageous features from both parent models, resulting in enhanced accuracy and robustness. Through experiments on the CIFAR-10 dataset, we demonstrate that our genetic algorithm-based weight merging method improves test accuracy compared to individual models and conventional methods. This approach provides a scalable solution for integrating multiple pre-trained networks across various deep learning applications. The code is available on GitHub at: https://github.com/YUNBLAK/MeGA-Merging-Multiple-Independently-Trained-Neural-Networks-Based-on-Genetic-Algorithm
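
A genetic algorithm with tournament selection, crossover, and mutation can be sketched on a toy weight-merging problem. This is not the MeGA code: the encoding (one 0/1 gene per weight, choosing which parent to copy from), the elitism step, the mutation rate, and the toy "networks" and fitness function are all assumptions made for illustration.

```python
import random

def tournament(population, score, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=score)

def evolve_merge(parent_a, parent_b, fitness, generations=30, pop_size=12, seed=0):
    """Search for a per-weight choice between two parents with a genetic algorithm.

    An individual is a list of 0/1 genes: gene i copies weight i from parent_a (0)
    or parent_b (1). This encoding is an assumption of the sketch.
    """
    rng = random.Random(seed)
    n = len(parent_a)

    def merge(mask):
        return [b if g else a for g, a, b in zip(mask, parent_a, parent_b)]

    def score(mask):
        return fitness(merge(mask))

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    population[0], population[1] = [0] * n, [1] * n  # include both unmerged parents
    for _ in range(generations):
        children = [max(population, key=score)]  # elitism: the best always survives
        while len(children) < pop_size:
            p1 = tournament(population, score, rng)
            p2 = tournament(population, score, rng)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < 0.1 else g for g in child]  # mutation
            children.append(child)
        population = children
    return merge(max(population, key=score))

# Toy "networks": flat weight vectors; each parent matches a made-up target on one half.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
net_a = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
net_b = [0.0, 0.0, 0.0, 4.0, 5.0, 6.0]

def fitness(weights):
    """Negative squared error to the toy target (higher is better)."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

merged = evolve_merge(net_a, net_b, fitness)
```

Because both parents are seeded into the initial population and the best individual is carried over each generation, the merged result in this sketch is never worse than either parent under the chosen fitness. In a real setting the fitness would be a validation metric of the merged network.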

Updated: 2024-06-21 00:38:58

Categories: cs.NE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.04607v3

Probabilistic Emulation of a Global Climate Model with Spherical DYffusion

Data-driven deep learning models are on the verge of transforming global weather forecasting. It is an open question if this success can extend to climate modeling, where long inference rollouts and data complexity pose significant challenges. Here, we present the first conditional generative model able to produce global climate ensemble simulations that are accurate and physically consistent. Our model runs at 6-hourly time steps and is shown to be stable for 10-year-long simulations. Our approach beats relevant baselines and nearly reaches a gold standard for successful climate model emulation. We discuss the key design choices behind our dynamics-informed diffusion model-based approach which enables this significant step towards efficient, data-driven climate simulations that can help us better understand the Earth and adapt to a changing climate.

Updated: 2024-06-21 00:16:55

Categories: cs.LG,cs.AI,physics.ao-ph,stat.ML

Download: http://arxiv.org/abs/2406.14798v1

Camera-Invariant Meta-Learning Network for Single-Camera-Training Person Re-identification

Single-camera-training person re-identification (SCT re-ID) aims to train a re-ID model using SCT datasets where each person appears in only one camera. The main challenge of SCT re-ID is to learn camera-invariant feature representations without cross-camera same-person (CCSP) data as supervision. Previous methods address it by assuming that the most similar person should be found in another camera. However, this assumption is not guaranteed to be correct. In this paper, we propose a Camera-Invariant Meta-Learning Network (CIMN) for SCT re-ID. CIMN assumes that the camera-invariant feature representations should be robust to camera changes. To this end, we split the training data into a meta-train set and a meta-test set based on camera IDs and perform a cross-camera simulation via a meta-learning strategy, aiming to enforce the representations learned from the meta-train set to be robust to the meta-test set. With the cross-camera simulation, CIMN can learn camera-invariant and identity-discriminative representations even when there are no CCSP data. However, this simulation also causes the separation of the meta-train set and the meta-test set, which ignores some beneficial relations between them. Thus, we introduce three losses: meta triplet loss, meta classification loss, and meta camera alignment loss, to leverage the ignored relations. The experimental results demonstrate that our method achieves comparable performance with and without CCSP data, and outperforms the state-of-the-art methods on SCT re-ID benchmarks. In addition, it is also effective in improving the domain generalization ability of the model.

Updated: 2024-06-21 00:15:32

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.14797v1

MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning

Recent advancements in Machine Unlearning (MU) have introduced solutions to selectively remove certain training samples, such as those with outdated or sensitive information, from trained models. Despite these advancements, evaluation of MU methods has been inconsistent, employing different trained models, architectures, and sample removal strategies, which hampers accurate comparison. In addition, prior MU approaches have mainly focused on single tasks or modalities, which is not comprehensive. To address these limitations, we develop MU-Bench, the first comprehensive benchmark for MU that (i) unifies the sets of deleted samples and trained models, and (ii) provides broad coverage of tasks and data modalities, including previously unexplored domains such as speech and video classification. Our evaluation shows that RandLabel and SalUn are the most effective general MU approaches on MU-Bench, and BadT and SCRUB are capable of achieving random performance on the deletion set. We analyze several under-investigated aspects of unlearning, including scalability, the impacts of parameter-efficient fine-tuning and curriculum learning, and susceptibility to dataset biases. MU-Bench provides an easy-to-use package that includes dataset splits, models, and implementations, together with a leaderboard to enable unified and scalable MU research.

Updated: 2024-06-21 00:13:17

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.14796v1

FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs

This paper introduces FedSecurity, an end-to-end benchmark that serves as a supplementary component of the FedML library for simulating adversarial attacks and corresponding defense mechanisms in Federated Learning (FL). FedSecurity eliminates the need to implement the fundamental FL procedures, e.g., FL training and data loading, from scratch, thus enabling users to focus on developing their own attack and defense strategies. It contains two key components: FedAttacker, which conducts a variety of attacks during FL training, and FedDefender, which implements defensive mechanisms to counteract these attacks. FedSecurity has the following features: i) it offers extensive customization options to accommodate a broad range of machine learning models (e.g., Logistic Regression, ResNet, and GAN) and FL optimizers (e.g., FedAVG, FedOPT, and FedNOVA); ii) it enables exploring the effectiveness of attacks and defenses across different datasets and models; and iii) it supports flexible configuration and customization through a configuration file and some APIs. We further demonstrate FedSecurity's utility and adaptability through federated training of Large Language Models (LLMs) to showcase its potential on a wide range of complex applications.

Updated: 2024-06-21 00:01:52

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2306.04959v5