Arxiv Day: Article

Towards Multi-Task Multi-Modal Models: A Video Generative Perspective

Advancements in language foundation models have primarily fueled the recent surge in artificial intelligence. In contrast, generative learning of non-textual modalities, especially videos, significantly trails behind language modeling. This thesis chronicles our endeavor to build multi-task models for generating videos and other modalities under diverse conditions, as well as for understanding and compression applications. Given the high dimensionality of visual data, we pursue concise and accurate latent representations. Our video-native spatial-temporal tokenizers preserve high fidelity. We unveil a novel approach to mapping bidirectionally between visual observation and interpretable lexical terms. Furthermore, our scalable visual token representation proves beneficial across generation, compression, and understanding tasks. This achievement marks the first instances of language models surpassing diffusion models in visual synthesis and a video tokenizer outperforming industry-standard codecs. Within these multi-modal latent spaces, we study the design of multi-task generative models. Our masked multi-task transformer excels at the quality, efficiency, and flexibility of video generation. We enable a frozen language model, trained solely on text, to generate visual content. Finally, we build a scalable generative multi-modal transformer trained from scratch, enabling the generation of videos containing high-fidelity motion with the corresponding audio given diverse conditions. Throughout the course, we have shown the effectiveness of integrating multiple tasks, crafting high-fidelity latent representation, and generating multiple modalities. This work suggests intriguing potential for future exploration in generating non-textual data and enabling real-time, interactive experiences across various media forms.

Updated: 2024-05-26 23:56:45

Categories: cs.CV,cs.AI,cs.LG,cs.MM

Download: http://arxiv.org/abs/2405.16728v1

Disentangling and Integrating Relational and Sensory Information in Transformer Architectures

The Transformer architecture processes sequences by implementing a form of neural message-passing that consists of iterative information retrieval (attention), followed by local processing (position-wise MLP). Two types of information are essential under this general computational paradigm: "sensory" information about individual objects, and "relational" information describing the relationships between objects. Standard attention naturally encodes the former, but does not explicitly encode the latter. In this paper, we present an extension of Transformers where multi-head attention is augmented with two distinct types of attention heads, each routing information of a different type. The first type is the standard attention mechanism of Transformers, which captures object-level features, while the second type is a novel attention mechanism we propose to explicitly capture relational information. The two types of attention heads each possess different inductive biases, giving the resulting architecture greater efficiency and versatility. The promise of this approach is demonstrated empirically across a range of tasks.

Updated: 2024-05-26 23:52:51

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16727v1

Towards Responsible and Safe AI in the Era of Foundation Models: A Reference Architecture for Designing Foundation Model based Systems

The release of ChatGPT, Gemini, and other large language models has drawn huge interest in foundation models. There is a broad consensus that foundation models will be the fundamental building blocks for future AI systems. However, there is a lack of systematic guidance on architecture design. In particular, the rapidly growing capabilities of foundation models can eventually absorb other components of AI systems, posing the challenges of moving boundaries and interface evolution in architecture design. Furthermore, incorporating foundation models into AI systems raises significant concerns about responsible and safe AI due to their opaque nature and rapidly advancing intelligence. To address these challenges, the paper first presents an architecture evolution of AI systems in the era of foundation models, transitioning from "foundation-model-as-a-connector" to "foundation-model-as-a-monolithic architecture". The paper then identifies key design decisions and proposes a pattern-oriented reference architecture for designing responsible foundation-model-based systems. The patterns can enable the potential of foundation models while addressing the associated risks.

Updated: 2024-05-26 23:51:04

Categories: cs.CL,cs.AI,cs.SE

Download: http://arxiv.org/abs/2304.11090v4

Exploring Edge Probability Graph Models Beyond Edge Independency: Concepts, Analyses, and Algorithms

Desirable random graph models (RGMs) should (i) be tractable so that we can compute and control graph statistics, and (ii) generate realistic structures such as high clustering (i.e., high subgraph densities). A popular category of RGMs (e.g., Erdos-Renyi and stochastic Kronecker) outputs edge probabilities, and we need to realize (i.e., sample from) the edge probabilities to generate graphs. Typically, each edge (in)existence is assumed to be determined independently. However, with edge independency, RGMs theoretically cannot produce high subgraph densities unless they "replicate" input graphs. In this work, we explore realization beyond edge independence that can produce more realistic structures while ensuring high tractability. Specifically, we propose edge-dependent realization schemes called binding and derive closed-form tractability results on subgraph (e.g., triangle) densities in graphs generated with binding. We propose algorithms for graph generation with binding and parameter fitting of binding. We empirically validate that binding exhibits high tractability and generates realistic graphs with high clustering, significantly improving upon existing RGMs assuming edge independency.

Updated: 2024-05-26 23:48:30

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16726v1
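The edge-independent realization that the abstract above takes as its starting point can be sketched in a few lines. This is a minimal illustration, not the paper's binding scheme: the function names are hypothetical, and the model is a plain Erdos-Renyi graph in which every edge has the same probability p. Under independence, each of the C(n, 3) node triples closes into a triangle with probability p^3, which is why sparse edge-independent models yield few triangles.

```python
import itertools
import math
import random


def erdos_renyi_probs(n, p):
    """Edge-probability table for an Erdos-Renyi model on nodes 0..n-1."""
    return {(u, v): p for u, v in itertools.combinations(range(n), 2)}


def realize_independent(edge_probs, rng):
    """Sample a graph by deciding the existence of each edge independently."""
    edges = set()
    for (u, v), p in edge_probs.items():
        if rng.random() < p:
            edges.add((min(u, v), max(u, v)))
    return edges


def count_triangles(nodes, edges):
    """Count node triples whose three connecting edges are all present."""
    return sum(
        1
        for a, b, c in itertools.combinations(sorted(nodes), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )


def expected_triangles_independent(n, p):
    """Expected triangle count when all edges are independent with probability p."""
    return math.comb(n, 3) * p ** 3
```

Comparing `count_triangles` on sampled graphs against `expected_triangles_independent` shows how tightly independence pins down subgraph densities, which is the limitation that edge-dependent realization is meant to lift.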

Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces

In many reinforcement learning (RL) applications, we want policies that reach desired states and then keep the controlled system within an acceptable region around the desired states over an indefinite period of time. This latter objective is called stability and is especially important when the state space is unbounded, such that the states can be arbitrarily far from each other and the agent can drift far away from the desired states. For example, in stochastic queuing networks, where queues of waiting jobs can grow without bound, the desired state is all-zero queue lengths. Here, a stable policy ensures queue lengths are finite while an optimal policy minimizes queue lengths. Since an optimal policy is also stable, one would expect that RL algorithms would implicitly give us stable policies. However, in this work, we find that deep RL algorithms that directly minimize the distance to the desired state during online training often result in unstable policies, i.e., policies that drift far away from the desired state. We attribute this instability to poor credit-assignment for destabilizing actions. We then introduce an approach based on two ideas: 1) a Lyapunov-based cost-shaping technique and 2) state transformations to the unbounded state space. We conduct an empirical study on various queueing networks and traffic signal control problems and find that our approach performs competitively against strong baselines with knowledge of the transition dynamics. Our code is available here: https://github.com/Badger-RL/STOP.

Updated: 2024-05-26 23:47:48

Categories: cs.LG

Download: http://arxiv.org/abs/2306.01896v3
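The two ideas named in the abstract above, Lyapunov-based cost shaping and state transformations, can be illustrated with a small sketch. The abstract does not give the formulas, so everything here is an assumption chosen for illustration: a quadratic Lyapunov function over queue lengths, a shaped cost that adds the one-step Lyapunov drift, and a log(1 + q) transformation that compresses large queue lengths. The authors' repository may do this differently.

```python
import math


def lyapunov(queue_lengths):
    """Quadratic Lyapunov function (assumed): zero when all queues are empty."""
    return sum(q * q for q in queue_lengths)


def shaped_cost(base_cost, state, next_state, weight=1.0):
    """Add the one-step Lyapunov drift so that drifting away costs extra."""
    drift = lyapunov(next_state) - lyapunov(state)
    return base_cost + weight * drift


def compress_state(queue_lengths):
    """Compress non-negative, unbounded queue lengths with log(1 + q) (assumed)."""
    return [math.log1p(q) for q in queue_lengths]
```

With this shaping, a transition that lengthens a queue is penalized immediately rather than only through later costs, which is one way to ease credit assignment for destabilizing actions.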

Attending to Topological Spaces: The Cellular Transformer

Topological Deep Learning seeks to enhance the predictive performance of neural network models by harnessing topological structures in input data. Topological neural networks operate on spaces such as cell complexes and hypergraphs, that can be seen as generalizations of graphs. In this work, we introduce the Cellular Transformer (CT), a novel architecture that generalizes graph-based transformers to cell complexes. First, we propose a new formulation of the usual self- and cross-attention mechanisms, tailored to leverage incidence relations in cell complexes, e.g., edge-face and node-edge relations. Additionally, we propose a set of topological positional encodings specifically designed for cell complexes. By transforming three graph datasets into cell complex datasets, our experiments reveal that CT not only achieves state-of-the-art performance, but it does so without the need for more complex enhancements such as virtual nodes, in-domain structural encodings, or graph rewiring.

Updated: 2024-05-26 23:29:11

Categories: cs.LG,cs.AI,cs.CV,math.AT,stat.ML

Download: http://arxiv.org/abs/2405.14094v2

Alistair: Efficient On-device Budgeting for Differentially-Private Ad-Measurement Systems

With the impending removal of third-party cookies from major browsers and the introduction of new privacy-preserving advertising APIs, the research community has a timely opportunity to assist industry in qualitatively improving the Web's privacy. This paper discusses our efforts, within a W3C community group, to enhance existing privacy-preserving advertising measurement APIs. We analyze designs from Google, Apple, Meta and Mozilla, and augment them with a more rigorous and efficient differential privacy (DP) budgeting component. Our approach, called Alistair, enforces well-defined DP guarantees and enables advertisers to conduct more private measurement queries accurately. By framing the privacy guarantee in terms of an individual form of DP, we can make DP budgeting more efficient than in current systems that use a traditional DP definition. We incorporate Alistair into Chrome and evaluate it on microbenchmarks and advertising datasets. Across all workloads, Alistair significantly outperforms baselines in enabling more advertising measurements under comparable DP protection.

Updated: 2024-05-26 23:27:27

Categories: cs.CR

Download: http://arxiv.org/abs/2405.16719v1
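To make the idea of on-device DP budgeting in the abstract above concrete, here is a minimal sketch of a budget ledger. It is not Alistair's accounting, which relies on an individual form of DP. It only shows basic sequential composition: each query spends some epsilon, and a query is refused once the cap would be exceeded. The class name, the per-querier and per-epoch keys, and the cap are all hypothetical.

```python
class DeviceBudget:
    """Toy on-device ledger that caps the epsilon spent per (querier, epoch)."""

    def __init__(self, epsilon_cap):
        self.epsilon_cap = epsilon_cap
        self.spent = {}  # maps (querier, epoch) to the epsilon consumed so far

    def try_consume(self, querier, epoch, epsilon):
        """Deduct epsilon if the cap allows it; return whether the query may run."""
        key = (querier, epoch)
        used = self.spent.get(key, 0.0)
        # Small tolerance so that float rounding does not reject an exact fit.
        if used + epsilon > self.epsilon_cap + 1e-12:
            return False
        self.spent[key] = used + epsilon
        return True
```

Because the ledger lives on the device, a refused query consumes nothing, and budgets for different queriers or epochs never interfere with each other.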

Enhancing User Interest based on Stream Clustering and Memory Networks in Large-Scale Recommender Systems

Recommender Systems (RSs), which are widely used across platforms, provide personalized recommendation services based on user interest. However, many users have sparse interest because they lack consumption behaviors, which leads to poor recommendation results for them. This problem is widespread in large-scale RSs and is particularly difficult to address. To solve it, we propose a novel solution named User Interest Enhancement (UIE), which enhances user interest, including user profiles and user history behavior sequences, using enhancement vectors and personalized enhancement vectors generated from different perspectives with stream clustering and memory networks. UIE not only remarkably improves model performance on users with sparse interest but also significantly enhances model performance on other users. UIE is an end-to-end solution that is easy to implement on top of a ranking model. Moreover, we extend our solution and apply similar methods to long-tail items, which also yields excellent improvement. Furthermore, we conduct extensive offline and online experiments in a large-scale industrial RS. The results demonstrate that our model outperforms other models remarkably, especially for users with sparse interest. To date, UIE has been fully deployed in multiple large-scale RSs and has achieved remarkable improvements.

Updated: 2024-05-26 23:18:53

Categories: cs.IR,cs.LG

Download: http://arxiv.org/abs/2405.13238v2

Amortized Active Causal Induction with Deep Reinforcement Learning

We present Causal Amortized Active Structure Learning (CAASL), an active intervention design policy that selects interventions adaptively and in real time, and that does not require access to the likelihood. This policy, an amortized network based on the transformer, is trained with reinforcement learning on a simulator of the design environment, using a reward function that measures how close the true causal graph is to a causal graph posterior inferred from the gathered data. On synthetic data and a single-cell gene expression simulator, we demonstrate empirically that the data acquired through our policy results in a better estimate of the underlying causal graph than alternative strategies. Our design policy successfully achieves amortized intervention design on the distribution of the training environment while also generalizing well to distribution shifts in test-time design environments. Further, our policy demonstrates excellent zero-shot generalization to design environments with higher dimensionality than seen during training, and to intervention types that it has not been trained on.

Updated: 2024-05-26 23:14:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.16718v1

A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health

Restless multi-armed bandits (RMAB) have demonstrated success in optimizing resource allocation for large beneficiary populations in public health settings. Unfortunately, RMAB models lack flexibility to adapt to evolving public health policy priorities. Concurrently, Large Language Models (LLMs) have emerged as adept automated planners across domains of robotic control and navigation. In this paper, we propose a Decision Language Model (DLM) for RMABs, enabling dynamic fine-tuning of RMAB policies in public health settings using human-language commands. We propose using LLMs as automated planners to (1) interpret human policy preference prompts, (2) propose reward functions as code for a multi-agent RMAB environment, and (3) iterate on the generated reward functions using feedback from grounded RMAB simulations. We illustrate the application of DLM in collaboration with ARMMAN, an India-based non-profit promoting preventative care for pregnant mothers, that currently relies on RMAB policies to optimally allocate health worker calls to low-resource populations. We conduct a technology demonstration in simulation using the Gemini Pro model, showing DLM can dynamically shape policy outcomes using only human prompts as input.

Updated: 2024-05-26 22:46:45

Categories: cs.MA,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.14807v3

Do LLM Agents Have Regret? A Case Study in Online Learning and Games

Large language models (LLMs) have been increasingly employed for (interactive) decision-making, via the development of LLM-based autonomous agents. Despite their emerging successes, the performance of LLM agents in decision-making has not been fully investigated through quantitative metrics, especially in the multi-agent setting when they interact with each other, a typical scenario in real-world LLM-agent applications. To better understand the limits of LLM agents in these interactive environments, we propose to study their interactions in benchmark decision-making settings in online learning and game theory, through the performance metric of regret. We first empirically study the no-regret behaviors of LLMs in canonical (non-stationary) online learning problems, as well as the emergence of equilibria when LLM agents interact through playing repeated games. We then provide some theoretical insights into the no-regret behaviors of LLM agents, under certain assumptions on the supervised pre-training and the rationality model of human decision-makers who generate the data. Notably, we also identify (simple) cases where advanced LLMs such as GPT-4 fail to be no-regret. To promote the no-regret behaviors, we propose a novel unsupervised training loss of regret-loss, which, in contrast to the supervised pre-training loss, does not require the labels of (optimal) actions. We then establish the statistical guarantee of generalization bound for regret-loss minimization, followed by the optimization guarantee that minimizing such a loss may automatically lead to known no-regret learning algorithms. Our further experiments demonstrate the effectiveness of our regret-loss, especially in addressing the above "regrettable" cases.

Updated: 2024-05-26 22:32:25

Categories: cs.LG,cs.AI,cs.GT

Download: http://arxiv.org/abs/2403.16843v2
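The regret metric used in the abstract above has a standard definition in online learning: the total loss the agent incurred minus the total loss of the best single action in hindsight. The sketch below computes it for a small loss table. The function name and the table layout are illustrative choices, not taken from the paper. An agent is called no-regret when this quantity grows sublinearly in the number of rounds.

```python
def external_regret(losses, actions):
    """Regret against the best fixed action in hindsight.

    losses[t][a] is the loss of action a at round t.
    actions[t] is the action the agent actually played at round t.
    """
    incurred = sum(losses[t][a] for t, a in enumerate(actions))
    num_actions = len(losses[0])
    best_fixed = min(
        sum(round_losses[a] for round_losses in losses)
        for a in range(num_actions)
    )
    return incurred - best_fixed
```

Dividing the result by the number of rounds gives the average regret, which should approach zero for a no-regret agent.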

Crafting Interpretable Embeddings by Asking LLMs Questions

Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing need for interpretability. Here, we ask whether we can obtain interpretable embeddings through LLM prompting. We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. Training QA-Emb reduces to selecting a set of underlying questions rather than learning model weights. We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli. QA-Emb significantly outperforms an established interpretable baseline, and does so while requiring very few questions. This paves the way towards building flexible feature spaces that can concretize and evaluate our understanding of semantic brain representations. We additionally find that QA-Emb can be effectively approximated with an efficient model, and we explore broader applications in simple NLP tasks.

Updated: 2024-05-26 22:30:29

Categories: cs.CL,cs.AI,cs.LG,q-bio.NC

Download: http://arxiv.org/abs/2405.16714v1
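The structure of a question-answering embedding, as described in the abstract above, is simple enough to sketch: each feature is the yes/no answer to one question about the input. In the sketch below the LLM is replaced by a toy keyword matcher so that the example runs on its own. The questions, the keyword table, and the function names are all hypothetical stand-ins, not the paper's prompts or code.

```python
# Toy stand-in for an LLM: each question is answered by keyword lookup.
TOY_KEYWORDS = {
    "Does the input mention food?": {"apple", "bread", "soup"},
    "Does the input mention a location?": {"kitchen", "park", "city"},
    "Does the input mention a person?": {"she", "he", "chef"},
}


def toy_answer(question, text):
    """Answer a yes/no question about the text (a real system would query an LLM)."""
    words = set(text.lower().replace(".", " ").split())
    return bool(words & TOY_KEYWORDS[question])


def qa_embed(text, questions, answer_fn):
    """Build a binary embedding with one feature per yes/no question."""
    return [1 if answer_fn(question, text) else 0 for question in questions]
```

Because every dimension is tied to a readable question, fitting a downstream model amounts to choosing which questions to keep, and each learned weight can be read against the question it belongs to.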

A Circuit Approach to Constructing Blockchains on Blockchains

Since the creation of Bitcoin 15 years ago, there has been an explosion in the number of permissionless blockchains. Each of these blockchains provides an open ledger that anyone can read from and write to. In this multi-chain world, an important question emerges: how can we build a more secure overlay blockchain by reading from and writing to a given set of blockchains? Drawing an analogy with switching circuits, we approach the problem by defining two basic compositional operations between blockchains, serial and triangular compositions, and use these operations as building blocks to construct general overlay blockchains. Under the partially synchronous setting, we have the following results: 1) the serial composition, between two blockchains, yields an overlay blockchain that is safe if at least one of the two underlay blockchains is safe and that is live if both underlay blockchains are live; 2) the triangular composition between three blockchains, akin to parallel composition of switching circuits, yields an overlay blockchain that is safe if all underlay blockchains are safe and that is live if at least half of them are live; 3) repeated composition of these two basic operations can yield all possible tradeoffs of safety and liveness for an overlay blockchain built on arbitrary number of underlay chains. The results are also extended to the synchronous setting.

Updated: 2024-05-26 22:29:33

Categories: cs.CR

Download: http://arxiv.org/abs/2402.00220v3
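The safety and liveness rules stated in the abstract above can be written as Boolean functions, which makes the circuit analogy easy to play with. The sketch below is a simplification: it treats the stated sufficient conditions as if they fully determined the overlay's properties, and it models each chain as a pair of flags. The `Chain` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    """A blockchain abstracted to its two properties of interest."""
    safe: bool
    live: bool


def serial(a, b):
    """Serial composition: safe if either underlay is safe, live if both are live."""
    return Chain(safe=a.safe or b.safe, live=a.live and b.live)


def triangular(a, b, c):
    """Triangular composition: safe if all are safe, live if at least half are live."""
    chains = (a, b, c)
    live_count = sum(chain.live for chain in chains)
    return Chain(
        safe=all(chain.safe for chain in chains),
        live=2 * live_count >= len(chains),
    )
```

Since both functions return a `Chain`, they can be nested, for example `serial(triangular(x, y, z), w)`, which mirrors how the abstract describes building general overlays from repeated composition.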

tinyBenchmarks: evaluating LLMs with fewer examples

The versatility of large language models (LLMs) led to the creation of diverse benchmarks that thoroughly test a variety of language models' abilities. These benchmarks consist of tens of thousands of examples making evaluation of LLMs very expensive. In this paper, we investigate strategies to reduce the number of evaluations needed to assess the performance of an LLM on several key benchmarks. For example, we show that to accurately estimate the performance of an LLM on MMLU, a popular multiple-choice QA benchmark consisting of 14K examples, it is sufficient to evaluate this LLM on 100 curated examples. We release evaluation tools and tiny versions of popular benchmarks: Open LLM Leaderboard, MMLU, HELM, and AlpacaEval 2.0. Our empirical analysis demonstrates that these tools and tiny benchmarks are sufficient to reliably and efficiently reproduce the original evaluation results.

Updated: 2024-05-26 22:27:23

Categories: cs.CL,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.14992v2
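As a baseline for the idea in the abstract above, the sketch below estimates benchmark accuracy from a small uniform random subset and reports a binomial standard error. This is deliberately simpler than the paper's approach, which evaluates on curated examples. The function name and the representation of results as a list of 0/1 flags are assumptions made for illustration.

```python
import random


def estimate_accuracy(correct_flags, sample_size, rng):
    """Estimate accuracy from a uniform random subset of per-example results.

    correct_flags holds 1 for a correct answer and 0 for an incorrect one.
    Returns the estimated accuracy and its binomial standard error.
    """
    sample = rng.sample(correct_flags, sample_size)
    accuracy = sum(sample) / sample_size
    std_error = (accuracy * (1 - accuracy) / sample_size) ** 0.5
    return accuracy, std_error
```

With 100 examples and an accuracy near 0.5, the standard error is about 0.05, which gives a sense of how much a careful choice of examples has to improve on random sampling.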

Zamba: A Compact 7B SSM Hybrid Model

In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which achieves competitive performance against leading open-weight models at a comparable scale. Zamba is trained on 1T tokens from openly available datasets and is the best non-transformer model at this scale. Zamba pioneers a unique architecture combining a Mamba backbone with a single shared attention module, thus obtaining the benefits of attention at minimal parameter cost. Due to its architecture, Zamba is significantly faster at inference than comparable transformer models and requires substantially less memory for generation of long sequences. Zamba is pretrained in two phases: the first phase is based on existing web datasets, while the second one consists of annealing the model over high-quality instruct and synthetic datasets, and is characterized by a rapid learning rate decay. We open-source the weights and all checkpoints for Zamba, through both phase 1 and annealing phases.

Updated: 2024-05-26 22:23:02

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.16712v1

The AI-DEC: A Card-based Design Method for User-centered AI Explanations

Increasing evidence suggests that many deployed AI systems do not sufficiently support end-user interaction and information needs. Engaging end-users in the design of these systems can reveal user needs and expectations, yet effective ways of engaging end-users in the AI explanation design remain under-explored. To address this gap, we developed a design method, called AI-DEC, that defines four dimensions of AI explanations that are critical for the integration of AI systems -- communication content, modality, frequency, and direction -- and offers design examples for end-users to design AI explanations that meet their needs. We evaluated this method through co-design sessions with workers in healthcare, finance, and management industries who regularly use AI systems in their daily work. Findings indicate that the AI-DEC effectively supported workers in designing explanations that accommodated diverse levels of performance and autonomy needs, which varied depending on the AI system's workplace role and worker values. We discuss the implications of using the AI-DEC for the user-centered design of AI explanations in real-world systems.

Updated: 2024-05-26 22:18:38

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2405.16711v1

Visualizing the Shadows: Unveiling Data Poisoning Behaviors in Federated Learning

This demo paper examines the susceptibility of Federated Learning (FL) systems to targeted data poisoning attacks, presenting a novel system for visualizing and mitigating such threats. We simulate targeted data poisoning attacks via label flipping and analyze the impact on model performance, employing a five-component system that includes Simulation and Data Generation, Data Collection and Upload, User-friendly Interface, Analysis and Insight, and Advisory System. Observations from three demo modules (label manipulation, attack timing, and malicious attack availability) and two analysis components (utility and analytical behavior of local model updates) highlight the risks to system integrity and offer insight into the resilience of FL systems. The demo is available at https://github.com/CathyXueqingZhang/DataPoisoningVis.

Updated: 2024-05-26 21:58:32

Categories: cs.CR

Download: http://arxiv.org/abs/2405.16707v1
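The label-flipping attack mentioned in the abstract above is easy to simulate on a single client's dataset. The sketch below relabels a chosen fraction of one source class as a target class. It is a generic illustration, not code from the linked demo repository, and the function name and parameters are hypothetical.

```python
import random


def flip_labels(labels, source, target, fraction, rng):
    """Relabel a fraction of the examples in class `source` as class `target`."""
    source_indices = [i for i, label in enumerate(labels) if label == source]
    num_to_flip = int(len(source_indices) * fraction)
    chosen = set(rng.sample(source_indices, num_to_flip))
    return [target if i in chosen else label for i, label in enumerate(labels)]
```

In a federated setting, a malicious client would train its local update on the flipped labels, and varying `fraction` or the rounds in which the client participates corresponds to the manipulation and timing dimensions that the demo visualizes.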

Self-Play Preference Optimization for Language Model Alignment

Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language model alignment. In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy. Our approach, dubbed Self-play Probabilistic Preference Optimization (SPPO), approximates the Nash equilibrium through iterative policy updates and enjoys a theoretical convergence guarantee. Our method can effectively increase the log-likelihood of the chosen response and decrease that of the rejected response, which cannot be trivially achieved by symmetric pairwise loss such as Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO). In our experiments, using only 60k prompts (without responses) from the UltraFeedback dataset and without any prompt augmentation, by leveraging a pre-trained preference model PairRM with only 0.4B parameters, SPPO can obtain a model from fine-tuning Mistral-7B-Instruct-v0.2 that achieves the state-of-the-art length-controlled win-rate of 28.53% against GPT-4-Turbo on AlpacaEval 2.0. It also outperforms the (iterative) DPO and IPO on MT-Bench and the Open LLM Leaderboard. Notably, the strong performance of SPPO is achieved without additional external supervision (e.g., responses, preferences, etc.) from GPT-4 or other stronger language models.

Updated: 2024-05-26 21:50:05

标题: 自我对弈偏好优化用于语言模型对齐

摘要: 传统的从人类反馈中学习（RLHF）方法依赖于像Bradley-Terry模型这样的参数模型，无法捕捉人类偏好中的不传递性和不理性。最近的进展表明，直接使用偏好概率可以更准确地反映人类偏好，实现更灵活和准确的语言模型对齐。本文提出了一种基于自我对弈的语言模型对齐方法，将问题视为一个旨在识别纳什均衡策略的常和两人游戏。我们的方法，称为\textit{自我对弈概率偏好优化}（SPPO），通过迭代策略更新来近似纳什均衡，并享有理论收敛保证。我们的方法可以有效地增加所选响应的对数似然，减少被拒绝响应的对数似然，这是对称配对损失（如直接偏好优化（DPO）和身份偏好优化（IPO））无法轻易实现的。在我们的实验中，仅使用UltraFeedback数据集中的60k提示（没有响应）且没有任何提示增强，通过利用一个仅有0.4B参数的预训练偏好模型PairRM，SPPO可以从微调Mistral-7B-Instruct-v0.2中获得一个在AlpacaEval 2.0上击败GPT-4-Turbo的最新长度控制胜率28.53％的模型。它还在MT-Bench和Open LLM排行榜上胜过（迭代的）DPO和IPO。值得注意的是，SPPO的强大表现是在没有来自GPT-4或其他更强大语言模型的额外外部监督（如响应、偏好等）的情况下实现的。

更新时间: 2024-05-26 21:50:05

Categories: cs.LG,cs.AI,cs.CL,stat.ML

Download: http://arxiv.org/abs/2405.00675v3

Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs

Large Language Models (LLMs) have demonstrated impressive performance on multimodal tasks without any multimodal finetuning. They are the building block for Large Multimodal Models, yet we still lack a proper understanding of their success. In this work, we expose frozen LLMs to image, video, audio and text inputs and analyse their internal representations, aiming to understand their generalization beyond textual inputs. Findings. Perceptual tokens (1) are easily distinguishable from textual ones inside LLMs, with significantly different representations, and complete translation to textual tokens does not exist. Yet, (2) both perceptual and textual tokens activate similar LLM weights. Despite being different, (3) perceptual and textual tokens are implicitly aligned inside LLMs. We call this the implicit multimodal alignment (IMA) and argue that it is linked to architectural design, helping LLMs to generalize. This provides more evidence that the generalization of LLMs to multimodal inputs is mainly due to their architecture. Implications. (1) We find a positive correlation between the implicit alignment score and the task performance, suggesting that this score could act as a proxy metric for model evaluation and selection. (2) A negative correlation exists with hallucinations, revealing that this problem is mainly due to misalignment between the internal perceptual and textual representations. (3) Perceptual tokens change only slightly throughout the model; thus, we propose different approaches to skip computations (e.g., in FFN layers) and significantly reduce the inference cost. (4) Due to the slowly changing embeddings across layers and the high overlap between textual and multimodal activated weights, we compress LLMs by keeping only one subnetwork that works well across a wide range of multimodal tasks. Paper code: https://github.com/mshukor/ima-lmms.

Updated: 2024-05-26 21:31:59

标题: 隐性多模态对齐：关于将冻结的LLMs推广到多模态输入

摘要: 大型语言模型（LLMs）在多模态任务上表现出令人印象深刻的性能，而无需进行任何多模态微调。它们是大型多模态模型的构建块，然而，我们仍然缺乏对它们成功的适当理解。在这项工作中，我们将冷冻的LLMs暴露给图像、视频、音频和文本输入，并分析它们的内部表示，以便理解它们在文本输入之外的泛化能力。发现。感知令牌（1）在LLMs内部与文本令牌明显可区分，表示有显著差异，且不存在完全转换为文本令牌的情况。然而，（2）感知和文本令牌都会激活类似的LLMs权重。尽管不同，（3）感知和文本令牌在LLMs内部隐含地对齐，我们称之为隐性多模态对齐（IMA），并认为这与架构设计有关，有助于LLMs泛化。这为我们提供了更多证据，表明LLMs对多模态输入的泛化主要是由于它们的架构。影响。（1）我们发现隐式对齐分数与任务性能之间存在正相关，表明这可能作为模型评估和选择的代理度量。（2）关于幻觉存在负相关，揭示了这个问题主要是由于内部感知和文本表示之间的不对齐。（3）感知令牌在整个模型中略有变化，因此，我们提出不同方法来跳过计算（例如在FFN层），从而显著减少推理成本。（4）由于跨层间嵌入缓慢变化，并且文本和多模态激活权重之间的高重叠性，我们通过仅保留一个在各种多模态任务中表现良好的子网络来压缩LLMs。论文代码：https://github.com/mshukor/ima-lmms。

更新时间: 2024-05-26 21:31:59

Categories: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.16700v1

Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning

In-context learning refers to the learning ability of a model during inference time without adapting its parameters. The input (i.e., prompt) to the model (e.g., transformers) consists of both a context (i.e., instance-label pairs) and a query instance. The model is then able to output a label for the query instance according to the context during inference. A possible explanation for in-context learning is that the forward pass of (linear) transformers implements iterations of gradient descent on the instance-label pairs in the context. In this paper, we prove by construction that transformers can also implement temporal difference (TD) learning in the forward pass, a phenomenon we refer to as in-context TD. We demonstrate the emergence of in-context TD after training the transformer with a multi-task TD algorithm, accompanied by theoretical analysis. Furthermore, we prove that transformers are expressive enough to implement many other policy evaluation algorithms in the forward pass, including residual gradient, TD with eligibility trace, and average-reward TD.
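
For readers unfamiliar with temporal difference learning, the following is a minimal sketch of tabular TD(0) policy evaluation, the kind of update the paper argues a transformer forward pass can implement. It is not the paper's construction. The three-state deterministic chain, the rewards, and the names `td0_policy_evaluation`, `NEXT_STATE`, and `REWARD` are assumptions chosen so that the true values have a closed form.

```python
def td0_policy_evaluation(next_state, reward, gamma=0.9, alpha=0.1, sweeps=2000):
    """Estimate state values with TD(0) on a deterministic Markov reward process.

    next_state[s] is the successor of state s, reward[s] the reward for leaving s.
    """
    values = [0.0] * len(next_state)
    state = 0
    for _ in range(sweeps * len(next_state)):
        successor = next_state[state]
        # TD error: bootstrapped target minus the current estimate.
        td_error = reward[state] + gamma * values[successor] - values[state]
        values[state] += alpha * td_error
        state = successor
    return values


# Hypothetical chain 0 -> 1 -> 2 -> 0 with one reward per transition.
NEXT_STATE = [1, 2, 0]
REWARD = [1.0, 0.0, 2.0]

estimated_values = td0_policy_evaluation(NEXT_STATE, REWARD)
print(estimated_values)
```

Because the chain is a deterministic cycle, the true value of state 0 is (r0 + gamma*r1 + gamma^2*r2) / (1 - gamma^3), and similarly for the other states, so the estimate can be checked directly.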

Updated: 2024-05-26 21:27:03

标题: 变压器学习时间差异方法用于上下文强化学习

摘要: 上下文学习是指模型在推理时学习能力，而无需调整其参数。输入（即提示）到模型（例如，变压器）包括上下文（即实例-标签对）和查询实例。然后，模型能够根据上下文在推理期间为查询实例输出标签。上下文学习的一个可能解释是（线性）变压器的前向传递在上下文中的实例-标签对上实现梯度下降的迭代。在本文中，我们通过构造证明，变压器在前向传递中也可以实现时序差异（TD）学习，我们将这种现象称为上下文TD。我们通过训练变压器与多任务TD算法一起，伴随着理论分析证明了上下文TD的出现。此外，我们证明变压器具有足够的表达力，可以在前向传递中实现许多其他策略评估算法，包括残差梯度、带有资格跟踪的TD和平均奖励TD。

更新时间: 2024-05-26 21:27:03

Categories: cs.LG

Download: http://arxiv.org/abs/2405.13861v2

Glocal Hypergradient Estimation with Koopman Operator

Gradient-based hyperparameter optimization methods update hyperparameters using hypergradients, gradients of a meta criterion with respect to hyperparameters. Previous research used two distinct update strategies: optimizing hyperparameters using global hypergradients obtained after completing model training or local hypergradients derived after every few model updates. While global hypergradients offer reliability, their computational cost is significant; conversely, local hypergradients provide speed but are often suboptimal. In this paper, we propose *glocal* hypergradient estimation, blending "global" quality with "local" efficiency. To this end, we use the Koopman operator theory to linearize the dynamics of hypergradients so that the global hypergradients can be efficiently approximated only by using a trajectory of local hypergradients. Consequently, we can optimize hyperparameters greedily using estimated global hypergradients, achieving both reliability and efficiency simultaneously. Through numerical experiments of hyperparameter optimization, including optimization of optimizers, we demonstrate the effectiveness of the glocal hypergradient estimation.

Updated: 2024-05-26 21:15:38

标题: 使用库普曼算子进行全局超梯度估计

摘要: 基于梯度的超参数优化方法使用超梯度更新超参数，超梯度是元标准相对于超参数的梯度。先前的研究采用了两种不同的更新策略：使用在完成模型训练后获得的全局超梯度来优化超参数，或者使用在每几次模型更新后导出的局部超梯度。虽然全局超梯度提供可靠性，但计算成本显著；相反，局部超梯度提供速度但通常不是最佳的。在本文中，我们提出了*glocal*超梯度估计，将“全局”质量与“局部”效率相结合。为此，我们使用Koopman算子理论来线性化超梯度的动态，以便全局超梯度只能通过使用局部超梯度轨迹来高效近似。因此，我们可以贪婪地使用估计的全局超梯度来优化超参数，同时实现可靠性和效率。通过超参数优化的数值实验，包括优化优化器，我们展示了*glocal*超梯度估计的有效性。

更新时间: 2024-05-26 21:15:38

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.02741v2

CNN Autoencoder Resizer: A Power-Efficient LoS/NLoS Detector in MIMO-enabled UAV Networks

Optimizing the design, performance, and resource efficiency of wireless networks (WNs) necessitates the ability to discern Line of Sight (LoS) and Non-Line of Sight (NLoS) scenarios across diverse applications and environments. Unmanned Aerial Vehicles (UAVs) exhibit significant potential in this regard due to their rapid mobility, aerial capabilities, and payload characteristics. Particularly, UAVs can serve as vital non-terrestrial base stations (NTBS) in the event of terrestrial base station (TBS) failures or downtime. In this paper, we propose CNN autoencoder resizer (CAR) as a framework that improves the accuracy of LoS/NLoS detection without demanding extra power consumption. Our proposed method increases the mean accuracy of detecting LoS/NLoS signals from 66% to 86%, while maintaining consistent power consumption levels. In addition, the resolution provided by CAR shows that it can be employed as a preprocessing tool in other methods to enhance the quality of signals.

Updated: 2024-05-26 21:12:34

标题: CNN自编码器调整器：MIMO无人机网络中的低功耗LoS/NLoS检测器

摘要: 优化无线网络（WNs）的设计、性能和资源效率需要能够区分视线（LoS）和非视线（NLoS）场景，涵盖各种应用和环境。无人机（UAVs）由于其快速移动性、空中能力和有效载荷特性，在这方面表现出显著潜力。特别地，无人机可以在地面基站（TBS）故障或停机时作为重要的非地面基站（NTBS）。在本文中，我们提出了CNN自编码器调整器（CAR）作为一个框架，提高LoS/NLoS检测的准确性，而不需额外功耗。我们提出的方法将LoS/NLoS信号的平均检测准确性从66%提高到86%，同时保持一致的功耗水平。此外，CAR提供的分辨率显示它可以作为其他方法中的预处理工具，以提高信号质量。

更新时间: 2024-05-26 21:12:34

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16697v1

ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference

Whereas the ability of deep networks to produce useful predictions has been amply demonstrated, estimating the reliability of these predictions remains challenging. Sampling approaches such as MC-Dropout and Deep Ensembles have emerged as the most popular ones for this purpose. Unfortunately, they require many forward passes at inference time, which slows them down. Sampling-free approaches can be faster but suffer from other drawbacks, such as lower reliability of uncertainty estimates, difficulty of use, and limited applicability to different types of tasks and data. In this work, we introduce a sampling-free approach that is generic and easy to deploy, while producing reliable uncertainty estimates on par with state-of-the-art methods at a significantly lower computational cost. It is predicated on training the network to produce the same output with and without additional information about it. At inference time, when no prior information is given, we use the network's own prediction as the additional information. We then take the distance between the predictions with and without prior information as our uncertainty measure. We demonstrate our approach on several classification and regression tasks. We show that it delivers results on par with those of Ensembles but at a much lower computational cost.
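
The two-step inference described in this abstract can be sketched as follows. The function `zigzag_inference` follows the abstract: one pass without prior information, a second pass that feeds the first prediction back in, and the distance between the two as the uncertainty. The `toy_model` is a purely hypothetical stand-in for a trained network. It is hand-written to be self-consistent on inputs in [-1, 1] and inconsistent elsewhere, only so that the sketch runs without a deep learning framework.

```python
def zigzag_inference(model, x):
    """Return (prediction, uncertainty) from two forward passes."""
    first = model(x, None)      # pass 1: no prior information available
    second = model(x, first)    # pass 2: the model's own prediction as the prior
    return first, abs(second - first)


def toy_model(x, prior):
    """Hypothetical stand-in for a network trained with and without a prior."""
    if prior is None:
        return 2.0 * x
    if abs(x) <= 1.0:
        # Pretend training range: the model reproduces a correct prior.
        return prior
    # Outside the pretend training range the two passes disagree.
    return 0.5 * (prior + x)


in_range = zigzag_inference(toy_model, 0.5)
out_of_range = zigzag_inference(toy_model, 3.0)
print(in_range, out_of_range)
```

For vector-valued outputs the absolute difference would be replaced by a norm of the difference. That choice is an assumption here, since the abstract only says "distance".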

Updated: 2024-05-26 21:10:08

标题: ZigZag: 通过两步推理实现的通用无抽样不确定性估计

摘要: 尽管深度网络产生有用预测的能力已得到充分证明，但估计这些预测的可靠性仍然具有挑战性。像MC-Dropout和Deep Ensembles这样的采样方法已成为这一目的中最受欢迎的方法。不幸的是，它们在推断时需要进行许多前向传递，这会减慢速度。无采样方法可能更快，但会受到其他缺点的影响，如不确定性估计的可靠性较低、使用困难以及适用于不同类型的任务和数据的限制。在这项工作中，我们介绍了一种无采样方法，它是通用且易于部署，同时产生与最先进方法相当的可靠性不确定性估计，但计算成本显著较低。其基础是训练网络在有和没有额外信息的情况下产生相同的输出。在推断时，当没有先验信息时，我们将网络自身的预测作为额外信息。然后，我们将具有和没有先验信息的预测之间的距离作为我们的不确定性测量。我们在几个分类和回归任务上展示了我们的方法。我们展示它提供了与集成方法相当的结果，但计算成本要低得多。

更新时间: 2024-05-26 21:10:08

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2211.11435v3

Lagrangian Neural Networks for Reversible Dissipative Evolution

Growing attention is being given to combining Lagrangian and Hamiltonian mechanics with network training in order to incorporate physics into the network. Most commonly, conservative systems are modeled, in which there are no frictional losses, so the system may be run forward and backward in time without requiring regularization. This work addresses systems in which the reverse direction is ill-posed because of the dissipation that occurs in forward evolution. The novelty is the use of the Morse-Feshbach Lagrangian, which models dissipative dynamics by doubling the number of dimensions of the system in order to create a mirror latent representation that counterbalances the dissipation of the observable system, making it a conservative system, albeit embedded in a larger space. We start with their formal approach by defining a new Dissipative Lagrangian, such that the unknown matrices in the Euler-Lagrange equations arise as partial derivatives of the Lagrangian with respect to only the observables. We then train a network on simulated data for dissipative systems, such as Fickian diffusion, that arise in materials science. Experiments show that the systems can be evolved in both forward and reverse directions without regularization beyond that provided by the Morse-Feshbach Lagrangian. Experiments on dissipative systems, such as Fickian diffusion, demonstrate the degree to which dynamics can be reversed.

Updated: 2024-05-26 21:03:09

标题: 拉格朗日神经网络用于可逆耗散演化

摘要: 越来越多的关注被给予利用拉格朗日和哈密顿力学与网络训练，以将物理学纳入网络中。最常见的是对保守系统进行建模，其中没有摩擦损失，因此系统可以在不需要正规化的情况下向前和向后运行。本文讨论了由于前向演化中发生的耗散而导致逆向方向不适宜的系统。其创新之处在于使用莫尔斯-费希巴赖氏量，通过将系统的维度数量加倍来模拟耗散动态，以创建一个镜像潜在表示，该表示将抵消可观察系统的耗散，使其成为一个保守系统，尽管嵌入在一个更大的空间中。我们从重新定义新的耗散性拉格朗日开始，使得欧拉-拉格朗日方程中的未知矩阵仅由拉格朗日对可观察变量的偏导数产生。然后，我们从模拟训练数据中训练网络，用于材料科学中出现的耗散系统，如菲克扩散。通过实验证明，系统可以在前向和逆向方向演化，而无需超出莫尔斯-费希巴拉格朗日提供的正规化。耗散系统的实验，如菲克扩散，展示了动态可以被逆转的程度。

更新时间: 2024-05-26 21:03:09

Categories: cs.LG,cond-mat.mtrl-sci

Download: http://arxiv.org/abs/2405.14645v2

Detection of decision-making manipulation in the pairwise comparisons method

Most decision-making models, including the pairwise comparison method, assume the decision-maker's honesty. However, it is easy to imagine a situation in which a decision-maker tries to manipulate the ranking results. This paper presents three simple manipulation methods in the pairwise comparison method. We then try to detect these methods using appropriately constructed neural networks. Experimental results on generated data accompany the proposed solutions, showing a considerable manipulation detection level.

Updated: 2024-05-26 20:58:12

标题: 检测成对比较方法中的决策操纵

摘要: 大多数决策模型，包括成对比较方法，都假设决策者是诚实的。然而，很容易想象出一种情况，即决策者试图操纵排名结果。本文提出了成对比较方法中的三种简单操纵方法。然后我们尝试使用适当构建的神经网络来检测这些方法。实验结果伴随提出的解决方案在生成的数据上显示了相当高的操纵检测水平。

更新时间: 2024-05-26 20:58:12

Categories: cs.AI,cs.DM

Download: http://arxiv.org/abs/2405.16693v1

Cost Function Unrolling in Unsupervised Optical Flow

Steepest descent algorithms, which are commonly used in deep learning, use the gradient as the descent direction, either as-is or after a direction shift using preconditioning. In many scenarios calculating the gradient is numerically hard due to complex or non-differentiable cost functions, specifically next to singular points. In this work we focus on the derivation of the Total Variation semi-norm commonly used in unsupervised cost functions. Specifically, we derive a differentiable proxy to the hard L1 smoothness constraint in a novel iterative scheme which we refer to as Cost Unrolling. Producing more accurate gradients during training, our method enables finer predictions of a given DNN model through improved convergence, without modifying its architecture or increasing computational complexity. We demonstrate our method in the unsupervised optical flow task. Replacing the L1 smoothness constraint with our unrolled cost during the training of a well known baseline, we report improved results on both MPI Sintel and KITTI 2015 unsupervised optical flow benchmarks. Particularly, we report EPE reduced by up to 15.82% on occluded pixels, where the smoothness constraint is dominant, enabling the detection of much sharper motion edges.

Updated: 2024-05-26 20:49:27

标题: 无监督光流中的成本函数展开

摘要: 最陡下降算法通常用于深度学习中，使用梯度作为下降方向，可以直接使用或在预处理之后进行方向调整。在许多情况下，由于复杂或不可微分的成本函数，特别是在奇异点附近，计算梯度通常很困难。本文关注常用于无监督成本函数中的全变差半范数的推导。具体来说，我们通过一种新颖的迭代方案提出了一个可微的代理来替代硬L1平滑约束，我们将其称为成本展开。在训练过程中产生更准确的梯度，我们的方法通过改进收敛性，能够通过提高精度预测给定的DNN模型，而无需修改其架构或增加计算复杂性。我们在无监督光流任务中展示了我们的方法。在训练一个知名基准模型时，用我们展开的成本替代L1平滑约束，我们在MPI Sintel和KITTI 2015的无监督光流基准上报告了改进的结果。特别是，在被遮挡像素上，我们报告EPE降低了高达15.82%，在这里平滑约束占主导地位，使得能够检测到更锐利的运动边缘。

更新时间: 2024-05-26 20:49:27

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2011.14814v3

Explorations of Self-Repair in Language Models

Prior interpretability research studying narrow distributions has preliminarily identified self-repair, a phenomenon where if components in large language models are ablated, later components will change their behavior to compensate. Our work builds off this past literature, demonstrating that self-repair exists on a variety of model families and sizes when ablating individual attention heads on the full training distribution. We further show that on the full training distribution self-repair is imperfect, as the original direct effect of the head is not fully restored, and noisy, since the degree of self-repair varies significantly across different prompts (sometimes overcorrecting beyond the original effect). We highlight two different mechanisms that contribute to self-repair, including changes in the final LayerNorm scaling factor and sparse sets of neurons implementing Anti-Erasure. We additionally discuss the implications of these results for interpretability practitioners and close with a more speculative discussion on the mystery of why self-repair occurs in these models at all, highlighting evidence for the Iterative Inference hypothesis in language models, a framework that predicts self-repair.

Updated: 2024-05-26 20:47:41

标题: 语言模型中自我修复的探索

摘要: 过去的可解释性研究已初步确定了自我修复现象，即如果在大型语言模型中去除组件，则后续组件将改变其行为以进行补偿。我们的工作基于这些过去的文献，证明了在完整的训练分布上去除单个注意力头时，自我修复存在于各种模型家族和大小中。我们进一步表明，在完整的训练分布上，自我修复是不完美的，因为头部的原始直接效应并未完全恢复，并且是嘈杂的，因为自我修复的程度在不同提示之间变化显著（有时会超出原始效应）。我们强调了两种不同的机制，对自我修复的贡献，包括最终LayerNorm缩放因子的变化和实施Anti-Erasure的稀疏神经元集。我们还讨论了这些结果对可解释性从业者的影响，并最后用更加推测性的讨论结束，探讨为什么这些模型中会发生自我修复的神秘，强调了语言模型中迭代推理假设的证据，该框架预测了自我修复。

更新时间: 2024-05-26 20:47:41

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2402.15390v2

Machine Learning Clifford invariants of ADE Coxeter elements

There has been recent interest in novel Clifford geometric invariants of linear transformations. This motivates the investigation of such invariants for a certain type of geometric transformation of interest in the context of root systems, reflection groups, Lie groups and Lie algebras: the Coxeter transformations. We perform exhaustive calculations of all Coxeter transformations for $A_8$, $D_8$ and $E_8$ for a choice of basis of simple roots and compute their invariants, using high-performance computing. This computational algebra paradigm generates a dataset that can then be mined using techniques from data science such as supervised and unsupervised machine learning. In this paper we focus on neural network classification and principal component analysis. Since the output -- the invariants -- is fully determined by the choice of simple roots and the permutation order of the corresponding reflections in the Coxeter element, we expect huge degeneracy in the mapping. This provides the perfect setup for machine learning, and indeed we see that the datasets can be machine learned to very high accuracy. This paper is a pump-priming study in experimental mathematics using Clifford algebras, showing that such Clifford algebraic datasets are amenable to machine learning, and shedding light on relationships between these novel and other well-known geometric invariants and also giving rise to analytic results.

Updated: 2024-05-26 20:33:55

标题: 机器学习ADE Coxeter元素的Clifford不变量

摘要: 最近对线性变换的新Clifford几何不变量产生了兴趣。这促使我们研究在根系、反射群、李群和李代数背景下感兴趣的一种几何变换的这些不变量：Coxeter变换。我们对$A_8$、$D_8$和$E_8$的所有Coxeter变换进行详尽计算，选择一组简单根基，并使用高性能计算来计算它们的不变量。这种计算代数范式生成了一个数据集，可以使用数据科学技术如监督和无监督机器学习进行挖掘。在本文中，我们专注于神经网络分类和主成分分析。由于输出——不变量——完全由简单根基的选择和对应反射在Coxeter元素中的排列顺序决定，我们预期在映射中会存在巨大的退化。这为机器学习提供了完美的设置，事实上我们看到数据集可以被机器学习到非常高的准确性。本文是在使用Clifford代数进行实验数学研究的启动研究，展示了这种Clifford代数数据集适合机器学习，并揭示了这些新颖和其他众所周知的几何不变量之间的关系，同时也产生了分析结果。

更新时间: 2024-05-26 20:33:55

Categories: cs.LG,hep-th,math-ph,math.GR,math.MP,math.RT

Download: http://arxiv.org/abs/2310.00041v2

gzip Predicts Data-dependent Scaling Laws

Past work has established scaling laws that predict the performance of a neural language model (LM) as a function of its parameter count and the number of tokens it's trained on, enabling optimal allocation of a fixed compute budget. Are these scaling laws agnostic to training data, as some prior work suggests? We generate training datasets of varying complexities by modulating the syntactic properties of a PCFG, finding that 1) scaling laws are sensitive to differences in data complexity and that 2) gzip, a compression algorithm, is an effective predictor of how data complexity impacts scaling properties. We propose a new data-dependent scaling law for LMs that accounts for the training data's gzip-compressibility; its compute-optimal frontier increasingly favors dataset size over parameter count as training data becomes harder to compress.
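
A rough sketch of how gzip-compressibility might be measured for a dataset sample, using only Python's standard `gzip` module. The definition below, compressed size divided by original size, is an assumption about what "gzip-compressibility" means here. The two byte strings are illustrative stand-ins for low-complexity and high-complexity training data, not the PCFG-generated datasets from the paper.

```python
import gzip
import random


def gzip_compressibility(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(gzip.compress(data)) / len(data)


# Highly repetitive text stands in for syntactically simple data.
repetitive = b"the cat sat on the mat. " * 400

# Seeded pseudo-random bytes stand in for data that is hard to compress.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

print(gzip_compressibility(repetitive), gzip_compressibility(noisy))
```

Under the abstract's claim, a corpus whose ratio is closer to that of `noisy` would shift the compute-optimal allocation toward more training tokens relative to parameters.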

Updated: 2024-05-26 20:33:08

标题: gzip 预测数据相关的缩放定律

摘要: 过去的研究建立了预测神经语言模型（LM）性能的缩放定律，这些定律是参数数量和训练标记数量的函数，从而使固定计算预算得以最佳分配。这些缩放定律是否与一些先前工作所暗示的训练数据无关？我们通过调制PCFG的句法属性生成不同复杂性的训练数据集，发现1）缩放定律对数据复杂性的差异敏感，2）gzip，一种压缩算法，是预测数据复杂性如何影响缩放属性的有效方法。我们提出了一种新的基于数据的LM缩放定律，该定律考虑了训练数据的gzip可压缩性；随着训练数据变得难以压缩，其计算最优前沿在数据集大小偏好（优于参数数量偏好）上增加。

更新时间: 2024-05-26 20:33:08

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.16684v1

Toward Digitalization: A Secure Approach to Find a Missing Person Using Facial Recognition Technology

Facial recognition is a technique based on machine learning that can recognize a human being by analyzing his or her facial profile, and it is applied to various types of real-world problems nowadays. In this paper, a common real-world problem, finding a missing person, is solved in a secure and effective way with the help of facial recognition technology. Although a few works address this problem, the proposed work is unique with respect to its security, design, and feasibility. Impeding intruders from participating in the processes and giving importance to both finders and family members of a missing person are two of the major features of this work. Evidence that our system works in finding a missing person is described in the result section of the paper. The advantages that our system provides over other existing systems can be seen from the comparisons described in the result summary section of the paper. The work is capable of providing a worthy solution for finding a missing person on a digital platform.

Updated: 2024-05-26 20:25:04

标题: 走向数字化：使用人脸识别技术安全寻找失踪者的方法

摘要: 面部识别是一种基于机器学习技术的技术，可以通过分析一个人的面部轮廓来识别他，目前在解决各种类型的现实世界问题中得到应用。本文介绍了一种常见的现实世界问题，即如何通过面部识别技术安全有效地找到失踪人员。尽管已经存在一些解决这一问题的方法，但本文提出的方法在安全性、设计和可行性方面都具有独特性。阻止入侵者参与过程并重视失踪人员的寻找者和家人是本文的两个主要特点。我们系统在找到失踪人员方面的工作证明已在论文的结果部分中描述。我们系统相对于其他现有系统的优势可以从论文的结果摘要部分的比较中体现出来。该工作能够在数字平台上提供一个有价值的解决方案来寻找失踪人员。

更新时间: 2024-05-26 20:25:04

Categories: cs.CV,cs.CY,cs.LG

Download: http://arxiv.org/abs/2405.16683v1

MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies

Autoregressive language models are trained by minimizing the cross-entropy of the model distribution Q relative to the data distribution P -- that is, minimizing the forward cross-entropy, which is equivalent to maximum likelihood estimation (MLE). We have observed that models trained in this way may "over-generalize", in the sense that they produce non-human-like text. Moreover, we believe that reverse cross-entropy, i.e., the cross-entropy of P relative to Q, is a better reflection of how a human would evaluate text generated by a model. Hence, we propose learning with MixCE, an objective that mixes the forward and reverse cross-entropies. We evaluate models trained with this objective on synthetic data settings (where P is known) and real data, and show that the resulting models yield better generated text without complex decoding strategies. Our code and models are publicly available at https://github.com/bloomberg/mixce-acl2023
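
The two cross-entropies named in this abstract can be written out for small discrete distributions, which corresponds to the synthetic setting where P is known. This is a sketch of the definitions only, not the authors' training objective for real data. The mixing weight `eta`, the function names, and the example distributions are assumptions.

```python
import math


def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x) for discrete distributions."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0.0)


def mixce(p_data, q_model, eta=0.5):
    """Mix of forward and reverse cross-entropy, weighted by eta."""
    forward = cross_entropy(p_data, q_model)  # -E_P[log Q], the MLE objective
    reverse = cross_entropy(q_model, p_data)  # -E_Q[log P]
    return eta * forward + (1.0 - eta) * reverse


# Hypothetical data distribution and two candidate model distributions.
P_DATA = [0.7, 0.2, 0.1]
Q_SPREAD = [1 / 3, 1 / 3, 1 / 3]   # spreads mass onto unlikely outcomes
Q_SHARP = [0.9, 0.08, 0.02]        # concentrates on the likely outcome

for name, q in (("spread", Q_SPREAD), ("sharp", Q_SHARP)):
    print(name, cross_entropy(P_DATA, q), cross_entropy(q, P_DATA), mixce(P_DATA, q))
```

The reverse term charges the model for every unit of probability it places on outcomes the data distribution considers unlikely, so it penalizes `Q_SPREAD` more heavily than `Q_SHARP`.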

Updated: 2024-05-26 20:24:55

标题: MixCE：通过混合正向和反向交叉熵训练自回归语言模型

摘要: 自回归语言模型通过最小化模型分布Q相对于数据分布P的交叉熵来进行训练 - 即最小化前向交叉熵，这等同于最大似然估计（MLE）。我们观察到以这种方式训练的模型可能会"过度泛化"，即它们产生非人类的文本。此外，我们认为反向交叉熵，即P相对于Q的交叉熵，更能反映人类如何评估模型生成的文本。因此，我们提出使用MixCE进行学习，这是一个将前向和反向交叉熵混合的目标函数。我们在已知P的合成数据设置（P已知）和真实数据上评估使用该目标函数训练的模型，并展示结果模型生成的文本更好，而无需复杂的解码策略。我们的代码和模型可在https://github.com/bloomberg/mixce-acl2023 上公开获取。

更新时间: 2024-05-26 20:24:55

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2305.16958v2

A Systematic Review of Federated Generative Models

Federated Learning (FL) has emerged as a solution for distributed systems that allow clients to train models on their data and only share models instead of local data. Generative Models are designed to learn the distribution of a dataset and generate new data samples that are similar to the original data. Many prior works have tried proposing Federated Generative Models. Using Federated Learning and Generative Models together can be susceptible to attacks, and designing the optimal architecture remains challenging. This survey covers the growing interest in the intersection of FL and Generative Models by comprehensively reviewing research conducted from 2019 to 2024. We systematically compare nearly 100 papers, focusing on their FL and Generative Model methods and privacy considerations. To make this field more accessible to newcomers, we highlight the state-of-the-art advancements and identify unresolved challenges, offering insights for future research in this evolving field.

Updated: 2024-05-26 20:20:44

标题: 一个联合生成模型的系统综述

摘要: 联邦学习（FL）已经成为分布式系统的解决方案，允许客户在其数据上训练模型，并仅共享模型，而不是本地数据。生成模型旨在学习数据集的分布，并生成类似于原始数据的新数据样本。许多先前的研究尝试提出联邦生成模型。将联邦学习和生成模型结合使用可能容易受到攻击，并且设计最佳架构仍然具有挑战性。本调查涵盖了对FL和生成模型交叉领域的兴趣的增长，通过全面审查从2019年到2024年进行的研究。我们系统地比较了近100篇论文，重点关注它们的FL和生成模型方法以及隐私考虑。为了使这一领域更容易接近初学者，我们突出了最新的进展，并确定了未解决的挑战，为这一不断发展的领域的未来研究提供了见解。

更新时间: 2024-05-26 20:20:44

Categories: cs.LG,cs.CL,cs.CR

Download: http://arxiv.org/abs/2405.16682v1

Discovery and Expansion of New Domains within Diffusion Models

In this work, we study the generalization properties of diffusion models in a few-shot setup, introduce a novel tuning-free paradigm to synthesize the target out-of-domain (OOD) data, and demonstrate its advantages compared to existing methods in data-sparse scenarios with large domain gaps. Specifically, given a pre-trained model and a small set of images that are OOD relative to the model's training distribution, we explore whether the frozen model is able to generalize to this new domain. We begin by revealing that Denoising Diffusion Probabilistic Models (DDPMs) trained on single-domain images are already equipped with sufficient representation abilities to reconstruct arbitrary images from the inverted latent encoding following bi-directional deterministic diffusion and denoising trajectories. We then demonstrate through both theoretical and empirical perspectives that the OOD images establish Gaussian priors in latent spaces of the given model, and the inverted latent modes are separable from their initial training domain. We then introduce our novel tuning-free paradigm to synthesize new images of the target unseen domain by discovering qualified OOD latent encodings in the inverted noisy spaces. This is fundamentally different from the current paradigm that seeks to modify the denoising trajectory to achieve the same goal by tuning the model parameters. Extensive cross-model and domain experiments show that our proposed method can expand the latent space and generate unseen images via frozen DDPMs without impairing the quality of generation of their original domain. We also showcase a practical application of our proposed heuristic approach in dramatically different domains using astrophysical data, revealing the great potential of such a generalization paradigm in data-sparse fields such as scientific explorations.

Updated: 2024-05-26 20:17:35

标题: 发现和扩展扩散模型中的新领域

摘要: 在这项工作中，我们研究了扩散模型在少样本设置中的泛化特性，引入了一种新颖的无调谐范式来合成目标域外（OOD）数据，并展示了与现有方法相比，在数据稀疏且具有较大域差距的情况下的优势。具体来说，给定一个预训练模型和一小组相对于模型训练分布为OOD的图像，我们探讨冻结模型是否能够泛化到这个新领域。我们首先揭示，在单域图像上训练的去噪扩散概率模型（DDPMs）已经具备足够的表示能力，可以通过双向确定性扩散和去噪轨迹重建任意图像的倒置潜在编码。然后，我们通过理论和实证角度证明，OOD图像在给定模型的潜在空间中建立了高斯先验，倒置潜在模式可以与其初始训练领域分离。接着，我们引入我们的新颖无调谐范式，通过在倒置嘈杂空间中发现合格的OOD潜在编码来合成目标未见域的新图像。这与当前寻求通过调整模型参数修改去噪轨迹以达到相同目标的范式有根本的不同。广泛的跨模型和领域实验表明，我们提出的方法可以通过冻结DDPMs扩展潜在空间，并生成未见图像，而不会损害其原始领域生成质量。我们还展示了我们提出的启发式方法在截然不同的领域（如天体物理数据）中的实际应用，揭示了这种泛化范式在科学探索等数据稀缺领域中的巨大潜力。

更新时间: 2024-05-26 20:17:35

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2310.09213v2

Octo: An Open-Source Generalist Robot Policy

Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sensors and action spaces, accommodate a variety of commonly used robotic platforms, and finetune readily and efficiently to new domains. In this work, we aim to lay the groundwork for developing open-source, widely applicable, generalist policies for robotic manipulation. As a first step, we introduce Octo, a large transformer-based policy trained on 800k trajectories from the Open X-Embodiment dataset, the largest robot manipulation dataset to date. It can be instructed via language commands or goal images and can be effectively finetuned to robot setups with new sensory inputs and action spaces within a few hours on standard consumer GPUs. In experiments across 9 robotic platforms, we demonstrate that Octo serves as a versatile policy initialization that can be effectively finetuned to new observation and action spaces. We also perform detailed ablations of design decisions for the Octo model, from architecture to training data, to guide future research on building generalist robot models.

Updated: 2024-05-26 19:55:26

标题: Octo：一种开源通用机器人策略

摘要: 在多样化的机器人数据集上预训练的大型政策具有改变机器人学习的潜力：与从头开始训练新政策不同，这种通用机器人政策可以通过少量领域内数据进行微调，但具有广泛的泛化能力。然而，为了在各种机器人学习场景、环境和任务中普遍适用，这些政策需要处理多样化的传感器和动作空间，适应各种常用的机器人平台，并且能够迅速有效地在新领域进行微调。在这项工作中，我们旨在为开发面向机器人操作的开源、广泛适用的通用政策奠定基础。作为第一步，我们介绍了Octo，这是一个基于大型变压器的政策，经过800k条轨迹的训练，这是迄今为止最大的机器人操作数据集Open X-Embodiment数据集。它可以通过语言命令或目标图像进行指导，并且可以在标准消费级GPU上几小时内有效地微调到具有新感知输入和动作空间的机器人设置。在9个机器人平台的实验中，我们展示了Octo作为一种多功能政策初始化，可以有效地微调到新的观察和动作空间。我们还对Octo模型的设计决策进行了详细的消融实验，从架构到训练数据，以指导未来建立通用机器人模型的研究。

更新时间: 2024-05-26 19:55:26

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2405.12213v2

Rethinking Transfer Learning for Medical Image Classification

Transfer learning (TL) from pretrained deep models is a standard practice in modern medical image classification (MIC). However, which levels of features should be reused is problem-dependent, and uniformly finetuning all layers of pretrained models may be suboptimal. This insight has partly motivated the recent differential TL strategies, such as TransFusion (TF) and layer-wise finetuning (LWFT), which treat the layers in the pretrained models differentially. In this paper, we add one more strategy to this family, called TruncatedTL, which reuses and finetunes appropriate bottom layers and directly discards the remaining layers. This yields not only superior MIC performance but also compact models for efficient inference, compared to other differential TL methods. Our code is available at: https://github.com/sun-umn/TTL

Updated: 2024-05-26 19:45:01

标题: 重新思考医学图像分类的迁移学习

摘要: 来自预训练深度模型的迁移学习（TL）是现代医学图像分类（MIC）中的标准实践。然而，要重复使用哪个级别的特征取决于具体问题，并且统一微调预训练模型的所有层可能不是最优选择。这一观点在一定程度上促使了最近的差异化TL策略，如TransFusion（TF）和逐层微调（LWFT），这些策略对待预训练模型中的层有所不同。在本文中，我们将另一种策略加入到这个家族中，称为TruncatedTL，它重复使用并微调适当的底层，并直接丢弃其余层。与其他差异化TL方法相比，这不仅产生了更优越的MIC性能，还可以得到紧凑的模型以进行高效推理。我们的代码可以在以下网址找到：https://github.com/sun-umn/TTL

更新时间: 2024-05-26 19:45:01

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2106.05152v8

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Deep learning models have achieved significant success across various applications but continue to struggle with tasks requiring complex reasoning over sequences, such as function composition and compositional tasks. Despite advancements, models like Structured State Space Models (SSMs) and Transformers underperform in deep compositionality tasks due to inherent architectural and training limitations. Maintaining accuracy over multiple reasoning steps remains a primary challenge, as current models often rely on shortcuts rather than genuine multi-step reasoning, leading to performance degradation as task complexity increases. Existing research highlights these shortcomings but lacks comprehensive theoretical and empirical analysis for SSMs. Our contributions address this gap by providing a theoretical framework based on complexity theory to explain SSMs' limitations. Moreover, we present extensive empirical evidence demonstrating how these limitations impair function composition and algorithmic task performance. Our experiments reveal significant performance drops as task complexity increases, even with Chain-of-Thought (CoT) prompting. Models frequently resort to shortcuts, leading to errors in multi-step reasoning. This underscores the need for innovative solutions beyond current deep learning paradigms to achieve reliable multi-step reasoning and compositional task-solving in practical applications.

Updated: 2024-05-26 19:33:23

Categories: cs.LG,cs.CC,cs.LO

Download: http://arxiv.org/abs/2405.16674v1

Transfer Learning Under High-Dimensional Graph Convolutional Regression Model for Node Classification

Node classification is a fundamental task, but obtaining node classification labels can be challenging and expensive in many real-world scenarios. Transfer learning has emerged as a promising solution to this challenge by leveraging knowledge from source domains to enhance learning in a target domain. Existing transfer learning methods for node classification primarily focus on integrating Graph Convolutional Networks (GCNs) with various transfer learning techniques. While these approaches have shown promising results, they often suffer from a lack of theoretical guarantees, restrictive conditions, and high sensitivity to hyperparameter choices. To overcome these limitations, we propose a Graph Convolutional Multinomial Logistic Regression (GCR) model and a transfer learning method based on the GCR model, called Trans-GCR. We provide theoretical guarantees for the estimate obtained under the GCR model in high-dimensional settings. Moreover, Trans-GCR demonstrates superior empirical performance, has a low computational cost, and requires fewer hyperparameters than existing methods.

Updated: 2024-05-26 19:30:14

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2405.16672v1

Mixture of Experts Using Tensor Products

In multi-task learning, the conventional approach involves training a model on multiple tasks simultaneously. However, the training signals from different tasks can interfere with one another, potentially leading to negative transfer. To mitigate this, we investigate whether modular language models can facilitate positive transfer and systematic generalization. Specifically, we propose a novel modular language model (TensorPoly) that balances parameter efficiency with nuanced routing methods. For modules, we reparameterize Low-Rank Adaptation (LoRA) by employing an entangled tensor through the use of tensor product operations and name the resulting approach TLoRA. For the routing function, we tailor two innovative routing functions according to granularity: TensorPoly-I, which routes to each rank within the entangled tensor, and TensorPoly-II, which offers a finer-grained routing approach targeting each order of the entangled tensor. The experimental results from the multi-task T0 benchmark demonstrate that: 1) all modular LMs surpass the corresponding dense approaches, highlighting the potential of modular language models to mitigate negative interference in multi-task learning and deliver superior outcomes; 2) TensorPoly-I achieves higher parameter efficiency in adaptation and outperforms other modular LMs, which shows the potential of our approach in multi-task transfer learning.

Updated: 2024-05-26 19:25:08

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.16671v1

Provably Efficient Off-Policy Adversarial Imitation Learning with Convergence Guarantees

Adversarial Imitation Learning (AIL) faces challenges with sample inefficiency because of its reliance on sufficient on-policy data to evaluate the performance of the current policy during reward function updates. In this work, we study the convergence properties and sample complexity of off-policy AIL algorithms. We show that, even in the absence of importance sampling correction, reusing samples generated by the $o(\sqrt{K})$ most recent policies, where $K$ is the number of iterations of policy updates and reward updates, does not undermine the convergence guarantees of this class of algorithms. Furthermore, our results indicate that the distribution shift error induced by off-policy updates is dominated by the benefits of having more data available. This result provides theoretical support for the sample efficiency of off-policy AIL algorithms. To the best of our knowledge, this is the first work that provides theoretical guarantees for off-policy AIL algorithms.
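
The sample-reuse idea in the abstract can be pictured as a sliding window over recent policies. The sketch below is illustrative only and is not the paper's algorithm: the class name `RecentPolicyBuffer`, the helper `window_size`, and the choice of a window of size K^0.4 (one of many sequences that grow as o(√K)) are all assumptions made for this example.

```python
import math
from collections import deque


def window_size(num_iterations: int, exponent: float = 0.4) -> int:
    """Number of recent policies to keep (hypothetical rule for this sketch).

    Any exponent strictly below 0.5 makes K**exponent grow as o(sqrt(K)).
    """
    if not 0.0 < exponent < 0.5:
        raise ValueError("exponent must lie in (0, 0.5) for an o(sqrt(K)) window")
    return max(1, math.floor(num_iterations ** exponent))


class RecentPolicyBuffer:
    """Keeps the sample batches generated by the most recent policies only."""

    def __init__(self, num_iterations: int, exponent: float = 0.4):
        # A deque with maxlen silently drops the oldest batch when full.
        self.batches = deque(maxlen=window_size(num_iterations, exponent))

    def add(self, batch):
        """Store the samples collected with the current policy."""
        self.batches.append(list(batch))

    def all_samples(self):
        """Return every retained sample, oldest policy first, with no importance weights."""
        return [sample for batch in self.batches for sample in batch]
```

With K = 10000 iterations this rule keeps 39 policies' worth of data, so a reward update at a late iteration sees the batches of the last 39 policies rather than only the current one.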

Updated: 2024-05-26 19:17:32

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16668v1

Machine Learning and Data Analysis Using Posets: A Survey

Posets are discrete mathematical structures that are ubiquitous in a broad range of data analysis and machine learning applications. Research connecting posets to the data science domain has been ongoing for many years. In this paper, a wide range of studies on data analysis and machine learning using posets is comprehensively reviewed in terms of theory, algorithms, and applications. In addition, formal concept analysis, an applied domain of lattice theory, is highlighted in terms of its machine learning applications.

Updated: 2024-05-26 19:16:11

Categories: cs.LG,06A06

Download: http://arxiv.org/abs/2404.03082v2

Comments on Friedman's Method for Class Distribution Estimation

The purpose of class distribution estimation (also known as quantification) is to determine the values of the prior class probabilities in a test dataset without class label observations. A variety of methods to achieve this have been proposed in the literature, most of them based on the assumption that the distributions of the training and test data are related through prior probability shift (also known as label shift). Among these methods, Friedman's method has recently been found to perform relatively well both for binary and multi-class quantification. We discuss the properties of Friedman's method and another approach mentioned by Friedman (called DeBias method in the literature) in the context of a general framework for designing linear equation systems for class distribution estimation.
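
To make the "linear equation system" idea concrete, here is a minimal binary sketch. It is not Friedman's method or the DeBias method; it is the generic adjusted-count style estimator, chosen here as an assumption because it is the simplest instance. Under prior probability shift, the rate at which a fixed classifier predicts the positive class on the test set equals p · TPR + (1 − p) · FPR, where p is the unknown test prior and TPR and FPR are measured on training data. Solving this single equation for p gives the estimate. The function name and arguments are hypothetical.

```python
def estimate_positive_prior(test_predictions, tpr, fpr):
    """Estimate the positive-class prior in an unlabeled test set.

    test_predictions: iterable of 0/1 hard predictions on the test set.
    tpr, fpr: true and false positive rates of the same classifier,
              estimated on labeled training data.
    Solves  observed_rate = p * tpr + (1 - p) * fpr  for p.
    """
    predictions = list(test_predictions)
    if not predictions:
        raise ValueError("need at least one test prediction")
    if tpr == fpr:
        raise ValueError("classifier is uninformative: tpr equals fpr")
    observed_rate = sum(predictions) / len(predictions)
    p = (observed_rate - fpr) / (tpr - fpr)
    # Clip, because sampling noise can push the solution outside [0, 1].
    return min(1.0, max(0.0, p))
```

For example, if half of the test points are predicted positive by a classifier with TPR 0.8 and FPR 0.2, the estimated positive prior is 0.5. With more than two classes the same reasoning yields one equation per chosen feature function and the unknown priors are found by solving the resulting linear system.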

Updated: 2024-05-26 19:13:51

Categories: cs.LG,stat.ML,62F10, 62P30

Download: http://arxiv.org/abs/2405.16666v1

Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Security concerns related to Large Language Models (LLMs) have been extensively explored, yet the safety implications for Multimodal Large Language Models (MLLMs), particularly in medical contexts (MedMLLMs), remain insufficiently studied. This paper examines the underexplored security vulnerabilities of MedMLLMs, especially when deployed in clinical environments where the accuracy and relevance of question-and-answer interactions are critically tested against complex medical challenges. By combining existing clinical medical data with atypical natural phenomena, we redefine two types of attacks: the mismatched malicious attack (2M-attack) and the optimized mismatched malicious attack (O2M-attack). Using our self-constructed, large-scale 3MAD dataset, which covers a wide range of medical image modalities and harmful medical scenarios, we conduct a comprehensive analysis and propose the MCM optimization method, which significantly enhances the attack success rate on MedMLLMs. Evaluations with this dataset and the novel attack methods, including white-box attacks on LLaVA-Med and transfer attacks on four other state-of-the-art models, indicate that even MedMLLMs designed with enhanced security features are vulnerable to security breaches. Our work underscores the urgent need for a concerted effort to implement robust security measures and enhance the safety and efficacy of open-source MedMLLMs, particularly given the potential severity of jailbreak attacks and other malicious or clinically significant exploits in medical settings. For further research and replication, anonymous access to our code is available at https://github.com/dirtycomputer/O2M_attack. Warning: Medical large model jailbreaking may generate content that includes unverified diagnoses and treatment recommendations. Always seek professional medical advice.

Updated: 2024-05-26 19:11:21

Categories: cs.CR,cs.AI,cs.CL,cs.MM

Download: http://arxiv.org/abs/2405.20775v1

Private Edge Density Estimation for Random Graphs: Optimal, Efficient and Robust

We give the first polynomial-time, differentially node-private, and robust algorithm for estimating the edge density of Erdős-Rényi random graphs and their generalization, inhomogeneous random graphs. We further prove information-theoretical lower bounds, showing that the error rate of our algorithm is optimal up to logarithmic factors. Previous algorithms incur either exponential running time or suboptimal error rates. Two key ingredients of our algorithm are (1) a new sum-of-squares algorithm for robust edge density estimation, and (2) the reduction from privacy to robustness based on sum-of-squares exponential mechanisms due to Hopkins et al. (STOC 2023).

Updated: 2024-05-26 18:59:44

Categories: cs.DS,cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.16663v1

RLSF: Reinforcement Learning via Symbolic Feedback

In recent years, large language models (LLMs) have had a dramatic impact on various sub-fields of AI, most notably on natural language understanding tasks. However, there is widespread agreement that the logical reasoning capabilities of contemporary LLMs are, at best, fragmentary (i.e., may work well on some problem instances but fail dramatically on others). While traditional LLM fine-tuning approaches (e.g., those that use human feedback) do address this problem to some degree, they suffer from many issues, including unsound black-box reward models, difficulties in collecting preference data, and sparse scalar reward values. To address these challenges, we propose a new training/fine-tuning paradigm we refer to as Reinforcement Learning via Symbolic Feedback (RLSF), which is aimed at enhancing the reasoning capabilities of LLMs. In the RLSF setting, the LLM that is being trained/fine-tuned is considered as the RL agent, while the environment is allowed access to reasoning or domain knowledge tools (e.g., solvers, algebra systems). Crucially, in RLSF, these reasoning tools can provide feedback to the LLMs via poly-sized certificates (e.g., proofs), that characterize errors in the LLM-generated object with respect to some correctness specification. The ability of RLSF-based training/fine-tuning to leverage certificate-generating symbolic tools enables sound fine-grained (token-level) reward signals to LLMs, and thus addresses the limitations of traditional reward models mentioned above. Via extensive evaluations, we show that our RLSF-based fine-tuning of LLMs outperforms traditional approaches on two different applications, namely, program synthesis from natural language pseudo-code to programming language (C++) and solving the Game of 24.
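
As a rough illustration of what a symbolic feedback tool could look like for the Game of 24, the sketch below checks a candidate arithmetic expression and returns a short error description. It is a hypothetical checker written for this digest, not the verifier used in the paper, and it returns one message per expression rather than the token-level signal the abstract describes. The function name `check_game_of_24` and the message wording are assumptions.

```python
import ast
from collections import Counter
from fractions import Fraction

# Supported binary operators, evaluated exactly with rational arithmetic.
_OPERATORS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def check_game_of_24(expression, numbers):
    """Return (is_correct, feedback) for a candidate Game of 24 answer."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return False, "expression does not parse"

    used = []  # integer literals that appear in the expression

    def evaluate(node):
        if (isinstance(node, ast.Constant) and isinstance(node.value, int)
                and not isinstance(node.value, bool)):
            used.append(node.value)
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            left = evaluate(node.left)
            right = evaluate(node.right)
            return _OPERATORS[type(node.op)](left, right)
        raise ValueError("only integers and + - * / are allowed")

    try:
        value = evaluate(tree.body)
    except ZeroDivisionError:
        return False, "division by zero"
    except ValueError as error:
        return False, str(error)

    if Counter(used) != Counter(numbers):
        return False, "must use each given number exactly once"
    if value != 24:
        return False, f"evaluates to {value}, not 24"
    return True, "ok"
```

A training loop could turn the boolean into a reward and pass the feedback string back to the model. Exact fractions are used so that answers such as `8 / (3 - 8 / 3)` are accepted without floating-point error.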

Updated: 2024-05-26 18:49:59

Categories: cs.CL,cs.AI,cs.LG,cs.LO

Download: http://arxiv.org/abs/2405.16661v1

Long Story Short: Omitted Variable Bias in Causal Machine Learning

We develop a general theory of omitted variable bias for a wide range of common causal parameters, including (but not limited to) averages of potential outcomes, average treatment effects, average causal derivatives, and policy effects from covariate shifts. Our theory applies to nonparametric models, while naturally allowing for (semi-)parametric restrictions (such as partial linearity) when such assumptions are made. We show how simple plausibility judgments on the maximum explanatory power of omitted variables are sufficient to bound the magnitude of the bias, thus facilitating sensitivity analysis in otherwise complex, nonlinear models. Finally, we provide flexible and efficient statistical inference methods for the bounds, which can leverage modern machine learning algorithms for estimation. These results allow empirical researchers to perform sensitivity analyses in a flexible class of machine-learned causal models using very simple, and interpretable, tools. We demonstrate the utility of our approach with two empirical examples.

Updated: 2024-05-26 18:43:02

Categories: econ.EM,cs.LG,stat.ME,stat.ML,62G

Download: http://arxiv.org/abs/2112.13398v5

Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation

We propose novel methodologies aimed at accelerating the grokking phenomenon, which refers to the rapid increase in test accuracy after a long period of overfitting, as reported by Power et al. (2022). Focusing on the grokking phenomenon that arises in learning arithmetic binary operations via the transformer model, we begin with a discussion on data augmentation in the case of commutative binary operations. To further accelerate grokking, we elucidate arithmetic operations through the lens of the Kolmogorov-Arnold (KA) representation theorem, revealing its correspondence to the transformer architecture: embedding, decoder block, and classifier. Observing the shared structure between KA representations associated with binary operations, we suggest various transfer learning mechanisms that expedite grokking. This interpretation is substantiated through a series of rigorous experiments. In addition, our approach is successful in learning two nonstandard arithmetic tasks: composition of operations and a system of equations. Furthermore, we reveal that the model is capable of learning arithmetic operations using a limited number of tokens under embedding transfer, which is supported by a set of experiments as well.

Updated: 2024-05-26 18:29:24

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.16658v1

Predicting Likely-Vulnerable Code Changes: Machine Learning-based Vulnerability Protections for Android Open Source Project

This paper presents a framework that selectively triggers security reviews for incoming source code changes. Functioning as a review bot within a code review service, the framework can automatically request additional security reviews at pre-submit time, before the code changes are submitted to a source code repository. Because performing such secure code reviews adds cost, the framework employs a classifier trained to identify code changes with a high likelihood of vulnerabilities. The online classifier leverages various types of input features to analyze the review patterns, track the software engineering process, and mine specific text patterns within given code changes. The classifier and its features are meticulously chosen and optimized using data from the submitted code changes and reported vulnerabilities in the Android Open Source Project (AOSP). The evaluation results demonstrate that our Vulnerability Prevention (VP) framework identifies approximately 80% of the vulnerability-inducing code changes in the dataset with a precision ratio of around 98% and a false positive rate of around 1.7%. We discuss the implications of deploying the VP framework in multi-project settings and future directions for Android security research. This paper explores and validates our approach to code change-granularity vulnerability prediction, offering a preventive technique for software security by preemptively detecting vulnerable code changes before submission.

Updated: 2024-05-26 18:17:46

Categories: cs.CR,cs.AI,cs.CY,cs.LG,cs.SE

Download: http://arxiv.org/abs/2405.16655v1

Self-Infilling Code Generation

This work introduces self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding. Our approach capitalizes on the observation that recent infilling-capable code language models can self-infill: whereas infilling operations aim to fill in the middle based on a predefined prefix and suffix, self-infilling sequentially generates both such surrounding context and the infilled content. We utilize this capability to introduce novel interruption and looping mechanisms in conventional decoding, evolving it into a non-monotonic process. Interruptions allow for postponing the generation of specific code until a definitive suffix is established, enhancing control over the output. Meanwhile, the looping mechanism, which leverages the complementary nature of self-infilling and left-to-right decoding, can iteratively update and synchronize each piece of generation cyclically. Extensive experiments are conducted to demonstrate that our proposed decoding process is effective in enhancing both regularity and quality across several code generation benchmarks.

Updated: 2024-05-26 18:15:39

Categories: cs.PL,cs.CL,cs.LG

Download: http://arxiv.org/abs/2311.17972v3

Improved off-policy training of diffusion samplers

We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public at https://github.com/GFNOrg/gfn-diffusion as a base for future work on diffusion models for amortized inference.

Updated: 2024-05-26 18:06:40

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.05098v3

Unveiling the Secrets: How Masking Strategies Shape Time Series Imputation

In this study, we explore the impact of different masking strategies on time series imputation models. We evaluate the effects of pre-masking versus in-mini-batch masking, normalization timing, and the choice between augmenting and overlaying artificial missingness. Using three diverse datasets, we benchmark eleven imputation models with different missing rates. Our results demonstrate that masking strategies significantly influence imputation accuracy, revealing that more sophisticated and data-driven masking designs are essential for robust model evaluation. We advocate for refined experimental designs and comprehensive disclosure to better simulate real-world patterns, enhancing the practical applicability of imputation models.
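
The difference between pre-masking and in-mini-batch masking can be sketched as follows. This reflects one plausible reading of the two terms, not the paper's code: pre-masking draws the artificial missingness once before training, while in-mini-batch masking redraws it every time the data is served. All function names are hypothetical, and masks are represented simply as sorted lists of hidden positions.

```python
import random


def sample_mask(length, missing_rate, rng):
    """Pick exactly round(length * missing_rate) positions to hide."""
    num_hidden = round(length * missing_rate)
    return sorted(rng.sample(range(length), num_hidden))


def pre_masking(series, missing_rate, num_epochs, seed=0):
    """Draw the mask once; every epoch sees the same hidden positions."""
    rng = random.Random(seed)
    mask = sample_mask(len(series), missing_rate, rng)
    return [mask for _ in range(num_epochs)]


def in_batch_masking(series, missing_rate, num_epochs, seed=0):
    """Redraw the mask each time the series is served to the model."""
    rng = random.Random(seed)
    return [sample_mask(len(series), missing_rate, rng) for _ in range(num_epochs)]
```

Under pre-masking the model is always evaluated on the same hidden values, whereas in-mini-batch masking exposes it to a different pattern on each pass, which is one reason the two strategies can yield different reported accuracy for the same model.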

Updated: 2024-05-26 18:05:12

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.17508v1

A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

The sparsely gated mixture of experts (MoE) architecture sends different inputs to different subnetworks, i.e., experts, through trainable routers. MoE reduces the training computation significantly for large models, but its deployment can still be memory- or computation-intensive for some downstream tasks. Model pruning is a popular approach to reduce inference computation, but its application to the MoE architecture is largely unexplored. To the best of our knowledge, this paper provides the first provably efficient technique for pruning experts in finetuned MoE models. We theoretically prove that prioritizing the pruning of the experts with a smaller change in the router's $\ell_2$ norm from the pretrained model guarantees the preservation of test accuracy, while significantly reducing the model size and the computational requirements. Although our theoretical analysis is centered on binary classification tasks on a simplified MoE architecture, our expert pruning method is verified on large vision MoE models such as VMoE and E3MoE finetuned on benchmark datasets such as CIFAR10, CIFAR100, and ImageNet.
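
The pruning criterion can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: the "change of the router's l2 norm" is interpreted here as the Euclidean distance between each expert's finetuned and pretrained router weight vectors, which is an assumption, and the dictionary layout and function names are hypothetical.

```python
import math


def router_change(pretrained_router, finetuned_router):
    """Euclidean distance between finetuned and pretrained router weights."""
    return math.sqrt(sum((f - p) ** 2
                         for p, f in zip(pretrained_router, finetuned_router)))


def experts_to_prune(pretrained, finetuned, num_to_prune):
    """Return the ids of the experts whose routers moved the least.

    pretrained, finetuned: dicts mapping expert id -> list of router weights.
    """
    changes = {expert: router_change(pretrained[expert], finetuned[expert])
               for expert in pretrained}
    # Smallest change first; ties are broken by expert id for determinism.
    ranked = sorted(changes, key=lambda expert: (changes[expert], expert))
    return ranked[:num_to_prune]
```

The intuition suggested by the abstract is that an expert whose router barely moved during finetuning contributed little to the downstream task, so removing it first is least likely to hurt test accuracy.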

Updated: 2024-05-26 17:52:58

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16646v1

Enabling Weak LLMs to Judge Response Reliability via Meta Ranking

Despite the strong performance of large language models (LLMs) across a wide range of tasks, they still have reliability issues. Previous studies indicate that strong LLMs like GPT-4-turbo excel in evaluating the reliability of responses from LLMs, but face efficiency and local deployment issues. Thus, to enable weak LLMs to effectively assess the reliability of LLM responses, we propose a novel cross-query-comparison-based method called Meta Ranking (MR). Unlike previous few-shot methods that rely solely on the in-context learning capabilities of LLMs, MR assesses reliability by pairwise ranking of the target query-response pair against multiple reference query-response pairs. We found that MR is highly effective in error detection for LLM responses, where weak LLMs, such as Phi-2, could surpass strong baselines like GPT-3.5-turbo, requiring only five reference samples and significantly improving efficiency. We further demonstrate that MR can enhance strong LLMs' performance in two practical applications: model cascading and instruction tuning. In model cascading, we combine open- and closed-source LLMs to achieve performance comparable to GPT-4-turbo with lower costs. In instruction tuning, we use MR for iterative training data filtering, significantly reducing data processing time and enabling LLaMA-7B and Phi-2 to surpass Alpaca-13B with fewer training tokens. These results underscore the high potential of MR in both efficiency and effectiveness.
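
The cross-query comparison idea can be sketched as a simple voting scheme. The aggregation rule below is an assumption made for illustration and is not taken from the paper: the target gains a vote when it is judged at least as good as a reference known to be reliable, and loses a vote when it is judged no better than a reference known to be unreliable. The `compare` callable stands in for a weak LLM judge; here it is replaced by a toy function, and all names are hypothetical.

```python
def meta_ranking_is_reliable(target_pair, references, compare, min_score=1):
    """Judge a target query-response pair against labeled reference pairs.

    references: list of (reference_pair, reference_is_reliable).
    compare(a, b): returns +1 if a is judged better than b, -1 if worse, 0 if tied.
    """
    score = 0
    for reference_pair, reference_is_reliable in references:
        outcome = compare(target_pair, reference_pair)
        if reference_is_reliable and outcome >= 0:
            score += 1  # at least as good as a known-good reference
        elif not reference_is_reliable and outcome <= 0:
            score -= 1  # no better than a known-bad reference
    return score >= min_score


def toy_compare(pair_a, pair_b):
    """Stand-in for an LLM judge: compares a hidden numeric quality field."""
    difference = pair_a["quality"] - pair_b["quality"]
    return (difference > 0) - (difference < 0)
```

In practice `compare` would prompt a small model with both query-response pairs, and the reference set would hold a handful of labeled examples (the abstract reports that five can suffice).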

Updated: 2024-05-26 17:46:42

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.12146v2

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name. Thus, models do not generalize a prevalent pattern in their training set: if "A is B" occurs, "B is A" is more likely to occur. It is worth noting, however, that if "A is B" appears in-context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code available at: https://github.com/lukasberglund/reversal_curse.

Updated: 2024-05-26 17:45:21

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2309.12288v4

Foundation Policies with Hilbert Representations

Unsupervised and self-supervised objectives, such as next token prediction, have enabled pre-training generalist models from large amounts of unlabeled data. In reinforcement learning (RL), however, finding a truly general and scalable unsupervised pre-training objective for generalist policies from offline data remains a major open question. While a number of methods have been proposed to enable generic self-supervised RL, based on principles such as goal-conditioned RL, behavioral cloning, and unsupervised skill learning, such methods remain limited in terms of either the diversity of the discovered behaviors, the need for high-quality demonstration data, or the lack of a clear adaptation mechanism for downstream tasks. In this work, we propose a novel unsupervised framework to pre-train generalist policies that capture diverse, optimal, long-horizon behaviors from unlabeled offline data such that they can be quickly adapted to arbitrary new tasks in a zero-shot manner. Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment, and then to span this learned latent space with directional movements, which enables various zero-shot policy "prompting" schemes for downstream tasks. Through our experiments on simulated robotic locomotion and manipulation benchmarks, we show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion, often even outperforming prior methods designed specifically for each setting. Our code and videos are available at https://seohong.me/projects/hilp/.

Updated: 2024-05-26 17:44:52

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2402.15567v2
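The phrase "span this learned latent space with directional movements" can be illustrated with a small sketch. It assumes a latent embedding is already available (the 2-D vectors below are hypothetical) and uses one plausible reading of the idea: reward a transition by how far it moves along a chosen latent direction `z`, and "prompt" a goal-reaching policy by pointing `z` at the goal's embedding. The function names and the exact reward form are assumptions for illustration, not taken from the abstract.

```python
import math

def directional_reward(phi_s, phi_next, z):
    """Inner product of the latent displacement (phi_next - phi_s) with direction z."""
    return sum((b - a) * zi for a, b, zi in zip(phi_s, phi_next, z))

def goal_direction(phi_s, phi_goal):
    """Unit vector pointing from the current latent state toward the goal's latent state."""
    diff = [g - s for s, g in zip(phi_s, phi_goal)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return [0.0] * len(diff)  # already at the goal: no preferred direction
    return [d / norm for d in diff]

# Hypothetical 2-D embeddings of the current state and a goal state.
phi_s, phi_goal = [0.0, 0.0], [3.0, 4.0]
z = goal_direction(phi_s, phi_goal)

# A step toward the goal earns positive reward, a step away earns negative reward.
toward = directional_reward(phi_s, [0.6, 0.8], z)
away = directional_reward(phi_s, [-0.6, -0.8], z)
```

Under this reading, a single direction-conditioned policy can be reused for many goals by changing only `z`, with no further training.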

Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning

In this paper, we obtain the Berry-Esseen bound for multivariate normal approximation for the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Our findings reveal that the fastest rate of normal approximation is achieved when setting the most aggressive step size $\alpha_{k} \asymp k^{-1/2}$. Moreover, we prove the non-asymptotic validity of the confidence intervals for parameter estimation with LSA based on multiplier bootstrap. This procedure updates the LSA estimate together with a set of randomly perturbed LSA estimates upon the arrival of subsequent observations. We illustrate our findings in the setting of temporal difference learning with linear function approximation.

Updated: 2024-05-26 17:43:30

Categories: stat.ML,cs.LG,math.OC,math.PR,math.ST,stat.TH,60F05, 62L20, 62E20

Download: http://arxiv.org/abs/2405.16644v1
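The online procedure described in this abstract can be sketched on a toy scalar problem. The sketch below runs LSA with step size proportional to $k^{-1/2}$, keeps the Polyak-Ruppert average, and updates a set of randomly perturbed trajectories on the same observations. The toy system, the constant `c0`, the two-point multiplier weights, and the crude min/max interval are all assumptions made for illustration; the paper's exact weights and interval construction are not given in the abstract.

```python
import random

def lsa_with_bootstrap(n_steps=5000, n_boot=20, c0=0.5, seed=0):
    """Averaged LSA for the toy system a * theta = b, plus multiplier-bootstrap copies."""
    rng = random.Random(seed)
    a_true, b_true = 1.0, 2.0           # solution is theta* = b / a = 2
    theta, theta_sum = 0.0, 0.0
    boot = [0.0] * n_boot               # perturbed LSA iterates
    boot_sum = [0.0] * n_boot
    for k in range(1, n_steps + 1):
        alpha = c0 / k ** 0.5           # step size alpha_k proportional to k^(-1/2)
        a_k = a_true + rng.gauss(0.0, 0.1)   # noisy observation of a
        b_k = b_true + rng.gauss(0.0, 0.1)   # noisy observation of b
        theta -= alpha * (a_k * theta - b_k)
        theta_sum += theta
        for j in range(n_boot):
            w = rng.choice((0.0, 2.0))  # assumed weights: mean 1, variance 1
            boot[j] -= alpha * w * (a_k * boot[j] - b_k)
            boot_sum[j] += boot[j]
    estimate = theta_sum / n_steps      # Polyak-Ruppert average
    deviations = sorted(s / n_steps - estimate for s in boot_sum)
    # Crude interval from the extreme bootstrap deviations (illustrative only).
    return estimate, estimate - deviations[-1], estimate - deviations[0]

estimate, lower, upper = lsa_with_bootstrap()
```

Each new observation is used once and then discarded, so the estimate and all perturbed copies can be maintained in a streaming fashion.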

On Convergence of the Alternating Directions SGHMC Algorithm

We study convergence rates of Hamiltonian Monte Carlo (HMC) algorithms with leapfrog integration under mild conditions on the stochastic gradient oracle for the target distribution (SGHMC). Our method extends standard HMC by allowing the use of general auxiliary distributions, which is achieved by a novel procedure of Alternating Directions. The convergence analysis is based on an investigation of the Dirichlet forms associated with the underlying Markov chain driving the algorithms. For this purpose, we provide a detailed analysis of the error of the leapfrog integrator for Hamiltonian motions with both the kinetic and potential energy functions in general form. We characterize the explicit dependence of the convergence rates on key parameters such as the problem dimension, functional properties of both the target and auxiliary distributions, and the quality of the oracle.

Updated: 2024-05-26 17:40:30

Categories: math.ST,cs.LG,math.PR,stat.TH

Download: http://arxiv.org/abs/2405.13140v2
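For readers unfamiliar with the leapfrog integrator that this abstract analyzes, here is a minimal scalar sketch of the standard scheme with separate gradients for the potential and kinetic energies, so that a non-Gaussian auxiliary (momentum) distribution can be plugged in. It shows only the textbook integrator, not the paper's Alternating Directions procedure; the function names and the harmonic-oscillator example are assumptions chosen for illustration.

```python
def leapfrog(q, p, grad_potential, grad_kinetic, step, n_steps):
    """Standard leapfrog: half step in momentum, full step in position, half step in momentum."""
    for _ in range(n_steps):
        p = p - 0.5 * step * grad_potential(q)
        q = q + step * grad_kinetic(p)
        p = p - 0.5 * step * grad_potential(q)
    return q, p

def energy(q, p):
    """Total energy of the toy system U(q) = q^2 / 2, K(p) = p^2 / 2."""
    return 0.5 * q * q + 0.5 * p * p

# Harmonic oscillator started at q = 1, p = 0, integrated to time t = 10.
q_end, p_end = leapfrog(1.0, 0.0, lambda q: q, lambda p: p, step=0.01, n_steps=1000)
```

In a stochastic-gradient setting, `grad_potential` would be replaced by a noisy oracle, which is where the oracle-quality dependence mentioned in the abstract would enter.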

Pick up the PACE: A Parameter-Free Optimizer for Lifelong Reinforcement Learning

A key challenge in lifelong reinforcement learning (RL) is the loss of plasticity, where previous learning progress hinders an agent's adaptation to new tasks. While regularization and resetting can help, they require precise hyperparameter selection at the outset and environment-dependent adjustments. Building on the principled theory of online convex optimization, we present a parameter-free optimizer for lifelong RL, called PACE, which requires no tuning or prior knowledge about the distribution shifts. Extensive experiments on Procgen, Atari, and Gym Control environments show that PACE works surprisingly well, mitigating loss of plasticity and rapidly adapting to challenging distribution shifts, despite the underlying optimization problem being nonconvex and nonstationary.

Updated: 2024-05-26 17:38:44

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.16642v1

State of the Art in Fair ML: From Moral Philosophy and Legislation to Fair Classifiers

Machine learning is becoming an ever-present part of our lives as many decisions, e.g. whether to grant credit, are no longer made by humans but by machine learning algorithms. However, those decisions are often unfair and discriminate against individuals belonging to protected groups based on race or gender. With the recent General Data Protection Regulation (GDPR) coming into effect, new awareness has been raised of such issues, and with computer scientists having such a large impact on people's lives, it is necessary that actions are taken to discover and prevent discrimination. This work aims to give an introduction to discrimination, the legislative foundations to counter it, and strategies to detect and prevent machine learning algorithms from showing such behavior.

Updated: 2024-05-26 17:32:06

Categories: cs.CY,cs.LG,stat.ML

Download: http://arxiv.org/abs/1811.09539v2

A Survey of Multimodal Large Language Model from A Data-centric Perspective

Human beings perceive the world through diverse senses such as sight, smell, hearing, and touch. Similarly, multimodal large language models (MLLMs) enhance the capabilities of traditional large language models by integrating and processing data from multiple modalities including text, vision, audio, video, and 3D environments. Data plays a pivotal role in the development and refinement of these models. In this survey, we comprehensively review the literature on MLLMs from a data-centric perspective. Specifically, we explore methods for preparing multimodal data during the pretraining and adaptation phases of MLLMs. Additionally, we analyze the evaluation methods for datasets and review benchmarks for evaluating MLLMs. Our survey also outlines potential future research directions. This work aims to provide researchers with a detailed understanding of the data-driven aspects of MLLMs, fostering further exploration and innovation in this field.

Updated: 2024-05-26 17:31:21

Categories: cs.AI,cs.CL,cs.CV,cs.MM

Download: http://arxiv.org/abs/2405.16640v1

A unified law of robustness for Bregman divergence losses

In contemporary deep learning practice, models are often trained to near-zero loss, i.e. to nearly interpolate the training data. However, the number of parameters in the model is usually far more than the number of data points $n$, the theoretical minimum needed for interpolation: a phenomenon referred to as overparameterization. In an interesting piece of work that contributes to the considerable research devoted to understanding overparameterization, Bubeck and Sellke showed that for a broad class of covariate distributions (specifically those satisfying a natural notion of concentration of measure), overparameterization is necessary for robust interpolation, i.e. if the interpolating function is required to be Lipschitz. However, their robustness results were proved only in the setting of regression with square loss. In practice, many other kinds of losses are used, e.g. cross-entropy loss for classification. In this work, we generalize Bubeck and Sellke's result to Bregman divergence losses, which form a common generalization of square loss and cross-entropy loss. Our generalization relies on identifying a bias-variance-type decomposition that lies at the heart of the proof of Bubeck and Sellke.

Updated: 2024-05-26 17:30:44

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16639v1
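The claim that Bregman divergences generalize both square loss and cross-entropy loss can be checked numerically. The sketch below uses the standard definition $D_\phi(p, q) = \phi(p) - \phi(q) - \langle \nabla\phi(q), p - q \rangle$ with two generators: half the squared norm, which yields half the squared Euclidean distance, and negative entropy, which yields the KL divergence on probability vectors. The helper names and the example vectors are illustrative assumptions.

```python
import math

def bregman(phi, grad_phi, p, q):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner

def half_sq_norm(x):
    return 0.5 * sum(v * v for v in x)

def grad_half_sq_norm(x):
    return list(x)

def neg_entropy(x):
    # Requires strictly positive entries.
    return sum(v * math.log(v) for v in x)

def grad_neg_entropy(x):
    return [math.log(v) + 1.0 for v in x]

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
square_loss = bregman(half_sq_norm, grad_half_sq_norm, p, q)   # half squared distance
kl_divergence = bregman(neg_entropy, grad_neg_entropy, p, q)   # KL(p || q)
```

When `p` is a one-hot label (taking $0 \log 0 = 0$), the KL divergence reduces to the usual cross-entropy $-\log q_y$, which is the classification case the abstract refers to.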

Evolution and learning in differentiable robots

The automatic design of robots has existed for 30 years but has been constricted by serial non-differentiable design evaluations, premature convergence to simple bodies or clumsy behaviors, and a lack of sim2real transfer to physical machines. Thus, here we employ massively-parallel differentiable simulations to rapidly and simultaneously optimize individual neural control of behavior across a large population of candidate body plans and return a fitness score for each design based on the performance of its fully optimized behavior. Non-differentiable changes to the mechanical structure of each robot in the population -- mutations that rearrange, combine, add, or remove body parts -- were applied by a genetic algorithm in an outer loop of search, generating a continuous flow of novel morphologies with highly-coordinated and graceful behaviors honed by gradient descent. This enabled the exploration of several orders-of-magnitude more designs than all previous methods, despite the fact that robots here have the potential to be much more complex, in terms of number of independent motors, than those in prior studies. We found that evolution reliably produces "increasingly differentiable" robots: body plans that smooth the loss landscape in which learning operates and thereby provide better training paths toward performant behaviors. Finally, one of the highly differentiable morphologies discovered in simulation was realized as a physical robot and shown to retain its optimized behavior. This provides a cyberphysical platform to investigate the relationship between evolution and learning in biological systems and broadens our understanding of how a robot's physical structure can influence the ability to train policies for it. Videos and code at https://sites.google.com/view/eldir.

Updated: 2024-05-26 17:24:12

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2405.14712v2

On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

We investigate the computational limits of the memory retrieval dynamics of modern Hopfield models through fine-grained complexity analysis. Our key contribution is the characterization of a phase transition behavior in the efficiency of all possible modern Hopfield models based on the norm of patterns. Specifically, we establish an upper bound criterion for the norm of input query patterns and memory patterns. Sub-quadratic (efficient) variants of the modern Hopfield model exist only below this criterion, assuming the Strong Exponential Time Hypothesis (SETH). To showcase our theory, we provide a formal example of efficient constructions of modern Hopfield models using low-rank approximation when the efficiency criterion holds. This includes a derivation of a lower bound on the computational time, scaling linearly with $\max\{\text{number of stored memory patterns}, \text{length of input query sequence}\}$. In addition, we prove its memory retrieval error bound and exponential memory capacity.

Updated: 2024-05-26 17:18:34

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2402.04520v4
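To make "memory retrieval dynamics" concrete, the following sketch implements the softmax-based retrieval update commonly associated with modern Hopfield models: the query is repeatedly replaced by a softmax-weighted average of the stored patterns. The abstract does not spell this update out, so treat it as background rather than the paper's construction; the inverse temperature `beta`, the patterns, and the noisy query are assumptions for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retrieve(memories, query, beta=4.0, n_iters=3):
    """Iterate x <- sum_i softmax(beta * <m_i, x>)_i * m_i over the stored patterns m_i."""
    x = list(query)
    for _ in range(n_iters):
        scores = [beta * sum(mi * xi for mi, xi in zip(m, x)) for m in memories]
        weights = softmax(scores)
        x = [sum(w * m[d] for w, m in zip(weights, memories)) for d in range(len(x))]
    return x

memories = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0, 1.0],
]
noisy_query = [0.9, -0.8, 1.1, -0.7]    # corrupted version of the second pattern
recovered = retrieve(memories, noisy_query)
```

Each update compares the query against every stored pattern, so processing a sequence of queries this way costs on the order of (number of patterns) times (number of queries); that product is the quadratic cost that sub-quadratic variants aim to avoid.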

Enhancing Sustainable Urban Mobility Prediction with Telecom Data: A Spatio-Temporal Framework Approach

Traditional traffic prediction, limited by the scope of sensor data, falls short in comprehensive traffic management. Mobile networks offer a promising alternative using network activity counts, but these lack crucial directionality. Thus, we present the TeltoMob dataset, featuring undirected telecom counts and corresponding directional flows, to predict directional mobility flows on roadways. To address this, we propose a two-stage spatio-temporal graph neural network (STGNN) framework. The first stage uses a pre-trained STGNN to process telecom data, while the second stage integrates directional and geographic insights for accurate prediction. Our experiments demonstrate the framework's compatibility with various STGNN models and confirm its effectiveness. We also show how to incorporate the framework into real-world transportation systems, enhancing sustainable urban mobility.

Updated: 2024-05-26 17:14:50

Categories: cs.LG,cs.AI,cs.NI

Download: http://arxiv.org/abs/2405.17507v1

Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models

Recent advancements in large language models (LLMs) have underscored their importance in the evolution of artificial intelligence. However, despite extensive pretraining on multilingual datasets, available open-sourced LLMs exhibit limited effectiveness in processing Vietnamese. The challenge is exacerbated by the absence of systematic benchmark datasets and metrics tailored for Vietnamese LLM evaluation. To mitigate these issues, we have fine-tuned LLMs specifically for Vietnamese and developed a comprehensive evaluation framework encompassing 10 common tasks and 31 metrics. Our evaluation results reveal that the fine-tuned LLMs exhibit enhanced comprehension and generative capabilities in Vietnamese. Moreover, our analysis indicates that models with more parameters can introduce more biases and uncalibrated outputs, and that the key factor influencing LLM performance is the quality of the training or fine-tuning datasets. These insights underscore the significance of meticulous fine-tuning with high-quality datasets in enhancing LLM performance.

Updated: 2024-05-26 17:13:32

Categories: cs.CL,cs.AI,68T50

Download: http://arxiv.org/abs/2403.02715v2

Bayesian Inference with Deep Weakly Nonlinear Networks

We show at a physics level of rigor that Bayesian inference with a fully connected neural network and a shaped nonlinearity of the form $\phi(t) = t + \psi t^3/L$ is (perturbatively) solvable in the regime where the number of training datapoints $P$, the input dimension $N_0$, the network layer widths $N$, and the network depth $L$ are simultaneously large. Our results hold with weak assumptions on the data; the main constraint is that $P < N_0$. We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature. We report the following results from the first-order computation: 1. When the width $N$ is much larger than the depth $L$ and training set size $P$, neural network Bayesian inference coincides with Bayesian inference using a kernel. The value of $\psi$ determines the curvature of a sphere, hyperbola, or plane into which the training data is implicitly embedded under the feature map. 2. When $LP/N$ is a small constant, neural network Bayesian inference departs from the kernel regime. At zero temperature, neural network Bayesian inference is equivalent to Bayesian inference using a data-dependent kernel, and $LP/N$ serves as an effective depth that controls the extent of feature learning. 3. In the restricted case of deep linear networks ($\psi=0$) and noisy data, we show a simple data model for which evidence and generalization error are optimal at zero temperature. As $LP/N$ increases, both evidence and generalization further improve, demonstrating the benefit of depth in benign overfitting.

Updated: 2024-05-26 17:08:04

Categories: stat.ML,cs.AI,cs.LG,math.PR,physics.data-an

Download: http://arxiv.org/abs/2405.16630v1

Competing for pixels: a self-play algorithm for weakly-supervised segmentation

Weakly-supervised segmentation (WSS) methods, reliant on image-level labels indicating object presence, lack explicit correspondence between labels and regions of interest (ROIs), posing a significant challenge. Despite this, WSS methods have attracted attention due to their much lower annotation costs compared to fully-supervised segmentation. Leveraging reinforcement learning (RL) self-play, we propose a novel WSS method that gamifies image segmentation of a ROI. We formulate segmentation as a competition between two agents that compete to select ROI-containing patches until exhaustion of all such patches. The score at each time-step, used to compute the reward for agent training, represents likelihood of object presence within the selection, determined by an object presence detector pre-trained using only image-level binary classification labels of object presence. Additionally, we propose a game termination condition that can be called by either side upon exhaustion of all ROI-containing patches, followed by the selection of a final patch from each. Upon termination, the agent is incentivised if ROI-containing patches are exhausted or disincentivised if an ROI-containing patch is found by the competitor. This competitive setup ensures minimisation of over- or under-segmentation, a common problem with WSS methods. Extensive experimentation across four datasets demonstrates significant performance improvements over recent state-of-the-art methods. Code: https://github.com/s-sd/spurl/tree/main/wss

Updated: 2024-05-26 17:00:17

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.16628v1

Avoiding Catastrophe in Continuous Spaces by Asking for Help

Most reinforcement learning algorithms with formal regret guarantees assume all mistakes are reversible and essentially rely on trying all possible behaviors. This approach leads to poor outcomes when some mistakes are irreparable or even catastrophic. We propose a variant of the contextual bandit problem where the goal is to minimize the chance of catastrophe. Specifically, we assume that the payoff each round represents the chance of avoiding catastrophe that round, and try to maximize the product of payoffs (the overall chance of avoiding catastrophe). We allow a limited number of queries to a mentor and assume a Lipschitz continuous payoff function. We first show that in general, any algorithm either constantly queries the mentor or is nearly guaranteed to cause catastrophe. However, when the mentor policy class has bounded Natarajan dimension and contains at least some "reasonable" policies, we provide an algorithm whose regret and rate of querying the mentor both approach 0 as the time horizon grows. We also present an alternative algorithm which provides the same regret and query guarantees when the mentor's action changes a constant number of times in a 1D state space, and can handle adversarially chosen states.

Updated: 2024-05-26 16:55:07

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.08062v2
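The objective in this abstract multiplies per-round payoffs instead of summing them, and a short calculation shows why that matters. In the sketch below, each payoff is the chance of avoiding catastrophe in one round; the two payoff sequences are made-up numbers for illustration.

```python
import math

def survival_probability(payoffs):
    """Overall chance of avoiding catastrophe: the product of per-round payoffs."""
    return math.prod(payoffs)

def log_survival(payoffs):
    """Same quantity in log space, which avoids underflow over long horizons."""
    return sum(math.log(p) for p in payoffs)

careful = [0.999] * 1000                 # uniformly small risk every round
one_mistake = [0.999] * 999 + [0.5]      # identical except for one risky round

careful_total = survival_probability(careful)
mistake_total = survival_probability(one_mistake)
sum_gap = sum(careful) - sum(one_mistake)          # difference under an additive objective
product_ratio = mistake_total / careful_total      # difference under the product objective
```

The single risky round changes the sum of payoffs by about 0.5 out of roughly 1000, but it roughly halves the overall chance of avoiding catastrophe, which is why an additive regret notion would understate irreparable mistakes.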

Neural Feature Learning in Function Space

We present a novel framework for learning system design with neural feature extractors. First, we introduce the feature geometry, which unifies statistical dependence and feature representations in a function space equipped with inner products. This connection defines function-space concepts on statistical dependence, such as norms, orthogonal projection, and spectral decomposition, exhibiting clear operational meanings. In particular, we associate each learning setting with a dependence component and formulate learning tasks as finding corresponding feature approximations. We propose a nesting technique, which provides systematic algorithm designs for learning the optimal features from data samples with off-the-shelf network architectures and optimizers. We further demonstrate multivariate learning applications, including conditional inference and multimodal learning, where we present the optimal features and reveal their connections to classical approaches.

Updated: 2024-05-26 16:53:25

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2309.10140v3

Combining Constrained Diffusion Models and Numerical Solvers for Efficient and Robust Non-Convex Trajectory Optimization

Motivated by the need to solve open-loop optimal control problems with computational efficiency and reliable constraint satisfaction, we introduce a general framework that combines diffusion models and numerical optimization solvers. Optimal control problems are rarely solvable in closed form, hence they are often transcribed into numerical trajectory optimization problems, which then require initial guesses. These initial guesses are supplied in our framework by diffusion models. To mitigate the effect of samples that violate the problem constraints, we develop a novel constrained diffusion model to approximate the true distribution of locally optimal solutions with an additional constraint violation loss in training. To further enhance the robustness, the diffusion samples as initial guesses are fed to the numerical solver to refine and derive final optimal (and hence feasible) solutions. Experimental evaluations on three tasks verify the improved constraint satisfaction and computational efficiency with 4$\times$ to 30$\times$ acceleration using our proposed framework, which generalizes across trajectory optimization problems and scales well with problem complexity.

Updated: 2024-05-26 16:52:21

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2403.05571v3

Improved Forward-Forward Contrastive Learning

The backpropagation algorithm, or backprop, is a widely utilized optimization technique in deep learning. While there's growing evidence suggesting that models trained with backprop can accurately explain neuronal data, no backprop-like method has yet been discovered in the biological brain for learning. Moreover, employing a naive implementation of backprop in the brain has several drawbacks. In 2022, Geoffrey Hinton proposed a biologically plausible learning method known as the Forward-Forward (FF) algorithm. Shortly after this paper, a modified version called FFCL was introduced. However, FFCL had limitations, notably being a three-stage learning system where the final stage still relied on regular backpropagation. In our approach, we address these drawbacks by eliminating the last two stages of FFCL and completely removing regular backpropagation. Instead, we rely solely on local updates, offering a more biologically plausible alternative.

Updated: 2024-05-26 16:40:11

Categories: cs.LG,cs.NE

Download: http://arxiv.org/abs/2405.03432v3
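As background on what "local updates" can look like, the sketch below shows the layer-local objective from the original Forward-Forward proposal: a layer's "goodness" is the sum of its squared activations, and the layer is trained so that goodness exceeds a threshold for positive data and falls below it for negative data. This illustrates the FF starting point only; whether the improved method in this abstract uses the same goodness or loss is not stated, and the threshold and activation values are assumptions.

```python
import math

def goodness(activations):
    """Layer goodness: sum of squared activations."""
    return sum(a * a for a in activations)

def positive_probability(activations, threshold):
    """Logistic probability that the input is positive, based on goodness minus a threshold."""
    return 1.0 / (1.0 + math.exp(-(goodness(activations) - threshold)))

def local_loss(activations, threshold, is_positive):
    """Per-layer logistic loss; it depends only on this layer's own activations."""
    p = positive_probability(activations, threshold)
    return -math.log(p) if is_positive else -math.log(1.0 - p)

# A strongly active layer is cheap to label positive and costly to label negative.
loss_if_positive = local_loss([2.0, 2.0], threshold=2.0, is_positive=True)
loss_if_negative = local_loss([2.0, 2.0], threshold=2.0, is_positive=False)
```

Because the loss involves a single layer's activations, its gradient never has to be propagated backward through other layers.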

Graph neural networks with configuration cross-attention for tensor compilers

With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph with nodes as operators transforming multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, some configurations leading to accelerated inference. We propose TGraph, a neural graph architecture that allows screening for fast configurations of the target computational graph, thus representing an artificial intelligence (AI) tensor compiler in contrast to the traditional heuristics-based compilers. The proposed solution improves mean Kendall's $\tau$ across layout collections of TpuGraphs from 29.8% for the reliable baseline to 67.4% for TGraph. We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.

Updated: 2024-05-26 16:39:19

Categories: cs.LG,cs.AR,cs.PF

Download: http://arxiv.org/abs/2405.16623v1
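The headline metric here, Kendall's $\tau$, measures how well a model's predicted ranking of configurations agrees with the measured ranking. A minimal sketch of the tie-free variant (tau-a) follows; the benchmark's exact variant and tie handling are not stated in the abstract, and the runtimes and scores below are made up.

```python
from itertools import combinations

def kendall_tau(predicted, measured):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        agreement = (predicted[i] - predicted[j]) * (measured[i] - measured[j])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    n_pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / n_pairs

measured_runtime = [1.0, 2.0, 3.0, 4.0]     # hypothetical measured runtimes
predicted_score = [0.1, 0.4, 0.3, 0.9]      # hypothetical model scores; one pair is swapped

tau = kendall_tau(predicted_score, measured_runtime)
```

Only the ordering matters, so a model can score well on this metric without predicting absolute runtimes, which suits the task of screening for fast configurations.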

FactCHD: Benchmarking Fact-Conflicting Hallucination Detection

Despite their impressive generative capabilities, LLMs are hindered by fact-conflicting hallucinations in real-world applications. The accurate identification of hallucinations in texts generated by LLMs, especially in complex inferential scenarios, is a relatively unexplored area. To address this gap, we present FactCHD, a dedicated benchmark designed for the detection of fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation. A distinctive element of FactCHD is its integration of fact-based evidence chains, significantly enhancing the depth of evaluating the detectors' explanations. Experiments on different LLMs expose the shortcomings of current approaches in detecting factual errors accurately. Furthermore, we introduce Truth-Triangulator that synthesizes reflective considerations by tool-enhanced ChatGPT and LoRA-tuning based on Llama2, aiming to yield more credible detection through the amalgamation of predictive results and evidence. The benchmark dataset is available at https://github.com/zjunlp/FactCHD.

Updated: 2024-05-26 16:37:01

Categories: cs.CL,cs.AI,cs.CV,cs.IR,cs.LG

Download: http://arxiv.org/abs/2310.12086v3

Continual Multimodal Knowledge Graph Construction

Current Multimodal Knowledge Graph Construction (MKGC) models struggle with the real-world dynamism of continuously emerging entities and relations, often succumbing to catastrophic forgetting, i.e. the loss of previously acquired knowledge. This study introduces benchmarks aimed at fostering the development of the continual MKGC domain. We further introduce the MSPT framework, designed to surmount the shortcomings of existing MKGC approaches during multimedia data processing. MSPT harmonizes the retention of learned knowledge (stability) and the integration of new data (plasticity), outperforming current continual learning and multimodal methods. Our results confirm MSPT's superior performance in evolving knowledge environments, showcasing its capacity to balance stability and plasticity.

Updated: 2024-05-26 16:29:05

Categories: cs.CL,cs.AI,cs.DB,cs.LG,cs.MM

Download: http://arxiv.org/abs/2305.08698v3

Bringing UFUs Back into the Air With FUEL: A Framework for Evaluating the Effectiveness of Unrestricted File Upload Vulnerability Scanners

Unrestricted file upload (UFU) is a class of web security vulnerabilities that can have a severe impact on web applications if uploaded files are not sufficiently validated or securely handled. A review of related work shows an increased interest in finding new methods to discover such vulnerabilities. However, each publication evaluates its new vulnerability scanner against a different set of artificial or real-world applications available at the time of writing. Thus, we identify the need for a comprehensive testing framework to allow a reproducible comparison between existing and future UFU vulnerability scanners. Our contributions include the File Upload Exploitation Lab (FUEL), which models 15 distinct UFU vulnerabilities in isolated scenarios to enable a reproducible evaluation of UFU scanners' capabilities. The results of evaluating four black-box UFU scanners against FUEL show that no scanner manages to identify all UFU vulnerabilities, leaving real-world websites at risk of compromise due to false negatives. Our work aims to solve this problem by extending an existing UFU scanner with multiple new detection and exploitation techniques, which we call Fuxploider-NG, to increase its accuracy from ~50% to over 90%, thereby surpassing the capabilities of existing UFU scanners and showcasing the importance of FUEL as a UFU vulnerability evaluation framework. To foster open science and future work in this area, we open-source FUEL and Fuxploider-NG.

Updated: 2024-05-26 16:27:39

Categories: cs.CR

Download: http://arxiv.org/abs/2405.16619v1

DPHGNN: A Dual Perspective Hypergraph Neural Networks

Message passing on hypergraphs has been a standard framework for learning higher-order correlations between hypernodes. Recently-proposed hypergraph neural networks (HGNNs) can be categorized into spatial and spectral methods based on their design choices. In this work, we analyze the impact of change in hypergraph topology on the suboptimal performance of HGNNs and propose DPHGNN, a novel dual-perspective HGNN that introduces equivariant operator learning to capture lower-order semantics by inducing topology-aware spatial and spectral inductive biases. DPHGNN employs a unified framework to dynamically fuse lower-order explicit feature representations from the underlying graph into the super-imposed hypergraph structure. We benchmark DPHGNN over eight benchmark hypergraph datasets for the semi-supervised hypernode classification task and obtain superior performance compared to seven state-of-the-art baselines. We also provide a theoretical framework and a synthetic hypergraph isomorphism test to express the power of spatial HGNNs and quantify the expressivity of DPHGNN beyond the Generalized Weisfeiler Leman (1-GWL) test. Finally, DPHGNN was deployed by our partner e-commerce company for the Return-to-Origin (RTO) prediction task, which shows ~7% higher macro F1-Score than the best baseline.

Updated: 2024-05-26 16:08:55

Categories: cs.LG,cs.SI

Download: http://arxiv.org/abs/2405.16616v1

Tradeoffs Between Alignment and Helpfulness in Language Models with Representation Engineering

Language model alignment has become an important component of AI safety, allowing safe interactions between humans and language models, by enhancing desired behaviors and inhibiting undesired ones. It is often done by tuning the model or inserting preset aligning prompts. Recently, representation engineering, a method which alters the model's behavior via changing its representations post-training, was shown to be effective in aligning LLMs (Zou et al., 2023a). Representation engineering yields gains in alignment oriented tasks such as resistance to adversarial attacks and reduction of social biases, but was also shown to cause a decrease in the ability of the model to perform basic tasks. In this paper we study the tradeoff between the increase in alignment and decrease in helpfulness of the model. We propose a theoretical framework which provides bounds for these two quantities, and demonstrate their relevance empirically. First, we find that under the conditions of our framework, alignment can be guaranteed with representation engineering, and at the same time that helpfulness is harmed in the process. Second, we show that helpfulness is harmed quadratically with the norm of the representation engineering vector, while the alignment increases linearly with it, indicating a regime in which it is efficient to use representation engineering. We validate our findings empirically, and chart the boundaries to the usefulness of representation engineering for alignment.
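The linear-versus-quadratic scaling described above can be illustrated with a schematic calculation. The constants $a$ and $b$ below are hypothetical placeholders rather than the bounds derived in the paper, and $r$ stands for the norm of the representation engineering vector.

```latex
\begin{align*}
\Delta_{\text{align}}(r) &\approx a\,r, &
\Delta_{\text{help}}(r) &\approx -b\,r^{2}, \qquad a, b > 0,\\
\frac{\lvert \Delta_{\text{help}}(r) \rvert}{\Delta_{\text{align}}(r)} &\approx \frac{b}{a}\,r
\;\longrightarrow\; 0 \quad \text{as } r \to 0 .
\end{align*}
```

Under these assumed forms, the helpfulness loss per unit of alignment gain shrinks as $r$ decreases, which is one way to read the "efficient regime" mentioned in the abstract.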

Updated: 2024-05-26 16:07:55

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2401.16332v3

RA: A machine based rational agent, Part 2, Preliminary test

A preliminary test of the software package RA is presented. The main focus of this test is to assess RA's reasoning capabilities, which are based on the formal system PECR. Particular attention is given to the finite computational resources of the real-world machine that define the environment within which programs are to be executed.

Updated: 2024-05-26 16:02:19

Categories: cs.LO,cs.AI

Download: http://arxiv.org/abs/2405.16613v1

IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus

Large Language Models (LLMs) demonstrate remarkable potential across various domains; however, they exhibit a significant performance gap in Information Extraction (IE). High-quality instruction data is key to enhancing the specific capabilities of LLMs, yet current IE datasets tend to be small in scale, fragmented, and lacking a standardized schema. To this end, we introduce IEPile, a comprehensive bilingual (English and Chinese) IE instruction corpus, which contains approximately 0.32B tokens. We construct IEPile by collecting and cleaning 33 existing IE datasets, and introduce schema-based instruction generation to unearth a large-scale corpus. Experimentally, IEPile enhances the performance of LLMs for IE, with notable improvements in zero-shot generalization. We open-source the resource and pre-trained models, hoping to provide valuable support to the NLP community.

Updated: 2024-05-26 15:54:41

Categories: cs.CL,cs.AI,cs.DB,cs.IR,cs.LG

Download: http://arxiv.org/abs/2402.14710v3

The Art of Deception: Robust Backdoor Attack using Dynamic Stacking of Triggers

The area of Machine Learning as a Service (MLaaS) is experiencing increased implementation due to recent advancements in the AI (Artificial Intelligence) industry. However, this spike has prompted concerns regarding AI defense mechanisms, specifically regarding potential covert attacks from third-party providers that cannot be entirely trusted. Recent research has uncovered that auditory backdoors may use certain modifications as their initiating mechanism. DynamicTrigger is introduced as a methodology for carrying out dynamic backdoor attacks that use cleverly designed tweaks to ensure that corrupted samples are indistinguishable from clean ones. By utilizing fluctuating signal sampling rates and masking speaker identities through dynamic sound triggers (such as the clapping of hands), it is possible to deceive automatic speech recognition (ASR) systems. Our empirical testing demonstrates that DynamicTrigger is both potent and stealthy, achieving impressive success rates during covert attacks while maintaining exceptional accuracy with non-poisoned datasets.

Updated: 2024-05-26 15:53:32

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2401.01537v2

The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol

Neural Architecture Search (NAS) has been widely adopted to design neural networks for various computer vision tasks. One of its most promising subdomains is differentiable NAS (DNAS), where the optimal architecture is found in a differentiable manner. However, gradient-based methods suffer from the discretization error, which can severely damage the process of obtaining the final architecture. In our work, we first study the risk of discretization error and show how it affects an unregularized supernet. Then, we show that penalizing high entropy, a common technique of architecture regularization, can hinder the supernet's performance. Therefore, to robustify the DNAS framework, we introduce a novel single-stage searching protocol, which is not reliant on decoding a continuous architecture. Our results demonstrate that this approach outperforms other DNAS methods by achieving 75.3% in the searching stage on the Cityscapes validation dataset and attains performance 1.1% higher than the optimal network of DCNAS on the non-dense search space comprising short connections. The entire training process takes only 5.5 GPU days due to the weight reuse, and yields a computationally efficient architecture. Additionally, we propose a new dataset split procedure, which substantially improves results and prevents architecture degeneration in DARTS.
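To make the notion of discretization error concrete, here is a minimal sketch assuming a single supernet edge whose candidate operations each produce a scalar output. The function names and the scalar setup are hypothetical illustrations, not the paper's implementation. The sketch compares the softmax-weighted mixture used during search with the single operation kept after argmax decoding, and reports the entropy of the mixing weights.

```python
import math

def softmax(logits):
    """Convert architecture logits into mixing weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(weights):
    """Shannon entropy (in nats) of the mixing weights."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def discretization_gap(logits, op_outputs):
    """Return (gap, entropy) for one edge.

    gap is the absolute difference between the mixed output used during
    search and the output of the single op kept after argmax decoding.
    """
    weights = softmax(logits)
    mixed = sum(w * o for w, o in zip(weights, op_outputs))
    best = max(range(len(weights)), key=weights.__getitem__)
    return abs(mixed - op_outputs[best]), entropy(weights)

# Uniform weights: large gap and maximal entropy.
print(discretization_gap([0.0, 0.0, 0.0], [1.0, 2.0, 3.0]))
# Peaked weights: the mixture is already close to the decoded op.
print(discretization_gap([10.0, 0.0, 0.0], [1.0, 2.0, 3.0]))
```

In this toy setting, low entropy coincides with a small gap, which is the usual motivation for entropy penalties. The abstract argues that such penalties can nonetheless hurt the supernet, and the sketch does not capture that effect.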

Updated: 2024-05-26 15:44:53

Categories: cs.CV,cs.AI,cs.LG,cs.NE

Download: http://arxiv.org/abs/2405.16610v1

Efficient Probabilistic Modeling of Crystallization at Mesoscopic Scale

Crystallization processes at the mesoscopic scale, where faceted, dendritic growth, and multigrain formation can be observed, are of particular interest within materials science and metallurgy. These processes are highly nonlinear, stochastic, and sensitive to small perturbations of system parameters and initial conditions. Methods for the simulation of these processes have been developed using discrete numerical models, but these are computationally expensive. This work aims to scale crystal growth simulation with a machine learning emulator. Specifically, autoregressive latent variable models are well suited for modeling the joint distribution over system parameters and the crystallization trajectories. However, successfully training such models is challenging due to the stochasticity and sensitivity of the system. Existing approaches consequently fail to produce diverse and faithful crystallization trajectories. In this paper, we introduce the Crystal Growth Neural Emulator (CGNE), a probabilistic model for efficient crystal growth emulation at the mesoscopic scale that overcomes these challenges. We validate CGNE results using the morphological properties of the crystals produced by numerical simulation. CGNE delivers a factor of 11 improvement in inference time and performance gains compared with recent state-of-the-art probabilistic models for dynamical systems.

Updated: 2024-05-26 15:37:19

Categories: cs.LG,cond-mat.mes-hall,cond-mat.mtrl-sci

Download: http://arxiv.org/abs/2405.16608v1

Design Editing for Offline Model-based Optimization

Offline model-based optimization (MBO) aims to maximize a black-box objective function using only an offline dataset of designs and scores. A prevalent approach involves training a conditional generative model on existing designs and their associated scores, followed by the generation of new designs conditioned on higher target scores. However, these newly generated designs often underperform due to the lack of high-scoring training data. To address this challenge, we introduce a novel method, Design Editing for Offline Model-based Optimization (DEMO), which consists of two phases. In the first phase, termed pseudo-target distribution generation, we apply gradient ascent on the offline dataset using a trained surrogate model, producing a synthetic dataset where the predicted scores serve as new labels. A conditional diffusion model is subsequently trained on this synthetic dataset to capture a pseudo-target distribution, which enhances the accuracy of the conditional diffusion model in generating higher-scoring designs. Nevertheless, the pseudo-target distribution is susceptible to noise stemming from inaccuracies in the surrogate model, consequently predisposing the conditional diffusion model to generate suboptimal designs. We hence propose the second phase, existing design editing, to directly incorporate the high-scoring features from the offline dataset into design generation. In this phase, top designs from the offline dataset are edited by introducing noise, which are subsequently refined using the conditional diffusion model to produce high-scoring designs. Overall, high-scoring designs begin with inheriting high-scoring features from the second phase and are further refined with a more accurate conditional diffusion model in the first phase. Empirical evaluations on 7 offline MBO tasks show that DEMO outperforms various baseline methods.
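The first phase described above (gradient ascent on offline designs, followed by relabeling with predicted scores) can be sketched as follows. This is a toy illustration under stated assumptions: designs are scalars, and `surrogate_score` is a hypothetical hand-written stand-in for a trained surrogate model. The diffusion-model training and the second editing phase are not shown.

```python
def surrogate_score(x):
    """Toy stand-in for a trained surrogate model: peak score at x = 3."""
    return -(x - 3.0) ** 2

def surrogate_grad(x):
    """Analytic gradient of the toy surrogate with respect to the design."""
    return -2.0 * (x - 3.0)

def build_pseudo_target_dataset(designs, step_size=0.1, num_steps=20):
    """Move each offline design uphill on the surrogate, then relabel it.

    Returns a list of (design, predicted_score) pairs, where the predicted
    score serves as the new label of the synthetic dataset.
    """
    synthetic = []
    for x in designs:
        for _ in range(num_steps):
            x = x + step_size * surrogate_grad(x)  # gradient ascent step
        synthetic.append((x, surrogate_score(x)))
    return synthetic

offline_designs = [0.0, 1.0, 5.0]
for design, label in build_pseudo_target_dataset(offline_designs):
    print(round(design, 3), round(label, 5))
```

Because the labels come from the surrogate rather than the true objective, any surrogate error is carried into the synthetic dataset, which is the noise issue the abstract's second phase is meant to address.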

Updated: 2024-05-26 15:32:47

Categories: cs.LG,cs.CE

Download: http://arxiv.org/abs/2405.13964v2

AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning

Language agents have achieved considerable performance on various complex question-answering tasks by planning with external tools. Despite the incessant exploration in this field, existing language agent systems still struggle with costly, non-reproducible data reliance and face the challenge of forcing a single model to serve multiple functions. To this end, we introduce AutoAct, an automatic agent learning framework for QA that does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models (e.g., GPT-4). Given limited data with a tool library, AutoAct first automatically synthesizes planning trajectories without any assistance from humans or strong closed-source models. Then, AutoAct leverages a division-of-labor strategy to automatically differentiate based on the target task information and synthesized trajectories, producing a sub-agent group to complete the task. We conduct comprehensive experiments with different LLMs, which demonstrate that AutoAct yields better or comparable performance relative to various strong baselines. Further analysis demonstrates the effectiveness of the division-of-labor strategy, with the trajectory quality generated by AutoAct generally outperforming that of others. Code will be available at https://github.com/zjunlp/AutoAct.

Updated: 2024-05-26 15:31:24

Categories: cs.CL,cs.AI,cs.HC,cs.LG,cs.MA

Download: http://arxiv.org/abs/2401.05268v4

Intelligence as Computation

This paper proposes a specific conceptualization of intelligence as computation. This conceptualization is intended to provide a unified view for all disciplines of intelligence research. Already, it unifies several conceptualizations currently under investigation, including physical, neural, embodied, morphological, and mechanical intelligences. To achieve this, the proposed conceptualization explains the differences among existing views by different computational paradigms, such as digital, analog, mechanical, or morphological computation. Viewing intelligence as a composition of computations from different paradigms resolves the challenges posed by previous conceptualizations. Intelligence is hypothesized as a multi-paradigmatic computation relying on specific computational principles. These principles distinguish intelligence from other, non-intelligent computations. The proposed conceptualization implies a multi-disciplinary research agenda that is intended to lead to a unified science of intelligence.

Updated: 2024-05-26 15:30:34

Categories: cs.AI,cs.RO,68T01,I.2.0

Download: http://arxiv.org/abs/2405.16604v1

Modelling the Impact of Quantum Circuit Imperfections on Networks and Computer Applications

Post-quantum and quantum cryptography schemes are feasible quantum computer applications for 7G networks. These schemes could possibly replace existing schemes, which have been compromised by advances in quantum algorithms run on quantum computers, such as Shor's algorithm. Shor's algorithm is a quantum algorithm for finding the prime factors of an integer, the problem on which existing algorithms are based. It has become an available quantum computer application, putting the use of the RSA algorithm at risk. Our recent paper provides a detailed survey of the work on post-quantum and quantum cryptography algorithms, with a focus on their applicability in 7G networks. Since that paper focuses on the cryptography algorithms, in this follow-up paper we provide a new framework for quantum network optimization and survey in detail the work on enabling technologies (quantum hardware) for the practical implementation of these algorithms, including the most important segments of quantum hardware in 7G. As always in engineering practice, practical solutions are a compromise between the performance and complexity of the implementation. For this reason, as the main contribution, the paper presents a network and computer applications optimization framework that includes implementation imperfections. The tools should be useful in optimizing future-generation practical computer system design. After that, a comprehensive survey of the existing work on quantum hardware is presented, pointing out the sources of these imperfections. This enables us to make a fair assessment of how much investment into quantum hardware improvements contributes to the performance enhancement of the overall system. In this way, a decision can be made on proper partitioning between the investment in hardware and system-level complexity.
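As background for why factoring-based cryptography is at risk, the sketch below shows the classical post-processing step of Shor's algorithm: once the period r of a modulo n is known, a nontrivial factor of n can often be read off with a gcd. The period search here is a brute-force classical stand-in for the quantum subroutine, so it is only usable on tiny numbers. The function names are illustrative and do not come from the paper.

```python
from math import gcd

def find_period(a, n):
    """Brute-force the multiplicative order of a modulo n.

    Classical stand-in for the quantum period-finding step; exponential
    cost in general, so only suitable for toy inputs.
    """
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n")
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def factor_from_period(a, n):
    """Try to split n using the period of a; return (p, q) or None."""
    r = find_period(a, n)
    if r % 2 == 1:
        return None              # odd period: retry with another a
    y = pow(a, r // 2, n)
    if y == n - 1:
        return None              # a^(r/2) = -1 mod n: retry with another a
    p = gcd(y - 1, n)            # nontrivial because y^2 = 1 and y != +/-1 mod n
    return p, n // p

print(factor_from_period(7, 15))
```

With a = 7 and n = 15 the period is 4, so the function recovers the factors 3 and 5. Some choices of a fail and return None, in which case the procedure is repeated with a different base.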

Updated: 2024-05-26 15:29:02

Categories: cs.CR,quant-ph

Download: http://arxiv.org/abs/2404.00062v3

A CMDP-within-online framework for Meta-Safe Reinforcement Learning

Meta-reinforcement learning has widely been used as a learning-to-learn framework to solve unseen tasks with limited experience. However, the aspect of constraint violations has not been adequately addressed in the existing works, making their application restricted in real-world settings. In this paper, we study the problem of meta-safe reinforcement learning (Meta-SRL) through the CMDP-within-online framework to establish the first provable guarantees in this important setting. We obtain task-averaged regret bounds for the reward maximization (optimality gap) and constraint violations using gradient-based meta-learning and show that the task-averaged optimality gap and constraint satisfaction improve with task-similarity in a static environment or task-relatedness in a dynamic environment. Several technical challenges arise when making this framework practical. To this end, we propose a meta-algorithm that performs inexact online learning on the upper bounds of within-task optimality gap and constraint violations estimated by off-policy stationary distribution corrections. Furthermore, we enable the learning rates to be adapted for every task and extend our approach to settings with a competing dynamically changing oracle. Finally, experiments are conducted to demonstrate the effectiveness of our approach.

Updated: 2024-05-26 15:28:42

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16601v1

EchoSpike Predictive Plasticity: An Online Local Learning Rule for Spiking Neural Networks

The drive to develop artificial neural networks that efficiently utilize resources has generated significant interest in bio-inspired Spiking Neural Networks (SNNs). These networks are particularly attractive due to their potential in applications requiring low power and memory. This potential is further enhanced by the ability to perform online local learning, enabling them to adapt to dynamic environments. This requires the model to be adaptive in a self-supervised manner. While self-supervised learning has seen great success in many deep learning domains, its application for online local learning in multi-layer SNNs remains underexplored. In this paper, we introduce the "EchoSpike Predictive Plasticity" (ESPP) learning rule, a pioneering online local learning rule designed to leverage hierarchical temporal dynamics in SNNs through predictive and contrastive coding. We validate the effectiveness of this approach using benchmark datasets, demonstrating that it performs on par with current state-of-the-art supervised learning rules. The temporal and spatial locality of ESPP makes it particularly well-suited for low-cost neuromorphic processors, representing a significant advancement in developing biologically plausible self-supervised learning models for neuromorphic computing at the edge.

Updated: 2024-05-26 15:20:51

Categories: cs.NE,cs.LG

Download: http://arxiv.org/abs/2405.13976v2

Regularized Projection Matrix Approximation with Applications to Community Detection

This paper introduces a regularized projection matrix approximation framework aimed at recovering cluster information from the affinity matrix. The model is formulated as a projection approximation problem incorporating an entrywise penalty function. We explore three distinct penalty functions addressing bounded, positive, and sparse scenarios, respectively, and derive the Alternating Direction Method of Multipliers (ADMM) algorithm to solve the problem. Then, we provide a theoretical analysis establishing the convergence properties of the proposed algorithm. Extensive numerical experiments on both synthetic and real-world datasets demonstrate that our regularized projection matrix approximation approach significantly outperforms state-of-the-art methods in terms of clustering performance.

Updated: 2024-05-26 15:18:22

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16598v1

An Evolutionary Framework for Connect-4 as Test-Bed for Comparison of Advanced Minimax, Q-Learning and MCTS

A major challenge in decision-making domains with large state spaces is to effectively select actions which maximize utility. In recent years, approaches such as reinforcement learning (RL) and search algorithms have been successful in tackling this issue, despite their differences. RL defines a learning framework that an agent explores and interacts with. Search algorithms provide a formalism to search for a solution. However, it is often difficult to evaluate the performances of such approaches in a practical way. Motivated by this problem, we focus on one game domain, i.e., Connect-4, and develop a novel evolutionary framework to evaluate three classes of algorithms: RL, Minimax and Monte Carlo tree search (MCTS). The contribution of this paper is threefold: i) we implement advanced versions of these algorithms and provide a systematic comparison with their standard counterparts, ii) we develop a novel evaluation framework, which we call the Evolutionary Tournament, and iii) we conduct an extensive evaluation of the relative performance of each algorithm to compare our findings. We evaluate different metrics and show that MCTS achieves the best results in terms of win percentage, whereas Minimax and Q-Learning are ranked in second and third place, respectively, although the latter is shown to be the fastest to make a decision.
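For readers unfamiliar with the search side of this comparison, here is a minimal sketch of plain minimax and its alpha-beta pruned variant on an abstract game tree. The tree is given as nested lists with numeric leaf utilities, which is an assumption made for brevity. This is not a Connect-4 engine, and it is not claimed to match the "advanced" variants evaluated in the paper.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of a tree of nested lists with numeric leaves."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing=True, alpha=float("-inf"), beta=float("inf")):
    """Same value as minimax, but skips branches that cannot change the result."""
    if not isinstance(node, list):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing parent will never allow this branch
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing parent already has a better option
    return value

tree = [[3, 5], [2, 9]]
print(minimax(tree), alphabeta(tree))
```

Both functions return the same value on any tree. Pruning only reduces the number of nodes visited, which matters when decision time is one of the compared metrics.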

Updated: 2024-05-26 15:11:45

Categories: cs.AI,cs.GT,cs.NE

Download: http://arxiv.org/abs/2405.16595v1

Training-Conditional Coverage Bounds under Covariate Shift

Training-conditional coverage guarantees in conformal prediction concern the concentration of the error distribution, conditional on the training data, below some nominal level. The conformal prediction methodology has recently been generalized to the covariate shift setting, namely, the covariate distribution changes between the training and test data. In this paper, we study the training-conditional coverage properties of a range of conformal prediction methods under covariate shift via a weighted version of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality tailored for distribution change. The result for the split conformal method is almost assumption-free, while the results for the full conformal and jackknife+ methods rely on strong assumptions including the uniform stability of the training algorithm.
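As a point of reference for the split conformal method mentioned above, the following sketch computes the standard unweighted split conformal interval from absolute residuals on a held-out calibration set. It assumes no covariate shift, so it omits the likelihood-ratio weighting that the paper's setting requires. The function names are illustrative.

```python
import math

def conformal_radius(cal_residuals, alpha):
    """Return the interval half-width for miscoverage level alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest calibration residual;
    if that rank exceeds n, the interval is unbounded.
    """
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_residuals)[k - 1]

def predict_interval(point_prediction, radius):
    """Symmetric prediction interval around a point prediction."""
    return point_prediction - radius, point_prediction + radius

# Absolute residuals |y - f(x)| of a fitted model on nine calibration points.
residuals = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
radius = conformal_radius(residuals, alpha=0.5)
print(predict_interval(2.0, radius))
```

The guarantee of this construction is marginal over both calibration and test data. Training-conditional results of the kind studied in the abstract bound how far the coverage can fall below the nominal level for a fixed training and calibration sample.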

Updated: 2024-05-26 15:07:16

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2405.16594v1

DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy

Recent advances in large language models (LLMs) have revolutionized the landscape of reasoning tasks. To enhance the capabilities of LLMs to emulate human reasoning, prior studies have focused on modeling reasoning steps using various thought structures like chains, trees, or graphs. However, LLM-based reasoning still encounters the following challenges: (1) Limited adaptability of preset structures to diverse tasks; (2) Insufficient precision in exploiting known conditions to derive new ones; and (3) Inadequate consideration of historical reasoning experiences for subsequent reasoning steps. To this end, we propose DetermLR, a novel perspective that rethinks the reasoning process as an evolution from indeterminacy to determinacy. First, we categorize known conditions into two types: determinate and indeterminate premises. This provides an overall direction for the reasoning process and guides LLMs in converting indeterminate data into progressively determinate insights. Subsequently, we leverage quantitative measurements to prioritize more relevant premises to explore new insights. Furthermore, we automate the storage and extraction of available premises and reasoning paths with reasoning memory, preserving historical reasoning details for subsequent reasoning steps. Comprehensive experimental results demonstrate that DetermLR surpasses all baselines on various logical reasoning benchmarks: LogiQA, ProofWriter, FOLIO, PrOntoQA, and LogicalDeduction. Compared to previous multi-step reasoning methods, DetermLR achieves higher accuracy with fewer reasoning steps, highlighting its superior efficiency and effectiveness in solving logical reasoning tasks.

Updated: 2024-05-26 14:47:13

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2310.18659v2

Attaining Human's Desirable Outcomes in Human-AI Interaction via Structural Causal Games

In human-AI interaction, a prominent goal is to attain human's desirable outcome with the assistance of AI agents, which can be ideally delineated as a problem of seeking the optimal Nash Equilibrium that matches the human's desirable outcome. However, reaching the outcome is usually challenging due to the existence of multiple Nash Equilibria that are related to the assisting task but do not correspond to the human's desirable outcome. To tackle this issue, we employ a theoretical framework called structural causal game (SCG) to formalize the human-AI interactive process. Furthermore, we introduce a strategy referred to as pre-policy intervention on the SCG to steer AI agents towards attaining the human's desirable outcome. In more detail, a pre-policy is learned as a generalized intervention to guide the agents' policy selection, under a transparent and interpretable procedure determined by the SCG. To make the framework practical, we propose a reinforcement learning-like algorithm to search out this pre-policy. The proposed algorithm is tested in both gridworld environments and realistic dialogue scenarios with large language models, demonstrating its adaptability in a broader class of problems and potential effectiveness in real-world situations.

Updated: 2024-05-26 14:42:49

Categories: cs.AI,cs.GT,cs.HC

Download: http://arxiv.org/abs/2405.16588v1

Cost-Effective Online Multi-LLM Selection with Versatile Reward Models

With the rapid advancement of large language models (LLMs), the diversity of multi-LLM tasks and the variability in their pricing structures have become increasingly important, as costs can vary greatly between different LLMs. To tackle these challenges, we introduce C2MAB-V, a Cost-effective Combinatorial Multi-armed Bandit with Versatile reward models for optimal LLM selection and usage. This online model differs from traditional static approaches or those reliant on a single LLM without cost consideration. With multiple LLMs deployed on a scheduling cloud and a local server dedicated to handling user queries, C2MAB-V facilitates the selection of multiple LLMs over a combinatorial search space, specifically tailored for various collaborative task types with different reward models. Based on our designed online feedback mechanism and confidence bound technique, C2MAB-V can effectively address the multi-LLM selection challenge by managing the exploration-exploitation trade-off across different models, while also balancing cost and reward for diverse tasks. The NP-hard integer linear programming problem for selecting multiple LLMs with trade-off dilemmas is addressed by: i) decomposing the integer problem into a relaxed form by the local server, ii) utilizing a discretization rounding scheme that provides optimal LLM combinations by the scheduling cloud, and iii) continual online updates based on feedback. Theoretically, we prove that C2MAB-V offers strict guarantees over versatile reward models, matching state-of-the-art results for regret and violations in some degenerate cases. Empirically, we show that C2MAB-V effectively balances performance and cost-efficiency with nine LLMs for three application scenarios.

Updated: 2024-05-26 14:38:24

Categories: cs.LG,cs.AI,cs.HC

Download: http://arxiv.org/abs/2405.16587v1

Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity

Federated learning (FL) has emerged as a new paradigm for privacy-preserving collaborative training. Under domain skew, current FL approaches are biased and face two fairness problems. 1) Parameter Update Conflict: data disparity among clients leads to varying parameter importance and inconsistent update directions. These two disparities can cause important parameters to be overwhelmed by unimportant ones from dominant updates, which results in significant performance decreases for lower-performing clients. 2) Model Aggregation Bias: existing FL approaches introduce unfair weight allocation and neglect domain diversity, which leads to a biased model convergence objective and uneven performance across domains. We discover a pronounced directional update consistency in Federated Learning and propose a novel framework to tackle the above issues. First, leveraging the discovered characteristic, we selectively discard unimportant parameter updates to prevent updates from lower-performing clients from being overwhelmed by unimportant parameters, resulting in fairer generalization performance. Second, we propose a fair aggregation objective to prevent global model bias towards some domains, ensuring that the global model continuously aligns with an unbiased model. The proposed method is generic and can be combined with other existing FL methods to enhance fairness. Comprehensive experiments on Digits and Office-Caltech demonstrate the high fairness and performance of our method.

Updated: 2024-05-26 14:29:10

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.16585v1

Subspace Node Pruning

A significant increase in the commercial use of deep neural network models increases the need for efficient AI. Node pruning is the art of removing computational units such as neurons, filters, attention heads, or even entire layers while keeping network performance at a maximum. This can significantly reduce the inference time of a deep network and thus enhance its efficiency. Few of the previous works have exploited the ability to recover performance by reorganizing network parameters while pruning. In this work, we propose to create a subspace from unit activations which enables node pruning while recovering maximum accuracy. We identify that for effective node pruning, a subspace can be created using a triangular transformation matrix, which we show to be equivalent to Gram-Schmidt orthogonalization, which automates this procedure. We further improve this method by reorganizing the network prior to subspace formation. Finally, we leverage the orthogonal subspaces to identify layer-wise pruning ratios appropriate to retain a significant amount of the layer-wise information. We show that this measure outperforms existing pruning methods on VGG networks. We further show that our method can be extended to other network architectures such as residual networks.
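
As a reminder of why Gram-Schmidt orthogonalization corresponds to a triangular transformation, here is a minimal pure-Python sketch. The input vectors stand in for unit activations and are made up for illustration. The function name `gram_schmidt` is ours, and the paper's pruning criterion and layer-wise ratio selection are not implemented.

```python
def gram_schmidt(columns):
    """Orthonormalize vectors with modified Gram-Schmidt.

    Returns (q, r) such that columns[j] == sum_i r[i][j] * q[i],
    where r is upper triangular (r[i][j] == 0 for i > j).
    """
    n = len(columns)
    q = []
    r = [[0.0] * n for _ in range(n)]
    for j, col in enumerate(columns):
        v = list(col)
        for i, qi in enumerate(q):
            # Coefficient of column j along the already-built direction q[i].
            r[i][j] = sum(a * b for a, b in zip(qi, v))
            v = [a - r[i][j] * b for a, b in zip(v, qi)]
        norm = sum(a * a for a in v) ** 0.5
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        r[j][j] = norm
        q.append([a / norm for a in v])
    return q, r

# Hypothetical activations of three units over three inputs.
activations = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
q, r = gram_schmidt(activations)
for row in r:
    print([round(x, 3) for x in row])
```

The diagonal entry `r[j][j]` is the norm of the part of vector `j` that is not already explained by the earlier vectors, and the zeros below the diagonal are what make the change of basis triangular.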

Updated: 2024-05-26 14:27:26

Categories: cs.LG,cs.CV,cs.NE

Download: http://arxiv.org/abs/2405.17506v1

On Bits and Bandits: Quantifying the Regret-Information Trade-off

In interactive decision-making tasks, information can be acquired by direct interactions, through receiving indirect feedback, and from external knowledgeable sources. We examine the trade-off between the information an agent accumulates and the regret it suffers. We show that information from external sources, measured in bits, can be traded off for regret, measured in reward. We invoke information-theoretic methods for obtaining regret lower bounds, that also allow us to easily re-derive several known lower bounds. We then generalize a variety of interactive decision-making tasks with external information to a new setting. Using this setting, we introduce the first Bayesian regret lower bounds that depend on the information an agent accumulates. These lower bounds also prove the near-optimality of Thompson sampling for Bayesian problems. Finally, we demonstrate the utility of these bounds in improving the performance of a question-answering task with large language models, allowing us to obtain valuable insights.
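
For readers unfamiliar with Thompson sampling, the following is a minimal sketch of the standard Beta-Bernoulli variant on a toy bandit. The arm means, horizon, and seed are illustrative assumptions. The sketch does not model the external information sources or the lower bounds studied in the paper.

```python
import random

def thompson_sampling(means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return (pull counts, expected regret)."""
    rng = random.Random(seed)
    n_arms = len(means)
    wins = [0] * n_arms
    losses = [0] * n_arms
    counts = [0] * n_arms
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta(wins + 1, losses + 1) posterior.
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        counts[arm] += 1
        regret += best - means[arm]  # expected regret of this pull
    return counts, regret

counts, regret = thompson_sampling([0.2, 0.5, 0.8], horizon=1000)
print(counts, round(regret, 2))
```

Each pull updates the posterior of only the chosen arm, which is the "direct interaction" channel of information in the abstract's terminology.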

Updated: 2024-05-26 14:18:38

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16581v1

A Study on Unsupervised Anomaly Detection and Defect Localization using Generative Model in Ultrasonic Non-Destructive Testing

In recent years, the deterioration of artificial materials used in structures has become a serious social issue, increasing the importance of inspections. Non-destructive testing is in growing demand because it can inspect structures for defects and deterioration while preserving their functionality. Among these techniques, Laser Ultrasonic Visualization Testing (LUVT) stands out because it allows the visualization of ultrasonic propagation. This makes it visually straightforward to detect defects, thereby enhancing inspection efficiency. With the increasing number of deteriorating structures, challenges such as a shortage of inspectors and increased workload in non-destructive testing have become more apparent. Efforts to address these challenges include exploring automated inspection using machine learning. However, the lack of anomalous data with defects poses a barrier to improving the accuracy of automated inspection through machine learning. Therefore, in this study, we propose a method for automated LUVT inspection using an anomaly detection approach with a diffusion model that can be trained solely on negative examples (defect-free data). We experimentally confirmed that our proposed method improves defect detection and localization compared to general object detection algorithms used previously.

Updated: 2024-05-26 14:14:35

Categories: cs.CV,cs.LG,eess.IV

Download: http://arxiv.org/abs/2405.16580v1

Reflected Flow Matching

Continuous normalizing flows (CNFs) learn an ordinary differential equation to transform prior samples into data. Flow matching (FM) has recently emerged as a simulation-free approach for training CNFs by regressing a velocity model towards the conditional velocity field. However, on constrained domains, the learned velocity model may lead to undesirable flows that result in highly unnatural samples, e.g., oversaturated images, due to both flow matching error and simulation error. To address this, we add a boundary constraint term to CNFs, which leads to reflected CNFs that keep trajectories within the constrained domains. We propose reflected flow matching (RFM) to train the velocity model in reflected CNFs by matching the conditional velocity fields in a simulation-free manner, similar to the vanilla FM. Moreover, the analytical form of conditional velocity fields in RFM avoids potentially biased approximations, making it superior to existing score-based generative models on constrained domains. We demonstrate that RFM achieves comparable or better results on standard image benchmarks and produces high-quality class-conditioned samples under high guidance weight.
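
The idea of keeping trajectories inside a constrained domain can be illustrated with a one-dimensional toy. The sketch below integrates an ODE with Euler steps and mirrors the state back into `[lo, hi]` after each step. This is one simple discretization chosen for illustration and is not claimed to be the paper's scheme; the constant velocity field, the step count, and the function names are assumptions.

```python
def reflect(x, lo=0.0, hi=1.0):
    """Fold x back into [lo, hi] by mirror reflection at the boundaries."""
    width = hi - lo
    y = (x - lo) % (2.0 * width)
    return lo + (y if y <= width else 2.0 * width - y)

def reflected_euler(x0, velocity, steps=100, lo=0.0, hi=1.0):
    """Integrate dx/dt = velocity(t, x) for t in [0, 1], reflecting after every step."""
    h = 1.0 / steps
    x, path = x0, [x0]
    for k in range(steps):
        x = reflect(x + h * velocity(k * h, x), lo, hi)
        path.append(x)
    return path

# A hypothetical velocity field that would carry an unconstrained sample far outside [0, 1].
path = reflected_euler(0.5, lambda t, x: 3.0)
print(min(path), max(path))
```

Without the `reflect` call the same trajectory would end at 3.5, well outside the unit interval, which mirrors the out-of-range samples the abstract attributes to unconstrained flows.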

Updated: 2024-05-26 14:09:43

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2405.16577v1

ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling

We propose ID-to-3D, a method to generate identity- and text-guided 3D human heads with disentangled expressions, starting from even a single casually captured in-the-wild image of a subject. The foundation of our approach is anchored in compositionality, alongside the use of task-specific 2D diffusion models as priors for optimization. First, we extend a foundational model with a lightweight expression-aware and ID-aware architecture, and create 2D priors for geometry and texture generation, via fine-tuning only 0.2% of its available training parameters. Then, we jointly leverage a neural parametric representation for the expressions of each subject and a multi-stage generation of highly detailed geometry and albedo texture. This combination of strong face identity embeddings and our neural representation enables accurate reconstruction of not only facial features but also accessories and hair, and the result can be meshed to provide render-ready assets for gaming and telepresence. Our results achieve an unprecedented level of identity-consistent and high-quality texture and geometry generation, generalizing to a "world" of unseen 3D identities, without relying on large 3D captured datasets of human assets. Explore our 3D results at: https://idto3d.github.io.

Updated: 2024-05-26 13:36:45

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.16570v1

Automatic Jailbreaking of the Text-to-Image Generative AI Systems

Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs). At the same time, there are diverse safety risks that can cause the generation of malicious content by circumventing the alignment in LLMs, which is often referred to as jailbreaking. However, most previous works focused only on text-based jailbreaking in LLMs, and the jailbreaking of text-to-image (T2I) generation systems has been relatively overlooked. In this paper, we first evaluate the safety of commercial T2I generation systems, such as ChatGPT, Copilot, and Gemini, on copyright infringement with naive prompts. From this empirical study, we find that Copilot and Gemini block only 12% and 17% of the attacks with naive prompts, respectively, while ChatGPT blocks 84% of them. Then, we further propose a stronger automated jailbreaking pipeline for T2I generation systems, which produces prompts that bypass their safety guards. Our automated jailbreaking framework leverages an LLM optimizer to generate prompts that maximize the degree of violation in the generated images without any weight updates or gradient computation. Surprisingly, our simple yet effective approach successfully jailbreaks ChatGPT with an 11.0% block rate, making it generate copyrighted content 76% of the time. Finally, we explore various defense strategies, such as post-generation filtering and machine unlearning techniques, but find that they are inadequate, which suggests the necessity of stronger defense mechanisms.

Updated: 2024-05-26 13:32:24

Categories: cs.AI,cs.CR

Download: http://arxiv.org/abs/2405.16567v1

Contextual Linear Optimization with Bandit Feedback

Contextual linear optimization (CLO) uses predictive observations to reduce uncertainty in random cost coefficients and thereby improve average-cost performance. An example is a stochastic shortest path with random edge costs (e.g., traffic) and predictive features (e.g., lagged traffic, weather). Existing work on CLO assumes the data has fully observed cost coefficient vectors, but in many applications, we can only see the realized cost of a historical decision, that is, just one projection of the random cost coefficient vector, which we refer to as bandit feedback. We study a class of algorithms for CLO with bandit feedback, which we term induced empirical risk minimization (IERM), where we fit a predictive model to directly optimize the downstream performance of the policy it induces. We show a fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of the optimization estimate, and we develop computationally tractable surrogate losses. A byproduct of our theory of independent interest is a fast-rate regret bound for IERM with full feedback and a misspecified policy class. We compare the performance of different modeling choices numerically using a stochastic shortest path example and provide practical insights from the empirical results.
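
The difference between full and bandit feedback can be shown on a tiny shortest-path instance. In the sketch below, the two-path graph, the cost numbers, and the helper names are hypothetical. It only illustrates the policy induced by a cost prediction and what is observed afterwards, not the IERM training procedure.

```python
# Hypothetical toy graph: each feasible decision z is a 0/1 indicator over four edges.
PATHS = {
    "upper": (1, 1, 0, 0),
    "lower": (0, 0, 1, 1),
}

def dot(c, z):
    """Total cost c^T z of decision z under edge costs c."""
    return sum(ci * zi for ci, zi in zip(c, z))

def induced_policy(predicted_costs):
    """Choose the feasible path that minimizes the predicted total cost."""
    return min(PATHS, key=lambda name: dot(predicted_costs, PATHS[name]))

def bandit_feedback(true_costs, path_name):
    """Return only the realized cost of the chosen path, not the full cost vector."""
    return dot(true_costs, PATHS[path_name])

predicted = (2.0, 2.0, 1.0, 4.0)  # assumed model output for one context
realized = (3.0, 1.0, 2.0, 2.0)   # full cost vector, hidden under bandit feedback
choice = induced_policy(predicted)
observed = bandit_feedback(realized, choice)
print(choice, observed)
```

Under full feedback the learner would log all four entries of `realized`. Under bandit feedback it logs only the scalar `observed`, a single projection of that vector onto the chosen path.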

Updated: 2024-05-26 13:27:27

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2405.16564v1

Reality Only Happens Once: Single-Path Generalization Bounds for Transformers

One of the inherent challenges in deploying transformers on time series is that reality only happens once; namely, one typically only has access to a single trajectory of the data-generating process comprised of non-i.i.d. observations. We derive non-asymptotic statistical guarantees in this setting through bounds on the generalization of a transformer network at a future time $t$, given that it has been trained using $N\le t$ observations from a single perturbed trajectory of a Markov process. Under the assumption that the Markov process satisfies a log-Sobolev inequality, we obtain a generalization bound which effectively converges at the rate of $O(1/\sqrt{N})$. Our bound depends explicitly on the activation function ($\operatorname{Swish}$, $\operatorname{GeLU}$, or $\tanh$ are considered), the number of self-attention heads, depth, width, and norm-bounds defining the transformer architecture. Our bound consists of three components: (I) The first quantifies the gap between the stationary distribution of the data-generating Markov process and its distribution at time $t$; this term converges exponentially to $0$. (II) The next term encodes the complexity of the transformer model and, given enough time, eventually converges to $0$ at the rate $O(\log(N)^r/\sqrt{N})$ for any $r>0$. (III) The third term guarantees that the bound holds with probability at least $1-\delta$, and converges at a rate of $O(\sqrt{\log(1/\delta)}/\sqrt{N})$.
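
The three components can be summarized schematically as follows. This is only a reading aid assembled from the rates quoted in the abstract. The constants $C_1, C_2, C_3, c > 0$ are placeholders we introduce, and we assume the exponential decay of term (I) is in $t$; the paper's actual statement may differ in form.

```latex
\[
\text{generalization gap at time } t
\;\lesssim\;
\underbrace{C_1\, e^{-c\,t}}_{\text{(I) distance to stationarity}}
\;+\;
\underbrace{C_2\, \frac{\log(N)^{r}}{\sqrt{N}}}_{\text{(II) model complexity}}
\;+\;
\underbrace{C_3\, \frac{\sqrt{\log(1/\delta)}}{\sqrt{N}}}_{\text{(III) confidence, prob. } \ge 1-\delta}
\]
```

Read this way, terms (II) and (III) carry the $1/\sqrt{N}$ dependence on the number of observations, while term (I) depends only on how far the process is from stationarity.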

Updated: 2024-05-26 13:19:32

Categories: cs.LG,cs.NA,cs.NE,math.NA,math.PR,stat.ML,60G35, 62M20, 68T07, 41A65

Download: http://arxiv.org/abs/2405.16563v1

Realizing Disentanglement in LM Latent Space via Vocabulary-Defined Semantics

Understanding the latent space of language models (LMs) is important for improving the performance and interpretability of LMs. Existing analyses often fail to provide insights that take advantage of the semantic properties of language models and often overlook crucial aspects of language model adaptation. In response, we introduce a pioneering approach called vocabulary-defined semantics, which establishes a reference frame grounded in LM vocabulary within the LM latent space. We propose a novel technique to compute disentangled logits and gradients in latent space, rather than entangled ones over the vocabulary. Further, we perform semantic clustering on data representations as a novel way of LM adaptation. Through extensive experiments across diverse text understanding datasets, our approach outperforms state-of-the-art methods of retrieval-augmented generation and parameter-efficient finetuning, showcasing its effectiveness and efficiency.

Updated: 2024-05-26 13:12:35

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2401.16184v5

Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models

Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data, enabling the rapid adaptation to new unseen tasks. Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts. In this paper, we empirically and theoretically identify and analyze the model heterogeneity in DFML. We find that model heterogeneity introduces a heterogeneity-homogeneity trade-off, where homogeneous models reduce task conflicts but also increase the overfitting risk. Balancing this trade-off is crucial for learning shared representations across tasks. Based on our findings, we propose Task Groupings Regularization, a novel approach that benefits from model heterogeneity by grouping and aligning conflicting tasks. Specifically, we embed pre-trained models into a task space to compute dissimilarity, and group heterogeneous models together based on this measure. Then, we introduce implicit gradient regularization within each group to mitigate potential conflicts. By encouraging a gradient direction suitable for all tasks, the meta-model captures shared representations that generalize across tasks. Comprehensive experiments showcase the superiority of our approach in multiple benchmarks, effectively tackling the model heterogeneity in challenging multi-domain and multi-architecture scenarios.

Updated: 2024-05-26 13:11:55

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16560v1

RealTCD: Temporal Causal Discovery from Interventional Data with Large Language Model

In the field of Artificial Intelligence for Information Technology Operations, causal discovery is pivotal for operation and maintenance of graph construction, facilitating downstream industrial tasks such as root cause analysis. Temporal causal discovery, as an emerging method, aims to identify temporal causal relationships between variables directly from observations by utilizing interventional data. However, existing methods mainly focus on synthetic datasets with heavy reliance on intervention targets and ignore the textual information hidden in real-world systems, failing to conduct causal discovery for real industrial scenarios. To tackle this problem, in this paper we propose to investigate temporal causal discovery in industrial scenarios, which faces two critical challenges: 1) how to discover causal relationships without the interventional targets that are costly to obtain in practice, and 2) how to discover causal relations via leveraging the textual information in systems which can be complex yet abundant in industrial contexts. To address these challenges, we propose the RealTCD framework, which is able to leverage domain knowledge to discover temporal causal relationships without interventional targets. Specifically, we first develop a score-based temporal causal discovery method capable of discovering causal relations for root cause analysis without relying on interventional targets through strategic masking and regularization. Furthermore, by employing Large Language Models (LLMs) to handle texts and integrate domain knowledge, we introduce LLM-guided meta-initialization to extract the meta-knowledge from textual information hidden in systems to boost the quality of discovery. We conduct extensive experiments on simulation and real-world datasets to show the superiority of our proposed RealTCD framework over existing baselines in discovering temporal causal structures.

Updated: 2024-05-26 13:08:00

Categories: cs.AI,cs.LG,stat.ME

Download: http://arxiv.org/abs/2404.14786v2

Scalable Numerical Embeddings for Multivariate Time Series: Enhancing Healthcare Data Representation Learning

Multivariate time series (MTS) data, when sampled irregularly and asynchronously, often present extensive missing values. Conventional methodologies for MTS analysis tend to rely on temporal embeddings based on timestamps that necessitate subsequent imputations, yet these imputed values frequently deviate substantially from their actual counterparts, thereby compromising prediction accuracy. Furthermore, these methods typically fail to provide robust initial embeddings for values infrequently observed or even absent within the training set, posing significant challenges to model generalizability. In response to these challenges, we propose SCAlable Numerical Embedding (SCANE), a novel framework that treats each feature value as an independent token, effectively bypassing the need for imputation. SCANE regularizes the traits of distinct feature embeddings and enhances representational learning through a scalable embedding mechanism. Coupling SCANE with the Transformer Encoder architecture, we develop the Scalable nUMerical eMbeddIng Transformer (SUMMIT), which is engineered to deliver precise predictive outputs for MTS characterized by prevalent missing entries. Our experimental validation, conducted across three disparate electronic health record (EHR) datasets marked by elevated missing value frequencies, confirms the superior performance of SUMMIT over contemporary state-of-the-art approaches addressing similar challenges. These results substantiate the efficacy of SCANE and SUMMIT, underscoring their potential applicability across a broad spectrum of MTS data analytical tasks.
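
The idea of treating each observed value as its own token, so that missing entries never need to be imputed, can be sketched as below. The record layout, feature names, and the function `observed_tokens` are hypothetical. The embedding mechanism of SCANE and the SUMMIT model are not shown.

```python
def observed_tokens(records):
    """Turn irregular records into (time, feature, value) tokens, skipping missing values."""
    tokens = []
    for time, row in records:
        for feature, value in row.items():
            if value is not None:  # a missing entry yields no token, so nothing is imputed
                tokens.append((time, feature, value))
    return tokens

# Hypothetical EHR-style measurements sampled irregularly and asynchronously.
records = [
    (0.0, {"heart_rate": 82.0, "lactate": None}),
    (1.5, {"heart_rate": None, "lactate": 2.1}),
    (4.0, {"heart_rate": 90.0, "lactate": None}),
]
tokens = observed_tokens(records)
print(tokens)
```

A sequence model can then consume the variable-length token list directly, which is what makes an imputation step unnecessary in this kind of design.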

Updated: 2024-05-26 13:06:45

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.16557v1

RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar

3D occupancy-based perception pipeline has significantly advanced autonomous driving by capturing detailed scene descriptions and demonstrating strong generalizability across various object categories and shapes. Current methods predominantly rely on LiDAR or camera inputs for 3D occupancy prediction. These methods are susceptible to adverse weather conditions, limiting the all-weather deployment of self-driving cars. To improve perception robustness, we leverage the recent advances in automotive radars and introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction. Our method, RadarOcc, circumvents the limitations of sparse radar point clouds by directly processing the 4D radar tensor, thus preserving essential scene details. RadarOcc innovatively addresses the challenges associated with the voluminous and noisy 4D radar data by employing Doppler bins descriptors, sidelobe-aware spatial sparsification, and range-wise self-attention mechanisms. To minimize the interpolation errors associated with direct coordinate transformations, we also devise a spherical-based feature encoding followed by spherical-to-Cartesian feature aggregation. We benchmark various baseline methods based on distinct modalities on the public K-Radar dataset. The results demonstrate RadarOcc's state-of-the-art performance in radar-based 3D occupancy prediction and promising results even when compared with LiDAR- or camera-based methods. Additionally, we present qualitative evidence of the superior performance of 4D radar in adverse weather conditions and explore the impact of key pipeline components through ablation studies.
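
Radar measurements are naturally indexed by range, azimuth, and elevation, whereas occupancy grids are Cartesian. The sketch below shows the basic coordinate conversion between the two. The axis convention (x forward, y left, z up) and the function name are assumptions, and the paper's spherical feature encoding and aggregation are not implemented.

```python
import math

def spherical_to_cartesian(range_m, azimuth, elevation):
    """Convert (range, azimuth, elevation), angles in radians, to (x, y, z).

    Assumed convention: x forward, y left, z up; azimuth is measured in the
    x-y plane from the x axis, elevation from the x-y plane towards z.
    """
    horizontal = range_m * math.cos(elevation)
    return (
        horizontal * math.cos(azimuth),
        horizontal * math.sin(azimuth),
        range_m * math.sin(elevation),
    )

point = spherical_to_cartesian(2.0, 0.0, 0.0)
print(point)
```

Because a regular spherical grid maps to irregularly spaced Cartesian points, resampling raw data across the two grids requires interpolation, which is the source of error the abstract refers to.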

Updated: 2024-05-26 12:50:13

Categories: cs.CV,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2405.14014v2

SED: Self-Evaluation Decoding Enhances Large Language Models for Better Generation

Existing Large Language Models (LLMs) generate text through unidirectional autoregressive decoding methods to respond to various user queries. These methods tend to consider token selection in a simple sequential manner, making it easy to fall into suboptimal options when encountering uncertain tokens, referred to as chaotic points in our work. Many chaotic points exist in texts generated by LLMs, and they often significantly affect the quality of subsequently generated tokens, which can interfere with LLMs' generation. This paper proposes Self-Evaluation Decoding, SED, a decoding method for enhancing model generation. Analogous to the human decision-making process, SED integrates speculation and evaluation steps into the decoding process, allowing LLMs to make more careful decisions and thus optimize token selection at chaotic points. Experimental results across various tasks using different LLMs demonstrate SED's effectiveness.

Updated: 2024-05-26 12:43:18

标题: SED：自我评估解码增强大型语言模型以实现更好的生成

摘要: 现有的大型语言模型（LLMs）通过单向自回归解码方法生成文本，以应对各种用户查询。这些方法往往以简单的顺序方式考虑令牌选择，当遇到不确定的令牌时容易陷入次优选项，我们的工作中将其称为混沌点。LLMs生成的文本中存在许多混沌点，它们通常会显著影响随后生成的令牌的质量，从而干扰LLMs的生成。本文提出了自评估解码（SED），这是一种用于增强模型生成的解码方法。类似于人类决策过程，SED将猜测和评估步骤整合到解码过程中，使LLMs能够做出更谨慎的决定，从而优化混沌点处的令牌选择。使用不同的LLMs在各种任务上的实验结果证明了SED的有效性。

更新时间: 2024-05-26 12:43:18

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.16552v1

ReCODE: Modeling Repeat Consumption with Neural ODE

In real-world recommender systems, such as in the music domain, repeat consumption is a common phenomenon where users frequently listen to a small set of preferred songs or artists repeatedly. The key point of modeling repeat consumption is capturing the temporal patterns between a user's repeated consumption of the items. Existing studies often rely on heuristic assumptions, such as assuming an exponential distribution for the temporal gaps. However, due to the high complexity of real-world recommender systems, these pre-defined distributions may fail to capture the intricate dynamic user consumption patterns, leading to sub-optimal performance. Drawing inspiration from the flexibility of neural ordinary differential equations (ODE) in capturing the dynamics of complex systems, we propose ReCODE, a novel model-agnostic framework that utilizes neural ODE to model repeat consumption. ReCODE comprises two essential components: a user's static preference prediction module and the modeling of user dynamic repeat intention. By considering both immediate choices and historical consumption patterns, ReCODE offers comprehensive modeling of user preferences in the target context. Moreover, ReCODE seamlessly integrates with various existing recommendation models, including collaborative-based and sequential-based models, making it easily applicable in different scenarios. Experimental results on two real-world datasets consistently demonstrate that ReCODE significantly improves the performance of base models and outperforms other baseline methods.

Updated: 2024-05-26 12:40:23

标题: 重新编码：使用神经ODE对重复消费进行建模

摘要: 在真实世界的推荐系统中，比如音乐领域，重复消费是一个常见现象，用户经常反复听一小组喜欢的歌曲或艺术家。建模重复消费的关键点是捕捉用户对物品重复消费之间的时间模式。现有研究通常依赖于启发式假设，比如假设时间间隔遵循指数分布。然而，由于真实世界推荐系统的复杂性很高，这些预定义分布可能无法捕捉复杂的动态用户消费模式，导致性能次优。受神经常微分方程（ODE）在捕捉复杂系统动态方面的灵活性启发，我们提出了ReCODE，一个利用神经ODE建模重复消费的新型模型无关框架。ReCODE包括两个关键组件：用户静态偏好预测模块和用户动态重复意向建模。通过考虑即时选择和历史消费模式，ReCODE提供了对目标环境中用户偏好的全面建模。此外，ReCODE可以无缝集成各种现有推荐模型，包括基于协同和基于序列的模型，使其在不同场景下易于应用。两个真实世界数据集上的实验结果一致表明，ReCODE显著提高了基础模型的性能，并优于其他基准方法。

更新时间: 2024-05-26 12:40:23

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2405.16550v1

Multiplicative Reweighting for Robust Neural Network Optimization

Neural networks are widespread due to their powerful performance. However, they degrade in the presence of noisy labels at training time. Inspired by the setting of learning with expert advice, where multiplicative weight (MW) updates were recently shown to be robust to moderate data corruptions in expert advice, we propose to use MW for reweighting examples during neural network optimization. We theoretically establish the convergence of our method when used with gradient descent and prove its advantages in 1D cases. We then validate our findings empirically for the general case by showing that MW improves the accuracy of neural networks in the presence of label noise on CIFAR-10, CIFAR-100 and Clothing1M. We also show the impact of our approach on adversarial robustness.

Updated: 2024-05-26 12:27:35

标题: 神经网络优化中的乘法重新加权

摘要: 神经网络因其强大的性能而广泛应用。然而，在训练时存在噪声标签时，它们会退化。受到学习专家建议的启发，最近显示出乘法权重（MW）更新在专家建议中对中等数据损坏具有鲁棒性，我们建议在神经网络优化过程中使用MW重新加权示例。我们在理论上证明了我们的方法在与梯度下降一起使用时的收敛性，并证明了其在一维案例中的优势。然后，我们通过在CIFAR-10、CIFAR-100和Clothing1M上显示MW在存在标签噪声时提高神经网络准确性的实验证实了我们的发现。我们还展示了我们方法对对抗性鲁棒性的影响。

更新时间: 2024-05-26 12:27:35

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2102.12192v4
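
The reweighting idea in the abstract above can be sketched roughly as follows. This is a minimal illustration of an exponential multiplicative-weights update on per-example weights, so that examples with persistently high loss (as mislabeled points often are) contribute less. The function name `mw_update`, the step size `eta`, and the toy losses are assumptions made for illustration; the paper's exact update rule and normalization may differ.

```python
import math

def mw_update(weights, losses, eta=0.5):
    """One multiplicative-weights step: shrink the weight of high-loss examples."""
    scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]  # renormalize so the weights sum to 1

weights = [0.25] * 4
losses = [0.1, 0.2, 0.1, 3.0]  # hypothetical losses; the last example acts like a noisy label
for _ in range(3):
    weights = mw_update(weights, losses)
```

In a training loop, the resulting weights would multiply the per-example losses before the gradient step, which is the sense in which the examples are "reweighted".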

Guidance with Spherical Gaussian Constraint for Conditional Diffusion

Recent advances in diffusion models attempt to handle conditional generative tasks by utilizing a differentiable loss function for guidance without the need for additional training. While these methods achieved certain success, they often compromise on sample quality and require small guidance step sizes, leading to longer sampling processes. This paper reveals that the fundamental issue lies in the manifold deviation during the sampling process when loss guidance is employed. We theoretically show the existence of manifold deviation by establishing a certain lower bound for the estimation error of the loss guidance. To mitigate this problem, we propose Diffusion with Spherical Gaussian constraint (DSG), drawing inspiration from the concentration phenomenon in high-dimensional Gaussian distributions. DSG effectively constrains the guidance step within the intermediate data manifold through optimization and enables the use of larger guidance steps. Furthermore, we present a closed-form solution for DSG denoising with the Spherical Gaussian constraint. Notably, DSG can seamlessly integrate as a plugin module within existing training-free conditional diffusion methods. Implementing DSG merely involves a few lines of additional code with almost no extra computational overhead, yet it leads to significant performance improvements. Comprehensive experimental results in various conditional generation tasks validate the superiority and adaptability of DSG in terms of both sample quality and time efficiency.

Updated: 2024-05-26 12:26:15

标题: 带球面高斯约束的条件扩散引导

摘要: 最近，扩散模型的最新进展尝试通过利用可微损失函数进行条件生成任务的处理，无需额外的训练。虽然这些方法取得了一定的成功，但它们经常在样本质量上做出妥协，并需要较小的引导步长，导致采样过程更加耗时。本文揭示了当使用损失引导时，在采样过程中流形偏差是根本问题所在。我们从高维高斯分布中的浓缩现象中获得灵感，提出了具有球形高斯约束的扩散方法（DSG）。DSG通过优化有效地将引导步骤约束在中间数据流形内，并允许使用更大的引导步骤。此外，我们提出了带有球形高斯约束的DSG去噪的封闭解。值得注意的是，DSG可以作为插件模块无缝集成到现有的无需训练的条件扩散方法中。实施DSG仅涉及几行额外代码，几乎没有额外的计算开销，但却带来显著的性能改进。在各种条件生成任务中的全面实验结果验证了DSG在样本质量和时间效率方面的优越性和适应性。

更新时间: 2024-05-26 12:26:15

领域: cs.LG

下载: http://arxiv.org/abs/2402.03201v3
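
The spherical constraint described in the abstract above can be pictured with a small sketch. It assumes that a sample from an n-dimensional Gaussian with standard deviation `sigma` concentrates near a sphere of radius `sqrt(n) * sigma` around the predicted mean, and it places the guided sample on that sphere in the direction that decreases the guidance loss. The function name `dsg_guided_step` and the toy vectors are hypothetical, and the paper's actual closed-form solution may include further terms.

```python
import math

def dsg_guided_step(mean, grad, sigma):
    """Move from the predicted mean along -grad, landing on the sphere of
    radius sqrt(n) * sigma where N(mean, sigma^2 I) concentrates."""
    n = len(mean)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        raise ValueError("gradient must be non-zero")
    radius = math.sqrt(n) * sigma
    return [m - radius * g / norm for m, g in zip(mean, grad)]

mean = [0.0, 1.0, -1.0, 2.0]
grad = [3.0, 0.0, -4.0, 0.0]  # gradient of a hypothetical guidance loss
x_next = dsg_guided_step(mean, grad, sigma=0.5)
dist = math.sqrt(sum((x - m) ** 2 for x, m in zip(x_next, mean)))
```

Because the step length is fixed by the sphere radius rather than by a hand-tuned guidance step size, the sample stays at the distance from the mean that the Gaussian would typically produce.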

Mamba4KT: An Efficient and Effective Mamba-based Knowledge Tracing Model

Knowledge tracing (KT) enhances student learning by leveraging past performance to predict future performance. Current research utilizes models based on attention mechanisms and recurrent neural network structures to capture long-term dependencies and correlations between exercises, aiming to improve model accuracy. The growing amount of data in smart education scenarios poses a challenge in terms of time and space consumption for knowledge tracing models. However, existing research often overlooks the efficiency of model training and inference and the constraints of training resources. Recognizing the significance of prioritizing model efficiency and resource usage in knowledge tracing, we introduce Mamba4KT. This novel model is the first to explore enhanced efficiency and resource utilization in knowledge tracing. We also examine the Mamba structure at both the sequence level and the exercise level to enhance model interpretability. Experimental findings across three public datasets demonstrate that Mamba4KT achieves comparable prediction accuracy to state-of-the-art models while significantly improving training and inference efficiency and resource utilization. As educational data continues to grow, our work suggests a promising research direction for knowledge tracing that improves model prediction accuracy, model efficiency, resource utilization, and interpretability simultaneously.

Updated: 2024-05-26 12:26:03

标题: Mamba4KT：一种高效且有效的基于Mamba的知识追踪模型

摘要: 知识追踪（KT）通过利用过去的表现来预测未来的表现，提升学生学习效果。当前研究利用基于注意力机制和循环神经网络结构的模型，捕捉练习之间的长期依赖性和相关性，旨在提高模型准确性。由于智能教育场景中数据量不断增长，这给知识追踪模型的时间和空间消耗带来挑战。然而，现有研究常常忽视模型训练和推断的效率以及训练资源的限制。认识到在知识追踪中优先考虑模型效率和资源利用的重要性，我们引入了Mamba4KT。这种新颖模型是首个探索知识追踪中增强效率和资源利用的模型。我们还研究了Mamba结构在序列级和练习级上的可解释性，以增强模型的可解释性。通过对三个公共数据集的实验结果表明，Mamba4KT在达到与最先进模型相当的预测准确性的同时，显著提高了训练和推断的效率以及资源利用。随着教育数据的不断增长，我们的工作为知识追踪提出了一个有前景的研究方向，同时提高了模型的预测准确性、模型效率、资源利用和可解释性。

更新时间: 2024-05-26 12:26:03

领域: cs.AI,cs.CY

下载: http://arxiv.org/abs/2405.16542v1

Variance-Reducing Couplings for Random Features: Perspectives from Optimal Transport

Random features (RFs) are a popular technique to scale up kernel methods in machine learning, replacing exact kernel evaluations with stochastic Monte Carlo estimates. They underpin models as diverse as efficient transformers (by approximating attention) and sparse spectrum Gaussian processes (by approximating the covariance function). Efficiency can be further improved by speeding up the convergence of these estimates: a variance reduction problem. We tackle this through the unifying framework of optimal transport, using theoretical insights and numerical algorithms to develop novel, high-performing RF couplings for kernels defined on Euclidean and discrete input spaces. They enjoy concrete theoretical performance guarantees and sometimes provide strong empirical downstream gains, including for scalable approximate inference on graphs. We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.

Updated: 2024-05-26 12:25:09

标题: 随机特征的方差减少耦合：来自最优输运的视角

摘要: 随机特征（RFs）是一种流行的技术，用于扩展机器学习中的核方法，将精确的核评估替换为随机蒙特卡洛估计。它们支撑着各种模型，例如高效的transformers（通过近似注意力）和稀疏谱高斯过程（通过近似协方差函数）。通过加快这些估计的收敛速度，可以进一步提高效率：一个方差减少问题。我们通过最优输运的统一框架来解决这个问题，利用理论见解和数值算法来开发新颖、高性能的RF耦合，用于定义在欧几里得和离散输入空间上的核。它们享有具体的理论性能保证，有时提供强大的经验性下游收益，包括对图上的可扩展近似推理。我们对方差减少作为一种范式的好处和局限性得出了令人惊讶的结论。

更新时间: 2024-05-26 12:25:09

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.16541v1
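
To make the Monte Carlo estimate mentioned in the abstract above concrete, here is a minimal sketch of random Fourier features for the Gaussian kernel. It uses independent frequencies, that is, the uncoupled baseline whose variance the paper aims to reduce; it does not implement the paper's couplings. The function name `rff_kernel_estimate`, the feature count, and the seed are assumptions for illustration.

```python
import math
import random

def rff_kernel_estimate(x, y, num_features=5000, seed=0):
    """Monte Carlo estimate of the Gaussian kernel exp(-||x - y||^2 / 2)
    using i.i.d. frequencies w ~ N(0, I) and the identity
    E[cos(w . (x - y))] = exp(-||x - y||^2 / 2)."""
    rng = random.Random(seed)
    diff = [a - b for a, b in zip(x, y)]
    total = 0.0
    for _ in range(num_features):
        w = [rng.gauss(0.0, 1.0) for _ in diff]
        total += math.cos(sum(wi * di for wi, di in zip(w, diff)))
    return total / num_features

x, y = [0.3, -0.2], [-0.1, 0.4]
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)
approx = rff_kernel_estimate(x, y)
```

The gap between `approx` and `exact` shrinks as `num_features` grows; a coupling between the sampled frequencies is one way to shrink it faster for a fixed number of features.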

MinRank Gabidulin encryption scheme on matrix codes

The McEliece scheme is a generic framework that allows any error-correcting code with an efficient decoding algorithm to be used to design an encryption scheme by hiding the generator matrix of the code. Similarly, the Niederreiter framework is the dual version of the McEliece scheme and achieves smaller ciphertexts. We propose a generalization of the McEliece and Niederreiter frameworks to matrix codes and the MinRank problem, which we apply to Gabidulin matrix codes (Gabidulin rank codes considered as matrix codes). The masking we consider starts from a rank code C, takes a matrix version of C, concatenates a certain number of rows and columns to this matrix code, and then applies an isometry for matrix codes. The security of the schemes against decrypting a ciphertext relies on the MinRank problem, and their structural security relies on a new problem, the EGMC-Indistinguishability problem, which we introduce and study in detail. The main structural attack that we propose consists in trying to recover the masked linearity over the extension field, which is lost during the masking process. Overall, starting from Gabidulin codes, we obtain a very appealing trade-off between the size of the ciphertext and the size of the public key. For 128 bits of security, we propose parameters ranging from ciphertexts of size 65 B (with public keys of size 98 kB) to ciphertexts of size 138 B (with public keys of size 41 kB). Our new approach achieves a better trade-off between ciphertext and public key sizes than the classic McEliece scheme, offering an alternative with very small ciphertexts and smaller public keys. For 256 bits of security, we can obtain ciphertexts as small as 119 B or public keys as small as 87 kB.

Updated: 2024-05-26 12:04:01

标题: MinRank Gabidulin矩阵码加密方案

摘要: 麦克利斯（McEliece）方案是一个通用框架，允许使用存在高效解码算法的任何纠错码来设计一个加密方案，通过隐藏生成矩阵码。类似地，尼德赖特（Niederreiter）方案是麦克利斯方案的对偶版本，并且实现了更小的密文。我们提出了将麦克利斯框架和尼德赖特框架推广到矩阵码和MinRank问题的一般化方法，我们将其应用于Gabidulin矩阵码（将Gabidulin秩码视为矩阵码）。我们考虑的掩蔽方式是从一个秩码C开始，考虑C的矩阵版本，并将一定数量的行和列连接到秩码C的矩阵码版本，然后应用一个矩阵码的同构。该方案的安全性依赖于MinRank问题来解密密文，方案的结构安全性依赖于我们引入并详细研究的新问题EGMC-Indistinguishability问题。我们提出的主要结构攻击是尝试恢复在掩蔽过程中丢失的扩展域上的掩蔽线性性。总的来说，从Gabidulin码开始，我们得到了一个非常吸引人的密文大小和公钥大小之间的权衡。对于128位的安全性，我们提出了从65 B的密文（和98 kB的公钥）到138B的密文（和41 kB的公钥）的参数范围。我们的新方法使得在密文和公钥之间实现更好的权衡比传统的麦克利斯方案。我们的新方法允许获得一种替代方案以取代经典的麦克利斯方案，获得非常小的密文，并且比传统的麦克利斯方案具有更小的公钥。对于256位的安全性，我们可以获得低至119B的密文，或低至87kB的公钥。

更新时间: 2024-05-26 12:04:01

领域: cs.CR

下载: http://arxiv.org/abs/2405.16539v1

IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency

Efficiently optimizing multi-model inference pipelines for fast, accurate, and cost-effective inference is a crucial challenge in machine learning production systems, given their tight end-to-end latency requirements. To simplify the exploration of the vast and intricate trade-off space of latency, accuracy, and cost in inference pipelines, providers frequently opt to consider only one of them. However, the challenge lies in reconciling latency, accuracy, and cost trade-offs. To address this challenge and propose a solution to efficiently manage model variants in inference pipelines, we present IPA, an online deep learning Inference Pipeline Adaptation system that efficiently leverages model variants for each deep learning task. Model variants are different versions of pre-trained models for the same deep learning task with variations in resource requirements, latency, and accuracy. IPA dynamically configures batch size, replication, and model variants to optimize accuracy, minimize costs, and meet user-defined latency Service Level Agreements (SLAs) using Integer Programming. It supports multi-objective settings for achieving different trade-offs between accuracy and cost objectives while remaining adaptable to varying workloads and dynamic traffic patterns. Navigating a wider variety of configurations allows IPA to achieve better trade-offs between cost and accuracy objectives compared to existing methods. Extensive experiments in a Kubernetes implementation with five real-world inference pipelines demonstrate that IPA improves end-to-end accuracy by up to 21% with a minimal cost increase. The code and data for replications are available at https://github.com/reconfigurable-ml-pipeline/ipa.

Updated: 2024-05-26 12:03:03

标题: IPA：推理管道适应以实现高准确性和成本效益

摘要: 高效地优化多模型推理管道，以实现快速、准确和具有成本效益的推理，在机器学习生产系统中是一个关键挑战，因为这些系统具有严格的端到端延迟要求。为了简化在推理管道中延迟、准确性和成本的广泛而复杂的权衡空间的探索，提供者经常选择考虑其中之一。然而，挑战在于协调延迟、准确性和成本的权衡。为了解决这一挑战并提出一种有效管理推理管道中模型变体的解决方案，我们提出了IPA，一种在线深度学习推理管道适应系统，它有效地利用每个深度学习任务的模型变体。模型变体是相同深度学习任务的预训练模型的不同版本，具有资源需求、延迟和准确性的差异。IPA通过整数规划动态配置批处理大小、复制和模型变体，以优化准确性，最小化成本，并满足用户定义的延迟服务水平协议（SLAs）。它支持多目标设置，以实现在准确性和成本目标之间不同的权衡，同时适应不同的工作负载和动态流量模式。通过导航更广泛的配置变体，\namex{}相比现有方法能够实现更好的成本和准确性目标之间的权衡。在一个具有五个真实世界推理管道的Kubernetes实现中进行的大量实验表明，IPA可以将端到端准确性提高高达21%，而成本增加最小。复制的代码和数据可在https://github.com/reconfigurable-ml-pipeline/ipa获得。

更新时间: 2024-05-26 12:03:03

领域: cs.DC,cs.LG,cs.PF

下载: http://arxiv.org/abs/2308.12871v3
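
The trade-off described in the abstract above can be illustrated with a toy search. The sketch below picks one model variant per pipeline stage to maximize accuracy minus a cost penalty while respecting an end-to-end latency SLA. It uses brute-force enumeration rather than an integer programming solver, it ignores batch size and replication, and every variant name and number in it is hypothetical.

```python
from itertools import product

# Hypothetical variants per pipeline stage: (name, accuracy, latency_ms, cost)
STAGES = [
    [("det-small", 0.70, 20, 1), ("det-large", 0.85, 60, 4)],
    [("cls-small", 0.75, 10, 1), ("cls-large", 0.90, 45, 3)],
]

def best_configuration(stages, sla_ms, cost_weight=0.01):
    """Enumerate one variant per stage and keep the best feasible combination."""
    best, best_score = None, float("-inf")
    for combo in product(*stages):
        latency = sum(v[2] for v in combo)
        if latency > sla_ms:
            continue  # violates the end-to-end latency SLA
        accuracy = sum(v[1] for v in combo)
        cost = sum(v[3] for v in combo)
        score = accuracy - cost_weight * cost
        if score > best_score:
            best, best_score = [v[0] for v in combo], score
    return best

tight = best_configuration(STAGES, sla_ms=70)
loose = best_configuration(STAGES, sla_ms=120)
```

Enumeration is only practical for tiny pipelines; with more stages, variants, batch sizes, and replica counts the search space grows multiplicatively, which is the usual motivation for handing the same objective and constraints to a solver.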

Gamified AI Approach for Early Detection of Dementia

This paper aims to develop a new deep learning-inspired gaming approach for early detection of dementia. This research integrates a robust convolutional neural network (CNN)-based model for early dementia detection using health metrics data as well as facial image data through a cognitive assessment-based gaming application. We collected 1000 health-metrics samples from Apollo Diagnostic Center Kolkata, each labeled as either demented or non-demented, to train MOD-1D-CNN for game level 1. Our research team collected a second dataset of 1800 facial images, also labeled as either demented or non-demented, to train the MOD-2D-CNN model for game level 2. The proposed MOD-1D-CNN model reaches a loss of 0.2692 and a highest accuracy of 70.50% in identifying dementia traits from real-life health metrics data. Similarly, the proposed MOD-2D-CNN model reaches a loss of 0.1755 and a highest accuracy of 95.72% in recognizing dementia status from real-life facial image data. A rule-based weightage method is then applied to combine the two models and reach the final decision. The MOD-1D-CNN and MOD-2D-CNN models are lightweight and computationally efficient alternatives because they have significantly fewer parameters than other state-of-the-art models. We compare their accuracies and parameter counts with those of other state-of-the-art deep learning models.

Updated: 2024-05-26 12:01:30

标题: 基于游戏化人工智能的早期痴呆症检测方法

摘要: 这篇论文旨在开发一种新的深度学习启发的游戏方法，用于早期痴呆症的检测。该研究整合了一个基于强大卷积神经网络（CNN）的模型，通过基于认知评估的游戏应用，利用健康指标数据和面部图像数据进行早期痴呆症检测。我们从加尔各答阿波罗诊断中心收集了1000个健康指标数据样本，标记为患有痴呆症或非痴呆症，用于MOD-1D-CNN在游戏级别1中的训练，同时我们的研究团队收集了另一个包含1800个面部数据的面部图像数据集，标记为患有痴呆症或非痴呆症，用于MOD-2D-CNN模型在游戏级别2中的训练。在我们的工作中，所提出的MOD-1D-CNN模型的损失为0.2692，最高准确率为70.50%，用于识别使用真实健康指标数据的痴呆特征。同样，所提出的MOD-2D-CNN模型损失为0.1755，最高准确率为95.72%，用于使用真实基于面部图像数据识别痴呆状态。因此，应用基于规则的加权方法来结合两种提出的方法以达到最终决策。MOD-1D-CNN和MOD-2D-CNN模型是更轻量级和计算效率更高的替代方案，因为与其他最先进的模型相比，它们具有显著较少的参数。我们已将它们的准确性和参数与其他最先进的深度学习模型进行了比较。

更新时间: 2024-05-26 12:01:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.16538v1

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems. In the context of intelligent vehicles, leveraging the power of foundation models has proven to be transformative, offering notable advancements in visual understanding. Equipped with multi-modal and multi-task learning capabilities, multi-modal multi-task visual understanding foundation models (MM-VUFMs) effectively process and fuse data from diverse modalities and simultaneously handle various driving-related tasks with powerful adaptability, contributing to a more holistic understanding of the surrounding scene. In this survey, we present a systematic analysis of MM-VUFMs specifically designed for road scenes. Our objective is not only to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques, but also to highlight their advanced capabilities in diverse learning paradigms. These paradigms include open-world understanding, efficient transfer for road scenes, continual learning, interactive and generative capability. Moreover, we provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models. To facilitate researchers in staying abreast of the latest developments in MM-VUFMs for road scenes, we have established a continuously updated repository at https://github.com/rolsheng/MM-VUFM4DS

Updated: 2024-05-26 11:54:10

标题: 深入探讨道路场景理解的多模态多任务基础模型：从学习范式的视角

摘要: 基于基础模型的影响确实在各个领域产生了深远影响，成为显著塑造智能系统能力的关键组成部分。在智能车辆领域，利用基础模型的力量已被证明是具有变革性的，为视觉理解提供了显著的进步。配备多模态和多任务学习能力的多模态多任务视觉理解基础模型（MM-VUFMs）有效处理和融合来自不同模态的数据，并同时处理各种与驾驶相关的任务，具有强大的适应性，有助于更全面地理解周围场景。在本调查中，我们对专门设计用于道路场景的MM-VUFMs进行了系统分析。我们的目标不仅是提供对常见做法的全面概述，涉及特定任务模型、统一多模态模型、统一多任务模型和基础模型提示技术，还要突出它们在不同学习范式中的高级能力。这些范式包括对开放世界的理解、道路场景的有效迁移、持续学习、互动和生成能力。此外，我们提供了关键挑战和未来趋势的见解，如闭环驾驶系统、可解释性、具身驾驶代理和世界模型。为了帮助研究人员及时了解道路场景中MM-VUFMs的最新发展，我们在https://github.com/rolsheng/MM-VUFM4DS建立了一个持续更新的存储库。

更新时间: 2024-05-26 11:54:10

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.02968v2

Why is parameter averaging beneficial in SGD? An objective smoothing perspective

It is often observed that stochastic gradient descent (SGD) and its variants implicitly select a solution with good generalization performance; such implicit bias is often characterized in terms of the sharpness of the minima. Kleinberg et al. (2018) connected this bias with the smoothing effect of SGD which eliminates sharp local minima by the convolution using the stochastic gradient noise. We follow this line of research and study the commonly-used averaged SGD algorithm, which has been empirically observed in Izmailov et al. (2018) to prefer a flat minimum and therefore achieves better generalization. We prove that in certain problem settings, averaged SGD can efficiently optimize the smoothed objective which avoids sharp local minima. In experiments, we verify our theory and show that parameter averaging with an appropriate step size indeed leads to significant improvement in the performance of SGD.

Updated: 2024-05-26 11:54:08

标题: 为什么参数平均对随机梯度下降有益？一个目标平滑的视角

摘要: 经常观察到随机梯度下降（SGD）及其变体隐式选择具有良好泛化性能的解决方案；这种隐式偏差通常以极小值的陡峭程度来表征。Kleinberg等人（2018）将这种偏差与SGD的平滑效应联系起来，通过使用随机梯度噪声的卷积消除尖锐的局部极小值。我们沿着这条研究线路，研究了常用的平均SGD算法，根据Izmailov等人（2018）的经验观察，该算法倾向于选择一个平坦的极小值，因此实现更好的泛化。我们证明，在某些问题设置中，平均SGD可以有效地优化平滑的目标函数，避免尖锐的局部极小值。在实验中，我们验证了我们的理论，并展示了使用适当步长的参数平均确实可以显著改善SGD的性能。

更新时间: 2024-05-26 11:54:08

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2302.09376v2
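
The mechanics of parameter averaging discussed in the abstract above can be shown on a one-dimensional toy problem. The sketch runs SGD on f(x) = x^2 / 2 with additive gradient noise and tracks the running average of the iterates. The alternating noise sequence is a deterministic stand-in chosen so the run is reproducible, and the function name `averaged_sgd` is an assumption; the sketch does not reproduce the paper's smoothed-objective analysis.

```python
from itertools import cycle, islice

def averaged_sgd(x0, lr, noise):
    """SGD on f(x) = x^2 / 2 with additive gradient noise.
    Returns the last iterate and the running average of all iterates."""
    x, avg = x0, 0.0
    for t, eps in enumerate(noise, start=1):
        x = x - lr * (x + eps)   # noisy gradient step
        avg += (x - avg) / t     # incremental mean of x_1..x_t
    return x, avg

# Deterministic stand-in for gradient noise so the run is reproducible.
noise = islice(cycle([1.0, -1.0]), 100)
last, avg = averaged_sgd(x0=0.0, lr=0.5, noise=noise)
```

With this step size the last iterate keeps oscillating around the minimum at 0, while the averaged iterate settles much closer to it.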

Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective

In long-term time series forecasting (LTSF) tasks, an increasing number of models have acknowledged that discrete time series originate from continuous dynamic systems and have attempted to model their dynamical structures. Recognizing the chaotic nature of real-world data, our model, Attraos, incorporates chaos theory into LTSF, perceiving real-world time series as observations from unknown high-dimensional chaotic dynamic systems. Under the concept of attractor invariance, Attraos utilizes non-parametric Phase Space Reconstruction embedding and the proposed multi-scale dynamic memory unit to memorize historical dynamics structure, and predicts with a frequency-enhanced local evolution strategy. Detailed theoretical analysis and abundant empirical evidence consistently show that Attraos outperforms various LTSF methods on mainstream LTSF datasets and chaotic datasets with only one-twelfth of the parameters compared to PatchTST.

Updated: 2024-05-26 11:51:22

标题: 吸引子记忆对长期时间序列预测的影响：混沌视角

摘要: 在长期时间序列预测（LTSF）任务中，越来越多的模型已经认识到离散时间序列源自连续动态系统，并尝试对其动态结构进行建模。认识到现实世界数据的混沌性质，我们的模型\textbf{\textit{Attraos}}将混沌理论融入到LTSF中，将现实世界时间序列视为来自未知高维混沌动态系统的观测值。在吸引子不变性的概念下，Attraos利用非参数化相空间重构嵌入和提出的多尺度动态记忆单元来记忆历史动态结构，并通过频率增强的局部演化策略进行预测。详细的理论分析和丰富的经验证据一致表明，与PatchTST相比，Attraos仅使用了十二分之一的参数，在主流LTSF数据集和混沌数据集上表现优于各种LTSF方法。

更新时间: 2024-05-26 11:51:22

领域: cs.LG,cs.AI,nlin.CD

下载: http://arxiv.org/abs/2402.11463v2
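
The non-parametric phase space reconstruction mentioned in the abstract above is commonly done with a time-delay embedding, which the sketch below illustrates. The forward-delay convention, the function name `delay_embed`, and the parameter names `dim` and `tau` are assumptions for illustration; the paper's own embedding settings are not given in the abstract.

```python
def delay_embed(series, dim, tau):
    """Return delay vectors (x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau})."""
    span = (dim - 1) * tau
    return [
        tuple(series[t + k * tau] for k in range(dim))
        for t in range(len(series) - span)
    ]

points = delay_embed([0, 1, 2, 3, 4, 5], dim=3, tau=2)
```

Each tuple is one point in the reconstructed phase space, so a scalar series of length N yields N - (dim - 1) * tau points of dimension `dim`.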

Rethinking of Encoder-based Warm-start Methods in Hyperparameter Optimization

Effectively representing heterogeneous tabular datasets for meta-learning remains an open problem. Previous approaches rely on predefined meta-features, for example, statistical measures or landmarkers. Encoder-based models, such as Dataset2Vec, allow us to extract significant meta-features automatically without human intervention. This research introduces a novel encoder-based representation of tabular datasets implemented within the liltab package available on GitHub https://github.com/azoz01/liltab. Our package is based on an established model for heterogeneous tabular data proposed in [Tomoharu Iwata and Atsutoshi Kumagai. Meta-learning from Tasks with Heterogeneous Attribute Spaces. In Advances in Neural Information Processing Systems, 2020]. The proposed approach employs a different model for encoding feature relationships, generating alternative representations compared to existing methods like Dataset2Vec. Both of them leverage the fundamental assumption of dataset similarity learning. In this work, we evaluate Dataset2Vec and liltab on two common meta-tasks -- representing entire datasets and hyperparameter optimization warm-start. However, validation on an independent metaMIMIC dataset highlights the nuanced challenges in representation learning. We show that general representations may not suffice for some meta-tasks where requirements are not explicitly considered during extraction.

Updated: 2024-05-26 11:38:24

标题: 在超参数优化中重新思考基于编码器的热启动方法

摘要: 对于元学习而言，有效地表示异构表格数据集仍然是一个开放的问题。先前的方法依赖于预定义的元特征，例如统计量或地标。基于编码器的模型，如Dataset2Vec，允许我们在不需要人工干预的情况下自动提取重要的元特征。本研究介绍了一个在liltab软件包中实现的基于编码器的表格数据集的新颖表示，该软件包可在GitHub上获得 https://github.com/azoz01/liltab。我们的软件包基于[Tomoharu Iwata和Atsutoshi Kumagai. Meta-learning from Tasks with Heterogeneous Attribute Spaces. In Advances in Neural Information Processing Systems, 2020]中提出的异构表格数据的已建立模型。所提出的方法采用了一种不同的模型来编码特征之间的关系，生成了与Dataset2Vec等现有方法不同的替代表示。它们都利用了数据集相似性学习的基本假设。在这项工作中，我们评估了Dataset2Vec和liltab在两个常见的元任务上 -- 表示整个数据集和超参数优化的热启动。然而，在一个独立的metaMIMIC数据集上的验证突显了表示学习中微妙的挑战。我们展示了通用表示可能不足以满足一些元任务，其中在提取过程中未明确考虑要求。

更新时间: 2024-05-26 11:38:24

领域: cs.LG

下载: http://arxiv.org/abs/2403.04720v3

LoQT: Low Rank Adapters for Quantized Training

Training of large neural networks requires significant computational resources. Despite advances using low-rank adapters and quantization, pretraining of models such as LLMs on consumer hardware has not been possible without model sharding, offloading during training, or per-layer gradient updates. To address these limitations, we propose LoQT, a method for efficiently training quantized models. LoQT uses gradient-based tensor factorization to initialize low-rank trainable weight matrices that are periodically merged into quantized full-rank weight matrices. Our approach is suitable for both pretraining and fine-tuning of models, which we demonstrate experimentally for language modeling and downstream task adaptation. We find that LoQT enables efficient training of models up to 7B parameters on a consumer-grade 24GB GPU. We also demonstrate the feasibility of training a 13B parameter model using per-layer gradient updates on the same hardware.

Updated: 2024-05-26 11:29:57

标题: LoQT: 量化训练的低秩适配器

摘要: 大型神经网络的训练需要大量的计算资源。尽管使用低秩适配器和量化等技术取得了进展，但在消费者硬件上对LLMs等模型进行预训练仍然不可能，除非采用模型分片、训练期间的卸载或者每层梯度更新。为了解决这些限制，我们提出了LoQT，一种用于高效训练量化模型的方法。LoQT使用基于梯度的张量分解来初始化低秩可训练的权重矩阵，这些矩阵定期合并为量化的全秩权重矩阵。我们的方法适用于模型的预训练和微调，在语言建模和下游任务适应方面进行了实验验证。我们发现LoQT使得在消费者级别的24GB GPU上高效训练多达70亿参数的模型成为可能。我们还展示了在同一硬件上使用每层梯度更新训练130亿参数模型的可行性。

更新时间: 2024-05-26 11:29:57

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2405.16528v1

Multi-State TD Target for Model-Free Reinforcement Learning

Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both immediate rewards and the estimated value of subsequent states. Traditionally, TD learning relies on the value of a single subsequent state. We propose an enhanced multi-state TD (MSTD) target that utilizes the estimated values of multiple subsequent states. Building on this new MSTD concept, we develop complete actor-critic algorithms that include management of replay buffers in two modes, and integrate with deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Experimental results demonstrate that algorithms employing the MSTD target significantly improve learning performance compared to traditional methods.

Updated: 2024-05-26 11:17:49

标题: 无模型强化学习的多状态TD目标

摘要: 时间差分（TD）学习是强化学习中的一种基本技术，通过使用TD目标来更新状态或状态-动作对的值估计。这个目标代表通过结合即时奖励和后续状态的估计值，对真实值的改进估计。传统上，TD学习依赖于单个后续状态的值。我们提出了一个增强的多状态TD（MSTD）目标，利用多个后续状态的估计值。基于这个新的MSTD概念，我们开发了完整的演员-评论家算法，包括在两种模式下管理重放缓冲区，并与深度确定性策略优化（DDPG）和软演员-评论家（SAC）集成。实验结果表明，采用MSTD目标的算法与传统方法相比显著提高了学习性能。

更新时间: 2024-05-26 11:17:49

领域: cs.LG,cs.AI,68T05(Primary)

下载: http://arxiv.org/abs/2405.16522v1
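
One plausible reading of a target that "utilizes the estimated values of multiple subsequent states" is to average the bootstrapped n-step targets for n = 1..N, which the sketch below shows. This averaging scheme, the function names, and the argument layout are assumptions for illustration; the abstract does not give the paper's exact MSTD definition.

```python
def n_step_target(rewards, values, n, gamma):
    """n-step TD target. rewards[k] is r_{t+k}; values[k] is V(s_{t+k+1})."""
    ret = sum((gamma ** k) * rewards[k] for k in range(n))
    return ret + (gamma ** n) * values[n - 1]

def multi_state_target(rewards, values, num_states, gamma=0.99):
    """Average the 1-step through num_states-step targets."""
    targets = [
        n_step_target(rewards, values, n, gamma)
        for n in range(1, num_states + 1)
    ]
    return sum(targets) / len(targets)

rewards = [1.0, 1.0, 1.0]
values = [10.0, 9.0, 8.0]  # hypothetical critic estimates for the next three states
single = multi_state_target(rewards, values, num_states=1, gamma=0.9)
multi = multi_state_target(rewards, values, num_states=3, gamma=1.0)
```

With `num_states=1` the function reduces to the traditional single-state TD target, which makes the relationship to standard TD learning explicit.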

Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust

Rust is one of the most promising systems programming languages to fundamentally solve the memory safety issues that have plagued low-level software for over forty years. However, to accommodate the scenarios where Rust's type rules might be too restrictive for certain systems programming and where programmers opt for performance over security checks, Rust opens security escape hatches allowing writing unsafe source code or calling unsafe libraries. Consequently, unsafe Rust code and directly-linked unsafe foreign libraries may not only introduce memory safety violations themselves but also compromise the entire program as they run in the same monolithic address space as the safe Rust. This problem can be mitigated by isolating unsafe memory objects (those accessed by unsafe code) and sandboxing memory accesses to the unsafe memory. One category of prior work utilizes existing program analysis frameworks on LLVM IR to identify unsafe memory objects and accesses. However, these approaches suffer from prolonged analysis time and low precision. In this paper, we tackle these two challenges using summary-based whole-program analysis on Rust's MIR. The summary-based analysis computes information on demand so as to save analysis time. Performing analysis on Rust's MIR exploits the rich high-level type information inherent to Rust, which is unavailable in LLVM IR. This manuscript is a preliminary study of ongoing research. We have prototyped a whole-program analysis for identifying both unsafe heap allocations and memory accesses to those unsafe heap objects. We report the overhead and the efficacy of the analysis in this paper.

Updated: 2024-05-26 11:15:28

标题: 快速基于摘要的全程序分析，用于识别Rust中的不安全内存访问

摘要: Rust是最有前途的系统编程语言之一，可以从根本上解决困扰低级软件长达四十多年的内存安全问题。然而，为了适应某些系统编程情景，Rust的类型规则可能对某些情况过于限制，并且程序员选择性能而非安全检查时，Rust开放了安全逃逸口，允许编写不安全的源代码或调用不安全的库。因此，不安全的Rust代码和直接链接的不安全外部库可能不仅会引入内存安全违规，还可能危及整个程序，因为它们在与安全Rust相同的单片地址空间中运行。这个问题可以通过隔离不安全内存对象（被不安全代码访问的对象）并对不安全内存访问进行隔离来缓解。先前的研究利用现有的LLVM IR程序分析框架来识别不安全内存对象和访问，但它们受到分析时间延长和低精度的限制。本文通过基于Rust的MIR的基于摘要的全程序分析来解决这两个挑战。基于摘要的分析根据需计算信息，以节省分析时间。在Rust的MIR上执行分析利用了Rust中固有的丰富高级类型信息，这种信息在LLVM IR中不可用。本文是正在进行中的研究的初步研究。我们为识别不安全堆分配和对这些不安全堆对象的内存访问开发了一个全程序分析的原型。我们在本文中报告了分析的开销和有效性。

更新时间: 2024-05-26 11:15:28

领域: cs.CR

下载: http://arxiv.org/abs/2310.10298v3

Injective Sliced-Wasserstein embedding for weighted sets and point clouds

We present the Sliced Wasserstein Embedding, a novel method to embed multisets and distributions over $\mathbb{R}^d$ into Euclidean space. Our embedding is injective and approximately preserves the Sliced Wasserstein distance. Moreover, when restricted to multisets, it is bi-Lipschitz. We also prove that it is impossible to embed distributions over $\mathbb{R}^d$ into a Euclidean space in a bi-Lipschitz manner, even under the assumption that their support is bounded and finite. We demonstrate empirically that our embedding offers practical advantage in learning tasks over existing methods for handling multisets.

Updated: 2024-05-26 11:04:41

标题: 注射性切片-沃瓦斯坦嵌入在加权集和点云中的应用

摘要: 我们提出了$\textit{切片Wasserstein嵌入}$——一种将多集和$\mathbb{R}^d$上的分布嵌入到欧几里得空间的新方法。我们的嵌入是单射的，并且大致保持切片Wasserstein距离。此外，当限制为多集时，它是双Lipschitz的。我们还证明了，即使在它们的支持有界且有限的假设下，将$\mathbb{R}^d$上的分布嵌入到欧几里得空间中也是$\textit{不可能}$以双Lipschitz的方式进行。我们在实证上证明，我们的嵌入在处理多集的学习任务中相比现有方法具有实际优势。

更新时间: 2024-05-26 11:04:41

领域: cs.LG

下载: http://arxiv.org/abs/2405.16519v1
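
For readers unfamiliar with the Sliced Wasserstein distance that the embedding above approximately preserves, the following sketch estimates it between two equal-size multisets with uniform weights. It projects the points onto random unit directions and compares the sorted projections. This illustrates the distance only, not the paper's embedding; the function name, direction count, and seed are assumptions.

```python
import math
import random

def sliced_wasserstein(xs, ys, num_directions=200, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance between two
    equal-size multisets of points in R^d with uniform weights."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size multisets")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(num_directions):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]  # random unit direction
        px = sorted(sum(a * b for a, b in zip(p, v)) for p in xs)
        py = sorted(sum(a * b for a, b in zip(p, v)) for p in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / num_directions)

cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
same = sliced_wasserstein(cloud, list(reversed(cloud)))
shifted = sliced_wasserstein(cloud, [(x + 1.0, y) for x, y in cloud])
```

Sorting the one-dimensional projections is what makes the result independent of the order in which the points are listed, which is the property a multiset representation needs.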

Distributed agency in second language learning and teaching through generative AI

Generative AI offers significant opportunities for language learning. Tools like ChatGPT can provide informal second language practice through chats in written or voice forms, with the learner specifying through prompts conversational parameters such as proficiency level, language register, and discussion topics. AI can be instructed to give corrective feedback, create practice exercises, or develop an extended study plan. Instructors can use AI to build learning and assessment materials in a variety of media. AI is likely to make immersive technologies more powerful and versatile, moving away from scripted interactions. For both learners and teachers, it is important to understand the limitations of AI systems that arise from their purely statistical model of human language, which limits their ability to deal with nuanced social and cultural aspects of language use. Additionally, there are ethical concerns over how AI systems are created as well as practical constraints in their use, especially for less privileged populations. The power and versatility of AI tools are likely to turn them into valuable and constant companions in many people's lives (akin to smartphones), creating a close connection that goes beyond simple tool use. Ecological theories such as sociomaterialism are helpful in examining the shared agency that develops through close user-AI interactions, as are the perspectives on human-object relations from Indigenous cultures.

Updated: 2024-05-26 11:00:47

标题: 通过生成式人工智能在第二语言学习和教学中的分布式代理

摘要: 生成式人工智能为语言学习提供了重要机会。像ChatGPT这样的工具可以通过书面或语音形式的对话，为学习者提供非正式的第二语言练习，学习者可通过提示设定对话参数，如熟练程度、语言风格和讨论主题等。人工智能可以被指示提供纠正性反馈、创建练习题或制定延伸学习计划。教师可以利用人工智能在各种媒体上构建学习和评估材料。人工智能可能会使沉浸式技术更加强大和多样化，摆脱了脚本化的互动方式。对于学习者和教师来说，了解人工智能系统的局限性是重要的，这些局限性源于它们对人类语言的纯统计模型，限制了它们处理语言使用中微妙的社会和文化方面的能力。此外，对于人工智能系统的创建有伦理方面的担忧，以及在使用中的实际约束，尤其是对于较不幸运的群体而言。人工智能工具的强大和多样性可能会使它们成为许多人生活中有价值且不断伴随的伙伴（类似于智能手机），形成一种超越简单工具使用的密切联系。生态理论，如社会物质主义，对于研究通过密切用户-人工智能互动而形成的共享代理是有帮助的，同样有助于研究来自原住民文化的人-物关系观点。

更新时间: 2024-05-26 11:00:47

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2403.20216v3

FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information

This paper establishes a mathematical foundation for the Adam optimizer, elucidating its connection to natural gradient descent through Riemannian and information geometry. We rigorously analyze the diagonal empirical Fisher information matrix (FIM) in Adam, clarifying all detailed approximations and advocating for the use of log probability functions as loss, which should be based on discrete distributions, due to the limitations of empirical FIM. Our analysis uncovers flaws in the original Adam algorithm, leading to proposed corrections such as enhanced momentum calculations, adjusted bias corrections, adaptive epsilon, and gradient clipping. We refine the weight decay term based on our theoretical framework. Our modified algorithm, Fisher Adam (FAdam), demonstrates superior performance across diverse domains including LLM, ASR, and VQ-VAE, achieving state-of-the-art results in ASR.
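As a point of reference, here is a minimal sketch of one step of the standard Adam update (not FAdam, whose corrections are not specified in this abstract). The running mean of squared gradients plays the role that the abstract describes as a diagonal empirical Fisher estimate when the loss is a negative log-likelihood. The function name `adam_step` and the list-of-floats representation are assumptions for illustration.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of standard Adam on lists of floats; t is the 1-based step count."""
    new_theta, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1 - beta1) * g        # first moment (momentum)
        vi = beta2 * vi + (1 - beta2) * g * g    # running mean of g^2 (diagonal second moment)
        m_hat = mi / (1 - beta1 ** t)            # bias correction for the first moment
        v_hat = vi / (1 - beta2 ** t)            # bias correction for the second moment
        # Dividing by sqrt(v_hat) rescales the step separately for each coordinate.
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

On the very first step the bias-corrected moments reduce to the gradient and its square, so each coordinate moves by roughly `lr` in the direction opposite to its gradient sign, regardless of the gradient's magnitude.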

Updated: 2024-05-26 10:59:04

标题: FAdam: Adam是一种利用对角经验Fisher信息的自然梯度优化器

摘要: 这篇论文为Adam优化器建立了数学基础，阐明了它与自然梯度下降之间通过黎曼和信息几何的连接。我们对Adam中的对角经验费舍尔信息矩阵（FIM）进行了严格分析，澄清了所有详细的近似，并主张使用对数概率函数作为损失函数，这应该基于离散分布，由于经验FIM的限制。我们的分析揭示了原始Adam算法中的缺陷，导致提出了一些修正，如增强的动量计算、调整的偏差校正、自适应的epsilon和梯度裁剪。我们根据我们的理论框架细化了权重衰减项。我们修改后的算法，费舍尔Adam（FAdam），在包括LLM、ASR和VQ-VAE在内的各个领域表现出优越性能，在ASR方面实现了最先进的结果。

更新时间: 2024-05-26 10:59:04

领域: cs.LG,cs.AI,cs.IT,math.IT

下载: http://arxiv.org/abs/2405.12807v3

Boosting long-term forecasting performance for continuous-time dynamic graph networks via data augmentation

This study focuses on long-term forecasting (LTF) on continuous-time dynamic graph networks (CTDGNs), which is important for real-world modeling. Existing CTDGNs are effective for modeling temporal graph data due to their ability to capture complex temporal dependencies, but they perform poorly on LTF because they require substantial historical data, which is not practical in most cases. The most intuitive way to relieve this problem is data augmentation. In this study, we propose \textbf{\underline{U}ncertainty \underline{M}asked \underline{M}ix\underline{U}p (UmmU)}: a plug-and-play module that conducts uncertainty estimation to introduce uncertainty into the intermediate-layer embeddings of CTDGNs, and performs masked mixup to further enhance the uncertainty of the embeddings so that they generalize to more situations. UmmU can be easily inserted into arbitrary CTDGNs without increasing the number of parameters. We conduct comprehensive experiments on three real-world dynamic graph datasets, and the results demonstrate that UmmU can effectively improve the long-term forecasting performance of CTDGNs.

Updated: 2024-05-26 10:47:29

标题: 通过数据增强提升连续时间动态图网络的长期预测性能

摘要: 这项研究集中在长期预测（LTF）在连续时间动态图网络（CTDGNs）上，这对于实际建模至关重要。现有的CTDGNs能够有效地对时间图数据进行建模，因为它们能够捕捉复杂的时间依赖关系，但是在LTF方面表现不佳，因为需要大量的历史数据，这在大多数情况下并不实际。为了缓解这个问题，最直观的方法是数据增强。在这项研究中，我们提出了\textbf{\underline{U}ncertainty \underline{M}asked \underline{M}ix\underline{U}p (UmmU)}：这是一个即插即用的模块，进行不确定性估计，将不确定性引入CTDGNs的中间层嵌入，并执行掩码混合以进一步增强嵌入的不确定性，使其泛化到更多情况。UmmU可以轻松地插入任意的CTDGNs中，而不增加参数数量。我们在三个实际动态图数据集上进行了全面的实验，结果表明UmmU可以有效地提高CTDGNs的长期预测性能。

更新时间: 2024-05-26 10:47:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2304.05749v2

SE3Set: Harnessing equivariant hypergraph neural networks for molecular representation learning

In this paper, we develop SE3Set, an SE(3) equivariant hypergraph neural network architecture tailored for advanced molecular representation learning. Hypergraphs are not merely an extension of traditional graphs; they are pivotal for modeling high-order relationships, a capability that conventional equivariant graph-based methods lack due to their inherent limitations in representing intricate many-body interactions. To achieve this, we first construct hypergraphs by proposing a new fragmentation method that considers both chemical and three-dimensional spatial information of the molecular system. We then design SE3Set, which incorporates equivariance into the hypergraph neural network. This ensures that the learned molecular representations are invariant to spatial transformations, thereby providing robustness essential for accurate prediction of molecular properties. SE3Set has shown performance on par with state-of-the-art (SOTA) models for small molecule datasets like QM9 and MD17. It excels on the MD22 dataset, achieving a notable improvement of approximately 20% in accuracy across all molecules, which highlights the prevalence of complex many-body interactions in larger molecules. This exceptional performance of SE3Set across diverse molecular structures underscores its transformative potential in computational chemistry, offering a route to more accurate and physically nuanced modeling.

Updated: 2024-05-26 10:43:16

标题: SE3Set：利用等变超图神经网络进行分子表示学习

摘要: 在这篇论文中，我们开发了SE3Set，这是一种针对高级分子表示学习定制的SE(3)等变超图神经网络架构。超图不仅仅是传统图的扩展；它们对于建模高阶关系至关重要，而传统等变基于图的方法由于在表示复杂多体相互作用方面固有的局限性而缺乏这种能力。为了实现这一点，我们首先通过提出一种新的分割方法来构建超图，该方法考虑了分子系统的化学和三维空间信息。然后，我们设计了SE3Set，它将等变性融入到超图神经网络中。这确保了学习到的分子表示对空间变换是不变的，从而为准确预测分子性质提供了必要的稳健性。SE3Set在小分子数据集（如QM9和MD17）方面表现出与最先进模型（SOTA）相当的性能。它在MD22数据集上表现出色，对所有分子的准确性均有约20%的显著提高，这突显了较大分子中复杂多体相互作用的普遍性。SE3Set在各种分子结构上的卓越性能凸显了其在计算化学中的变革潜力，为更准确和物理细致的建模提供了一条途径。

更新时间: 2024-05-26 10:43:16

领域: cs.LG,cs.AI,physics.comp-ph

下载: http://arxiv.org/abs/2405.16511v1

Meta-Task Planning for Language Agents

The rapid advancement of neural language models has sparked a new surge of intelligent agent research. Unlike traditional agents, large language model-based agents (LLM agents) have emerged as a promising paradigm for achieving artificial general intelligence (AGI) due to their superior reasoning and generalization capabilities. Effective planning is crucial for the success of LLM agents in real-world tasks, making it a highly pursued topic in the community. Current planning methods typically translate tasks into executable action sequences. However, determining a feasible or optimal sequence for complex tasks at fine granularity, which often requires compositing long chains of heterogeneous actions, remains challenging. This paper introduces Meta-Task Planning (MTP), a zero-shot methodology for collaborative LLM-based multi-agent systems that simplifies complex task planning by decomposing it into a hierarchy of subordinate tasks, or meta-tasks. Each meta-task is then mapped into executable actions. MTP was assessed on two rigorous benchmarks, TravelPlanner and API-Bank. Notably, MTP achieved an average $\sim40\%$ success rate on TravelPlanner, significantly higher than the state-of-the-art (SOTA) baseline ($2.92\%$), and outperforming $LLM_{api}$-4 with ReAct on API-Bank by $\sim14\%$, showing the immense potential of integrating LLM with multi-agent systems.

Updated: 2024-05-26 10:33:17

标题: 语言代理的元任务规划

摘要: 神经语言模型的快速发展引发了智能代理研究的新浪潮。与传统代理不同，基于大型语言模型的代理（LLM代理）由于其出色的推理和泛化能力，已成为实现人工通用智能（AGI）的一种有前途的范式。有效的规划对LLM代理在现实任务中取得成功至关重要，因此在社区中备受追捧。当前的规划方法通常将任务转化为可执行的动作序列。然而，在细粒度上确定复杂任务的可行或最佳序列，通常需要合成长链的异质动作，这仍然是一个具有挑战性的问题。本文介绍了元任务规划（MTP），这是一种零-shot方法，用于协作LLM基础的多代理系统，通过将复杂任务分解成一系列从属任务或元任务，简化了任务规划。然后将每个元任务映射到可执行动作。MTP在两个严格的基准测试中进行了评估，即TravelPlanner和API-Bank。值得注意的是，MTP在TravelPlanner上实现了平均约40%的成功率，远高于最先进（SOTA）基线（2.92%），并且在API-Bank上比LLM_api-4与ReAct的性能高出约14%，显示了将LLM与多代理系统集成的巨大潜力。

更新时间: 2024-05-26 10:33:17

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.16510v1

LaVy: Vietnamese Multimodal Large Language Model

Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have taken the world by storm with impressive abilities in complex reasoning and linguistic comprehension. While there is a plethora of work on Vietnamese Large Language Models, the lack of high-quality multimodal resources limits the progress of Vietnamese MLLMs. In this paper, we take a first step toward addressing this by introducing LaVy, a state-of-the-art Vietnamese MLLM, and we also introduce the LaVy-Bench benchmark, designed for evaluating MLLMs' understanding of Vietnamese visual language tasks. Our project is public at https://github.com/baochi0212/LaVy

Updated: 2024-05-26 10:27:51

标题: LaVy：越南多模态大型语言模型

摘要: 大型语言模型（LLMs）和多模态大语言模型（MLLMs）以其在复杂推理和语言理解方面的卓越能力席卷了世界。同时，有大量与越南大型语言模型相关的作品，但多模态资源的缺乏限制了越南MLLMs的进展。在本文中，我们通过引入LaVy来解决这一问题，LaVy是一种最先进的越南MLLM，并且我们还引入了LaVy-Bench基准测试，用于评估MLLMs对越南视觉语言任务的理解能力。我们的项目可以在https://github.com/baochi0212/LaVy 上公开获取。

更新时间: 2024-05-26 10:27:51

领域: cs.CL,cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.07922v5

AnyCBMs: How to Turn Any Black Box into a Concept Bottleneck Model

Interpretable deep learning aims at developing neural architectures whose decision-making processes can be understood by their users. Among these techniques, Concept Bottleneck Models enhance the interpretability of neural networks by integrating a layer of human-understandable concepts. These models, however, necessitate training a new model from the beginning, consuming significant resources and failing to utilize already trained large models. To address this issue, we introduce "AnyCBM", a method that transforms any existing trained model into a Concept Bottleneck Model with minimal impact on computational resources. We provide both theoretical and experimental insights showing the effectiveness of AnyCBMs in terms of classification performance and the effectiveness of concept-based interventions on downstream tasks.

Updated: 2024-05-26 10:19:04

标题: AnyCBMs：如何将任何黑匣子转化为概念瓶颈模型

摘要: 可解释的深度学习旨在开发神经结构，使用户能够理解其决策过程。在这些技术中，概念瓶颈模型通过集成一层人类可理解的概念来增强神经网络的可解释性。然而，这些模型需要从头开始训练一个新模型，消耗大量资源，并且无法利用已经训练好的大型模型。为了解决这个问题，我们引入了"AnyCBM"方法，将任何现有训练好的模型转化为一个概念瓶颈模型，对计算资源的影响最小。我们提供了理论和实验上的见解，展示了AnyCBM在分类性能和基于概念干预对下游任务的有效性方面的有效性。

更新时间: 2024-05-26 10:19:04

领域: cs.LG

下载: http://arxiv.org/abs/2405.16508v1

Causal Concept Embedding Models: Beyond Causal Opacity in Deep Learning

Causal opacity denotes the difficulty in understanding the "hidden" causal structure underlying a deep neural network's (DNN) reasoning. This leads to the inability to rely on and verify state-of-the-art DNN-based systems especially in high-stakes scenarios. For this reason, causal opacity represents a key open challenge at the intersection of deep learning, interpretability, and causality. This work addresses this gap by introducing Causal Concept Embedding Models (Causal CEMs), a class of interpretable models whose decision-making process is causally transparent by design. The results of our experiments show that Causal CEMs can: (i) match the generalization performance of causally-opaque models, (ii) support the analysis of interventional and counterfactual scenarios, thereby improving the model's causal interpretability and supporting the effective verification of its reliability and fairness, and (iii) enable human-in-the-loop corrections to mispredicted intermediate reasoning steps, boosting not just downstream accuracy after corrections but also accuracy of the explanation provided for a specific instance.

Updated: 2024-05-26 10:15:20

标题: 因果概念嵌入模型：超越深度学习中的因果不透明性

摘要: 因果不透明性指的是理解深度神经网络（DNN）推理背后的“隐藏”因果结构的困难。这导致无法依赖和验证最先进的基于DNN的系统，特别是在高风险场景中。因此，因果不透明性代表了深度学习、可解释性和因果性交叉点上的一个关键开放挑战。本研究通过引入因果概念嵌入模型（Causal CEMs），一类设计上决策过程因果透明的可解释模型，来填补这一空白。我们实验的结果显示，因果CEMs可以：（i）与因果不透明模型的泛化性能相匹配，（ii）支持干预和反事实场景的分析，从而提高模型的因果可解释性，并支持对其可靠性和公平性的有效验证，以及（iii）使人类在循环中对错误的中间推理步骤进行更正，不仅提高了在更正后的下游准确性，还提高了为特定实例提供的解释的准确性。

更新时间: 2024-05-26 10:15:20

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.16507v1

GRAG: Graph Retrieval-Augmented Generation

While Retrieval-Augmented Generation (RAG) enhances the accuracy and relevance of responses by generative language models, it falls short in graph-based contexts where both textual and topological information are important. Naive RAG approaches inherently neglect the structural intricacies of textual graphs, resulting in a critical gap in the generation process. To address this challenge, we introduce $\textbf{Graph Retrieval-Augmented Generation (GRAG)}$, which significantly enhances both the retrieval and generation processes by emphasizing the importance of subgraph structures. Unlike RAG approaches that focus solely on text-based entity retrieval, GRAG maintains an acute awareness of graph topology, which is crucial for generating contextually and factually coherent responses. Our GRAG approach consists of four main stages: indexing of $k$-hop ego-graphs, graph retrieval, soft pruning to mitigate the impact of irrelevant entities, and generation with pruned textual subgraphs. GRAG's core workflow, retrieving textual subgraphs followed by soft pruning, efficiently identifies relevant subgraph structures while avoiding the computational infeasibility typical of exhaustive subgraph searches, which are NP-hard. Moreover, we propose a novel prompting strategy that achieves lossless conversion from textual subgraphs to hierarchical text descriptions. Extensive experiments on graph multi-hop reasoning benchmarks demonstrate that in scenarios requiring multi-hop reasoning on textual graphs, our GRAG approach significantly outperforms current state-of-the-art RAG methods while effectively mitigating hallucinations.
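The first of the four stages indexes $k$-hop ego-graphs. As a minimal sketch of what such a subgraph is (not the GRAG implementation), the breadth-first search below collects every node within $k$ hops of a center node and the edges among them. The function name `k_hop_ego_graph` and the adjacency-dictionary input are assumptions for illustration.

```python
from collections import deque

def k_hop_ego_graph(adj, center, k):
    """Return (nodes, edges) of the subgraph induced by all nodes within
    k hops of `center`. `adj` maps each node to its neighbours (undirected)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # nodes at distance k are kept but not expanded further
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    nodes = set(dist)
    # Keep each undirected edge once, as a sorted pair of endpoints.
    edges = {tuple(sorted((u, w))) for u in nodes for w in adj.get(u, ()) if w in nodes}
    return nodes, edges
```

Each ego-graph is found in time linear in the size of the neighbourhood it covers, which is why indexing ego-graphs is tractable where searching over all subgraphs is not.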

Updated: 2024-05-26 10:11:40

标题: GRAG: 图检索增强生成

摘要: 在检索增强生成（RAG）通过生成语言模型增强响应的准确性和相关性，但在文本和拓扑信息都很重要的基于图的背景下，RAG存在不足。天真的RAG方法固有地忽视了文本图的结构复杂性，导致生成过程中存在关键差距。为了解决这一挑战，我们引入了$\textbf{图检索增强生成（GRAG）}$，通过强调子图结构的重要性显著增强了检索和生成过程。与仅关注基于文本实体检索的RAG方法不同，GRAG保持对图拓扑的敏锐意识，这对生成上下文和事实相关的响应至关重要。我们的GRAG方法包括四个主要阶段：$k$-hop ego-graphs的索引，图检索，软修剪以减轻不相关实体的影响，以及使用修剪后的文本子图生成。GRAG的核心工作流程-检索文本子图后进行软修剪-有效地识别相关子图结构，同时避免了耗费计算资源的详尽子图搜索的典型难题，这是NP难问题。此外，我们提出了一种新颖的提示策略，实现了从文本子图到分层文本描述的无损转换。在图多跳推理基准测试上进行的广泛实验表明，在需要对文本图进行多跳推理的场景中，我们的GRAG方法在显著优于当前最先进的RAG方法的同时有效减轻了幻觉。

更新时间: 2024-05-26 10:11:40

领域: cs.LG

下载: http://arxiv.org/abs/2405.16506v1

Convergence Conditions of Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data

We study the convergence of recursive regularized learning algorithms in the reproducing kernel Hilbert space (RKHS) with dependent and non-stationary online data streams. Firstly, we study the mean square asymptotic stability of a class of random difference equations in RKHS, whose non-homogeneous terms are martingale difference sequences dependent on the homogeneous ones. Secondly, we introduce the concept of random Tikhonov regularization path, and show that if the regularization path is slowly time-varying in some sense, then the output of the algorithm is consistent with the regularization path in mean square. Furthermore, if the data streams also satisfy the RKHS persistence of excitation condition, i.e. there exists a fixed length of time period, such that each eigenvalue of the conditional expectation of the operators induced by the input data accumulated over every time period has a uniformly positive lower bound with respect to time, then the output of the algorithm is consistent with the unknown function in mean square. Finally, for the case with independent and non-identically distributed data streams, the algorithm achieves the mean square consistency provided the marginal probability measures induced by the input data are slowly time-varying and the average measure over each fixed-length time period has a uniformly strictly positive lower bound.

Updated: 2024-05-26 10:09:37

标题: 在线正则化统计学习在具有非平稳数据的再生核希尔伯特空间中的收敛条件

摘要: 我们研究了在重现核希尔伯特空间（RKHS）中具有依赖性和非平稳在线数据流的递归正则化学习算法的收敛性。首先，我们研究了RKHS中一类随机差分方程的均方渐近稳定性，其非齐次项是依赖于齐次项的鞅差分序列。其次，我们引入了随机Tikhonov正则化路径的概念，并且展示了如果正则化路径在某种意义上是缓慢变化的，那么算法的输出与正则化路径在均方上是一致的。此外，如果数据流还满足RKHS激励条件，即存在一个固定长度的时间段，使得在每个时间段累积的输入数据诱导的操作符的条件期望的每个特征值都对时间具有一致的正下界，则算法的输出与未知函数在均方上是一致的。最后，对于具有独立且非同分布数据流的情况，算法在边际概率度量由输入数据诱导的慢时间变化且每个固定长度时间段的平均度量具有一致严格正下界时，实现了均方一致性。

更新时间: 2024-05-26 10:09:37

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2404.03211v2

A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models

Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs, all featuring sub-quadratic complexity in sequence length and excellent scaling properties, enabling the construction of a new type of foundation models. In this paper, we present a unified view of these models, formulating such layers as implicit causal self-attention layers. The formulation includes most of their sub-components and is not limited to a specific part of the architecture. The framework compares the underlying mechanisms on similar grounds for different layers and provides a direct means for applying explainability methods. Our experiments show that our attention matrices and attribution method outperform an alternative and a more limited formulation that was recently proposed for Mamba. For the other architectures for which our method is the first to provide such a view, our method is effective and competitive in the relevant metrics compared to the results obtained by state-of-the-art transformer explainability methods. Our code is publicly available.

Updated: 2024-05-26 09:57:45

标题: 一个统一的隐式关注机制公式用于门控线性循环序列模型

摘要: 近年来，在高效序列建模方面取得了重大进展，出现了无注意力层，如Mamba、RWKV和各种门控RNN，所有这些都具有次二次复杂度的特点，并具有优秀的扩展性能，使得可以构建一种新类型的基础模型。在本文中，我们提出了这些模型的统一视角，将这些层形式化为隐式因果自注意层。该公式包括它们的大部分子组件，并不局限于体系结构的特定部分。该框架在不同层面上比较了这些底层机制，并提供了一种直接的方法来应用可解释性方法。我们的实验表明，我们的注意力矩阵和归因方法胜过了最近为Mamba提出的一种替代和更有限的公式。对于我们的方法首次提供这种视角的其他架构，我们的方法在相关指标上是有效的和具有竞争力的，与现有最先进的变压器可解释性方法获得的结果相比。我们的代码是公开可用的。

更新时间: 2024-05-26 09:57:45

领域: cs.LG,F.2.2; I.2.7

下载: http://arxiv.org/abs/2405.16504v1

Integrating GNN and Neural ODEs for Estimating Two-Body Interactions in Mixed-Species Collective Motion

Analyzing the motion of multiple biological agents, be it cells or individual animals, is pivotal for the understanding of complex collective behaviors. With the advent of advanced microscopy, detailed images of complex tissue formations involving multiple cell types have become more accessible in recent years. However, deciphering the underlying rules that govern cell movements is far from trivial. Here, we present a novel deep learning framework to estimate the underlying equations of motion from observed trajectories, a pivotal step in decoding such complex dynamics. Our framework integrates graph neural networks with neural differential equations, enabling effective prediction of two-body interactions based on the states of the interacting entities. We demonstrate the efficacy of our approach through two numerical experiments. First, we used simulated data from a toy model to tune the hyperparameters. Based on the obtained hyperparameters, we then applied this approach to a more complex model that describes interacting cells of cellular slime molds. Our results show that the proposed method can accurately estimate the function of two-body interactions, thereby precisely replicating both individual and collective behaviors within these systems.

Updated: 2024-05-26 09:47:17

标题: 将GNN和神经ODE集成用于估计混合物种集体运动中的双体相互作用

摘要: 分析多个生物代理的运动，无论是细胞还是个体动物，对于理解复杂的集体行为至关重要。随着先进显微镜技术的出现，近年来对涉及多种细胞类型的复杂组织形态的详细图像变得更加容易获得。然而，解读控制细胞运动的基本规律远非易事。在这里，我们提出了一种新颖的深度学习框架，用于从观察到的轨迹中估计基本的运动方程，这是解码这种复杂动态的关键一步。我们的框架将图神经网络与神经微分方程集成在一起，从而能够根据相互作用实体的状态有效地预测两体交互作用。我们通过两个数值实验展示了我们方法的有效性。首先，我们使用一个玩具模型的模拟数据来调整超参数。根据获得的超参数，我们将这种方法应用于描述细胞黏液模具相互作用的更复杂模型。我们的结果表明，所提出的方法可以准确估计两体交互作用的功能，从而精确复制这些系统中的个体和集体行为。

更新时间: 2024-05-26 09:47:17

领域: physics.bio-ph,cs.LG,J.2, J.3

下载: http://arxiv.org/abs/2405.16503v1

Decision-focused predictions via pessimistic bilevel optimization: a computational study

Dealing with uncertainty in optimization parameters is an important and longstanding challenge. Typically, uncertain parameters are predicted accurately, and then a deterministic optimization problem is solved. However, the decisions produced by this so-called \emph{predict-then-optimize} procedure can be highly sensitive to uncertain parameters. In this work, we contribute to recent efforts in producing \emph{decision-focused} predictions, i.e., to build predictive models that are constructed with the goal of minimizing a \emph{regret} measure on the decisions taken with them. We begin by formulating the exact expected regret minimization as a pessimistic bilevel optimization model. Then, we establish NP-completeness of this problem, even in a heavily restricted case. Using duality arguments, we reformulate it as a non-convex quadratic optimization problem. Finally, we show various computational techniques to achieve tractability. We report extensive computational results on shortest-path instances with uncertain cost vectors. Our results indicate that our approach can improve training performance over the approach of Elmachtoub and Grigas (2022), a state-of-the-art method for decision-focused learning.
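To make the regret measure concrete, the sketch below evaluates it on a shortest-path instance whose candidate paths are listed explicitly. The pessimistic element is modeled here as a tie-breaking rule: when several paths are optimal under the predicted costs, the one that is worst under the true costs is assumed to be chosen. This is an illustrative reading of the abstract, not the paper's bilevel formulation, and the names `path_cost` and `pessimistic_regret` are hypothetical.

```python
def path_cost(path, costs):
    """Total cost of a path given as a sequence of edge names."""
    return sum(costs[e] for e in path)

def pessimistic_regret(paths, predicted, true, tol=1e-12):
    """Regret of deciding with `predicted` costs when `true` costs apply,
    breaking ties among predicted-optimal paths in the worst possible way."""
    pred_costs = [path_cost(p, predicted) for p in paths]
    best_pred = min(pred_costs)
    # All paths that are optimal under the predicted costs (ties included).
    candidates = [p for p, c in zip(paths, pred_costs) if c <= best_pred + tol]
    # Pessimistic view: assume the worst of the tied decisions is taken.
    worst_true = max(path_cost(p, true) for p in candidates)
    # Regret is the excess true cost over the best achievable true cost.
    return worst_true - min(path_cost(p, true) for p in paths)
```

A prediction that reproduces the true costs exactly has zero regret, while a prediction that makes all paths look equally good is penalized as if the worst of them were chosen.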

Updated: 2024-05-26 09:36:07

标题: 决策聚焦的预测：悲观双层优化的计算研究

摘要: 处理优化参数中的不确定性是一个重要且长期存在的挑战。通常，不确定参数被准确预测，然后解决确定性优化问题。然而，通过这种所谓的“先预测再优化”程序产生的决策可能对不确定参数高度敏感。在这项工作中，我们致力于最近努力产生“以决策为中心”的预测，即构建以最小化与其一起采取的决策上的“遗憾”度量为目标的预测模型。我们首先将确切的期望遗憾最小化制定为一种悲观的双层优化模型。然后，我们证明了这个问题的NP完备性，即使在极其受限的情况下也是如此。通过对偶论证，我们将其重新制定为一个非凸二次优化问题。最后，我们展示了各种计算技术以实现可处理性。我们在具有不确定成本向量的最短路径实例上报告了广泛的计算结果。我们的结果表明，我们的方法可以提高训练性能，超过了Elmachtoub和Grigas（2022年）的以决策为重点学习的最新方法。

更新时间: 2024-05-26 09:36:07

领域: cs.LG,math.OC,90C30

下载: http://arxiv.org/abs/2312.17640v2

On Sequential Loss Approximation for Continual Learning

For continual learning, we introduce Autodiff Quadratic Consolidation (AQC), which approximates the previous loss function with a quadratic function, and Neural Consolidation (NC), which approximates the previous loss function with a neural network. Although they are not scalable to large neural networks, they can be used with a fixed pre-trained feature extractor. We empirically study these methods in class-incremental learning, for which regularization-based methods produce unsatisfactory results unless combined with replay. We find that for small datasets, quadratic approximation of the previous loss function leads to poor results, even with full Hessian computation, whereas NC can significantly improve the predictive performance. For large datasets, when used with a fixed pre-trained feature extractor, AQC provides superior predictive performance. We also find that using tanh-output features can improve the predictive performance of AQC. In particular, in class-incremental Split MNIST, when a Convolutional Neural Network (CNN) with tanh-output features is pre-trained on EMNIST Letters and used as a fixed pre-trained feature extractor, AQC can achieve predictive performance comparable to joint training.

Updated: 2024-05-26 09:20:47

标题: 关于连续学习中的顺序损失逼近

摘要: 我们介绍了用于连续学习的自动微分二次合并（AQC）和神经合并（NC）方法，其中AQC用二次函数逼近前一个损失函数，NC用神经网络逼近前一个损失函数。虽然它们在大型神经网络上不可扩展，但可以与固定的预训练特征提取器一起使用。我们在类增量学习中经验性地研究了这些方法，对于这种情况，基于正则化的方法产生不令人满意的结果，除非与重放结合使用。我们发现对于小数据集，对前一个损失函数进行二次逼近会导致糟糕的结果，即使进行完整的Hessian计算，而NC可以显著改善预测性能；而对于大数据集，当与固定的预训练特征提取器一起使用时，AQC提供了优越的预测性能。我们还发现使用tanh输出特征可以提高AQC的预测性能。特别是，在类增量的Split MNIST中，当使用具有tanh输出特征的卷积神经网络(CNN)在EMNIST Letters上进行预训练并用作固定的预训练特征提取器时，AQC可以实现与联合训练相媲美的预测性能。

更新时间: 2024-05-26 09:20:47

领域: cs.LG

下载: http://arxiv.org/abs/2405.16498v1

Exploring a Multimodal Fusion-based Deep Learning Network for Detecting Facial Palsy

Algorithmic detection of facial palsy offers the potential to improve current practices, which usually involve labor-intensive and subjective assessment by clinicians. In this paper, we present a multimodal fusion-based deep learning model that utilizes unstructured data (i.e. an image frame with facial line segments) and structured data (i.e. features of facial expressions) to detect facial palsy. We then contribute a study that analyzes the effect of different data modalities and the benefits of a multimodal fusion-based approach using videos of 21 facial palsy patients. Our experimental results show that among various data modalities (i.e. unstructured data: RGB images and images of facial line segments; structured data: coordinates of facial landmarks and features of facial expressions), the feed-forward neural network using features of facial expressions achieved the highest precision of 76.22, while the ResNet-based model using images of facial line segments achieved the highest recall of 83.47. When we leveraged both images of facial line segments and features of facial expressions, our multimodal fusion-based deep learning model slightly improved the precision score to 77.05 at the expense of a decrease in the recall score.

Updated: 2024-05-26 09:16:34

标题: 探索基于多模态融合深度学习网络的面瘫检测

摘要: 算法检测面瘫有潜力改善当前的做法，通常涉及临床医生的劳动密集和主观评估。在本文中，我们提出了一种基于多模态融合的深度学习模型，利用非结构化数据（即带有面部线条的图像帧）和结构化数据（即面部表情特征）来检测面瘫。然后，我们贡献了一项研究，分析不同数据模态的影响以及使用21名面瘫患者的视频的多模态融合方法的好处。我们的实验结果显示，在各种数据模态（即非结构化数据 - RGB图像和面部线条图像以及结构化数据 - 面部标志物坐标和面部表情特征）中，使用面部表情特征的前馈神经网络实现了最高精度为76.22，而使用面部线条图像的ResNet模型实现了最高召回率为83.47。当我们利用面部线条图像和面部表情特征两者时，我们的多模态融合深度学习模型略微提高了精度评分至77.05，但降低了召回率评分。

更新时间: 2024-05-26 09:16:34

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16496v1

Causal-Aware Graph Neural Architecture Search under Distribution Shifts

Graph NAS has emerged as a promising approach for autonomously designing GNN architectures by leveraging the correlations between graphs and architectures. Existing methods fail to generalize under distribution shifts that are ubiquitous in real-world graph scenarios, mainly because the graph-architecture correlations they exploit might be spurious and vary across distributions. We propose to handle the distribution shifts in the graph architecture search process by discovering and exploiting the causal relationship between graphs and architectures to search for the optimal architectures that can generalize under distribution shifts. The problem remains unexplored, with the following challenges: how to discover the causal graph-architecture relationship that has stable predictive abilities across distributions, and how to handle distribution shifts with the discovered causal graph-architecture relationship to search for generalized graph architectures. To address these challenges, we propose Causal-aware Graph Neural Architecture Search (CARNAS), which is able to capture the causal graph-architecture relationship during the architecture search process and discover the generalized graph architecture under distribution shifts. Specifically, we propose Disentangled Causal Subgraph Identification to capture the causal subgraphs that have stable prediction abilities across distributions. Then, we propose Graph Embedding Intervention to intervene on causal subgraphs within the latent space, ensuring that these subgraphs encapsulate essential features for prediction while excluding non-causal elements. Additionally, we propose Invariant Architecture Customization to reinforce the causal invariant nature of the causal subgraphs, which are utilized to tailor generalized graph architectures. Extensive experiments demonstrate that CARNAS achieves advanced out-of-distribution generalization ability.

Updated: 2024-05-26 08:55:22

标题: 因果感知图神经架构搜索在分布转移下的应用

摘要: 图神经架构搜索（Graph NAS）已经成为一种有前途的方法，通过利用图和架构之间的相关性自主设计图神经网络（GNN）架构。现有方法在现实世界的图场景中普遍存在的分布转变下很难推广，主要是因为它们利用的图-架构相关性可能是虚假的，并且在不同分布之间变化。我们提出通过发现和利用图和架构之间的因果关系来处理图架构搜索过程中的分布转变，以搜索能够在分布转变下推广的最佳架构。这个问题在以下挑战中仍未被探索：如何发现在不同分布之间具有稳定预测能力的因果图-架构关系，以及如何利用发现的因果图-架构关系处理分布转变以搜索广义图架构。为了解决这些挑战，我们提出了Causal-aware Graph Neural Architecture Search（CARNAS），它能够在架构搜索过程中捕捉因果图-架构关系，并在分布转变下发现广义图架构。具体来说，我们提出了Disentangled Causal Subgraph Identification来捕捉在不同分布之间具有稳定预测能力的因果子图。然后，我们提出了Graph Embedding Intervention，在潜在空间内对因果子图进行干预，确保这些子图包含预测所需的关键特征，同时排除非因果元素。此外，我们提出了Invariant Architecture Customization来强化因果子图的因果不变性，这些子图被用来定制广义图架构。大量实验证明，CARNAS实现了先进的超出分布外泛化能力。

更新时间: 2024-05-26 08:55:22

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.16489v1

Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models. As current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the human visual system's adeptness at processing both shape and texture according to the famous Trichromatic Theory, we explore the integration of a Mixture-of-Activation-Sparsity-Experts (MoASE) as an adapter for the CTTA task. Given the distinct reaction of neurons with low/high activation to domain-specific/agnostic features, MoASE decomposes the neural activation into high-activation and low-activation components with a non-differentiable Spatial Differentiate Dropout (SDD). Based on the decomposition, we devise a multi-gate structure comprising a Domain-Aware Gate (DAG) that utilizes domain information to adaptively combine experts that process the post-SDD sparse activations of different strengths, and an Activation Sparsity Gate (ASG) that adaptively assigns the feature selection threshold of the SDD for different experts for more precise feature decomposition. Finally, we introduce a Homeostatic-Proximal (HP) loss to bypass the error accumulation problem when continuously adapting the model. Extensive experiments on four prominent benchmarks substantiate that our methodology achieves state-of-the-art performance in both classification and segmentation CTTA tasks. Our code is now available at https://github.com/RoyZry98/MoASE-Pytorch.

Updated: 2024-05-26 08:51:39

标题: 分解神经元：通过专家混合实现连续的测试时间适应活跃度稀疏化

摘要: 持续测试时间适应（CTTA）旨在使预训练模型适应不断发展的目标领域，成为视觉模型的重要任务。由于当前的视觉模型似乎严重偏向纹理，持续地将模型从一个领域分布适应到另一个领域可能导致严重的灾难性遗忘。受人类视觉系统在根据著名的三色理论处理形状和纹理方面的熟练程度的启发，我们探索了将一种混合激活稀疏专家（MoASE）作为CTTA任务的适配器的整合。鉴于神经元对低/高激活对特定领域/不可知特征的明显反应，MoASE将神经激活分解为具有不可微的空间不同的丢失（SDD）的高激活和低激活组件。基于这种分解，我们设计了一个多门结构，包括利用领域信息来自适应地组合处理不同强度的SDD后稀疏激活的专家的领域感知门（DAG）和自适应地为不同专家分配SDD的特征选择阈值的激活稀疏门（ASG）以更精确地进行特征分解。最后，我们引入了一个稳态近端（HP）损失，以避免在持续调整模型时积累错误的问题。对四个著名基准的广泛实验证实，我们的方法在分类和分割CTTA任务中实现了最先进的性能。我们的代码现在可以在https://github.com/RoyZry98/MoASE-Pytorch。

更新时间: 2024-05-26 08:51:39

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.16486v1

INViT: A Generalizable Routing Problem Solver with Invariant Nested View Transformer

Recently, deep reinforcement learning has shown promising results for learning fast heuristics to solve routing problems. However, most solvers struggle to generalize to unseen distributions or to distributions with different scales. To address this issue, we propose a novel architecture, called Invariant Nested View Transformer (INViT), which is designed to enforce a nested design together with invariant views inside the encoders to promote the generalizability of the learned solver. It applies a modified policy gradient algorithm enhanced with data augmentations. We demonstrate that the proposed INViT achieves a dominant generalization performance on both TSP and CVRP problems with various distributions and different problem scales.

Updated: 2024-05-26 08:27:25

标题: INViT：具有不变嵌套视图转换器的通用路由问题求解器

摘要: 最近，深度强化学习已经展现出在学习快速启发式方法解决路径问题方面的有希望的结果。同时，大多数解决方案在泛化到未知分布或具有不同尺度的分布时表现不佳。为了解决这个问题，我们提出了一种新颖的架构，称为不变嵌套视图变换器（INViT），旨在强制在编码器内部实现嵌套设计和不变视图，以促进学习求解器的泛化能力。它应用了增强数据增强的修改策略梯度算法。我们证明，所提出的INViT在TSP和CVRP问题上以及各种分布和不同问题规模上实现了卓越的泛化性能。

更新时间: 2024-05-26 08:27:25

领域: cs.LG

下载: http://arxiv.org/abs/2402.02317v3

Zeroth-Order Sampling Methods for Non-Log-Concave Distributions: Alleviating Metastability by Denoising Diffusion

This paper considers the problem of sampling from a non-log-concave distribution, based on queries of its unnormalized density. It first describes a framework, Diffusion Monte Carlo (DMC), based on the simulation of a denoising diffusion process with its score function approximated by a generic Monte Carlo estimator. DMC is an oracle-based meta-algorithm, where its oracle is the assumed access to samples that generate a Monte Carlo score estimator. Then we provide an implementation of this oracle, based on rejection sampling, and this turns DMC into a true algorithm, termed Zeroth-Order Diffusion Monte Carlo (ZOD-MC). We provide convergence analyses by first constructing a general framework, i.e. a performance guarantee for DMC, without assuming the target distribution to be log-concave or to satisfy any isoperimetric inequality. Then we prove that ZOD-MC admits an inverse polynomial dependence on the desired sampling accuracy, albeit still suffering from the curse of dimensionality. Consequently, for low-dimensional distributions, ZOD-MC is a very efficient sampler, with performance exceeding that of the latest samplers, including the also-denoising-diffusion-based RDMC and RS-DMC. Last, we experimentally demonstrate the insensitivity of ZOD-MC to increasingly higher barriers between modes or discontinuity in the non-convex potential.

Updated: 2024-05-26 08:14:30

标题: 非对数凹分布的零阶采样方法：通过去噪扩散缓解亚稳态

摘要: 本文考虑了从非对数凹分布中抽样的问题，基于对其非标准化密度的查询。首先描述了一个基于模拟去噪扩散过程的框架，Diffusion Monte Carlo（DMC），其中其得分函数由通用蒙特卡罗估计器近似。DMC是一个基于Oracle的元算法，其Oracle假设可以访问生成蒙特卡罗得分估计器的样本。然后我们提供了一个基于拒绝抽样的实现这个Oracle的方法，这使得DMC成为一个真正的算法，被称为Zeroth-Order Diffusion Monte Carlo（ZOD-MC）。我们通过首先构建一个通用框架，即对DMC的性能保证，而不假定目标分布是对数凹的或满足任何等周不等式，来提供收敛分析。然后我们证明ZOD-MC对所需抽样精度具有倒多项式依赖性，尽管仍然受到维度诅咒的影响。因此，对于低维分布，ZOD-MC是一个非常有效的采样器，性能超过最新的采样器，包括基于去噪扩散的RDMC和RS-DMC。最后，我们通过实验证明ZOD-MC对于模式之间障碍逐渐增加或非凸势能中的不连续性是不敏感的。

更新时间: 2024-05-26 08:14:30

领域: stat.ML,cs.LG,math.PR,math.ST,stat.ME,stat.TH

下载: http://arxiv.org/abs/2402.17886v3

Vision-Based Approach for Food Weight Estimation from 2D Images

In response to the increasing demand for efficient and non-invasive methods to estimate food weight, this paper presents a vision-based approach utilizing 2D images. The study employs a dataset of 2380 images comprising fourteen different food types in various portions, orientations, and containers. The proposed methodology integrates deep learning and computer vision techniques, specifically employing Faster R-CNN for food detection and MobileNetV3 for weight estimation. The detection model achieved a mean average precision (mAP) of 83.41%, an average Intersection over Union (IoU) of 91.82%, and a classification accuracy of 100%. For weight estimation, the model demonstrated a root mean squared error (RMSE) of 6.3204, a mean absolute percentage error (MAPE) of 0.0640%, and an R-squared value of 98.65%. The study underscores the potential applications of this technology in healthcare for nutrition counseling, fitness and wellness for dietary intake assessment, and smart food storage solutions to reduce waste. The results indicate that the combination of Faster R-CNN and MobileNetV3 provides a robust framework for accurate food weight estimation from 2D images, showcasing the synergy of computer vision and deep learning in practical applications.

Updated: 2024-05-26 08:03:51

标题: 基于视觉的食物重量估计方法从二维图像

摘要: 为了应对对高效和无创方法估算食物重量日益增长的需求，本文提出了一种利用2D图像的基于视觉的方法。研究采用了包含2380张图像的数据集，涵盖了十四种不同食物类型，各种不同的份量、取向和容器。所提出的方法集成了深度学习和计算机视觉技术，具体采用了Faster R-CNN进行食物检测和MobileNetV3进行重量估计。检测模型实现了83.41\%的平均精度（mAP），91.82\%的平均交集联合（IoU）和100\%的分类准确率。对于重量估计，模型表现出6.3204的均方根误差（RMSE），0.0640\%的平均绝对百分比误差（MAPE）和98.65\%的R平方值。该研究强调了该技术在健康护理中用于营养咨询、健身和健康中用于饮食摄入评估，以及智能食品存储解决方案以减少浪费的潜在应用。结果表明，Faster R-CNN和MobileNetV3的结合提供了一个强大的框架，可以准确估算2D图像中的食物重量，展示了计算机视觉和深度学习在实际应用中的协同作用。

更新时间: 2024-05-26 08:03:51

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.16478v1
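
The three weight-estimation metrics named in the abstract above (RMSE, MAPE, R-squared) follow standard definitions. A minimal sketch in plain Python is given here for reference; the function names and the sample weights are illustrative assumptions and are not taken from the paper. Note that `mape` returns a fraction, so whether a reported value is a fraction or a percentage depends on the convention the authors used.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the targets."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (multiply by 100 for percent)."""
    n = len(y_true)
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical food weights in grams (illustrative only, not data from the paper).
true_g = [100.0, 200.0, 400.0]
pred_g = [110.0, 190.0, 400.0]

scores = {
    "rmse": rmse(true_g, pred_g),
    "mape": mape(true_g, pred_g),
    "r2": r_squared(true_g, pred_g),
}
```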

KiNETGAN: Enabling Distributed Network Intrusion Detection through Knowledge-Infused Synthetic Data Generation

In the realm of IoT/CPS systems connected over mobile networks, traditional intrusion detection methods analyze network traffic across multiple devices using anomaly detection techniques to flag potential security threats. However, these methods face significant privacy challenges, particularly with deep packet inspection and network communication analysis. This type of monitoring is highly intrusive, as it involves examining the content of data packets, which can include personal and sensitive information. Such data scrutiny is often governed by stringent laws and regulations, especially in environments like smart homes where data privacy is paramount. Synthetic data offers a promising solution by mimicking real network behavior without revealing sensitive details. Generative models such as Generative Adversarial Networks (GANs) can produce synthetic data, but they often struggle to generate realistic data in specialized domains like network activity. This limitation stems from insufficient training data, which impedes the model's ability to grasp the domain's rules and constraints adequately. Moreover, the scarcity of training data exacerbates the problem of class imbalance in intrusion detection methods. To address these challenges, we propose a Privacy-Driven framework that utilizes a knowledge-infused Generative Adversarial Network for generating synthetic network activity data (KiNETGAN). This approach enhances the resilience of distributed intrusion detection while addressing privacy concerns. Our Knowledge Guided GAN produces realistic representations of network activity, validated through rigorous experimentation. We demonstrate that KiNETGAN maintains minimal accuracy loss in downstream tasks, effectively balancing data privacy and utility.

Updated: 2024-05-26 08:02:02

标题: KiNETGAN：通过知识注入的合成数据生成实现分布式网络入侵检测

摘要: 在通过移动网络连接的物联网/CPS系统领域中，传统的入侵检测方法使用异常检测技术分析跨多个设备的网络流量，以标记潜在的安全威胁。然而，这些方法面临着重大的隐私挑战，特别是在深度数据包检查和网络通信分析方面。这种监测方式是高度侵入性的，因为它涉及检查数据包的内容，其中可能包含个人和敏感信息。这种数据审查通常受严格的法律和法规约束，特别是在数据隐私至关重要的智能家居等环境中。合成数据通过模拟真实网络行为而不透露敏感细节，提供了一种有前途的解决方案。生成模型如生成对抗网络（GANs）可以生成合成数据，但在网络活动等专业领域生成逼真数据时往往面临困难。这种限制源于训练数据不足，这阻碍了模型充分掌握领域规则和约束的能力。此外，训练数据的稀缺性加剧了入侵检测方法中类别不平衡的问题。为了解决这些挑战，我们提出了一个利用知识注入生成对抗网络生成合成网络活动数据（KiNETGAN）的隐私驱动框架。这种方法提高了分布式入侵检测的弹性，同时解决了隐私问题。我们的知识引导GAN生产了网络活动的逼真表示，在严格的实验验证中得到了证实。我们展示了KiNETGAN在下游任务中保持了最低的准确性损失，有效平衡了数据隐私和效用。

更新时间: 2024-05-26 08:02:02

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2405.16476v1

Looks Too Good To Be True: An Information-Theoretic Analysis of Hallucinations in Generative Restoration Models

The pursuit of high perceptual quality in image restoration has driven the development of revolutionary generative models, capable of producing results often visually indistinguishable from real data. However, as their perceptual quality continues to improve, these models also exhibit a growing tendency to generate hallucinations - realistic-looking details that do not exist in the ground truth images. The presence of hallucinations introduces uncertainty regarding the reliability of the models' predictions, raising major concerns about their practical application. In this paper, we employ information-theory tools to investigate this phenomenon, revealing a fundamental tradeoff between uncertainty and perception. We rigorously analyze the relationship between these two factors, proving that the global minimal uncertainty in generative models grows in tandem with perception. In particular, we define the inherent uncertainty of the restoration problem and show that attaining perfect perceptual quality entails at least twice this uncertainty. Additionally, we establish a relation between mean squared-error distortion, uncertainty and perception, through which we prove that the aforementioned uncertainty-perception tradeoff induces the well-known perception-distortion tradeoff. This work uncovers fundamental limitations of generative models in achieving both high perceptual quality and reliable predictions for image restoration. We demonstrate our theoretical findings through an analysis of single image super-resolution algorithms. Our work aims to raise awareness among practitioners about this inherent tradeoff, empowering them to make informed decisions and potentially prioritize safety over perceptual performance.

Updated: 2024-05-26 07:58:51

标题: 看起来太美好以至于难以置信：生成性修复模型中幻觉的信息论分析

摘要: 图像恢复中对高感知质量的追求推动了具有革命性生成模型的发展，这些模型能够产生通常在视觉上难以区分的真实数据。然而，随着它们的感知质量不断提高，这些模型也表现出越来越多生成幻觉的倾向 - 看起来逼真的细节在真实数据中并不存在。幻觉的存在引入了关于模型预测可靠性的不确定性，引发了对它们实际应用的重大关切。在本文中，我们采用信息论工具来研究这一现象，揭示了不确定性和感知之间的基本权衡。我们严格分析了这两个因素之间的关系，证明了生成模型中的全局最小不确定性与感知同时增长。特别地，我们定义了恢复问题的固有不确定性，并且展示了获得完美感知质量至少需要两倍此不确定性。此外，我们通过均方误差失真、不确定性和感知之间的关系建立了一个联系，通过这一联系我们证明了前述的不确定性-感知权衡引发了众所周知的感知-失真权衡。本研究揭示了生成模型在实现高感知质量和图像恢复可靠预测方面的基本限制。我们通过对单图像超分辨率算法的分析展示了我们的理论发现。我们的工作旨在提高从业者对这种固有权衡的认识，使他们能够做出明智决策并潜在地将安全放在感知性能之上。

更新时间: 2024-05-26 07:58:51

领域: cs.LG,cs.AI,cs.CV,eess.IV

下载: http://arxiv.org/abs/2405.16475v1

Inaccurate Label Distribution Learning with Dependency Noise

In this paper, we introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning, which arise from dependencies on instances and labels. We start by modeling the inaccurate label distribution matrix as a combination of the true label distribution and a noise matrix influenced by specific instances and labels. To address this, we develop a linear mapping from instances to their true label distributions, incorporating label correlations, and decompose the noise matrix using feature and label representations, applying group sparsity constraints to accurately capture the noise. Furthermore, we employ graph regularization to align the topological structures of the input and output spaces, ensuring accurate reconstruction of the true label distribution matrix. Utilizing the Alternating Direction Method of Multipliers (ADMM) for efficient optimization, we validate our method's capability to recover true labels accurately and establish a generalization error bound. Extensive experiments demonstrate that DN-ILDL effectively addresses the ILDL problem and outperforms existing LDL methods.

Updated: 2024-05-26 07:58:07

标题: 带有依赖噪声的不准确标签分布学习

摘要: 本文介绍了依赖噪声的不准确标签分布学习（DN-ILDL）框架，以应对标签分布学习中由实例和标签依赖性引起的噪声挑战。我们首先将不准确的标签分布矩阵建模为真实标签分布和受特定实例和标签影响的噪声矩阵的组合。为了解决这个问题，我们开发了一个从实例到它们真实标签分布的线性映射，结合标签相关性，并使用特征和标签表示分解噪声矩阵，应用组稀疏约束来准确捕捉噪声。此外，我们采用图正则化来对齐输入和输出空间的拓扑结构，确保准确重建真实标签分布矩阵。通过使用交替方向乘法器（ADMM）进行高效优化，我们验证了我们的方法恢复真实标签的能力，并建立了一个泛化误差界限。大量实验证明，DN-ILDL有效地解决了ILDL问题，并优于现有的LDL方法。

更新时间: 2024-05-26 07:58:07

领域: cs.LG

下载: http://arxiv.org/abs/2405.16474v1

M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought

Multi-modal Chain-of-Thought (MCoT) requires models to leverage knowledge from both textual and visual modalities for step-by-step reasoning, and has gained increasing attention. Nevertheless, current MCoT benchmarks still face some challenges: (1) absence of visual modal reasoning, (2) single-step visual modal reasoning, and (3) missing domains, thereby hindering the development of MCoT. Motivated by this, we introduce a novel benchmark (M$^3$CoT) to address the above challenges, advancing multi-domain, multi-step, and multi-modal CoT. Additionally, we conduct a thorough evaluation involving abundant MCoT approaches on Vision Large Language Models (VLLMs). We also highlight that current VLLMs still struggle to reason correctly in M$^3$CoT and that a large gap remains between existing VLLMs and human performance in M$^3$CoT, despite their superior results on previous MCoT benchmarks. To our knowledge, we take the first meaningful step toward the multi-domain, multi-step, and multi-modal scenario in MCoT. We hope that M$^3$CoT can serve as a valuable resource, providing a pioneering foundation in multi-domain, multi-step, multi-modal chain-of-thought research.

Updated: 2024-05-26 07:56:30

标题: M$^3$CoT：一个新颖的多领域多步骤多模态思维链的基准测试

摘要: Multi-modal Chain-of-Thought (MCoT)要求模型利用文本和视觉两种模态的知识进行逐步推理，这引起了越来越多的关注。然而，目前的MCoT基准仍然面临一些挑战：(1) 缺乏视觉模态推理，(2) 单步视觉模态推理，(3) 领域缺失，从而阻碍了MCoT的发展。受此启发，我们引入了一个新的基准（M$^3$CoT）来解决上述挑战，推进多领域、多步骤和多模态的CoT。此外，我们在视觉大语言模型（VLLMs）上进行了充分评估，涉及丰富的MCoT方法。此外，我们强调当前的VLLMs在M$^3$CoT中仍然难以正确推理，存在着现有VLLMs和人类在M$^3$CoT中表现之间的巨大差距，尽管它们在先前的MCoT基准上表现优异。据我们所知，我们在MCoT中首次朝着多领域、多步骤和多模态情景迈出了有意义的一步。我们希望M$^3$CoT可以作为一个有价值的资源，为多领域、多步骤、多模态的链式推理研究提供开创性的基础。

更新时间: 2024-05-26 07:56:30

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.16473v1

Multi-Level Additive Modeling for Structured Non-IID Federated Learning

The primary challenge in Federated Learning (FL) is to model non-IID distributions across clients, whose fine-grained structure is important to improve knowledge sharing. For example, some knowledge is globally shared across all clients, some is only transferable within a subgroup of clients, and some is client-specific. To capture and exploit this structure, we train models organized in a multi-level structure, called "Multi-level Additive Models (MAM)", for better knowledge sharing across heterogeneous clients and their personalization. In federated MAM (FeMAM), each client is assigned to at most one model per level, and its personalized prediction sums the outputs of the models assigned to it across all levels. For the top level, FeMAM trains one global model shared by all clients, as in FedAvg. For every mid-level, it learns multiple models, each assigned to a subgroup of clients, as in clustered FL. Every bottom-level model is trained for one client only. In the training objective, each model aims to minimize the residual of the additive predictions by the other models assigned to each client. To approximate the arbitrary structure of non-IID data across clients, FeMAM introduces more flexibility and adaptivity to FL by incrementally adding new models to the prediction of each client and reassigning another if necessary, automatically optimizing the knowledge-sharing structure. Extensive experiments show that FeMAM surpasses existing clustered FL and personalized FL methods in various non-IID settings. Our code is available at https://github.com/shutong043/FeMAM.

Updated: 2024-05-26 07:54:53

标题: 多层次加性建模用于结构化非独立同分布联邦学习

摘要: 在联邦学习（FL）中的主要挑战是对跨客户的非IID分布进行建模，其细粒度结构对于改善知识共享至关重要。例如，一些知识在所有客户之间是全局共享的，一些只能在客户子群之间转移，还有一些是特定于客户的。为了捕捉和利用这种结构，我们训练了一个多级结构组织的模型，称为“多级加法模型（MAM）”，以实现跨异构客户和他们的个性化知识共享。在联邦MAM（FeMAM）中，每个客户被分配到至多一个模型的每个级别，并且其个性化预测是分配给它的所有级别的模型输出之和。对于顶级模型，FeMAM训练一个全局模型，由所有客户共享，类似于FedAvg。对于每个中级，它学习多个模型，每个模型分配给一个客户子群，类似于聚类FL。每个底层模型仅为一个客户训练。在训练目标中，每个模型旨在通过为每个客户分配的其他模型的加法预测的残差最小化。为了近似跨客户的任意结构非IID，FeMAM通过逐步向每个客户的预测添加新模型并重新分配另一个模型（如果有必要），自动优化知识共享结构。广泛的实验表明，FeMAM在各种非IID设置中超越了现有的聚类FL和个性化FL方法。我们的代码可以在https://github.com/shutong043/FeMAM 上找到。

更新时间: 2024-05-26 07:54:53

领域: cs.LG

下载: http://arxiv.org/abs/2405.16472v1
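
The additive structure described in the FeMAM abstract above (one optional model per level, predictions summed, each model fit to the residual left by the others) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the models are fixed scalar functions, and the names `predict`, `residual_target`, `levels`, and `assignment` are hypothetical and do not come from the authors' repository.

```python
from typing import Callable, Dict, List, Optional

Model = Callable[[float], float]

def predict(x: float, client: str,
            levels: List[List[Model]],
            assignment: Dict[str, List[Optional[int]]]) -> float:
    """Sum the outputs of the (at most one) model assigned to the client at each level."""
    total = 0.0
    for level, models in enumerate(levels):
        idx = assignment[client][level]
        if idx is not None:  # a client may have no model at a given level
            total += models[idx](x)
    return total

def residual_target(x: float, y: float, client: str,
                    levels: List[List[Model]],
                    assignment: Dict[str, List[Optional[int]]],
                    skip_level: int) -> float:
    """Target for the model at `skip_level`: the label minus what the other levels predict."""
    others = 0.0
    for level, models in enumerate(levels):
        idx = assignment[client][level]
        if level != skip_level and idx is not None:
            others += models[idx](x)
    return y - others

# Toy hierarchy: one global model, two cluster models, one client-specific model.
levels = [
    [lambda x: 2.0 * x],               # top level, shared by all clients
    [lambda x: 1.0, lambda x: -1.0],   # mid level, one model per client subgroup
    [lambda x: 0.5],                   # bottom level, trained for a single client
]
assignment = {"a": [0, 0, 0], "b": [0, 1, None]}

pred_a = predict(3.0, "a", levels, assignment)   # 6.0 + 1.0 + 0.5
pred_b = predict(3.0, "b", levels, assignment)   # 6.0 - 1.0
target_a = residual_target(3.0, 8.0, "a", levels, assignment, skip_level=2)
```

In an actual training loop each model would be refit to its residual targets in turn; the sketch only shows how the targets and the summed prediction are formed.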

Probabilistic Contrastive Learning with Explicit Concentration on the Hypersphere

Self-supervised contrastive learning has predominantly adopted deterministic methods, which are not suited for environments characterized by uncertainty and noise. This paper introduces a new perspective on incorporating uncertainty into contrastive learning by embedding representations within a spherical space, inspired by the von Mises-Fisher distribution (vMF). We introduce an unnormalized form of vMF and leverage the concentration parameter, kappa, as a direct, interpretable measure to quantify uncertainty explicitly. This approach not only provides a probabilistic interpretation of the embedding space but also offers a method to calibrate model confidence against varying levels of data corruption and characteristics. Our empirical results demonstrate that the estimated concentration parameter correlates strongly with the degree of unforeseen data corruption encountered at test time, enables failure analysis, and enhances existing out-of-distribution detection methods.

Updated: 2024-05-26 07:08:13

标题: 在超球面上明确集中的概率对比学习

摘要: 自我监督对比学习主要采用确定性方法，这些方法不适用于充满不确定性和噪声的环境。本文介绍了一种新的视角，通过将表示嵌入到球形空间中，受到von Mises-Fisher分布（vMF）的启发，将不确定性融入到对比学习中。我们引入了vMF的非归一化形式，并利用集中参数kappa作为一个直接的、可解释的度量来明确量化不确定性。这种方法不仅提供了嵌入空间的概率解释，还提供了一种校准模型置信度的方法，以应对不同水平的数据损坏和特征。我们的实证结果表明，估计的集中参数与测试时遇到的未预料数据损坏程度强相关，可以进行故障分析，并增强现有的超出分布检测方法。

更新时间: 2024-05-26 07:08:13

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2405.16460v1

Predicting Rental Price of Lane Houses in Shanghai with Machine Learning Methods and Large Language Models

Housing has emerged as a crucial concern among young individuals residing in major cities, including Shanghai. Given the unprecedented surge in property prices in this metropolis, young people have increasingly resorted to the rental market to address their housing needs. This study utilizes five traditional machine learning methods: multiple linear regression (MLR), ridge regression (RR), lasso regression (LR), decision tree (DT), and random forest (RF), along with a Large Language Model (LLM) approach using ChatGPT, for predicting the rental prices of lane houses in Shanghai. It applies these methods to examine a public data sample of about 2,609 lane house rental transactions in 2021 in Shanghai, and then compares the results of these methods. In terms of predictive power, RF has achieved the best performance among the traditional methods. However, the LLM approach, particularly in the 10-shot scenario, shows promising results that surpass traditional methods in terms of R-Squared value. The three performance metrics: mean squared error (MSE), mean absolute error (MAE), and R-Squared, are used to evaluate the models. Our conclusion is that while traditional machine learning models offer robust techniques for rental price prediction, the integration of LLM such as ChatGPT holds significant potential for enhancing predictive accuracy.

Updated: 2024-05-26 07:01:33

标题: 使用机器学习方法和大型语言模型预测上海弄堂房屋的租金价格

摘要: 住房已成为年轻人的一个关键问题，特别是在包括上海在内的大城市。鉴于这座大都市房价的空前飙升，年轻人越来越倾向于租赁市场来满足他们的住房需求。本研究利用了五种传统机器学习方法：多元线性回归（MLR）、岭回归（RR）、套索回归（LR）、决策树（DT）和随机森林（RF），以及使用ChatGPT的大语言模型（LLM）方法，来预测上海弄堂房的租金。它将这些方法应用于2021年上海约2,609笔弄堂房租赁交易的公共数据样本，并比较这些方法的结果。就预测能力而言，随机森林在传统方法中表现最佳。然而，LLM方法，特别是在10-shot情景下，显示出超越传统方法的有希望的结果，以R-Squared值衡量。三个性能指标：均方误差（MSE）、平均绝对误差（MAE）和R-Squared，被用来评估模型。我们的结论是，虽然传统机器学习模型提供了预测租金的强大技术，但像ChatGPT这样的LLM的整合具有提高预测准确性的重要潜力。

更新时间: 2024-05-26 07:01:33

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2405.17505v1

Dominant Shuffle: A Simple Yet Powerful Data Augmentation for Time-series Prediction

Recent studies have suggested that frequency-domain data augmentation (DA) is effective for time series prediction. Existing frequency-domain augmentations disturb the original data with various full-spectrum noises, leading to an excessive domain gap between augmented and original data. Although impressive performance has been achieved in certain cases, frequency-domain DA has yet to be generalized to time series prediction datasets. In this paper, we find that frequency-domain augmentations can be significantly improved by two modifications that limit the perturbations. First, we find that limiting the perturbation to only dominant frequencies significantly outperforms full-spectrum perturbations. Dominant frequencies represent the main periodicity and trends of the signal and are more important than other frequencies. Second, we find that simply shuffling the dominant frequency components is superior to sophisticatedly designed random perturbations. Shuffling rearranges the original components (magnitudes and phases) and limits the external noise. With these two modifications, we propose dominant shuffle, a simple yet effective data augmentation for time series prediction. Our method is simple yet powerful and can be implemented with just a few lines of code. Extensive experiments with eight datasets and six popular time series models demonstrate that our method consistently improves the baseline performance under various settings and significantly outperforms other DA methods. Code can be accessed at https://kaizhao.net/time-series.

Updated: 2024-05-26 07:00:12

标题: 主导洗牌：一种简单而强大的时间序列预测数据增强方法

摘要: 最近的研究表明，频域数据增强（DA）对于时间序列预测是有效的。现有的频域增强会用各种全频噪声扰乱原始数据，导致增强和原始数据之间存在过多的领域差距。尽管在某些情况下取得了令人印象深刻的表现，但频域DA尚未推广到时间序列预测数据集。在本文中，我们发现通过两种限制扰动的修改可以显著改善频域增强。首先，我们发现仅限制扰动到主频率明显优于全频扰动。主频率代表信号的主要周期性和趋势，比其他频率更重要。其次，我们发现简单地对主频率分量进行混洗优于复杂设计的随机扰动。混洗重新排列原始分量（幅度和相位）并限制外部噪声。通过这两种修改，我们提出了主要混洗，这是一种简单而有效的时间序列预测数据增强方法。我们的方法非常简单但强大，可以仅用几行代码实现。对八个数据集和六种流行的时间序列模型进行了大量实验，结果表明我们的方法在各种设置下始终提高了基线表现，并显著优于其他DA方法。代码可在https://kaizhao.net/time-series上访问。

更新时间: 2024-05-26 07:00:12

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.16456v1
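
The idea in the Dominant Shuffle abstract above (perturb only the dominant frequencies, and do so by shuffling their existing complex coefficients) can be sketched with NumPy's real FFT. This is a minimal sketch under stated assumptions, not the authors' released code: the function name, the choice to leave the DC and Nyquist bins untouched, and the parameter `k` (number of dominant bins) are all illustrative.

```python
import numpy as np

def dominant_shuffle(series, k, rng):
    """Shuffle the k largest-magnitude frequency components of a 1-D series.

    Assumption: the DC bin (and the Nyquist bin for even lengths) stay fixed,
    so the mean is preserved and the inverse transform remains exact.
    """
    x = np.asarray(series, dtype=float)
    n = x.shape[0]
    spec = np.fft.rfft(x)

    # Candidate bins exclude DC (index 0) and, for even n, the Nyquist bin.
    candidates = np.arange(1, (n + 1) // 2)
    k = min(k, candidates.size)

    # Pick the k candidate bins with the largest magnitude.
    order = np.argsort(np.abs(spec[candidates]))[::-1]
    dominant = candidates[order[:k]]

    # Permute the complex coefficients (magnitude and phase) among those bins.
    shuffled = spec.copy()
    shuffled[dominant] = spec[rng.permutation(dominant)]
    return np.fft.irfft(shuffled, n=n)

# Toy series: two sinusoids plus a weak linear trend (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(64)
x = np.sin(2 * np.pi * t / 16) + 0.5 * np.sin(2 * np.pi * t / 8) + 0.1 * t / 64
aug = dominant_shuffle(x, k=3, rng=rng)
```

Because coefficients are only moved between interior bins, the augmented series keeps the original mean and total energy; no external noise is injected.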

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization

Accurately aligning large language models (LLMs) with human preferences is crucial for informing fair, economically sound, and statistically efficient decision-making processes. However, we argue that reinforcement learning from human feedback (RLHF) -- the predominant approach for aligning LLMs with human preferences through a reward model -- suffers from an inherent algorithmic bias due to its Kullback--Leibler-based regularization in optimization. In extreme cases, this bias could lead to a phenomenon we term preference collapse, where minority preferences are virtually disregarded. To mitigate this algorithmic bias, we introduce preference matching (PM) RLHF, a novel approach that provably aligns LLMs with the preference distribution of the reward model under the Bradley--Terry--Luce/Plackett--Luce model. Central to our approach is a PM regularizer that takes the form of the negative logarithm of the LLM's policy probability distribution over responses, which helps the LLM balance response diversification and reward maximization. Notably, we obtain this regularizer by solving an ordinary differential equation that is necessary for the PM property. For practical implementation, we introduce a conditional variant of PM RLHF that is tailored to natural language generation. Finally, we empirically validate the effectiveness of conditional PM RLHF through experiments on the OPT-1.3B and Llama-2-7B models, demonstrating a 29% to 41% improvement in alignment with human preferences, as measured by a certain metric, compared to standard RLHF.

Updated: 2024-05-26 07:00:05

标题: 关于将大型语言模型与RLHF对齐的算法偏见：偏好坍塌和匹配正则化

摘要: 准确地将大型语言模型（LLMs）与人类偏好对齐对于指导公平、经济上健全和统计效率高的决策过程至关重要。然而，我们认为，通过奖励模型进行人类反馈的强化学习（RLHF）-- 是将LLMs与人类偏好对齐的主要方法 -- 在优化中由于其基于Kullback--Leibler的正则化而存在固有的算法偏差。在极端情况下，这种偏差可能导致一种我们称之为偏好崩溃的现象，其中少数偏好几乎被忽视。为了减轻这种算法偏差，我们引入了偏好匹配（PM）RLHF，这是一种新颖的方法，可以证明将LLMs与奖励模型的偏好分布对齐，采用了布拉德利--特里--卢斯/普拉克特--卢斯模型。我们方法的核心是一个PM正则化器，其形式为LLM的策略概率分布对响应的负对数，这有助于LLM平衡响应的多样化和奖励的最大化。值得注意的是，我们通过解决一个对PM属性必要的常微分方程来获得这个正则化器。为了实际实施，我们引入了一种针对自然语言生成的条件变体的PM RLHF。最后，我们通过对OPT-1.3B和Llama-2-7B模型的实验，从某种度量标准来衡量，证明了条件PM RLHF的有效性，与标准RLHF相比，对人类偏好的一致性提高了29%至41%。

更新时间: 2024-05-26 07:00:05

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2405.16455v1
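
A toy check of the regularizer described in the preference-matching abstract above, for a single prompt with a finite set of responses. It assumes one reading of the abstract, namely that the regularized objective is the expected value of the reward minus the log policy probability; the function names, the two-response example, and the value of `beta` are illustrative assumptions. Under that reading the maximizer is the softmax of the rewards, which coincides with the Bradley-Terry-Luce preference probabilities, whereas the standard closed-form solution of KL-regularized RLHF depends on the reference policy and on `beta`.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def pm_objective(policy, rewards):
    """Expected reward plus the expected negative log-probability (policy entropy)."""
    return sum(p * (r - math.log(p)) for p, r in zip(policy, rewards) if p > 0)

def kl_rlhf_policy(rewards, reference, beta):
    """Closed-form maximizer of E[r] - beta * KL(policy || reference)."""
    logits = [math.log(q) + r / beta for q, r in zip(reference, rewards)]
    return softmax(logits)

rewards = [1.0, 0.0]             # two candidate responses (toy values)
pm_policy = softmax(rewards)     # maximizer of pm_objective; equals BTL preference
kl_policy = kl_rlhf_policy(rewards, reference=[0.5, 0.5], beta=0.1)
```

With these toy numbers the preference-matching policy keeps roughly 27% of its mass on the lower-reward response, while the KL-regularized policy with a small `beta` assigns it almost none, which is the kind of collapse the abstract describes.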

Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction

We study Concave Constrained Markov Decision Processes (Concave CMDPs) where both the objective and constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the challenges posed by the loss of additivity structure and the nonconcave nature of the problem, we establish the global convergence of VR-PDPG by exploiting a form of hidden concavity. In the exact setting, we prove an $O(T^{-1/3})$ convergence rate for both the average optimality gap and constraint violation, which further improves to $O(T^{-1/2})$ under strong concavity of the objective in the occupancy measure. In the sample-based setting, we demonstrate that VR-PDPG achieves an $\widetilde{O}(\epsilon^{-4})$ sample complexity for $\epsilon$-global optimality. Moreover, by incorporating a diminishing pessimistic term into the constraint, we show that VR-PDPG can attain a zero constraint violation without compromising the convergence rate of the optimality gap. Finally, we validate the effectiveness of our methods through numerical experiments.

Updated: 2024-05-26 06:58:08

标题: 基于策略的原始-对偶方法用于具有方差减少的凹CMDP

摘要: 我们研究了凹凸约束马尔可夫决策过程（凹凸CMDPs），其中目标和约束都定义为状态-动作占据度量的凹函数。我们提出了方差减少的原始-对偶策略梯度算法（VR-PDPG），通过策略梯度上升更新原始变量，通过投影次梯度下降更新对偶变量。尽管由于可加性结构的丢失和问题的非凹性质而面临挑战，我们通过利用一种隐藏凹性的形式确立了VR-PDPG的全局收敛性。在确切的设置中，我们证明了平均最优性差和约束违规的$O(T^{-1/3})$收敛速度，进一步在占据度量中的目标强凹性下提高至$O(T^{-1/2})$。在基于样本的设置中，我们展示了VR-PDPG实现了$\widetilde{O}(\epsilon^{-4})$的样本复杂度用于$\epsilon$-全局最优性。此外，通过将一种减小的悲观项纳入约束中，我们表明VR-PDPG可以实现零约束违规，而不影响最优性差的收敛速度。最后，我们通过数值实验验证了我们方法的有效性。

更新时间: 2024-05-26 06:58:08

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2205.10715v4

A Slices Perspective for Incremental Nonparametric Inference in High Dimensional State Spaces

We introduce an innovative method for incremental nonparametric probabilistic inference in high-dimensional state spaces. Our approach leverages slices from high-dimensional surfaces to efficiently approximate posterior distributions of any shape. Unlike many existing graph-based methods, our slices perspective eliminates the need for additional intermediate reconstructions, maintaining a more accurate representation of posterior distributions. Additionally, we propose a novel heuristic to balance between accuracy and efficiency, enabling real-time operation in nonparametric scenarios. In empirical evaluations on synthetic and real-world datasets, our slices approach consistently outperforms other state-of-the-art methods. It demonstrates superior accuracy and achieves a significant reduction in computational complexity, often by an order of magnitude.

Updated: 2024-05-26 06:52:56

标题: 在高维状态空间中增量非参数推断的切片视角

摘要: 我们引入了一种创新的方法，用于在高维状态空间中进行增量非参数概率推断。我们的方法利用高维表面的“切片”来高效地近似任何形状的后验分布。与许多现有的基于图的方法不同，我们的“切片”视角消除了对额外中间重建的需求，保持了后验分布的更准确表示。此外，我们提出了一种新颖的启发式方法来在准确性和效率之间取得平衡，从而实现在非参数场景下的实时操作。在合成和真实数据集上的实证评估中，我们的“切片”方法始终优于其他最先进的方法。它表现出更高的准确性，并实现了计算复杂性的显著降低，通常是一个数量级。

更新时间: 2024-05-26 06:52:56

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16453v1

ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text

The use of deep learning in electrocardiogram (ECG) analysis has advanced the accuracy and efficiency of cardiac healthcare diagnostics. By leveraging the capabilities of deep learning in semantic understanding, especially in feature extraction and representation learning, this study introduces a new multimodal contrastive pretraining framework that aims to improve the quality and robustness of learned representations of 12-lead ECG signals. Our framework comprises two key components: Cardio Query Assistant (CQA) and ECG Semantics Integrator (ESI). CQA integrates a retrieval-augmented generation (RAG) pipeline to leverage large language models (LLMs) and external medical knowledge to generate detailed textual descriptions of ECGs. The generated text is enriched with information about demographics and waveform patterns. ESI integrates both contrastive and captioning loss to pretrain ECG encoders for enhanced representations. We validate our approach through various downstream tasks, including arrhythmia detection and ECG-based subject identification. Our experimental results demonstrate substantial improvements over strong baselines in these tasks. These baselines encompass supervised and self-supervised learning methods, as well as prior multimodal pretraining approaches.

Updated: 2024-05-26 06:45:39

标题: ECG 语义整合器（ESI）：基于 LLM 增强心脏文本预训练的基础 ECG 模型

摘要: 深度学习在心电图（ECG）分析中的应用已经带来了心脏健康诊断的高准确性和效率。通过利用深度学习在语义理解中的能力，特别是在特征提取和表示学习方面，本研究引入了一个新的多模态对比预训练框架，旨在提高学习到的12导联ECG信号表示的质量和鲁棒性。我们的框架包括两个关键组件，包括心脏查询助手（CQA）和ECG语义集成器（ESI）。CQA整合了一个检索增强生成（RAG）管道，利用大型语言模型（LLMs）和外部医学知识生成ECG的详细文本描述。生成的文本丰富了关于人口统计信息和波形模式的信息。ESI整合了对比损失和字幕损失，为增强表示对ECG编码器进行预训练。我们通过各种下游任务验证了我们的方法，包括心律失常检测和基于ECG的主体识别。我们的实验结果表明，在这些任务中，我们相对于强基线方法有了实质性的改进。这些基线方法包括监督和自监督学习方法，以及先前的多模态预训练方法。

更新时间: 2024-05-26 06:45:39

领域: eess.SP,cs.AI

下载: http://arxiv.org/abs/2405.19366v1

GPFL: A Gradient Projection-Based Client Selection Framework for Efficient Federated Learning

Client selection in federated learning is crucial for determining which clients participate while balancing model accuracy and communication efficiency. Existing methods have limitations in handling data heterogeneity, computational burdens, and independent client treatment. To address these challenges, we propose GPFL, which measures client value by comparing local and global descent directions. We also employ an Exploit-Explore mechanism to enhance performance. Experimental results on the FEMNIST and CIFAR-10 datasets demonstrate that GPFL outperforms baselines in non-IID scenarios, achieving over 9% improvement in FEMNIST test accuracy. Moreover, GPFL exhibits shorter computation times through pre-selection and parameter reuse in federated learning.

Updated: 2024-05-26 06:34:29

标题: GPFL：一种基于梯度投影的客户端选择框架，用于高效的联邦学习

摘要: 联邦学习客户端选择对于确定参与者客户端并在平衡模型精度和通信效率方面至关重要。现有方法在处理数据异质性、计算负担和独立客户端处理方面存在局限性。为了解决这些挑战，我们提出了GPFL，通过比较本地和全局下降方向来衡量客户端价值。我们还采用了一种利用-探索机制来提高性能。在FEMINST和CIFAR-10数据集上的实验结果表明，GPFL在非独立同分布情况下表现优于基线，FEMINST测试精度提高了超过9\%。此外，GPFL在联邦学习中通过预先选择和参数重复使用展现出较短的计算时间。

更新时间: 2024-05-26 06:34:29

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2403.17833v2
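
The GPFL abstract above states only that client value is measured by comparing local and global descent directions. One plausible reading, sketched here purely as an assumption, scores each client by the scalar projection of its local gradient onto the global direction and keeps the top `k`. The function names, the scoring rule, and the toy gradients are hypothetical, and the Exploit-Explore mechanism mentioned in the abstract is not modeled.

```python
import math

def projection_score(local_grad, global_grad):
    """Scalar projection of a client's gradient onto the global descent direction."""
    norm = math.sqrt(sum(g * g for g in global_grad))
    if norm == 0.0:
        return 0.0  # no global direction to compare against
    return sum(l * g for l, g in zip(local_grad, global_grad)) / norm

def select_clients(local_grads, global_grad, k):
    """Return the ids of the k clients whose gradients align best with the global one."""
    scores = {cid: projection_score(g, global_grad) for cid, g in local_grads.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy two-dimensional gradients for three hypothetical clients.
global_grad = [1.0, 0.0]
local_grads = {"a": [2.0, 1.0], "b": [-1.0, 3.0], "c": [0.5, -4.0]}
chosen = select_clients(local_grads, global_grad, k=2)
```

Client "b" is dropped here because its gradient points against the global direction, even though its norm is larger than that of client "a".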

Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

Programmatic reinforcement learning (PRL) has been explored for representing policies through programs as a means to achieve interpretability and generalization. Despite promising outcomes, current state-of-the-art PRL methods are hindered by sample inefficiency, necessitating tens of millions of program-environment interactions. To tackle this challenge, we introduce a novel LLM-guided search framework (LLM-GS). Our key insight is to leverage the programming expertise and common sense reasoning of LLMs to enhance the efficiency of assumption-free, random-guessing search methods. We address the challenge of LLMs' inability to generate precise and grammatically correct programs in domain-specific languages (DSLs) by proposing a Pythonic-DSL strategy: an LLM is instructed to first generate Python code and then convert it into DSL programs. To further optimize the LLM-generated programs, we develop a search algorithm named Scheduled Hill Climbing, designed to efficiently explore the programmatic search space to consistently improve the programs. Experimental results in the Karel domain demonstrate the superior effectiveness and efficiency of our LLM-GS framework. Extensive ablation studies further verify the critical role of our Pythonic-DSL strategy and Scheduled Hill Climbing algorithm.

Updated: 2024-05-26 06:33:48

标题: 使用大型语言模型引导搜索合成程序化强化学习策略

摘要: 程序化强化学习（PRL）已经被探索用于通过程序表示策略，以实现可解释性和泛化性。尽管取得了一些令人期待的成果，但当前最先进的PRL方法受到样本效率低下的阻碍，需要数千万次程序-环境交互。为了解决这一挑战，我们引入了一种新颖的LLM引导搜索框架（LLM-GS）。我们的关键洞察是利用LLMs的编程专业知识和常识推理来增强无假设、随机猜测搜索方法的效率。我们通过提出Python-DSL策略来解决LLMs无法生成精确和语法正确的特定领域语言（DSLs）程序的挑战 - 一个LLM被指示首先生成Python代码，然后将其转换为DSL程序。为了进一步优化LLM生成的程序，我们开发了一种名为Scheduled Hill Climbing的搜索算法，旨在高效地探索程序搜索空间以持续改进程序。在Karel领域的实验结果显示了我们的LLM-GS框架的卓越有效性和效率。广泛的消融研究进一步验证了我们的Python-DSL策略和Scheduled Hill Climbing算法的关键作用。

更新时间: 2024-05-26 06:33:48

领域: cs.LG,cs.AI,cs.PL

下载: http://arxiv.org/abs/2405.16450v1

Reinforcement Learning for Jump-Diffusions

We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration--exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the exploratory dynamics under jump-diffusions calls for a careful formulation of the jump part. Through a theoretical analysis, we find that one can simply use the same policy evaluation and q-learning algorithms in Jia and Zhou (2022a, 2023), originally developed for controlled diffusions, without needing to check a priori whether the underlying data come from a pure diffusion or a jump-diffusion. However, we show that the presence of jumps ought to affect parameterizations of actors and critics in general. Finally, we investigate as an application the mean-variance portfolio selection problem with stock price modelled as a jump-diffusion, and show that both RL algorithms and parameterizations are invariant with respect to jumps.

Updated: 2024-05-26 06:33:11

标题: 跳跃扩散过程的强化学习

摘要: 我们研究了连续时间强化学习（RL）用于受跳跃扩散过程控制的随机控制的情况。我们制定了一个带有随机策略的熵正则化的探索性控制问题，以捕捉RL所必需的探索-利用平衡。与Wang等人（2020）最初研究的纯扩散情况不同，受跳跃扩散影响的探索性动力学的推导需要对跳跃部分进行谨慎的制定。通过理论分析，我们发现可以简单地使用Jia和Zhou（2022a，2023）中最初为受控扩散开发的相同策略评估和q-learning算法，而无需事先检查基础数据来自纯扩散还是跳跃扩散。然而，我们表明跳跃的存在应该在一般情况下影响演员和评论家的参数化。最后，我们以均值-方差投资组合选择问题作为一个应用，股价模型化为跳跃扩散，表明RL算法和参数化对跳跃是不变的。

更新时间: 2024-05-26 06:33:11

领域: cs.LG,math.OC,q-fin.MF

下载: http://arxiv.org/abs/2405.16449v1

Fast Asymmetric Factorization for Large Scale Multiple Kernel Clustering

Kernel methods are extensively employed for nonlinear data clustering, yet their effectiveness relies heavily on selecting suitable kernels and associated parameters, which are difficult to determine in advance. In response, Multiple Kernel Clustering (MKC) has emerged as a solution, allowing the fusion of information from multiple base kernels for clustering. However, both early fusion and late fusion methods for large-scale MKC face memory and time constraints, necessitating simultaneous optimization of both aspects. To address this issue, we propose Efficient Multiple Kernel Concept Factorization (EMKCF), which constructs a new sparse kernel matrix inspired by local regression to achieve memory efficiency. EMKCF learns consensus and individual representations by extending orthogonal concept factorization to handle multiple kernels for time efficiency. Experimental results demonstrate the efficiency and effectiveness of EMKCF on benchmark datasets compared to state-of-the-art methods. The proposed method offers a straightforward, scalable, and effective solution for large-scale MKC tasks.

Updated: 2024-05-26 06:29:12

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16447v1

Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems

Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice and plays an important role in the generalization of modern machine learning. However, prior research has revealed instances where the generalization performance of SGD is worse than ridge regression due to uneven optimization along different dimensions. Preconditioning offers a natural solution to this issue by rebalancing optimization across different directions. Yet, the extent to which preconditioning can enhance the generalization performance of SGD and whether it can bridge the existing gap with ridge regression remains uncertain. In this paper, we study the generalization performance of SGD with preconditioning for the least squares problem. We make a comprehensive comparison between preconditioned SGD and (standard and preconditioned) ridge regression. Our study makes several key contributions toward understanding and improving SGD with preconditioning. First, we establish excess risk bounds (generalization performance) for preconditioned SGD and ridge regression under an arbitrary preconditioning matrix. Second, leveraging the excess risk characterization of preconditioned SGD and ridge regression, we show (through construction) that there exists a simple preconditioning matrix that can make SGD comparable to (standard and preconditioned) ridge regression. Finally, we show that our proposed preconditioning matrix is straightforward enough to allow robust estimation from finite samples while maintaining a theoretical improvement. Our empirical results align with our theoretical findings, collectively showcasing the enhanced regularization effect of preconditioned SGD.

Updated: 2024-05-26 06:17:59

Categories: cs.LG

Download: http://arxiv.org/abs/2403.08585v3
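
To make the idea of "rebalancing optimization across different directions" in the preconditioned SGD abstract above concrete, here is a minimal sketch in plain Python. It only illustrates the mechanical effect of a diagonal preconditioner on a noise-free least squares problem with badly scaled features. The data, the step size, and the choice of preconditioner (inverse empirical second moment per coordinate) are assumptions made for illustration; they are not the preconditioning matrix proposed in the paper, and the sketch says nothing about its excess risk bounds.

```python
import random

def preconditioned_sgd(xs, ys, precond_diag, lr=0.1, epochs=20, seed=0):
    """One-sample SGD on 0.5 * (w.x - y)^2, with every gradient coordinate
    rescaled by a fixed diagonal preconditioner (a hypothetical choice)."""
    rng = random.Random(seed)
    dim = len(xs[0])
    w = [0.0] * dim
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            residual = sum(wj * xj for wj, xj in zip(w, xs[i])) - ys[i]
            for j in range(dim):
                w[j] -= lr * precond_diag[j] * residual * xs[i][j]
    return w

# Synthetic, noise-free data: the second feature is 10x smaller than the first.
data_rng = random.Random(1)
true_w = [1.0, 2.0]
xs = [[data_rng.uniform(-1, 1), 0.1 * data_rng.uniform(-1, 1)] for _ in range(50)]
ys = [sum(t * x for t, x in zip(true_w, row)) for row in xs]

# Assumed preconditioner: inverse empirical second moment of each coordinate.
second_moments = [sum(row[j] ** 2 for row in xs) / len(xs) for j in range(2)]
precond = [1.0 / m for m in second_moments]

def error(w):
    """Largest absolute deviation from the true weights."""
    return max(abs(a - b) for a, b in zip(w, true_w))

w_plain = preconditioned_sgd(xs, ys, [1.0, 1.0])  # identity = ordinary SGD
w_pre = preconditioned_sgd(xs, ys, precond)
print("plain SGD error:", error(w_plain))
print("preconditioned SGD error:", error(w_pre))
```

With the same step size and number of passes, ordinary SGD makes little progress along the small-scale feature, while the rescaled updates treat both directions alike.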

Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach

With the proliferation of edge devices, there is a significant increase in attack surface on these devices. The decentralized deployment of threat intelligence on edge devices, coupled with adaptive machine learning techniques such as the in-context learning feature of Large Language Models (LLMs), represents a promising paradigm for enhancing cybersecurity on resource-constrained edge devices. This approach involves the deployment of lightweight machine learning models directly onto edge devices to analyze local data streams, such as network traffic and system logs, in real-time. Additionally, distributing computational tasks to an edge server reduces latency and improves responsiveness while also enhancing privacy by processing sensitive data locally. LLM servers can enable these edge servers to autonomously adapt to evolving threats and attack patterns, continuously updating their models to improve detection accuracy and reduce false positives. Furthermore, collaborative learning mechanisms facilitate peer-to-peer secure and trustworthy knowledge sharing among edge devices, enhancing the collective intelligence of the network and enabling dynamic threat mitigation measures such as device quarantine in response to detected anomalies. The scalability and flexibility of this approach make it well-suited for diverse and evolving network environments, as edge devices only send suspicious information such as network traffic and system log changes, offering a resilient and efficient solution to combat emerging cyber threats at the network edge. Thus, our proposed framework can improve edge computing security by providing better security in cyber threat detection and mitigation by isolating the edge devices from the network.

Updated: 2024-05-26 06:06:08

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.08755v2

CacheBlend: Fast Large Language Model Serving with Cached Knowledge Fusion

Large language models (LLMs) often incorporate multiple text chunks in their inputs to provide the necessary contexts. To speed up the prefill of long LLM inputs, one can pre-compute the KV cache of a text and reuse it when the text appears as the prefix of another LLM input. However, the reused text chunks are not always the input prefix, and when they are not, their precomputed KV caches cannot be used directly because they ignore the text's cross-attention with the preceding text in the LLM input. Thus, the benefits of reusing KV caches remain largely unrealized. This paper tackles just one question: when an LLM input contains multiple text chunks, how can their precomputed KV caches be combined quickly to achieve the same generation quality as the expensive full prefill (i.e., without reusing KV caches)? We present CacheBlend, a scheme that reuses the precomputed KV caches, whether or not they form the prefix, and selectively recomputes the KV values of a small subset of tokens to partially update each reused KV cache. Meanwhile, the small extra delay for recomputing some tokens can be pipelined with the retrieval of KV caches within the same job, allowing CacheBlend to store KV caches on slower devices with more storage capacity and retrieve them without increasing the inference delay. By comparing CacheBlend with state-of-the-art KV cache reuse schemes on three open-source LLMs of various sizes and four popular benchmark datasets covering different tasks, we show that CacheBlend reduces time-to-first-token (TTFT) by 2.2-3.3X and increases inference throughput by 2.8-5X compared with full KV recompute, without compromising generation quality or incurring more storage cost.

Updated: 2024-05-26 06:00:17

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16444v1

The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline

The commercialization of text-to-image diffusion models (DMs) brings forth potential copyright concerns. Despite numerous attempts to protect DMs from copyright issues, the vulnerabilities of these solutions are underexplored. In this study, we formalized the Copyright Infringement Attack on generative AI models and proposed a backdoor attack method, SilentBadDiffusion, to induce copyright infringement without requiring access to or control over training processes. Our method strategically embeds connections between pieces of copyrighted information and text references in poisoning data while carefully dispersing that information, making the poisoning data inconspicuous when integrated into a clean dataset. Our experiments show the stealth and efficacy of the poisoning data. When given specific text prompts, DMs trained with a poisoning ratio of 0.20% can produce copyrighted images. Additionally, the results reveal that the more sophisticated the DMs are, the easier the success of the attack becomes. These findings underline potential pitfalls in the prevailing copyright protection strategies and underscore the necessity for increased scrutiny to prevent the misuse of DMs.

Updated: 2024-05-26 06:00:10

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2401.04136v2

Categorical Flow Matching on Statistical Manifolds

We introduce Statistical Flow Matching (SFM), a novel and mathematically rigorous flow-matching framework on the manifold of parameterized probability measures, inspired by results from information geometry. We demonstrate the effectiveness of our method on the discrete generation problem by instantiating SFM on the manifold of categorical distributions, whose geometric properties remain unexplored in previous discrete generative models. Utilizing the Fisher information metric, we equip the manifold with a Riemannian structure whose intrinsic geometry is effectively leveraged by following geodesics, the shortest paths on the manifold. We develop an efficient training and sampling algorithm that overcomes numerical stability issues with a diffeomorphism between manifolds. Our distinctive geometric perspective on statistical manifolds allows us to apply optimal transport during training and to interpret SFM as following the steepest direction of the natural gradient. Unlike previous models that rely on variational bounds for likelihood estimation, SFM enjoys exact likelihood calculation for arbitrary probability measures. We show that SFM can learn more complex patterns on the statistical manifold, where existing models often fail due to strong prior assumptions. Comprehensive experiments on real-world generative tasks spanning image, text, and biological domains further demonstrate that SFM achieves higher sampling quality and likelihood than other discrete diffusion or flow-based models.

Updated: 2024-05-26 05:50:39

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.16441v1
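
The geometry referred to in the Statistical Flow Matching abstract above can be illustrated with a standard fact from information geometry: under the Fisher information metric, the square-root map sends a categorical distribution to a point on the unit sphere, where geodesics are great-circle arcs. The sketch below computes the resulting distance and geodesic interpolation in plain Python. Whether this square-root map is the specific diffeomorphism used in the paper is an assumption, and the function names are illustrative only.

```python
import math

def fisher_rao_distance(p, q):
    """Geodesic distance between two categorical distributions under the
    Fisher information metric: 2 * arccos(sum_i sqrt(p_i * q_i))."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, max(-1.0, bc)))

def fisher_rao_geodesic(p, q, t):
    """Point at time t in [0, 1] on the geodesic from p to q. Interpolates
    along a great circle after the square-root map, then squares back."""
    u = [math.sqrt(pi) for pi in p]
    v = [math.sqrt(qi) for qi in q]
    dot = sum(a * b for a, b in zip(u, v))
    omega = math.acos(min(1.0, max(-1.0, dot)))
    if omega < 1e-12:  # p and q coincide numerically
        return list(p)
    a = math.sin((1.0 - t) * omega) / math.sin(omega)
    b = math.sin(t * omega) / math.sin(omega)
    return [(a * ui + b * vi) ** 2 for ui, vi in zip(u, v)]

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
mid = fisher_rao_geodesic(p, q, 0.5)
print("midpoint:", mid)
print("distance p to q:", fisher_rao_distance(p, q))
print("distance p to midpoint:", fisher_rao_distance(p, mid))
```

The midpoint remains a valid probability vector and lies at half the geodesic distance from each endpoint, which differs from the straight-line (mixture) midpoint of p and q.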

MambaTS: Improved Selective State Space Models for Long-term Time Series Forecasting

In recent years, Transformers have become the de facto architecture for long-term sequence forecasting (LTSF), but they face challenges such as quadratic complexity and permutation-invariant bias. A recent model, Mamba, based on selective state space models (SSMs), has emerged as a competitive alternative to the Transformer, offering comparable performance with higher throughput and complexity that is linear in sequence length. In this study, we analyze the limitations of current Mamba in LTSF and propose four targeted improvements, leading to MambaTS. We first introduce variable scan along time to arrange the historical information of all the variables together. We suggest that causal convolution in Mamba is not necessary for LTSF and propose the Temporal Mamba Block (TMB). We further incorporate a dropout mechanism for the selective parameters of TMB to mitigate model overfitting. Moreover, we tackle the issue of variable scan order sensitivity by introducing variable permutation training. We further propose variable-aware scan along time to dynamically discover variable relationships during training and decode the optimal variable scan order by solving the shortest path visiting all nodes problem during inference. Extensive experiments conducted on eight public datasets demonstrate that MambaTS achieves new state-of-the-art performance.

Updated: 2024-05-26 05:50:17

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.16440v1
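
The MambaTS abstract above describes decoding a variable scan order by solving a "shortest path visiting all nodes" problem. As a rough sketch of what that problem looks like, the snippet below searches exhaustively over orderings of a handful of variables. The cost matrix is invented for illustration (the abstract does not say how pairwise costs are obtained), and brute force only works for small variable counts; the paper may well use a different solver.

```python
from itertools import permutations

def best_scan_order(cost):
    """Return the ordering of variables that minimizes the summed transition
    cost between consecutive variables. The path is open and visits every
    variable exactly once. Exhaustive search: factorial time."""
    n = len(cost)

    def path_cost(order):
        return sum(cost[a][b] for a, b in zip(order, order[1:]))

    return min(permutations(range(n)), key=path_cost)

# Hypothetical asymmetric costs: cost[a][b] is the penalty for scanning
# variable b immediately after variable a.
cost = [
    [0, 1, 5, 9],
    [4, 0, 1, 7],
    [6, 3, 0, 1],
    [2, 8, 4, 0],
]
order = best_scan_order(cost)
print("scan order:", order)
```

In this toy matrix the only unit-cost transitions are 0 to 1, 1 to 2, and 2 to 3, so the cheapest order chains them together.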

Towards Imitation Learning in Real World Unstructured Social Mini-Games in Pedestrian Crowds

Imitation Learning (IL) strategies are used to generate policies for robot motion planning and navigation by learning from human trajectories. Recently, there has been a lot of excitement about applying IL to social interactions arising in urban environments such as university campuses, restaurants, grocery stores, and hospitals. However, obtaining numerous expert demonstrations in social settings might be expensive, risky, or even impossible. Current approaches therefore focus only on simulated social interaction scenarios. This raises the question: how can a robot learn to imitate an expert demonstrator from real-world multi-agent social interaction scenarios? It remains unknown which, if any, IL methods perform well and what assumptions they require. We benchmark representative IL methods in real-world social interaction scenarios on a motion planning task, using a novel pedestrian intersection dataset collected at the University of Texas at Austin campus. Our evaluation reveals two key findings: first, learning multi-agent cost functions is required for learning the diverse behavior modes of agents in tightly coupled interactions, and second, conditioning the training of IL methods on partial state information or providing global information in simulation can improve imitation learning, especially in real-world social interaction scenarios.

Updated: 2024-05-26 05:48:21

Categories: cs.RO,cs.AI,cs.LG,cs.MA

Download: http://arxiv.org/abs/2405.16439v1

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

Aligning generative models with human preference via RLHF typically suffers from overoptimization, where an imperfectly learned reward model can misguide the generative model to output undesired responses. We investigate this problem in a principled manner by identifying the source of the misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model; one that simultaneously minimizes the maximum likelihood estimation of the loss and a reward penalty term. Here, the reward penalty term is introduced to prevent the policy from choosing actions with spurious high proxy rewards, resulting in provable sample efficiency of the algorithm under a partial coverage style condition. Moving from theory to practice, the proposed algorithm further enjoys an equivalent but surprisingly easy-to-implement reformulation. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines: (i) a preference optimization loss that directly aligns the policy with human preference, and (ii) a supervised learning loss that explicitly imitates the policy with a (suitable) baseline distribution. In the context of aligning large language models (LLMs), this objective fuses the direct preference optimization (DPO) loss with the supervised fine-tuning (SFT) loss to help mitigate the overoptimization towards undesired responses; we name the resulting algorithm Regularized Preference Optimization (RPO). Experiments on aligning LLMs demonstrate the improved performance of RPO compared with DPO baselines. Our work sheds light on the interplay between preference optimization and SFT in tuning LLMs with both theoretical guarantees and empirical evidence.

Updated: 2024-05-26 05:38:50

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2405.16436v1
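
The RPO abstract above describes an objective that fuses the DPO loss with an SFT loss. A minimal per-example sketch of such a combined loss is shown below, working directly on sequence log-probabilities. The weights `beta` and `eta`, the function names, and the choice to apply the SFT term to the preferred response are assumptions for illustration; the abstract only says the SFT term imitates a "(suitable) baseline distribution", so the paper's exact formulation may differ.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def rpo_style_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected,
                   beta=0.1, eta=1.0):
    """DPO loss plus an SFT regularizer for one preference pair.

    logp_*: log-probabilities of the responses under the policy.
    ref_*:  log-probabilities of the same responses under the reference model.
    beta, eta: hypothetical weights (not taken from the paper).
    """
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    dpo_term = -log_sigmoid(margin)
    sft_term = -logp_chosen  # assumed: imitate the preferred response
    return dpo_term + eta * sft_term

# When the policy equals the reference model, the DPO term is log(2).
loss_at_reference = rpo_style_loss(-2.0, -3.0, -2.0, -3.0)
# Raising the chosen response's log-probability lowers both terms.
loss_after_update = rpo_style_loss(-1.0, -3.0, -2.0, -3.0)
print(loss_at_reference, loss_after_update)
```

Setting `eta=0.0` recovers the plain DPO loss, which makes it easy to see what the extra SFT term contributes.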

Structure-aware Semantic Node Identifiers for Learning on Graphs

We present a novel graph tokenization framework that generates structure-aware, semantic node identifiers (IDs) in the form of a short sequence of discrete codes, serving as symbolic representations of nodes. We employ vector quantization to compress continuous node embeddings from multiple layers of a graph neural network (GNN) into compact, meaningful codes, under both self-supervised and supervised learning paradigms. The resulting node IDs capture a high-level abstraction of graph data, enhancing the efficiency and interpretability of GNNs. Through extensive experiments on 34 datasets, including node classification, graph classification, link prediction, and attributed graph clustering tasks, we demonstrate that our generated node IDs not only improve computational efficiency but also achieve competitive performance compared to current state-of-the-art methods.

Updated: 2024-05-26 05:22:38

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16435v1
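
The node ID abstract above compresses per-layer GNN embeddings into a short sequence of discrete codes via vector quantization. The sketch below shows the basic nearest-codeword lookup that turns one node's layer embeddings into such an ID. The codebooks and embeddings are hand-written toy values; in practice the codebooks would be learned, and the paper's quantizer may differ from this plain nearest-neighbor rule.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean)."""
    def sq_dist(code):
        return sum((a - b) ** 2 for a, b in zip(vector, code))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))

def node_id(layer_embeddings, codebooks):
    """Quantize one node's embedding at each GNN layer with that layer's
    codebook and return the resulting tuple of discrete codes."""
    return tuple(nearest_code(emb, cb)
                 for emb, cb in zip(layer_embeddings, codebooks))

# Toy setup: two layers, two codewords per layer (all values hypothetical).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # layer 1 codebook
    [[0.0, 1.0], [1.0, 0.0]],  # layer 2 codebook
]
embeddings = [[0.9, 0.8], [0.1, 0.7]]  # one node, one embedding per layer
example_id = node_id(embeddings, codebooks)
print(example_id)
```

Nodes whose embeddings fall near the same codewords at every layer receive the same ID, which is what makes the codes usable as compact symbolic features.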

The Importance of Directional Feedback for LLM-based Optimizers

We study the potential of using large language models (LLMs) as an interactive optimizer for solving maximization problems in a text space using natural language and numerical feedback. Inspired by the classical optimization literature, we classify the natural language feedback into directional and non-directional, where the former is a generalization of the first-order feedback to the natural language space. We find that LLMs are especially capable of optimization when they are provided with directional feedback. Based on this insight, we design a new LLM-based optimizer that synthesizes directional feedback from the historical optimization trace to achieve reliable improvement over iterations. Empirically, we show our LLM-based optimizer is more stable and efficient in solving optimization problems, from maximizing mathematical functions to optimizing prompts for writing poems, compared with existing techniques.

Updated: 2024-05-26 05:22:35

Categories: cs.AI,cs.CL,cs.NE

Download: http://arxiv.org/abs/2405.16434v1

CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

Using large language models (LLMs) to assist psychological counseling is a significant but challenging task at present. Attempts have been made on improving empathetic conversations or acting as effective assistants in the treatment with LLMs. However, the existing datasets lack consulting knowledge, resulting in LLMs lacking professional consulting competence. Moreover, how to automatically evaluate multi-turn dialogues within the counseling process remains an understudied area. To bridge the gap, we propose CPsyCoun, a report-based multi-turn dialogue reconstruction and evaluation framework for Chinese psychological counseling. To fully exploit psychological counseling reports, a two-phase approach is devised to construct high-quality dialogues while a comprehensive evaluation benchmark is developed for the effective automatic evaluation of multi-turn psychological consultations. Competitive experimental results demonstrate the effectiveness of our proposed framework in psychological counseling. We open-source the datasets and model for future research at https://github.com/CAS-SIAT-XinHai/CPsyCoun

Updated: 2024-05-26 05:18:00

Categories: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2405.16433v1

Improving Demand Forecasting in Open Systems with Cartogram-Enhanced Deep Learning

Predicting temporal patterns across various domains poses significant challenges due to their nuanced and often nonlinear trajectories. To address this challenge, prediction frameworks have been continuously refined, employing data-driven statistical methods, mathematical models, and machine learning. Recently, as one of the challenging systems, shared transport systems such as public bicycles have gained prominence due to urban constraints and environmental concerns. Predicting rental and return patterns at bicycle stations remains a formidable task due to the system's openness and imbalanced usage patterns across stations. In this study, we propose a deep learning framework to predict rental and return patterns by leveraging cartogram approaches. The cartogram approach facilitates the prediction of demand for newly installed stations with no training data as well as long-period prediction, which has not been achieved before. We apply this method to public bicycle rental-and-return data in Seoul, South Korea, employing a spatial-temporal convolutional graph attention network. Our improved architecture incorporates batch attention and modified node feature updates for better prediction accuracy across different time scales. We demonstrate the effectiveness of our framework in predicting temporal patterns and its potential applications.

Updated: 2024-05-26 04:58:09

Categories: cs.LG,physics.soc-ph

Download: http://arxiv.org/abs/2403.16049v2

PipeFusion: Displaced Patch Pipeline Parallelism for Inference of Diffusion Transformer Models

This paper introduces PipeFusion, a novel approach that harnesses multi-GPU parallelism to address the high computational and latency challenges of generating high-resolution images with diffusion transformer (DiT) models. PipeFusion splits images into patches and distributes the network layers across multiple devices. It orchestrates communication and computation in a pipeline-parallel manner. By leveraging the high similarity between the inputs of adjacent diffusion steps, PipeFusion eliminates waiting time in the pipeline by reusing the one-step stale feature maps to provide context for the current step. Our experiments demonstrate that it can generate images at higher resolutions where existing DiT parallel approaches run out of memory (OOM). PipeFusion significantly reduces the required communication bandwidth, enabling DiT inference to be hosted on GPUs connected via PCIe rather than the more costly NVLink infrastructure, which substantially lowers the overall operational expenses for serving DiT models. Our code is publicly available at https://github.com/PipeFusion/PipeFusion.

Updated: 2024-05-26 04:57:33

Categories: cs.CV,cs.AI,cs.PF

Download: http://arxiv.org/abs/2405.14430v2

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is unclear which reinforcement-learning algorithms transformers can perform in context, and how distribution mismatch in offline training data affects the learned algorithms. This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. This includes two recently proposed training methods -- algorithm distillation and decision-pretrained transformers. First, assuming model realizability, we prove the supervised-pretrained transformer will imitate the conditional expectation of the expert algorithm given the observed trajectory. The generalization error will scale with model capacity and a distribution divergence factor between the expert and offline algorithms. Second, we show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms like LinUCB and Thompson sampling for stochastic linear bandits, and UCB-VI for tabular Markov decision processes. This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.

Updated: 2024-05-26 04:55:19

Categories: cs.LG,cs.AI,cs.CL,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2310.08566v2

STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Map

Multi-Task Learning (MTL) is a powerful technique that has gained popularity due to its performance improvement over traditional Single-Task Learning (STL). However, MTL is often challenging because the number of possible task groupings is exponential, and some groupings degrade performance through negative interference between tasks, which makes it difficult to choose the best one. As a result, existing solutions suffer from severe scalability issues that limit practical application. In our paper, we propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping based on re-proposed data-driven features, Data Maps, which capture the training dynamics of each classification task during MTL training. Through a theoretical comparison with other techniques, we show that our approach has superior scalability. Our experiments show better performance and verify the method's effectiveness, even on an unprecedented number of tasks (up to 100 tasks on CIFAR100). As the first work to handle this many tasks, we also compare the resulting grouping with the one given in the CIFAR100 dataset and find the two to be similar. Finally, we provide a modular implementation for easier integration and testing, with examples from multiple datasets and tasks.

Updated: 2024-05-26 04:52:02

Categories: cs.LG

Download: http://arxiv.org/abs/2307.03374v2

Can Class-Priors Help Single-Positive Multi-Label Learning?

Single-positive multi-label learning (SPMLL) is a typical weakly supervised multi-label learning problem, where each training example is annotated with only one positive label. Existing SPMLL methods typically assign pseudo-labels to unannotated labels under the assumption that the prior probabilities of all classes are identical. However, the class-prior of each category may differ significantly in real-world scenarios, so this unrealistic assumption causes the predictive model to perform worse than expected in real-world applications. To alleviate this issue, a novel framework, Class-pRiors Induced Single-Positive multi-label learning, is proposed. Specifically, a class-priors estimator is introduced, which estimates class-priors that are theoretically guaranteed to converge to the ground-truth class-priors. In addition, based on the estimated class-priors, an unbiased risk estimator for classification is derived, and the corresponding risk minimizer is guaranteed to approximately converge to the optimal risk minimizer on fully supervised data. Experimental results on ten MLL benchmark datasets demonstrate the effectiveness and superiority of our method over existing SPMLL approaches.

Updated: 2024-05-26 04:32:00

标题: 类先验能帮助单正多标签学习吗？

摘要: 单正多标签学习（SPMLL）是典型的弱监督多标签学习问题，在这里，每个训练示例仅用一个正标签进行注释。现有的SPMLL方法通常假定所有类别的先验概率相同，然后将伪标签分配给未经注释的标签。然而，在现实世界的情况下，每个类别的类先验可能存在显著差异，这使得预测模型不能如预期的那样表现良好，因为其对真实世界应用的不切实际假设。为了缓解这个问题，提出了一种名为“Class-pRiors Induced Single-Positive multi-label learning”的新框架。具体地，引入了一个类先验估计器，可以估计理论上保证收敛到真实类先验的类先验。此外，基于估计的类先验，推导出了一个无偏风险估计器用于分类，并且相应的风险最小化器可以保证在完全监督数据上近似收敛到最优风险最小化器。在十个MLL基准数据集上的实验结果表明，我们的方法在效果和优越性方面超过了现有的SPMLL方法。

更新时间: 2024-05-26 04:32:00

领域: cs.LG

下载: http://arxiv.org/abs/2309.13886v2

Improving Health Professionals' Onboarding with AI and XAI for Trustworthy Human-AI Collaborative Decision Making

With advanced AI/ML, there has been growing research on explainable AI (XAI) and studies on how humans interact with AI and XAI for effective human-AI collaborative decision-making. However, we still have a lack of understanding of how AI systems and XAI should be first presented to users without technical backgrounds. In this paper, we present the findings of semi-structured interviews with health professionals (n=12) and students (n=4) majoring in medicine and health to study how to improve onboarding with AI and XAI. For the interviews, we built upon human-AI interaction guidelines to create onboarding materials of an AI system for stroke rehabilitation assessment and AI explanations and introduce them to the participants. Our findings reveal that beyond presenting traditional performance metrics on AI, participants desired benchmark information, the practical benefits of AI, and interaction trials to better contextualize AI performance, and refine the objectives and performance of AI. Based on these findings, we highlight directions for improving onboarding with AI and XAI and human-AI collaborative decision-making.

Updated: 2024-05-26 04:30:17

标题: 利用人工智能和可解释人工智能提升健康专业人员的入职体验，以实现可信赖的人工智能与人类协同决策

摘要: 随着先进的人工智能/机器学习技术的发展，对可解释人工智能（XAI）的研究日益增多，以及人类如何与人工智能和XAI进行有效的人机协同决策的研究。然而，我们仍然缺乏对如何首次向没有技术背景的用户展示人工智能系统和XAI的理解。本文通过与医学和卫生专业的医疗专业人士（n=12）和学生（n=4）进行半结构化访谈，研究如何改进与人工智能和XAI的入职。在访谈中，我们基于人工智能与人类互动的指导原则，创建了一个用于中风康复评估的人工智能系统的入职材料和人工智能解释，并向参与者介绍了这些材料。我们的研究结果表明，除了展示人工智能的传统性能指标外，参与者还希望得到基准信息、人工智能的实际益处以及互动试验，以更好地将人工智能的表现情境化，并优化人工智能的目标和性能。基于这些发现，我们强调改进与人工智能和XAI的入职以及人机协同决策的方向。

更新时间: 2024-05-26 04:30:17

领域: cs.HC,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16424v1

AI-Generated Text Detection and Classification Based on BERT Deep Learning Algorithm

AI-generated text detection plays an increasingly important role in various fields. In this study, we developed an efficient AI-generated text detection model based on the BERT algorithm, which provides new ideas and methods for solving related problems. In the data preprocessing stage, the text was processed in a series of steps, including converting to lowercase, word splitting, removing stop words, stemming, removing digits, and eliminating redundant spaces, to ensure data quality and accuracy. We divided the dataset into a training set and a test set in a 60%/40% ratio and observed the changes in accuracy and loss during training, finding that the model performed well. The accuracy increases steadily from the initial 94.78% to 99.72%, while the loss value decreases from 0.261 to 0.021 and converges gradually, which indicates that the BERT model is able to detect AI-generated text with high accuracy and that the prediction results gradually approach the real classification results. Further analysis of the results on the training and test sets reveals that the average loss of the training set is 0.0565, while the average loss of the test set is slightly higher at 0.0917. As for accuracy, the average accuracy of the training set reaches 98.1%, while the average accuracy of the test set is 97.71%; the two differ little, indicating that the model has good generalisation ability. In conclusion, the AI-generated text detection model based on the BERT algorithm proposed in this study shows high accuracy and stability in experiments, providing an effective solution for related fields.
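
The preprocessing steps listed in the abstract can be sketched in a few lines. The stop-word list and the suffix-stripping stemmer below are tiny stand-ins chosen for illustration (the abstract does not name the tools the authors used), and punctuation handling is left out.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "are"}


def naive_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    tokens = text.lower().split()                        # lowercase + word splitting
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    tokens = [naive_stem(t) for t in tokens]             # stemming
    tokens = [re.sub(r"\d+", "", t) for t in tokens]     # remove digits
    return " ".join(t for t in tokens if t)              # no redundant spaces


print(preprocess("The 3 models   are Detecting generated texts in 2024"))
# prints: model detect generat text
```

The cleaned string would then be passed to a tokenizer and classifier; that part is omitted here because the abstract gives no details beyond "BERT".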

Updated: 2024-05-26 04:26:07

标题: 基于BERT深度学习算法的AI生成文本检测和分类

摘要: AI生成文本检测在各个领域中扮演着越来越重要的角色。在这项研究中，我们基于BERT算法开发了一种高效的AI生成文本检测模型，为解决相关问题提供了新的思路和方法。在数据预处理阶段，我们采取了一系列步骤来处理文本，包括将文本转换为小写、分词、去除停用词、提取词干、去除数字和消除冗余空格，以确保数据质量和准确性。通过将数据集按照60%和40%的比例划分为训练集和测试集，并观察训练过程中准确率和损失值的变化，我们发现模型在训练过程中表现良好。准确率从初始的94.78%稳步提升至99.72%，而损失值从0.261降至0.021并逐渐收敛，表明BERT模型能够高准确度地检测AI生成文本，预测结果逐渐接近真实分类结果。进一步分析训练集和测试集的结果显示，在损失值方面，训练集的平均损失为0.0565，而测试集的平均损失为0.0917，显示出略高的损失值。至于准确率，训练集的平均准确率达到98.1%，而测试集的平均准确率为97.71%，相差不大，表明模型具有良好的泛化能力。总之，本研究提出的基于BERT算法的AI生成文本检测模型在实验证明具有高准确性和稳定性，为相关领域提供了有效的解决方案。

更新时间: 2024-05-26 04:26:07

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16422v1

WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment

We give a model-based agent that builds a Python program representing its knowledge of the world based on its interactions with the environment. The world model tries to explain its interactions, while also being optimistic about what reward it can achieve. We define this optimism as a logical constraint between a program and a planner. We study our agent on gridworlds, and on task planning, finding our approach is more sample-efficient compared to deep RL, more compute-efficient compared to ReAct-style agents, and that it can transfer its knowledge across environments by editing its code.

Updated: 2024-05-26 04:24:04

标题: WorldCoder，一种基于模型的LLM代理：通过编写代码和与环境交互构建世界模型

摘要: 我们提出了一个基于模型的代理程序，根据其与环境的交互建立表示其世界知识的Python程序。这个世界模型尝试解释其交互，同时也对其可以实现的奖励持乐观态度。我们将这种乐观主义定义为程序和规划器之间的逻辑约束。我们在网格世界和任务规划上研究了我们的代理程序，发现与深度强化学习相比，我们的方法更加样本高效，与ReAct风格代理相比更加计算高效，并且它可以通过编辑其代码在不同环境中传输知识。

更新时间: 2024-05-26 04:24:04

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.12275v2

Towards Sustainable IoT: Challenges, Solutions, and Future Directions for Device Longevity

In an era dominated by the Internet of Things, ensuring the longevity and sustainability of IoT devices has emerged as a pressing concern. This study explores the complex difficulties that contribute to the early decommissioning of IoT devices and suggests methods to improve their lifespan management. By examining factors such as security vulnerabilities, user awareness gaps, and the influence of fashion-driven technology trends, the paper underscores the need for legislative interventions, consumer education, and industry accountability. Additionally, it explores innovative approaches to improving IoT longevity, including the integration of sustainability considerations into architectural design through requirements engineering methodologies. Furthermore, the paper discusses the potential of distributed ledger technology, or blockchain, to promote transparent and decentralized processes for device provisioning and tracking. This study promotes a sustainable IoT ecosystem by integrating technology innovation, legal change, and social awareness to reduce environmental impact and enhance resilience for the digital future.

Updated: 2024-05-26 04:05:01

标题: 走向可持续的物联网：设备长寿的挑战、解决方案和未来发展方向

摘要: 在一个由物联网主导的时代，确保物联网设备的长期性和可持续性已经成为一个紧迫的问题。本研究探讨了导致物联网设备提前退役的各种复杂困难，并提出了改善其寿命管理的方法。通过考察诸如安全漏洞、用户意识差距以及时尚驱动的技术趋势等因素，本文强调了立法干预、消费者教育和行业责任的必要性。此外，它探讨了改善物联网设备寿命的创新方法，包括通过需求工程方法将可持续性考虑融入建筑设计中。此外，本文还讨论了分布式分类账技术，或者区块链，促进透明和去中心化的设备配置和跟踪过程的潜力。本研究通过整合技术创新、法律变革和社会意识，促进可持续的物联网生态系统，以减少环境影响，并增强数字未来的韧性。

更新时间: 2024-05-26 04:05:01

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2405.16421v1

Decoding Social Sentiment in DAO: A Comparative Analysis of Blockchain Governance Communities

Blockchain technology is leading a revolutionary transformation across diverse industries, with effective governance being critical for the success and sustainability of blockchain projects. Community forums, pivotal in engaging decentralized autonomous organizations (DAOs), significantly impact blockchain governance decisions. Concurrently, Natural Language Processing (NLP), particularly sentiment analysis, provides powerful insights from textual data. While prior research has explored the potential of NLP tools in social media sentiment analysis, there is a gap in understanding the sentiment landscape of blockchain governance communities. The evolving discourse and sentiment dynamics on the forums of top DAOs remain largely unknown. This paper delves deep into the evolving discourse and sentiment dynamics on the public forums of leading DeFi projects: Aave, Uniswap, Curve DAO, Yearn.finance, Merit Circle, and Balancer, focusing primarily on discussions related to governance issues. Our study shows that participants in decentralized communities generally express positive sentiments during Discord discussions. Furthermore, there is a potential interaction between discussion intensity and sentiment dynamics; higher discussion volume may contribute to a more stable sentiment from code analysis. The insights gained from this study are valuable for decision-makers in blockchain governance, underscoring the pivotal role of sentiment analysis in interpreting community emotions and its evolving impact on the landscape of blockchain governance. This research significantly contributes to the interdisciplinary exploration of the intersection of blockchain and society, specifically emphasizing the decentralized blockchain governance ecosystem. We provide our data and code for replicability as open access on GitHub.

Updated: 2024-05-26 03:47:19

标题: 解读区块链治理社区中的社会情绪：DAO的比较分析

摘要: 区块链技术正在引领跨行业的革命性转变，有效的治理对于区块链项目的成功和可持续性至关重要。社区论坛在参与去中心化自治组织（DAOs）方面发挥着关键作用，显著影响区块链治理决策。同时，自然语言处理（NLP），尤其是情绪分析，可以从文本数据中提供强大的见解。虽然先前的研究已经探讨了NLP工具在社交媒体情绪分析中的潜力，但对区块链治理社区情绪格局的理解仍存在差距。顶级DAOs论坛上的不断发展的话语和情绪动态仍然大部分是未知的。本文深入研究了领先DeFi项目（包括Aave、Uniswap、Curve DAO、Yearn.finance、Merit Circle和Balancer）公共论坛上的不断发展的话语和情绪动态，主要关注与治理问题相关的讨论。我们的研究表明，在Discord讨论中，去中心化社区的参与者通常表达积极的情绪。此外，讨论强度与情绪动态之间存在潜在的互动关系；更高的讨论量可能会对来自代码分析的更稳定的情绪产生影响。从这项研究中获得的见解对于区块链治理的决策者是有价值的，突出了情绪分析在解释社区情绪及其对区块链治理格局不断演变的影响中的关键作用。这项研究对区块链和社会交叉领域的跨学科探索做出了重要贡献，特别强调了去中心化区块链治理生态系统。我们在GitHub上以开放获取的形式提供我们的数据和代码，以便进行复制性研究。

更新时间: 2024-05-26 03:47:19

领域: cs.CY,cs.CR,cs.HC,econ.GN,q-fin.EC,stat.AP

下载: http://arxiv.org/abs/2311.14676v3

FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

Federated learning (FL) has become a key component in various language modeling applications such as machine translation, next-word prediction, and medical record analysis. These applications are trained on datasets from many FL participants that often include privacy-sensitive data, such as healthcare records, phone/credit card numbers, login credentials, etc. Although FL enables computation without necessitating clients to share their raw data, determining the extent of privacy leakage in federated language models is challenging and not straightforward. Moreover, existing attacks aim to extract data regardless of how sensitive or naive it is. To fill this research gap, we introduce two novel findings with regard to leaking privacy-sensitive user data from federated large language models. Firstly, we make a key observation that model snapshots from the intermediate rounds in FL can cause greater privacy leakage than the final trained model. Secondly, we identify that privacy leakage can be aggravated by tampering with a model's selective weights that are specifically responsible for memorizing the sensitive training data. We show how a malicious client can leak the privacy-sensitive data of some other users in FL even without any cooperation from the server. Our best-performing method improves the membership inference recall by 29% and achieves up to 71% private data reconstruction, evidently outperforming existing attacks with stronger assumptions of adversary capabilities.

Updated: 2024-05-26 03:44:52

标题: FLTrojan：通过选择性权重篡改针对联邦语言模型的隐私泄露攻击

摘要: 联邦学习（FL）已经成为各种语言建模应用的关键组成部分，例如机器翻译，下一个词预测和医疗记录分析。这些应用程序是在许多FL参与者的数据集上进行训练的，这些数据集通常包括隐私敏感数据，如医疗记录，电话/信用卡号码，登录凭据等。尽管FL使计算能够进行而无需客户共享原始数据，但确定联邦语言模型中的隐私泄漏程度具有挑战性，不是简单明了的。此外，现有的攻击旨在提取数据，无论数据有多敏感或天真。为填补这一研究空白，我们引入了两个关于从联邦大型语言模型泄漏隐私敏感用户数据的新发现。首先，我们观察到FL中间轮次的模型快照可能导致比最终训练模型更大的隐私泄漏。其次，我们发现隐私泄漏可以通过篡改模型的特定权重而加剧，这些权重负责记忆敏感训练数据。我们展示了恶意客户如何在FL中甚至在没有服务器合作的情况下泄露其他用户的隐私敏感数据。我们的性能最佳方法将会员推断召回率提高了29％，并实现了高达71％的私人数据重建，明显优于现有攻击对对手能力更强的假设。

更新时间: 2024-05-26 03:44:52

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2310.16152v2

Explainable Few-shot Knowledge Tracing

Knowledge tracing (KT), which aims to mine students' mastery of knowledge from their exercise records and predict their performance on future test questions, is a critical task in educational assessment. While researchers have achieved tremendous success with the rapid development of deep learning techniques, current knowledge tracing tasks fall short of real-world teaching scenarios. Relying heavily on extensive student data and solely predicting numerical performance differs from settings where teachers assess students' knowledge state from limited practice and provide explanatory feedback. To fill this gap, we explore a new task formulation: Explainable Few-shot Knowledge Tracing. By leveraging the powerful reasoning and generation abilities of large language models (LLMs), we then propose a cognition-guided framework that can track student knowledge from a few student records while providing natural language explanations. Experimental results from three widely used datasets show that LLMs can perform comparably to or better than competitive deep knowledge tracing methods. We also discuss potential directions and call for future improvements in relevant topics.

Updated: 2024-05-26 03:43:33

标题: 可解释的少样本知识追踪

摘要: 知识追踪（KT）旨在通过学生的练习记录挖掘他们对知识的掌握，并预测他们在未来测试问题中的表现，是教育评估中的关键任务。尽管研究人员在深度学习技术的快速发展中取得了巨大成功，但当前的知识追踪任务在真实教学场景中存在缺陷。过度依赖大量学生数据并仅预测数字表现与教师根据有限练习评估学生知识状态并提供解释性反馈的情境不同。为填补这一差距，我们探索了一种新的任务形式：可解释的少样本知识追踪。通过利用大型语言模型（LLMs）强大的推理和生成能力，我们提出了一个以认知为指导的框架，可以从少量学生记录中跟踪学生的知识，并提供自然语言解释。来自三个广泛使用的数据集的实验结果显示，LLMs可以表现出与竞争性深度知识追踪方法相媲美或更优越的性能。我们还讨论了潜在的方向，并呼吁未来在相关主题上进行改进。

更新时间: 2024-05-26 03:43:33

领域: cs.AI,cs.CL,cs.CY

下载: http://arxiv.org/abs/2405.14391v2

Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers

Multi-Channel Imaging (MCI) contains an array of challenges for encoding useful feature representations not present in traditional images. For example, images from two different satellites may both contain RGB channels, but the remaining channels can be different for each imaging source. Thus, MCI models must support a variety of channel configurations at test time. Recent work has extended traditional visual encoders for MCI, such as Vision Transformers (ViT), by supplementing pixel information with an encoding representing the channel configuration. However, these methods treat each channel equally, i.e., they do not consider the unique properties of each channel type, which can result in needless and potentially harmful redundancies in the learned features. For example, if RGB channels are always present, the other channels can focus on extracting information that cannot be captured by the RGB channels. To this end, we propose DiChaViT, which aims to enhance the diversity in the learned features of MCI-ViT models. This is achieved through a novel channel sampling strategy that encourages the selection of more distinct channel sets for training. Additionally, we employ regularization and initialization techniques to increase the likelihood that new information is learned from each channel. Many of our improvements are architecture agnostic and could be incorporated into new architectures as they are developed. Experiments on both satellite and cell microscopy datasets, CHAMMI, JUMP-CP, and So2Sat, report DiChaViT yields a 1.5-5.0% gain over the state-of-the-art.

Updated: 2024-05-26 03:41:40

标题: 增强特征多样性有利于提升通道自适应视觉变换器

摘要: 多通道成像（MCI）在编码有用的特征表示方面存在一系列挑战，这些挑战在传统图像中并不存在。例如，来自两个不同卫星的图像可能都包含RGB通道，但剩余通道可能对每个成像源不同。因此，MCI模型在测试时必须支持各种通道配置。最近的工作已经扩展了用于MCI的传统视觉编码器，如Vision Transformers（ViT），通过补充像素信息以表示通道配置。然而，这些方法将每个通道视为平等处理，即它们并不考虑每种通道类型的独特特性，这可能导致学到的特征中存在不必要且潜在有害的冗余。例如，如果RGB通道总是存在，其他通道可以专注于提取RGB通道无法捕捉的信息。因此，我们提出了DiChaViT，旨在增强MCI-ViT模型学到的特征的多样性。通过一种新颖的通道抽样策略，鼓励选择更不同的通道集进行训练。此外，我们采用正则化和初始化技术，增加了从每个通道学到新信息的可能性。我们的许多改进是与架构无关的，可以在开发新架构时进行整合。对卫星和细胞显微镜数据集CHAMMI、JUMP-CP和So2Sat的实验表明，DiChaViT相对于最先进技术获得了1.5-5.0%的增益。

更新时间: 2024-05-26 03:41:40

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.16419v1

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Diffusion models have made rapid progress in generating high-quality samples across various domains. However, a theoretical understanding of the Lipschitz continuity and second momentum properties of the diffusion process is still lacking. In this paper, we bridge this gap by providing a detailed examination of these smoothness properties for the case where the target data distribution is a mixture of Gaussians, which serves as a universal approximator for smooth densities such as image data. We prove that if the target distribution is a $k$-mixture of Gaussians, the density of the entire diffusion process will also be a $k$-mixture of Gaussians. We then derive tight upper bounds on the Lipschitz constant and second momentum that are independent of the number of mixture components $k$. Finally, we apply our analysis to various diffusion solvers, both SDE and ODE based, to establish concrete error guarantees in terms of the total variation distance and KL divergence between the target and learned distributions. Our results provide deeper theoretical insights into the dynamics of the diffusion process under common data distributions.
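
The closure claim (a $k$-mixture stays a $k$-mixture along the diffusion) can be checked directly under one common parameterization. The forward process below, with scalar schedules $\alpha_t$ and $\sigma_t$, is an assumption made for illustration; the abstract does not state the exact form used in the paper.

```latex
% Assumed forward process: x_t = \alpha_t x_0 + \sigma_t z, with z ~ N(0, I) independent of x_0.
\begin{align*}
p_0(x) &= \sum_{i=1}^{k} w_i \, \mathcal{N}\!\left(x;\ \mu_i,\ \Sigma_i\right), \\
p_t(x) &= \sum_{i=1}^{k} w_i \, \mathcal{N}\!\left(x;\ \alpha_t \mu_i,\ \alpha_t^2 \Sigma_i + \sigma_t^2 I\right).
\end{align*}
```

Conditioned on component $i$, $x_0$ is Gaussian, and an affine map of a Gaussian plus independent Gaussian noise is again Gaussian, so each component keeps its weight $w_i$ while its mean is scaled and its covariance is scaled and inflated.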

Updated: 2024-05-26 03:32:27

标题: 揭示扩散模型的平滑性质：高斯混合透视

摘要: 扩散模型在各个领域生成高质量样本方面取得了快速进展。然而，对于扩散过程的利普希茨连续性和二阶矩性质的理论理解仍然不足。本文通过对目标数据分布为高斯混合的情况进行详细研究，填补了这一空白，高斯混合在光滑密度（如图像数据）的普遍逼近器中发挥作用。我们证明，如果目标分布是高斯混合的$k$个成分，整个扩散过程的密度也将是一个$k$个高斯混合。然后，我们推导了利普希茨常数和二阶矩的紧密上界，这些上界与混合成分数$k$无关。最后，我们将分析应用于各种扩散求解器，包括基于SDE和ODE的求解器，以建立目标分布与学习分布之间的总变化距离和KL散度的具体误差保证。我们的结果为在常见数据分布下的扩散过程动态提供了更深入的理论洞察。

更新时间: 2024-05-26 03:32:27

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2405.16418v1

Instruction-tuned Language Models are Better Knowledge Learners

In order for large language model (LLM)-based assistants to effectively adapt to evolving information needs, it must be possible to update their factual knowledge through continued training on new data. The standard recipe for doing so involves continued pre-training on new documents followed by instruction-tuning on question-answer (QA) pairs. However, we find that LLMs trained with this recipe struggle to answer questions, even though the perplexity of documents is minimized. We found that QA pairs are generally straightforward, while documents are more complex, weaving many factual statements together in an intricate manner. Therefore, we hypothesize that it is beneficial to expose LLMs to QA pairs before continued pre-training on documents so that the process of encoding knowledge from complex documents takes into account how this knowledge is accessed through questions. Based on this, we propose pre-instruction-tuning (PIT), a method that instruction-tunes on questions prior to training on documents. This contrasts with standard instruction-tuning, which learns how to extract knowledge after training on documents. Extensive experiments and ablation studies demonstrate that pre-instruction-tuning significantly enhances the ability of LLMs to absorb knowledge from new documents, outperforming standard instruction-tuning by 17.8%.

Updated: 2024-05-26 03:19:48

标题: 调整后的语言模型是更好的知识学习者

摘要: 为了使基于大型语言模型（LLM）的助手能够有效地适应不断变化的信息需求，必须能够通过持续训练新数据来更新它们的事实知识。实现这一目标的标准方法包括在新文档上进行持续预训练，然后在问题-答案（QA）对上进行指导调整。然而，我们发现，通过这种方法训练的LLM在回答问题时存在困难，尽管文档的困惑度被最小化。我们发现QA对通常很简单，而文档更复杂，以错综复杂的方式将许多事实陈述编织在一起。因此，我们假设将LLM暴露于QA对之前的持续预训练文档，使得从复杂文档中编码知识的过程考虑了如何通过问题访问这些知识。基于此，我们提出了预指导调整（PIT）方法，该方法在训练文档之前对问题进行指导调整。这与标准指导调整形成对比，后者是在训练文档后学习如何提取知识。大量实验证明，预指导调整显著提高了LLM吸收新文档知识的能力，比标准指导调整高出17.8%。

更新时间: 2024-05-26 03:19:48

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.12847v2

Exploring Nutritional Impact on Alzheimer's Mortality: An Explainable AI Approach

This article uses machine learning (ML) and explainable artificial intelligence (XAI) techniques to investigate the relationship between nutritional status and mortality rates associated with Alzheimer's disease (AD). The Third National Health and Nutrition Examination Survey (NHANES III) database is employed for analysis. The random forest model is selected as the base model for XAI analysis, and the Shapley Additive Explanations (SHAP) method is used to assess feature importance. The results highlight significant nutritional factors such as serum vitamin B12 and glycated hemoglobin. The study demonstrates the effectiveness of random forests in predicting AD mortality compared to other diseases. This research provides insights into the impact of nutrition on AD and contributes to a deeper understanding of disease progression.
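
SHAP attributions are built on Shapley values. The sketch below computes exact Shapley values by enumerating coalitions for a toy value function; it is not the SHAP library and not the paper's model. The feature names echo the abstract, but the scores are invented purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a set function `value(frozenset) -> float`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(coalition):
    """Hypothetical model output given which features are 'known'."""
    score = 0.0
    if "vitamin_b12" in coalition:
        score += 2.0
    if "hba1c" in coalition:
        score += 1.0
    if "vitamin_b12" in coalition and "hba1c" in coalition:
        score += 1.0  # interaction term, shared equally by both features
    return score


phi = shapley_values(["vitamin_b12", "hba1c", "age"], toy_value)
print(phi)
```

The attributions sum to the difference between the full-coalition value and the empty-coalition value, and the unused feature receives zero. Exact enumeration is exponential in the number of features, which is why practical tools rely on model-specific or sampling-based approximations.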

Updated: 2024-05-26 03:18:47

标题: 探索营养对阿尔茨海默病死亡率的影响：可解释的人工智能方法

摘要: 本文利用机器学习（ML）和可解释的人工智能（XAI）技术，研究营养状况与阿尔茨海默病（AD）相关的死亡率之间的关系。使用第三次全国健康和营养调查（NHANES III）数据库进行分析。选择随机森林模型作为XAI分析的基础模型，并使用Shapley加性解释（SHAP）方法评估特征的重要性。结果突出显示了血清维生素B12和糖化血红蛋白等重要营养因素。研究表明，与其他疾病相比，随机森林在预测AD死亡率方面具有显著效果。这项研究为营养对AD的影响提供了见解，并有助于更深入地了解疾病的发展过程。

更新时间: 2024-05-26 03:18:47

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17502v1

Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for developing ADRD screening tools such as machine learning based predictive models. Recent advancements in large language models (LLMs) demonstrate their unprecedented capability of encoding knowledge and performing reasoning, which offers them strong potential for enhancing risk prediction. This paper proposes a novel pipeline that augments risk prediction by leveraging the few-shot inference power of LLMs to make predictions on cases where traditional supervised learning methods (SLs) may not excel. Specifically, we develop a collaborative pipeline that combines SLs and LLMs via a confidence-driven decision-making mechanism, leveraging the strengths of SLs in clear-cut cases and LLMs in more complex scenarios. We evaluate this pipeline using a real-world EHR data warehouse from Oregon Health & Science University (OHSU) Hospital, encompassing EHRs from over 2.5 million patients and more than 20 million patient encounters. Our results show that our proposed approach effectively combines the power of SLs and LLMs, offering significant improvements in predictive performance. This advancement holds promise for revolutionizing ADRD screening and early detection practices, with potential implications for better strategies of patient management and thus improving healthcare.
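
One simple way to read "confidence-driven decision-making" is as a router: keep the supervised model's answer when it is confident and defer to the LLM otherwise. The sketch below assumes a binary risk label, a probability output from the supervised model, and a fixed threshold; the function names, the threshold value, and the stand-in LLM are all hypothetical, not the paper's mechanism.

```python
def route_prediction(sl_probability, llm_predict, patient_record, threshold=0.8):
    """Return (label, source) for one patient.

    sl_probability: supervised model's predicted probability of the positive class.
    llm_predict:    callable used only for low-confidence cases (hypothetical).
    threshold:      minimum confidence to trust the supervised model (assumed value).
    """
    # Confidence of a binary classifier: distance from the undecided point 0.5.
    confidence = max(sl_probability, 1.0 - sl_probability)
    if confidence >= threshold:
        return int(sl_probability >= 0.5), "SL"
    # Uncertain case: defer to the LLM.
    return llm_predict(patient_record), "LLM"


def fake_llm(record):
    """Stand-in for a few-shot LLM call; always predicts the positive class."""
    return 1


print(route_prediction(0.95, fake_llm, {}))  # clear-cut positive, stays with SL
print(route_prediction(0.05, fake_llm, {}))  # clear-cut negative, stays with SL
print(route_prediction(0.55, fake_llm, {}))  # uncertain, deferred to the LLM
```

A router of this kind limits LLM calls to the uncertain cases, which matters at the scale of millions of patient records.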

Updated: 2024-05-26 03:05:10

标题: 使用大型语言模型从电子健康记录中增强阿尔茨海默病发病风险预测

摘要: 阿尔茨海默病（AD）是美国65岁及以上人群中第五大死因。对AD及相关痴呆症（ADRD）进行筛查和早期检测对及时干预和识别临床试验参与者至关重要。电子健康记录（EHRs）的广泛采用为开发基于机器学习的预测模型等ADRD筛查工具提供了重要资源。最近大型语言模型（LLMs）的进展展示了它们编码知识和进行推理的前所未有能力，这使它们具有增强风险预测的强大潜力。本文提出了一种新颖的流程，通过利用LLMs的少样本推理能力，在传统监督学习方法（SLs）可能表现不佳的案例上进行预测，从而增强风险预测。具体来说，我们开发了一种协作流程，通过一种基于置信度的决策机制将SLs和LLMs结合起来，利用SLs在明显案例中的优势以及LLMs在更复杂情况下的优势。我们使用俄勒冈健康与科学大学（OHSU）医院的真实EHR数据仓库对此流程进行评估，该数据仓库包含超过250万患者和超过2000万患者就诊记录的EHRs。我们的结果显示，我们提出的方法有效地结合了SLs和LLMs的能力，显著提高了预测性能。这一进展有望革新ADRD筛查和早期检测实践，对改进患者管理策略以及提高医疗保健具有潜在影响。

更新时间: 2024-05-26 03:05:10

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.16413v1

Modeling the Evolutionary Trends in Corporate ESG Reporting: A Study based on Knowledge Management Model

Environmental, social, and governance (ESG) reports are globally recognized as a keystone in sustainable enterprise development. However, the current literature has not reached a conclusion on how topics and trends in ESG contexts have developed in the twenty-first century. Therefore, we selected 1114 ESG reports from firms in the technology industry to analyze the evolutionary trends of ESG topics by text mining. We discovered a homogenization effect towards low environmental, medium governance, and high social features in the evolution. We also designed a strategic framework to look closer into the dynamic changes of firms' within-industry scores and across-domain importances. We found that companies are gradually converging towards the third quadrant, which indicates that firms contribute less to industry-level excellence and professional distinctiveness in ESG reporting. Firms choose to imitate each other's ESG reports to mitigate uncertainty and enhance behavioral legitimacy.

Updated: 2024-05-26 03:05:06

标题: 模拟企业ESG报告的演化趋势：基于知识管理模型的研究

摘要: 环境、社会和治理（ESG）报告被全球认可为可持续企业发展的基石。然而，当前文献尚未总结21世纪ESG背景下的主题和趋势发展。因此，我们选择了来自科技行业企业的1114份ESG报告，通过文本挖掘分析ESG主题的演变趋势。我们发现了朝向低环境、中治理和高社会特征的同质化效应。我们还设计了一个战略框架，更深入地观察企业在行业内得分和跨领域重要性的动态变化。我们发现公司逐渐向第三象限聚拢，这表明企业在ESG报告中对行业的杰出和专业特性贡献越来越少。企业选择互相模仿ESG报告以减轻不确定性并增强行为合法性。

更新时间: 2024-05-26 03:05:06

领域: cs.CE,cs.AI,stat.AP

下载: http://arxiv.org/abs/2309.07001v2

KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge

Knowledge Graph Embedding (KGE) techniques are crucial in learning compact representations of entities and relations within a knowledge graph, facilitating efficient reasoning and knowledge discovery. While existing methods typically focus either on training KGE models solely based on graph structure or fine-tuning pre-trained language models with classification data in KG, KG-FIT leverages LLM-guided refinement to construct a semantically coherent hierarchical structure of entity clusters. By incorporating this hierarchical knowledge along with textual information during the fine-tuning process, KG-FIT effectively captures both global semantics from the LLM and local semantics from the KG. Extensive experiments on the benchmark datasets FB15K-237, YAGO3-10, and PrimeKG demonstrate the superiority of KG-FIT over state-of-the-art pre-trained language model-based methods, achieving improvements of 14.4%, 13.5%, and 11.9% in the Hits@10 metric for the link prediction task, respectively. Furthermore, KG-FIT yields substantial performance gains of 12.6%, 6.7%, and 17.7% compared to the structure-based base models upon which it is built. These results highlight the effectiveness of KG-FIT in incorporating open-world knowledge from LLMs to significantly enhance the expressiveness and informativeness of KG embeddings.
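
For readers unfamiliar with the Hits@10 metric used above: in link prediction, each test triple yields the rank of the correct entity among all candidates, and Hits@k is the fraction of test triples whose correct entity is ranked in the top k. A minimal sketch, with made-up ranks:

```python
def hits_at_k(ranks, k=10):
    """Fraction of test queries whose correct entity is ranked within the top k.

    ranks: 1-based rank of the correct entity for each test triple.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Made-up ranks for five test triples.
example_ranks = [1, 3, 12, 7, 50]
print(hits_at_k(example_ranks, k=10))  # 3 of 5 ranks are <= 10
```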

Updated: 2024-05-26 03:04:26

标题: KG-FIT：基于开放世界知识的知识图微调

摘要: 知识图谱嵌入（KGE）技术在学习知识图谱中实体和关系的紧凑表示方面至关重要，促进了有效推理和知识发现。虽然现有方法通常要么专注于仅基于图结构训练KGE模型，要么通过在知识图中对预训练语言模型进行微调来处理分类数据，但KG-FIT利用LLM引导的细化来构建实体簇的语义一致的层次结构。通过在微调过程中结合这种层次知识和文本信息，KG-FIT有效地捕获了LLM的全局语义和知识图中的局部语义。在基准数据集FB15K-237、YAGO3-10和PrimeKG上进行的大量实验证明了KG-FIT相对于基于最先进的预训练语言模型的方法的优越性，在链接预测任务的Hits@10指标中分别取得了14.4%、13.5%和11.9%的改进。此外，与其基于的基于结构的基本模型相比，KG-FIT实现了12.6%、6.7%和17.7%的显着性能提升。这些结果突显了KG-FIT在整合LLMs的开放世界知识以显著增强KG嵌入的表达能力和信息量方面的有效性。

更新时间: 2024-05-26 03:04:26

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.16412v1

Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers

Tensor Attention, a multi-view attention that is able to capture high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix attention. However, the $\Omega(n^3)$ time complexity of tensor attention poses a significant obstacle to its practical implementation in transformers, where $n$ is the input sequence length. In this work, we prove that the backward gradient of tensor attention training can be computed in almost linear $n^{1+o(1)}$ time, the same complexity as its forward computation under a bounded entries assumption. We provide a closed-form solution for the gradient and propose a fast computation method utilizing polynomial approximation methods and tensor algebraic tricks. Furthermore, we prove the necessity and tightness of our assumption through hardness analysis, showing that slightly weakening it renders the gradient problem unsolvable in truly subcubic time. Our theoretical results establish the feasibility of efficient higher-order transformer training and may facilitate practical applications of tensor attention architectures.

Updated: 2024-05-26 02:59:13

标题: 张量注意力训练：高阶Transformer的可证有效学习

摘要: 张量注意力是一种多视图注意力机制，能够捕捉多种模态之间的高阶相关性，可以克服传统矩阵注意力的表示限制。然而，张量注意力的时间复杂度为$Ω(n^3)$，这在实际的transformer实现中构成了一个重要障碍，其中$n$是输入序列的长度。在这项工作中，我们证明了张量注意力训练的反向梯度可以在几乎线性的$n^{1+o(1)}$时间内计算，假设输入项有界，其复杂度与前向计算相同。我们提供了梯度的闭式解，并提出了一种利用多项式逼近方法和张量代数技巧的快速计算方法。此外，我们通过难度分析证明了我们假设的必要性和严密性，表明稍微削弱假设会使梯度问题在真正的次立方时间内无法解决。我们的理论结果确立了高效高阶transformer训练的可行性，并可能促进张量注意力结构在实际应用中的应用。

更新时间: 2024-05-26 02:59:13

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.16411v1

Federated Learning: A Cutting-Edge Survey of the Latest Advancements and Applications

Robust machine learning (ML) models can be developed by leveraging large volumes of data and distributing the computational tasks across numerous devices or servers. Federated learning (FL) is a technique in the realm of ML that facilitates this goal by utilizing cloud infrastructure to enable collaborative model training among a network of decentralized devices. Beyond distributing the computational load, FL targets the resolution of privacy issues and the reduction of communication costs simultaneously. To protect user privacy, FL requires users to send model updates rather than transmitting large quantities of raw and potentially confidential data. Specifically, individuals train ML models locally using their own data and then upload the results in the form of weights and gradients to the cloud for aggregation into the global model. This strategy is also advantageous in environments with limited bandwidth or high communication costs, as it prevents the transmission of large data volumes. With the increasing volume of data and rising privacy concerns, alongside the emergence of large-scale ML models like Large Language Models (LLMs), FL presents itself as a timely and relevant solution. It is therefore essential to review current FL algorithms to guide future research that meets the rapidly evolving ML demands. This survey provides a comprehensive analysis and comparison of the most recent FL algorithms, evaluating them on various fronts including mathematical frameworks, privacy protection, resource allocation, and applications. Beyond summarizing existing FL methods, this survey identifies potential gaps, open areas, and future challenges based on the performance reports and algorithms used in recent studies. This survey enables researchers to readily identify existing limitations in the FL field for further exploration.
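
The aggregation step described above (clients upload weights, the server merges them into a global model) is often implemented as a data-size-weighted average, commonly called federated averaging. The sketch below shows only that step, on plain Python lists; it is one common aggregation rule and is not attributed to any specific algorithm in the survey.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of client parameter vectors.

    client_weights: list of parameter vectors, one per client (same length).
    client_sizes:   number of local training examples per client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


# Two hypothetical clients; the second holds three times as much data.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)  # [2.5, 3.5]
```

Only the parameter vectors cross the network; the raw training examples stay on each device, which is the privacy and bandwidth argument made in the abstract.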

Updated: 2024-05-26 02:37:36

标题: 《联邦学习：最新进展和应用的前沿调查》

摘要: 强大的机器学习（ML）模型可以通过利用大量数据并将计算任务分布在众多设备或服务器上来开发。联邦学习（FL）是ML领域的一种技术，通过利用云基础设施实现分布式设备网络之间的协作模型训练，促进了这一目标的实现。除了分配计算负载外，FL还致力于同时解决隐私问题和降低通信成本。为了保护用户隐私，FL要求用户发送模型更新，而不是传输大量原始和可能机密的数据。具体而言，个体使用自己的数据在本地训练ML模型，然后将结果以权重和梯度的形式上传到云端，以便进行全局模型的聚合。这种策略在带宽有限或通信成本高的环境中也具有优势，因为它可以防止大数据量的传输。随着数据量的增加和隐私问题的日益增加，以及大规模ML模型如大型语言模型（LLMs）的出现，FL成为一种及时和相关的解决方案。因此，必须审查当前的FL算法，以指导未来满足不断发展的ML需求的研究。本调查对最新的FL算法进行了全面分析和比较，评估它们在数学框架、隐私保护、资源分配和应用等方面。除了总结现有的FL方法，本调查还根据最近研究中使用的性能报告和算法，确定了潜在的差距、开放领域和未来挑战。这项调查使研究人员能够快速识别FL领域的现有局限性，以便进一步探究。

更新时间: 2024-05-26 02:37:36

领域: cs.LG,cs.AI,cs.CR,cs.DC

下载: http://arxiv.org/abs/2310.05269v3

Network Interdiction Goes Neural

Network interdiction problems are combinatorial optimization problems involving two players: one aims to solve an optimization problem on a network, while the other seeks to modify the network to thwart the first player's objectives. Such problems typically emerge in an attacker-defender context, encompassing areas such as military operations, disease spread analysis, and communication network management. The primary bottleneck in network interdiction arises from the high time complexity of using conventional exact solvers and the challenges associated with devising efficient heuristic solvers. Graph neural networks (GNNs), recognized as a cutting-edge methodology, have shown significant effectiveness in addressing single-level combinatorial optimization (CO) problems on graphs, such as the traveling salesman problem, graph matching, and graph edit distance. Nevertheless, network interdiction presents a bi-level optimization challenge, which current GNNs find difficult to manage. To address this gap, we represent network interdiction problems as Mixed-Integer Linear Programming (MILP) instances, then apply a multipartite GNN with sufficient representational capacity to learn these formulations. This approach ensures that our neural network is more compatible with the mathematical algorithms designed to solve network interdiction problems, resulting in improved generalization. Through two distinct tasks, we demonstrate that our proposed method outperforms theoretical baseline models and provides advantages over traditional exact solvers.
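
To make the bi-level structure concrete, here is a brute-force sketch of one classic instance, shortest-path interdiction: the attacker removes a fixed number of edges to maximize the length of the defender's shortest path. This exhaustive search is an illustration only, not the paper's MILP-plus-GNN method; the graph and function names are made up.

```python
import heapq
from itertools import combinations


def shortest_path_length(edges, source, target):
    """Inner problem: Dijkstra on an undirected weighted graph."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # target unreachable


def best_interdiction(edges, source, target, budget):
    """Outer problem: try every set of `budget` edges to remove."""
    best_removed = ()
    best_length = shortest_path_length(edges, source, target)
    for removed in combinations(range(len(edges)), budget):
        remaining = [e for i, e in enumerate(edges) if i not in removed]
        length = shortest_path_length(remaining, source, target)
        if length > best_length:
            best_removed, best_length = removed, length
    return best_removed, best_length


edges = [("s", "a", 1), ("a", "t", 1), ("s", "b", 2), ("b", "t", 2), ("s", "t", 5)]
print(best_interdiction(edges, "s", "t", budget=1))  # ((0,), 4)
print(best_interdiction(edges, "s", "t", budget=2))  # ((0, 2), 5)
```

The outer loop visits every edge subset of the given size and solves a full shortest-path problem for each one, which is the kind of cost growth that motivates learned or heuristic solvers.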

Updated: 2024-05-26 02:34:26

标题: 网络拦截进入神经网络领域

摘要: 网络截击问题是涉及两个玩家的组合优化问题：一个旨在解决网络上的优化问题，而另一个寻求修改网络以挫败第一个玩家的目标。这类问题通常出现在攻击者与防御者之间的背景下，涵盖领域包括军事行动、疾病传播分析和通信网络管理等。网络截击中的主要瓶颈源于使用传统精确求解器的高时空复杂度以及设计高效启发式求解器所面临的挑战。被认为是尖端方法的GNNs在解决单层图上的CO问题（如旅行推销员问题、图匹配和图编辑距离）方面表现出显著的有效性。然而，网络截击提出了一个双层优化挑战，当前的GNNs难以处理。为了填补这一差距，我们将网络截击问题表示为混合整数线性规划（MILP）实例，然后应用具有足够表征能力的多部分GNN来学习这些公式。这种方法确保我们的神经网络更加兼容于设计用于解决网络截击问题的数学算法，从而实现更好的泛化性能。通过两个不同的任务，我们证明了我们提出的方法优于理论基线模型，并且优于传统精确求解器。

更新时间: 2024-05-26 02:34:26

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16409v1
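
To make the bi-level structure in the abstract above concrete, here is a brute-force sketch of one classic instance, shortest-path interdiction: the outer player removes a fixed number of edges, and the inner player then takes the shortest remaining path. The graph, the removal budget, and the function names are hypothetical, and this exhaustive search is not the paper's MILP or GNN method.

```python
import heapq
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]


def shortest_path(edges: List[Edge], src: str, dst: str) -> float:
    """Inner problem: Dijkstra shortest-path length on a directed graph."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def interdict(edges: List[Edge], src: str, dst: str, budget: int):
    """Outer problem: try every removal set of size `budget` and keep the one
    that makes the traveller's shortest path longest."""
    best_removed, best_len = (), shortest_path(edges, src, dst)
    for removed in combinations(range(len(edges)), budget):
        kept = [e for i, e in enumerate(edges) if i not in removed]
        length = shortest_path(kept, src, dst)
        if length > best_len:
            best_removed, best_len = removed, length
    return best_removed, best_len


edges = [("s", "a", 1), ("a", "t", 1), ("s", "b", 2), ("b", "t", 2), ("s", "t", 5)]
removed_one, length_one = interdict(edges, "s", "t", budget=1)
removed_two, length_two = interdict(edges, "s", "t", budget=2)
```

The outer loop enumerates every removal set, so its cost grows combinatorially with the number of edges and the budget, which is the kind of bottleneck the abstract attributes to exact approaches.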

Geometry of Critical Sets and Existence of Saddle Branches for Two-layer Neural Networks

This paper presents a comprehensive analysis of critical point sets in two-layer neural networks. To study such complex entities, we introduce the critical embedding operator and the critical reduction operator as our tools. Given a critical point, we use these operators to uncover the whole underlying critical set representing the same output function, which exhibits a hierarchical structure. Furthermore, we prove the existence of saddle branches for any critical set whose output function can be represented by a narrower network. Our results provide a solid foundation for further study of the optimization and training behavior of neural networks.

Updated: 2024-05-26 02:32:28

标题: 临界集的几何结构和双层神经网络中鞍点分支的存在性

摘要: 本文对双层神经网络中的临界点集进行了全面分析。为了研究这种复杂的实体，我们引入了临界嵌入算子和临界缩减算子作为我们的工具。给定一个临界点，我们使用这些算子来揭示整个底层临界集，代表相同的输出函数，这展示了一个分层结构。此外，我们证明了对于任何输出函数可以由较窄网络表示的临界集存在鞍支。我们的结果为进一步研究神经网络的优化和训练行为提供了坚实的基础。

更新时间: 2024-05-26 02:32:28

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2405.17501v1

Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration

Grounding the reasoning ability of large language models (LLMs) for embodied tasks is challenging due to the complexity of the physical world. In particular, LLM planning for multi-agent collaboration requires communication among agents or credit assignment as feedback to re-adjust the proposed plans and achieve effective coordination. However, existing methods that rely heavily on physical verification or self-reflection suffer from excessive and inefficient querying of LLMs. In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Specifically, we perform critic regression to learn a sequential advantage function from LLM-planned data, and then treat the LLM planner as an optimizer to generate actions that maximize the advantage function. This endows the LLM with the foresight to discern whether an action contributes to accomplishing the final task. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents and query rounds of LLMs, demonstrating its high efficiency for grounding LLMs. More results are given at https://read-llm.github.io/.

Updated: 2024-05-26 02:31:15

标题: 面向具身多智能体协作的高效LLM落地（Grounding）

摘要: 由于物理世界的复杂性，将大型语言模型（LLMs）的推理能力落地到具身任务中具有挑战性。特别是，面向多智能体协作的LLM规划需要智能体之间的沟通或信用分配作为反馈，以重新调整提出的计划并实现有效协调。然而，现有方法过分依赖物理验证或自我反思，导致对LLMs的过度和低效的查询。在本文中，我们提出了一个新颖的多智能体协作框架，引入了强化优势反馈（ReAd）以实现计划的高效自我完善。具体地，我们执行评论家回归，从LLM规划的数据中学习顺序优势函数，然后将LLM规划者视为优化器，生成最大化优势函数的动作。它赋予LLM预见性，判断该动作是否有助于完成最终任务。我们通过将强化学习中的优势加权回归扩展到多智能体系统，提供了理论分析。在Overcooked-AI和RoCoBench的一个困难变体上的实验表明，ReAd在成功率方面超过了基线，并显著减少了智能体的交互步数和LLMs的查询轮次，展示了其在LLM落地方面的高效性。更多结果请查看https://read-llm.github.io/。

更新时间: 2024-05-26 02:31:15

领域: cs.AI,cs.CL,cs.LG,cs.MA,cs.RO

下载: http://arxiv.org/abs/2405.14314v2

Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding

Instruction-tuned large language models (LLMs) excel at many tasks but often fail to use external tools due to complicated and unfamiliar syntax constraints. While extensive fine-tuning and prompting can mitigate the issue, these approaches are expensive and hard to generalize. Furthermore, because syntax constraints are only learned implicitly during fine-tuning, models still make frequent syntax errors. Motivated by the fact that these constraints can be better satisfied explicitly with constrained decoding, we propose TOOLDEC, a decoding algorithm using finite state machines to force LLMs to follow tool syntax. Our experiments show that TOOLDEC eliminates all syntax errors, achieving significantly better performance on various base models and benchmarks. More surprisingly, when applied to generalist out-of-the-box LLMs such as Mistral-Instruct, TOOLDEC improves its accuracy in tool use from the initial 0% to an impressive 52%, matching the performance of specialized fine-tuned models such as ToolLLM.

Updated: 2024-05-26 02:16:18

标题: 不要微调，解码：通过受限解码实现无语法错误的工具使用

摘要: Instruction-tuned large language models (LLMs) are proficient in many tasks, but often struggle to utilize external tools due to complex and unfamiliar syntax constraints. While extensive fine-tuning and prompting can help alleviate this issue, these methods are costly and difficult to apply universally. Additionally, since syntax constraints are only implicitly learned during fine-tuning, models still frequently make syntax errors. Recognizing that these constraints can be more effectively met through constrained decoding, we introduce TOOLDEC, a decoding algorithm that employs finite state machines to compel LLMs to adhere to tool syntax. Our experiments demonstrate that TOOLDEC eradicates all syntax errors, leading to significantly improved performance across various base models and benchmarks. Surprisingly, when implemented on generalist out-of-the-box LLMs like Mistral-Instruct, TOOLDEC enhances tool utilization accuracy from an initial 0% to an impressive 52%, matching the performance of specialized fine-tuned models such as ToolLLM.

更新时间: 2024-05-26 02:16:18

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2310.07075v2
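
The finite-state-machine idea in the abstract above can be sketched as follows: at each step only the tokens with a transition out of the current state are eligible, and the model's scores merely rank those tokens. The toy grammar, the `FSM` table, and the `toy_score` stand-in for model logits are all hypothetical; this is not the TOOLDEC implementation.

```python
from typing import Callable, Dict, List

# Hypothetical FSM for calls of the form: add ( <digit> , <digit> )
# Each state maps an allowed next token to the state it leads to.
FSM: Dict[str, Dict[str, str]] = {
    "start": {"add": "name"},
    "name": {"(": "arg1"},
    "arg1": {"1": "comma", "2": "comma"},
    "comma": {",": "arg2"},
    "arg2": {"1": "close", "2": "close"},
    "close": {")": "done"},
}


def constrained_decode(
    score: Callable[[List[str], str], float], max_steps: int = 16
) -> List[str]:
    """Greedy decoding restricted to tokens the FSM allows in the current state."""
    state, out = "start", []
    for _ in range(max_steps):
        if state == "done":
            break
        allowed = FSM[state]
        # Tokens outside `allowed` are never considered, whatever their score.
        token = max(allowed, key=lambda t: score(out, t))
        out.append(token)
        state = allowed[token]
    return out


def toy_score(prefix: List[str], token: str) -> float:
    """Stand-in for model logits: prefers the digit 2, indifferent otherwise."""
    return 1.0 if token == "2" else 0.0


call_tokens = constrained_decode(toy_score)
```

A real decoder would typically apply the same restriction by masking disallowed entries in the logit vector before sampling. Either way, the output cannot violate the grammar, because no disallowed token is ever eligible.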

SpinQuant -- LLM quantization with learned rotations

Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are present. Recent findings suggest that rotating activation or weight matrices helps remove outliers and benefits quantization. In this work, we identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures, and find that some random rotations lead to much better quantization than others, with a difference of up to 13 points in downstream zero-shot reasoning performance. As a result, we propose SpinQuant, which optimizes (or learns) the rotation matrices with Cayley optimization on a small validation set. With 4-bit quantization of weights, activations, and the KV cache, SpinQuant narrows the accuracy gap to full precision on zero-shot reasoning tasks to merely 2.9 points on the LLaMA-2 7B model, surpassing LLM-QAT by 19.1 points and SmoothQuant by 25.0 points. SpinQuant also outperforms concurrent work QuaRot, which applies random rotations to remove outliers. In particular, for LLaMA-2 7B/LLaMA-3 8B models that are hard to quantize, SpinQuant reduces the gap to full precision by 30.2%/34.1% relative to QuaRot.

Updated: 2024-05-26 02:15:49

标题: SpinQuant -- 使用学习旋转的LLM量化

摘要: 训练后量化（PTQ）技术应用于权重、激活和KV缓存，可大幅降低大型语言模型（LLMs）的内存使用、延迟和功耗，但在存在异常值时可能导致较大的量化误差。最近的研究发现，旋转激活或权重矩阵有助于消除异常值，并有益于量化。在这项工作中，我们确定了一系列适用的旋转参数化，它们在全精度Transformer架构中产生完全相同的输出，并发现某些随机旋转带来的量化效果远好于其他旋转，在下游零样本推理性能上相差多达13个点。因此，我们提出了SpinQuant，通过Cayley优化在一个小的验证集上优化（或学习）旋转矩阵。在对权重、激活和KV缓存进行4位量化的情况下，SpinQuant在LLaMA-2 7B模型上将零样本推理任务与全精度之间的精度差距缩小到仅2.9个点，比LLM-QAT高19.1个点，比SmoothQuant高25.0个点。SpinQuant还优于同期工作QuaRot，后者应用随机旋转来消除异常值。特别是对于难以量化的LLaMA-2 7B/LLaMA-3 8B模型，SpinQuant相对于QuaRot将与全精度的差距缩小了30.2%/34.1%。

更新时间: 2024-05-26 02:15:49

领域: cs.LG,cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2405.16406v1
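
The rotation invariance that the abstract above relies on can be checked on a tiny example: if activations are multiplied by an orthogonal matrix R and the weights by its transpose, the full-precision product is unchanged, while an outlier channel is spread across coordinates. The sketch below builds R with a 2x2 Cayley transform, one standard way to parameterize a rotation; the values of `x`, `W`, and `s` are hypothetical, and the paper's actual optimizer and rotation placement may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def cayley_rotation(s: float) -> Matrix:
    """Closed form of (I - A)(I + A)^-1 for the skew-symmetric A = [[0, s], [-s, 0]].

    The result is orthogonal for every real s, so s can be optimized freely.
    """
    d = 1.0 + s * s
    return [[(1 - s * s) / d, -2 * s / d], [2 * s / d, (1 - s * s) / d]]


x = [[8.0, 0.1]]                    # one activation row with an outlier channel
W = [[0.5, -0.3], [0.2, 0.4]]       # toy weight matrix
R = cayley_rotation(math.tan(math.pi / 8))  # corresponds to a 45-degree rotation

y_ref = matmul(x, W)                            # original layer output
x_rot = matmul(x, R)                            # rotated activations
y_rot = matmul(x_rot, matmul(transpose(R), W))  # rotated weights absorb R^T

max_before = max(abs(v) for v in x[0])
max_after = max(abs(v) for v in x_rot[0])
```

Because the largest activation magnitude shrinks after rotation, a quantizer with a shared scale has less range to cover. That is the intuition behind rotating before quantizing.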

Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level

Graph Neural Networks (GNNs) excel across various applications but remain vulnerable to adversarial attacks, particularly Graph Injection Attacks (GIAs), which inject malicious nodes into the original graph and pose realistic threats. Text-attributed graphs (TAGs), where nodes are associated with textual features, are crucial due to their prevalence in real-world applications and are commonly used to evaluate these vulnerabilities. However, existing research only focuses on embedding-level GIAs, which inject node embeddings rather than actual textual content, limiting their applicability and simplifying detection. In this paper, we pioneer the exploration of GIAs at the text level, presenting three novel attack designs that inject textual content into the graph. Through theoretical and empirical analysis, we demonstrate that text interpretability, a factor previously overlooked at the embedding level, plays a crucial role in attack strength. Among the designs we investigate, the Word-frequency-based Text-level GIA (WTGIA) is particularly notable for its balance between performance and interpretability. Despite the success of WTGIA, we discover that defenders can easily enhance their defenses with customized text embedding methods or large language model (LLM)-based predictors. These insights underscore the necessity for further research into the potential and practical significance of text-level GIAs.

Updated: 2024-05-26 02:12:02

标题: 用文字侵入：探索理解文本级别的图注入攻击

摘要: 图神经网络（GNNs）在各种应用中表现出色，但仍然容易受到敌对攻击的影响，特别是图注入攻击（GIAs），这种攻击向原始图中注入恶意节点并构成现实威胁。文本属性图（TAGs）中，节点与文本特征相关联，由于在现实世界中的广泛应用而至关重要，并且通常用于评估这些漏洞。然而，现有研究仅关注于嵌入级别的GIAs，这些攻击注入节点嵌入而不是实际文本内容，限制了它们的适用性并简化了检测。在本文中，我们首次探索了文本级别的GIAs，提出了三种将文本内容注入图中的新型攻击设计。通过理论和实证分析，我们证明了文本可解释性，在嵌入级别先前被忽视的因素，在攻击强度中起着至关重要的作用。在我们调查的设计中，基于词频的文本级GIA（WTGIA）因其在性能和可解释性之间的平衡而特别引人注目。尽管WTGIA取得了成功，我们发现防御者可以轻松地通过定制文本嵌入方法或基于大型语言模型（LLM）的预测器来增强其防御能力。这些见解强调了进一步研究文本级GIAs的潜力和实际意义的必要性。

更新时间: 2024-05-26 02:12:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.16405v1

Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions

The integration of Large Language Models (LLMs) into the healthcare domain has the potential to significantly enhance patient care and support through the development of empathetic, patient-facing chatbots. This study investigates an intriguing question: can ChatGPT respond with a greater degree of empathy than physicians typically offer? To answer this question, we collect a de-identified dataset of patient messages and physician responses from Mayo Clinic and generate alternative replies using ChatGPT. Our analyses incorporate a novel empathy ranking evaluation (EMRank) involving both automated metrics and human assessments to gauge the empathy level of responses. Our findings indicate that LLM-powered chatbots have the potential to surpass human physicians in delivering empathetic communication, suggesting a promising avenue for enhancing patient care and reducing professional burnout. The study not only highlights the importance of empathy in patient interactions but also proposes a set of effective automatic empathy ranking metrics, paving the way for the broader adoption of LLMs in healthcare.

Updated: 2024-05-26 01:58:57

标题: 使用真实世界医生-患者互动评估大型语言模型中的同理心

摘要: 将大型语言模型（LLMs）整合到医疗领域中，有潜力通过开发富有同理心的面向患者的聊天机器人，显著增强患者护理和支持。本研究探讨了一个有趣的问题：ChatGPT的回应是否能比医生通常给出的回应更具同理心？为了回答这个问题，我们从梅奥诊所收集了一份去标识化的患者信息和医生回复的数据集，并使用ChatGPT生成替代回复。我们的分析结合了新颖的同理心排名评估（EMRank），包括自动化指标和人类评估，以衡量回复的同理心水平。我们的发现表明，由LLM驱动的聊天机器人有潜力在传递同理心沟通方面超越人类医生，这提示了一个有希望的途径，可以增强患者护理并减少职业倦怠。该研究不仅突显了在患者互动中同理心的重要性，还提出了一组有效的自动同理心排名指标，为在医疗领域更广泛采用LLMs铺平了道路。

更新时间: 2024-05-26 01:58:57

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.16402v1

Text-Free Multi-domain Graph Pre-training: Toward Graph Foundation Models

Given the ubiquity of graph data, it is intriguing to ask: Is it possible to train a graph foundation model on a broad range of graph data across diverse domains? A major hurdle toward this goal lies in the fact that graphs from different domains often exhibit profoundly divergent characteristics. Although there have been some initial efforts in integrating multi-domain graphs for pre-training, they primarily rely on textual descriptions to align the graphs, limiting their application to text-attributed graphs. Moreover, different source domains may conflict or interfere with each other, and their relevance to the target domain can vary significantly. To address these issues, we propose MDGPT, a text-free Multi-Domain Graph Pre-Training and adaptation framework designed to exploit multi-domain knowledge for graph learning. First, we propose a set of domain tokens to align features across source domains for synergistic pre-training. Second, we propose dual prompts, consisting of a unifying prompt and a mixing prompt, to further adapt the target domain with unified multi-domain knowledge and a tailored mixture of domain-specific knowledge. Finally, we conduct extensive experiments involving six public datasets to evaluate and analyze MDGPT, which outperforms prior art by up to 37.9%.

Updated: 2024-05-26 01:47:23

标题: 无文本多领域图预训练：走向图基础模型

摘要: 鉴于图形数据的普遍性，人们不禁要问：是否可能在各种不同领域的广泛图形数据上训练一个图形基础模型？实现这一目标的一个主要障碍在于不同领域的图形往往呈现出显著不同的特征。虽然已经有一些初步努力将多领域图形整合进行预训练，但它们主要依赖于文本描述来对齐图形，从而限制了它们在文本属性图形上的应用。此外，不同的源领域可能会相互冲突或干扰，它们与目标领域的相关性可能会显著变化。为解决这些问题，我们提出了MDGPT，一个无文本的多领域图形预训练和适应框架，旨在利用多领域知识进行图形学习。首先，我们提出了一组领域标记，以对齐源领域特征进行协同预训练。其次，我们提出了一个双提示，包括一个统一提示和一个混合提示，以进一步适应目标领域，使用统一的多领域知识和定制的领域特定知识混合。最后，我们进行了涉及六个公共数据集的广泛实验，评估和分析MDGPT，其性能超过先前的技术达37.9%。

更新时间: 2024-05-26 01:47:23

领域: cs.LG

下载: http://arxiv.org/abs/2405.13934v2

Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning

Vision transformers have established a precedent of patchifying images into uniformly-sized chunks before processing. We hypothesize that this design choice may limit models in learning comprehensive and compositional representations from visual data. This paper explores the notion of providing semantically-meaningful visual tokens to transformer encoders within a vision-language pre-training framework. Leveraging off-the-shelf segmentation and scene-graph models, we extract representations of instance segmentation masks (referred to as tangible tokens) and relationships and actions (referred to as intangible tokens). Subsequently, we pre-train a vision-side transformer by incorporating these newly extracted tokens and aligning the resultant embeddings with caption embeddings from a text-side encoder. To capture the structural and semantic relationships among visual tokens, we introduce additive attention weights, which are used to compute self-attention scores. Our experiments on COCO demonstrate notable improvements over ViTs in learned representation quality across text-to-image (+47%) and image-to-text retrieval (+44%) tasks. Furthermore, we showcase the advantages on compositionality benchmarks such as ARO (+18%) and Winoground (+10%).

Updated: 2024-05-26 01:46:22

标题: 理解使用语义上有意义的标记对视觉表示学习的影响

摘要: 视觉变换器在处理之前将图像分割成统一大小的块的先例已经建立。我们假设这种设计选择可能会限制模型从视觉数据中学习全面和组合性的表示。本文探讨了在视觉语言预训练框架中为变换器编码器提供语义有意义的视觉令牌的概念。利用现成的分割和场景图模型，我们提取实例分割掩模的表示（称为有形令牌）以及关系和动作的表示（称为无形令牌）。随后，我们通过将这些新提取的令牌并入视觉端的变换器，将结果嵌入与文本端编码器的标题嵌入对齐来进行预训练。为了捕捉视觉令牌之间的结构和语义关系，我们引入了加性注意权重，用于计算自注意力分数。我们在COCO上的实验表明，在文本到图像（+47％）和图像到文本检索（+44％）任务中，学习的表示质量方面，我们相比ViTs有明显的改进。此外，我们在诸如ARO（+18％）和Winoground（+10％）的组合性基准测试上展示了优势。

更新时间: 2024-05-26 01:46:22

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.16401v1
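
The abstract above mentions additive attention weights that enter the self-attention scores. A minimal sketch of that general mechanism adds a per-pair bias to the scaled dot-product scores before the softmax. How the paper derives these weights from segmentation and scene-graph structure is not stated in the abstract, so the `bias` matrix here is a hypothetical input.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    z = sum(exps)
    return [e / z for e in exps]


def attention_with_bias(q: Matrix, k: Matrix, v: Matrix, bias: Matrix) -> Matrix:
    """Single-head attention where bias[i][j] is added to the score of query i, key j."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + bias[i][j]
            for j, kj in enumerate(k)
        ]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out


q = [[0.0, 0.0]]
k = [[0.0, 0.0], [0.0, 0.0]]
v = [[1.0], [3.0]]

uniform = attention_with_bias(q, k, v, bias=[[0.0, 0.0]])   # no structural prior
steered = attention_with_bias(q, k, v, bias=[[0.0, 50.0]])  # strongly favor token 1
```

With identical dot products, the bias alone decides where attention goes. This is how a structural prior between tokens can shape the scores independently of the token content.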

DyGPrompt: Learning Feature and Time Prompts on Dynamic Graphs

Dynamic graphs are pervasive in the real world, modeling dynamic relations between objects across various fields. For dynamic graph modeling, dynamic graph neural networks (DGNNs) have emerged as a mainstream technique, which are generally pre-trained on the link prediction task, leaving a significant gap from the objectives of downstream tasks such as node classification. To bridge the gap, prompt-based learning has gained traction on graphs. However, existing efforts focus on static graphs, neglecting the evolution of dynamic graphs. In this paper, we propose DyGPrompt, a novel pre-training and prompting framework for dynamic graph modeling. First, we design dual prompts to address the gap in both task objectives and dynamic variations across pre-training and downstream tasks. Second, we recognize that node and time features mutually characterize each other, and propose dual condition-nets to model the evolving node-time patterns in downstream tasks. Finally, we thoroughly evaluate and analyze DyGPrompt through extensive experiments on three public datasets.

Updated: 2024-05-26 01:46:11

标题: DyGPrompt：学习动态图上的特征和时间提示

摘要: 动态图在现实世界中普遍存在，用于建模不同领域中对象之间的动态关系。对于动态图建模，动态图神经网络（DGNNs）已经成为主流技术，通常在链接预测任务上进行预训练，这导致与节点分类等下游任务的目标存在显著差距。为了填补这一差距，基于提示的学习在图中获得了广泛关注。然而，现有的努力主要集中在静态图上，忽视了动态图的演变。在本文中，我们提出了DyGPrompt，一种用于动态图建模的新型预训练和提示框架。首先，我们设计了双提示，以解决预训练和下游任务之间的任务目标差距以及动态变化之间的差距。其次，我们认识到节点和时间特征相互表征彼此，并提出了双条件网络来模拟下游任务中不断演变的节点-时间模式。最后，我们通过对三个公共数据集进行大量实验对DyGPrompt进行了全面评估和分析。

更新时间: 2024-05-26 01:46:11

领域: cs.LG

下载: http://arxiv.org/abs/2405.13937v2

AdaFisher: Adaptive Second Order Optimization via Fisher Information

First-order optimization methods are currently the mainstream in training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by employing diagonal matrix preconditioning of the stochastic gradient during training. Despite the widespread use of first-order methods, second-order optimization algorithms exhibit superior convergence properties compared to their first-order counterparts, e.g., Adam and SGD. However, their practicality in training DNNs is still limited due to increased per-iteration computations and suboptimal accuracy compared to first-order methods. We present AdaFisher, an adaptive second-order optimizer that leverages a block-diagonal approximation to the Fisher information matrix for adaptive gradient preconditioning. AdaFisher aims to bridge the gap between enhanced convergence capabilities and computational efficiency in the second-order optimization framework for training DNNs. Despite the slow pace of second-order optimizers, we showcase that AdaFisher can be reliably adopted for image classification and language modelling, and stands out for its stability and robustness in hyperparameter tuning. We demonstrate that AdaFisher outperforms the SOTA optimizers in terms of both accuracy and convergence speed. Code is available at https://github.com/AtlasAnalyticsLab/AdaFisher

Updated: 2024-05-26 01:25:02

标题: AdaFisher：通过费舍尔信息实现自适应二阶优化

摘要: 目前，一阶优化方法在训练深度神经网络（DNNs）中是主流。像Adam这样的优化器通过在训练过程中利用对角矩阵对随机梯度进行预处理，从而融入了有限的曲率信息。尽管一阶优化算法如Adam和SGD广泛应用，但与其一阶对应物相比，二阶优化算法表现出更优越的收敛性能。然而，由于每次迭代计算量的增加和相对于一阶方法的准确性欠佳，它们在训练DNNs方面的实用性仍然有限。我们提出了AdaFisher，一种自适应的二阶优化器，利用Fisher信息矩阵的块对角近似进行自适应梯度预处理。AdaFisher旨在在训练DNNs的二阶优化框架中弥合增强收敛能力和计算效率之间的差距。尽管二阶优化器的速度较慢，我们展示了AdaFisher可以可靠地用于图像分类和语言建模，并在超参数调整中以其稳定性和鲁棒性脱颖而出。我们证明了AdaFisher在准确性和收敛速度方面优于SOTA优化器。代码可从 https://github.com/AtlasAnalyticsLab/AdaFisher 获取。

更新时间: 2024-05-26 01:25:02

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2405.16397v1
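
As a rough illustration of Fisher-based gradient preconditioning, the sketch below uses the simplest variant: a diagonal empirical Fisher estimate (the mean of squared per-sample gradients) that rescales the mean gradient. This is a deliberate simplification and not AdaFisher itself, which the abstract describes as using a block-diagonal approximation; the function name, learning rate, and damping value are assumptions.

```python
from typing import List


def fisher_preconditioned_step(
    params: List[float],
    per_sample_grads: List[List[float]],
    lr: float = 0.1,
    damping: float = 1e-3,
) -> List[float]:
    """One update with a diagonal empirical Fisher preconditioner.

    fisher_diag[j] is the mean squared per-sample gradient of parameter j;
    damping keeps the division well defined when that estimate is near zero.
    """
    n = len(per_sample_grads)
    dim = len(params)
    mean_grad = [sum(g[j] for g in per_sample_grads) / n for j in range(dim)]
    fisher_diag = [sum(g[j] ** 2 for g in per_sample_grads) / n for j in range(dim)]
    return [p - lr * m / (f + damping) for p, m, f in zip(params, mean_grad, fisher_diag)]


# Hypothetical per-sample gradients: parameter 0 has curvature signal, parameter 1 has none.
new_params = fisher_preconditioned_step([1.0, 1.0], [[2.0, 0.0], [2.0, 0.0]])
```

Dividing by the Fisher estimate shrinks steps along directions where gradients are consistently large. A fuller approximation refines this with off-diagonal structure.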

A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization

This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. In particular, we consider the minimax optimization problem stemming from estimating linear functional equations defined by conditional expectations, where the objective functions are quadratic in the functional spaces. We address (i) the convergence of the stochastic gradient descent-ascent algorithm and (ii) the representation learning of the neural networks. We establish convergence under the mean-field regime by considering the continuous-time and infinite-width limit of the optimization dynamics. Under this regime, the stochastic gradient descent-ascent corresponds to a Wasserstein gradient flow over the space of probability measures defined over the space of neural network parameters. We prove that the Wasserstein gradient flow converges globally to a stationary point of the minimax objective at a $O(T^{-1} + \alpha^{-1})$ sublinear rate, and additionally finds the solution to the functional equation when the regularizer of the minimax objective is strongly convex. Here $T$ denotes the time and $\alpha$ is a scaling parameter of the neural networks. In terms of representation learning, our results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance. Finally, we apply our general results to concrete examples including policy evaluation, nonparametric instrumental variable regression, asset pricing, and adversarial Riesz representer estimation.

Updated: 2024-05-26 01:22:12

标题: 神经随机梯度下降-上升在函数最小最大优化中的平均场分析

摘要: 本文研究了定义在过参数化两层神经网络的无限维函数类上的极小极大优化问题。特别地，我们考虑由估计条件期望所定义的线性函数方程引出的极小极大优化问题，其中目标函数在函数空间中是二次的。我们讨论了（i）随机梯度下降-上升算法的收敛性和（ii）神经网络的表示学习。我们通过考虑优化动态的连续时间和无限宽度极限，建立了平均场机制下的收敛性。在这种机制下，随机梯度下降-上升对应于定义在神经网络参数空间上的概率测度空间中的Wasserstein梯度流。我们证明了Wasserstein梯度流以$O(T^{-1} + \alpha^{-1})$的次线性速率全局收敛于极小极大目标的一个驻点，并且当极小极大目标的正则化项强凸时还能找到函数方程的解。这里$T$表示时间，$\alpha$是神经网络的缩放参数。在表示学习方面，我们的结果表明由神经网络引起的特征表示允许与初始表示相差$O(\alpha^{-1})$的量，以Wasserstein距离来衡量。最后，我们将我们的一般结果应用到包括策略评估、非参数工具变量回归、资产定价和对抗性Riesz表征估计在内的具体例子中。

更新时间: 2024-05-26 01:22:12

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2404.12312v2
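
The descent-ascent dynamics studied in the abstract above can be seen in miniature on a two-variable quadratic saddle problem. The objective f(x, y) = 0.5 x^2 + x y - 0.5 y^2 is a hypothetical stand-in chosen because it is quadratic and regularized in both players; it is a finite-dimensional toy, not the paper's mean-field neural setting.

```python
from typing import Tuple


def gda(x: float, y: float, lr: float = 0.1, steps: int = 200) -> Tuple[float, float]:
    """Simultaneous gradient descent (on x) and ascent (on y) for
    f(x, y) = 0.5 * x**2 + x * y - 0.5 * y**2, whose saddle point is (0, 0)."""
    for _ in range(steps):
        grad_x = x + y   # df/dx
        grad_y = x - y   # df/dy
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y


x_final, y_final = gda(1.0, -1.0)
```

Each step multiplies (x, y) by a matrix whose eigenvalues have modulus sqrt((1 - lr)^2 + lr^2), which is below 1 for 0 < lr < 1, so the iterates spiral into the saddle point. Without the two quadratic terms the same update would circle or diverge, which is why regularization matters for this kind of convergence result.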

Machine learning in business process management: A systematic literature review

Machine learning (ML) provides algorithms to create computer programs based on data without explicitly programming them. In business process management (BPM), ML applications are used to analyse and improve processes efficiently. Three frequent examples of using ML are providing decision support through predictions, discovering accurate process models, and improving resource allocation. This paper organises the body of knowledge on ML in BPM. We extract BPM tasks from different literature streams, summarise them under the phases of a process's lifecycle, explain how ML helps perform these tasks, and identify technical commonalities in ML implementations across tasks. This study is the first exhaustive review of how ML has been used in BPM. We hope that it can open the door for a new era of cumulative research by helping researchers to identify relevant preliminary work and then combine and further develop existing approaches in a focused fashion. Our paper helps managers and consultants to find ML applications that are relevant in the current project phase of a BPM initiative, like redesigning a business process. We also offer, as a synthesis of our review, a research agenda that spans ten avenues for future research, including applying novel ML concepts like federated learning, addressing less regarded BPM lifecycle phases like process identification, and delivering ML applications with a focus on end-users.

Updated: 2024-05-26 01:12:24

标题: 机器学习在业务流程管理中的应用：一项系统文献综述

摘要: 机器学习（ML）提供了基于数据创建计算机程序的算法，而无需明确编程。在业务流程管理（BPM）中，ML应用程序被用来有效地分析和改进流程。使用ML的三个常见示例包括通过预测提供决策支持，发现准确的流程模型，以及改进资源分配。本文整理了有关ML在BPM中的知识体系。我们从不同文献流中提取BPM任务，总结它们在流程生命周期各阶段下的情况，解释ML如何帮助执行这些任务，并识别ML实现在任务之间的技术共性。这项研究是对ML在BPM中的使用进行的第一次全面审查。我们希望它能为通过帮助研究人员识别相关初步工作，然后结合和进一步发展现有方法，打开一个新的累积研究时代的大门。我们的论文帮助管理者和咨询顾问找到在BPM倡议当前项目阶段中相关的ML应用程序，例如重新设计业务流程。作为我们审查的综合成果，我们还提供了一个研究议程，包括将新颖的ML概念应用于如联邦学习等，解决较少关注的BPM生命周期阶段如流程识别，并提供以终端用户为重点的ML应用程序。

更新时间: 2024-05-26 01:12:24

领域: cs.LG

下载: http://arxiv.org/abs/2405.16396v1

Daily Physical Activity Monitoring -- Adaptive Learning from Multi-source Motion Sensor Data

In healthcare applications, there is a growing need to develop machine learning models that use data from a single source, such as that from a wrist wearable device, to monitor physical activities, assess health risks, and provide immediate health recommendations or interventions. However, the limitation of using single-source data often compromises the model's accuracy, as it fails to capture the full scope of human activities. While a more comprehensive dataset can be gathered in a lab setting using multiple sensors attached to various body parts, this approach is not practical for everyday use due to the impracticality of wearing multiple sensors. To address this challenge, we introduce a transfer learning framework that optimizes machine learning models for everyday applications by leveraging multi-source data collected in a laboratory setting. We introduce a novel metric to leverage the inherent relationship between these multiple data sources, as they are all paired to capture aspects of the same physical activity. Through numerical experiments, our framework outperforms existing methods in classification accuracy and robustness to noise, offering a promising avenue for the enhancement of daily activity monitoring.

Updated: 2024-05-26 01:08:28

标题: 每日身体活动监测--来自多源动作传感器数据的自适应学习

摘要: 在医疗应用中，越来越需要开发利用来自单一来源（例如手腕可穿戴设备）的数据来监测身体活动、评估健康风险并提供即时健康建议或干预的机器学习模型。然而，仅使用单一来源数据的局限性常常会影响模型的准确性，因为它无法捕捉到人类活动的全面范围。虽然可以在实验室环境中通过使用连接到各个身体部位的多个传感器收集更全面的数据集，但由于佩戴多个传感器的不实用性，这种方法在日常使用中并不实际。为了解决这一挑战，我们引入了一种迁移学习框架，通过利用在实验室环境中收集的多源数据来优化机器学习模型，使其适用于日常应用。我们引入了一种新的度量标准，以利用这些多个数据源之间的固有关系，因为它们都被配对捕捉同一种体力活动的各个方面。通过数值实验，我们的框架在分类准确性和对噪声的鲁棒性方面优于现有方法，为增强日常活动监测提供了一个有前景的途径。

更新时间: 2024-05-26 01:08:28

领域: cs.LG

下载: http://arxiv.org/abs/2405.16395v1

Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation

Recent advancements in human video synthesis have enabled the generation of high-quality videos through the application of stable diffusion models. However, existing methods predominantly concentrate on animating solely the human element (the foreground) guided by pose information, while leaving the background entirely static. Contrary to this, in authentic, high-quality videos, backgrounds often dynamically adjust in harmony with foreground movements, eschewing stagnancy. We introduce a technique that concurrently learns both foreground and background dynamics by segregating their movements using distinct motion representations. Human figures are animated leveraging pose-based motion, capturing intricate actions. Conversely, for backgrounds, we employ sparse tracking points to model motion, thereby reflecting the natural interaction between foreground activity and environmental changes. Training on real-world videos enhanced with this innovative motion depiction approach, our model generates videos exhibiting coherent movement in both foreground subjects and their surrounding contexts. To further extend video generation to longer sequences without accumulating errors, we adopt a clip-by-clip generation strategy, introducing global features at each step. To ensure seamless continuity across these segments, we ingeniously link the final frame of a produced clip with input noise to spawn the succeeding one, maintaining narrative flow. Throughout the sequential generation process, we infuse the feature representation of the initial reference image into the network, effectively curtailing any cumulative color inconsistencies that may otherwise arise. Empirical evaluations attest to the superiority of our method in producing videos that exhibit harmonious interplay between foreground actions and responsive background dynamics, surpassing prior methodologies in this regard.

Updated: 2024-05-26 00:53:26

标题: 解开前景和背景运动，在人类视频生成中增强逼真感

摘要: 最近人类视频合成技术取得了重大进展，通过稳定的扩散模型可以生成高质量的视频。然而，现有方法主要集中在仅动画化人类元素（前景），并由姿势信息指导，而将背景完全静态化。相反，在真实的高质量视频中，背景通常会随着前景运动动态调整，避免呈现停滞状态。我们引入了一种技术，通过使用不同的运动表示来同时学习前景和背景动态，将它们的运动分离开来。人物角色通过基于姿势的运动进行动画化，捕捉复杂的动作。相反，对于背景，我们采用稀疏跟踪点来建模运动，从而反映前景活动与环境变化之间的自然互动。通过在增强了这种创新运动描绘方法的真实世界视频上进行训练，我们的模型生成了展现前景主体和周围环境中一致运动的视频。为了进一步将视频生成延伸到长序列而不积累错误，我们采用逐段生成策略，在每一步引入全局特征。为了确保这些片段之间的无缝连续性，我们巧妙地将生成的片段的最后一帧与输入噪声链接，生成下一个片段，保持叙事流畅。在整个序列生成过程中，我们将初始参考图像的特征表示注入网络，有效地遏制可能出现的任何累积颜色不一致性。经验评估证明了我们的方法在产生展现前景动作与响应背景动态之间和谐互动的视频方面优越于先前的方法。

更新时间: 2024-05-26 00:53:26

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.16393v1

When does compositional structure yield compositional generalization? A kernel theory

Compositional generalization (the ability to respond correctly to novel combinations of familiar components) is thought to be a cornerstone of intelligent behavior. Compositionally structured (e.g. disentangled) representations are essential for this; however, the conditions under which they yield compositional generalization remain unclear. To address this gap, we present a general theory of compositional generalization in kernel models with fixed, potentially nonlinear representations (which also applies to neural networks in the "lazy regime"). We prove that these models are functionally limited to adding up values assigned to conjunctions/combinations of components that have been seen during training ("conjunction-wise additivity"), and identify novel compositionality failure modes that arise from the data and model structure, even for disentangled inputs. For models in the representation learning (or "rich") regime, we show that networks can generalize on an important non-additive task (associative inference), and give a mechanistic explanation for why. Finally, we validate our theory empirically, showing that it captures the behavior of deep neural networks trained on a set of compositional tasks. In sum, our theory characterizes the principles giving rise to compositional generalization in kernel models and shows how representation learning can overcome their limitations. We further provide a formally grounded, novel generalization class for compositional tasks that highlights fundamental differences in the required learning mechanisms (conjunction-wise additivity).

Updated: 2024-05-26 00:50:11

标题: 组合结构何时产生组合泛化？一种核理论

摘要: 组合泛化能力（对熟悉组件的新颖组合作出正确响应的能力）被认为是智能行为的基石。构成结构化（例如解耦）的表示对此至关重要；然而，它们产生组合泛化的条件仍不清楚。为填补这一空白，我们提出了一种关于具有固定、可能非线性表示的核模型中组合泛化的一般理论（也适用于“懒惰模式”下的神经网络）。我们证明这些模型在功能上受限于对训练过程中已见的组件的连接/组合分配数值的加法（“连接方式可加性”），并确定出源自数据和模型结构的新颖组合失败模式，即使对于解耦输入也是如此。对于处于表示学习（或“丰富”）模式的模型，我们展示了网络可以在一个重要的非加法任务（联想推理）上进行泛化，并给出了解释其原因的机械性解释。最后，我们在实证上验证了我们的理论，展示了它捕捉了在一组组合任务上训练的深度神经网络的行为。总之，我们的理论表征了核模型中产生组合泛化的原则，并展示了表示学习如何克服其局限性。我们进一步提供了一个基于形式的、新颖的组合任务的泛化类，突显了对所需学习机制的基本差异（连接方式可加性）。

更新时间: 2024-05-26 00:50:11

领域: cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2405.16391v1

Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning

In numerous reinforcement learning (RL) problems involving safety-critical systems, a key challenge lies in balancing multiple objectives while simultaneously meeting all stringent safety constraints. To tackle this issue, we propose a primal-based framework that orchestrates policy optimization between multi-objective learning and constraint adherence. Our method employs a novel natural policy gradient manipulation method to optimize multiple RL objectives and overcome conflicting gradients between different tasks, since the simple weighted average gradient direction may not be beneficial for specific tasks' performance due to misaligned gradients of different task objectives. When there is a violation of a hard constraint, our algorithm steps in to rectify the policy to minimize this violation. We establish theoretical convergence and constraint violation guarantees in a tabular setting. Empirically, our proposed method also outperforms prior state-of-the-art methods on challenging safe multi-objective reinforcement learning tasks.

Updated: 2024-05-26 00:42:10

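Title: Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning

Abstract: In many reinforcement learning (RL) problems involving safety-critical systems, a key challenge is balancing multiple objectives while simultaneously satisfying all strict safety constraints. To address this, we propose a primal-based framework that coordinates policy optimization between multi-objective learning and constraint adherence. Our method uses a novel natural policy gradient manipulation technique to optimize multiple RL objectives and to overcome conflicting gradients between tasks, because a simple weighted-average gradient direction can hurt the performance of specific tasks when the gradients of different task objectives are misaligned. When a hard constraint is violated, our algorithm steps in and adjusts the policy to minimize the violation. We establish theoretical convergence and constraint-violation guarantees in the tabular setting. Empirically, the proposed method also outperforms prior state-of-the-art methods on challenging safe multi-objective RL tasks.

Updated: 2024-05-26 00:42:10

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.16390v1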
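
The gradient-conflict problem mentioned in the abstract above can be illustrated with a small numerical sketch. The two gradient vectors below are made-up values chosen for illustration, and the code only demonstrates why a plain weighted average can move against one objective; it is not the paper's natural policy gradient manipulation method, which the abstract does not specify.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_average(grads, weights):
    """Weighted average of a list of gradient vectors."""
    dim = len(grads[0])
    return [sum(w * g[i] for w, g in zip(weights, grads)) for i in range(dim)]


# Hypothetical policy gradients for two objectives (illustrative values only).
g_task1 = [1.0, 0.0]
g_task2 = [-3.0, 1.0]

# Equal-weight average of the two gradients.
avg = weighted_average([g_task1, g_task2], [0.5, 0.5])

# A negative inner product means that stepping along `avg` decreases
# that task's objective to first order.
alignment_task1 = dot(avg, g_task1)
alignment_task2 = dot(avg, g_task2)

print("average gradient:", avg)
print("alignment with task 1:", alignment_task1)
print("alignment with task 2:", alignment_task2)
```

Here the averaged direction is dominated by the larger gradient of the second task, so its inner product with the first task's gradient is negative. This is the kind of misalignment that motivates manipulating the gradients instead of simply averaging them.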

Multi-Reference Preference Optimization for Large Language Models

How can Large Language Models (LLMs) be aligned with human intentions and values? A typical solution is to gather human preference on model outputs and finetune the LLMs accordingly while ensuring that updates do not deviate too far from a reference model. Recent approaches, such as direct preference optimization (DPO), have eliminated the need for unstable and sluggish reinforcement learning optimization by introducing closed-form supervised losses. However, a significant limitation of the current approach is its design for a single reference model only, neglecting to leverage the collective power of numerous pretrained LLMs. To overcome this limitation, we introduce a novel closed-form formulation for direct preference optimization using multiple reference models. The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models, substantially enhancing preference learning capabilities compared to the single-reference DPO. Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance. Furthermore, MRPO effectively finetunes LLMs to exhibit superior performance in several downstream natural language processing tasks such as GSM8K and TruthfulQA.

Updated: 2024-05-26 00:29:04

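Title: Multi-Reference Preference Optimization for Large Language Models

Abstract: How can large language models (LLMs) be aligned with human intentions and values? A typical solution is to collect human preferences over model outputs and finetune the LLMs accordingly, while ensuring that the updates do not drift too far from a reference model. Recent approaches such as direct preference optimization (DPO) remove the need for unstable and slow reinforcement learning optimization by introducing closed-form supervised losses. However, a significant limitation of the current approach is that it is designed for a single reference model only and does not exploit the collective power of the many available pretrained LLMs. To overcome this limitation, we introduce a novel closed-form formulation of direct preference optimization that uses multiple reference models. The resulting algorithm, Multi-Reference Preference Optimization (MRPO), draws on broader prior knowledge from diverse reference models and substantially improves preference learning compared with single-reference DPO. Our experiments show that LLMs finetuned with MRPO generalize better across various preference data, whether the data are scarce or abundant. Furthermore, MRPO effectively finetunes LLMs to achieve superior performance on several downstream natural language processing tasks such as GSM8K and TruthfulQA.

Updated: 2024-05-26 00:29:04

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.16388v1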
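
For context on the single-reference baseline named in the abstract above, the following is a minimal sketch of the standard per-example DPO loss, written with plain floats for sequence log-probabilities. The function name, argument names, and the default `beta` are illustrative assumptions. The multi-reference MRPO formulation is not given in the abstract and is deliberately not reproduced here.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard single-reference DPO loss for one preference pair.

    Each argument is the log-probability of a full response under the
    policy or under the frozen reference model.
    """
    # Implicit rewards are the policy-to-reference log-ratios.
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)), computed in a numerically stable way.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is zero.
loss_at_reference = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# The policy raises the chosen response and lowers the rejected one.
loss_after_update = dpo_loss(-4.0, -8.0, -5.0, -7.0)

print(loss_at_reference, loss_after_update)
```

The reference model enters only through the two log-ratios, which is the slot a multi-reference method would have to generalize.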

Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Gaussian approximation for the RTK, resulting in low per-subproblem complexity but requiring a large number of segments (i.e., subproblems), which is conjectured to be inefficient. To address this, we develop a general RTK framework that enables a more balanced subproblem decomposition, resulting in $\tilde O(1)$ subproblems, each with strongly log-concave targets. We then propose leveraging two fast sampling algorithms, the Metropolis-Adjusted Langevin Algorithm (MALA) and Underdamped Langevin Dynamics (ULD), for solving these strongly log-concave subproblems. This gives rise to the RTK-MALA and RTK-ULD algorithms for diffusion inference. In theory, we further develop the convergence guarantees for RTK-MALA and RTK-ULD in total variation (TV) distance: RTK-ULD can achieve $\epsilon$ target error within $\tilde{\mathcal O}(d^{1/2}\epsilon^{-1})$ under mild conditions, and RTK-MALA enjoys a $\mathcal{O}(d^{2}\log(d/\epsilon))$ convergence rate under slightly stricter conditions. These theoretical results surpass the state-of-the-art convergence rates for diffusion inference and are well supported by numerical experiments.

Updated: 2024-05-26 00:26:57

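Title: Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

Abstract: To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Gaussian approximation of the RTK, which gives low per-subproblem complexity but requires a large number of segments (i.e., subproblems), which is conjectured to be inefficient. To address this, we develop a general RTK framework that enables a more balanced subproblem decomposition, resulting in $\tilde O(1)$ subproblems, each with a strongly log-concave target. We then propose using two fast sampling algorithms, the Metropolis-Adjusted Langevin Algorithm (MALA) and Underdamped Langevin Dynamics (ULD), to solve these strongly log-concave subproblems. This yields the RTK-MALA and RTK-ULD algorithms for diffusion inference. In theory, we further develop convergence guarantees for RTK-MALA and RTK-ULD in total variation (TV) distance: RTK-ULD can achieve $\epsilon$ target error within $\tilde{\mathcal O}(d^{1/2}\epsilon^{-1})$ under mild conditions, and RTK-MALA enjoys a $\mathcal{O}(d^{2}\log(d/\epsilon))$ convergence rate under slightly stricter conditions. These theoretical results surpass the state-of-the-art convergence rates for diffusion inference and are well supported by numerical experiments.

Updated: 2024-05-26 00:26:57

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2405.16387v1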
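
As a reference point for the MALA subroutine named in the abstract above, here is a minimal one-dimensional sketch of the textbook Metropolis-Adjusted Langevin Algorithm on a strongly log-concave target. The standard Gaussian target, step size, and sample count are illustrative assumptions. This is generic MALA, not the RTK-MALA algorithm, whose subproblem targets are not specified in the abstract.

```python
import math
import random


def mala_sample(log_p, grad_log_p, x0, step, n_steps, rng):
    """Textbook 1-D MALA: a Langevin proposal plus a Metropolis-Hastings correction."""

    def log_q(x_to, x_from):
        # Log-density (up to a constant) of the proposal N(x_from + step * grad, 2 * step).
        mean = x_from + step * grad_log_p(x_from)
        return -((x_to - mean) ** 2) / (4.0 * step)

    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        proposal = x + step * grad_log_p(x) + math.sqrt(2.0 * step) * noise
        log_accept = (log_p(proposal) + log_q(x, proposal)) - (log_p(x) + log_q(proposal, x))
        # Accept with probability min(1, exp(log_accept)).
        if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)
    return samples


# Illustrative strongly log-concave target: standard Gaussian, log p(x) = -x^2 / 2.
rng = random.Random(0)
samples = mala_sample(
    log_p=lambda x: -0.5 * x * x,
    grad_log_p=lambda x: -x,
    x0=0.0,
    step=0.5,
    n_steps=20000,
    rng=rng,
)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print("sample mean:", mean, "sample variance:", var)
```

Because the accept/reject step corrects the discretization bias of the Langevin proposal, the chain targets the Gaussian exactly, and the sample mean and variance should come out close to 0 and 1.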

Variational Offline Multi-agent Skill Discovery

Skills are effective temporal abstractions established for sequential decision making tasks, which enable efficient hierarchical learning for long-horizon tasks and facilitate multi-task learning through their transferability. Despite extensive research, gaps remain in multi-agent scenarios, particularly in automatically extracting subgroup coordination patterns in a multi-agent task. To this end, we propose two novel auto-encoder schemes, VO-MASD-3D and VO-MASD-Hier, which simultaneously capture subgroup- and temporal-level abstractions and form multi-agent skills, and which are the first to solve the aforementioned challenge. An essential algorithm component of these schemes is a dynamic grouping function that can automatically detect latent subgroups based on agent interactions in a task. Notably, our method can be applied to offline multi-task data, and the discovered subgroup skills can be transferred across relevant tasks without retraining. Empirical evaluations on StarCraft tasks indicate that our approach significantly outperforms existing methods regarding applying skills in multi-agent reinforcement learning (MARL). Moreover, skills discovered using our method can effectively reduce the learning difficulty in MARL scenarios with delayed and sparse reward signals.

Updated: 2024-05-26 00:24:46

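Title: Variational Offline Multi-agent Skill Discovery

Abstract: Skills are effective temporal abstractions established for sequential decision-making tasks. They enable efficient hierarchical learning on long-horizon tasks and facilitate multi-task learning through their transferability. Despite extensive research, gaps remain in multi-agent scenarios, particularly in automatically extracting subgroup coordination patterns in a multi-agent task. To this end, we propose two novel auto-encoder schemes, VO-MASD-3D and VO-MASD-Hier, which simultaneously capture subgroup-level and temporal-level abstractions and form multi-agent skills, and which are the first to solve the aforementioned challenge. An essential algorithmic component of these schemes is a dynamic grouping function that automatically detects latent subgroups based on agent interactions in a task. Notably, our method can be applied to offline multi-task data, and the discovered subgroup skills can be transferred across relevant tasks without retraining. Empirical evaluations on StarCraft tasks indicate that our approach significantly outperforms existing methods for applying skills in multi-agent reinforcement learning (MARL). Moreover, skills discovered with our method can effectively reduce the learning difficulty in MARL scenarios with delayed and sparse reward signals.

Updated: 2024-05-26 00:24:46

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.16386v1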

Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions

Deep generative models (DGMs) have demonstrated great success across various domains, particularly in generating texts, images, and videos using models trained from offline data. Similarly, data-driven decision-making and robotic control also necessitate learning a generator function from the offline data to serve as the strategy or policy. In this case, applying deep generative models in offline policy learning exhibits great potential, and numerous studies have explored this direction. However, this field still lacks a comprehensive review, so the developments of different branches are relatively independent. In this paper, we provide the first systematic review on the applications of deep generative models for offline policy learning. In particular, we cover five mainstream deep generative models, including Variational Auto-Encoders, Generative Adversarial Networks, Normalizing Flows, Transformers, and Diffusion Models, and their applications in both offline reinforcement learning (offline RL) and imitation learning (IL). Offline RL and IL are two main branches of offline policy learning and are widely adopted techniques for sequential decision-making. Notably, for each type of DGM-based offline policy learning, we distill its fundamental scheme, categorize related works based on the usage of the DGM, and sort out the development process of algorithms in that field. Subsequent to the main content, we provide in-depth discussions on deep generative models and offline policy learning as a summary, based on which we present our perspectives on future research directions. This work offers a hands-on reference for the research progress in deep generative models for offline policy learning, and aims to inspire improved DGM-based offline RL or IL algorithms. For convenience, we maintain a paper list on https://github.com/LucasCJYSDL/DGMs-for-Offline-Policy-Learning.

Updated: 2024-05-26 00:23:47

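Title: Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions

Abstract: Deep generative models (DGMs) have demonstrated great success across various domains, particularly in generating text, images, and videos with models trained on offline data. Similarly, data-driven decision-making and robotic control also require learning a generator function from offline data to serve as the strategy or policy. Applying deep generative models to offline policy learning therefore shows great potential, and numerous studies have explored this direction. However, the field still lacks a comprehensive review, so the developments of its different branches are relatively independent. In this paper, we provide the first systematic review of the applications of deep generative models to offline policy learning. In particular, we cover five mainstream deep generative models, namely Variational Auto-Encoders, Generative Adversarial Networks, Normalizing Flows, Transformers, and Diffusion Models, and their applications in both offline reinforcement learning (offline RL) and imitation learning (IL). Offline RL and IL are the two main branches of offline policy learning and are widely adopted techniques for sequential decision-making. Notably, for each type of DGM-based offline policy learning, we distill its fundamental scheme, categorize related work by how the DGM is used, and trace the development of algorithms in that area. After the main content, we provide in-depth discussions of deep generative models and offline policy learning as a summary, based on which we present our perspectives on future research directions. This work offers a hands-on reference for research progress on deep generative models for offline policy learning and aims to inspire improved DGM-based offline RL or IL algorithms. For convenience, we maintain a paper list at https://github.com/LucasCJYSDL/DGMs-for-Offline-Policy-Learning.

Updated: 2024-05-26 00:23:47

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.13777v5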

Rewarded Region Replay (R3) for Policy Learning with Discrete Action Space

We introduce a new on-policy algorithm called Rewarded Region Replay (R3), which significantly improves on PPO in solving environments with discrete action spaces. R3 improves sample efficiency by using a replay buffer which contains past successful trajectories with reward above a certain threshold, which are used to update a PPO agent with importance sampling. Crucially, we discard the importance sampling factors which are above a certain ratio to reduce variance and stabilize training. We found that R3 significantly outperforms PPO in Minigrid environments with sparse rewards and discrete action space, such as DoorKeyEnv and CrossingEnv, and moreover we found that the improvement margin of our method versus baseline PPO increases with the complexity of the environment. We also benchmarked the performance of R3 against DDQN (Double Deep Q-Network), which is a standard baseline in off-policy methods for discrete actions, and found that R3 also outperforms the DDQN agent in DoorKeyEnv. Lastly, we adapt the idea of R3 to the dense-reward setting to obtain the Dense R3 algorithm (or DR3) and benchmark it against PPO on the Cartpole-V1 environment. We found that DR3 outperforms PPO significantly on this dense-reward environment. Our code can be found at https://github.com/chry-santhemum/R3.

Updated: 2024-05-26 00:01:29

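Title: Rewarded Region Replay (R3) for Policy Learning with Discrete Action Space

Abstract: We introduce a new on-policy algorithm called Rewarded Region Replay (R3), which significantly improves on PPO in environments with discrete action spaces. R3 improves sample efficiency by using a replay buffer that holds past successful trajectories whose reward exceeds a certain threshold; these trajectories are used to update a PPO agent with importance sampling. Crucially, we discard importance sampling factors that exceed a certain ratio in order to reduce variance and stabilize training. We found that R3 significantly outperforms PPO in Minigrid environments with sparse rewards and discrete action spaces, such as DoorKeyEnv and CrossingEnv, and that the improvement margin of our method over baseline PPO grows with the complexity of the environment. We also benchmarked R3 against DDQN (Double Deep Q-Network), a standard baseline among off-policy methods for discrete actions, and found that R3 also outperforms the DDQN agent in DoorKeyEnv. Lastly, we adapt the idea of R3 to the dense-reward setting to obtain the Dense R3 algorithm (DR3) and benchmark it against PPO on the Cartpole-V1 environment. We found that DR3 significantly outperforms PPO in this dense-reward environment. Our code can be found at https://github.com/chry-santhemum/R3.

Updated: 2024-05-26 00:01:29

Categories: cs.LG,I.2.6

Download: http://arxiv.org/abs/2405.16383v1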
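
The two mechanisms described in the R3 abstract above, keeping only trajectories whose reward exceeds a threshold and discarding replayed samples whose importance sampling factor exceeds a ratio, can be sketched roughly as follows. The class name, function name, data layout, and all numbers are hypothetical and chosen for illustration. The authors' actual implementation lives in their repository and may differ.

```python
import math


class RewardedRegionBuffer:
    """Hypothetical buffer that stores only trajectories above a reward threshold."""

    def __init__(self, reward_threshold):
        self.reward_threshold = reward_threshold
        self.transitions = []

    def add_trajectory(self, trajectory, total_reward):
        # trajectory: list of (state, action, old_log_prob) tuples.
        if total_reward > self.reward_threshold:
            self.transitions.extend(trajectory)


def usable_replay_samples(buffer, new_log_prob, max_ratio):
    """Return replayed samples whose importance ratio pi_new / pi_old is at most max_ratio."""
    kept = []
    for state, action, old_log_prob in buffer.transitions:
        ratio = math.exp(new_log_prob(state, action) - old_log_prob)
        if ratio <= max_ratio:  # discard high-variance factors
            kept.append((state, action, ratio))
    return kept


# Illustrative usage with made-up numbers.
buffer = RewardedRegionBuffer(reward_threshold=0.5)
buffer.add_trajectory([("s0", 0, math.log(0.5)), ("s1", 1, math.log(0.1))], total_reward=1.0)
buffer.add_trajectory([("s2", 0, math.log(0.5))], total_reward=0.0)  # below threshold, ignored

# Assumed current policy: probability 0.6 for every stored (state, action) pair.
kept = usable_replay_samples(buffer, lambda state, action: math.log(0.6), max_ratio=2.0)
print(kept)
```

In this toy run the second stored transition has an importance ratio of about 6, so it is dropped, and only the first transition (ratio about 1.2) would contribute to the replayed part of the policy update.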

Argumentative Causal Discovery

Causal discovery amounts to unearthing causal relationships amongst features in data. It is a crucial companion to causal inference, necessary to build scientific knowledge without resorting to expensive or impossible randomised controlled trials. In this paper, we explore how reasoning with symbolic representations can support causal discovery. Specifically, we deploy assumption-based argumentation (ABA), a well-established and powerful knowledge representation formalism, in combination with causality theories, to learn graphs which reflect causal dependencies in the data. We prove that our method exhibits desirable properties, notably that, under natural conditions, it can retrieve ground-truth causal graphs. We also conduct experiments with an implementation of our method in answer set programming (ASP) on four datasets from standard benchmarks in causal discovery, showing that our method compares well against established baselines.

Updated: 2024-05-26 00:00:55

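Title: Argumentative Causal Discovery

Abstract: Causal discovery amounts to unearthing causal relationships among features in data. It is a crucial companion to causal inference and is necessary for building scientific knowledge without resorting to expensive or impossible randomised controlled trials. In this paper, we explore how reasoning with symbolic representations can support causal discovery. Specifically, we use assumption-based argumentation (ABA), a well-established and powerful knowledge representation formalism, in combination with causality theories, to learn graphs that reflect the causal dependencies in the data. We prove that our method has desirable properties, notably that, under natural conditions, it can retrieve ground-truth causal graphs. We also run experiments with an implementation of our method in answer set programming (ASP) on four datasets from standard causal discovery benchmarks, showing that our method compares well against established baselines.

Updated: 2024-05-26 00:00:55

Categories: cs.AI

Download: http://arxiv.org/abs/2405.11250v2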