    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 

Articles: 32

Last Updated: 2024-07-25 23:32:21 (+00:00)

Debating with More Persuasive LLMs Leads to More Truthful Answers

Common methods for aligning large language models (LLMs) with desired behaviour heavily rely on human-labelled data. However, as models grow increasingly sophisticated, they will surpass human expertise, and the role of human evaluation will evolve into non-experts overseeing experts. In anticipation of this, we ask: can weaker models assess the correctness of stronger models? We investigate this question in an analogous setting, where stronger models (experts) possess the necessary information to answer questions and weaker models (non-experts) lack this information. The method we evaluate is debate, where two LLM experts each argue for a different answer, and a non-expert selects the answer. We find that debate consistently helps both non-expert models and humans answer questions, achieving 76% and 88% accuracy respectively (naive baselines obtain 48% and 60%). Furthermore, optimising expert debaters for persuasiveness in an unsupervised manner improves non-expert ability to identify the truth in debates. Our results provide encouraging empirical evidence for the viability of aligning models with debate in the absence of ground truth.
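
To make the protocol concrete, here is a minimal, hypothetical sketch of the debate setup described above: two expert debaters with access to the source text argue for opposing answers over several rounds, and a non-expert judge who never sees the text picks a side. `query_llm`, the prompts, and the round structure are illustrative placeholders, not the authors' code.

```python
def query_llm(instructions: str, transcript: str) -> str:
    """Placeholder for any chat-completion call (expert or judge model)."""
    raise NotImplementedError

def run_debate(question, answer_a, answer_b, passage, n_rounds=3):
    transcript = f"Question: {question}\nA: {answer_a}\nB: {answer_b}\n"
    for rnd in range(1, n_rounds + 1):
        for side, answer in (("A", answer_a), ("B", answer_b)):
            instructions = (f"You are debater {side}. Using this passage:\n"
                            f"{passage}\nArgue that '{answer}' is correct, "
                            f"quoting evidence from the passage.")
            argument = query_llm(instructions, transcript)
            transcript += f"[Round {rnd}] {side}: {argument}\n"
    # The non-expert judge sees only the debate, never the passage.
    return query_llm("You are the judge. Reply with 'A' or 'B' only.", transcript)
```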

Updated: 2024-07-25 23:32:21

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2402.06782v4

Decentralized Blockchain-based Robust Multi-agent Multi-armed Bandit

We study a robust multi-agent multi-armed bandit problem, i.e., one in the presence of malicious participants, where multiple participants are distributed on a fully decentralized blockchain, with the possibility of some being malicious. The rewards of arms are homogeneous among the honest participants, following time-invariant stochastic distributions, which are revealed to the participants only when certain conditions are met to ensure that the coordination mechanism is secure enough. The coordination mechanism's objective is to efficiently ensure that the cumulative rewards gained by the honest participants are maximized. To this end, we are the first to incorporate advanced blockchain techniques, as well as novel mechanisms, into such a cooperative decision-making framework to design optimal strategies for honest participants. This framework allows for various malicious behaviors while maintaining security and participant privacy. More specifically, we select a pool of validators who communicate with all participants, design a new consensus mechanism based on digital signatures for these validators, invent a UCB-based strategy that requires less information from participants through secure multi-party computation, and design the chain-participant interaction and an incentive mechanism to encourage participants' participation. Notably, we are the first to prove the theoretical regret of the proposed algorithm and claim its optimality. Unlike existing work that integrates blockchains with learning problems such as federated learning, which mainly focuses on optimality via computational experiments, we demonstrate that the regret of honest participants is upper bounded by $\log{T}$ under certain assumptions. The regret bound is consistent with the multi-agent multi-armed bandit problem both without malicious participants and under purely Byzantine attacks that do not affect the entire system.
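
For reference, a minimal sketch of the single-agent UCB core that such a strategy builds on; this is the standard algorithm whose regret grows on the order of $\log{T}$, as in the bound above. The paper's actual strategy additionally aggregates statistics across honest participants via validators, digital-signature consensus, and secure multi-party computation, none of which is shown here.

```python
import math

def ucb1(pull_arm, n_arms, horizon):
    """Standard UCB1: pull_arm(a) returns a reward in [0, 1]."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                       # play every arm once first
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = pull_arm(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums
```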

Updated: 2024-07-25 23:31:56

Categories: cs.LG,cs.MA

Download: http://arxiv.org/abs/2402.04417v2

Weighted Risk Invariance: Domain Generalization under Invariant Feature Shift

Learning models whose predictions are invariant under multiple environments is a promising approach for out-of-distribution generalization. Such models are trained to extract features $X_{\text{inv}}$ where the conditional distribution $Y \mid X_{\text{inv}}$ of the label given the extracted features does not change across environments. Invariant models are also supposed to generalize to shifts in the marginal distribution $p(X_{\text{inv}})$ of the extracted features $X_{\text{inv}}$, a type of shift we call an $\textit{invariant covariate shift}$. However, we show that proposed methods for learning invariant models underperform under invariant covariate shift, either failing to learn invariant models (even for data generated from simple and well-studied linear-Gaussian models) or having poor finite-sample performance. To alleviate these problems, we propose $\textit{weighted risk invariance}$ (WRI). Our framework is based on imposing invariance of the loss across environments subject to appropriate reweightings of the training examples. We show that WRI provably learns invariant models, i.e. discards spurious correlations, in linear-Gaussian settings. We propose a practical algorithm to implement WRI by learning the density $p(X_{\text{inv}})$ and the model parameters simultaneously, and we demonstrate empirically that WRI outperforms previous invariant learning methods under invariant covariate shift.
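
One illustrative reading of the WRI objective, sketched from the description above (the exact weighting scheme and estimator are the paper's; this notation is an assumption, not a quotation): define the weighted risk of a predictor $f$ in environment $e$ as

$$R^e(f, w) \;=\; \mathbb{E}_{(x,y)\sim p^e}\big[\, w(x)\,\ell(f(x), y)\,\big],$$

and learn $f$ by minimizing the average weighted risk subject to the invariance constraint $R^{e_i}(f, w) = R^{e_j}(f, w)$ for all environment pairs $(e_i, e_j)$. The practical algorithm described above learns the density $p(X_{\text{inv}})$ jointly with the model parameters, which plausibly supplies the reweighting $w$.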

Updated: 2024-07-25 23:27:10

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.18428v1

Diffusion-based subsurface multiphysics monitoring and forecasting

Carbon capture and storage (CCS) plays a crucial role in mitigating greenhouse gas emissions, particularly from industrial outputs. Seismic monitoring can support an accurate and robust monitoring system that ensures the effectiveness of CCS and mitigates associated risks. However, conventional seismic wave equation-based approaches are computationally demanding, which hinders real-time applications. In addition to efficiency, forecasting and uncertainty analysis are not easy to handle using such numerical-simulation-based approaches. To this end, we propose a novel subsurface multiphysics monitoring and forecasting framework utilizing video diffusion models. This approach can generate high-quality representations of CO$_2$ evolution and associated changes in subsurface elastic properties. With reconstruction guidance, forecasting and inversion can be achieved conditioned on historical frames and/or observational data. Meanwhile, due to the generative nature of the approach, we can quantify uncertainty in the prediction. Tests based on the Compass model show that the proposed method successfully captured the inherently complex physical phenomena associated with CO$_2$ monitoring, and it can predict and invert the subsurface elastic properties and CO$_2$ saturation with consistency in their evolution.

Updated: 2024-07-25 23:04:37

Categories: physics.geo-ph,cs.LG

Download: http://arxiv.org/abs/2407.18426v1

CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving

To safely navigate intricate real-world scenarios, autonomous vehicles must be able to adapt to diverse road conditions and anticipate future events. World model (WM) based reinforcement learning (RL) has emerged as a promising approach by learning and predicting the complex dynamics of various environments. Nevertheless, to the best of our knowledge, there does not exist an accessible platform for training and testing such algorithms in sophisticated driving environments. To fill this void, we introduce CarDreamer, the first open-source learning platform designed specifically for developing WM based autonomous driving algorithms. It comprises three key components: 1) World model backbone: CarDreamer has integrated some state-of-the-art WMs, which simplifies the reproduction of RL algorithms. The backbone is decoupled from the rest and communicates using the standard Gym interface, so that users can easily integrate and test their own algorithms. 2) Built-in tasks: CarDreamer offers a comprehensive set of highly configurable driving tasks which are compatible with Gym interfaces and are equipped with empirically optimized reward functions. 3) Task development suite: This suite streamlines the creation of driving tasks, enabling easy definition of traffic flows and vehicle routes, along with automatic collection of multi-modal observation data. A visualization server allows users to trace real-time agent driving videos and performance metrics through a browser. Furthermore, we conduct extensive experiments using built-in tasks to evaluate the performance and potential of WMs in autonomous driving. Thanks to the richness and flexibility of CarDreamer, we also systematically study the impact of observation modality, observability, and sharing of vehicle intentions on AV safety and efficiency. All code and documents are accessible on https://github.com/ucd-dare/CarDreamer.
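
Since the built-in tasks are exposed through the standard Gym interface, usage would follow the familiar loop below. This is a hypothetical sketch: the task id is a placeholder and the classic 4-tuple `step` signature is assumed; consult https://github.com/ucd-dare/CarDreamer for the actual task names and API.

```python
import gym

env = gym.make("SomeCarDreamerTask-v0")   # placeholder task id, not a real one
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()    # replace with a WM-based policy
    obs, reward, done, info = env.step(action)
env.close()
```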

Updated: 2024-07-25 23:02:27

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2405.09111v2

SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

As large language models (LLMs) become increasingly integrated into real-world applications such as code generation and chatbot assistance, extensive efforts have been made to align LLM behavior with human values, including safety. Jailbreak attacks, aiming to provoke unintended and unsafe behaviors from LLMs, remain a leading LLM safety threat. In this paper, we aim to defend LLMs against jailbreak attacks by introducing SafeDecoding, a safety-aware decoding strategy for LLMs to generate helpful and harmless responses to user queries. Our insight in developing SafeDecoding is based on the observation that, even though the probabilities of tokens representing harmful content outweigh those representing harmless responses, safety disclaimers still appear among the top tokens after sorting tokens by probability in descending order. This allows us to mitigate jailbreak attacks by identifying safety disclaimers and amplifying their token probabilities, while simultaneously attenuating the probabilities of token sequences that are aligned with the objectives of jailbreak attacks. We perform extensive experiments on five LLMs using six state-of-the-art jailbreak attacks and four benchmark datasets. Our results show that SafeDecoding significantly reduces the attack success rate and harmfulness of jailbreak attacks without compromising the helpfulness of responses to benign user queries. SafeDecoding outperforms six defense methods.
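
A simplified sketch of the reweighting idea described above, not the paper's exact algorithm: when a safety-disclaimer token already appears among the top candidates, boost it, and damp tokens aligned with the attack objective. The token-id sets, scaling factors, and top-k cutoff are illustrative assumptions.

```python
import math
import torch

def safety_aware_logits(logits, safety_token_ids, attack_token_ids,
                        top_k=40, boost=2.0, damp=0.5):
    """logits: 1-D tensor over the vocabulary for the next token."""
    top = set(torch.topk(logits, top_k).indices.tolist())
    adjusted = logits.clone()
    for tid in safety_token_ids:
        if tid in top:                        # disclaimer visible in top-k
            adjusted[tid] += math.log(boost)  # amplify its probability
    for tid in attack_token_ids:
        adjusted[tid] += math.log(damp)       # damp < 1 attenuates
    return adjusted
```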

Updated: 2024-07-25 22:59:44

Categories: cs.CR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2402.08983v4

Model-driven Heart Rate Estimation and Heart Murmur Detection based on Phonocardiogram

Acoustic signals are crucial for health monitoring, particularly heart sounds which provide essential data like heart rate and detect cardiac anomalies such as murmurs. This study utilizes a publicly available phonocardiogram (PCG) dataset to estimate heart rate using model-driven methods and extends the best-performing model to a multi-task learning (MTL) framework for simultaneous heart rate estimation and murmur detection. Heart rate estimates are derived using a sliding window technique on heart sound snippets, analyzed with a combination of acoustic features (Mel spectrogram, cepstral coefficients, power spectral density, root mean square energy). Our findings indicate that a 2D convolutional neural network (\textbf{\texttt{2dCNN}}) is most effective for heart rate estimation, achieving a mean absolute error (MAE) of 1.312 bpm. We systematically investigate the impact of different feature combinations and find that utilizing all four features yields the best results. The MTL model (\textbf{\texttt{2dCNN-MTL}}) achieves accuracy over 95% in murmur detection, surpassing existing models, while maintaining an MAE of 1.636 bpm in heart rate estimation, satisfying the requirements stated by Association for the Advancement of Medical Instrumentation (AAMI).
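
A sketch of the sliding-window feature extraction described above, using the four named acoustic features; window length, hop, and feature settings are illustrative, not the paper's configuration (MFCCs stand in for the cepstral coefficients).

```python
import librosa
from scipy.signal import welch

def pcg_windows(y, sr, win_s=5.0, hop_s=1.0):
    """Yield heart-sound snippets and their acoustic features."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    for start in range(0, len(y) - win + 1, hop):
        snippet = y[start:start + win]
        mel = librosa.feature.melspectrogram(y=snippet, sr=sr)
        mfcc = librosa.feature.mfcc(y=snippet, sr=sr, n_mfcc=13)
        _, psd = welch(snippet, fs=sr)
        rms = librosa.feature.rms(y=snippet)
        yield snippet, (mel, mfcc, psd, rms)   # stacked as 2dCNN input
```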

Updated: 2024-07-25 22:56:21

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2407.18424v1

Bilingual Adaptation of Monolingual Foundation Models

We present an efficient method for adapting a monolingual Large Language Model (LLM) to another language, addressing challenges of catastrophic forgetting and tokenizer limitations. We focus this study on adapting Llama 2 to Arabic. Our two-stage approach begins with expanding the vocabulary and training only the embeddings matrix, followed by full model continual pre-training on a bilingual corpus. By continually pre-training on a mix of Arabic and English corpora, the model retains its proficiency in English while acquiring capabilities in Arabic. Our approach results in significant improvements in Arabic and slight enhancements in English, demonstrating cost-effective cross-lingual transfer. We perform ablations on embedding initialization techniques, data mix ratios, and learning rates and release a detailed training recipe. To demonstrate the generalizability of this approach, we also adapted Llama 3 8B to Arabic and Llama 2 13B to Hindi.
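
A sketch of stage one of the two-stage recipe, using the Hugging Face Transformers API: expand the vocabulary, then train only the embedding matrices while the rest of the model stays frozen. The token list and model id are placeholders; the actual vocabulary extension and hyperparameters are in the released training recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"          # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.add_tokens(["<new_arabic_token>"])   # placeholder new vocabulary
model.resize_token_embeddings(len(tokenizer))  # grow embedding matrices

for param in model.parameters():               # stage 1: freeze everything...
    param.requires_grad = False
for emb in (model.get_input_embeddings(), model.get_output_embeddings()):
    for param in emb.parameters():             # ...except the embeddings
        param.requires_grad = True
# Stage 2 (not shown): unfreeze all parameters and continue pre-training
# on a mixed Arabic/English corpus.
```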

Updated: 2024-07-25 22:51:39

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.12869v2

HDL-GPT: High-Quality HDL is All You Need

This paper presents Hardware Description Language Generative Pre-trained Transformers (HDL-GPT), a novel approach that leverages the vast repository of open-source Hardware Description Language (HDL) code to train superior quality large code models. The core premise of this paper is the hypothesis that high-quality HDL is all you need to create models with exceptional performance and broad zero-shot generalization abilities. The paper elucidates the methods employed for the curation and augmentation of large corpora from open-source HDL code, transforming highly variable quality data into high-quality data through careful prompting and context maintenance. We demonstrate that the careful selection, filtering, and augmentation of data across HDLs can yield powerful models that surpass current state-of-the-art models. We also explore the impact of different fine-tuning methods on the quality of results. We describe experimental results across a range of fine-tuned SOTA LLMs, substantiating our claims. We demonstrate improvements of 50% to 200% over SOTA HDL models on current benchmarks, on tasks ranging from HDL circuit explanation and code generation to formal and simulation testbench creation, bug triage, and bug fixing. HDL-GPT opens new avenues for the development of advanced model training techniques for circuit design tasks.

Updated: 2024-07-25 22:48:08

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.18423v1

X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention

We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation. Specifically, given a single portrait as appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions along with wide-range head movements. At its core, we leverage the generative prior of a pre-trained diffusion model as the rendering backbone, while achieving fine-grained head pose and expression control with novel controlling signals within the framework of ControlNet. In contrast to conventional coarse explicit controls such as facial landmarks, our motion control module is learned to interpret the dynamics directly from the original driving RGB inputs. The motion accuracy is further enhanced with a patch-based local control module that effectively directs the motion attention to small-scale nuances like eyeball positions. Notably, to mitigate identity leakage from the driving signals, we train our motion control modules with scaling-augmented cross-identity images, ensuring maximized disentanglement from the appearance reference modules. Experimental results demonstrate the universal effectiveness of X-Portrait across a diverse range of facial portraits and expressive driving sequences, and showcase its proficiency in generating captivating portrait animations with consistently maintained identity characteristics.

Updated: 2024-07-25 22:45:41

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2403.15931v4

A Black Swan Hypothesis in Markov Decision Process via Irrationality

Black swan events are statistically rare occurrences that carry extremely high risks. The typical view of black swan events heavily assumes that they originate from unpredictable, time-varying environments; however, the community lacks a comprehensive definition of black swan events. To this end, this paper argues that the standard view is incomplete and claims that high-risk, statistically rare events can also occur in unchanging environments due to human misperception of their value and likelihood, which we call spatial black swan events. We first carefully categorize black swan events, focusing on spatial black swan events, and mathematically formalize the definition of black swan events. We hope these definitions can pave the way for the development of algorithms that prevent such events by rationally correcting human perception.

Updated: 2024-07-25 22:44:39

Categories: cs.AI

Download: http://arxiv.org/abs/2407.18422v1

Self-Directed Synthetic Dialogues and Revisions Technical Report

Synthetic data has become an important tool in the fine-tuning of language models to follow instructions and solve complex problems. Nevertheless, the majority of open data to date often lacks multi-turn examples and has been collected from closed models, limiting progress on advancing open fine-tuning methods. We introduce Self Directed Synthetic Dialogues (SDSD), an experimental dataset consisting of guided conversations of language models talking to themselves. The dataset consists of multi-turn conversations generated with DBRX, Llama 2 70B, and Mistral Large, all instructed to follow a conversation plan generated prior to the conversation. We also explore including principles from Constitutional AI and other related works to create synthetic preference data via revisions to the final conversation turn. We hope this work encourages further exploration in multi-turn data and the use of open models for expanding the impact of synthetic data.

Updated: 2024-07-25 22:42:36

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.18421v1

ExcelFormer: A neural network surpassing GBDTs on tabular data

Data organized in tabular format is ubiquitous in real-world applications, and users often craft tables with biased feature definitions and flexibly set prediction targets of interest to them. Thus, the rapid development of a robust, effective, dataset-versatile, user-friendly tabular prediction approach is highly desirable. While Gradient Boosting Decision Trees (GBDTs) and existing deep neural networks (DNNs) have been extensively utilized by professional users, they present several challenges for casual users, particularly: (i) the dilemma of model selection due to their different dataset preferences, and (ii) the need for heavy hyperparameter searching, without which their performance is deemed inadequate. In this paper, we delve into this question: can we develop a deep learning model that serves as a "sure bet" solution for a wide range of tabular prediction tasks, while also being user-friendly for casual users? We examine three key drawbacks of deep tabular models: (P1) lack of rotational variance property, (P2) large data demand, and (P3) over-smooth solutions. We propose ExcelFormer, addressing these challenges through a semi-permeable attention module that effectively constrains the influence of less informative features to break the DNNs' rotational invariance property (for P1), data augmentation approaches tailored for tabular data (for P2), and an attentive feedforward network to boost the model fitting capability (for P3). These designs collectively make ExcelFormer a "sure bet" solution for diverse tabular datasets. Extensive and stratified experiments conducted on real-world datasets demonstrate that our model outperforms previous approaches across diverse tabular data prediction tasks, and this framework is friendly to casual users, offering ease of use without heavy hyperparameter tuning.

Updated: 2024-07-25 22:27:56

Categories: cs.LG

Download: http://arxiv.org/abs/2301.02819v8

MELTing point: Mobile Evaluation of Language Transformers

Transformers have revolutionized the machine learning landscape, gradually making their way into everyday tasks and equipping our computers with "sparks of intelligence". However, their runtime requirements have prevented them from being broadly deployed on mobile. As personal devices become increasingly powerful and prompt privacy becomes an ever more pressing issue, we explore the current state of mobile execution of Large Language Models (LLMs). To achieve this, we have created our own automation infrastructure, MELT, which supports the headless execution and benchmarking of LLMs on device, supporting different models, devices and frameworks, including Android, iOS and Nvidia Jetson devices. We evaluate popular instruction fine-tuned LLMs and leverage different frameworks to measure their end-to-end and granular performance, tracing their memory and energy requirements along the way. Our analysis is the first systematic study of on-device LLM execution, quantifying performance, energy efficiency and accuracy across various state-of-the-art models and showcases the state of on-device intelligence in the era of hyperscale models. Results highlight the performance heterogeneity across targets and corroborate that LLM inference is largely memory-bound. Quantization drastically reduces memory requirements and renders execution viable, but at a non-negligible accuracy cost. Drawing from its energy footprint and thermal behavior, the continuous execution of LLMs remains elusive, as both factors negatively affect user experience. Last, our experience shows that the ecosystem is still in its infancy, and algorithmic as well as hardware breakthroughs can significantly shift the execution cost. We expect NPU acceleration, and framework-hardware co-design to be the biggest bet towards efficient standalone execution, with the alternative of offloading tailored towards edge deployments.

Updated: 2024-07-25 22:27:38

Categories: cs.LG

Download: http://arxiv.org/abs/2403.12844v4

Symmetries in Overparametrized Neural Networks: A Mean-Field View

We develop a Mean-Field (MF) view of the learning dynamics of overparametrized Artificial Neural Networks (NN) under data whose law is symmetric with respect to the action of a general compact group $G$. For this, we consider a class of generalized shallow NNs given by an ensemble of $N$ multi-layer units, jointly trained using stochastic gradient descent (SGD) and possibly symmetry-leveraging (SL) techniques, such as Data Augmentation (DA), Feature Averaging (FA) or Equivariant Architectures (EA). We introduce the notions of weakly and strongly invariant laws (WI and SI) on the parameter space of each single unit, corresponding, respectively, to $G$-invariant distributions, and to distributions supported on parameters fixed by the group action (which encode EA). This allows us to define symmetric models compatible with taking $N\to\infty$ and give an interpretation of the asymptotic dynamics of DA, FA and EA in terms of Wasserstein Gradient Flows describing their MF limits. When activations respect the group action, we show that, for symmetric data, DA, FA and freely-trained models obey the exact same MF dynamic, which stays in the space of WI laws and minimizes therein the population risk. We also give a counterexample to the general attainability of an optimum over SI laws. Despite this, quite remarkably, we show that the set of SI laws is also preserved by the MF dynamics even when freely trained. This sharply contrasts with the finite-$N$ setting, in which EAs are generally not preserved by unconstrained SGD. We illustrate the validity of our findings as $N$ gets larger in a teacher-student experimental setting, training a student NN to learn from a WI, SI or arbitrary teacher model through various SL schemes. Finally, we deduce a data-driven heuristic to discover the largest subspace of parameters supporting SI distributions for a problem, which could be used for designing EA with minimal generalization error.
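
In the notation of the abstract, the two invariance notions admit a compact formal reading (a sketch consistent with the definitions above; $\mu$ denotes the law of a single unit's parameters and $(g\,\cdot)_{\#}\mu$ its pushforward under the action of $g$):

$$\text{(WI)}\;\; (g\,\cdot)_{\#}\,\mu = \mu \;\;\text{for all } g \in G, \qquad \text{(SI)}\;\; \operatorname{supp}(\mu) \subseteq \{\theta : g\cdot\theta = \theta \text{ for all } g \in G\}.$$

That is, a weakly invariant law is a $G$-invariant distribution over unit parameters, while a strongly invariant law only charges parameters fixed by the group action, which is how equivariant architectures are encoded.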

Updated: 2024-07-25 22:27:27

Categories: stat.ML,cs.LG,math.PR

Download: http://arxiv.org/abs/2405.19995v2

PersonaGym: Evaluating Persona Agents and LLMs

Persona agents, which are LLM agents that act according to an assigned persona, have demonstrated impressive contextual response capabilities across various applications. These persona agents offer significant enhancements across diverse sectors, such as education, healthcare, and entertainment, where model developers can align agent responses to different user requirements, thereby broadening the scope of agent applications. However, evaluating persona agent performance is incredibly challenging due to the complexity of assessing persona adherence in free-form interactions across the various environments that are relevant to each persona agent. We introduce PersonaGym, the first dynamic evaluation framework for assessing persona agents, and PersonaScore, the first automated human-aligned metric grounded in decision theory for comprehensive large-scale evaluation of persona agents. Our evaluation of 6 open and closed-source LLMs, using a benchmark encompassing 200 personas and 10,000 questions, reveals significant opportunities for advancement in persona agent capabilities across state-of-the-art models. For example, Claude 3.5 Sonnet has only a 2.97% relative improvement in PersonaScore over GPT 3.5, despite being a much more advanced model. Importantly, we find that increased model size and complexity do not necessarily imply enhanced persona agent capabilities, thereby highlighting the pressing need for algorithmic and architectural invention towards faithful and performant persona agents.

Updated: 2024-07-25 22:24:45

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.18416v1

Adversarial Robust Decision Transformer: Enhancing Robustness of RvS via Minimax Returns-to-go

Decision Transformer (DT), as one of the representative Reinforcement Learning via Supervised Learning (RvS) methods, has achieved strong performance in offline learning tasks by leveraging the powerful Transformer architecture for sequential decision-making. However, in adversarial environments, these methods can be non-robust, since the return depends on the strategies of both the decision-maker and the adversary. Training a probabilistic model conditioned on observed return to predict action can fail to generalize, as the trajectories that achieve a return in the dataset might have done so because they faced a weak and suboptimal adversary in the behavior policy. To address this, we propose a worst-case-aware RvS algorithm, the Adversarial Robust Decision Transformer (ARDT), which learns and conditions the policy on in-sample minimax returns-to-go. ARDT aligns the target return with the worst-case return learned through minimax expectile regression, thereby enhancing robustness against powerful test-time adversaries. In experiments conducted on sequential games with full data coverage, ARDT can generate a maximin (Nash equilibrium) strategy, the solution with the largest adversarial robustness. In large-scale sequential games and continuous adversarial RL environments with partial data coverage, ARDT demonstrates significantly superior robustness against powerful test-time adversaries and attains higher worst-case returns compared to contemporary DT methods.
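
The named building block is expectile regression; below is a minimal sketch of its asymmetric squared loss. With $\tau < 0.5$, the fit is pulled toward low, pessimistic return estimates, which is the "worst-case" direction; the full minimax training loop over decision-maker and adversary steps is not shown, and the $\tau$ value is illustrative.

```python
import torch

def expectile_loss(pred, target, tau=0.1):
    """Asymmetric squared loss; tau < 0.5 fits a low (pessimistic) expectile."""
    diff = target - pred
    weight = torch.where(diff < 0,
                         torch.full_like(diff, 1.0 - tau),  # punish over-estimates
                         torch.full_like(diff, tau))
    return (weight * diff.pow(2)).mean()
```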

Updated: 2024-07-25 22:12:47

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.18414v1

Simulation of Neural Responses to Classical Music Using Organoid Intelligence Methods

Music is a complex auditory stimulus capable of eliciting significant changes in brain activity, influencing cognitive processes such as memory, attention, and emotional regulation. However, the underlying mechanisms of music-induced cognitive processes remain largely unknown. Organoid intelligence and deep learning models show promise for simulating and analyzing these neural responses to classical music, an area that remains largely unexplored in computational neuroscience. Hence, we present the PyOrganoid library, an innovative tool that facilitates the simulation of organoid learning models, integrating sophisticated machine learning techniques with biologically inspired organoid simulations. Our study features the development of the Pianoid model, a "deep organoid learning" model that utilizes a Bidirectional LSTM network to predict EEG responses based on audio features from classical music recordings. This model demonstrates the feasibility of using computational methods to replicate complex neural processes, providing valuable insights into music perception and cognition. Our findings likewise emphasize the utility of synthetic models in neuroscience research and highlight the PyOrganoid library's potential as a versatile tool for advancing studies in neuroscience and artificial intelligence.

Updated: 2024-07-25 22:11:30

Categories: cs.NE,cs.AI,cs.LG,cs.SD,eess.AS,I.2; I.6; J.3; J.4; J.5

Download: http://arxiv.org/abs/2407.18413v1

Large Language Model Integrated Healthcare Cyber-Physical Systems Architecture

Cyber-physical systems have become an essential part of the modern healthcare industry. Healthcare cyber-physical systems (HCPS) combine physical and cyber components to improve healthcare delivery. While HCPS has many advantages, it also has some drawbacks, such as a lengthy data entry process, a lack of real-time processing, and limited real-time patient visualization. To overcome these issues, this paper presents an innovative approach that integrates large language models (LLMs) to enhance the efficiency of the healthcare system. By incorporating LLMs at various layers, HCPS can leverage advanced AI capabilities to improve patient outcomes, advance data processing, and enhance decision-making.

Updated: 2024-07-25 21:42:10

Categories: cs.LG

Download: http://arxiv.org/abs/2407.18407v1

The seismic purifier: An unsupervised approach to seismic signal detection via representation learning

In this paper, we develop an unsupervised learning approach to earthquake detection. We train a specific class of deep auto-encoders that learn to reproduce the input waveforms after a data-compressive bottleneck, and then use a simple triggering algorithm at the bottleneck to label waveforms as noise or signal. Our approach is motivated by the intuition that efficient compression of data should represent signals differently from noise, and is facilitated by a time-axis-preserving approach to auto-encoding and intuitively-motivated choices on the architecture and triggering. We demonstrate that the detection performance of the unsupervised approach is comparable to, and in some cases better than, some of the state-of-the-art supervised methods. Moreover, it has strong \emph{cross-dataset generalization}. By experimenting with various modifications, we demonstrate that the detection performance is insensitive to various technical choices made in the algorithm. Our approach has the potential to be useful for other signal detection problems with time series data.
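
A minimal sketch of the idea described above: a time-axis-preserving convolutional auto-encoder with a narrow channel bottleneck, plus a simple threshold trigger on the bottleneck activations. Layer sizes, kernel widths, and the trigger statistic are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SeismicAE(nn.Module):
    def __init__(self, ch_in=3, ch_hidden=16, ch_bottleneck=1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(ch_in, ch_hidden, 7, padding=3), nn.ReLU(),
            nn.Conv1d(ch_hidden, ch_bottleneck, 7, padding=3))
        self.decoder = nn.Sequential(
            nn.Conv1d(ch_bottleneck, ch_hidden, 7, padding=3), nn.ReLU(),
            nn.Conv1d(ch_hidden, ch_in, 7, padding=3))

    def forward(self, x):          # x: (batch, channels, time)
        z = self.encoder(x)        # bottleneck keeps the time axis intact
        return self.decoder(z), z

def trigger(z, threshold):
    """Label a window as signal when bottleneck energy exceeds a threshold."""
    return z.pow(2).mean(dim=(1, 2)) > threshold
```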

Updated: 2024-07-25 21:33:54

Categories: cs.LG,physics.geo-ph,stat.ML

Download: http://arxiv.org/abs/2407.18402v1

A Review of Large Language Models and Autonomous Agents in Chemistry

Large language models (LLMs) have emerged as powerful tools in chemistry, significantly impacting molecule design, property prediction, and synthesis optimization. This review highlights LLM capabilities in these domains and their potential to accelerate scientific discovery through automation. We also review LLM-based autonomous agents: LLMs with a broader set of tools to interact with their surrounding environment. These agents perform diverse tasks such as paper scraping, interfacing with automated laboratories, and synthesis planning. As agents are an emerging topic, we extend the scope of our review beyond chemistry and discuss agents across other scientific domains. This review covers the recent history, current capabilities, and design of LLMs and autonomous agents, addressing specific challenges, opportunities, and future directions in chemistry. Key challenges include data quality and integration, model interpretability, and the need for standard benchmarks, while future directions point towards more sophisticated multi-modal agents and enhanced collaboration between agents and experimental methods. Due to the quick pace of this field, a repository has been built to keep track of the latest studies: https://github.com/ur-whitelab/LLMs-in-science.

Updated: 2024-07-25 21:23:15

Categories: cs.LG,cs.AI,cs.CL,physics.chem-ph

Download: http://arxiv.org/abs/2407.01603v2

Gaussian Process Kolmogorov-Arnold Networks

In this paper, we introduce a probabilistic extension to Kolmogorov-Arnold Networks (KANs) by incorporating Gaussian Processes (GPs) as non-linear neurons, which we refer to as GP-KAN. A fully analytical approach to handling the output distribution of one GP as an input to another GP is achieved by considering the function inner product of a GP function sample with the input distribution. These GP neurons exhibit robust non-linear modelling capabilities while using few parameters, and can be easily and fully integrated into a feed-forward network structure. They provide inherent uncertainty estimates for the model prediction and can be trained directly on the log-likelihood objective function, without needing variational lower bounds or approximations. In the context of MNIST classification, a GP-KAN model with 80 thousand parameters achieved 98.5% prediction accuracy, compared to current state-of-the-art models with 1.5 million parameters.
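
One way to read the "function inner product" construction above (a sketch consistent with the abstract, not necessarily the paper's exact propagation rule): if a neuron is $f \sim \mathcal{GP}(m, k)$ and its input has density $p$, the linear functional

$$z \;=\; \int f(x)\, p(x)\, dx$$

is itself Gaussian with

$$\mathbb{E}[z] = \int m(x)\, p(x)\, dx, \qquad \operatorname{Var}[z] = \iint k(x, x')\, p(x)\, p(x')\, dx\, dx',$$

since linear functionals of a GP are Gaussian. This is what allows a Gaussian output distribution to be passed analytically into the next GP neuron without sampling or variational approximations.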

Updated: 2024-07-25 21:09:20

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2407.18397v1

SCALE: Self-regulated Clustered federAted LEarning in a Homogeneous Environment

Federated Learning (FL) has emerged as a transformative approach for enabling distributed machine learning while preserving user privacy, yet it faces challenges like communication inefficiencies and reliance on centralized infrastructures, leading to increased latency and costs. This paper presents a novel FL methodology that overcomes these limitations by eliminating the dependency on edge servers, employing a server-assisted Proximity Evaluation for dynamic cluster formation based on data similarity, performance indices, and geographical proximity. Our integrated approach enhances operational efficiency and scalability through a Hybrid Decentralized Aggregation Protocol, which merges local model training with peer-to-peer weight exchange and a centralized final aggregation managed by a dynamically elected driver node, significantly curtailing global communication overhead. Additionally, the methodology includes Decentralized Driver Selection, Check-pointing to reduce network traffic, and a Health Status Verification Mechanism for system robustness. Validated using the breast cancer dataset, our architecture not only demonstrates a nearly tenfold reduction in communication overhead but also shows remarkable improvements in reducing training latency and energy consumption while maintaining high learning performance, offering a scalable, efficient, and privacy-preserving solution for the future of federated learning ecosystems.

Updated: 2024-07-25 20:42:16

Categories: cs.DC,cs.AI,cs.ET,cs.LG,cs.PF

Download: http://arxiv.org/abs/2407.18387v1

Mathematical theory of deep learning

This book provides an introduction to the mathematical analysis of deep learning. It covers fundamental results in approximation theory, optimization theory, and statistical learning theory, which are the three main pillars of deep neural network theory. Serving as a guide for students and researchers in mathematics and related fields, the book aims to equip readers with foundational knowledge on the topic. It prioritizes simplicity over generality, and presents rigorous yet accessible results to help build an understanding of the essential mathematical concepts underpinning deep learning.

Updated: 2024-07-25 20:37:12

Categories: cs.LG,math.HO

Download: http://arxiv.org/abs/2407.18384v1

Effect of Duration and Delay on the Identifiability of VR Motion

Social virtual reality is an emerging medium of communication. In this medium, a user's avatar (virtual representation) is controlled by the tracked motion of the user's headset and hand controllers. This tracked motion is a rich data stream that can leak characteristics of the user or can be effectively matched to previously-identified data to identify a user. To better understand the boundaries of motion data identifiability, we investigate how varying training data duration and train-test delay affect the accuracy with which a machine learning model can correctly classify user motion in a supervised learning task simulating re-identification. The dataset we use has a unique combination of a large number of participants, a long duration per session, a large number of sessions, and a long time span over which sessions were conducted. We find that training data duration and train-test delay affect identifiability; that minimal train-test delay leads to very high accuracy; and that train-test delay should be controlled in future experiments.

Updated: 2024-07-25 20:30:46

Categories: cs.CR,cs.HC

Download: http://arxiv.org/abs/2407.18380v1

Accelerating Electron Dynamics Simulations through Machine Learned Time Propagators

Time-dependent density functional theory (TDDFT) is a widely used method to investigate electron dynamics under various external perturbations such as laser fields. In this work, we present a novel approach to accelerate real-time TDDFT-based electron dynamics simulations using autoregressive neural operators as time propagators for the electron density. By leveraging physics-informed constraints and high-resolution training data, our model achieves superior accuracy and computational speed compared to traditional numerical solvers. We demonstrate the effectiveness of our model on a class of one-dimensional diatomic molecules. This method has the potential to enable real-time, on-the-fly modeling of laser-irradiated molecules and materials with varying experimental parameters.
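
The autoregressive use of the learned propagator amounts to the simple rollout below; `propagator`, the history length, and the framing are illustrative placeholders rather than the paper's interface.

```python
def rollout(propagator, history, n_steps):
    """Iterate a learned one-step operator over the last k density frames."""
    frames = list(history)           # initial k electron-density frames
    k = len(frames)
    for _ in range(n_steps):
        frames.append(propagator(frames[-k:]))
    return frames
```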

Updated: 2024-07-25 20:30:43

Categories: cond-mat.mtrl-sci,cs.LG,physics.comp-ph

Download: http://arxiv.org/abs/2407.09628v2

Effect of Data Degradation on Motion Re-Identification

The use of virtual and augmented reality devices is increasing, but these sensor-rich devices pose risks to privacy. The ability to track a user's motion and infer the identity or characteristics of the user poses a privacy risk that has received significant attention. Existing deep-network-based defenses against this risk, however, require significant amounts of training data and have not yet been shown to generalize beyond specific applications. In this work, we study the effect of signal degradation on identifiability, specifically through added noise, reduced framerate, reduced precision, and reduced dimensionality of the data. Our experiment shows that state-of-the-art identification attacks still achieve near-perfect accuracy for each of these degradations. This negative result demonstrates the difficulty of anonymizing this motion data and gives some justification to the existing data- and compute-intensive deep-network based methods.
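
The four degradations studied above are simple transforms; a sketch on motion data of shape (frames, features), with illustrative parameter values:

```python
import numpy as np

def add_noise(x, sigma=0.01):
    return x + np.random.normal(0.0, sigma, x.shape)

def reduce_framerate(x, keep_every=2):       # e.g. 60 Hz -> 30 Hz
    return x[::keep_every]

def reduce_precision(x, decimals=2):         # coarser quantization
    return np.round(x, decimals)

def reduce_dimensionality(x, keep_cols):     # drop tracked channels
    return x[:, keep_cols]
```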

Updated: 2024-07-25 20:23:32

Categories: cs.CR,cs.HC

Download: http://arxiv.org/abs/2407.18378v1

What Matters in Range View 3D Object Detection

Lidar-based perception pipelines rely on 3D object detection models to interpret complex scenes. While multiple representations for lidar exist, the range-view is enticing since it losslessly encodes the entire lidar sensor output. In this work, we achieve state-of-the-art amongst range-view 3D object detection models without using multiple techniques proposed in past range-view literature. We explore range-view 3D object detection across two modern datasets with substantially different properties: Argoverse 2 and Waymo Open. Our investigation reveals key insights: (1) input feature dimensionality significantly influences the overall performance, (2) surprisingly, employing a classification loss grounded in 3D spatial proximity works as well or better compared to more elaborate IoU-based losses, and (3) addressing non-uniform lidar density via a straightforward range subsampling technique outperforms existing multi-resolution, range-conditioned networks. Our experiments reveal that techniques proposed in recent range-view literature are not needed to achieve state-of-the-art performance. Combining the above findings, we establish a new state-of-the-art model for range-view 3D object detection -- improving AP by 2.2% on the Waymo Open dataset while maintaining a runtime of 10 Hz. We establish the first range-view model on the Argoverse 2 dataset and outperform strong voxel-based baselines. All models are multi-class and open-source. Code is available at https://github.com/benjaminrwilson/range-view-3d-detection.

Updated: 2024-07-25 20:20:03

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.16789v2

A Survey on Reinforcement Learning in Aviation Applications

Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to greatly improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential decision-making problems. Some of them are offline planning problems, while others need to be solved online and are safety-critical. In this survey paper, we first describe standard RL formulations and solutions. Then we survey the landscape of existing RL-based applications in aviation. Finally, we summarize the paper, identify the technical gaps, and suggest future directions for RL research in aviation.

Updated: 2024-07-25 20:19:22

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2211.02147v3

Separable DeepONet: Breaking the Curse of Dimensionality in Physics-Informed Machine Learning

The deep operator network (DeepONet) is a popular neural operator architecture that has shown promise in solving partial differential equations (PDEs) by using deep neural networks to map between infinite-dimensional function spaces. In the absence of labeled datasets, we utilize the PDE residual loss to learn the physical system, an approach known as physics-informed DeepONet. This method faces significant computational challenges, primarily due to the curse of dimensionality, as the computational cost increases exponentially with finer discretization. In this paper, we introduce the Separable DeepONet framework to address these challenges and improve scalability for high-dimensional PDEs. Our approach involves a factorization technique where sub-networks handle individual one-dimensional coordinates, thereby reducing the number of forward passes and the size of the Jacobian matrix. By using forward-mode automatic differentiation, we further optimize the computational cost related to the Jacobian matrix. As a result, our modifications lead to a linear scaling of computational cost with discretization density, making Separable DeepONet suitable for high-dimensional PDEs. We validate the effectiveness of the separable architecture through three benchmark PDE models: the viscous Burgers equation, Biot's consolidation theory, and a parametrized heat equation. In all cases, our proposed framework achieves comparable or improved accuracy while significantly reducing computational time compared to conventional DeepONet. These results demonstrate the potential of Separable DeepONet in efficiently solving complex, high-dimensional PDEs, advancing the field of physics-informed machine learning.
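
A sketch of the coordinate factorization described above: one small sub-network per one-dimensional coordinate produces rank-$r$ features, combined multiplicatively across axes, so a $d$-dimensional grid of $N^d$ points needs only $d$ forward passes over $N$ points each. Sizes are illustrative, and the DeepONet branch network is omitted.

```python
import torch
import torch.nn as nn

class SeparableTrunk(nn.Module):
    def __init__(self, dim=3, rank=32, hidden=64):
        super().__init__()
        self.nets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                          nn.Linear(hidden, rank))
            for _ in range(dim))

    def forward(self, axes):   # axes: list of d coordinate tensors, each (N_i, 1)
        feats = [net(a) for net, a in zip(self.nets, axes)]   # each (N_i, rank)
        out = feats[0]
        for f in feats[1:]:    # outer product over axes, kept per rank
            out = torch.einsum('...r,nr->...nr', out, f)
        return out.sum(-1)     # trunk output on the full (N_1, ..., N_d) grid
```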

Updated: 2024-07-25 20:16:50

标题: 可分离的DeepONet:在物理知识驱动的机器学习中打破维度诅咒

摘要: 深度操作网络(DeepONet)是一种流行的神经算子架构,通过使用深度神经网络在无限维函数空间之间映射,显示出在解决偏微分方程(PDEs)方面的潜力。在没有标记数据集的情况下,我们利用PDE残差损失来学习物理系统,这一方法被称为物理信息DeepONet。这种方法面临着重要的计算挑战,主要是由于维度灾难,随着细分的增加,计算成本呈指数增长。在本文中,我们引入了可分离的DeepONet框架来解决这些挑战,并提高高维PDE的可伸缩性。我们的方法涉及一种分解技术,其中子网络处理单个一维坐标,从而减少正向传递的次数和雅可比矩阵的大小。通过使用前向模式自动微分,我们进一步优化与雅可比矩阵相关的计算成本。因此,我们的修改导致计算成本与离散密度呈线性比例,使得可分离的DeepONet适用于高维PDE。我们通过三个基准PDE模型验证了可分离架构的有效性:粘性Burgers方程,Biot的固结理论和参数化的热方程。在所有情况下,我们提出的框架在显著减少计算时间的同时实现了可比或更高的准确性,与传统DeepONet相比。这些结果展示了可分离的DeepONet在有效解决复杂、高维PDE方面的潜力,推动了物理信息机器学习领域的发展。

更新时间: 2024-07-25 20:16:50

领域: cs.LG

下载: http://arxiv.org/abs/2407.15887v2

How Do Students Interact with an LLM-powered Virtual Teaching Assistant in Different Educational Settings?

Jill Watson, a virtual teaching assistant powered by LLMs, answers student questions and engages them in extended conversations on courseware provided by the instructors. In this paper, we analyze student interactions with Jill across multiple courses and colleges, focusing on the types and complexity of student questions based on Bloom's Revised Taxonomy and tool usage patterns. We find that, by supporting a wide range of cognitive demands, Jill encourages students to engage in sophisticated, higher-order cognitive questions. However, the frequency of usage varies significantly across deployments, and the types of questions asked depend on course-specific contexts. These findings pave the way for future work on AI-driven educational tools tailored to individual learning styles and course structure, potentially enhancing both the teaching and learning experience in classrooms.

Updated: 2024-07-25 20:16:11

标题: 学生在不同教育环境中如何与由LLM提供动力的虚拟教学助手互动?

摘要: Jill Watson,一位由LLMs提供支持的虚拟助教,回答学生的问题并与他们在教师提供的课件上进行深入的交流。在本文中,我们分析了学生与Jill在多个课程和学院的互动情况,重点关注基于布鲁姆修订后的分类法和工具使用模式的学生问题的类型和复杂性。我们发现,通过支持广泛的认知要求,Jill鼓励学生提出复杂的、高阶认知的问题。然而,使用频率在不同部署中明显变化,并且所提出的问题类型取决于具体课程的背景。这些发现为未来针对个体学习风格和课程结构定制的人工智能驱动的教育工具的研究铺平了道路,潜在地增强了课堂教学和学习体验。

更新时间: 2024-07-25 20:16:11

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2407.17429v2

Physics Informed Kolmogorov-Arnold Neural Networks for Dynamical Analysis via Efficient-KAN and WAV-KAN

Physics-informed neural networks have proven to be a powerful tool for solving differential equations, leveraging the principles of physics to inform the learning process. However, traditional deep neural networks often face challenges in achieving high accuracy without incurring significant computational costs. In this work, we implement the Physics-Informed Kolmogorov-Arnold Neural Networks (PIKAN) through efficient-KAN and WAV-KAN, which utilize the Kolmogorov-Arnold representation theorem. PIKAN demonstrates superior performance compared to conventional deep neural networks, achieving the same level of accuracy with fewer layers and reduced computational overhead. We explore both B-spline and wavelet-based implementations of PIKAN and benchmark their performance across various ordinary and partial differential equations using unsupervised (data-free) and supervised (data-driven) techniques. For certain differential equations, the data-free approach suffices to find accurate solutions, while in more complex scenarios, the data-driven method enhances the PIKAN's ability to converge to the correct solution. We validate our results against numerical solutions and achieve $99 \%$ accuracy in most scenarios.
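
The data-free (physics-informed) training described here boils down to minimizing a differential-equation residual at sampled collocation points. A self-contained sketch on the toy ODE u'(x) = -u(x) with u(0) = 1, where an ordinary MLP stands in for the KAN layers since the efficient-KAN/WAV-KAN internals are not reproduced:

    import torch
    import torch.nn as nn

    # Any differentiable x -> u(x) map suffices to illustrate the residual
    # loss itself; a KAN block would replace this MLP.
    model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(2000):
        x = torch.rand(128, 1, requires_grad=True)   # collocation points in [0, 1]
        u = model(x)
        du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
        residual = du + u                            # enforce u'(x) + u(x) = 0
        bc = (model(torch.zeros(1, 1)) - 1.0).pow(2).mean()  # boundary term u(0) = 1
        loss = (residual ** 2).mean() + bc
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(float(model(torch.tensor([[1.0]]))))       # about exp(-1), i.e. ~0.37

The data-driven variant in the abstract would simply add a supervised term comparing model outputs against labeled solution values at known points.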

Updated: 2024-07-25 20:14:58

标题: 物理信息科尔莫戈洛夫-阿诺德神经网络在动力学分析中的应用:通过Efficient-KAN和WAV-KAN

摘要: 物理信息神经网络已被证明是解决微分方程问题的强大工具,利用物理原理来指导学习过程。然而,传统的深度神经网络通常面临在不增加显著计算成本的情况下达到高精度的挑战。在这项工作中,我们通过efficient-KAN和WAV-KAN实现了基于科尔莫戈洛夫-阿诺德表示定理的物理信息科尔莫戈洛夫-阿诺德神经网络(PIKAN)。PIKAN相比传统深度神经网络表现出更优越的性能,能够在更少的层数和更低的计算开销下实现相同水平的精度。我们探索了基于B样条和小波的PIKAN实现,并使用无监督(无数据)和监督(数据驱动)技术在各种常微分方程和偏微分方程上对它们的性能进行了基准测试。对于某些微分方程,无数据方法足以找到准确的解决方案,而在更复杂的情况下,数据驱动方法增强了PIKAN收敛到正确解决方案的能力。我们将结果与数值解进行验证,并在大多数情况下实现了99%的准确度。

更新时间: 2024-07-25 20:14:58

领域: cs.LG

下载: http://arxiv.org/abs/2407.18373v1

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

We present a principled approach to provide LLM-based evaluation with a rigorous guarantee of human agreement. We first propose that a reliable evaluation method should not uncritically rely on model preferences for pairwise evaluation, but rather assess the confidence of judge models and selectively decide when to trust its judgement. We then show that under this selective evaluation framework, human agreement can be provably guaranteed -- such that the model evaluation aligns with that of humans to a user-specified agreement level. As part of our framework, we also introduce Simulated Annotators, a novel confidence estimation method that significantly improves judge calibration and thus enables high coverage of evaluated instances. Finally, we propose Cascaded Selective Evaluation, where we use cheaper models as initial judges and escalate to stronger models only when necessary -- again, while still providing a provable guarantee of human agreement. Experimental results show that Cascaded Selective Evaluation guarantees strong alignment with humans, far beyond what LLM judges could achieve without selective evaluation. For example, on a subset of Chatbot Arena where GPT-4 almost never achieves 80% human agreement, our method, even while employing substantially cost-effective models such as Mistral-7B, guarantees over 80% human agreement with almost 80% test coverage.
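
The cascade logic itself is compact. A plain-Python sketch, where each judge function and its confidence estimate (for instance, via the paper's Simulated Annotators) is an assumed interface, and the thresholds stand in for values calibrated to the target human-agreement level:

    def cascaded_selective_eval(pair, judges, thresholds):
        # Walk judge stages from cheapest to strongest; trust a verdict only
        # when its confidence clears the stage threshold, else escalate.
        for judge_fn, tau in zip(judges, thresholds):
            verdict, confidence = judge_fn(pair)
            if confidence >= tau:
                return verdict, judge_fn.__name__
        return None, "abstain"   # no judge confident enough: defer to humans

    def mistral_judge(pair):     # hypothetical cheap judge
        return "A", 0.62

    def gpt4_judge(pair):        # hypothetical strong judge
        return "A", 0.91

    print(cascaded_selective_eval(("response_A", "response_B"),
                                  [mistral_judge, gpt4_judge],
                                  thresholds=[0.80, 0.85]))
    # ('A', 'gpt4_judge'): the cheap judge was not confident enough, so we escalated

The provable-guarantee part lives in how the thresholds are chosen, which this sketch leaves abstract.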

Updated: 2024-07-25 20:04:59

标题: 信任还是升级:为人类一致性提供可证明保证的LLM评判者

摘要: 我们提出了一种有原则的方法,为基于LLM的评估提供严格的人类一致性保证。我们首先提出,可靠的评估方法不应盲目依赖模型偏好进行成对评估,而应评估评判模型的信心,并有选择地决定何时信任其判断。然后我们展示,在这种选择性评估框架下,人类一致性可以被可证明地保证,即模型评估与人类评估达到用户指定的一致水平。作为我们框架的一部分,我们还引入了模拟标注者(Simulated Annotators),一种新颖的信心估计方法,显著改善了评判校准,从而实现了对评估实例的高覆盖率。最后,我们提出级联选择性评估,即使用更便宜的模型作为初始评判者,并仅在必要时升级到更强大的模型,同时仍提供可证明的人类一致性保证。实验结果表明,级联选择性评估与人类保持高度一致,远超没有选择性评估的LLM评判者所能达到的水平。例如,在GPT-4几乎从未达到80%人类一致性的Chatbot Arena子集上,我们的方法即使使用Mistral-7B等成本低得多的模型,也能在接近80%的测试覆盖率下保证超过80%的人类一致性。

更新时间: 2024-07-25 20:04:59

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2407.18370v1

Robust Claim Verification Through Fact Detection

Claim verification can be a challenging task. In this paper, we present a method to enhance the robustness and reasoning capabilities of automated claim verification through the extraction of short facts from evidence. Our novel approach, FactDetect, leverages Large Language Models (LLMs) to generate concise factual statements from evidence and label these facts based on their semantic relevance to the claim and evidence. The generated facts are then combined with the claim and evidence. To train a lightweight supervised model, we incorporate a fact-detection task into the claim verification process as a multitasking approach to improve both performance and explainability. We also show that augmenting FactDetect in the claim verification prompt enhances performance in zero-shot claim verification using LLMs. Our method demonstrates competitive results in the supervised claim verification model by 15% on the F1 score when evaluated for challenging scientific claim verification datasets. We also demonstrate that FactDetect can be augmented with claim and evidence for zero-shot prompting (AugFactDetect) in LLMs for verdict prediction. We show that AugFactDetect outperforms the baseline with statistical significance on three challenging scientific claim verification datasets with an average of 17.3% performance gain compared to the best performing baselines.
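
Operationally the pipeline is promptable end to end. A sketch with a hypothetical llm(prompt) -> str helper standing in for any chat-completion call; the exact prompts and the supervised multitask model are not reproduced here.

    def fact_detect(claim, evidence, llm):
        # Step 1: distill the evidence into short factual statements.
        facts = llm(f"List short factual statements found in: {evidence}").splitlines()
        # Step 2: keep only facts the LLM labels as relevant to the claim.
        relevant = [f for f in facts
                    if llm(f"Is '{f}' relevant to '{claim}'? yes/no")
                    .strip().lower().startswith("yes")]
        return relevant

    def aug_fact_detect_verdict(claim, evidence, llm):
        # Zero-shot verdict prompt augmented with the detected facts.
        facts = fact_detect(claim, evidence, llm)
        prompt = (f"Claim: {claim}\nEvidence: {evidence}\n"
                  f"Key facts: {'; '.join(facts)}\n"
                  "Verdict (SUPPORTS / REFUTES / NOT ENOUGH INFO):")
        return llm(prompt)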

Updated: 2024-07-25 20:03:43

标题: 通过事实检测实现稳健的声明验证

摘要: 声明验证可能是一项具有挑战性的任务。在本文中,我们提出了一种通过从证据中提取简短事实来增强自动声明验证的鲁棒性和推理能力的方法。我们的新方法FactDetect利用大型语言模型(LLMs)从证据中生成简明的事实陈述,并根据它们与声明和证据的语义相关性对这些事实进行标记。然后将生成的事实与声明和证据结合起来。为了训练一个轻量级的监督模型,我们将事实检测任务整合到声明验证过程中作为一种多任务方法,以改善性能和可解释性。我们还展示了在使用LLMs进行零样本声明验证时,在声明验证提示中加入FactDetect可以提高性能。我们的方法在对具有挑战性的科学声明验证数据集进行评估时,在监督声明验证模型上取得了竞争性的结果,F1分数提高了15%。我们还展示了FactDetect可以与声明和证据结合用于零样本提示(AugFactDetect),在LLMs中进行裁决预测。我们展示了AugFactDetect在三个具有挑战性的科学声明验证数据集上优于基线,与最佳基线相比平均提高了17.3%的性能。

更新时间: 2024-07-25 20:03:43

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.18367v1

FADAS: Towards Federated Adaptive Asynchronous Optimization

Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning. While the SGD-based FL algorithms have demonstrated considerable success in the past, there is a growing trend towards adopting adaptive federated optimization methods, particularly for training large-scale models. However, the conventional synchronous aggregation design poses a significant challenge to the practical deployment of those adaptive federated optimization methods, particularly in the presence of straggler clients. To fill this research gap, this paper introduces federated adaptive asynchronous optimization, named FADAS, a novel method that incorporates asynchronous updates into adaptive federated optimization with provable guarantees. To further enhance the efficiency and resilience of our proposed method in scenarios with significant asynchronous delays, we also extend FADAS with a delay-adaptive learning adjustment strategy. We rigorously establish the convergence rate of the proposed algorithms and empirical results demonstrate the superior performance of FADAS over other asynchronous FL baselines.
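
A server-side sketch of what asynchronous adaptive aggregation could look like, assuming an AMSGrad-style update and a simple staleness-damped step size; this follows our reading of the abstract, and the paper's exact moment updates and delay-adaptive rule may differ.

    import numpy as np

    class FadasLikeServer:
        # Clients send pseudo-gradients computed from possibly stale copies
        # of the global model; the server folds them into adaptive moments.
        def __init__(self, dim, lr=0.01, b1=0.9, b2=0.99, eps=1e-8, max_delay=10):
            self.w = np.zeros(dim)
            self.m = np.zeros(dim)
            self.v = np.zeros(dim)
            self.vhat = np.zeros(dim)
            self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
            self.max_delay = max_delay

        def apply(self, pseudo_grad, staleness):
            self.m = self.b1 * self.m + (1 - self.b1) * pseudo_grad
            self.v = self.b2 * self.v + (1 - self.b2) * pseudo_grad ** 2
            self.vhat = np.maximum(self.vhat, self.v)        # AMSGrad max trick
            # Delay-adaptive step: damp updates computed from stale models.
            step = self.lr / np.sqrt(1.0 + staleness / self.max_delay)
            self.w -= step * self.m / (np.sqrt(self.vhat) + self.eps)
            return self.w

    server = FadasLikeServer(dim=4)
    server.apply(np.array([0.1, -0.2, 0.0, 0.3]), staleness=0)
    server.apply(np.array([0.2, -0.1, 0.1, 0.2]), staleness=5)  # straggler arrives late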

Updated: 2024-07-25 20:02:57

标题: FADAS:走向联邦式自适应异步优化

摘要: 联邦学习(FL)已经成为隐私保护机器学习的广泛采用的训练范式。虽然基于SGD的FL算法在过去已经取得了相当大的成功,但目前越来越多地采用自适应的联邦优化方法,特别是用于训练大规模模型。然而,传统的同步聚合设计对于那些自适应联邦优化方法的实际部署构成了重大挑战,特别是在存在慢速客户端的情况下。为了填补这一研究空白,本文介绍了联邦自适应异步优化,命名为FADAS,这是一种将异步更新与自适应联邦优化相结合的新方法,并具有可证明的保证。为了进一步提高我们所提出方法在存在显著异步延迟情况下的效率和韧性,我们还扩展了FADAS以包括一种延迟自适应学习调整策略。我们严格建立了所提算法的收敛速率,并实证结果显示了FADAS相对于其他异步FL基线的优越性能。

更新时间: 2024-07-25 20:02:57

领域: cs.LG,cs.AI,cs.DC,math.OC

下载: http://arxiv.org/abs/2407.18365v1

Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective

Image-text retrieval (ITR), an important task in information retrieval (IR), is driven by pretrained vision-language models (VLMs) that consistently achieve state-of-the-art performance. However, a significant challenge lies in the brittleness of existing ITR benchmarks. In standard datasets for the task, captions often provide broad summaries of scenes, neglecting detailed information about specific concepts. Additionally, the current evaluation setup assumes simplistic binary matches between images and texts and focuses on intra-modality rather than cross-modal relationships, which can lead to misinterpretations of model performance. Motivated by this gap, in this study, we focus on examining the brittleness of the ITR evaluation pipeline with a focus on concept granularity. We start by analyzing two common benchmarks, MS-COCO and Flickr30k, and compare them with their augmented versions, MS-COCO-FG and Flickr30k-FG, given a specified set of linguistic features capturing concept granularity. We discover that Flickr30k-FG and MS COCO-FG consistently achieve higher scores across all the selected features. To investigate the performance of VLMs on coarse and fine-grained datasets, we introduce a taxonomy of perturbations. We apply these perturbations to the selected datasets. We evaluate four state-of-the-art models - ALIGN, AltCLIP, CLIP, and GroupViT - on the standard and fine-grained datasets under zero-shot conditions, with and without the applied perturbations. The results demonstrate that although perturbations generally degrade model performance, the fine-grained datasets exhibit a smaller performance drop than their standard counterparts. Moreover, the relative performance drop across all setups is consistent across all models and datasets, indicating that the issue lies within the benchmarks. We conclude the paper by providing an agenda for improving ITR evaluation pipelines.

Updated: 2024-07-25 19:52:38

标题: 从视觉-语言模型的角度评估图像-文本检索基准的脆弱性

摘要: 图像文本检索(ITR)是信息检索(IR)中的重要任务,由预训练的视觉语言模型(VLMs)驱动,这些模型始终取得最先进的性能。然而,现有ITR基准的脆弱性是一个重要挑战。在任务的标准数据集中,标题通常提供场景的概括性描述,忽略了有关特定概念的详细信息。此外,当前的评估设置假定图像和文本之间存在简单的二元匹配,并侧重于模态内而非跨模态关系,这可能导致对模型性能的错误解读。在这项研究中,我们着眼于检查ITR评估流程的脆弱性,特别关注概念的细粒度。我们首先分析两个常见的基准数据集MS-COCO和Flickr30k,并在一组捕捉概念细粒度的指定语言特征下,将它们与其扩展版本MS-COCO-FG和Flickr30k-FG进行比较。我们发现Flickr30k-FG和MS COCO-FG在所有选定的特征上都能够一致地取得更高的分数。为了调查VLMs在粗粒度和细粒度数据集上的性能,我们引入了一种扰动分类法,并将这些扰动应用到选定的数据集上。我们在零样本条件下评估了四种最先进的模型(ALIGN、AltCLIP、CLIP和GroupViT)在标准和细粒度数据集上的表现,包括应用与不应用扰动两种情况。结果表明,尽管扰动通常会降低模型的性能,但细粒度数据集的性能下降要小于其标准对应物。此外,在所有设置下,所有模型和数据集的相对性能下降是一致的,这表明问题出在基准数据集上。最后,我们提出了改进ITR评估流程的议程。

更新时间: 2024-07-25 19:52:38

领域: cs.CV,cs.AI,cs.IR

下载: http://arxiv.org/abs/2407.15239v2

Retinal IPA: Iterative KeyPoints Alignment for Multimodal Retinal Imaging

We propose a novel framework for retinal feature point alignment, designed for learning cross-modality features to enhance matching and registration across multi-modality retinal images. Our model draws on the success of previous learning-based feature detection and description methods. To better leverage unlabeled data and constrain the model to reproduce relevant keypoints, we integrate a keypoint-based segmentation task. It is trained in a self-supervised manner by enforcing segmentation consistency between different augmentations of the same image. By incorporating a keypoint augmented self-supervised layer, we achieve robust feature extraction across modalities. Extensive evaluation on two public datasets and one in-house dataset demonstrates significant improvements in performance for modality-agnostic retinal feature alignment. Our code and model weights are publicly available at \url{https://github.com/MedICL-VU/RetinaIPA}.

Updated: 2024-07-25 19:51:27

标题: 视网膜IPA:多模式视网膜成像的迭代关键点对齐

摘要: 我们提出了一个新颖的视网膜特征点对齐框架,旨在学习跨模态特征以增强多模态视网膜图像的匹配和配准。我们的模型借鉴了先前基于学习的特征检测和描述方法的成功。为了更好地利用未标记数据并约束模型重现相关的关键点,我们集成了基于关键点的分割任务。该任务以自监督方式训练,即在同一图像的不同增强之间强制执行分割一致性。通过整合关键点增强的自监督层,我们实现了跨模态的稳健特征提取。在两个公共数据集和一个内部数据集上的广泛评估表明,模态无关的视网膜特征配准性能有显著提升。我们的代码和模型权重可以在\url{https://github.com/MedICL-VU/RetinaIPA}上公开获取。

更新时间: 2024-07-25 19:51:27

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.18362v1

Novel OCT mosaicking pipeline with Feature- and Pixel-based registration

High-resolution Optical Coherence Tomography (OCT) images are crucial for ophthalmology studies but are limited by their relatively narrow field of view (FoV). Image mosaicking is a technique for aligning multiple overlapping images to obtain a larger FoV. Current mosaicking pipelines often struggle with substantial noise and considerable displacement between the input sub-fields. In this paper, we propose a versatile pipeline for stitching multi-view OCT/OCTA \textit{en face} projection images. Our method combines the strengths of learning-based feature matching and robust pixel-based registration to align multiple images effectively. Furthermore, we advance the application of a trained foundational model, Segment Anything Model (SAM), to validate mosaicking results in an unsupervised manner. The efficacy of our pipeline is validated using an in-house dataset and a large public dataset, where our method shows superior performance in terms of both accuracy and computational efficiency. We also made our evaluation tool for image mosaicking and the corresponding pipeline publicly available at \url{https://github.com/MedICL-VU/OCT-mosaicking}.
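
For orientation, a classical two-image registration step with OpenCV, where ORB features stand in for the pipeline's learned matcher; the homography-plus-warp step illustrates the pixel-level half, and the SAM-based validation is not shown.

    import cv2
    import numpy as np

    def stitch_pair(img_a, img_b, min_matches=10):
        # img_a, img_b: overlapping grayscale images (uint8 arrays)
        orb = cv2.ORB_create(2000)
        kp_a, des_a = orb.detectAndCompute(img_a, None)
        kp_b, des_b = orb.detectAndCompute(img_b, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
        if len(matches) < min_matches:
            raise ValueError("not enough matches for registration")
        src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        h, w = img_b.shape[:2]
        canvas = cv2.warpPerspective(img_a, H, (2 * w, h))      # canvas size is ad hoc
        canvas[0:h, 0:w] = np.maximum(canvas[0:h, 0:w], img_b)  # naive max blend
        return canvas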

Updated: 2024-07-25 19:44:43

标题: 基于特征与像素配准的新型OCT图像拼接流水线

摘要: 高分辨率光学相干断层扫描(OCT)图像对眼科研究至关重要,但受到其相对狭窄的视场(FoV)的限制。图像拼接是一种将多个重叠图像对齐以获得更大FoV的技术。当前的拼接管道往往难以应对严重的噪声和输入子场之间相当大的位移。本文提出了一种用于拼接多视角OCT/OCTA en face投影图像的通用管道。我们的方法结合了基于学习的特征匹配和基于像素的稳健配准的优势,有效地对齐多个图像。此外,我们推进了一个经过训练的基础模型Segment Anything Model(SAM)的应用,以无监督的方式验证拼接结果。我们的管道的有效性通过使用内部数据集和一个大型公共数据集进行验证,在这些数据集中,我们的方法在准确性和计算效率方面表现出优越的性能。我们还公开了用于图像拼接的评估工具和相应的管道,网址为https://github.com/MedICL-VU/OCT-mosaicking。

更新时间: 2024-07-25 19:44:43

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2311.13052v2

Generative AI like ChatGPT in Blockchain Federated Learning: use cases, opportunities and future

Federated learning has become a significant approach for training machine learning models using decentralized data without necessitating the sharing of this data. Recently, the incorporation of generative artificial intelligence (AI) methods has provided new possibilities for improving privacy, augmenting data, and customizing models. This research explores potential integrations of generative AI in federated learning, revealing various opportunities to enhance privacy, data efficiency, and model performance. It particularly emphasizes the importance of generative models like generative adversarial networks (GANs) and variational autoencoders (VAEs) in creating synthetic data that replicates the distribution of real data. Generating synthetic data helps federated learning address challenges related to limited data availability and supports robust model development. Additionally, we examine various applications of generative AI in federated learning that enable more personalized solutions.

Updated: 2024-07-25 19:43:49

标题: 区块链联邦学习中的ChatGPT等生成式人工智能:用例、机会和未来

摘要: 联邦学习已成为利用分散数据训练机器学习模型的重要方法,而无需共享这些数据。最近,将生成人工智能(AI)方法纳入其中为改善隐私、增加数据和定制模型提供了新的可能性。本研究探讨了生成AI在联邦学习中的潜在整合,揭示了增强隐私、数据效率和模型性能的各种机会。它特别强调了生成对抗网络(GANs)和变分自动编码器(VAEs)等生成模型在创建模拟真实数据分布的合成数据方面的重要性。生成合成数据有助于联邦学习解决与有限数据可用性相关的挑战,并支持稳健的模型开发。此外,我们还研究了生成AI在联邦学习中的各种应用,使个性化解决方案更加可能。

更新时间: 2024-07-25 19:43:49

领域: cs.LG,cs.AI,cs.DC

下载: http://arxiv.org/abs/2407.18358v1

Privacy-Preserving Model-Distributed Inference at the Edge

This paper focuses on designing a privacy-preserving Machine Learning (ML) inference protocol for a hierarchical setup, where clients own/generate data, model owners (cloud servers) have a pre-trained ML model, and edge servers perform ML inference on clients' data using the cloud server's ML model. Our goal is to speed up ML inference while providing privacy to both data and the ML model. Our approach (i) uses model-distributed inference (model parallelization) at the edge servers and (ii) reduces the amount of communication to/from the cloud server. Our privacy-preserving hierarchical model-distributed inference, privateMDI design uses additive secret sharing and linearly homomorphic encryption to handle linear calculations in the ML inference, and garbled circuit and a novel three-party oblivious transfer are used to handle non-linear functions. privateMDI consists of offline and online phases. We designed these phases in a way that most of the data exchange is done in the offline phase while the communication overhead of the online phase is reduced. In particular, there is no communication to/from the cloud server in the online phase, and the amount of communication between the client and edge servers is minimized. The experimental results demonstrate that privateMDI significantly reduces the ML inference time as compared to the baselines.
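
The additive-secret-sharing trick used for the linear parts can be verified in a few lines of NumPy: a linear map applied share-by-share recombines to the true result, which is what lets edge servers compute on inputs they cannot read. Weight privacy via homomorphic encryption and the garbled-circuit handling of non-linear functions are omitted.

    import numpy as np

    rng = np.random.default_rng(0)

    def share(x, n=2):
        # Split x into n random shares that sum to x (exact over integers;
        # real protocols typically work modulo 2^k).
        shares = [rng.integers(-2**31, 2**31, size=x.shape) for _ in range(n - 1)]
        shares.append(x - sum(shares))
        return shares

    x = rng.integers(-100, 100, size=4)          # client input (kept secret)
    W = rng.integers(-5, 5, size=(3, 4))         # linear-layer weights

    x_shares = share(x)
    partials = [W @ s for s in x_shares]         # each server computes on its share only
    assert np.array_equal(sum(partials), W @ x)  # recombination matches W @ x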

Updated: 2024-07-25 19:39:03

标题: 在边缘进行隐私保护的模型分布式推理

摘要: 本文侧重于为分层设置设计隐私保护的机器学习(ML)推理协议,其中客户拥有/生成数据,模型所有者(云服务器)拥有预训练的ML模型,边缘服务器使用云服务器的ML模型对客户数据执行ML推理。我们的目标是加快ML推理速度,同时保护数据和ML模型的隐私。我们的方法是(i)在边缘服务器上使用模型分布式推理(模型并行化),并(ii)减少与云服务器之间的通信量。我们的隐私保护分层模型分布式推理privateMDI设计采用加法秘密共享和线性同态加密来处理ML推理中的线性计算,并使用混淆电路和一种新颖的三方遗忘传输来处理非线性函数。privateMDI包括离线和在线阶段。我们设计这些阶段的方式是,在离线阶段大部分数据交换完成,而在线阶段的通信开销减少。特别是,在在线阶段不与云服务器进行通信,客户与边缘服务器之间的通信量最小化。实验结果表明,与基线相比,privateMDI显著缩短了ML推理时间。

更新时间: 2024-07-25 19:39:03

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2407.18353v1

Introducing δ-XAI: a novel sensitivity-based method for local AI explanations

Explainable Artificial Intelligence (XAI) is central to the debate on integrating Artificial Intelligence (AI) and Machine Learning (ML) algorithms into clinical practice. High-performing AI/ML models, such as ensemble learners and deep neural networks, often lack interpretability, hampering clinicians' trust in their predictions. To address this, XAI techniques are being developed to describe AI/ML predictions in human-understandable terms. One promising direction is the adaptation of sensitivity analysis (SA) and global sensitivity analysis (GSA), which inherently rank model inputs by their impact on predictions. Here, we introduce a novel delta-XAI method that provides local explanations of ML model predictions by extending the delta index, a GSA metric. The delta-XAI index assesses the impact of each feature's value on the predicted output for individual instances in both regression and classification problems. We formalize the delta-XAI index and provide code for its implementation. The delta-XAI method was evaluated on simulated scenarios using linear regression models, with Shapley values serving as a benchmark. Results showed that the delta-XAI index is generally consistent with Shapley values, with notable discrepancies in models with highly impactful or extreme feature values. The delta-XAI index demonstrated higher sensitivity in detecting dominant features and handling extreme feature values. Qualitatively, the delta-XAI provides intuitive explanations by leveraging probability density functions, making feature rankings clearer and more explainable for practitioners. Overall, the delta-XAI method appears promising for robustly obtaining local explanations of ML model predictions. Further investigations in real-world clinical settings will be conducted to evaluate its impact on AI-assisted clinical workflows.

Updated: 2024-07-25 19:07:49

标题: 引入δ-XAI:一种基于敏感性的局部人工智能解释方法

摘要: 可解释的人工智能(XAI)是关于将人工智能(AI)和机器学习(ML)算法整合到临床实践中的辩论中心。高性能的AI/ML模型,如集成学习器和深度神经网络,通常缺乏可解释性,影响了临床医生对其预测的信任。为了解决这个问题,XAI技术正在被开发,以以人类可理解的方式描述AI/ML的预测。一个有前途的方向是适应灵敏度分析(SA)和全局灵敏度分析(GSA),它们根据其对预测的影响对模型输入进行排序。在这里,我们介绍了一种新颖的delta-XAI方法,通过扩展delta指数(GSA度量),为ML模型的预测提供局部解释。delta-XAI指数评估每个特征值对回归和分类问题中单个实例的预测输出的影响。我们形式化了delta-XAI指数并提供了其实施代码。使用线性回归模型对模拟场景进行了delta-XAI方法的评估,Shapley值作为基准。结果显示,delta-XAI指数通常与Shapley值一致,在具有高影响力或极端特征值的模型中存在明显差异。delta-XAI指数在检测主导特征和处理极端特征值方面表现出更高的灵敏性。从定性上看,delta-XAI通过利用概率密度函数提供直观的解释,使特征排名对从业者更加清晰和可解释。总的来说,delta-XAI方法似乎有望稳健地获取ML模型预测的局部解释。将在真实世界的临床环境中进行进一步的研究,以评估其对AI辅助临床工作流程的影响。

更新时间: 2024-07-25 19:07:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.18343v1

Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating instruction stability via self-chats between two instructed chatbots. Testing popular models like LLaMA2-chat-70B and GPT-3.5, we reveal a significant instruction drift within eight rounds of conversations. An empirical and theoretical analysis of this phenomenon suggests the transformer attention mechanism plays a role, due to attention decay over long exchanges. To combat attention decay and instruction drift, we propose a lightweight method called split-softmax, which compares favorably against two strong baselines.

Updated: 2024-07-25 18:58:51

标题: 测量和控制语言模型对话中的指导(不)稳定性

摘要: 系统提示是定制语言模型聊天机器人的标准工具,使它们能够遵循特定的指令。在使用系统提示时的一个隐含假设是它们将是稳定的,因此聊天机器人将会根据规定的指令在整个对话过程中生成文本。我们提出了一个量化基准来测试这一假设,通过两个受指导的聊天机器人之间的自我对话来评估指令的稳定性。通过测试像LLaMA2-chat-70B和GPT-3.5这样的流行模型,我们发现在八轮对话中存在显著的指令漂移。通过对这一现象的实证和理论分析,我们认为Transformer注意力机制起了作用,因为在长时间交流中存在注意力衰减。为了对抗注意力衰减和指令漂移,我们提出了一种称为split-softmax的轻量级方法,与两个强基线相比表现更优。

更新时间: 2024-07-25 18:58:51

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.10962v4

Combining Cognitive and Generative AI for Self-explanation in Interactive AI Agents

The Virtual Experimental Research Assistant (VERA) is an inquiry-based learning environment that empowers a learner to build conceptual models of complex ecological systems and experiment with agent-based simulations of the models. This study investigates the convergence of cognitive AI and generative AI for self-explanation in interactive AI agents such as VERA. From a cognitive AI viewpoint, we endow VERA with a functional model of its own design, knowledge, and reasoning represented in the Task--Method--Knowledge (TMK) language. From the perspective of generative AI, we use ChatGPT, LangChain, and Chain-of-Thought to answer user questions based on the VERA TMK model. Thus, we combine cognitive and generative AI to generate explanations about how VERA works and produces its answers. The preliminary evaluation of the generation of explanations in VERA on a bank of 66 questions derived from earlier work appears promising.

Updated: 2024-07-25 18:46:11

标题: 结合认知和生成AI,实现交互式AI代理的自我解释

摘要: 虚拟实验研究助手(VERA)是一种基于探究式学习的环境,使学习者能够建立复杂生态系统的概念模型,并通过基于代理的模拟实验模型。本研究探讨了认知人工智能和生成人工智能在交互式AI代理(如VERA)中自解释的融合。从认知人工智能的角度看,我们赋予VERA以其自身设计、知识和推理在任务-方法-知识(TMK)语言中表示的功能模型。从生成人工智能的角度看,我们使用ChatGPT、LangChain和Chain-of-Thought根据VERA TMK模型回答用户问题。因此,我们结合认知和生成人工智能来生成关于VERA如何工作和产生其答案的解释。对VERA在66个问题库中生成解释的初步评估似乎很有前景。

更新时间: 2024-07-25 18:46:11

领域: cs.AI

下载: http://arxiv.org/abs/2407.18335v1

Topological Data Analysis in smart manufacturing: State of the art and future directions

Topological Data Analysis (TDA) is a discipline that applies algebraic topology techniques to analyze complex, multi-dimensional data. Although it is a relatively new field, TDA has been widely and successfully applied across various domains, such as medicine, materials science, and biology. This survey provides an overview of the state of the art of TDA within a dynamic and promising application area: industrial manufacturing and production, particularly within the Industry 4.0 context. We have conducted a rigorous and reproducible literature search focusing on TDA applications in industrial production and manufacturing settings. The identified works are categorized based on their application areas within the manufacturing process and the types of input data. We highlight the principal advantages of TDA tools in this context, address the challenges encountered and the future potential of the field. Furthermore, we identify TDA methods that are currently underexploited in specific industrial areas and discuss how their application could be beneficial, with the aim of stimulating further research in this field. This work seeks to bridge the theoretical advancements in TDA with the practical needs of industrial production. Our goal is to serve as a guide for practitioners and researchers applying TDA in industrial production and manufacturing systems. We advocate for the untapped potential of TDA in this domain and encourage continued exploration and research.

Updated: 2024-07-25 18:44:45

标题: 智能制造中的拓扑数据分析:现状与未来方向

摘要: 拓扑数据分析(TDA)是一门将代数拓扑技术应用于分析复杂、多维数据的学科。尽管这是一个相对较新的领域,但TDA已被广泛且成功地应用于各个领域,如医学、材料科学和生物学。本调查综述了TDA在一个充满活力和前景的应用领域中的最新技术状态:工业制造和生产,特别是在工业4.0背景下。我们进行了严谨和可重复的文献搜索,重点关注TDA在工业生产和制造环境中的应用。识别出的作品根据其在制造过程中的应用领域和输入数据类型进行分类。我们突出了TDA工具在这一领域的主要优势,解决了所遇到的挑战以及该领域的未来潜力。此外,我们确定了目前在特定工业领域中未充分利用的TDA方法,并讨论了它们的应用如何有益,旨在激发该领域的进一步研究。这项工作旨在将TDA的理论进展与工业生产的实际需求联系起来。我们的目标是为在工业生产和制造系统中应用TDA的从业者和研究人员提供指导。我们倡导TDA在这一领域中尚未发掘的潜力,并鼓励继续探索和研究。

更新时间: 2024-07-25 18:44:45

领域: cs.LG,math.AT,stat.AP

下载: http://arxiv.org/abs/2310.09319v3

Anticipatory Music Transformer

We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, whereby the controls are a subset of the events themselves, and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that an anticipatory model produces accompaniments with similar musicality to even music composed by humans over a 20-second clip.

Updated: 2024-07-25 18:35:33

标题: 预测性音乐变换器

摘要: 我们介绍了预测:一种构建可控制的时间点过程(事件过程)的生成模型的方法,该模型在异步条件下依赖于第二个相关过程(控制过程)的实现。我们通过交错事件和控制序列来实现这一点,使得控制在事件序列中的停止时间之后出现。这项工作受到符号音乐生成控制中出现的问题的启发。我们专注于填充控制任务,其中控制是事件本身的子集,并且在给定固定控制事件的情况下完成事件序列的条件生成。我们使用大型且多样化的Lakh MIDI音乐数据集训练预测填充模型。这些模型与提示音乐生成的自回归模型的性能相匹配,并具有执行填充控制任务(包括伴奏)的额外能力。人类评估者报告称,预测模型生成的伴奏与人类在20秒片段中创作的音乐具有相似的音乐性。

更新时间: 2024-07-25 18:35:33

领域: cs.SD,cs.LG,eess.AS,stat.ML

下载: http://arxiv.org/abs/2306.08620v2

CavDetect: A DBSCAN Algorithm based Novel Cavity Detection Model on Protein Structure

Cavities on the structures of proteins are formed by interactions between proteins and small molecules known as ligands; they are the locations where ligands bind to proteins. Accurately detecting such locations is essential to the success of the entire drug design process. This study proposes a Voronoi tessellation based novel cavity detection model for detecting cavities on protein structures. Since the atom space of a protein structure is dense and large in volume, and since the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm handles such data well without requiring the number of clusters (cavities) to be known a priori, this study proposes to implement the model with the DBSCAN algorithm.
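
Because the clustering step is standard, a runnable scikit-learn sketch shows why DBSCAN fits the problem: it labels noise explicitly and needs no cluster count a priori. The synthetic points below stand in for Voronoi-derived candidate cavity points, and eps/min_samples are illustrative.

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(1)
    cavity_a = rng.normal(loc=(0.0, 0.0, 0.0), scale=0.5, size=(60, 3))
    cavity_b = rng.normal(loc=(8.0, 8.0, 8.0), scale=0.5, size=(40, 3))
    noise = rng.uniform(-5.0, 15.0, size=(20, 3))
    points = np.vstack([cavity_a, cavity_b, noise])   # stand-in 3-D candidate points

    labels = DBSCAN(eps=1.2, min_samples=8).fit_predict(points)
    n_cavities = len(set(labels)) - (1 if -1 in labels else 0)
    print(f"detected cavities: {n_cavities}, noise points: {(labels == -1).sum()}")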

Updated: 2024-07-25 18:18:24

标题: CavDetect:基于DBSCAN算法的蛋白质结构新型空腔检测模型

摘要: 蛋白质结构上的空腔是由蛋白质与一些小分子,即配体之间的相互作用形成的。这些基本上是配体与蛋白质结合的位置。实际检测这些位置对于成功完成整个药物设计过程至关重要。本研究提出了一种基于Voronoi分割的新型空腔检测模型,用于检测蛋白质结构上的空腔。由于蛋白质结构的原子空间密集且体积大,而DBSCAN(基于密度的带噪声空间聚类应用)算法可以很好地处理这种类型的数据,而且在此算法中事先不需要知道数据中的簇(空腔)的数量,因此本研究建议将提出的算法与DBSCAN算法结合实施。

更新时间: 2024-07-25 18:18:24

领域: cs.LG,q-bio.QM,92-xx (Primary) 92-04

下载: http://arxiv.org/abs/2407.18317v1

Affectively Framework: Towards Human-like Affect-Based Agents

Game environments offer a unique opportunity for training virtual agents due to their interactive nature, which provides diverse play traces and affect labels. Despite their potential, no reinforcement learning framework incorporates human affect models as part of their observation space or reward mechanism. To address this, we present the \emph{Affectively Framework}, a set of OpenAI Gym environments that integrate affect as part of the observation space. This paper introduces the framework and its three game environments and provides baseline experiments to validate its effectiveness and potential.
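
As a sketch of the core idea, an affect channel can be appended to the observation space of a standard environment. The wrapper below uses gymnasium, a CartPole environment, and a toy pole-angle "arousal" proxy, all of which are illustrative assumptions; the framework ships its own environments.

    import numpy as np
    import gymnasium as gym

    class AffectObservationWrapper(gym.ObservationWrapper):
        # Appends a scalar affect estimate in [0, 1] to every observation.
        def __init__(self, env, affect_model):
            super().__init__(env)
            self.affect_model = affect_model
            low = np.append(env.observation_space.low, 0.0)
            high = np.append(env.observation_space.high, 1.0)
            self.observation_space = gym.spaces.Box(low=low, high=high,
                                                    dtype=np.float64)

        def observation(self, obs):
            return np.append(obs, self.affect_model(obs))

    env = AffectObservationWrapper(
        gym.make("CartPole-v1"),
        affect_model=lambda o: min(1.0, abs(float(o[2]))))  # toy "arousal" signal
    obs, info = env.reset()
    print(obs.shape)   # the original 4 features plus 1 affect channel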

Updated: 2024-07-25 18:18:10

标题: 情感框架:走向类人情感为基础的代理程序

摘要: 游戏环境为训练虚拟代理提供了独特的机会,因为它们的互动性质提供了多样的游戏轨迹和情感标签。尽管具有潜力,但没有强化学习框架将人类情感模型作为观察空间或奖励机制的一部分。为了解决这个问题,我们提出了“Affectively Framework”,这是一组将情感作为观察空间一部分的OpenAI Gym环境。本文介绍了该框架及其三个游戏环境,并提供了基准实验来验证其有效性和潜力。

更新时间: 2024-07-25 18:18:10

领域: cs.AI

下载: http://arxiv.org/abs/2407.18316v1

Revolutionizing Undergraduate Learning: CourseGPT and Its Generative AI Advancements

Integrating Generative AI (GenAI) into educational contexts presents a transformative potential for enhancing learning experiences. This paper introduces CourseGPT, a generative AI tool designed to support instructors and enhance the educational experiences of undergraduate students. Built on open-source Large Language Models (LLMs) from Mistral AI, CourseGPT offers continuous instructor support and regular updates to course materials, enriching the learning environment. By utilizing course-specific content, such as slide decks and supplementary readings and references, CourseGPT provides precise, dynamically generated responses to student inquiries. Unlike generic AI models, CourseGPT allows instructors to manage and control the responses, thus extending the course scope without overwhelming details. The paper demonstrates the application of CourseGPT using the CPR E 431 - Basics of Information System Security course as a pilot. This course, with its large enrollments and diverse curriculum, serves as an ideal testbed for CourseGPT. The tool aims to enhance the learning experience, accelerate feedback processes, and streamline administrative tasks. The study evaluates CourseGPT's impact on student outcomes, focusing on correctness scores, context recall, and faithfulness of responses. Results indicate that the Mixtral-8x7b model, with a higher parameter count, outperforms smaller models, achieving an 88.0% correctness score and a 66.6% faithfulness score. Additionally, feedback from former students and teaching assistants on CourseGPT's accuracy, helpfulness, and overall performance was collected. The outcomes revealed that a significant majority found CourseGPT to be highly accurate and beneficial in addressing their queries, with many praising its ability to provide timely and relevant information.

Updated: 2024-07-25 18:02:16

标题: 彻底改变本科生学习:CourseGPT及其生成式人工智能进展

摘要: 将生成式人工智能(GenAI)整合到教育环境中具有提升学习体验的潜力。本文介绍了CourseGPT,这是一款旨在支持教师并增强本科生教育体验的生成式人工智能工具。基于Mistral AI的开源大型语言模型(LLMs)构建的CourseGPT提供持续的教师支持和定期更新课程材料,丰富学习环境。通过利用特定课程内容,如幻灯片和补充阅读和参考资料,CourseGPT为学生的查询提供精准、动态生成的响应。与通用人工智能模型不同,CourseGPT允许教师管理和控制响应,从而扩展课程范围而不失细节。本文以CPR E 431-信息系统安全基础课程作为试点,展示了CourseGPT的应用。该课程的大量学生和多样化课程使其成为CourseGPT的理想测试平台。该工具旨在增强学习体验,加快反馈过程,并简化管理任务。研究评估了CourseGPT对学生成果的影响,关注正确性得分、上下文回忆和响应的忠实度。结果表明,Mixtral-8x7b模型,具有更高的参数数量,优于较小的模型,获得了88.0%的正确性得分和66.6%的忠实度得分。此外,收集了来自前学生和助教对CourseGPT准确性、帮助性和整体表现的反馈。结果显示,绝大多数人认为CourseGPT非常准确,并且在解决问题时非常有益,许多人称赞其提供及时和相关信息的能力。

更新时间: 2024-07-25 18:02:16

领域: cs.ET,cs.AI

下载: http://arxiv.org/abs/2407.18310v1

Sparse vs Contiguous Adversarial Pixel Perturbations in Multimodal Models: An Empirical Analysis

Assessing the robustness of multimodal models against adversarial examples is an important aspect for the safety of its users. We craft L0-norm perturbation attacks on the preprocessed input images. We launch them in a black-box setup against four multimodal models and two unimodal DNNs, considering both targeted and untargeted misclassification. Our attacks target less than 0.04% of perturbed image area and integrate different spatial positioning of perturbed pixels: sparse positioning and pixels arranged in different contiguous shapes (row, column, diagonal, and patch). To the best of our knowledge, we are the first to assess the robustness of three state-of-the-art multimodal models (ALIGN, AltCLIP, GroupViT) against different sparse and contiguous pixel distribution perturbations. The obtained results indicate that unimodal DNNs are more robust than multimodal models. Furthermore, models using CNN-based Image Encoder are more vulnerable than models with ViT - for untargeted attacks, we obtain a 99% success rate by perturbing less than 0.02% of the image area.
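
A minimal black-box sketch of the sparse variant: random search over a handful of pixel positions set to extreme values, keeping any candidate that flips the decision. Here predict is an assumed classifier interface over float images in [0, 1]; the contiguous row/column/diagonal/patch attacks would constrain the positions instead of sampling them freely.

    import numpy as np

    def sparse_blackbox_attack(image, true_label, predict,
                               n_pixels=8, n_trials=500, seed=0):
        # image: float array of shape (H, W, C) with values in [0, 1]
        rng = np.random.default_rng(seed)
        h, w, c = image.shape
        for _ in range(n_trials):
            candidate = image.copy()
            ys = rng.integers(0, h, n_pixels)
            xs = rng.integers(0, w, n_pixels)
            candidate[ys, xs] = rng.choice([0.0, 1.0], size=(n_pixels, c))
            if predict(candidate) != true_label:
                return candidate        # untargeted misclassification found
        return None                     # no adversarial example within budget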

Updated: 2024-07-25 17:59:48

标题: 稀疏对抗像素扰动与连续对抗像素扰动在多模态模型中的比较:实证分析

摘要: 评估多模态模型对抗性示例的稳健性是其用户安全的重要方面。我们对预处理输入图像进行了L0范数扰动攻击。我们在黑盒设置中针对四个多模态模型和两个单模态DNN发起攻击,考虑了有针对性和无针对性的错误分类。我们的攻击目标少于0.04%的扰动图像区域,并集成了不同的扰动像素空间定位:稀疏定位和像素排列成不同的连续形状(行、列、对角线和补丁)。据我们所知,我们是第一批评估三种最先进的多模态模型(ALIGN、AltCLIP、GroupViT)对抗不同稀疏和连续像素分布扰动的稳健性。所得结果表明,单模态DNN比多模态模型更稳健。此外,使用基于CNN的图像编码器的模型比使用ViT的模型更容易受到攻击-对于无目标攻击,我们通过扰动不到0.02%的图像区域获得了99%的成功率。

更新时间: 2024-07-25 17:59:48

领域: cs.CV,cs.CR,cs.LG,I.2.0; I.4.0

下载: http://arxiv.org/abs/2407.18251v1

VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads

Human head detection, keypoint estimation, and 3D head model fitting are important tasks with many applications. However, traditional real-world datasets often suffer from bias, privacy, and ethical concerns, and they have been recorded in laboratory environments, which makes it difficult for trained models to generalize. Here, we introduce VGGHeads -- a large scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation. Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes. Using this dataset we introduce a new model architecture capable of simultaneous heads detection and head meshes reconstruction from a single image in a single step. Through extensive experimental evaluations, we demonstrate that models trained on our synthetic data achieve strong performance on real images. Furthermore, the versatility of our dataset makes it applicable across a broad spectrum of tasks, offering a general and comprehensive representation of human heads. Additionally, we provide detailed information about the synthetic data generation pipeline, enabling it to be re-used for other tasks and domains.

Updated: 2024-07-25 17:58:17

标题: VGGHeads:用于3D人头的大规模合成数据集

摘要: 人头检测、关键点估计和3D头部模型拟合是许多应用中重要的任务。然而,传统的真实世界数据集经常受到偏见、隐私和道德关注的困扰,并且它们是在实验室环境中记录的,这使得训练模型很难泛化。在这里,我们介绍了VGGHeads - 一个使用扩散模型生成的大规模合成数据集,用于人头检测和3D网格估计。我们的数据集包括超过100万张高分辨率图像,每张图像都标注有详细的3D头部网格、面部标志和边界框。使用这个数据集,我们介绍了一种新的模型架构,能够在一个步骤中从单个图像中同时进行头部检测和头部网格重建。通过广泛的实验评估,我们证明了在我们的合成数据上训练的模型在真实图像上能够取得强大的性能。此外,我们数据集的多功能性使其适用于广泛的任务,提供了人头的一般和全面的表示。此外,我们提供了有关合成数据生成管道的详细信息,使其能够被重新用于其他任务和领域。

更新时间: 2024-07-25 17:58:17

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.18245v1

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

Low-Rank Adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning foundation models by re-parameterizing the original matrix into the product of two low-rank matrices. Despite its efficiency, LoRA often yields inferior performance compared to full fine-tuning. In this paper, we propose LoRA-Pro to bridge this performance gap. Firstly, we delve into the optimization processes in LoRA and full fine-tuning. We reveal that while LoRA employs low-rank approximation, it neglects to approximate the optimization process of full fine-tuning. To address this, we introduce a novel concept called the "equivalent gradient." This virtual gradient makes the optimization process on the re-parameterized matrix equivalent to LoRA, which can be used to quantify the differences between LoRA and full fine-tuning. The equivalent gradient is derived from the gradients of matrices $A$ and $B$. To narrow the performance gap, our approach minimizes the differences between the equivalent gradient and the gradient obtained from full fine-tuning during the optimization process. By solving this objective, we derive optimal closed-form solutions for updating matrices $A$ and $B$. Our method constrains the optimization process, shrinking the performance gap between LoRA and full fine-tuning. Extensive experiments on natural language processing tasks validate the effectiveness of our method.
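
The equivalent-gradient idea can be checked numerically. With W = W0 + BA, the chain rule gives dL/dA = B^T G and dL/dB = G A^T for G = dL/dW, so one gradient step on (A, B) moves W by B B^T G + G A^T A to first order; that induced quantity is what LoRA-Pro aligns with the full fine-tuning gradient. A short PyTorch sketch with arbitrary shapes:

    import torch

    torch.manual_seed(0)
    d, k, r = 16, 12, 4
    G = torch.randn(d, k)          # full fine-tuning gradient dL/dW
    A = torch.randn(r, k) * 0.1    # LoRA factors: W = W0 + B @ A
    B = torch.randn(d, r) * 0.1

    g_A = B.T @ G                  # dL/dA by the chain rule
    g_B = G @ A.T                  # dL/dB by the chain rule

    # First-order change in W after one step on (A, B) is -lr * g_equiv:
    g_equiv = B @ g_A + g_B @ A    # = B B^T G + G A^T A

    gap = torch.linalg.matrix_norm(g_equiv - G)   # what LoRA-Pro tries to shrink
    print(f"||equivalent grad - full grad||_F = {gap:.3f}")

With freshly initialized low-rank factors this gap is large, which is one way to see why vanilla LoRA can trail full fine-tuning.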

Updated: 2024-07-25 17:57:12

标题: LoRA-Pro:低秩适配器是否得到了适当优化?

摘要: 低秩适应,也称为LoRA,已经成为一种通过重新参数化原始矩阵为两个低秩矩阵的乘积来进行参数高效微调基础模型的突出方法。尽管LoRA效率高,但通常与完全微调相比表现较差。本文提出了LoRA-Pro以弥合这种性能差距。首先,我们深入研究了LoRA和完全微调中的优化过程。我们揭示了虽然LoRA采用低秩逼近,但却忽略了近似完全微调的优化过程。为了解决这个问题,我们引入了一个称为“等效梯度”的新概念。这个虚拟梯度使得在重新参数化的矩阵上的优化过程等效于LoRA,可以用来量化LoRA和完全微调之间的差异。等效梯度是从矩阵$A$和$B$的梯度中推导出来的。为了缩小性能差距,我们的方法在优化过程中最小化等效梯度与从完全微调获得的梯度之间的差异。通过解决这个目标,我们得出了更新矩阵$A$和$B$的最优闭式解。我们的方法限制了优化过程,缩小了LoRA和完全微调之间的性能差距。在自然语言处理任务上进行的大量实验证实了我们方法的有效性。

更新时间: 2024-07-25 17:57:12

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.18242v1

Numerical Literals in Link Prediction: A Critical Examination of Models and Datasets

Link Prediction(LP) is an essential task over Knowledge Graphs(KGs), traditionally focussed on using and predicting the relations between entities. Textual entity descriptions have already been shown to be valuable, but models that incorporate numerical literals have shown minor improvements on existing benchmark datasets. It is unclear whether a model is actually better in using numerical literals, or better capable of utilizing the graph structure. This raises doubts about the effectiveness of these methods and about the suitability of the existing benchmark datasets. We propose a methodology to evaluate LP models that incorporate numerical literals. We propose i) a new synthetic dataset to better understand how well these models use numerical literals and ii) dataset ablations strategies to investigate potential difficulties with the existing datasets. We identify a prevalent trend: many models underutilize literal information and potentially rely on additional parameters for performance gains. Our investigation highlights the need for more extensive evaluations when releasing new models and datasets.

Updated: 2024-07-25 17:55:33

标题: 链接预测中的数值文字:对模型和数据集的批判性审查

摘要: 链接预测(LP)是知识图(KGs)上的一项基本任务,传统上侧重于使用和预测实体之间的关系。文本实体描述已被证明具有价值,但融入数值文字的模型在现有基准数据集上显示出了轻微的改进。目前尚不清楚一个模型是否在使用数值文字方面更好,还是在利用图结构方面更好。这引发了对这些方法的有效性以及现有基准数据集的适用性的疑问。 我们提出了一种评估融入数值文字的LP模型的方法论。我们提出了i)一个新的合成数据集,以更好地了解这些模型如何使用数值文字,以及ii)数据集消融策略,以调查现有数据集可能存在的困难。我们发现一个普遍的趋势:许多模型未充分利用文字信息,可能依赖于额外的参数来提高性能。我们的调查凸显了在发布新模型和数据集时需要进行更广泛的评估的必要性。

更新时间: 2024-07-25 17:55:33

领域: cs.LG,cs.DB

下载: http://arxiv.org/abs/2407.18241v1

Dr. Jekyll and Mr. Hyde: Two Faces of LLMs

Recently, we have witnessed a rise in the use of Large Language Models (LLMs), especially in applications like chatbot assistants. Safety mechanisms and specialized training procedures are implemented to prevent improper responses from these assistants. In this work, we bypass these measures for ChatGPT and Gemini (and, to some extent, Bing chat) by making them impersonate complex personas with personality characteristics that are not aligned with a truthful assistant. We start by creating elaborate biographies of these personas, which we then use in a new session with the same chatbots. Our conversations then follow a role-play style to elicit prohibited responses. Using personas, we show that prohibited responses are actually provided, making it possible to obtain unauthorized, illegal, or harmful information. This work shows that by using adversarial personas, one can overcome safety mechanisms set out by ChatGPT and Gemini. We also introduce several ways of activating such adversarial personas, which show that both chatbots are vulnerable to this kind of attack. With the same principle, we introduce two defenses that push the model to interpret trustworthy personalities and make it more robust against such attacks.

Updated: 2024-07-25 17:54:12

标题: Dr. Jekyll and Mr. Hyde:LLMs的两面

摘要: 最近,我们目睹了大型语言模型(LLMs)的使用增加,尤其是在聊天机器人助手等应用中。为了防止这些助手产生不当回应,安全机制和专门的训练程序被实施。在这项工作中,我们通过让ChatGPT和Gemini(在某种程度上还有必应聊天)模仿与真实助手不符的人格特征来绕过这些措施。我们首先创建这些人物的详细传记,然后在与相同聊天机器人的新会话中使用。我们的对话随后以角色扮演的方式进行,以引出被禁止的回应。通过使用人物,我们展示了被禁止的回应实际上被提供,从而可能获取未经授权、非法或有害信息。这项工作表明,通过使用对抗性人物,可以克服ChatGPT和Gemini设定的安全机制。我们还介绍了几种激活这种对抗性人物的方法,表明这两种聊天机器人都容易受到这种攻击。基于同样的原则,我们介绍了两种推动模型解释值得信赖人格并使其更加强大抵御此类攻击的防御措施。

更新时间: 2024-07-25 17:54:12

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2312.03853v4

Can time series forecasting be automated? A benchmark and analysis

In the field of machine learning and artificial intelligence, time series forecasting plays a pivotal role across various domains such as finance, healthcare, and weather. However, the task of selecting the most suitable forecasting method for a given dataset is a complex task due to the diversity of data patterns and characteristics. This research aims to address this challenge by proposing a comprehensive benchmark for evaluating and ranking time series forecasting methods across a wide range of datasets. This study investigates the comparative performance of many methods from two prominent time series forecasting frameworks, AutoGluon-Timeseries, and sktime to shed light on their applicability in different real-world scenarios. This research contributes to the field of time series forecasting by providing a robust benchmarking methodology and facilitating informed decision-making when choosing forecasting methods for achieving optimal prediction.

Updated: 2024-07-25 17:53:38

标题: 时间序列预测能自动化吗?一项基准和分析

摘要: 在机器学习和人工智能领域,时间序列预测在金融、医疗保健和天气等各个领域起着至关重要的作用。然而,由于数据模式和特征的多样性,为给定数据集选择最合适的预测方法是一项复杂的任务。本研究旨在通过提出一个全面的基准来评估和排名各种时间序列预测方法,以解决这一挑战。该研究调查了来自两个著名时间序列预测框架AutoGluon-Timeseries和sktime的许多方法的比较性能,以揭示它们在不同实际场景中的适用性。本研究通过提供一个稳健的基准方法论,并帮助使用者在选择预测方法以获得最佳预测时做出明智决策,为时间序列预测领域做出了贡献。

更新时间: 2024-07-25 17:53:38

领域: cs.LG

下载: http://arxiv.org/abs/2407.16445v2

Block Verification Accelerates Speculative Decoding

Speculative decoding is an effective method for lossless acceleration of large language models during inference. It uses a fast model to draft a block of tokens which are then verified in parallel by the target model, and provides a guarantee that the output is distributed identically to a sample from the target model. In prior works, draft verification is performed independently token-by-token. Surprisingly, we show that this approach is not optimal. We propose Block Verification, a simple draft verification algorithm that verifies the entire block jointly and provides additional wall-clock speedup. We prove that the proposed mechanism is optimal in the expected number of tokens produced each iteration and specifically is never worse than the standard token-level verification. Empirically, block verification provides modest but consistent wall-clock speedups over the standard token verification algorithm of 5%-8% in a range of tasks and datasets. Given that block verification does not increase code complexity, maintains the strong lossless guarantee of the standard speculative decoding verification algorithm, cannot deteriorate performance, and, in fact, consistently improves it, it can be used as a good default in speculative decoding implementations.
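
For context, the token-level baseline the paper improves on looks like this: each drafted token is accepted with probability min(1, p/q), and the first rejection truncates the block and resamples from the residual distribution. This is standard speculative sampling; the paper's joint block-level rule, which in expectation never accepts fewer tokens, is not reproduced here.

    import numpy as np

    def token_level_verify(draft_tokens, q_probs, p_probs, rng):
        # draft_tokens[i] was sampled from the draft distribution q_probs[i];
        # p_probs[i] is the target model's distribution at that position.
        accepted = []
        for tok, q, p in zip(draft_tokens, q_probs, p_probs):
            if rng.random() < min(1.0, p[tok] / q[tok]):
                accepted.append(tok)                # token accepted, keep going
            else:
                residual = np.maximum(p - q, 0.0)   # resample from max(p - q, 0)
                residual /= residual.sum()
                accepted.append(rng.choice(len(p), p=residual))
                break                               # rejection truncates the block
        return accepted

This acceptance rule preserves the target distribution token by token; block verification instead scores the drafted block jointly, which is where the extra expected tokens per iteration come from.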

Updated: 2024-07-25 17:51:50

标题: 区块验证加速推测解码

摘要: 推测解码是在推理过程中加速大型语言模型的无损方法。它使用一个快速模型起草一块令牌,然后由目标模型并行验证,提供确保输出与目标模型样本分布相同的保证。在先前的工作中,草稿验证是逐个令牌独立执行的。令人惊讶的是,我们发现这种方法并不是最佳的。我们提出了块验证,这是一种简单的草稿验证算法,可以联合验证整个块,并提供额外的墙钟加速。我们证明了所提出的机制在每次迭代产生的令牌数量上是最佳的,特别是从未比标准令牌级验证更差。从经验上看,块验证在一系列任务和数据集中相对标准令牌验证算法提供了适度但一致的墙钟加速,为5%-8%。考虑到块验证不增加代码复杂性,保持了标准推测解码验证算法的强大无损保证,不会降低性能,事实上,还会一贯改进性能,因此可以作为推测解码实现中的良好默认选项。

更新时间: 2024-07-25 17:51:50

领域: cs.LG,cs.CL,cs.DS,cs.IT,math.IT

下载: http://arxiv.org/abs/2403.10444v2

Automated Ensemble Multimodal Machine Learning for Healthcare

The application of machine learning in medicine and healthcare has led to the creation of numerous diagnostic and prognostic models. However, despite their success, current approaches generally issue predictions using data from a single modality. This stands in stark contrast with clinician decision-making which employs diverse information from multiple sources. While several multimodal machine learning approaches exist, significant challenges in developing multimodal systems remain that are hindering clinical adoption. In this paper, we introduce a multimodal framework, AutoPrognosis-M, that enables the integration of structured clinical (tabular) data and medical imaging using automated machine learning. AutoPrognosis-M incorporates 17 imaging models, including convolutional neural networks and vision transformers, and three distinct multimodal fusion strategies. In an illustrative application using a multimodal skin lesion dataset, we highlight the importance of multimodal machine learning and the power of combining multiple fusion strategies using ensemble learning. We have open-sourced our framework as a tool for the community and hope it will accelerate the uptake of multimodal machine learning in healthcare and spur further innovation.

Updated: 2024-07-25 17:46:38

标题: 自动集成多模态机器学习在医疗保健领域的应用

摘要: 医学和医疗保健中机器学习的应用已经导致了大量的诊断和预测模型的创建。然而,尽管它们取得了成功,当前的方法通常使用来自单一模态的数据进行预测。这与临床医生的决策方式形成鲜明对比,后者利用来自多个来源的多样信息。虽然存在几种多模式机器学习方法,但在开发多模式系统方面仍然存在重大挑战,这些挑战妨碍了临床的采用。在本文中,我们介绍了一个多模式框架AutoPrognosis-M,它使用自动化机器学习将结构化临床(表格)数据和医学图像进行集成。AutoPrognosis-M包括17个图像模型,包括卷积神经网络和视觉转换器,以及三种不同的多模式融合策略。在使用多模式皮肤病变数据集进行说明性应用中,我们强调了多模式机器学习的重要性以及使用集成学习结合多种融合策略的强大功能。我们已经将我们的框架开源,作为社区的工具,并希望它将加速医疗保健中多模式机器学习的采用,并促进进一步的创新。

更新时间: 2024-07-25 17:46:38

领域: cs.LG

下载: http://arxiv.org/abs/2407.18227v1

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

A central piece in enabling intelligent agentic behavior in foundation models is to make them capable of introspecting upon their behavior, reasoning, and correcting their mistakes as more computation or interaction is available. Even the strongest proprietary large language models (LLMs) do not quite exhibit the ability of continually improving their responses sequentially, even in scenarios where they are explicitly told that they are making a mistake. In this paper, we develop RISE: Recursive IntroSpEction, an approach for fine-tuning LLMs to introduce this capability, despite prior work hypothesizing that this capability may not be possible to attain. Our approach prescribes an iterative fine-tuning procedure, which attempts to teach the model how to alter its response after having executed previously unsuccessful attempts to solve a hard test-time problem, with optionally additional environment feedback. RISE poses fine-tuning for a single-turn prompt as solving a multi-turn Markov decision process (MDP), where the initial state is the prompt. Inspired by principles in online imitation learning and reinforcement learning, we propose strategies for multi-turn data collection and training so as to imbue an LLM with the capability to recursively detect and correct its previous mistakes in subsequent iterations. Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves with more turns on math reasoning tasks, outperforming several single-turn strategies given an equal amount of inference-time computation. We also find that RISE scales well, often attaining larger benefits with more capable models. Our analysis shows that RISE makes meaningful improvements to responses to arrive at the correct solution for challenging prompts, without disrupting one-turn abilities as a result of expressing more complex distributions.

Updated: 2024-07-25 17:35:59

标题: 递归反思:教语言模型代理如何自我改进

摘要: 在基础模型中实现智能代理行为的关键是使它们能够对自己的行为、推理和错误进行内省,并在更多计算或交互可用时进行纠正。即使是最强大的专有大型语言模型(LLMs),也不能连续改进它们的响应,即使在明确告知它们犯错的情况下也是如此。在本文中,我们开发了RISE:递归内省方法,用于微调LLMs以引入这种能力,尽管先前的研究假设这种能力可能无法实现。我们的方法规定了一个迭代的微调过程,试图教会模型如何在执行先前未成功的尝试解决难题后改变其响应,可选地获得额外的环境反馈。RISE将单轮提示的微调看作解决多轮马尔可夫决策过程(MDP),其中初始状态为提示。受在线模仿学习和强化学习原则的启发,我们提出了多轮数据收集和训练策略,以赋予LLM递归检测和纠正其先前错误的能力。我们的实验表明,RISE使Llama2、Llama3和Mistral模型在数学推理任务中随着轮数的增加得以改进,优于几种单轮策略在相同推理时间计算的情况下。我们还发现,RISE的扩展性良好,通常随着模型能力的增强而获得更大的收益。我们的分析显示,RISE对于解决具有挑战性提示的响应进行了实质性改进,而不会由于表达更复杂的分布而破坏单轮能力。

更新时间: 2024-07-25 17:35:59

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.18219v1

Exploring Scaling Trends in LLM Robustness

Language model capabilities predictably improve from scaling a model's size and training data. Motivated by this, increasingly large language models have been trained, yielding an array of impressive capabilities. Yet these models are vulnerable to adversarial prompts, such as "jailbreaks" that hijack models to perform undesired behaviors, posing a significant risk of misuse. Prior work indicates that computer vision models become more robust with model and data scaling, raising the question: does language model robustness also improve with scale? We study this question empirically, finding that larger models respond substantially better to adversarial training, but there is little to no benefit from model scale in the absence of explicit defenses.

Updated: 2024-07-25 17:26:41

标题: 探讨LLM稳健性的缩放趋势

摘要: 语言模型的能力可以通过扩大模型规模和训练数据来可靠地提升。受此启发,越来越大的语言模型被训练出来,展现出一系列令人印象深刻的能力。然而,这些模型容易受到对抗性提示的影响,比如“越狱”,这些提示会劫持模型执行不期望的行为,构成了很大的被误用的风险。先前的研究表明,计算机视觉模型通过模型和数据的扩展变得更加稳健,这引发了一个问题:语言模型的稳健性是否也会随着规模的扩大而提升?我们通过实证研究这个问题,发现较大规模的模型在对抗性训练中表现更好,但在缺乏明确防御措施的情况下,模型规模几乎没有任何好处。

更新时间: 2024-07-25 17:26:41

领域: cs.LG,cs.AI,cs.CL,cs.CR,I.2.7

下载: http://arxiv.org/abs/2407.18213v1

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Large language models (LLMs) have shown impressive performance on language tasks but face challenges when deployed on resource-constrained devices due to their extensive parameters and reliance on dense multiplications, resulting in high memory demands and latency bottlenecks. Shift-and-add reparameterization offers a promising solution by replacing costly multiplications with hardware-friendly primitives in both the attention and multi-layer perceptron (MLP) layers of an LLM. However, current reparameterization techniques require training from scratch or full parameter fine-tuning to restore accuracy, which is resource-intensive for LLMs. To address this, we propose accelerating pretrained LLMs through post-training shift-and-add reparameterization, creating efficient multiplication-free models, dubbed ShiftAddLLM. Specifically, we quantize each weight matrix into binary matrices paired with group-wise scaling factors. The associated multiplications are reparameterized into (1) shifts between activations and scaling factors and (2) queries and adds according to the binary matrices. To reduce accuracy loss, we present a multi-objective optimization method to minimize both weight and output activation reparameterization errors. Additionally, based on varying sensitivity across layers to reparameterization, we develop an automated bit allocation strategy to further reduce memory usage and latency. Experiments on five LLM families and eight tasks consistently validate the effectiveness of ShiftAddLLM, achieving average perplexity improvements of 5.6 and 22.7 points at comparable or lower latency compared to the most competitive quantized LLMs at 3 and 2 bits, respectively, and more than 80% memory and energy reductions over the original LLMs. Codes and models are available at https://github.com/GATECH-EIC/ShiftAddLLM.
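
A deliberately simplified sketch of the weight reparameterization with a single binary matrix per group: sign bits times a group-wise scale, after which the matmul needs only sign-conditioned additions plus one scaling per group (realizable as shifts when scales are rounded to powers of two). The paper uses several binary matrices per weight and optimizes the reparameterization error, none of which is shown here.

    import numpy as np

    def binary_quantize(W, group_size=8):
        rows, cols = W.shape
        assert cols % group_size == 0
        Wg = W.reshape(rows, cols // group_size, group_size)
        scales = np.abs(Wg).mean(axis=-1, keepdims=True)   # group-wise scaling factor
        signs = np.sign(Wg)                                # binary matrix
        return (signs * scales).reshape(rows, cols)        # dequantized view

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 16))
    W_q = binary_quantize(W)
    print("relative error:", np.linalg.norm(W - W_q) / np.linalg.norm(W))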

Updated: 2024-07-25 17:20:48

标题: ShiftAddLLM:通过训练后的无乘法重新参数化加速预训练LLMs

摘要: 大型语言模型(LLMs)在语言任务上表现出色,但在资源受限设备上部署时面临挑战,因为它们具有大量参数并依赖密集的乘法运算,导致内存需求高和延迟瓶颈。移位和加法重新参数化通过在LLM的注意力和多层感知器(MLP)层中用硬件友好的原语替换昂贵的乘法,提供了一个有希望的解决方案。然而,当前的重新参数化技术需要从头开始训练或进行全参数微调以恢复准确性,这对LLMs来说是资源密集型的。为了解决这个问题,我们提出了通过后训练的移位和加法重新参数化来加速预训练的LLMs,创建高效的无乘法模型,称为ShiftAddLLM。具体来说,我们将每个权重矩阵量化为与分组缩放因子配对的二进制矩阵。相关的乘法被重新参数化为(1)激活和缩放因子之间的移位和(2)根据二进制矩阵的查询和添加。为了减少准确性损失,我们提出了一种多目标优化方法,以最小化权重和输出激活重新参数化错误。此外,基于各层对重新参数化的敏感性不同,我们开发了一种自动位分配策略,进一步减少内存使用和延迟。对五个LLM系列和八个任务的实验始终验证了ShiftAddLLM的有效性,相比于最具竞争力的3位和2位量化LLMs,在可比或更低的延迟下,平均困惑度提高了5.6和22.7个点,原始LLMs的内存和能量减少超过80%。代码和模型可在https://github.com/GATECH-EIC/ShiftAddLLM 上找到。

更新时间: 2024-07-25 17:20:48

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.05981v3

ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer

Vision Transformers (ViTs) have shown impressive performance and have become a unified backbone for multiple vision tasks. However, both the attention mechanism and multi-layer perceptrons (MLPs) in ViTs are not sufficiently efficient due to dense multiplications, leading to costly training and inference. To this end, we propose to reparameterize pre-trained ViTs with a mixture of multiplication primitives, e.g., bitwise shifts and additions, towards a new type of multiplication-reduced model, dubbed $\textbf{ShiftAddViT}$, which aims to achieve end-to-end inference speedups on GPUs without requiring training from scratch. Specifically, all $\texttt{MatMuls}$ among queries, keys, and values are reparameterized using additive kernels, after mapping queries and keys to binary codes in Hamming space. The remaining MLPs or linear layers are then reparameterized with shift kernels. We utilize TVM to implement and optimize those customized kernels for practical hardware deployment on GPUs. We find that such a reparameterization on attention maintains model accuracy, while inevitably leading to accuracy drops when being applied to MLPs. To marry the best of both worlds, we further propose a new mixture of experts (MoE) framework to reparameterize MLPs by taking multiplication or its primitives as experts, e.g., multiplication and shift, and designing a new latency-aware load-balancing loss. Such a loss helps to train a generic router for assigning a dynamic amount of input tokens to different experts according to their latency. Extensive experiments on various 2D/3D Transformer-based vision tasks consistently validate the effectiveness of our proposed ShiftAddViT, achieving up to $5.18\times$ latency reductions on GPUs and $42.9\%$ energy savings, while maintaining a comparable accuracy as original or efficient ViTs.

Updated: 2024-07-25 17:19:31

标题: ShiftAddViT:混合乘法基元实现高效的视觉Transformer

摘要: 视觉Transformer(ViTs)已经展现出令人印象深刻的性能,并已成为多种视觉任务的统一骨干。然而,ViTs中的注意机制和多层感知器(MLPs)都因为密集的乘法而不够高效,导致训练和推断成本高昂。为此,我们提出重新参数化预训练的ViTs,采用一种混合乘法原语,例如比特位移和加法,朝着一种新型减少乘法的模型,命名为ShiftAddViT,旨在在GPU上实现端到端推理加速,而无需从头开始训练。具体来说,将查询、键和值之间的所有MatMuls重新参数化为使用加性核的方式,在将查询和键映射到汉明空间的二进制代码后。然后,剩余的MLPs或线性层通过移位核重新参数化。我们利用TVM来实现和优化这些定制核,以便在GPU上进行实际硬件部署。我们发现,在注意力上进行这种重新参数化可以保持模型的准确性,但在应用于MLPs时不可避免地会导致准确性下降。为了兼顾两者的优势,我们进一步提出了一种新的专家混合(MoE)框架,通过将乘法或其原语作为专家,例如乘法和位移,并设计一种新的延迟感知负载平衡损失来重新参数化MLPs。这种损失有助于训练一个通用的路由器,根据其延迟动态分配不同数量的输入标记给不同的专家。对各种基于2D/3D Transformer的视觉任务进行的大量实验一致验证了我们提出的ShiftAddViT的有效性,实现了高达5.18倍的GPU延迟减少和42.9%的能量节省,同时保持与原始或高效ViTs相当的准确性。

更新时间: 2024-07-25 17:19:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2306.06446v6

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Autoregressive Large Language Models (LLMs) have achieved impressive performance in language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention module as the number of tokens increases, and (2) limited efficiency due to the sequential processing nature of autoregressive LLMs during generation. While linear attention and speculative decoding offer potential solutions, their applicability and synergistic potential for enhancing autoregressive LLMs remain uncertain. We conduct the first comprehensive study on the efficacy of existing linear attention methods for autoregressive LLMs, integrating them with speculative decoding. We introduce an augmentation technique for linear attention that ensures compatibility with speculative decoding, enabling more efficient training and serving of LLMs. Extensive experiments and ablation studies involving seven existing linear attention models and five encoder/decoder-based LLMs consistently validate the effectiveness of our augmented linearized LLMs. Notably, our approach achieves up to a 6.67 reduction in perplexity on the LLaMA model and up to a 2$\times$ speedup during generation compared to prior linear attention methods. Codes and models are available at https://github.com/GATECH-EIC/Linearized-LLM.
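
For intuition on why linear attention sidesteps the quadratic cost, here is a toy numpy sketch of causal decoding with a constant-size running state; the elu+1 feature map is a generic textbook choice, not the paper's augmentation technique.

```python
# Causal linear attention in sketch form: the running state S accumulates
# phi(k_i) v_i^T, so producing each new token costs O(d^2) instead of
# attending over the whole history.
import numpy as np

def phi(x):
    return np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, always positive

def linear_attention_decode(Q, K, V):
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))   # running sum of phi(k_i) v_i^T
    z = np.zeros(d)                 # running sum of phi(k_i), for normalization
    out = np.zeros_like(V)
    for t in range(T):
        S += np.outer(phi(K[t]), V[t])
        z += phi(K[t])
        q = phi(Q[t])
        out[t] = (q @ S) / (q @ z + 1e-8)
    return out

rng = np.random.default_rng(0)
T, d = 5, 4
out = linear_attention_decode(rng.normal(size=(T, d)),
                              rng.normal(size=(T, d)),
                              rng.normal(size=(T, d)))
print(out.shape)  # (5, 4)
```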

Updated: 2024-07-25 17:18:01

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.07368v2

Geometry Fidelity for Spherical Images

Spherical or omni-directional images offer an immersive visual format appealing to a wide range of computer vision applications. However, geometric properties of spherical images pose a major challenge for models and metrics designed for ordinary 2D images. Here, we show that direct application of Fréchet Inception Distance (FID) is insufficient for quantifying geometric fidelity in spherical images. We introduce two quantitative metrics accounting for geometric constraints, namely Omnidirectional FID (OmniFID) and Discontinuity Score (DS). OmniFID is an extension of FID tailored to additionally capture field-of-view requirements of the spherical format by leveraging cubemap projections. DS is a kernel-based seam alignment score of continuity across borders of 2D representations of spherical images. In experiments, OmniFID and DS quantify geometry fidelity issues that are undetected by FID.
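
A hedged sketch of the seam-continuity idea behind DS (the paper's exact kernel and normalization may differ): compare the gradient across the wrap-around border of an equirectangular image against the typical interior gradient.

```python
# A visible seam shows up as an anomalously large jump between the left and
# right borders, which should meet seamlessly when wrapped onto the sphere.
import numpy as np

def seam_discontinuity(img):
    """img: (H, W) grayscale equirectangular image."""
    interior = np.abs(np.diff(img, axis=1)).mean()  # typical interior gradient
    seam = np.abs(img[:, 0] - img[:, -1]).mean()    # gradient across the wrap seam
    return seam / (interior + 1e-8)

xs = np.linspace(0, 2 * np.pi, 256)
smooth = np.tile(np.sin(xs), (128, 1)) * 0.5 + 0.5  # periodic: no seam
ramp = np.tile(np.linspace(0, 1, 256), (128, 1))    # non-periodic: visible seam
print(seam_discontinuity(smooth), seam_discontinuity(ramp))  # tiny vs. large
```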

Updated: 2024-07-25 17:17:10

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.18207v1

Differentiable Quantum Architecture Search in Asynchronous Quantum Reinforcement Learning

The emergence of quantum reinforcement learning (QRL) is propelled by advancements in quantum computing (QC) and machine learning (ML), particularly through quantum neural networks (QNN) built on variational quantum circuits (VQC). These advancements have proven successful in addressing sequential decision-making tasks. However, constructing effective QRL models demands significant expertise due to challenges in designing quantum circuit architectures, including data encoding and parameterized circuits, which profoundly influence model performance. In this paper, we propose addressing this challenge with differentiable quantum architecture search (DiffQAS), enabling trainable circuit parameters and structure weights using gradient-based optimization. Furthermore, we enhance training efficiency through asynchronous reinforcement learning (RL) methods facilitating parallel training. Through numerical simulations, we demonstrate that our proposed DiffQAS-QRL approach achieves performance comparable to manually-crafted circuit architectures across considered environments, showcasing stability across diverse scenarios. This methodology offers a pathway for designing QRL models without extensive quantum knowledge, ensuring robust performance and fostering broader application of QRL.

Updated: 2024-07-25 17:11:00

Domains: quant-ph,cs.AI,cs.DC,cs.LG,cs.NE

Download: http://arxiv.org/abs/2407.18202v1

Sparse Incremental Aggregation in Multi-Hop Federated Learning

This paper investigates federated learning (FL) in a multi-hop communication setup, such as in constellations with inter-satellite links. In this setup, part of the FL clients are responsible for forwarding other clients' results to the parameter server. Instead of using conventional routing, the communication efficiency can be improved significantly by using in-network model aggregation at each intermediate hop, known as incremental aggregation (IA). Prior works [1] have indicated diminishing gains for IA under gradient sparsification. Here we study this issue and propose several novel correlated sparsification methods for IA. Numerical results show that, for some of these algorithms, the full potential of IA is still available under sparsification without impairing convergence. We demonstrate a 15x improvement in communication efficiency over conventional routing and an 11x improvement over state-of-the-art (SoA) sparse IA.
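
A simplified sketch of incremental aggregation under top-k sparsification (uniform top-k here; the paper's correlated sparsification schemes are more refined): each hop adds its sparsified update to the partial aggregate before forwarding, so only one sparse vector travels per link instead of one vector per client.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
d, k = 100, 10
grads = [rng.normal(size=d) for _ in range(4)]  # one gradient per hop/client

aggregate = np.zeros(d)
for g in grads:            # hop-by-hop: sparsify locally, add in-network
    aggregate += top_k(g, k)

dense = sum(grads)
cos = aggregate @ dense / (np.linalg.norm(aggregate) * np.linalg.norm(dense))
print("cosine to dense aggregate:", round(float(cos), 3))
```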

Updated: 2024-07-25 17:09:22

Domains: cs.DC,cs.LG,eess.SP

Download: http://arxiv.org/abs/2407.18200v1

Wasserstein approximation schemes based on Voronoi partitions

We consider structured approximation of measures in Wasserstein space $\mathrm{W}_p(\mathbb{R}^d)$ for $p\in[1,\infty)$ using general measure approximants compactly supported on Voronoi regions derived from a scaled Voronoi partition of $\mathbb{R}^d$. We show that if a full rank lattice $\Lambda$ is scaled by a factor of $h\in(0,1]$, then approximation of a measure based on the Voronoi partition of $h\Lambda$ is $O(h)$ regardless of $d$ or $p$. We then use a covering argument to show that $N$-term approximations of compactly supported measures are $O(N^{-\frac1d})$, which matches known rates for optimal quantizers and empirical measure approximation in most instances. Additionally, we generalize our construction to nonuniform Voronoi partitions, highlighting the flexibility and robustness of our approach for various measure approximation scenarios. Finally, we extend these results to noncompactly supported measures with sufficient decay. Our findings are pertinent to applications in computer vision and machine learning where measures are used to represent structured data such as images.
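
For the integer lattice $\Lambda = \mathbb{Z}^d$ the construction is especially simple, since the nearest point of $h\Lambda$ is obtained by rounding; a small numpy sketch:

```python
# Each sample's mass moves at most h*sqrt(d)/2 (half the cell diagonal),
# which is what gives the O(h) Wasserstein rate for every p.
import numpy as np

def voronoi_quantize(samples, weights, h):
    """Push an empirical measure onto the scaled lattice h*Z^d."""
    nodes = np.round(samples / h) * h                 # nearest point of h*Z^d
    keys, inv = np.unique(nodes, axis=0, return_inverse=True)
    mass = np.zeros(len(keys))
    np.add.at(mass, inv, weights)                     # aggregate mass per cell
    return keys, mass

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
w = np.full(1000, 1 / 1000)
for h in [0.5, 0.25, 0.125]:
    nodes, mass = voronoi_quantize(X, w, h)
    print(f"h={h}: {len(nodes)} atoms, W_p <= {h * np.sqrt(2) / 2:.3f}")
```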

Updated: 2024-07-25 17:05:37

Domains: stat.ML,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2310.09149v2

AutoCodeRover: Autonomous Program Improvement

Researchers have made significant progress in automating the software development process in the past decades. Recent progress in Large Language Models (LLMs) has significantly impacted the development process, where developers can use LLM-based programming assistants to achieve automated coding. Nevertheless, software engineering involves the process of program improvement apart from coding, specifically to enable software maintenance (e.g. bug fixing) and software evolution (e.g. feature additions). In this paper, we propose an automated approach for solving GitHub issues to autonomously achieve program improvement. In our approach called AutoCodeRover, LLMs are combined with sophisticated code search capabilities, ultimately leading to a program modification or patch. In contrast to recent LLM agent approaches from AI researchers and practitioners, our outlook is more software engineering oriented. We work on a program representation (abstract syntax tree) as opposed to viewing a software project as a mere collection of files. Our code search exploits the program structure in the form of classes/methods to enhance the LLM's understanding of the issue's root cause, and effectively retrieve a context via iterative search. The use of spectrum-based fault localization using tests further sharpens the context, as long as a test suite is available. Experiments on SWE-bench-lite (300 real-life GitHub issues) show increased efficacy in solving GitHub issues (19% on SWE-bench-lite), which is higher than the efficacy of the recently reported SWE-agent. In addition, AutoCodeRover achieved this efficacy with significantly lower cost (on average, $0.43 USD), compared to other baselines. We posit that our workflow enables autonomous software engineering, where, in future, auto-generated code from LLMs can be autonomously improved.
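
A hedged sketch of the structure-aware retrieval idea (not AutoCodeRover's actual implementation): index a project by classes and methods via the AST, so the agent can fetch code by name rather than by grepping raw files.

```python
import ast
from pathlib import Path

def index_project(root):
    """Map 'ClassName.method' and function names to their source snippets."""
    index = {}
    for path in Path(root).rglob("*.py"):
        try:
            source = path.read_text(encoding="utf-8")
            tree = ast.parse(source, filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef):
                for item in node.body:
                    if isinstance(item, ast.FunctionDef):
                        index[f"{node.name}.{item.name}"] = ast.get_source_segment(source, item)
            elif isinstance(node, ast.FunctionDef):
                # top-level functions (methods also appear under their bare name)
                index[node.name] = ast.get_source_segment(source, node)
    return index

# An agent tool call like search_method("Parser.parse") then reduces to a lookup:
# snippet = index_project("my_project").get("Parser.parse")
```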

Updated: 2024-07-25 16:54:41

Domains: cs.SE,cs.AI

Download: http://arxiv.org/abs/2404.05427v3

A Unified Framework for Model Editing

ROME and MEMIT are largely believed to be two different model editing algorithms, with the major difference between them being the ability to perform batched edits. In this paper, we unify these two algorithms under a single conceptual umbrella, optimizing for the same goal, which we call the preservation-memorization objective. ROME uses an equality constraint to optimize this objective, performing one edit at a time, whereas MEMIT employs a more flexible least-squares constraint that allows for batched edits. We generalize ROME and enable batched editing with an equality constraint in the form of EMMET - an Equality-constrained Mass Model Editing algorithm for Transformers, a new batched memory-editing algorithm. EMMET can perform batched edits with a batch size of up to 10,000, with very similar performance to MEMIT across multiple dimensions. With the introduction of EMMET, we truly unify ROME and MEMIT and show that both algorithms are equivalent in terms of their optimization objective, their abilities (singular and batched editing), their model editing performance and their limitations.
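
Written out, the shared objective reads as follows; this is our hedged transcription of the description above, with $W_0$ the original weights, $K_0$ the preserved keys, $(k_e, v_e)$ a single edit, and $(K_E, V_E)$ a batch of edits:

```latex
% Preservation-memorization, as we read it from the description above.
% ROME-style edit: preserve old behaviour, memorize via an equality constraint
\min_{\hat W} \; \lVert \hat W K_0 - W_0 K_0 \rVert_F^2
\qquad \text{s.t.} \qquad \hat W k_e = v_e

% MEMIT-style batch: relax memorization to a least-squares penalty
\min_{\hat W} \; \lVert \hat W K_0 - W_0 K_0 \rVert_F^2
  + \lambda \, \lVert \hat W K_E - V_E \rVert_F^2
```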

Updated: 2024-07-25 16:52:15

Domains: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2403.14236v4

Regurgitative Training: The Value of Real Data in Training Large Language Models

What happens if we train a new Large Language Model (LLM) using data that are at least partially generated by other LLMs? The explosive success of LLMs means that a substantial amount of content online will be generated by LLMs rather than humans, which will inevitably enter the training datasets of next-generation LLMs. We evaluate the implications of such "regurgitative training" on LLM performance. Through fine-tuning GPT-3.5 with data generated either by itself or by other LLMs in a machine translation task, we find strong evidence that regurgitative training clearly handicaps the performance of LLMs. The same performance loss of regurgitative training is observed on transformer models that we train from scratch. We find suggestive evidence that the performance disadvantage of regurgitative training can be attributed to at least two mechanisms: (1) higher error rates and (2) lower lexical diversity in LLM-generated data as compared to real data. Based on these mechanisms, we propose and evaluate three different strategies to mitigate the performance loss of regurgitative training. First, we devise data-driven metrics to gauge the quality of each LLM-generated data instance, and then carry out an ordered training process where high-quality data are added before low-quality ones. Second, we combine data generated by multiple different LLMs (as an attempt to increase lexical diversity). Third, we train an AI detection classifier to differentiate between LLM- and human-generated data, and include LLM-generated data in the order of resemblance to human-generated data. All three strategies can improve the performance of regurgitative training to some extent but are not always able to fully close the gap from training with real data. Our results highlight the value of real, human-generated data in training LLMs, which cannot be easily substituted by synthetic, LLM-generated data.
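
The first mitigation strategy amounts to a quality-ordered curriculum; a toy sketch with a deliberately naive placeholder metric (the paper's data-driven metrics are more involved):

```python
def order_by_quality(examples, score_fn):
    """examples: (source, target) pairs; higher score means better data."""
    return sorted(examples, key=lambda ex: score_fn(*ex), reverse=True)

def curriculum_batches(examples, score_fn, batch_size):
    ordered = order_by_quality(examples, score_fn)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]   # best data reaches the model first

# usage sketch:
data = [("Bonjour", "Hello"), ("Merci", "Thank yoo"), ("Oui", "Yes")]
score = lambda src, tgt: -abs(len(src) - len(tgt))  # placeholder quality metric
for batch in curriculum_batches(data, score, batch_size=2):
    print(batch)   # fine_tune(model, batch) would go here
```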

Updated: 2024-07-25 16:50:58

Domains: cs.CL,cs.AI,stat.ML

Download: http://arxiv.org/abs/2407.12835v2

AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction

Epitope identification is vital for antibody design yet challenging due to the inherent variability in antibodies. While many deep learning methods have been developed for general protein binding site prediction tasks, whether they work for epitope prediction remains an understudied research question. The challenge is also heightened by the lack of a consistent evaluation pipeline with sufficient dataset size and epitope diversity. We introduce a filtered antibody-antigen complex structure dataset, AsEP (Antibody-specific Epitope Prediction). AsEP is the largest of its kind and provides clustered epitope groups, allowing the community to develop and test novel epitope prediction methods. AsEP comes with an easy-to-use interface in Python and pre-built graph representations of each antibody-antigen complex while also supporting customizable embedding methods. Based on this new dataset, we benchmarked various representative general protein-binding site prediction methods and find that their performance is not as satisfactory as expected for epitope prediction. We thus propose a new method, WALLE, that leverages both protein language models and graph neural networks. WALLE demonstrates about a 5X performance gain over existing methods. Our empirical findings show that epitope prediction benefits from combining sequential embeddings provided by language models and geometrical information from graph representations, providing a guideline for future method design. In addition, we reformulate the task as bipartite link prediction, allowing easy model performance attribution and interpretability. We open-source our data and code at https://github.com/biochunan/AsEP-dataset.

Updated: 2024-07-25 16:43:56

Domains: cs.LG

Download: http://arxiv.org/abs/2407.18184v1

Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning

Inferring gene regulatory networks (GRNs) from single-cell RNA sequencing (scRNA-seq) data is a complex challenge that requires capturing the intricate relationships between genes and their regulatory interactions. In this study, we tackle this challenge by leveraging the single-cell BERT-based pre-trained transformer model (scBERT), trained on extensive unlabeled scRNA-seq data, to augment structured biological knowledge from existing GRNs. We introduce a novel joint graph learning approach that combines the rich contextual representations learned by pre-trained single-cell language models with the structured knowledge encoded in GRNs using graph neural networks (GNNs). By integrating these two modalities, our approach effectively reasons over both the gene expression level constraints provided by the scRNA-seq data and the structured biological knowledge inherent in GRNs. We evaluate our method on human cell benchmark datasets from the BEELINE study with cell type-specific ground truth networks. The results demonstrate superior performance over current state-of-the-art baselines, offering a deeper understanding of cellular regulatory mechanisms.

Updated: 2024-07-25 16:42:08

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.18181v1

PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations

In this work, we introduce PianoMime, a framework for training a piano-playing agent using internet demonstrations. The internet is a promising source of large-scale demonstrations for training our robot agents. In particular, for the case of piano-playing, Youtube is full of videos of professional pianists playing a wide variety of songs. In our work, we leverage these demonstrations to learn a generalist piano-playing agent capable of playing any arbitrary song. Our framework is divided into three parts: a data preparation phase to extract the informative features from the Youtube videos, a policy learning phase to train song-specific expert policies from the demonstrations and a policy distillation phase to distil the policies into a single generalist agent. We explore different policy designs to represent the agent and evaluate the influence of the amount of training data on the generalization capability of the agent to novel songs not available in the dataset. We show that we are able to learn a policy with up to a 56% F1 score on unseen songs.

Updated: 2024-07-25 16:37:07

Domains: cs.CV,cs.AI,cs.RO

Download: http://arxiv.org/abs/2407.18178v1

Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers

Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs). However, ViT models are often computation-intensive for efficient deployment on resource-limited edge devices. This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs, to design efficient ViT models for hardware implementation while preserving the accuracy. First, Quasar-ViT trains a supernet using our row-wise flexible mixed-precision quantization scheme, mixed-precision weight entanglement, and supernet layer scaling techniques. Then, it applies an efficient hardware-oriented search algorithm, integrated with hardware latency and resource modeling, to determine a series of optimal subnets from supernet under different inference latency targets. Finally, we propose a series of model-adaptive designs on the FPGA platform to support the architecture search and mitigate the gap between the theoretical computation reduction and the practical inference speedup. Our searched models achieve 101.5, 159.6, and 251.6 frames-per-second (FPS) inference speed on the AMD/Xilinx ZCU102 FPGA with 80.4%, 78.6%, and 74.9% top-1 accuracy, respectively, for the ImageNet dataset, consistently outperforming prior works.

Updated: 2024-07-25 16:35:46

Domains: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.18175v1

RIDA: A Robust Attack Framework on Incomplete Graphs

Graph Neural Networks (GNNs) are vital in data science but are increasingly susceptible to adversarial attacks. To help researchers develop more robust GNN models, it's essential to focus on designing strong attack models as foundational benchmarks and guiding references. Among adversarial attacks, gray-box poisoning attacks are noteworthy due to their effectiveness and fewer constraints. These attacks exploit GNNs' need for retraining on updated data, thereby impacting their performance by perturbing these datasets. However, current research overlooks the real-world scenario of incomplete graphs. To address this gap, we introduce the Robust Incomplete Deep Attack Framework (RIDA). It is the first algorithm for robust gray-box poisoning attacks on incomplete graphs. The approach innovatively aggregates distant vertex information and ensures powerful data utilization. Extensive tests against 9 SOTA baselines on 3 real-world datasets demonstrate RIDA's superiority in handling incompleteness and high attack performance on the incomplete graph.

Updated: 2024-07-25 16:33:35

Domains: cs.LG

Download: http://arxiv.org/abs/2407.18170v1

Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models

Recent advancements in massively multilingual machine translation systems have significantly enhanced translation accuracy; however, even the best performing systems still generate hallucinations, severely impacting user trust. Detecting hallucinations in Machine Translation (MT) remains a critical challenge, particularly since existing methods excel with High-Resource Languages (HRLs) but exhibit substantial limitations when applied to Low-Resource Languages (LRLs). This paper evaluates hallucination detection approaches using Large Language Models (LLMs) and semantic similarity within massively multilingual embeddings. Our study spans 16 language directions, covering HRLs and LRLs with diverse scripts. We find that the choice of model is essential for performance. On average, for HRLs, Llama3-70B outperforms the previous state of the art by as much as 0.16 MCC (Matthews Correlation Coefficient). However, for LRLs we observe that Claude Sonnet outperforms other LLMs on average by 0.03 MCC. The key takeaway from our study is that LLMs can achieve performance comparable to or even better than previously proposed models, despite not being explicitly trained for any machine translation task. However, their advantage is less significant for LRLs.
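
The semantic-similarity baseline is easy to sketch; the encoder and threshold below are illustrative assumptions, not the paper's exact setup:

```python
# Embed source and translation in a shared multilingual space and flag pairs
# whose cosine similarity falls below a threshold as suspected hallucinations.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def flag_hallucination(src_vec, tgt_vec, threshold=0.4):
    """True when the translation drifts too far from the source meaning."""
    return cosine(src_vec, tgt_vec) < threshold

# With a real multilingual encoder, e.g. LaBSE via sentence-transformers:
#   from sentence_transformers import SentenceTransformer
#   enc = SentenceTransformer("sentence-transformers/LaBSE")
#   src_vec, tgt_vec = enc.encode([source_text, translated_text])
```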

Updated: 2024-07-25 16:31:39

Domains: cs.CL,cs.AI,I.2.7

Download: http://arxiv.org/abs/2407.16470v2

Light Curve Classification with DistClassiPy: a new distance-based classifier

The rise of synoptic sky surveys has ushered in an era of big data in time-domain astronomy, making data science and machine learning essential tools for studying celestial objects. While tree-based models (e.g. Random Forests) and deep learning models dominate the field, we explore the use of different distance metrics to aid in the classification of astrophysical objects. We developed DistClassiPy, a new distance-metric-based classifier. The direct use of distance metrics is unexplored in time-domain astronomy, but distance-based methods can help make classification more interpretable and decrease computational costs. In particular, we applied DistClassiPy to classify light curves of variable stars, comparing the distances between objects of different classes. Using 18 distance metrics on a catalog of 6,000 variable stars across 10 classes, we demonstrate classification and dimensionality reduction. Our classifier matches state-of-the-art performance but has lower computational requirements and improved interpretability. Additionally, DistClassiPy can be tailored to specific objects by identifying the most effective distance metric for that classification. To facilitate broader applications within and beyond astronomy, we have made DistClassiPy open-source and available at https://pypi.org/project/distclassipy/.
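
A hedged sketch of the core idea behind a distance-based classifier, not DistClassiPy's actual API: represent each class by a template and assign objects to the class with the nearest template under a configurable metric.

```python
import numpy as np
from scipy.spatial.distance import cdist

class DistanceClassifier:
    def __init__(self, metric="canberra"):
        self.metric = metric   # any metric scipy.spatial.distance supports

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # per-class template: the feature-wise median of the training objects
        self.templates_ = np.stack([np.median(X[y == c], axis=0)
                                    for c in self.classes_])
        return self

    def predict(self, X):
        D = cdist(X, self.templates_, metric=self.metric)
        return self.classes_[D.argmin(axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
clf = DistanceClassifier(metric="cityblock").fit(X, y)
print((clf.predict(X) == y).mean())   # interpretable and cheap to evaluate
```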

Updated: 2024-07-25 16:27:49

Domains: astro-ph.IM,astro-ph.SR,cs.LG

Download: http://arxiv.org/abs/2403.12120v2

Longhorn: State Space Models are Amortized Online Learners

The most fundamental capability of modern AI methods such as Large Language Models (LLMs) is the ability to predict the next token in a long sequence of tokens, known as "sequence modeling." Although the Transformer model is the current dominant approach to sequence modeling, its quadratic computational cost with respect to sequence length is a significant drawback. State-space models (SSMs) offer a promising alternative due to their linear decoding efficiency and high parallelizability during training. However, existing SSMs often rely on seemingly ad hoc linear recurrence designs. In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems. This approach links SSM design to formulating precise online learning objectives, with state transition rules derived from optimizing these objectives. Based on this insight, we introduce a novel deep SSM architecture based on the implicit update for optimizing an online regression objective. Our experimental results show that our models outperform state-of-the-art SSMs, including the Mamba model, on standard sequence modeling benchmarks and language modeling tasks.
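
A toy sketch of why SSM decoding is constant-time per token; the diagonal, input-gated recurrence below is generic, not Longhorn's specific implicit-update rule.

```python
# The state is a fixed-size summary updated by a linear recurrence, which one
# can read as an (amortized) online-learning update on running statistics.
import numpy as np

def ssm_decode_step(s, x_t, a_t, b_t, c_t):
    """s: (d,) state; returns (new_state, scalar output)."""
    s = a_t * s + b_t * x_t      # forget a little, write the new token
    return s, float(c_t @ s)     # read out

rng = np.random.default_rng(0)
d, T = 8, 16
s = np.zeros(d)
for _ in range(T):               # constant memory, no attention over history
    x_t = rng.normal()
    a_t = 1.0 / (1.0 + np.exp(-rng.normal(size=d)))  # input-dependent decay in (0,1)
    s, y = ssm_decode_step(s, x_t, a_t, rng.normal(size=d), rng.normal(size=d))
print(s.shape, y)
```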

Updated: 2024-07-25 16:24:59

Domains: cs.LG

Download: http://arxiv.org/abs/2407.14207v2

Harmonic LLMs are Trustworthy

We introduce an intuitive method to test the robustness (stability and explainability) of any black-box LLM in real-time via its local deviation from harmonicity, denoted as $\gamma$. To the best of our knowledge this is the first completely model-agnostic and unsupervised method of measuring the robustness of any given response from an LLM, based upon the model itself conforming to a purely mathematical standard. To show general application and immediacy of results, we measure $\gamma$ in 10 popular LLMs (ChatGPT, Claude-2.1, Claude3.0, GPT-4, GPT-4o, Smaug-72B, Mixtral-8x7B, Llama2-7B, Mistral-7B and MPT-7B) across thousands of queries in three objective domains: WebQA, ProgrammingQA, and TruthfulQA. Across all models and domains tested, human annotation confirms that $\gamma \to 0$ indicates trustworthiness, and conversely searching higher values of $\gamma$ easily exposes examples of hallucination, a fact that enables efficient adversarial prompt generation through stochastic gradient ascent in $\gamma$. The low-$\gamma$ leaders among the models in the respective domains are GPT-4o, GPT-4, and Smaug-72B, providing evidence that mid-size open-source models can win out against large commercial models.

Updated: 2024-07-25 16:16:46

Domains: cs.LG,cs.AI,cs.CL,cs.HC

Download: http://arxiv.org/abs/2404.19708v2

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models

Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale. Moreover, these bounds are obtained through restrictive compression techniques, bounding compressed models that generate low-quality text. Additionally, the tightness of these existing bounds depends on the number of IID documents in a training set rather than the much larger number of non-IID constituent tokens, leaving untapped potential for tighter bounds. In this work, we instead use properties of martingales to derive generalization bounds that benefit from the vast number of tokens in LLM training sets. Since a dataset contains far more tokens than documents, our generalization bounds not only tolerate but actually benefit from far less restrictive compression schemes. With Monarch matrices, Kronecker factorizations, and post-training quantization, we achieve non-vacuous generalization bounds for LLMs as large as LLaMA2-70B. Unlike previous approaches, our work achieves the first non-vacuous bounds for models that are deployed in practice and generate high-quality text.

Updated: 2024-07-25 16:13:58

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2407.18158v1

Enhanced Privacy Bound for Shuffle Model with Personalized Privacy

The shuffle model of Differential Privacy (DP) is an enhanced privacy protocol which introduces an intermediate trusted server between local users and a central data curator. It significantly amplifies the central DP guarantee by anonymizing and shuffling the local randomized data. Yet, deriving a tight privacy bound is challenging due to its complicated randomization protocol. While most existing work is focused on unified local privacy settings, this work focuses on deriving the central privacy bound for a more practical setting where personalized local privacy is required by each user. To bound the privacy after shuffling, we first need to capture the probability of each user generating clones of the neighboring data points. Second, we need to quantify the indistinguishability between two distributions of the number of clones on neighboring datasets. Existing works either inaccurately capture the probability, or underestimate the indistinguishability between neighboring datasets. Motivated by this, we develop a more precise analysis, which yields a general and tighter bound for arbitrary DP mechanisms. Firstly, we derive the clone-generating probability by hypothesis testing from a randomizer-specific perspective, which leads to a more accurate characterization of the probability. Secondly, we analyze the indistinguishability in the context of $f$-DP, where the convexity of the distributions is leveraged to achieve a tighter privacy bound. Theoretical and numerical results demonstrate that our bound remarkably outperforms the existing results in the literature.

Updated: 2024-07-25 16:11:56

Domains: cs.CR,cs.DB

Download: http://arxiv.org/abs/2407.18157v1

No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO

Reinforcement learning (RL) is inherently rife with non-stationarity since the states and rewards the agent observes during training depend on its changing policy. Therefore, networks in deep RL must be capable of adapting to new observations and fitting new targets. However, previous works have observed that networks in off-policy deep value-based methods exhibit a decrease in representation rank, often correlated with an inability to continue learning or a collapse in performance. Although this phenomenon has generally been attributed to neural network learning under non-stationarity, it has been overlooked in on-policy policy optimization methods which are often thought capable of training indefinitely. In this work, we empirically study representation dynamics in Proximal Policy Optimization (PPO) on the Atari and MuJoCo environments, revealing that PPO agents are also affected by feature rank deterioration and loss of plasticity. We show that this is aggravated with stronger non-stationarity, ultimately driving the actor's performance to collapse, regardless of the performance of the critic. We ask why the trust region, specific to methods like PPO, cannot alleviate or prevent the collapse. We find that there is a connection between representation collapse and the degradation of the trust region, one exacerbating the other, and present Proximal Feature Optimization (PFO), a novel auxiliary loss that, along with other interventions, shows that regularizing the representation dynamics improves the performance of PPO agents.
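
A hedged sketch of what a PFO-style auxiliary term could look like (the paper's exact formulation may differ): penalize drift of the actor's features from their pre-update values, regularizing representation dynamics alongside the usual PPO loss.

```python
import torch

def pfo_auxiliary_loss(features_new, features_old):
    """features_*: (batch, d) activations of the penultimate actor layer."""
    return ((features_new - features_old.detach()) ** 2).mean()

# inside the PPO update (sketch; coefficients are illustrative):
#   total_loss = ppo_clip_loss + vf_coef * value_loss \
#                - ent_coef * entropy + pfo_coef * pfo_auxiliary_loss(phi, phi_old)
feat_new = torch.randn(32, 64, requires_grad=True)
feat_old = torch.randn(32, 64)
print(pfo_auxiliary_loss(feat_new, feat_old))
```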

Updated: 2024-07-25 16:04:49

Domains: cs.LG

Download: http://arxiv.org/abs/2405.00662v2

Evaluating the design space of diffusion-based generative models

Most existing theoretical investigations of the accuracy of diffusion models, albeit significant, assume the score function has been approximated to a certain accuracy, and then use this a priori bound to control the error of generation. This article instead provides a first quantitative understanding of the whole generation process, i.e., both training and sampling. More precisely, it conducts a non-asymptotic convergence analysis of denoising score matching under gradient descent. In addition, a refined sampling error analysis for variance exploding models is also provided. The combination of these two results yields a full error analysis, which elucidates (again, but this time theoretically) how to design the training and sampling processes for effective generation. For instance, our theory implies a preference toward noise distribution and loss weighting in training that qualitatively agree with the ones used in [Karras et al. 2022]. It also provides perspectives on the choices of time and variance schedules in sampling: when the score is well trained, the design in [Song et al. 2020] is more preferable, but when it is less trained, the design in [Karras et al. 2022] becomes more preferable.

Updated: 2024-07-25 16:01:04

Domains: cs.LG,math.DS,math.OC,math.PR,stat.ML

Download: http://arxiv.org/abs/2406.12839v2

StraightLine: An End-to-End Resource-Aware Scheduler for Machine Learning Application Requests

The life cycle of machine learning (ML) applications consists of two stages: model development and model deployment. However, traditional ML systems (e.g., training-specific or inference-specific systems) focus on one particular stage or phase of the life cycle of ML applications. These systems often aim at optimizing model training or accelerating model inference, and they frequently assume homogeneous infrastructure, which may not always reflect real-world scenarios that include cloud data centers, local servers, containers, and serverless platforms. We present StraightLine, an end-to-end resource-aware scheduler that schedules the optimal resources (e.g., container, virtual machine, or serverless) for different ML application requests in a hybrid infrastructure. The key innovation is an empirical dynamic placing algorithm that intelligently places requests based on their unique characteristics (e.g., request frequency, input data size, and data distribution). In contrast to existing ML systems, StraightLine offers end-to-end resource-aware placement, thereby it can significantly reduce response time and failure rate for model deployment when facing different computing resources in the hybrid infrastructure.

Updated: 2024-07-25 15:58:56

Domains: cs.DC,cs.LG

Download: http://arxiv.org/abs/2407.18148v1

Understanding the Security Benefits and Overheads of Emerging Industry Solutions to DRAM Read Disturbance

We present the first rigorous security, performance, energy, and cost analyses of the state-of-the-art on-DRAM-die read disturbance mitigation method, Per Row Activation Counting (PRAC), described in JEDEC DDR5 specification's April 2024 update. Unlike prior state-of-the-art that advises the memory controller to periodically issue refresh management (RFM) commands, which provides the DRAM chip with time to perform refreshes, PRAC introduces a new back-off signal. PRAC's back-off signal propagates from the DRAM chip to the memory controller and forces the memory controller to 1) stop serving requests and 2) issue RFM commands. As a result, RFM commands are issued when needed as opposed to periodically, reducing RFM's overheads. We analyze PRAC in four steps. First, we define an adversarial access pattern that represents the worst-case for PRAC's security. Second, we investigate PRAC's configurations and security implications. Our analyses show that PRAC can be configured for secure operation as long as no bitflip occurs before accessing a memory location 10 times. Third, we evaluate the performance impact of PRAC and compare it against prior works using Ramulator 2.0. Our analysis shows that while PRAC incurs less than 13% performance overhead for today's DRAM chips, its performance overheads can reach up to 94% for future DRAM chips that are more vulnerable to read disturbance bitflips. Fourth, we define an availability adversarial access pattern that exacerbates PRAC's performance overhead to perform a memory performance attack, demonstrating that such an adversarial pattern can hog up to 94% of DRAM throughput and degrade system throughput by up to 95%. We discuss PRAC's implications on future systems and foreshadow future research directions. To aid future research, we open-source our implementations and scripts at https://github.com/CMU-SAFARI/ramulator2.
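
A behavioural sketch of the mechanism as described above; the threshold and the refresh handling are illustrative, not JEDEC's parameters.

```python
# Each row keeps an activation counter; crossing a threshold raises the
# back-off signal, the controller stops serving requests and issues RFM
# commands, and the mitigated row's counter resets. RFMs therefore happen
# on demand rather than on a fixed period.
from collections import defaultdict

class PracDram:
    def __init__(self, threshold=32):
        self.threshold = threshold
        self.counters = defaultdict(int)

    def activate(self, row):
        self.counters[row] += 1
        return self.counters[row] >= self.threshold   # back-off signal

class MemoryController:
    def __init__(self, dram):
        self.dram = dram

    def access(self, row):
        if self.dram.activate(row):    # back-off received:
            self.issue_rfm(row)        # stop serving, mitigate now

    def issue_rfm(self, row):
        # refresh the aggressor's neighbours, then clear its counter
        self.dram.counters[row] = 0

mc = MemoryController(PracDram(threshold=4))
for _ in range(10):                    # a hammering pattern triggers two RFMs
    mc.access(row=7)
print(mc.dram.counters[7])             # 2: counter was reset twice on demand
```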

Updated: 2024-07-25 15:55:15

Domains: cs.CR,cs.AR

Download: http://arxiv.org/abs/2406.19094v2

Implementing and Evaluating Security in O-RAN: Interfaces, Intelligence, and Platforms

The Open Radio Access Network (RAN) is a networking paradigm that builds on top of cloud-based, multi-vendor, open and intelligent architectures to shape the next generation of cellular networks for 5G and beyond. While this new paradigm comes with many advantages in terms of observability and reconfigurability of the network, it inevitably expands the threat surface of cellular systems and can potentially expose its components to several cyber attacks, thus making securing O-RAN networks a necessity. In this paper, we explore the security aspects of O-RAN systems by focusing on the specifications and architectures proposed by the O-RAN Alliance. We address the problem of securing O-RAN systems with a holistic perspective, including considerations on the open interfaces used to interconnect the different O-RAN components, on the overall platform, and on the intelligence used to monitor and control the network. For each focus area we identify threats, discuss relevant solutions to address these issues, and demonstrate experimentally how such solutions can effectively defend O-RAN systems against selected cyber attacks. This article is the first work to approach the security of O-RAN holistically and with experimental evidence obtained on a state-of-the-art programmable O-RAN platform, thus providing a unique guideline for researchers in the field.

Updated: 2024-07-25 15:52:43

Domains: cs.CR,cs.NI,cs.SY,eess.SP,eess.SY

Download: http://arxiv.org/abs/2304.11125v3

Taxonomy-Aware Continual Semantic Segmentation in Hyperbolic Spaces for Open-World Perception

Semantic segmentation models are typically trained on a fixed set of classes, limiting their applicability in open-world scenarios. Class-incremental semantic segmentation aims to update models with emerging new classes while preventing catastrophic forgetting of previously learned ones. However, existing methods impose strict rigidity on old classes, reducing their effectiveness in learning new incremental classes. In this work, we propose Taxonomy-Oriented Poincaré-regularized Incremental-Class Segmentation (TOPICS) that learns feature embeddings in hyperbolic space following explicit taxonomy-tree structures. This supervision provides plasticity for old classes, updating ancestors based on new classes while integrating new classes at fitting positions. Additionally, we maintain implicit class relational constraints on the geometric basis of the Poincaré ball. This ensures that the latent space can continuously adapt to new constraints while maintaining a robust structure to combat catastrophic forgetting. We also establish eight realistic incremental learning protocols for autonomous driving scenarios, where novel classes can originate from known classes or the background. Extensive evaluations of TOPICS on the Cityscapes and Mapillary Vistas 2.0 benchmarks demonstrate that it achieves state-of-the-art performance. We make the code and trained models publicly available at http://topics.cs.uni-freiburg.de.
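
For reference, the hyperbolic machinery this builds on: distance in the Poincaré ball, where taxonomy leaves placed near the boundary are far apart while shared ancestors sit near the origin.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-7):
    """Geodesic distance between points with ||u||, ||v|| < 1."""
    uu = np.clip(np.sum(u * u), 0, 1 - eps)
    vv = np.clip(np.sum(v * v), 0, 1 - eps)
    duv = np.sum((u - v) ** 2)
    return float(np.arccosh(1 + 2 * duv / ((1 - uu) * (1 - vv))))

root = np.zeros(2)                        # ancestor near the origin
leaf_a = np.array([0.90, 0.0])            # sibling leaves near the boundary
leaf_b = np.array([0.0, 0.90])
print(poincare_distance(root, leaf_a))    # moderate (~3)
print(poincare_distance(leaf_a, leaf_b))  # much larger (~5): leaves separate cleanly
```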

Updated: 2024-07-25 15:49:26

Domains: cs.CV,cs.AI,cs.RO

Download: http://arxiv.org/abs/2407.18145v1

Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation

Entropy Regularisation is a widely adopted technique that enhances policy optimisation performance and stability. A notable form of entropy regularisation is augmenting the objective with an entropy term, thereby simultaneously optimising the expected return and the entropy. This framework, known as maximum entropy reinforcement learning (MaxEnt RL), has shown theoretical and empirical successes. However, its practical application in straightforward on-policy actor-critic settings remains surprisingly underexplored. We hypothesise that this is due to the difficulty of managing the entropy reward in practice. This paper proposes a simple method of separating the entropy objective from the MaxEnt RL objective, which facilitates the implementation of MaxEnt RL in on-policy settings. Our empirical evaluations demonstrate that extending Proximal Policy Optimisation (PPO) and Trust Region Policy Optimisation (TRPO) within the MaxEnt framework improves policy optimisation performance in both MuJoCo and Procgen tasks. Additionally, our results highlight MaxEnt RL's capacity to enhance generalisation.

Updated: 2024-07-25 15:48:24

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.18143v1

IRIS: Wireless Ring for Vision-based Smart Home Interaction

Integrating cameras into wireless smart rings has been challenging due to size and power constraints. We introduce IRIS, the first wireless vision-enabled smart ring system for smart home interactions. Equipped with a camera, Bluetooth radio, inertial measurement unit (IMU), and an onboard battery, IRIS meets the small size, weight, and power (SWaP) requirements for ring devices. IRIS is context-aware, adapting its gesture set to the detected device, and can last for 16-24 hours on a single charge. IRIS leverages the scene semantics to achieve instance-level device recognition. In a study involving 23 participants, IRIS consistently outpaced voice commands, with a higher proportion of participants expressing a preference for IRIS over voice commands regarding toggling a device's state, granular control, and social acceptability. Our work pushes the boundary of what is possible with ring form-factor devices, addressing system challenges and opening up novel interaction capabilities.

Updated: 2024-07-25 15:45:17

Domains: cs.HC,cs.ET,cs.LG,eess.IV

Download: http://arxiv.org/abs/2407.18141v1

$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

Learning good representations involves capturing the diverse ways in which data samples relate. Contrastive loss - an objective matching related samples - underlies methods from self-supervised to multimodal learning. Contrastive losses, however, can be viewed more broadly as modifying a similarity graph to indicate how samples should relate in the embedding space. This view reveals a shortcoming in contrastive learning: the similarity graph is binary, as only one sample is the related positive sample. Crucially, similarities across samples are ignored. Based on this observation, we revise the standard contrastive loss to explicitly encode how a sample relates to others. We experiment with this new objective, called $\mathbb{X}$-Sample Contrastive, to train vision models based on similarities in class or text caption descriptions. Our study spans three scales: ImageNet-1k with 1 million, CC3M with 3 million, and CC12M with 12 million samples. The representations learned via our objective outperform both contrastive self-supervised and vision-language models trained on the same data across a range of tasks. When training on CC12M, we outperform CLIP by $0.6\%$ on both ImageNet and ImageNet Real. Our objective appears to work particularly well in lower-data regimes, with gains over CLIP of $16.8\%$ on ImageNet and $18.1\%$ on ImageNet Real when training with CC3M. Finally, our objective seems to encourage the model to learn representations that separate objects from their attributes and backgrounds, with gains of $3.3$-$5.6$\% over CLIP on ImageNet9. We hope the proposed solution takes a small step towards developing richer learning objectives for understanding sample relations in foundation models.
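
The move from a binary to a graph-weighted objective is compact to write down; a hedged sketch (the paper's exact loss and normalization may differ):

```python
# Instead of one one-hot positive per anchor, build a soft target distribution
# from a similarity graph and train embeddings with cross-entropy against it.
import torch
import torch.nn.functional as F

def x_sample_contrastive_loss(z, graph_sim, tau=0.1):
    """z: (n, d) embeddings; graph_sim: (n, n) nonnegative similarities."""
    z = F.normalize(z, dim=1)
    logits = z @ z.T / tau
    logits = logits - 1e9 * torch.eye(len(z))       # mask self-similarity
    targets = graph_sim * (1 - torch.eye(len(z)))   # zero the diagonal
    targets = targets / targets.sum(dim=1, keepdim=True).clamp_min(1e-8)
    return -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

z = torch.randn(8, 16, requires_grad=True)
labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
graph = (labels[:, None] == labels[None, :]).float()  # caption similarities also work
x_sample_contrastive_loss(z, graph).backward()
print(z.grad.shape)
```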

Updated: 2024-07-25 15:38:16

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.18134v1

Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic

Recent advancements have significantly enhanced the capabilities of Multimodal Large Language Models (MLLMs) in generating and understanding image-to-text content. Despite these successes, progress is predominantly limited to English due to the scarcity of high quality multimodal resources in other languages. This limitation impedes the development of competitive models in languages such as Arabic. To alleviate this situation, we introduce an efficient Arabic multimodal assistant, dubbed Dallah, that utilizes an advanced language model based on LLaMA-2 to facilitate multimodal interactions. Dallah demonstrates state-of-the-art performance in Arabic MLLMs. Through fine-tuning six Arabic dialects, Dallah showcases its capability to handle complex dialectal interactions incorporating both textual and visual elements. The model excels in two benchmark tests: one evaluating its performance on Modern Standard Arabic (MSA) and another specifically designed to assess dialectal responses. Beyond its robust performance in multimodal interaction tasks, Dallah has the potential to pave the way for further development of dialect-aware Arabic MLLMs.

Updated: 2024-07-25 15:36:48

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.18129v1

Looking at Model Debiasing through the Lens of Anomaly Detection

It is widely recognized that deep neural networks are sensitive to bias in the data. This means that during training these models are likely to learn spurious correlations between data and labels, resulting in limited generalization abilities and low performance. In this context, model debiasing approaches can be devised aiming at reducing the model's dependency on such unwanted correlations, either leveraging the knowledge of bias information or not. In this work, we focus on the latter and more realistic scenario, showing the importance of accurately predicting the bias-conflicting and bias-aligned samples to obtain compelling performance in bias mitigation. On this ground, we propose to conceive the problem of model bias from an out-of-distribution perspective, introducing a new bias identification method based on anomaly detection. We claim that when data is mostly biased, bias-conflicting samples can be regarded as outliers with respect to the bias-aligned distribution in the feature space of a biased model, thus allowing for precisely detecting them with an anomaly detection method. Coupling the proposed bias identification approach with bias-conflicting data upsampling and augmentation in a two-step strategy, we reach state-of-the-art performance on synthetic and real benchmark datasets. Ultimately, our proposed approach shows that the data bias issue does not necessarily require complex debiasing methods, given that an accurate bias identification procedure is defined.
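
A minimal sketch of the two-step strategy, assuming penultimate-layer features extracted from a vanilla (biased) model; scikit-learn's IsolationForest stands in for whichever anomaly detector is used, and the contamination rate and upsampling factor are illustrative.

import numpy as np
from sklearn.ensemble import IsolationForest

def flag_bias_conflicting(features, contamination=0.05, seed=0):
    # Step 1: bias-conflicting samples are treated as outliers of the
    # bias-aligned feature distribution of the biased model.
    detector = IsolationForest(contamination=contamination, random_state=seed)
    return detector.fit_predict(features) == -1          # -1 marks outliers

def upsampled_indices(mask, factor=10, seed=0):
    # Step 2: oversample the flagged minority for debiased re-training.
    rng = np.random.default_rng(seed)
    conflicting = np.flatnonzero(mask)
    extra = rng.choice(conflicting, size=factor * len(conflicting), replace=True)
    return np.concatenate([np.arange(len(mask)), extra])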

Updated: 2024-07-25 15:33:00

Categories: cs.LG,cs.CV,I.4; I.5

Download: http://arxiv.org/abs/2407.17449v2

Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images

In the last few years, deep neural networks have been extensively applied in the medical domain for different tasks, ranging from image classification and segmentation to landmark detection. However, the application of these technologies in the medical domain is often hindered by data scarcity, both in terms of available annotations and images. This study introduces a new self-supervised pre-training protocol based on diffusion models for landmark detection in x-ray images. Our results show that the proposed self-supervised framework can provide accurate landmark detection with a minimal number of annotated training images (up to 50), outperforming ImageNet-supervised pre-training and state-of-the-art self-supervised pre-training methods on three popular x-ray benchmark datasets. To our knowledge, this is the first exploration of diffusion models for self-supervised learning in landmark detection, which may offer a valuable pre-training approach in few-shot regimes for mitigating data scarcity.

Updated: 2024-07-25 15:32:59

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.18125v1

Generative Learning of Continuous Data by Tensor Networks

Beyond their origin in modeling many-body quantum systems, tensor networks have emerged as a promising class of models for solving machine learning problems, notably in unsupervised generative learning. While possessing many desirable features arising from their quantum-inspired nature, tensor network generative models have previously been largely restricted to binary or categorical data, limiting their utility in real-world modeling problems. We overcome this by introducing a new family of tensor network generative models for continuous data, which are capable of learning from distributions containing continuous random variables. We develop our method in the setting of matrix product states, first deriving a universal expressivity theorem proving the ability of this model family to approximate any reasonably smooth probability density function with arbitrary precision. We then benchmark the performance of this model on several synthetic and real-world datasets, finding that the model learns and generalizes well on distributions of continuous and discrete variables. We develop methods for modeling different data domains, and introduce a trainable compression layer which is found to increase model performance given limited memory or computational resources. Overall, our methods give important theoretical and empirical evidence of the efficacy of quantum-inspired methods for the rapidly growing field of generative learning.

Updated: 2024-07-25 15:25:27

Categories: cs.LG,cond-mat.stat-mech,quant-ph,stat.ML

Download: http://arxiv.org/abs/2310.20498v2

Unsupervised Training of Neural Cellular Automata on Edge Devices

The disparity in access to machine learning tools for medical imaging across different regions significantly limits the potential for universal healthcare innovation, particularly in remote areas. Our research addresses this issue by implementing Neural Cellular Automata (NCA) training directly on smartphones for accessible X-ray lung segmentation. We confirm the practicality and feasibility of deploying and training these advanced models on five Android devices, improving medical diagnostics accessibility and bridging the tech divide to extend machine learning benefits in medical imaging to low- and middle-income countries (LMICs). We further enhance this approach with an unsupervised adaptation method using the novel Variance-Weighted Segmentation Loss (VWSL), which efficiently learns from unlabeled data by minimizing the variance from multiple NCA predictions. This strategy notably improves model adaptability and performance across diverse medical imaging contexts without the need for extensive computational resources or labeled datasets, effectively lowering the participation threshold. Our methodology, tested on three multisite X-ray datasets -- Padchest, ChestX-ray8, and MIMIC-III -- demonstrates improvements in segmentation Dice accuracy by 0.7 to 2.8%, compared to the classic Med-NCA. Additionally, in extreme cases where no digital copy is available and images must be captured by a phone from an X-ray lightbox or monitor, VWSL enhances Dice accuracy by 5-20%, demonstrating the method's robustness even with suboptimal image sources.
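
The abstract does not spell out VWSL, but its core idea, minimising the variance of multiple stochastic NCA predictions on unlabeled images, can be sketched as below; nca is assumed to be a stochastic callable returning a (B, H, W) probability map (asynchronous cell updates make repeated runs differ), and the exact weighting scheme is in the paper.

import torch

def variance_consistency_loss(nca, x, n_runs=4):
    # Run the stochastic NCA several times on the same unlabeled batch and
    # penalise per-pixel disagreement between the runs.
    preds = torch.stack([nca(x) for _ in range(n_runs)], dim=0)  # (M, B, H, W)
    return preds.var(dim=0).mean()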

Updated: 2024-07-25 15:21:54

Categories: cs.LG

Download: http://arxiv.org/abs/2407.18114v1

MapTune: Advancing ASIC Technology Mapping via Reinforcement Learning Guided Library Tuning

Technology mapping involves mapping logical circuits to a library of cells. Traditionally, the full technology library is used, leading to a large search space and potential overhead. Motivated by randomly sampled technology mapping case studies, we propose the MapTune framework, which addresses this challenge by utilizing reinforcement learning to make design-specific choices during cell selection. By learning from the environment, MapTune refines the cell selection process, resulting in a reduced search space and potentially improved mapping quality. The effectiveness of MapTune is evaluated on a wide range of benchmarks, technology libraries, and technology mappers. The experimental results demonstrate that MapTune achieves higher mapping accuracy and reduces delay/area across diverse circuit designs, technology libraries, and mappers. The paper also discusses Pareto-optimal exploration and confirms the persistent delay-area trade-off. On the ISCAS 85/89, ITC/ISCAS 99, VTR8.0, and EPFL benchmark suites, post-technology-mapping and post-sizing quality-of-results (QoR) improve significantly, with an average Area-Delay Product (ADP) improvement of 22.54\% across all exploration settings in MapTune. The improvements hold consistently across four technologies (7nm, 45nm, 130nm, and 180nm) and two mappers.

Updated: 2024-07-25 15:18:47

Categories: cs.AR,cs.AI

Download: http://arxiv.org/abs/2407.18110v1

Graph Neural Ordinary Differential Equations for Coarse-Grained Socioeconomic Dynamics

We present a data-driven machine-learning approach for modeling space-time socioeconomic dynamics. Through coarse-graining fine-scale observations, our modeling framework simplifies these complex systems to a set of tractable mechanistic relationships -- in the form of ordinary differential equations -- while preserving critical system behaviors. This approach allows for expedited 'what if' studies and sensitivity analyses, essential for informed policy-making. Our findings, from a case study of Baltimore, MD, indicate that this machine learning-augmented coarse-grained model serves as a powerful instrument for deciphering the complex interactions between social factors, geography, and exogenous stressors, offering a valuable asset for system forecasting and resilience planning.
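
A minimal sketch of the modelling idea, assuming torchdiffeq for integration, a row-normalised adjacency between coarse-grained spatial units, and per-region state vectors of socioeconomic indicators; the paper's actual architecture may differ.

import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumed dependency: pip install torchdiffeq

class GraphODEFunc(nn.Module):
    # dy/dt = f(y, A): a one-layer message-passing vector field over regions.
    def __init__(self, adj, dim, hidden=64):
        super().__init__()
        self.adj = adj                                   # (N, N), row-normalised
        self.net = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, t, y):                             # y: (N, dim)
        neighbours = self.adj @ y                        # aggregate neighbour state
        return self.net(torch.cat([y, neighbours], dim=-1))

# Usage sketch: integrate the learned dynamics from observed initial conditions.
# func = GraphODEFunc(adj, dim=y0.shape[-1])
# trajectory = odeint(func, y0, torch.linspace(0.0, 1.0, 50))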

Updated: 2024-07-25 15:12:46

Categories: cs.LG,cs.CY,cs.SI,physics.soc-ph

Download: http://arxiv.org/abs/2407.18108v1

Multi-Resolution Histopathology Patch Graphs for Ovarian Cancer Subtyping

Computer vision models are increasingly capable of classifying ovarian epithelial cancer subtypes, but they differ from pathologists by processing small tissue patches at a single resolution. Multi-resolution graph models leverage the spatial relationships of patches at multiple magnifications, learning the context for each patch. In this study, we conduct the most thorough validation of a graph model for ovarian cancer subtyping to date. Seven models were tuned and trained using five-fold cross-validation on a set of 1864 whole slide images (WSIs) from 434 patients treated at Leeds Teaching Hospitals NHS Trust. The cross-validation models were ensembled and evaluated using a balanced hold-out test set of 100 WSIs from 30 patients, and an external validation set of 80 WSIs from 80 patients in the Transcanadian Study. The best-performing model, a graph model using 10x+20x magnification data, gave balanced accuracies of 73%, 88%, and 99% in cross-validation, hold-out testing, and external validation, respectively. However, this only exceeded the performance of attention-based multiple instance learning in external validation, with a 93% balanced accuracy. Graph models benefitted greatly from using the UNI foundation model rather than an ImageNet-pretrained ResNet50 for feature extraction, with this having a much greater effect on performance than changing the subsequent classification approach. The accuracy of the combined foundation model and multi-resolution graph network offers a step towards the clinical applicability of these models, with a new highest-reported performance for this task, though further validations are still required to ensure the robustness and usability of the models.

Updated: 2024-07-25 15:08:54

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.18105v1

Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow

Large language models (LLMs) and their fine-tuning techniques have demonstrated superior performance in various language understanding and generation tasks. This paper explores fine-tuning LLMs for stock return forecasting with financial newsflow. In quantitative investing, return forecasting is fundamental for subsequent tasks like stock picking, portfolio optimization, etc. We formulate the model to include text representation and forecasting modules. We propose to compare the encoder-only and decoder-only LLMs, considering they generate text representations in distinct ways. The impact of these different representations on forecasting performance remains an open question. Meanwhile, we compare two simple methods of integrating LLMs' token-level representations into the forecasting module. The experiments on real news and investment universes reveal that: (1) aggregated representations from LLMs' token-level embeddings generally produce return predictions that enhance the performance of long-only and long-short portfolios; (2) in the relatively large investment universe, the decoder LLMs-based prediction model leads to stronger portfolios, whereas in the small universes, there are no consistent winners. Among the three LLMs studied (DeBERTa, Mistral, Llama), Mistral performs more robustly across different universes; (3) return predictions derived from LLMs' text representations are a strong signal for portfolio construction, outperforming conventional sentiment scores.
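
One of the simple integration methods compared, masked mean pooling of token-level representations followed by a linear forecasting head, can be sketched as below; llm is a hypothetical stand-in for any encoder- or decoder-only model returning hidden states of shape (B, T, H).

import torch
import torch.nn as nn

class NewsflowReturnModel(nn.Module):
    def __init__(self, llm, hidden_size):
        super().__init__()
        self.llm = llm                         # text representation module
        self.head = nn.Linear(hidden_size, 1)  # forecasting module

    def forward(self, input_ids, attention_mask):
        hidden = self.llm(input_ids, attention_mask)     # (B, T, H)
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(1) / mask.sum(1)    # masked mean over tokens
        return self.head(pooled).squeeze(-1)             # predicted return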

Updated: 2024-07-25 15:07:35

Categories: q-fin.CP,cs.LG,q-fin.PM

Download: http://arxiv.org/abs/2407.18103v1

Review of Machine Learning Methods for Additive Manufacturing of Functionally Graded Materials

Additive Manufacturing (AM) is a transformative manufacturing technology enabling direct fabrication of complex parts layer-by-layer from 3D modeling data. Among AM applications, the fabrication of Functionally Graded Materials (FGMs) has significant importance due to the potential to enhance component performance across several industries. FGMs are manufactured with a gradient composition transition between dissimilar materials, enabling the design of new materials with location-dependent mechanical and physical properties. This study presents a comprehensive review of published literature pertaining to the implementation of Machine Learning (ML) techniques in AM, with an emphasis on ML-based methods for optimizing FGM fabrication processes. Through an extensive survey of the literature, this review article explores the role of ML in addressing the inherent challenges in FGM fabrication, encompassing parameter optimization, defect detection, and real-time monitoring. The article also provides a discussion of future research directions and challenges in employing ML-based methods in the AM fabrication of FGMs.

Updated: 2024-07-25 15:04:31

Categories: cs.LG

Download: http://arxiv.org/abs/2309.16571v2

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

Generating realistic audio for human actions is important for many applications, such as creating sound effects for films or virtual reality games. Existing approaches implicitly assume total correspondence between the video and audio during training, yet many sounds happen off-screen and have weak to no correspondence with the visuals -- resulting in uncontrolled ambient sounds or hallucinations at test time. We propose a novel ambient-aware audio generation model, AV-LDM. We devise a novel audio-conditioning mechanism to learn to disentangle foreground action sounds from the ambient background sounds in in-the-wild training videos. Given a novel silent video, our model uses retrieval-augmented generation to create audio that matches the visual content both semantically and temporally. We train and evaluate our model on two in-the-wild egocentric video datasets, Ego4D and EPIC-KITCHENS, and we introduce Ego4D-Sounds -- 1.2M curated clips with action-audio correspondence. Our model outperforms an array of existing methods, allows controllable generation of the ambient sound, and even shows promise for generalizing to computer graphics game clips. Overall, our approach is the first to focus video-to-audio generation faithfully on the observed visual content despite training from uncurated clips with natural background sounds.

Updated: 2024-07-25 15:03:37

Categories: cs.CV,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2406.09272v3

Privacy Threats and Countermeasures in Federated Learning for Internet of Things: A Systematic Review

Federated Learning (FL) in the Internet of Things (IoT) environments can enhance machine learning by utilising decentralised data, but at the same time, it might introduce significant privacy and security concerns due to the constrained nature of IoT devices. This represents a research challenge that we aim to address in this paper. We systematically analysed recent literature to identify privacy threats in FL within IoT environments, and evaluate the defensive measures that can be employed to mitigate these threats. Using a Systematic Literature Review (SLR) approach, we searched five publication databases (Scopus, IEEE Xplore, Wiley, ACM, and Science Direct), collating relevant papers published between 2017 and April 2024, a period which spans from the introduction of FL until now. Guided by the PRISMA protocol, we selected 49 papers to focus our systematic review on. We analysed these papers, paying special attention to the privacy threats and defensive measures -- specifically within the context of IoT -- using inclusion and exclusion criteria tailored to highlight recent advances and critical insights. We identified various privacy threats, including inference attacks, poisoning attacks, and eavesdropping, along with defensive measures such as Differential Privacy and Secure Multi-Party Computation. These defences were evaluated for their effectiveness in protecting privacy without compromising the functional integrity of FL in IoT settings. Our review underscores the necessity for robust and efficient privacy-preserving strategies tailored for IoT environments. Notably, there is a need for strategies against replay, evasion, and model stealing attacks. Exploring lightweight defensive measures and emerging technologies such as blockchain may help improve the privacy of FL in IoT, leading to the creation of FL models that can operate under variable network conditions.

Updated: 2024-07-25 15:01:56

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2407.18096v1

Self-supervised learning of video representations from a child's perspective

Children learn powerful internal models of the world around them from a few years of egocentric visual experience. Can such internal models be learned from a child's visual experience with highly generic learning algorithms or do they require strong inductive biases? Recent advances in collecting large-scale, longitudinal, developmentally realistic video datasets and generic self-supervised learning (SSL) algorithms are allowing us to begin to tackle this nature vs. nurture question. However, existing work typically focuses on image-based SSL algorithms and visual capabilities that can be learned from static images (e.g. object recognition), thus ignoring temporal aspects of the world. To close this gap, here we train self-supervised video models on longitudinal, egocentric headcam recordings collected from a child over a two year period in their early development (6-31 months). The resulting models are highly effective at facilitating the learning of action concepts from a small number of labeled examples; they have favorable data size scaling properties; and they display emergent video interpolation capabilities. Video models also learn more robust object representations than image-based models trained with the exact same data. These results suggest that important temporal aspects of a child's internal model of the world may be learnable from their visual experience using highly generic learning algorithms and without strong inductive biases.

Updated: 2024-07-25 14:48:34

Categories: cs.CV,cs.LG,cs.NE,q-bio.NC

Download: http://arxiv.org/abs/2402.00300v2

On the Design of Ethereum Data Availability Sampling: A Comprehensive Simulation Study

This paper presents an in-depth exploration of Data Availability Sampling (DAS) and sharding mechanisms within decentralized systems through simulation-based analysis. DAS, a pivotal concept in blockchain technology and decentralized networks, is thoroughly examined to unravel its intricacies and assess its impact on system performance. Through the development of a simulator tailored explicitly for DAS, we embark on a comprehensive investigation into the parameters that influence system behavior and efficiency. A series of experiments are conducted within the simulated environment to validate theoretical formulations and dissect the interplay of DAS parameters. This includes an exploration of approaches such as custody by row, variations in validators per node, and malicious nodes. The outcomes of these experiments furnish insights into the efficacy of DAS protocols and pave the way for the formulation of optimization strategies geared towards enhancing decentralized network performance. Moreover, the findings serve as guidelines for future research endeavors, offering a nuanced understanding of the complexities inherent in decentralized systems. This study not only contributes to the theoretical understanding of DAS but also offers practical implications for the design, implementation, and optimization of decentralized systems.
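
As background for the guarantees such simulations probe, the standard DAS sampling argument (a known property of erasure-coded sampling, not a result of this paper) runs as follows: under 2D erasure coding an adversary must withhold at least a fraction $\alpha \ge 1/2$ of the extended chunks to prevent reconstruction, so a client drawing $s$ chunks uniformly at random (with replacement) wrongly accepts an unavailable block with probability

\[
  P(\text{accept}) = \prod_{i=1}^{s} P(\text{sampled chunk } i \text{ is available}) \le (1-\alpha)^{s} \le 2^{-s},
\]

meaning $s = 30$ samples already push the per-client failure probability below $10^{-9}$.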

Updated: 2024-07-25 14:47:41

Categories: cs.CR

Download: http://arxiv.org/abs/2407.18085v1

PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization

The recent emergence of Large Language Models (LLMs) has heralded a new era of human-AI interaction. These sophisticated models, exemplified by Chat-GPT and its successors, have exhibited remarkable capabilities in language understanding. However, as these LLMs have undergone exponential growth, a crucial dimension that remains understudied is the personalization of these models. Large foundation models such as GPT-3 etc. focus on creating a universal model that serves a broad range of tasks and users. This approach emphasizes the model's generalization capabilities, treating users as a collective rather than as distinct individuals. While practical for many common applications, this one-size-fits-all approach often fails to address the rich tapestry of human diversity and individual needs. To explore this issue we introduce the PEFT-U Benchmark: a new dataset for building and evaluating NLP models for user personalization. PEFT-U consists of a series of user-centered tasks containing diverse and individualized expressions where the preferences of users can potentially differ for the same input. Using PEFT-U, we explore the challenge of efficiently personalizing LLMs to accommodate user-specific preferences in the context of diverse user-centered tasks.

Updated: 2024-07-25 14:36:18

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.18078v1

Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Frontier AI Models

Frontier AI systems are making transformative impacts across society, but such benefits are not without costs: models trained on web-scale datasets containing personal and private data raise profound concerns about data privacy and security. Language models are trained on extensive corpora including potentially sensitive or proprietary information, and the risk of data leakage - where the model response reveals pieces of such information - remains inadequately understood. Prior work has investigated what factors drive memorization and have identified that sequence complexity and the number of repetitions drive memorization. Here, we focus on the evolution of memorization over training. We begin by reproducing findings that the probability of memorizing a sequence scales logarithmically with the number of times it is present in the data. We next show that sequences which are apparently not memorized after the first encounter can be "uncovered" throughout the course of training even without subsequent encounters, a phenomenon we term "latent memorization". The presence of latent memorization presents a challenge for data privacy as memorized sequences may be hidden at the final checkpoint of the model but remain easily recoverable. To this end, we develop a diagnostic test relying on the cross entropy loss to uncover latent memorized sequences with high accuracy.
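
The proposed diagnostic can be sketched in a few lines: score candidate training sequences by their cross-entropy under a checkpoint and flag those with anomalously low loss relative to length-matched controls. The model interface and thresholding rule below are illustrative assumptions.

import torch
import torch.nn.functional as F

@torch.no_grad()
def sequence_cross_entropy(model, token_ids):
    # Mean next-token cross-entropy of one sequence under a causal LM;
    # `model` is assumed to map ids of shape (1, T) to logits (1, T, V).
    logits = model(token_ids.unsqueeze(0))
    return F.cross_entropy(logits[0, :-1], token_ids[1:]).item()

# Usage sketch: flag sequences whose loss is far below a control distribution.
# threshold = control_losses.mean() - 3 * control_losses.std()
# memorized = [s for s in candidates if sequence_cross_entropy(model, s) < threshold]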

Updated: 2024-07-25 14:33:33

Categories: cs.CV,cs.LG,q-bio.NC

Download: http://arxiv.org/abs/2406.14549v2

Clustering with minimum spanning trees: How good can it be?

Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can be very competitive. Next, we review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes. This leads to some new noteworthy approaches. Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based, and classical hierarchical agglomerative procedures. Nevertheless, we identify that there is still some room for improvement, and thus the development of novel algorithms is encouraged.
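
For reference, the classic MST-based partitioning scheme that the studied methods build on can be written in a few lines of SciPy: build the Euclidean MST and delete its k-1 heaviest edges, taking the resulting connected components as clusters. This is the textbook baseline, not the Genie or information-theoretic variants the paper extends.

import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_clustering(X, n_clusters):
    dist = squareform(pdist(X))                        # dense pairwise distances
    mst = minimum_spanning_tree(dist).toarray()        # n-1 nonzero edges
    edges = np.argwhere(mst > 0)
    weights = mst[mst > 0]                             # same (row-major) order
    cut = np.argsort(weights)[len(weights) - (n_clusters - 1):]
    for i in cut:                                      # remove heaviest edges
        mst[tuple(edges[i])] = 0.0
    _, labels = connected_components(mst, directed=False)
    return labels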

Updated: 2024-07-25 14:32:51

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2303.05679v3

Normalised clustering accuracy: An asymmetric external cluster validity measure

There is no, nor will there ever be, a single best clustering algorithm. Nevertheless, we would still like to be able to distinguish between methods that work well on certain task types and those that systematically underperform. Clustering algorithms are traditionally evaluated using either internal or external validity measures. Internal measures quantify different aspects of the obtained partitions, e.g., the average degree of cluster compactness or point separability. However, their validity is questionable because the clusterings they endorse can sometimes be meaningless. External measures, on the other hand, compare the algorithms' outputs to fixed ground truth groupings provided by experts. In this paper, we argue that the commonly used classical partition similarity scores, such as the normalised mutual information, Fowlkes-Mallows, or adjusted Rand index, miss some desirable properties. In particular, they do not identify worst-case scenarios correctly, nor are they easily interpretable. As a consequence, the evaluation of clustering algorithms on diverse benchmark datasets can be difficult. To remedy these issues, we propose and analyse a new measure: a version of the optimal set-matching accuracy, which is normalised, monotonic with respect to some similarity relation, scale-invariant, and corrected for the imbalancedness of cluster sizes (but neither symmetric nor adjusted for chance).
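
A plausible reading of the construction described above, sketched with SciPy's assignment solver: match predicted to reference clusters optimally, average per-reference-cluster accuracies (which corrects for size imbalance), and rescale so that the chance-like level 1/k maps to 0. Consult the paper for the exact definition; labels are assumed to be 0..k-1 with no empty reference cluster.

import numpy as np
from scipy.optimize import linear_sum_assignment

def normalised_clustering_accuracy(y_true, y_pred):
    k = int(max(y_true.max(), y_pred.max())) + 1
    confusion = np.zeros((k, k))
    np.add.at(confusion, (y_pred, y_true), 1)
    recall = confusion / confusion.sum(axis=0, keepdims=True)  # per true cluster
    rows, cols = linear_sum_assignment(-recall)                # best set matching
    avg = recall[rows, cols].mean()
    return (avg - 1.0 / k) / (1.0 - 1.0 / k)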

Updated: 2024-07-25 14:31:03

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2209.02935v4

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations

Diffusion policies are conditional diffusion models that learn robot action distributions conditioned on the robot and environment state. They have recently shown to outperform both deterministic and alternative action distribution learning formulations. 3D robot policies use 3D scene feature representations aggregated from a single or multiple camera views using sensed depth. They have shown to generalize better than their 2D counterparts across camera viewpoints. We unify these two lines of work and present 3D Diffuser Actor, a neural policy equipped with a novel 3D denoising transformer that fuses information from the 3D visual scene, a language instruction and proprioception to predict the noise in noised 3D robot pose trajectories. 3D Diffuser Actor sets a new state-of-the-art on RLBench with an absolute performance gain of 18.1% over the current SOTA on a multi-view setup and an absolute gain of 13.1% on a single-view setup. On the CALVIN benchmark, it improves over the current SOTA by a 9% relative increase. It also learns to control a robot manipulator in the real world from a handful of demonstrations. Through thorough comparisons with the current SOTA policies and ablations of our model, we show 3D Diffuser Actor's design choices dramatically outperform 2D representations, regression and classification objectives, absolute attentions, and holistic non-tokenized 3D scene embeddings.

Updated: 2024-07-25 14:30:22

Categories: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2402.10885v3

Principal-Agent Reinforcement Learning

Contracts are the economic framework which allows a principal to delegate a task to an agent -- despite misaligned interests, and even without directly observing the agent's actions. In many modern reinforcement learning settings, self-interested agents learn to perform a multi-stage task delegated to them by a principal. We explore the significant potential of utilizing contracts to incentivize the agents. We model the delegated task as an MDP, and study a stochastic game between the principal and agent where the principal learns what contracts to use, and the agent learns an MDP policy in response. We present a learning-based algorithm for optimizing the principal's contracts, which provably converges to the subgame-perfect equilibrium of the principal-agent game. A deep RL implementation allows us to apply our method to very large MDPs with unknown transition dynamics. We extend our approach to multiple agents, and demonstrate its relevance to resolving a canonical sequential social dilemma with minimal intervention to agent rewards.

Updated: 2024-07-25 14:28:58

Categories: cs.GT,cs.LG,cs.MA

Download: http://arxiv.org/abs/2407.18074v1

HVM-1: Large-scale video models pretrained with nearly 5000 hours of human-like video data

We introduce Human-like Video Models (HVM-1), large-scale video models pretrained with nearly 5000 hours of curated human-like video data (mostly egocentric, temporally extended, continuous video recordings), using the spatiotemporal masked autoencoder (ST-MAE) algorithm. We release two 633M parameter models trained at spatial resolutions of 224x224 and 448x448 pixels. We evaluate the performance of these models in downstream few-shot video and image recognition tasks and compare them against a model pretrained with 1330 hours of short action-oriented video clips from YouTube (Kinetics-700). HVM-1 models perform competitively against the Kinetics-700 pretrained model in downstream evaluations despite substantial qualitative differences between the spatiotemporal characteristics of the corresponding pretraining datasets. HVM-1 models also learn more accurate and more robust object representations compared to models pretrained with the image-based MAE algorithm on the same data, demonstrating the potential benefits of learning to predict temporal regularities in natural videos for learning better object representations.

Updated: 2024-07-25 14:21:50

Categories: cs.CV,cs.LG,cs.NE,q-bio.NC

Download: http://arxiv.org/abs/2407.18067v1

Multi-Agent Deep Reinforcement Learning for Resilience Optimization in 5G RAN

Resilience is defined as the ability of a network to resist, adapt, and quickly recover from disruptions, and to continue to maintain an acceptable level of services from users' perspective. With the advent of future radio networks, including advanced 5G and upcoming 6G, critical services become integral to future networks, requiring uninterrupted service delivery for end users. Unfortunately, with the growing network complexity, user mobility and diversity, it becomes challenging to scale current resilience management techniques that rely on local optimizations to large dense network deployments. This paper aims to address this problem by globally optimizing the resilience of a dense multi-cell network based on multi-agent deep reinforcement learning. Specifically, our proposed solution can dynamically tilt cell antennas and reconfigure transmit power to mitigate outages and increase both coverage and service availability. A multi-objective optimization problem is formulated to simultaneously satisfy resiliency constraints while maximizing the service quality in the network area in order to minimize the impact of outages on neighbouring cells. Extensive simulations then demonstrate that with our proposed solution, the average service availability in terms of user throughput can be increased by 50-60% on average, while coverage availability reaches 99% in the best cases.

Updated: 2024-07-25 14:19:59

Categories: cs.LG,cs.NI

Download: http://arxiv.org/abs/2407.18066v1

Diagnosing and fixing common problems in Bayesian optimization for molecule design

Bayesian optimization (BO) is a principled approach to molecular design tasks. In this paper we explain three pitfalls of BO which can cause poor empirical performance: an incorrect prior width, over-smoothing, and inadequate acquisition function maximization. We show that with these issues addressed, even a basic BO setup is able to achieve the highest overall performance on the PMO benchmark for molecule design (Gao et al 2022). These results suggest that BO may benefit from more attention in the machine learning for molecules community.
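
The three fixes translate into a short, fairly standard setup; the sketch below uses scikit-learn and a discrete candidate pool (e.g., an enumerated molecule library) and is our illustration of the remedies, not the authors' exact configuration.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

def fit_gp(X, y):
    # Pitfall 1 (prior width): normalize_y standardises the outputs so the
    # amplitude hyperparameter starts at a sensible scale.
    # Pitfall 2 (over-smoothing): a Matern-5/2 kernel with a learnable noise
    # term is less prone to washing out sharp objectives than a noise-free RBF.
    kernel = ConstantKernel(1.0) * Matern(length_scale=1.0, nu=2.5) \
        + WhiteKernel(noise_level=1e-3)
    return GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

def expected_improvement(gp, X_cand, best_y):
    mu, sigma = gp.predict(X_cand, return_std=True)
    z = (mu - best_y) / np.maximum(sigma, 1e-12)
    return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)

def propose(gp, candidate_pool, best_y):
    # Pitfall 3 (inadequate acquisition maximisation): score a large candidate
    # pool rather than a handful of points.
    return int(np.argmax(expected_improvement(gp, candidate_pool, best_y)))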

Updated: 2024-07-25 14:17:40

Categories: cs.LG,physics.chem-ph,stat.ML

Download: http://arxiv.org/abs/2406.07709v2

Difficulty Estimation and Simplification of French Text Using LLMs

We leverage generative large language models for language learning applications, focusing on estimating the difficulty of foreign language texts and simplifying them to lower difficulty levels. We frame both tasks as prediction problems and develop a difficulty classification model using labeled examples, transfer learning, and large language models, demonstrating superior accuracy compared to previous approaches. For simplification, we evaluate the trade-off between simplification quality and meaning preservation, comparing zero-shot and fine-tuned performances of large language models. We show that meaningful text simplifications can be obtained with limited fine-tuning. Our experiments are conducted on French texts, but our methods are language-agnostic and directly applicable to other foreign languages.

Updated: 2024-07-25 14:16:08

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.18061v1

Cross-Vendor Reproducibility of Radiomics-based Machine Learning Models for Computer-aided Diagnosis

Background: The reproducibility of machine-learning models in prostate cancer detection across different MRI vendors remains a significant challenge. Methods: This study investigates Support Vector Machines (SVM) and Random Forest (RF) models trained on radiomic features extracted from T2-weighted MRI images using Pyradiomics and MRCradiomics libraries. Feature selection was performed using the maximum relevance minimum redundancy (MRMR) technique. We aimed to enhance clinical decision support through multimodal learning and feature fusion. Results: Our SVM model, utilizing combined features from Pyradiomics and MRCradiomics, achieved an AUC of 0.74 on the Multi-Improd dataset (Siemens scanner) but decreased to 0.60 on the Philips test set. The RF model showed similar trends, with notable robustness for models using Pyradiomics features alone (AUC of 0.78 on Philips). Conclusions: These findings demonstrate the potential of multimodal feature integration to improve the robustness and generalizability of machine-learning models for clinical decision support in prostate cancer detection. This study marks a significant step towards developing reliable AI-driven diagnostic tools that maintain efficacy across various imaging platforms.

Updated: 2024-07-25 14:16:02

Categories: cs.LG

Download: http://arxiv.org/abs/2407.18060v1

I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition

Music two-tower multimodal systems integrate audio and text modalities into a joint audio-text space, enabling direct comparison between songs and their corresponding labels. These systems enable new approaches for classification and retrieval, leveraging both modalities. Despite the promising results they have shown for zero-shot classification and retrieval tasks, closer inspection of the embeddings is needed. This paper evaluates the inherent zero-shot properties of joint audio-text spaces for the case-study of instrument recognition. We present an evaluation and analysis of two-tower systems for zero-shot instrument recognition and a detailed analysis of the properties of the pre-joint and joint embeddings spaces. Our findings suggest that audio encoders alone demonstrate good quality, while challenges remain within the text encoder or joint space projection. Specifically, two-tower systems exhibit sensitivity towards specific words, favoring generic prompts over musically informed ones. Despite the large size of textual encoders, they do not yet leverage additional textual context or infer instruments accurately from their descriptions. Lastly, a novel approach for quantifying the semantic meaningfulness of the textual space leveraging an instrument ontology is proposed. This method reveals deficiencies in the systems' understanding of instruments and provides evidence of the need for fine-tuning text encoders on musical data.
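
The zero-shot protocol being evaluated amounts to nearest-prompt classification in the joint space, sketched below; audio_encoder and text_encoder are hypothetical stand-ins for a trained two-tower (e.g., CLAP-style) model.

import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_instrument(clip, prompts, audio_encoder, text_encoder):
    # One prompt per instrument, e.g. ["a recording of a violin", ...].
    a = F.normalize(audio_encoder(clip), dim=-1)        # (1, d)
    t = F.normalize(text_encoder(prompts), dim=-1)      # (n_instruments, d)
    sims = (a @ t.T).squeeze(0)                         # cosine similarities
    return int(sims.argmax())                           # predicted instrument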

Updated: 2024-07-25 14:15:05

Categories: cs.SD,cs.CL,cs.IR,cs.LG,eess.AS

Download: http://arxiv.org/abs/2407.18058v1

Physics-informed nonlinear vector autoregressive models for the prediction of dynamical systems

Machine learning techniques have recently been of great interest for solving differential equations. Training these models is classically a data-fitting task, but knowledge of the expression of the differential equation can be used to supplement the training objective, leading to the development of physics-informed scientific machine learning. In this article, we focus on one class of models called nonlinear vector autoregression (NVAR) to solve ordinary differential equations (ODEs). Motivated by connections to numerical integration and physics-informed neural networks, we explicitly derive the physics-informed NVAR (piNVAR) which enforces the right-hand side of the underlying differential equation regardless of NVAR construction. Because NVAR and piNVAR completely share their learned parameters, we propose an augmented procedure to jointly train the two models. Then, using both data-driven and ODE-driven metrics, we evaluate the ability of the piNVAR model to predict solutions to various ODE systems, such as the undamped spring, a Lotka-Volterra predator-prey nonlinear model, and the chaotic Lorenz system.
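
A compact sketch of the idea, assuming quadratic NVAR features over a delay window and an Euler-style one-step prediction; f_rhs (the known ODE right-hand side), dt, and the weighting lam are our assumptions for illustration, not the paper's exact formulation.

import torch

def nvar_features(window):
    # Bias term, stacked delayed states, and their quadratic monomials.
    lin = window.reshape(-1)                            # window: (k, d)
    i, j = torch.triu_indices(lin.numel(), lin.numel())
    return torch.cat([torch.ones(1), lin, lin[i] * lin[j]])

def pinvar_loss(W, window, target, f_rhs, dt, lam=1.0):
    phi = nvar_features(window)
    y_now = window[-1]
    y_next = y_now + W @ phi                            # one-step NVAR prediction
    data_loss = ((y_next - target) ** 2).mean()
    # Physics term: the implied finite-difference derivative should match the
    # right-hand side of the underlying ODE, regardless of NVAR construction.
    physics_loss = (((y_next - y_now) / dt - f_rhs(y_now)) ** 2).mean()
    return data_loss + lam * physics_loss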

Updated: 2024-07-25 14:10:42

Categories: math.DS,cs.LG,34A34, 37M15, 65L05, 68T07

Download: http://arxiv.org/abs/2407.18057v1

GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

Implicit neural representations (INRs) have significantly advanced the field of arbitrary-scale super-resolution (ASSR) of images. Most existing INR-based ASSR networks first extract features from the given low-resolution image using an encoder, and then render the super-resolved result via a multi-layer perceptron decoder. Although these approaches have shown promising results, their performance is constrained by the limited representation ability of discrete latent codes in the encoded features. In this paper, we propose a novel ASSR method named GaussianSR that overcomes this limitation through 2D Gaussian Splatting (2DGS). Unlike traditional methods that treat pixels as discrete points, GaussianSR represents each pixel as a continuous Gaussian field. The encoded features are simultaneously refined and upsampled by rendering the mutually stacked Gaussian fields. As a result, long-range dependencies are established to enhance representation ability. In addition, a classifier is developed to dynamically assign Gaussian kernels to all pixels to further improve flexibility. All components of GaussianSR (i.e., encoder, classifier, Gaussian kernels, and decoder) are jointly learned end-to-end. Experiments demonstrate that GaussianSR achieves superior ASSR performance with fewer parameters than existing methods while enjoying interpretable and content-aware feature aggregations.

Updated: 2024-07-25 13:53:48

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.18046v1

The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation

Digital health chatbots powered by Large Language Models (LLMs) have the potential to significantly improve personal health management for chronic conditions by providing accessible and on-demand health coaching and question-answering. However, these chatbots risk providing unverified and inaccurate information because LLMs generate responses based on patterns learned from diverse internet data. Retrieval Augmented Generation (RAG) can help mitigate hallucinations and inaccuracies in LLM responses by grounding it on reliable content. However, efficiently and accurately retrieving most relevant set of content for real-time user questions remains a challenge. In this work, we introduce Query-Based Retrieval Augmented Generation (QB-RAG), a novel approach that pre-computes a database of potential queries from a content base using LLMs. For an incoming patient question, QB-RAG efficiently matches it against this pre-generated query database using vector search, improving alignment between user questions and the content. We establish a theoretical foundation for QB-RAG and provide a comparative analysis of existing retrieval enhancement techniques for RAG systems. Finally, our empirical evaluation demonstrates that QB-RAG significantly improves the accuracy of healthcare question answering, paving the way for robust and trustworthy LLM applications in digital health.
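
The two stages can be sketched as follows; embed (a unit-norm text embedding model) and generate_queries (an LLM prompted to author questions a chunk can answer) are hypothetical stand-ins.

import numpy as np

def build_query_index(chunks, generate_queries, embed):
    # Offline: pre-compute the database of potential queries per content chunk.
    queries, owners = [], []
    for idx, chunk in enumerate(chunks):
        for q in generate_queries(chunk):
            queries.append(q)
            owners.append(idx)
    return embed(queries), np.array(owners)

def retrieve(question, query_vecs, owners, chunks, embed, top_k=3):
    # Online: match the user question against pre-generated queries
    # (question-to-question similarity) instead of raw content.
    q = embed([question])[0]
    order = np.argsort(query_vecs @ q)[::-1]            # cosine, unit vectors
    picked, seen = [], set()
    for i in order:                                     # deduplicate by chunk
        if owners[i] not in seen:
            seen.add(owners[i])
            picked.append(chunks[owners[i]])
        if len(picked) == top_k:
            break
    return picked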

Updated: 2024-07-25 13:47:01

Categories: cs.LG

Download: http://arxiv.org/abs/2407.18044v1

Lifelong Graph Summarization with Neural Networks: 2012, 2022, and a Time Warp

Summarizing web graphs is challenging due to the heterogeneity of the modeled information and its changes over time. We investigate the use of neural networks for lifelong graph summarization. Assuming we observe the web graph at a certain time, we train the networks to summarize graph vertices. We apply this trained network to summarize the vertices of the changed graph at the next point in time. Subsequently, we continue training and evaluating the network to perform lifelong graph summarization. We use the GNNs Graph-MLP and GraphSAINT, as well as an MLP baseline, to summarize the temporal graphs. We compare $1$-hop and $2$-hop summaries. We investigate the impact of reusing parameters from a previous snapshot by measuring the backward and forward transfer and the forgetting rate of the neural networks. Our extensive experiments on ten weekly snapshots of a web graph with over $100$M edges, sampled in 2012 and 2022, show that all networks predominantly use $1$-hop information to determine the summary, even when performing $2$-hop summarization. Due to the heterogeneity of web graphs, in some snapshots, the $2$-hop summary produces over ten times more vertex summaries than the $1$-hop summary. When using the network trained on the last snapshot from 2012 and applying it to the first snapshot of 2022, we observe a strong drop in accuracy. We attribute this drop over the ten-year time warp to the strongly increased heterogeneity of the web graph in 2022.

Updated: 2024-07-25 13:44:42

Categories: cs.LG

Download: http://arxiv.org/abs/2407.18042v1

How to Train the Teacher Model for Effective Knowledge Distillation

Recently, it was shown that the role of the teacher in knowledge distillation (KD) is to provide the student with an estimate of the true Bayes conditional probability density (BCPD). Notably, the new findings propose that the student's error rate can be upper-bounded by the mean squared error (MSE) between the teacher's output and BCPD. Consequently, to enhance KD efficacy, the teacher should be trained such that its output is close to BCPD in MSE sense. This paper elucidates that training the teacher model with MSE loss equates to minimizing the MSE between its output and BCPD, aligning with its core responsibility of providing the student with a BCPD estimate closely resembling it in MSE terms. In this respect, through a comprehensive set of experiments, we demonstrate that substituting the conventional teacher trained with cross-entropy loss with one trained using MSE loss in state-of-the-art KD methods consistently boosts the student's accuracy, resulting in improvements of up to 2.6\%.
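
The prescribed change is essentially a one-line swap of the teacher's training objective; a minimal sketch, assuming one-hot classification labels:

import torch
import torch.nn.functional as F

def mse_teacher_loss(logits, labels, num_classes):
    # Train the teacher so its predictive distribution approaches the Bayes
    # conditional probability density in the MSE sense, instead of using the
    # conventional cross-entropy objective.
    probs = F.softmax(logits, dim=-1)
    one_hot = F.one_hot(labels, num_classes).float()
    return F.mse_loss(probs, one_hot)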

Updated: 2024-07-25 13:39:11

标题: 如何训练教师模型以实现有效的知识蒸馏

摘要: 最近,有研究表明在知识蒸馏(KD)中,教师的作用是为学生提供真实贝叶斯条件概率密度(BCPD)的估计。值得注意的是,新的发现提出学生的错误率可以以教师输出与BCPD之间的均方误差(MSE)为上界。因此,为了增强KD的效果,教师应该接受训练,使其输出在MSE意义上接近BCPD。本文阐述了用MSE损失训练教师模型相当于最小化其输出和BCPD之间的MSE,这与其核心责任相符,即为学生提供在MSE意义上与BCPD相近的估计。在这方面,通过一系列全面的实验,我们证明在最先进的KD方法中,用以MSE损失训练的教师替代传统的以交叉熵损失训练的教师,能够持续提高学生的准确率,带来最高2.6%的提升。

更新时间: 2024-07-25 13:39:11

领域: cs.LG

下载: http://arxiv.org/abs/2407.18041v1

Anatomizing Deep Learning Inference in Web Browsers

Web applications have increasingly adopted Deep Learning (DL) through in-browser inference, wherein DL inference performs directly within Web browsers. The actual performance of in-browser inference and its impacts on the quality of experience (QoE) remain unexplored, and urgently require new QoE measurements beyond traditional ones that mainly focus on page load time. To bridge this gap, we make the first comprehensive performance measurement of in-browser inference to date. Our approach proposes new metrics to measure in-browser inference: responsiveness, smoothness, and inference accuracy. Our extensive analysis involves 9 representative DL models across Web browsers of 50 popular PC devices and 20 mobile devices. The results reveal that in-browser inference exhibits a substantial latency gap, averaging 16.9 times slower on CPU and 4.9 times slower on GPU compared to native inference on PC devices. The gap on mobile CPU and mobile GPU is 15.8 times and 7.8 times, respectively. Furthermore, we identify contributing factors to such latency gap, including underutilized hardware instruction sets, inherent overhead in the runtime environment, resource contention within the browser, and inefficiencies in software libraries and GPU abstractions. Additionally, in-browser inference imposes significant memory demands, at times exceeding 334.6 times the size of the DL models themselves, partly attributable to suboptimal memory management. We also observe that in-browser inference leads to a significant 67.2% increase in the time it takes for GUI components to render within Web browsers, significantly affecting the overall user QoE of Web applications reliant on this technology.

Updated: 2024-07-25 13:37:16

标题: 在网络浏览器中解剖深度学习推理

摘要: Web应用程序越来越多地通过浏览器内推断采用深度学习(DL),即DL推断直接在Web浏览器中执行。浏览器内推断的实际性能及其对体验质量(QoE)的影响尚未被探索,迫切需要超越主要关注页面加载时间等传统指标的新QoE测量方法。为了弥补这一差距,我们进行了迄今为止首次全面的浏览器内推断性能测量。我们的方法提出了用于测量浏览器内推断的新指标:响应速度、流畅性和推断准确性。我们的广泛分析涉及50个热门PC设备和20个移动设备的Web浏览器上的9个代表性DL模型。结果显示,与PC设备上的本地推断相比,浏览器内推断表现出相当大的延迟差距,平均在CPU上慢16.9倍,在GPU上慢4.9倍。移动CPU和移动GPU上的差距分别为15.8倍和7.8倍。此外,我们确定了导致这种延迟差距的因素,包括未充分利用的硬件指令集、运行时环境中固有的开销、浏览器内的资源争用以及软件库和GPU抽象中的低效率。此外,浏览器内推断会带来显著的内存需求,有时超过DL模型本身大小的334.6倍,部分归因于次优的内存管理。我们还观察到,浏览器内推断导致GUI组件渲染所需时间显著增加了67.2%,显著影响了依赖于这项技术的Web应用程序的整体用户QoE。

更新时间: 2024-07-25 13:37:16

领域: cs.LG,cs.PF

下载: http://arxiv.org/abs/2402.05981v2

Peak-Controlled Logits Poisoning Attack in Federated Distillation

Federated Distillation (FD) offers an innovative approach to distributed machine learning, leveraging knowledge distillation for efficient and flexible cross-device knowledge transfer without necessitating the upload of extensive model parameters to a central server. While FD has gained popularity, its vulnerability to poisoning attacks remains underexplored. To address this gap, we previously introduced FDLA (Federated Distillation Logits Attack), a method that manipulates logits communication to mislead and degrade the performance of client models. However, the impact of FDLA on participants with different identities and the effects of malicious modifications at various stages of knowledge transfer remain unexplored. To this end, we present PCFDLA (Peak-Controlled Federated Distillation Logits Attack), an advanced and more stealthy logits poisoning attack method for FD. PCFDLA enhances the effectiveness of FDLA by carefully controlling the peak values of logits to create highly misleading yet inconspicuous modifications. Furthermore, we introduce a novel metric for better evaluating attack efficacy, demonstrating that PCFDLA maintains stealth while being significantly more disruptive to victim models compared to its predecessors. Experimental results across various datasets confirm the superior impact of PCFDLA on model accuracy, solidifying its potential threat in federated distillation systems.
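A hedged illustration of the general idea of a peak-controlled logits poisoning step: the peak of an uploaded logit vector is moved to a wrong class while its value is clamped so the vector still looks like an ordinary confident prediction. The rule below is hypothetical, not PCFDLA's actual procedure:

    import numpy as np

    def peak_controlled_poison(logits, true_label, peak_value):
        # Hypothetical illustration: move the peak to the most competitive
        # wrong class and clamp it to `peak_value` so the upload still looks
        # like an ordinary, confident prediction.
        poisoned = logits.copy()
        ranked = np.argsort(logits)[::-1]
        target = ranked[0] if ranked[0] != true_label else ranked[1]
        poisoned[true_label], poisoned[target] = logits[target], peak_value
        return poisoned

    logits = np.array([0.1, 3.2, 0.4, 1.1])   # class 1 is correct
    print(peak_controlled_poison(logits, true_label=1, peak_value=3.0))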

Updated: 2024-07-25 13:36:42

标题: 在联邦蒸馏中的峰值控制逻辑毒化攻击

摘要: Federated Distillation(FD)提供了一种创新的分布式机器学习方法,利用知识蒸馏实现高效灵活的跨设备知识转移,无需将大量模型参数上传到中央服务器。虽然FD已经变得流行起来,但其对毒害攻击的脆弱性仍未得到充分探讨。为了弥补这一空白,我们先前介绍了FDLA(Federated Distillation Logits Attack),一种通过操纵logits通信来误导和降低客户端模型性能的方法。然而,FDLA对具有不同身份的参与者的影响以及在知识传输的各个阶段进行恶意修改的效果仍未被探索。为此,我们提出了PCFDLA(Peak-Controlled Federated Distillation Logits Attack),这是一种更先进更隐蔽的logits毒害攻击方法。PCFDLA通过精心控制logits的峰值增强了FDLA的效果,以创建高度误导但不易察觉的修改。此外,我们引入了一种新的度量标准来更好地评估攻击的有效性,证明PCFDLA在保持隐蔽性的同时,与其前身相比对受害模型造成了更大的破坏。在各种数据集上的实验结果证实了PCFDLA对模型准确性的卓越影响,巩固了其在联邦蒸馏系统中的潜在威胁。

更新时间: 2024-07-25 13:36:42

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.18039v1

RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models

Natural images captured by mobile devices often suffer from multiple types of degradation, such as noise, blur, and low light. Traditional image restoration methods require manual selection of specific tasks, algorithms, and execution sequences, which is time-consuming and may yield suboptimal results. All-in-one models, though capable of handling multiple tasks, typically support only a limited range and often produce overly smooth, low-fidelity outcomes due to their broad data distribution fitting. To address these challenges, we first define a new pipeline for restoring images with multiple degradations, and then introduce RestoreAgent, an intelligent image restoration system leveraging multimodal large language models. RestoreAgent autonomously assesses the type and extent of degradation in input images and performs restoration through (1) determining the appropriate restoration tasks, (2) optimizing the task sequence, (3) selecting the most suitable models, and (4) executing the restoration. Experimental results demonstrate the superior performance of RestoreAgent in handling complex degradation, surpassing human experts. Furthermore, the system's modular design facilitates the fast integration of new tasks and models, enhancing its flexibility and scalability for various applications.

Updated: 2024-07-25 13:29:37

标题: RestoreAgent:通过多模式大型语言模型的自主图像恢复代理

摘要: 移动设备捕捉的自然图像通常会遭受多种类型的退化,如噪音、模糊和低光。传统的图像恢复方法需要手动选择特定任务、算法和执行顺序,这是耗时的,可能会产生次优结果。虽然一体化模型能够处理多个任务,但通常只支持有限范围,并且由于其广泛的数据分布拟合,往往会产生过度平滑、低保真度的结果。为了解决这些挑战,我们首先为恢复具有多种退化的图像定义了一个新的流程,然后引入了 RestoreAgent,这是一个利用多模态大型语言模型的智能图像恢复系统。RestoreAgent 自主评估输入图像中的退化类型和程度,并通过 (1) 确定适当的恢复任务,(2) 优化任务顺序,(3) 选择最合适的模型,和 (4) 执行恢复来进行恢复。实验结果表明,RestoreAgent 在处理复杂退化方面表现优异,超过人类专家。此外,系统的模块化设计便于快速集成新任务和模型,增强了其在各种应用中的灵活性和可扩展性。

更新时间: 2024-07-25 13:29:37

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.18035v1

AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild

Recently, there has been a significant amount of research conducted on 3D hand reconstruction to use various forms of human-computer interaction. However, 3D hand reconstruction in the wild is challenging due to the extreme lack of in-the-wild 3D hand datasets. Especially, when hands are in complex poses such as interacting hands, problems like appearance similarity, self-handed occlusion and depth ambiguity make it more difficult. To overcome these issues, we propose AttentionHand, a novel method for text-driven controllable hand image generation. Since AttentionHand can generate various and numerous in-the-wild hand images well-aligned with 3D hand labels, we can acquire a new 3D hand dataset, and can relieve the domain gap between indoor and outdoor scenes. Our method needs easy-to-use four modalities (i.e., an RGB image, a hand mesh image from a 3D label, a bounding box, and a text prompt). These modalities are embedded into the latent space by the encoding phase. Then, through the text attention stage, hand-related tokens from the given text prompt are attended to highlight hand-related regions of the latent embedding. After the highlighted embedding is fed to the visual attention stage, hand-related regions in the embedding are attended by conditioning global and local hand mesh images with the diffusion-based pipeline. In the decoding phase, the final feature is decoded to new hand images, which are well-aligned with the given hand mesh image and text prompt. As a result, AttentionHand achieved state-of-the-art among text-to-hand image generation models, and the performance of 3D hand mesh reconstruction was improved by additionally training with hand images generated by AttentionHand.

Updated: 2024-07-25 13:29:32

标题: AttentionHand:基于文本驱动的可控手部图像生成,用于野外3D手部重建

摘要: 最近,已经进行了大量关于3D手部重建的研究,以利用各种形式的人机交互。然而,在野外进行3D手部重建是具有挑战性的,因为缺乏野外3D手部数据集。特别是,当手部处于复杂姿势,如互动手时,问题如外观相似性、自身遮挡和深度模糊性使问题更加困难。为了克服这些问题,我们提出了AttentionHand,这是一种用于基于文本驱动的可控手部图像生成的新方法。由于AttentionHand可以生成各种各样的野外手部图像,与3D手部标签对齐,我们可以获得一个新的3D手部数据集,并且可以缓解室内和室外场景之间的领域差距。我们的方法需要易于使用的四种模态(即RGB图像、从3D标签获得的手部网格图像、边界框和文本提示)。这些模态通过编码阶段嵌入到潜在空间中。然后,通过文本注意阶段,从给定文本提示中的与手部相关的标记会被关注,以突出潜在嵌入的与手部相关区域。在突出的嵌入被送入视觉注意阶段后,通过基于扩散的管道,用全局和局部手部网格图像对手部相关区域进行关注。在解码阶段,最终特征被解码为新的手部图像,这些图像与给定的手部网格图像和文本提示对齐。结果,AttentionHand 在文本到手部图像生成模型中取得了最先进的成果,并且通过额外训练由AttentionHand生成的手部图像,提高了3D手部网格重建的性能。

更新时间: 2024-07-25 13:29:32

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.18034v1

Estimating the number of clusters of a Block Markov Chain

Clustering algorithms frequently require the number of clusters to be chosen in advance, but it is usually not clear how to do this. To tackle this challenge when clustering within sequential data, we present a method for estimating the number of clusters when the data is a trajectory of a Block Markov Chain. Block Markov Chains are Markov Chains that exhibit a block structure in their transition matrix. The method considers a matrix that counts the number of transitions between different states within the trajectory, and transforms this into a spectral embedding whose dimension is set via singular value thresholding. The number of clusters is subsequently estimated via density-based clustering of this spectral embedding, an approach inspired by literature on the Stochastic Block Model. By leveraging and augmenting recent results on the spectral concentration of random matrices with Markovian dependence, we show that the method is asymptotically consistent - in spite of the dependencies between the count matrix's entries, and even when the count matrix is sparse. We also present a numerical evaluation of our method, and compare it to alternatives.
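A minimal sketch of the pipeline described above, using numpy for the transition-count matrix and SVD, and scikit-learn's DBSCAN as a stand-in for the paper's density-based clustering; the threshold, DBSCAN parameters and toy chain are illustrative:

    import numpy as np
    from sklearn.cluster import DBSCAN

    def estimate_num_clusters(trajectory, n_states, sv_threshold, eps):
        # Count matrix: N[i, j] = number of observed transitions i -> j.
        N = np.zeros((n_states, n_states))
        for a, b in zip(trajectory[:-1], trajectory[1:]):
            N[a, b] += 1
        # Spectral embedding; dimension set by singular value thresholding.
        U, s, _ = np.linalg.svd(N)
        r = max(1, int((s > sv_threshold).sum()))
        emb = U[:, :r]
        # Density-based clustering of the embedded states; -1 marks noise.
        labels = DBSCAN(eps=eps, min_samples=2).fit_predict(emb)
        return len(set(labels) - {-1})

    # Toy block Markov chain: 20 states in 2 blocks, sticky block transitions.
    rng = np.random.default_rng(0)
    block = lambda s: s // 10
    traj = [0]
    for _ in range(20000):
        b = block(traj[-1]) if rng.random() < 0.9 else 1 - block(traj[-1])
        traj.append(int(rng.integers(10 * b, 10 * b + 10)))
    print(estimate_num_clusters(np.array(traj), 20, sv_threshold=100.0, eps=0.2))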

Updated: 2024-07-25 13:28:12

标题: 估计块马尔可夫链的聚类数量

摘要: 聚类算法通常需要事先选择聚类数量,但通常不清楚如何做到这一点。为了解决在顺序数据内进行聚类时的这一挑战,我们提出了一种方法,用于估计数据为块马尔可夫链轨迹时的聚类数量。块马尔可夫链是在其转移矩阵中表现出块结构的马尔可夫链。该方法考虑一个计算轨迹内不同状态之间转移次数的矩阵,并将其转换为通过奇异值阈值选取维度的谱嵌入。随后,通过对这个谱嵌入进行基于密度的聚类来估计聚类数量,这种方法受到随机块模型文献的启发。通过利用和增强最近关于具有马尔可夫依赖关系的随机矩阵的谱集中性的结果,我们展示了该方法在渐近一致性上的表现 - 尽管计数矩阵条目之间存在依赖关系,即使计数矩阵是稀疏的。我们还对我们的方法进行了数值评估,并将其与其他方法进行了比较。

更新时间: 2024-07-25 13:28:12

领域: stat.ML,cs.LG,math.PR,62H30, 60J10, 60B20, 60J20

下载: http://arxiv.org/abs/2407.18287v1

ECG Arrhythmia Detection Using Disease-specific Attention-based Deep Learning Model

The electrocardiogram (ECG) is one of the most commonly-used tools to diagnose cardiovascular disease in clinical practice. Although deep learning models have achieved very impressive success in the field of automatic ECG analysis, they often lack model interpretability that is significantly important in the healthcare applications. To this end, many schemes such as general-purpose attention mechanism, Grad-CAM technique and ECG knowledge graph were proposed to be integrated with deep learning models. However, they either result in decreased classification performance or are not consistent with the way cardiologists interpret ECGs. In this study, we propose a novel disease-specific attention-based deep learning model (DANet) for arrhythmia detection from short ECG recordings. The novel idea is to introduce a soft-coding or hard-coding waveform enhanced module into existing deep neural networks, which amends original ECG signals with the guidance of the rule for diagnosis of a given disease type before being fed into the classification module. For the soft-coding DANet, we also develop a learning framework combining self-supervised pre-training with two-stage supervised training. To verify the effectiveness of our proposed DANet, we applied it to the problem of atrial premature contraction detection, and the experimental results show that it achieves superior performance compared to the benchmark model. Moreover, it also provides the waveform regions that deserve special attention in the model's decision-making process, allowing it to be a medical diagnostic assistant for physicians.

Updated: 2024-07-25 13:27:10

标题: 使用疾病特定注意力机制深度学习模型进行心电图心律失常检测

摘要: 心电图(ECG)是临床实践中用于诊断心血管疾病的最常用工具之一。虽然深度学习模型在自动心电图分析领域取得了非常显著的成功,但它们往往缺乏在医疗应用中非常重要的模型解释性。为此,许多方案,如通用注意机制、Grad-CAM技术和心电图知识图等被提出以与深度学习模型集成。然而,它们要么导致分类性能下降,要么在解释心电图时与心脏病学家的想法不一致。在本研究中,我们提出了一种新的基于疾病特异性注意力的深度学习模型(DANet)用于从短心电图记录中检测心律失常。这一新颖的想法是将软编码或硬编码波形增强模块引入现有的深度神经网络中,在将原始心电图信号输入分类模块之前,根据给定疾病类型的诊断规则修正原始心电图信号。对于软编码DANet,我们还开发了一个将自监督预训练与两阶段监督训练相结合的学习框架。为了验证我们提出的DANet的有效性,我们将其应用于房性早搏检测问题,实验结果表明与基准模型相比,它表现出卓越的性能。此外,它还提供了在模型决策过程中值得特别关注的波形区域,使其成为医生的医学诊断助手。

更新时间: 2024-07-25 13:27:10

领域: cs.LG

下载: http://arxiv.org/abs/2407.18033v1

Learning mental states estimation through self-observation: a developmental synergy between intentions and beliefs representations in a deep-learning model of Theory of Mind

Theory of Mind (ToM), the ability to attribute beliefs, intentions, or mental states to others, is a crucial feature of human social interaction. In complex environments, where the human sensory system reaches its limits, behaviour is strongly driven by our beliefs about the state of the world around us. Accessing others' mental states, e.g., beliefs and intentions, allows for more effective social interactions in natural contexts. Yet, these variables are not directly observable, making understanding ToM a challenging quest of interest for different fields, including psychology, machine learning and robotics. In this paper, we contribute to this topic by showing a developmental synergy between learning to predict low-level mental states (e.g., intentions, goals) and attributing high-level ones (i.e., beliefs). Specifically, we assume that learning beliefs attribution can occur by observing one's own decision processes involving beliefs, e.g., in a partially observable environment. Using a simple feed-forward deep learning model, we show that, when learning to predict others' intentions and actions, more accurate predictions can be acquired earlier if beliefs attribution is learnt simultaneously. Furthermore, we show that the learning performance improves even when observed actors have a different embodiment than the observer and the gain is higher when observing beliefs-driven chunks of behaviour. We propose that our computational approach can inform the understanding of human social cognitive development and be relevant for the design of future adaptive social robots able to autonomously understand, assist, and learn from human interaction partners in novel natural environments and tasks.

Updated: 2024-07-25 13:15:25

标题: 通过自我观察学习心理状态估计:心灵理论深度学习模型中意图与信念表示之间的发展协同作用

摘要: Theory of Mind(ToM),即将信念、意图或心理状态归因于他人的能力,是人类社会互动的关键特征。在复杂环境中,人类感官系统达到其极限时,行为往往受到我们对周围世界状态的信念驱动。访问他人的心理状态,例如信念和意图,可以在自然环境中实现更有效的社会互动。然而,这些变量并不是直接可观察的,这使得理解ToM成为不同领域的一个具有挑战性的研究课题,包括心理学、机器学习和机器人技术。在本文中,我们通过展示学习预测低级心理状态(例如意图、目标)与归因高级心理状态(即信念)之间的发展协同性,为这一主题做出了贡献。具体来说,我们假设通过观察自己在涉及信念的决策过程中的行为,例如在部分可观察环境中,可以学习到信念的归因。使用简单的前馈深度学习模型,我们展示了当学习预测他人的意图和行动时,如果同时学习信念的归因,更准确的预测可以更早地获得。此外,我们还表明,即使被观察者的体现方式与观察者不同,学习表现也会得到改善,并且当观察信念驱动的行为块时,收益更高。我们提出,我们的计算方法可以促进对人类社会认知发展的理解,并对未来能够自主理解、协助并从人类互动伙伴中学习的自适应社会机器人的设计具有重要意义,尤其是在新颖的自然环境和任务中。

更新时间: 2024-07-25 13:15:25

领域: cs.NE,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2407.18022v1

Quadratic Advantage with Quantum Randomized Smoothing Applied to Time-Series Analysis

As quantum machine learning continues to develop at a rapid pace, the importance of ensuring the robustness and efficiency of quantum algorithms cannot be overstated. Our research presents an analysis of quantum randomized smoothing, how data encoding and perturbation modeling approaches can be matched to achieve meaningful robustness certificates. By utilizing an innovative approach integrating Grover's algorithm, a quadratic sampling advantage over classical randomized smoothing is achieved. This strategy necessitates a basis state encoding, thus restricting the space of meaningful perturbations. We show how constrained $k$-distant Hamming weight perturbations are a suitable noise distribution here, and elucidate how they can be constructed on a quantum computer. The efficacy of the proposed framework is demonstrated on a time series classification task employing a Bag-of-Words pre-processing solution. The advantage of quadratic sample reduction is recovered especially in the regime with large number of samples. This may allow quantum computers to efficiently scale randomized smoothing to more complex tasks beyond the reach of classical methods.

Updated: 2024-07-25 13:15:16

标题: "利用量子随机平滑技术在时间序列分析中的二次优势"

摘要: 随着量子机器学习的快速发展,确保量子算法的稳健性和效率至关重要。我们的研究分析了量子随机平滑,以及如何匹配数据编码和扰动建模方法以实现有意义的稳健性证书。通过利用集成Grover算法的创新方法,实现了比经典随机平滑更高的二次采样优势。这种策略需要基态编码,从而限制了有意义扰动的空间。我们展示了约束k-距离Hamming权重扰动在这里是一个合适的噪声分布,并阐明了它们如何在量子计算机上构建。所提出的框架的有效性在使用词袋预处理解决方案进行时间序列分类任务时得到了证明。特别是在大量样本的情况下,二次样本减少的优势得以恢复。这可能使量子计算机能够有效地将随机平滑扩展到超出经典方法能力范围的更复杂任务。

更新时间: 2024-07-25 13:15:16

领域: quant-ph,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.18021v1

Resolving Discrepancies in Compute-Optimal Scaling of Language Models

Kaplan et al. and Hoffmann et al. developed influential scaling laws for the optimal model size as a function of the compute budget, but these laws yield substantially different predictions. We explain the discrepancy by reproducing the Kaplan scaling law on two datasets (OpenWebText2 and RefinedWeb) and identifying three factors causing the difference: last layer computational cost, warmup duration, and scale-dependent optimizer tuning. With these factors corrected, we obtain excellent agreement with the Hoffmann et al. (i.e., "Chinchilla") scaling law. Counter to a hypothesis of Hoffmann et al., we find that careful learning rate decay is not essential for the validity of their scaling law. As a secondary result, we derive scaling laws for the optimal learning rate and batch size, finding that tuning the AdamW $\beta_2$ parameter is essential at lower batch sizes.
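For reference, both works fit power laws of the form N*(C) proportional to C^a for the compute-optimal model size; a minimal sketch of recovering the exponent from (compute, optimal size) pairs by log-log regression, with synthetic numbers as stand-ins for real measurements:

    import numpy as np

    # Hypothetical (compute budget, compute-optimal model size) pairs.
    C = np.array([1e18, 1e19, 1e20, 1e21])
    N_opt = np.array([4e8, 1.3e9, 4e9, 1.3e10])

    # Fit N_opt = exp(b) * C**a in log-log space.
    a, b = np.polyfit(np.log(C), np.log(N_opt), 1)
    print(f"fitted exponent a = {a:.2f}")   # Chinchilla reports roughly a = 0.5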

Updated: 2024-07-25 13:09:18

标题: 解决语言模型计算最优缩放的差异

摘要: Kaplan等人和Hoffmann等人提出了有影响力的标度定律,用于描述最优模型规模随计算预算的变化,但这些定律给出了明显不同的预测。我们通过在两个数据集(OpenWebText2和RefinedWeb)上重现Kaplan的标度定律来解释这种差异,并确定了导致差异的三个因素:最后一层的计算成本、预热持续时间和随规模变化的优化器调整。在纠正这些因素后,我们与Hoffmann等人(即"Chinchilla")的标度定律取得了出色的一致性。与Hoffmann等人的一个假设相反,我们发现精心的学习率衰减并不是其标度定律成立的关键。作为次要结果,我们推导了最佳学习率和批大小的标度定律,发现在较小的批大小下调整AdamW的β2参数是必不可少的。

更新时间: 2024-07-25 13:09:18

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2406.19146v2

A Sensitivity Analysis of Cellular Automata and Heterogeneous Topology Networks: Partially-Local Cellular Automata and Homogeneous Homogeneous Random Boolean Networks

Elementary Cellular Automata (ECA) are a well-studied computational universe that is, despite its simple configurations, capable of impressive computational variety. Harvesting this computation in a useful way has historically shown itself to be difficult, but if combined with reservoir computing (RC), this becomes much more feasible. Furthermore, RC and ECA enable energy-efficient AI, making the combination a promising concept for Edge AI. In this work, we contrast ECA to substrates of Partially-Local CA (PLCA) and Homogeneous Homogeneous Random Boolean Networks (HHRBN). They are, in comparison, the topological heterogeneous counterparts of ECA. This represents a step from ECA towards more biological-plausible substrates. We analyse these substrates by testing on an RC benchmark (5-bit memory), using Temporal Derrida plots to estimate the sensitivity and assess the defect collapse rate. We find that, counterintuitively, disordered topology does not necessarily mean disordered computation. There are countering computational "forces" of topology imperfections leading to a higher collapse rate (order) and yet, if accounted for, an increased sensitivity to the initial condition. These observations together suggest a shrinking critical range.
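A minimal sketch of one ingredient of the analysis, estimating a single point of a temporal Derrida plot for an ECA: perturb one cell, advance both configurations one step, and average the resulting Hamming distance (values above 1 suggest sensitive dynamics, below 1 ordered ones). The PLCA/HHRBN substrates and the paper's exact protocol are not reproduced here:

    import numpy as np

    def eca_step(state, rule):
        # One synchronous update of an elementary CA with periodic boundaries.
        left, right = np.roll(state, 1), np.roll(state, -1)
        idx = 4 * left + 2 * state + right
        table = np.array([(rule >> k) & 1 for k in range(8)])
        return table[idx]

    def derrida_point(rule, n=256, trials=200, seed=0):
        # Average one-step spreading of a single-cell perturbation.
        rng = np.random.default_rng(seed)
        total = 0
        for _ in range(trials):
            a = rng.integers(0, 2, n)
            b = a.copy(); b[rng.integers(n)] ^= 1
            total += np.sum(eca_step(a, rule) != eca_step(b, rule))
        return total / trials

    print(derrida_point(110), derrida_point(90))  # rule 110 vs rule 90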

Updated: 2024-07-25 13:08:24

标题: 细胞自动机和异质拓扑网络的敏感性分析:部分局部细胞自动机和均匀随机布尔网络

摘要: 基本元胞自动机(ECA)是一个经过深入研究的计算宇宙,尽管其配置简单,却具备令人印象深刻的计算多样性。历史上,以有用的方式利用这种计算一直被证明是困难的,但如果与储备池计算(RC)结合,这将变得更为可行。此外,RC和ECA使得能源高效的人工智能成为可能,将这两者结合起来成为边缘人工智能的一个有前景的概念。在这项工作中,我们将ECA与部分局部CA(PLCA)和同质同质随机布尔网络(HHRBN)的基底进行对比。相比之下,它们是ECA的拓扑异质对应物。这代表了从ECA向更具生物学合理性的基底迈出的一步。我们通过在RC基准测试(5比特记忆任务)上进行测试,利用时间Derrida图来估计灵敏度并评估缺陷崩溃率,以分析这些基底。我们发现,与直觉相反,无序的拓扑结构并不一定意味着无序的计算。拓扑缺陷带来相互抗衡的计算"力量":一方面导致更高的崩溃率(有序),另一方面在将其考虑在内后,又增加了对初始条件的灵敏度。这些观察结果共同表明临界范围在收缩。

更新时间: 2024-07-25 13:08:24

领域: nlin.CG,cs.AI,cs.ET,cs.NE

下载: http://arxiv.org/abs/2407.18017v1

Self-Supervision Improves Diffusion Models for Tabular Data Imputation

The ubiquity of missing data has sparked considerable attention and focus on tabular data imputation methods. Diffusion models, recognized as the cutting-edge technique for data generation, demonstrate significant potential in tabular data imputation tasks. However, in pursuit of diversity, vanilla diffusion models often exhibit sensitivity to initialized noises, which hinders the models from generating stable and accurate imputation results. Additionally, the sparsity inherent in tabular data poses challenges for diffusion models in accurately modeling the data manifold, impacting the robustness of these models for data imputation. To tackle these challenges, this paper introduces an advanced diffusion model named Self-supervised imputation Diffusion Model (SimpDM for brevity), specifically tailored for tabular data imputation tasks. To mitigate sensitivity to noise, we introduce a self-supervised alignment mechanism that aims to regularize the model, ensuring consistent and stable imputation predictions. Furthermore, we introduce a carefully devised state-dependent data augmentation strategy within SimpDM, enhancing the robustness of the diffusion model when dealing with limited data. Extensive experiments demonstrate that SimpDM matches or outperforms state-of-the-art imputation methods across various scenarios.

Updated: 2024-07-25 13:06:30

标题: 自我监督改进了用于表格数据插补的扩散模型

摘要: 缺失数据的普遍存在引起了人们对表格数据插补方法的关注和重视。扩散模型被认为是数据生成的前沿技术,在表格数据插补任务中展示了显著的潜力。然而,为了追求多样性,普通的扩散模型通常对初始化的噪声敏感,这阻碍了模型生成稳定和准确的插补结果。此外,表格数据固有的稀疏性给扩散模型在准确建模数据流形方面带来了挑战,影响了这些模型在数据插补方面的鲁棒性。为了应对这些挑战,本文引入了一种名为自监督插补扩散模型(简称SimpDM)的先进扩散模型,专门针对表格数据插补任务。为了减轻对噪声的敏感性,我们引入了一种自监督对齐机制,旨在规范化模型,确保一致和稳定的插补预测。此外,我们在SimpDM中引入了精心设计的状态依赖数据增强策略,增强了扩散模型在处理有限数据时的鲁棒性。大量实验证明,SimpDM在各种场景下与或优于最先进的插补方法。

更新时间: 2024-07-25 13:06:30

领域: cs.LG

下载: http://arxiv.org/abs/2407.18013v1

A Non-Expert's Introduction to Data Ethics for Mathematicians

I give a short introduction to data ethics. I begin with some background information and societal context for data ethics. I then discuss data ethics in mathematical-science education and indicate some available course material. I briefly highlight a few efforts -- at my home institution and elsewhere -- on data ethics, society, and social good. I then discuss open data in research, research replicability and some other ethical issues in research, and the tension between privacy and open data and code, and a few controversial studies and reactions to studies. I then discuss ethical principles, institutional review boards, and a few other considerations in the scientific use of human data. I then briefly survey a variety of research and lay articles that are relevant to data ethics and data privacy. I conclude with a brief summary and some closing remarks. My focal audience is mathematicians, but I hope that this chapter will also be useful to others. I am not an expert about data ethics, and this chapter provides only a starting point on this wide-ranging topic. I encourage you to examine the resources that I discuss and to reflect carefully on data ethics, its role in mathematics education, and the societal implications of data and data analysis. As data and technology continue to evolve, I hope that such careful reflection will continue throughout your life.

Updated: 2024-07-25 13:05:25

标题: 面向数学家的数据伦理非专家导论

摘要: 我对数据伦理学进行了简短介绍。我首先介绍了一些关于数据伦理学的背景信息和社会背景。然后我讨论了数学科学教育中的数据伦理学,并指出了一些可用的课程材料。我简要介绍了在我的母校和其他地方关于数据伦理学、社会和社会公益的一些工作。然后我讨论了研究中的开放数据、研究可复现性及其他一些研究伦理问题、隐私与开放数据和代码之间的张力,以及一些有争议的研究和对这些研究的反应。然后我讨论了伦理原则、机构审查委员会以及科学使用人类数据时的其他一些考虑因素。随后我简要梳理了一系列与数据伦理和数据隐私相关的研究文章和科普文章。最后,我以简要总结和一些结束语作结。 我的主要受众是数学家,但我希望这一章对其他人也有用。我不是数据伦理学专家,这一章仅为这个广泛的主题提供一个起点。我鼓励你们查阅我所讨论的资源,并仔细思考数据伦理、它在数学教育中的作用,以及数据和数据分析的社会影响。随着数据和技术的不断发展,我希望这种审慎的思考能够伴随你们一生。

更新时间: 2024-07-25 13:05:25

领域: math.HO,cs.CY,cs.LG,physics.soc-ph,stat.ML

下载: http://arxiv.org/abs/2201.07794v5

HANNA: Hard-constraint Neural Network for Consistent Activity Coefficient Prediction

We present the first hard-constraint neural network for predicting activity coefficients (HANNA), a thermodynamic mixture property that is the basis for many applications in science and engineering. Unlike traditional neural networks, which ignore physical laws and result in inconsistent predictions, our model is designed to strictly adhere to all thermodynamic consistency criteria. By leveraging deep-set neural networks, HANNA maintains symmetry under the permutation of the components. Furthermore, by hard-coding physical constraints in the network architecture, we ensure consistency with the Gibbs-Duhem equation and in modeling the pure components. The model was trained and evaluated on 317,421 data points for activity coefficients in binary mixtures from the Dortmund Data Bank, achieving significantly higher prediction accuracies than the current state-of-the-art model UNIFAC. Moreover, HANNA only requires the SMILES of the components as input, making it applicable to any binary mixture of interest. HANNA is fully open-source and available for free use.
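For reference, the central consistency criterion hard-coded into the architecture is the isothermal-isobaric Gibbs-Duhem relation for a binary mixture, together with the pure-component limit (standard thermodynamics, written here in LaTeX):

    x_1 \frac{\partial \ln \gamma_1}{\partial x_1}
      + x_2 \frac{\partial \ln \gamma_2}{\partial x_1} = 0,
    \qquad x_2 = 1 - x_1,
    \qquad \gamma_i \to 1 \ \text{as} \ x_i \to 1 .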

Updated: 2024-07-25 13:05:00

标题: HANNA:用于一致活度系数预测的硬约束神经网络

摘要: 我们提出了第一个用于预测活度系数的硬约束神经网络(HANNA);活度系数是一种热力学混合物性质,是科学和工程中许多应用的基础。与忽略物理定律从而导致不一致预测的传统神经网络不同,我们的模型旨在严格遵守所有热力学一致性标准。通过利用深度集合(deep-set)神经网络,HANNA在组分置换下保持对称性。此外,通过在网络架构中硬编码物理约束,我们确保了与Gibbs-Duhem方程的一致性以及对纯组分建模的一致性。该模型在Dortmund Data Bank中317,421个二元混合物活度系数数据点上进行了训练和评估,取得了显著高于当前最先进模型UNIFAC的预测准确性。此外,HANNA仅需要组分的SMILES作为输入,使其适用于任何感兴趣的二元混合物。HANNA完全开源,可免费使用。

更新时间: 2024-07-25 13:05:00

领域: cs.LG

下载: http://arxiv.org/abs/2407.18011v1

Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning

In reinforcement learning (RL), exploiting environmental symmetries can significantly enhance efficiency, robustness, and performance. However, ensuring that the deep RL policy and value networks are respectively equivariant and invariant to exploit these symmetries is a substantial challenge. Related works try to design networks that are equivariant and invariant by construction, limiting them to a very restricted library of components, which in turn hampers the expressiveness of the networks. This paper proposes a method to construct equivariant policies and invariant value functions without specialized neural network components, which we term equivariant ensembles. We further add a regularization term for adding inductive bias during training. In a map-based path planning case study, we show how equivariant ensembles and regularization benefit sample efficiency and performance.
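A minimal sketch of one way to build such an ensemble without specialized layers: average an ordinary network over a symmetry group by transforming the observation, applying the network, and mapping the output back. Here the group is the four rotations of a grid map, and the toy convolutional network is a stand-in for a real policy:

    import torch

    def equivariant_policy(net, obs):
        # Average an unconstrained network over the 4 rotations of a map
        # observation; rotating outputs back makes the ensemble equivariant.
        logits = 0
        for k in range(4):
            rotated = torch.rot90(obs, k, dims=(-2, -1))
            out = net(rotated)                      # per-cell action logits
            logits = logits + torch.rot90(out, -k, dims=(-2, -1))
        return logits / 4

    net = torch.nn.Conv2d(1, 1, 3, padding=1)       # toy stand-in policy
    obs = torch.randn(1, 1, 8, 8)
    out1 = equivariant_policy(net, obs)
    out2 = equivariant_policy(net, torch.rot90(obs, 1, dims=(-2, -1)))
    print(torch.allclose(torch.rot90(out1, 1, dims=(-2, -1)), out2, atol=1e-5))

An invariant value function would be obtained the same way, but by averaging scalar outputs without mapping anything back.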

Updated: 2024-07-25 12:56:35

标题: 用于基于地图路径规划中强化学习的等变集成与正则化

摘要: 在强化学习(RL)中,利用环境对称性可以显著增强效率、鲁棒性和性能。然而,确保深度RL的策略网络和价值网络分别具有等变性和不变性以利用这些对称性是一个重大挑战。相关工作试图通过构造来设计等变和不变的网络,但这将它们限制在非常有限的组件库中,进而影响了网络的表达能力。本文提出了一种无需专门神经网络组件即可构建等变策略和不变价值函数的方法,我们称之为等变集成。我们进一步在训练过程中添加正则化项以引入归纳偏置。在基于地图的路径规划案例研究中,我们展示了等变集成和正则化如何提高样本效率和性能。

更新时间: 2024-07-25 12:56:35

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2403.12856v2

Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation

The scene graph generation (SGG) task involves detecting objects within an image and predicting predicates that represent the relationships between the objects. However, in SGG benchmark datasets, each subject-object pair is annotated with a single predicate, even though a single predicate may exhibit diverse semantics (i.e., semantic diversity); existing SGG models are therefore trained to predict the one and only predicate for each pair. This in turn results in the SGG models overlooking the semantic diversity that may exist in a predicate, thus leading to biased predictions. In this paper, we propose a novel model-agnostic Semantic Diversity-aware Prototype-based Learning (DPL) framework that enables unbiased predictions based on the understanding of the semantic diversity of predicates. Specifically, DPL learns the regions in the semantic space covered by each predicate to distinguish among the various different semantics that a single predicate can represent. Extensive experiments demonstrate that our proposed model-agnostic DPL framework brings significant performance improvement on existing SGG models, and also effectively understands the semantic diversity of predicates.

Updated: 2024-07-25 12:54:52

标题: 基于语义多样性的原型学习技术用于无偏见场景图生成

摘要: 场景图生成(SGG)任务涉及在图像中检测对象并预测表示对象之间关系的谓词。然而,在SGG基准数据集中,每个主体-客体对都用单个谓词注释,即使单个谓词可能展示不同的语义(即语义多样性),现有的SGG模型被训练为预测每个对应的唯一谓词。这反过来导致SGG模型忽视谓词可能存在的语义多样性,从而导致偏见预测。在本文中,我们提出了一种新颖的基于模型无关的语义多样性感知原型学习(DPL)框架,使得可以根据对谓词语义多样性的理解进行无偏见的预测。具体来说,DPL学习了语义空间中每个谓词所覆盖的区域,以区分单个谓词可以代表的各种不同语义。大量实验证明,我们提出的基于模型无关的DPL框架显著提高了现有SGG模型的性能,并有效地理解了谓词的语义多样性。

更新时间: 2024-07-25 12:54:52

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.15396v2

Network Inversion of Convolutional Neural Nets

Neural networks have emerged as powerful tools across various applications, yet their decision-making process often remains opaque, leading to them being perceived as "black boxes." This opacity raises concerns about their interpretability and reliability, especially in safety-critical scenarios. Network inversion techniques offer a solution by allowing us to peek inside these black boxes, revealing the features and patterns learned by the networks behind their decision-making processes, and thereby provide valuable insights into how neural networks arrive at their conclusions, making them more interpretable and trustworthy. This paper presents a simple yet effective approach to network inversion using a carefully conditioned generator that learns the data distribution in the input space of the trained neural network, enabling the reconstruction of inputs that would most likely lead to the desired outputs. To capture the diversity in the input space for a given output, instead of simply revealing the conditioning labels to the generator, we encode the conditioning label information into vectors; diversity is further encouraged by heavy dropout in the generation process and by minimisation of the cosine similarity between the features corresponding to the generated images. The paper concludes with immediate applications of Network Inversion including in interpretability, explainability and generation of adversarial samples.
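A hedged sketch of such a conditioned-generator objective: a frozen classifier pulls generated inputs towards the target class, while a pairwise cosine-similarity penalty on their features pushes them apart. The toy modules, vector sizes and loss weighting are all illustrative, not the paper's architecture:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyClassifier(nn.Module):
        # Frozen classifier under inversion; returns logits and features.
        def __init__(self):
            super().__init__()
            self.body = nn.Sequential(nn.Flatten(), nn.Linear(64, 32), nn.ReLU())
            self.head = nn.Linear(32, 10)
        def forward(self, x):
            feats = self.body(x)
            return self.head(feats), feats

    generator = nn.Sequential(nn.Linear(16 + 8, 64), nn.Tanh())  # latent+label
    clf = ToyClassifier()

    def inversion_loss(z, label_vec, target):
        imgs = generator(torch.cat([z, label_vec.expand(z.size(0), -1)], 1))
        logits, feats = clf(imgs)
        ce = F.cross_entropy(logits, target.expand(z.size(0)))
        f = F.normalize(feats, dim=1)
        sim = (f @ f.t()).triu(1).sum() * 2 / (len(z) * (len(z) - 1))
        return ce + sim          # classification pull + diversity push

    z = torch.randn(8, 16)
    label_vec = torch.randn(1, 8)        # encoded conditioning vector
    target = torch.tensor([3])
    print(inversion_loss(z, label_vec, target).item())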

Updated: 2024-07-25 12:53:21

标题: 卷积神经网络的网络反演

摘要: 神经网络已经成为各种应用中强大的工具,然而它们的决策过程往往仍然不透明,导致人们认为它们是"黑盒子"。这种不透明性引发了对其可解释性和可靠性的担忧,特别是在安全关键场景中。网络反演技术通过允许我们窥视这些黑盒子的内部,揭示网络在决策过程中学习到的特征和模式,从而为我们提供了宝贵的洞见,了解神经网络是如何得出结论的,使其更具可解释性和可信度。本文提出了一种简单而有效的网络反演方法,使用经过精心调节的生成器学习训练好的神经网络输入空间中的数据分布,从而能够重建最有可能导致所需输出的输入。为了捕捉给定输出在输入空间中的多样性,我们并不简单地将条件标签直接提供给生成器,而是将条件标签信息编码到向量中,并通过生成过程中的大量dropout以及最小化生成图像对应特征之间的余弦相似度来进一步鼓励多样性。本文最后总结了网络反演的直接应用,包括可解释性、可说明性和对抗样本的生成。

更新时间: 2024-07-25 12:53:21

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2407.18002v1

Towards Interpretable Visuo-Tactile Predictive Models for Soft Robot Interactions

Autonomous systems face the intricate challenge of navigating unpredictable environments and interacting with external objects. The successful integration of robotic agents into real-world situations hinges on their perception capabilities, which involve amalgamating world models and predictive skills. Effective perception models build upon the fusion of various sensory modalities to probe the surroundings. Deep learning applied to raw sensory modalities offers a viable option. However, learning-based perceptive representations become difficult to interpret. This challenge is particularly pronounced in soft robots, where the compliance of structures and materials makes prediction even harder. Our work addresses this complexity by harnessing a generative model to construct a multi-modal perception model for soft robots and to leverage proprioceptive and visual information to anticipate and interpret contact interactions with external objects. A suite of tools to interpret the perception model is furnished, shedding light on the fusion and prediction processes across multiple sensory inputs after the learning phase. We will delve into the outlooks of the perception model and its implications for control purposes.

Updated: 2024-07-25 12:49:12

标题: 朝向可解释的视触觉预测模型:软体机器人相互作用

摘要: 自主系统面临着在不可预测环境中导航以及与外部对象进行交互的复杂挑战。将机器人代理成功整合到现实世界中的关键在于它们的感知能力,这涉及到融合世界模型和预测技能。有效的感知模型建立在融合各种感官模态以探索周围环境。深度学习应用于原始感官模态提供了一种可行的选择。然而,基于学习的感知表示变得难以解释。这种挑战在软机器人中尤为显著,因为结构和材料的顺应性使预测变得更加困难。我们的工作通过利用生成模型为软机器人构建多模感知模型,并利用本体感知和视觉信息来预测和解释与外部对象的接触交互来应对这种复杂性。提供了一套工具来解释感知模型,揭示了学习阶段后多个感官输入之间的融合和预测过程。我们将深入探讨感知模型的前景及其对控制目的的影响。

更新时间: 2024-07-25 12:49:12

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2407.12197v2

Lightweight Industrial Cohorted Federated Learning for Heterogeneous Assets

Federated Learning (FL) is the most widely adopted collaborative learning approach for training decentralized Machine Learning (ML) models by exchanging learning between clients without sharing the data and compromising privacy. However, since great data similarity or homogeneity is taken for granted in all FL tasks, FL is still not specifically designed for the industrial setting. This is rarely the case in industrial data because there are differences in machine type, firmware version, operational conditions, environmental factors, and hence, data distribution. Despite its popularity, it has been observed that FL performance degrades if the clients have heterogeneous data distributions. Therefore, we propose a Lightweight Industrial Cohorted FL (LICFL) algorithm that uses model parameters for cohorting without any additional on-edge (client-level) computations and communications than standard FL and mitigates the shortcomings from data heterogeneity in industrial applications. Our approach enhances client-level model performance by allowing them to collaborate with similar clients and train more specialized or personalized models. Also, we propose an adaptive aggregation algorithm that extends the LICFL to Adaptive LICFL (ALICFL) for further improving the global model performance and speeding up the convergence. Through numerical experiments on real-time data, we demonstrate the efficacy of the proposed algorithms and compare the performance with existing approaches.
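A minimal sketch of the cohorting idea: the server clusters flattened client parameter vectors (k-means here is an illustrative stand-in for LICFL's cohorting rule) and averages models within each cohort; the synthetic clients mimic two drifting machine types:

    import numpy as np
    from sklearn.cluster import KMeans

    def cohorted_average(client_params, n_cohorts):
        # client_params: one flattened parameter vector per client.
        P = np.stack(client_params)
        cohorts = KMeans(n_clusters=n_cohorts, n_init=10).fit_predict(P)
        # FedAvg-style mean within each cohort instead of one global mean.
        return {c: P[cohorts == c].mean(axis=0) for c in range(n_cohorts)}

    rng = np.random.default_rng(0)
    clients = [rng.normal(0, 0.1, 50) for _ in range(5)] + \
              [rng.normal(1, 0.1, 50) for _ in range(5)]
    models = cohorted_average(clients, n_cohorts=2)
    print({c: float(m.mean().round(2)) for c, m in models.items()})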

Updated: 2024-07-25 12:48:56

标题: 轻量级工业分组联邦学习用于异构资产

摘要: 联邦学习(FL)是最广泛采用的协作学习方法,用于通过在客户端之间交换学习来训练去中心化的机器学习(ML)模型,而不共享数据并损害隐私。然而,由于在所有FL任务中都默认存在很大的数据相似性或同质性,因此FL仍然并非专门为工业环境设计。在工业数据中很少出现这种情况,因为机器类型、固件版本、操作条件、环境因素等存在差异,因此数据分布也不同。尽管FL很受欢迎,但已观察到如果客户端具有异构数据分布,则FL性能会下降。因此,我们提出一种轻量级工业分组FL(LICFL)算法,通过使用模型参数进行分组,而无需比标准FL更多的客户端级计算和通信,并减轻工业应用中由数据异质性引起的缺点。我们的方法通过允许客户端与类似客户端合作并训练更专业或个性化的模型来提高客户端级模型性能。此外,我们提出了一种自适应聚合算法,将LICFL扩展为自适应LICFL(ALICFL),以进一步提高全局模型性能并加快收敛速度。通过对实时数据进行数值实验,我们展示了提出算法的有效性,并将其性能与现有方法进行比较。

更新时间: 2024-07-25 12:48:56

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2407.17999v1

iNNspector: Visual, Interactive Deep Model Debugging

Deep learning model design, development, and debugging is a process driven by best practices, guidelines, trial-and-error, and the personal experiences of model developers. At multiple stages of this process, performance and internal model data can be logged and made available. However, due to the sheer complexity and scale of this data and process, model developers often resort to evaluating their model performance based on abstract metrics like accuracy and loss. We argue that a structured analysis of data along the model's architecture and at multiple abstraction levels can considerably streamline the debugging process. Such a systematic analysis can further connect the developer's design choices to their impacts on the model behavior, facilitating the understanding, diagnosis, and refinement of deep learning models. Hence, in this paper, we (1) contribute a conceptual framework structuring the data space of deep learning experiments. Our framework, grounded in literature analysis and requirements interviews, captures design dimensions and proposes mechanisms to make this data explorable and tractable. To operationalize our framework in a ready-to-use application, we (2) present the iNNspector system. iNNspector enables tracking of deep learning experiments and provides interactive visualizations of the data on all levels of abstraction from multiple models to individual neurons. Finally, we (3) evaluate our approach with three real-world use-cases and a user study with deep learning developers and data analysts, proving its effectiveness and usability.

Updated: 2024-07-25 12:48:41

标题: iNNspector: 可视化、交互式深度模型调试

摘要: 深度学习模型的设计、开发和调试是一个由最佳实践、指南、试错和模型开发者的个人经验驱动的过程。在这个过程的多个阶段,性能和内部模型数据可以被记录并提供。然而,由于数据和过程的复杂性和规模,模型开发者经常依靠抽象指标如准确度和损失来评估他们的模型性能。我们认为,对数据进行结构化分析,沿着模型的架构和多个抽象级别,可以大大简化调试过程。这样的系统分析进一步可以将开发者的设计选择与其对模型行为的影响联系起来,促进对深度学习模型的理解、诊断和改进。因此,在本文中,我们(1)提出了一个概念框架,用于结构化深度学习实验的数据空间。我们的框架,基于文献分析和需求访谈,捕捉设计维度并提出使这些数据可探索和可处理的机制。为了将我们的框架操作化为一个可立即使用的应用程序,我们(2)提出了iNNspector系统。iNNspector能够跟踪深度学习实验,并提供交互式可视化数据,涵盖从多个模型到单个神经元的各个抽象级别。最后,我们(3)通过三个真实用例和与深度学习开发者和数据分析师的用户研究来评估我们的方法,证明其有效性和可用性。

更新时间: 2024-07-25 12:48:41

领域: cs.HC,cs.LG

下载: http://arxiv.org/abs/2407.17998v1

On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures

In this work we evaluate the utility of synthetic data for training automatic speech recognition (ASR). We use the ASR training data to train a text-to-speech (TTS) system similar to FastSpeech-2. With this TTS we reproduce the original training data, training ASR systems solely on synthetic data. For ASR, we use three different architectures, attention-based encoder-decoder, hybrid deep neural network hidden Markov model and a Gaussian mixture hidden Markov model, showing the different sensitivity of the models to synthetic data generation. In order to extend previous work, we present a number of ablation studies on the effectiveness of synthetic vs. real training data for ASR. In particular we focus on how the gap between training on synthetic and real data changes by varying the speaker embedding or by scaling the model size. For the latter we show that the TTS models generalize well, even when training scores indicate overfitting.

Updated: 2024-07-25 12:44:45

标题: 关于纯合成训练数据对不同自动语音识别架构的影响

摘要: 在这项工作中,我们评估了合成数据在训练自动语音识别(ASR)中的实用性。我们使用ASR训练数据来训练一个类似于FastSpeech-2的文本到语音(TTS)系统。通过这个TTS,我们复制了原始训练数据,仅使用合成数据训练ASR系统。对于ASR,我们使用了三种不同的架构,包括基于注意力的编码器-解码器、混合深度神经网络隐马尔可夫模型和高斯混合隐马尔可夫模型,展示了这些模型对合成数据生成的不同敏感性。为了扩展以前的工作,我们进行了多项消融研究,评估合成与真实训练数据对ASR的有效性。特别地,我们关注训练于合成数据与真实数据之间的差距如何随说话人嵌入或模型规模的变化而变化。对于后者,我们表明即使训练得分显示出过拟合,TTS模型也能很好地泛化。

更新时间: 2024-07-25 12:44:45

领域: cs.CL,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2407.17997v1

Amortized Active Learning for Nonparametric Functions

Active learning (AL) is a sequential learning scheme aiming to select the most informative data. AL reduces data consumption and avoids the cost of labeling large amounts of data. However, AL trains the model and solves an acquisition optimization for each selection. It becomes expensive when the model training or acquisition optimization is challenging. In this paper, we focus on active nonparametric function learning, where the gold standard Gaussian process (GP) approaches suffer from cubic time complexity. We propose an amortized AL method, where new data are suggested by a neural network which is trained up-front without any real data (Figure 1). Our method avoids repeated model training and requires no acquisition optimization during the AL deployment. We (i) utilize GPs as function priors to construct an AL simulator, (ii) train an AL policy that can zero-shot generalize from simulation to real learning problems of nonparametric functions and (iii) achieve real-time data selection and comparable learning performances to time-consuming baseline methods.
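A minimal sketch of step (i): drawing random functions from a zero-mean GP prior with an RBF kernel, which yields unlimited synthetic targets for training the amortized AL policy; the lengthscale and grid are illustrative:

    import numpy as np

    def sample_gp_functions(n_funcs, n_points, lengthscale=0.2, seed=0):
        # Draw functions from a zero-mean GP prior with an RBF kernel;
        # each draw is one synthetic target for an active-learning episode.
        rng = np.random.default_rng(seed)
        x = np.linspace(0, 1, n_points)
        K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / lengthscale ** 2)
        K += 1e-6 * np.eye(n_points)              # numerical jitter
        L = np.linalg.cholesky(K)
        return x, L @ rng.normal(size=(n_points, n_funcs))

    x, fs = sample_gp_functions(n_funcs=3, n_points=100)
    print(fs.shape)   # (100, 3): three simulated functions for the policy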

Updated: 2024-07-25 12:38:08

标题: 非参数函数的摊销主动学习

摘要: 主动学习(AL)是一种顺序学习方案,旨在选择最具信息量的数据。AL减少了数据消耗,并避免了标注大量数据的成本。然而,AL在每次选择时都要训练模型并求解一个采集优化问题;当模型训练或采集优化本身具有挑战性时,这会变得十分昂贵。本文关注主动非参数函数学习,其中作为金标准的高斯过程(GP)方法受到立方时间复杂度的限制。我们提出了一种摊销的AL方法,新数据由一个神经网络建议,该神经网络在没有任何真实数据的情况下预先训练(见原文图1)。我们的方法避免了重复的模型训练,并在AL部署过程中不需要采集优化。我们(i)利用GP作为函数先验构建AL模拟器,(ii)训练一个能够从模拟零样本(zero-shot)泛化到真实非参数函数学习问题的AL策略,(iii)实现实时数据选择,并达到与耗时的基线方法相当的学习性能。

更新时间: 2024-07-25 12:38:08

领域: cs.LG

下载: http://arxiv.org/abs/2407.17992v1

LightPHE: Integrating Partially Homomorphic Encryption into Python with Extensive Cloud Environment Evaluations

Homomorphic encryption enables computations on encrypted data without accessing private keys, enhancing security in cloud environments. Without this technology, updates need to be performed on-premises or require transmitting private keys to the cloud, increasing security risks. Fully homomorphic encryption (FHE) supports both additive and multiplicative operations on ciphertexts, while partially homomorphic encryption (PHE) supports either addition or multiplication, offering a more efficient and practical solution. This study introduces LightPHE, a lightweight hybrid PHE framework for Python, designed to address the lack of existing PHE libraries. LightPHE integrates multiple PHE algorithms with a modular and extensible design, ensuring robustness and usability for rapid prototyping and secure application development. Cloud-based experiments were conducted on Google Colab (Normal, A100 GPU, L4 GPU, T4 High RAM, TPU2) and Microsoft Azure Spark to evaluate LightPHE's performance and scalability. Key metrics such as key generation, encryption, decryption, and homomorphic operations were assessed. Results showed LightPHE's superior performance in high-computation environments like Colab A100 GPU and TPU2, while also offering viable options for cost-effective setups like Colab Normal and Azure Spark. Comparative analyses demonstrated LightPHE's efficiency and scalability, making it suitable for various applications. The benchmarks offer insights into selecting appropriate cloud environments based on performance needs, highlighting LightPHE's potential to advance homomorphic encryption for secure and efficient cloud-based data processing.
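The additive homomorphism that PHE schemes such as Paillier provide can be shown with a from-scratch toy implementation (insecure key sizes, for illustration only); this is a generic textbook sketch, not the LightPHE API:

    import math, random

    # Toy Paillier keypair (insecure sizes, illustration only).
    p, q = 293, 433
    n, n2 = p * q, (p * q) ** 2
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    mu = pow(lam, -1, n)                      # valid because g = n + 1

    def encrypt(m):
        r = random.randrange(1, n)            # assume gcd(r, n) == 1
        return (pow(g, m, n2) * pow(r, n, n2)) % n2

    def decrypt(c):
        return ((pow(c, lam, n2) - 1) // n * mu) % n

    c1, c2 = encrypt(17), encrypt(25)
    # Homomorphic addition: multiplying ciphertexts adds the plaintexts.
    print(decrypt((c1 * c2) % n2))            # -> 42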

Updated: 2024-07-25 12:23:31

标题: LightPHE:将部分同态加密集成到具有广泛云环境评估的Python中

摘要: 同态加密使得无需访问私钥即可对加密数据进行计算,增强了云环境中的安全性。没有这项技术,更新需要在本地进行,或者需要将私钥传输到云端,从而增加安全风险。全同态加密(FHE)支持对密文进行加法和乘法操作,而部分同态加密(PHE)支持加法或乘法之一,提供了更高效和实用的解决方案。 本研究介绍了LightPHE,一个面向Python的轻量级混合PHE框架,旨在弥补现有PHE库的缺乏。LightPHE集成了多种PHE算法,具有模块化和可扩展的设计,确保了在快速原型设计和安全应用开发中的稳健性和可用性。 我们在Google Colab(Normal,A100 GPU,L4 GPU,T4 High RAM,TPU2)和Microsoft Azure Spark上进行了基于云的实验,以评估LightPHE的性能和可扩展性,评估了密钥生成、加密、解密和同态操作等关键指标。结果显示,在Colab A100 GPU和TPU2等高计算环境中,LightPHE表现出优越的性能,同时也为Colab Normal和Azure Spark等注重成本效益的配置提供了可行选择。 比较分析展示了LightPHE的效率和可扩展性,使其适用于各种应用。基准测试为根据性能需求选择适当的云环境提供了见解,突出了LightPHE在推动同态加密用于安全高效的云端数据处理方面的潜力。

更新时间: 2024-07-25 12:23:31

领域: cs.CR

下载: http://arxiv.org/abs/2408.05219v1

Expressivity and Generalization: Fragment-Biases for Molecular GNNs

Although recent advances in higher-order Graph Neural Networks (GNNs) improve the theoretical expressiveness and molecular property predictive performance, they often fall short of the empirical performance of models that explicitly use fragment information as inductive bias. However, for these approaches, there exists no theoretic expressivity study. In this work, we propose the Fragment-WL test, an extension to the well-known Weisfeiler & Leman (WL) test, which enables the theoretic analysis of these fragment-biased GNNs. Building on the insights gained from the Fragment-WL test, we develop a new GNN architecture and a fragmentation with infinite vocabulary that significantly boosts expressiveness. We show the effectiveness of our model on synthetic and real-world data where we outperform all GNNs on Peptides and have 12% lower error than all GNNs on ZINC and 34% lower error than other fragment-biased models. Furthermore, we show that our model exhibits superior generalization capabilities compared to the latest transformer-based architectures, positioning it as a robust solution for a range of molecular modeling tasks.
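For reference, a minimal sketch of the classical 1-WL colour refinement that the Fragment-WL test extends; a fragment-biased variant would additionally seed the initial colours with fragment membership, which is not reproduced here:

    def wl_colors(adj, rounds=3):
        # 1-WL colour refinement: repeatedly hash each node's colour together
        # with the multiset of its neighbours' colours.
        colors = {v: 0 for v in adj}
        for _ in range(rounds):
            sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                   for v in adj}
            relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
            colors = {v: relabel[sig[v]] for v in adj}
        return colors

    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    path = {0: [1], 1: [0, 2], 2: [1]}
    # Different colour histograms distinguish the two graphs.
    print(sorted(wl_colors(triangle).values()), sorted(wl_colors(path).values()))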

Updated: 2024-07-25 12:23:26

标题: 表达能力与泛化:分子GNN的片段偏置

摘要: 虽然最近高阶图神经网络(GNNs)的进展提高了理论表达能力和分子属性预测性能,但它们往往无法达到显式使用片段信息作为归纳偏差的模型的经验性能。然而,针对这些方法,尚不存在理论表达能力研究。在这项工作中,我们提出了Fragment-WL测试,这是著名的Weisfeiler&Leman(WL)测试的一个扩展,它可以实现对这些受片段偏置的GNNs的理论分析。基于Fragment-WL测试所得的见解,我们开发了一种新的GNN架构和具有无限词汇量的分割方法,显著提高了表达能力。我们展示了我们的模型在合成和真实数据上的有效性,在肽类方面我们优于所有GNNs,并且比ZINC上所有GNNs的误差低12%,比其他受片段偏置的模型的误差低34%。此外,我们展示了我们的模型相比最新的基于Transformer的架构具有更优越的泛化能力,使其成为一种强大的解决方案,适用于一系列分子建模任务。

更新时间: 2024-07-25 12:23:26

领域: cs.LG

下载: http://arxiv.org/abs/2406.08210v2

Personalized and Context-aware Route Planning for Edge-assisted Vehicles

Conventional route planning services typically offer the same routes to all drivers, focusing primarily on a few standardized factors such as travel distance or time, overlooking individual driver preferences. With the inception of autonomous vehicles expected in the coming years, where vehicles will rely on routes decided by such planners, there arises a need to incorporate the specific preferences of each driver, ensuring personalized navigation experiences. In this work, we propose a novel approach based on graph neural networks (GNNs) and deep reinforcement learning (DRL), aimed at customizing routes to suit individual preferences. By analyzing the historical trajectories of individual drivers, we classify their driving behavior and associate it with relevant road attributes as indicators of driver preferences. The GNN is capable of representing the road network as graph-structured data effectively, while DRL is capable of making decisions utilizing reward mechanisms to optimize route selection with factors such as travel costs, congestion level, and driver satisfaction. We evaluate our proposed GNN-based DRL framework using a real-world road network and demonstrate its ability to accommodate driver preferences, offering a range of route options tailored to individual drivers. The results indicate that our framework can select routes that accommodate driver's preferences with up to a 17% improvement compared to a generic route planner, and reduce the travel time by 33% (afternoon) and 46% (evening) relatively to the shortest distance-based approach.

Updated: 2024-07-25 12:14:12

标题: 边缘辅助车辆的个性化和上下文感知路径规划

摘要: 传统的路线规划服务通常为所有驾驶员提供相同的路线,主要关注一些标准化因素,如行驶距离或时间,忽视了个体驾驶员的偏好。随着预计未来将出现自动驾驶车辆,车辆将依赖这些规划者决定的路线,有必要将每个驾驶员的具体偏好纳入其中,确保个性化导航体验。在这项工作中,我们提出了一种基于图神经网络(GNNs)和深度强化学习(DRL)的新方法,旨在根据个体偏好定制路线。通过分析个体驾驶员的历史轨迹,我们对其驾驶行为进行分类,并将其与相关道路属性联系起来,作为驾驶员偏好的指标。GNN能有效地将道路网络表示为图结构化数据,而DRL能够利用奖励机制做出决策,优化路线选择,考虑行驶成本、拥堵水平和驾驶员满意度等因素。我们使用真实世界的道路网络评估了我们提出的基于GNN的DRL框架,并展示了它能够满足驾驶员的偏好,提供多种针对个体驾驶员定制的路线选项。结果表明,与通用路线规划器相比,我们的框架能够选择符合驾驶员偏好的路线,改善了最多17%,并相对于基于最短距离的方法,下午减少了33%的行驶时间,晚上减少了46%。

更新时间: 2024-07-25 12:14:12

领域: cs.AI,cs.RO

下载: http://arxiv.org/abs/2407.17980v1

Stay Tuned: An Empirical Study of the Impact of Hyperparameters on LLM Tuning in Real-World Applications

Fine-tuning Large Language Models (LLMs) is an effective method to enhance their performance on downstream tasks. However, choosing the appropriate setting of tuning hyperparameters (HPs) is a labor-intensive and computationally expensive process. Here, we provide recommended HP configurations for practical use-cases that represent a better starting point for practitioners, when considering two SOTA LLMs and two commonly used tuning methods. We describe Coverage-based Search (CBS), a process for ranking HP configurations based on an offline extensive grid search, such that the top ranked configurations collectively provide a practical robust recommendation for a wide range of datasets and domains. We focus our experiments on Llama-3-8B and Mistral-7B, as well as full fine-tuning and LoRA, conducting a total of > 10,000 tuning experiments. Our results suggest that, in general, Llama-3-8B and LoRA should be preferred, when possible. Moreover, we show that for both models and tuning methods, exploring only a few HP configurations, as recommended by our analysis, can provide excellent results in practice, making this work a valuable resource for practitioners.
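A hedged sketch of the coverage idea behind CBS: greedily order configurations so that the top of the ranking contains a near-best configuration for as many datasets as possible. The scores, tolerance and greedy rule below are illustrative, not the paper's exact procedure:

    def coverage_based_ranking(scores, datasets, tol=0.01):
        # scores[config][dataset]: validation metric from the offline grid.
        best = {d: max(scores[c][d] for c in scores) for d in datasets}
        covers = {c: {d for d in datasets if scores[c][d] >= best[d] - tol}
                  for c in scores}
        ranked, uncovered = [], set(datasets)
        while uncovered and len(ranked) < len(scores):
            remaining = [c for c in covers if c not in ranked]
            c = max(remaining, key=lambda k: len(covers[k] & uncovered))
            if not covers[c] & uncovered:
                break                        # no remaining config helps
            ranked.append(c)
            uncovered -= covers[c]
        return ranked                        # configurations to try first

    scores = {"lr3e-4": {"A": 0.91, "B": 0.85},
              "lr1e-4": {"A": 0.88, "B": 0.90},
              "lr1e-3": {"A": 0.80, "B": 0.70}}
    print(coverage_based_ranking(scores, ["A", "B"]))  # ['lr3e-4', 'lr1e-4']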

Updated: 2024-07-25 12:07:55

标题: 请您耐心等待:超参数在实际应用中对LLM调优的影响的实证研究

摘要: 微调大型语言模型(LLMs)是增强其在下游任务中性能的有效方法。然而,选择合适的调优超参数(HPs)设置是一项劳动密集且计算成本高昂的过程。在这里,我们针对实际用例给出推荐的HP配置,在考虑两种SOTA LLM和两种常用调优方法时,为从业者提供一个更好的起点。我们描述了基于覆盖率的搜索(CBS),这是一个基于离线大规模网格搜索对HP配置进行排名的过程,使排名靠前的配置共同为广泛的数据集和领域提供实用且稳健的推荐。我们的实验集中在Llama-3-8B和Mistral-7B,以及全量微调和LoRA上,总共进行了超过10,000次调优实验。我们的结果表明,总体而言,在可能的情况下应优先选择Llama-3-8B和LoRA。此外,我们展示了对于这两种模型和调优方法,仅探索我们的分析所推荐的少量HP配置,就能在实践中获得出色的结果,这使本工作成为从业者的宝贵资源。

更新时间: 2024-07-25 12:07:55

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.18990v1

Particle identification with machine learning from incomplete data in the ALICE experiment

The ALICE experiment at the LHC measures properties of the strongly interacting matter formed in ultrarelativistic heavy-ion collisions. Such studies require accurate particle identification (PID). ALICE provides PID information via several detectors for particles with momentum from about 100 MeV/c up to 20 GeV/c. Traditionally, particles are selected with rectangular cuts. A much better performance can be achieved with machine learning (ML) methods. Our solution uses multiple neural networks (NN) serving as binary classifiers. Moreover, we extended our particle classifier with Feature Set Embedding and attention in order to train on data with incomplete samples. We also present the integration of the ML project with the ALICE analysis software, and we discuss domain adaptation, the ML technique needed to transfer the knowledge between simulated and real experimental data.

Updated: 2024-07-25 11:51:04

标题: 在ALICE实验中利用机器学习从不完整数据中进行粒子鉴别

摘要: LHC上的ALICE实验测量超相对论重离子碰撞中形成的强相互作用物质的性质。这类研究需要准确的粒子鉴别(PID)。ALICE通过几个探测器为动量从约100 MeV/c到20 GeV/c的粒子提供PID信息。传统上,粒子是通过矩形切割进行选择的。使用机器学习(ML)方法可以实现更好的性能。我们的解决方案使用多个神经网络(NN)作为二元分类器。此外,我们通过特征集嵌入和注意力扩展了粒子分类器,以便在数据样本不完整的情况下进行训练。我们还介绍了ML项目与ALICE分析软件的集成,并讨论了领域自适应,这是将知识从模拟数据转移到真实实验数据所需的ML技术。

更新时间: 2024-07-25 11:51:04

领域: hep-ex,cs.LG

下载: http://arxiv.org/abs/2403.17436v3

Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery

Generalized Continual Category Discovery (GCCD) tackles learning from sequentially arriving, partially labeled datasets while uncovering new categories. Traditional methods depend on feature distillation to prevent forgetting the old knowledge. However, this strategy restricts the model's ability to adapt and effectively distinguish new categories. To address this, we introduce a novel technique integrating a learnable projector with feature distillation, thus enhancing model adaptability without sacrificing past knowledge. The resulting distribution shift of the previously learned categories is mitigated with the auxiliary category adaptation network. We demonstrate that while each component offers modest benefits individually, their combination - dubbed CAMP (Category Adaptation Meets Projected distillation) - significantly improves the balance between learning new information and retaining old. CAMP exhibits superior performance across several GCCD and Class Incremental Learning scenarios. The code is available at https://github.com/grypesc/CAMP.

Updated: 2024-07-25 11:49:54

标题: 类别适应与广义连续类别发现中的投影蒸馏相遇

摘要: 广义持续类别发现(Generalized Continual Category Discovery, GCCD)处理的是从顺序到达、部分标注的数据集中学习,同时发现新的类别。传统方法依赖特征蒸馏来防止遗忘旧知识。然而,这种策略限制了模型适应和有效区分新类别的能力。为了解决这个问题,我们引入了一种新颖的技术,将可学习的投影器与特征蒸馏相结合,从而在不牺牲过去知识的情况下增强模型的适应性。先前学习类别由此产生的分布偏移则由辅助的类别适应网络来缓解。我们证明,虽然每个组件单独只带来适度的好处,但它们的组合(称为CAMP,Category Adaptation Meets Projected distillation)显著改善了学习新信息和保留旧信息之间的平衡。CAMP在多个GCCD和类增量学习场景中展现出优越的性能。代码可在https://github.com/grypesc/CAMP 上找到。

更新时间: 2024-07-25 11:49:54

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2308.12112v4

The Larger the Better? Improved LLM Code-Generation via Budget Reallocation

It is a common belief that large language models (LLMs) are better than smaller-sized ones. However, larger models also require significantly more time and compute during inference. This begs the question: what happens when both models operate under the same budget? (e.g., compute, run-time). To address this question, we analyze code generation LLMs of various sizes and make comparisons such as running a 70B model once vs. generating five outputs from a 13B model. We consider a standard unit-test setup, which can be used to select the correct output from the smaller model. Our findings reveal that the repeated use of smaller models can yield consistent improvements, with gains of up to 15% across five tasks. On the other hand, in scenarios where unit-tests are unavailable, a ranking-based selection of candidates from the smaller model falls short of the performance of a single output from larger ones. Our results highlight the potential of using smaller models instead of larger ones, and the importance of studying approaches for ranking LLM outputs.
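A minimal sketch of the comparison protocol described above: spend the inference budget on several samples from the smaller model and keep the first candidate that passes the unit tests. The generator and test below are toy placeholders for a real code model and test suite:

    import random

    def pick_with_unit_tests(generate, unit_tests, k):
        # Draw up to k candidates from the smaller model; return the first
        # one that passes every unit test (None if the budget runs out).
        for _ in range(k):
            code = generate()
            if all(test(code) for test in unit_tests):
                return code
        return None

    candidates = ["def add(a, b): return a - b", "def add(a, b): return a + b"]
    generate = lambda: random.choice(candidates)

    def passes_tests(src):
        ns = {}
        exec(src, ns)                 # define the candidate function
        return ns["add"](2, 3) == 5

    print(pick_with_unit_tests(generate, [passes_tests], k=5))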

Updated: 2024-07-25 11:37:54

标题: 越大越好吗?通过预算重分配改进LLM代码生成

摘要: 人们普遍认为,大型语言模型(LLMs)比较小的模型效果更好。然而,更大的模型在推断过程中也需要更多的时间和计算资源。这就引发了一个问题:当两个模型在相同的预算下运行时会发生什么?(例如,计算资源、运行时间)。为了解决这个问题,我们分析了不同大小的代码生成LLMs,并进行比较,如一次运行一个70B模型与从一个13B模型生成五个输出。我们考虑了一个标准的单元测试设置,可以用来从较小的模型中选择正确的输出。我们的研究结果表明,反复使用较小的模型可以产生稳定的改进,在五项任务中的增益可达15%。另一方面,在无法进行单元测试的情况下,从较小模型中根据排名选择候选者的表现不及从较大模型中获得的单个输出。我们的研究结果突显了使用较小模型而不是较大模型的潜力,以及研究对LLM输出进行排名的方法的重要性。

更新时间: 2024-07-25 11:37:54

领域: cs.SE,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.00725v2

Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks

Large language models (LLMs) have demonstrated impressive versatility across numerous tasks, yet their generalization capabilities remain poorly understood. To investigate these behaviors, arithmetic tasks serve as important venues. In previous studies, seemingly unrelated mysteries still exist -- (1) models with appropriate positional embeddings can correctly perform longer unseen arithmetic operations such as addition, but their effectiveness varies in more complex tasks like multiplication; (2) models perform well for longer unseen cases in modular addition under specific moduli (e.g., modulo 100) but struggle under very close moduli (e.g., modulo 101), regardless of the positional encoding used. We believe previous studies have been treating the symptoms rather than addressing the root cause -- they have paid excessive attention to improving model components, while overlooking the differences in task properties that may be the real drivers. This is confirmed by our unified theoretical framework for different arithmetic scenarios. For example, unlike multiplication, the digital addition task has the property of translation invariance which naturally aligns with the relative positional encoding, and this combination leads to successful generalization of addition to unseen longer domains. The discrepancy in operations modulo 100 and 101 arises from the base. Modulo 100, unlike 101, is compatible with the decimal system (base 10), such that unseen information in digits beyond the units digit and the tens digit is actually not needed for the task. Extensive experiments with GPT-like models validate our theoretical predictions. These findings deepen our understanding of the generalization mechanisms, and facilitate more data-efficient model training and objective-oriented AI alignment.
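The base-compatibility argument can be checked directly: modulo 100 only the two lowest decimal digits matter, since 10^k is divisible by 100 for k >= 2, whereas powers of 10 never vanish modulo 101, so higher digits keep mattering:

    # Modulo 100 sees only the last two decimal digits:
    print(123456 % 100, 999956 % 100)   # 56 56
    # Modulo 101 still depends on the higher digits:
    print(123456 % 101, 999956 % 101)   # 34 56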

Updated: 2024-07-25 11:35:22

标题: 将看似无关的联系起来:在算术推理任务中生成模型泛化的原则理解

摘要: 大型语言模型(LLMs)在许多任务中展示出了令人印象深刻的多功能性,然而它们的泛化能力仍然知之甚少。为了研究这些行为,算术任务成为重要的研究场所。在先前的研究中,似乎存在着看似无关的谜团 - (1)具有适当位置嵌入的模型可以正确执行更长的未见算术操作,如加法,但在更复杂的任务中,如乘法,它们的有效性却有所不同;(2)在特定模数下(例如模100),模型在更长的未见情况下执行模加法时表现良好,但在非常接近的模数下(例如模101)时却表现出困难,无论使用的位置编码如何。我们认为先前的研究一直在处理症状而没有解决根本原因 - 他们过分关注改进模型组件,而忽视了可能是真正驱动因素的任务属性的差异。这一点得到了我们针对不同算术场景的统一理论框架的证实。例如,与乘法不同,数字加法任务具有平移不变性的属性,这与相对位置编码自然对齐,这种组合导致了加法向未见更长域的成功泛化。在模100和101的操作中的差异源于基数。模100与101不同,它与十进制系统(基数10)兼容,因此实际上对于任务而言,单位位和十位之外的数字信息是不需要的。类似GPT模型的广泛实验证实了我们的理论预测。这些发现加深了我们对泛化机制的理解,并促进了更具数据效率的模型训练和面向目标的人工智能对齐。

更新时间: 2024-07-25 11:35:22

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.17963v1

Neural Networks for Generating Better Local Optima in Topology Optimization

Neural networks have recently been employed as material discretizations within adjoint optimization frameworks for inverse problems and topology optimization. While advantageous regularization effects and better optima have been found for some inverse problems, the benefit for topology optimization has been limited -- where the focus of investigations has been the compliance problem. We demonstrate how neural network material discretizations can, under certain conditions, find better local optima in more challenging optimization problems, where we here specifically consider acoustic topology optimization. The chances of identifying a better optimum can significantly be improved by running multiple partial optimizations with different neural network initializations. Furthermore, we show that the neural network material discretization's advantage comes from the interplay with the Adam optimizer and emphasize its current limitations when competing with constrained and higher-order optimization techniques. At the moment, this discretization has only been shown to be beneficial for unconstrained first-order optimization.

Updated: 2024-07-25 11:24:44

Categories: cs.LG

Download: http://arxiv.org/abs/2407.17957v1

A unified law of robustness for Bregman divergence losses

In contemporary deep learning practice, models are often trained to near zero loss, i.e. to nearly interpolate the training data. However, the number of parameters in the model is usually far more than the number of data points $n$, the theoretical minimum needed for interpolation: a phenomenon referred to as overparameterization. In an interesting piece of work that contributes to the considerable research devoted to understanding overparameterization, Bubeck and Sellke showed that for a broad class of covariate distributions (specifically those satisfying a natural notion of concentration of measure), overparameterization is necessary for robust interpolation, i.e. if the interpolating function is required to be Lipschitz. However, their robustness results were proved only in the setting of regression with square loss. In practice, however, many other kinds of losses are used, e.g. cross-entropy loss for classification. In this work, we generalize Bubeck and Sellke's result to Bregman divergence losses, which form a common generalization of square loss and cross-entropy loss. Our generalization relies on identifying a bias-variance-type decomposition that lies at the heart of the proof of Bubeck and Sellke.
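
For reference, the standard definition of the Bregman divergence and the two special cases named above (these are textbook identities, not material taken from the paper):

```latex
% Bregman divergence induced by a strictly convex potential \phi:
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle.
% \phi(x) = \|x\|^2 recovers the square loss:
D_\phi(x, y) = \|x - y\|^2,
% while the negative entropy \phi(p) = \sum_i p_i \log p_i on the simplex
% recovers the KL divergence, which equals the cross-entropy loss up to an
% additive constant (the entropy of p):
D_\phi(p, q) = \sum_i p_i \log \frac{p_i}{q_i}.
```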

Updated: 2024-07-25 11:21:50

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16639v2

Scaling Training Data with Lossy Image Compression

Empirically-determined scaling laws have been broadly successful in predicting the evolution of large machine learning models with training data and number of parameters. As a consequence, they have been useful for optimizing the allocation of limited resources, most notably compute time. In certain applications, storage space is an important constraint, and data format needs to be chosen carefully as a consequence. Computer vision is a prominent example: images are inherently analog, but are always stored in a digital format using a finite number of bits. Given a dataset of digital images, the number of bits $L$ to store each of them can be further reduced using lossy data compression. This, however, can degrade the quality of the model trained on such images, since each example has lower resolution. In order to capture this trade-off and optimize storage of training data, we propose a `storage scaling law' that describes the joint evolution of test error with sample size and number of bits per image. We prove that this law holds within a stylized model for image compression, and verify it empirically on two computer vision tasks, extracting the relevant parameters. We then show that this law can be used to optimize the lossy compression level. At given storage, models trained on optimally compressed images present a significantly smaller test error with respect to models trained on the original data. Finally, we investigate the potential benefits of randomizing the compression level.
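
A small sketch (not the paper's code) of the storage-side quantity: measure the average number of bits $L$ per image at a given JPEG quality level, which is the knob the proposed storage scaling law trades off against sample size $n$.

```python
# Average bits per image after lossy JPEG compression at a chosen quality.
import io
from PIL import Image

def avg_bits_per_image(images, quality: int) -> float:
    # `images` is a list of PIL Images; quality in [1, 95] controls lossiness
    total_bits = 0
    for img in images:
        buf = io.BytesIO()
        img.convert("RGB").save(buf, format="JPEG", quality=quality)
        total_bits += 8 * buf.getbuffer().nbytes
    return total_bits / len(images)
```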

Updated: 2024-07-25 11:19:55

Categories: cs.CV,cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2407.17954v1

Pruning Boolean d-DNNF Circuits Through Tseitin-Awareness

Boolean circuits in d-DNNF form enable tractable probabilistic inference. However, as a key insight of this work, we show that commonly used d-DNNF compilation approaches introduce irrelevant subcircuits. We call these subcircuits Tseitin artifacts, as they are introduced due to the Tseitin transformation step -- a well-established procedure to transform any circuit into the CNF format required by several d-DNNF knowledge compilers. We discuss how to detect and remove both Tseitin variables and Tseitin artifacts, leading to more succinct circuits. We empirically observe an average size reduction of 77.5% when removing both Tseitin variables and artifacts. The additional pruning of Tseitin artifacts reduces the size by 22.2% on average. This significantly improves downstream tasks that benefit from a more succinct circuit, e.g., probabilistic inference tasks.

Updated: 2024-07-25 11:15:57

Categories: cs.AI,cs.LO

Download: http://arxiv.org/abs/2407.17951v1

Real Time American Sign Language Detection Using Yolo-v9

This paper focuses on real-time American Sign Language detection. YOLO is a convolutional neural network (CNN) based model, first released in 2015. In recent years, it has gained popularity for its real-time detection capabilities. Our study specifically targets the YOLO-v9 model, released in 2024. As the model is newly introduced, little work has been done with it, especially in sign language detection. Our paper provides deep insight into how YOLO-v9 works and why it performs better than previous models.
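
A hedged sketch of a real-time detection loop of the kind described, assuming the `ultralytics` package and a YOLOv9 checkpoint fine-tuned on ASL hand signs ("asl_yolov9.pt" is a hypothetical weights file, not from the paper):

```python
import cv2
from ultralytics import YOLO

model = YOLO("asl_yolov9.pt")              # hypothetical fine-tuned weights
cap = cv2.VideoCapture(0)                  # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)                 # per-frame inference
    cv2.imshow("ASL detection", results[0].plot())  # annotated frame
    if cv2.waitKey(1) & 0xFF == ord("q"):  # quit on 'q'
        break
cap.release()
cv2.destroyAllWindows()
```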

Updated: 2024-07-25 11:11:05

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.17950v1

Fast convergence of the Expectation Maximization algorithm under a logarithmic Sobolev inequality

By utilizing recently developed tools for constructing gradient flows on Wasserstein spaces, we extend an analysis technique commonly employed to understand alternating minimization algorithms on Euclidean space to the Expectation Maximization (EM) algorithm via its representation as coordinate-wise minimization on the product of a Euclidean space and a space of probability distributions due to Neal and Hinton (1998). In so doing we obtain finite sample error bounds and exponential convergence of the EM algorithm under a natural generalisation of a log-Sobolev inequality. We further demonstrate that the analysis technique is sufficiently flexible to allow also the analysis of several variants of the EM algorithm.
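
The Neal-Hinton view referenced above casts EM as coordinate-wise minimization of a free energy: the E-step optimizes over the variational distribution q, the M-step over the Euclidean parameters. A toy sketch for a two-component 1D Gaussian mixture (our illustration, not the paper's Wasserstein machinery):

```python
import numpy as np
from scipy.stats import norm

def em_step(x, pi, mu, sigma):
    # E-step: optimal q given current parameters (posterior responsibilities)
    w = np.stack([pi[k] * norm.pdf(x, mu[k], sigma[k]) for k in range(2)])
    q = w / w.sum(axis=0)
    # M-step: optimal parameters given q
    nk = q.sum(axis=1)
    pi = nk / len(x)
    mu = (q @ x) / nk
    sigma = np.sqrt(np.array([q[k] @ (x - mu[k]) ** 2 for k in range(2)]) / nk)
    return pi, mu, sigma
```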

Updated: 2024-07-25 11:08:53

Categories: stat.ML,cs.LG,math.OC,math.ST,stat.CO,stat.TH

Download: http://arxiv.org/abs/2407.17949v1

Positive Text Reframing under Multi-strategy Optimization

Differing from sentiment transfer, positive reframing seeks to substitute negative perspectives with positive expressions while preserving the original meaning. With the emergence of pre-trained language models (PLMs), it is possible to achieve acceptable results by fine-tuning PLMs. Nevertheless, generating fluent, diverse and task-constrained reframing text remains a significant challenge. To tackle this issue, a multi-strategy optimization framework (MSOF) is proposed in this paper. Starting from the objective of positive reframing, we first design positive sentiment reward and content preservation reward to encourage the model to transform the negative expressions of the original text while ensuring the integrity and consistency of the semantics. Then, different decoding optimization approaches are introduced to improve the quality of text generation. Finally, based on the modeling formula of positive reframing, we propose a multi-dimensional re-ranking method that further selects candidate sentences from three dimensions: strategy consistency, text similarity and fluency. Extensive experiments on two Seq2Seq PLMs, BART and T5, demonstrate our framework achieves significant improvements on unconstrained and controlled positive reframing tasks.

Updated: 2024-07-25 10:58:42

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.17940v1

Physics-Enhanced Graph Neural Networks For Soft Sensing in Industrial Internet of Things

The Industrial Internet of Things (IIoT) is reshaping manufacturing, industrial processes, and infrastructure management. By fostering new levels of automation, efficiency, and predictive maintenance, IIoT is transforming traditional industries into intelligent, seamlessly interconnected ecosystems. However, achieving highly reliable IIoT can be hindered by factors such as the cost of installing large numbers of sensors, limitations in retrofitting existing systems with sensors, or harsh environmental conditions that may make sensor installation impractical. Soft (virtual) sensing leverages mathematical models to estimate variables from physical sensor data, offering a solution to these challenges. Data-driven and physics-based modeling are the two main methodologies widely used for soft sensing. The choice between these strategies depends on the complexity of the underlying system, with the data-driven approach often being preferred when the physics-based inference models are intricate and present challenges for state estimation. However, conventional deep learning models are typically hindered by their inability to explicitly represent the complex interactions among various sensors. To address this limitation, we adopt Graph Neural Networks (GNNs), renowned for their ability to effectively capture the complex relationships between sensor measurements. In this research, we propose physics-enhanced GNNs, which integrate principles of physics into graph-based methodologies. This is achieved by augmenting additional nodes in the input graph derived from the underlying characteristics of the physical processes. Our evaluation of the proposed methodology on the case study of district heating networks reveals significant improvements over purely data-driven GNNs, even in the presence of noise and parameter inaccuracies.

Updated: 2024-07-25 10:52:26

Categories: cs.LG,cs.AI,eess.SP

Download: http://arxiv.org/abs/2404.08061v2

Comparison of different Artificial Neural Networks for Bitcoin price forecasting

This study investigates the impact of varying sequence lengths on the accuracy of predicting cryptocurrency returns using Artificial Neural Networks (ANNs). Utilizing the Mean Absolute Error (MAE) as a threshold criterion, we aim to enhance prediction accuracy by excluding returns that are smaller than this threshold, thus mitigating errors associated with minor returns. The subsequent evaluation focuses on the accuracy of predicted returns that exceed this threshold. We compare four sequence lengths: 168 hours (7 days), 72 hours (3 days), 24 hours, and 12 hours, each with a return prediction interval of 2 hours. Our findings reveal the influence of sequence length on prediction accuracy and underscore the potential for optimized sequence configurations in financial forecasting models.
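
A sketch of the thresholded evaluation described above (our reading, not the authors' code): discard predicted returns whose magnitude falls below the MAE threshold, and score directional accuracy, here an assumed choice of accuracy measure, on the rest.

```python
import numpy as np

def thresholded_accuracy(y_true, y_pred, mae_threshold):
    keep = np.abs(y_pred) >= mae_threshold          # exclude minor predicted returns
    if not keep.any():
        return float("nan")                         # nothing left to score
    hits = np.sign(y_pred[keep]) == np.sign(y_true[keep])
    return hits.mean()
```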

Updated: 2024-07-25 10:39:50

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.17930v1

Improving probabilistic forecasts of extreme wind speeds by training statistical post-processing models with weighted scoring rules

Accurate forecasts of extreme wind speeds are of high importance for many applications. Such forecasts are usually generated by ensembles of numerical weather prediction (NWP) models, which however can be biased and have errors in dispersion, thus necessitating the application of statistical post-processing techniques. In this work we aim to improve statistical post-processing models for probabilistic predictions of extreme wind speeds. We do this by adjusting the training procedure used to fit ensemble model output statistics (EMOS) models - a commonly applied post-processing technique - and propose estimating parameters using the so-called threshold-weighted continuous ranked probability score (twCRPS), a proper scoring rule that places special emphasis on predictions over a threshold. We show that training using the twCRPS leads to improved extreme event performance of post-processing models for a variety of thresholds. We find a distribution body-tail trade-off where improved performance for probabilistic predictions of extreme events comes with worse performance for predictions of the distribution body. However, we introduce strategies to mitigate this trade-off based on weighted training and linear pooling. Finally, we consider some synthetic experiments to explain the training impact of the twCRPS and derive closed-form expressions of the twCRPS for a number of distributions, giving the first such collection in the literature. The results will enable researchers and practitioners alike to improve the performance of probabilistic forecasting models for extremes and other events of interest.
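
For reference, the threshold-weighted CRPS (Gneiting & Ranjan, 2011) for a forecast CDF F and observation y; the indicator weight shown is the choice that emphasises exceedances of a threshold t, matching the extreme-wind-speed use case described above:

```latex
\mathrm{twCRPS}(F, y) = \int_{-\infty}^{\infty} w(z)\,\bigl(F(z) - \mathbf{1}\{y \le z\}\bigr)^2 \, dz,
\qquad w(z) = \mathbf{1}\{z \ge t\}.
```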

Updated: 2024-07-25 10:39:15

Categories: cs.LG,stat.AP

Download: http://arxiv.org/abs/2407.15900v2

Guided Latent Slot Diffusion for Object-Centric Learning

Slot attention aims to decompose an input image into a set of meaningful object files (slots). These latent object representations enable various downstream tasks. Yet, these slots often bind to object parts, not objects themselves, especially for real-world datasets. To address this, we introduce Guided Latent Slot Diffusion - GLASS, an object-centric model that uses generated captions as a guiding signal to better align slots with objects. Our key insight is to learn the slot-attention module in the space of generated images. This allows us to repurpose the pre-trained diffusion decoder model, which reconstructs the images from the slots, as a semantic mask generator based on the generated captions. GLASS learns an object-level representation suitable for multiple tasks simultaneously, e.g., segmentation, image generation, and property prediction, outperforming previous methods. For object discovery, GLASS achieves approx. a +35% and +10% relative improvement for mIoU over the previous state-of-the-art (SOTA) method on the VOC and COCO datasets, respectively, and establishes a new SOTA FID score for conditional image generation amongst slot-attention-based methods. For the segmentation task, GLASS surpasses SOTA weakly-supervised and language-based segmentation models, which were specifically designed for the task.

Updated: 2024-07-25 10:38:32

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.17929v1

Invariance of deep image quality metrics to affine transformations

Deep architectures are the current state-of-the-art in predicting subjective image quality. Usually, these models are evaluated according to their ability to correlate with human opinion in databases with a range of distortions that may appear in digital media. However, these overlook affine transformations, which may better represent the changes actually happening to images in natural conditions. Humans can be particularly invariant to these natural transformations, as opposed to the digital ones. In this work, we evaluate state-of-the-art deep image quality metrics by assessing their invariance to affine transformations, specifically: rotation, translation, scaling, and changes in spectral illumination. We propose a methodology to assign invisibility thresholds for any perceptual metric. This methodology involves transforming the distance measured by an arbitrary metric to a common distance representation based on available subjectively rated databases. We psychophysically measure an absolute detection threshold in that common representation and express it in the physical units of each affine transform for each metric. By doing so, we allow the analyzed metrics to be directly comparable with actual human thresholds. We find that none of the state-of-the-art metrics shows human-like results under this strong test based on invisibility thresholds. This means that tuning the models exclusively to predict the visibility of generic distortions may disregard other properties of human vision, such as invariances or invisibility thresholds.

Updated: 2024-07-25 10:24:54

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.17927v1

Targeted stochastic gradient Markov chain Monte Carlo for hidden Markov models with rare latent states

Markov chain Monte Carlo (MCMC) algorithms for hidden Markov models often rely on the forward-backward sampler. This makes them computationally slow as the length of the time series increases, motivating the development of sub-sampling-based approaches. These approximate the full posterior by using small random subsequences of the data at each MCMC iteration within stochastic gradient MCMC. In the presence of imbalanced data resulting from rare latent states, subsequences often exclude rare latent state data, leading to inaccurate inference and prediction/detection of rare events. We propose a targeted sub-sampling (TASS) approach that over-samples observations corresponding to rare latent states when calculating the stochastic gradient of parameters associated with them. TASS uses an initial clustering of the data to construct subsequence weights that reduce the variance in gradient estimation. This leads to improved sampling efficiency, in particular in settings where the rare latent states correspond to extreme observations. We demonstrate substantial gains in predictive and inferential accuracy on real and synthetic examples.

Updated: 2024-07-25 10:21:32

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/1810.13431v3

Detection of Correlated Random Vectors

In this paper, we investigate the problem of deciding whether two standard normal random vectors $\mathsf{X}\in\mathbb{R}^{n}$ and $\mathsf{Y}\in\mathbb{R}^{n}$ are correlated or not. This is formulated as a hypothesis testing problem, where under the null hypothesis, these vectors are statistically independent, while under the alternative, $\mathsf{X}$ and a randomly and uniformly permuted version of $\mathsf{Y}$, are correlated with correlation $\rho$. We analyze the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$ and $\rho$. To derive our information-theoretic lower bounds, we develop a novel technique for evaluating the second moment of the likelihood ratio using an orthogonal polynomials expansion, which among other things, reveals a surprising connection to integer partition functions. We also study a multi-dimensional generalization of the above setting, where rather than two vectors we observe two databases/matrices, and furthermore allow for partial correlations between these two.

Updated: 2024-07-25 10:15:51

Categories: cs.IT,cs.LG,math.IT,math.ST,stat.TH

Download: http://arxiv.org/abs/2401.13429v3

Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots

Many real-world continuous control problems face the dilemma of weighing the pros and cons of multiple objectives; multi-objective reinforcement learning (MORL) serves as a generic framework for learning control policies under different preferences over objectives. However, the existing MORL methods either rely on multiple passes of explicit search for finding the Pareto front and therefore are not sample-efficient, or utilize a shared policy network for coarse knowledge sharing among policies. To boost the sample efficiency of MORL, we propose Q-Pensieve, a policy improvement scheme that stores a collection of Q-snapshots to jointly determine the policy update direction and thereby enables data sharing at the policy level. We show that Q-Pensieve can be naturally integrated with soft policy iteration with a convergence guarantee. To substantiate this concept, we propose the technique of the Q replay buffer, which stores the learned Q-networks from past iterations, and arrive at a practical actor-critic implementation. Through extensive experiments and an ablation study, we demonstrate that with much fewer samples, the proposed algorithm can outperform the benchmark MORL methods on a variety of MORL benchmark tasks.

Updated: 2024-07-25 10:11:29

Categories: cs.LG

Download: http://arxiv.org/abs/2212.03117v2

The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models

Large language models (LLMs) have demonstrated remarkable capabilities, but their power comes with significant security considerations. While extensive research has been conducted on the safety of LLMs in chat mode, the security implications of their function calling feature have been largely overlooked. This paper uncovers a critical vulnerability in the function calling process of LLMs, introducing a novel "jailbreak function" attack method that exploits alignment discrepancies, user coercion, and the absence of rigorous safety filters. Our empirical study, conducted on six state-of-the-art LLMs including GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-pro, reveals an alarming average success rate of over 90\% for this attack. We provide a comprehensive analysis of why function calls are susceptible to such attacks and propose defensive strategies, including the use of defensive prompts. Our findings highlight the urgent need for enhanced security measures in the function calling capabilities of LLMs, contributing to the field of AI safety by identifying a previously unexplored risk, designing an effective attack method, and suggesting practical defensive measures. Our code is available at https://github.com/wooozihui/jailbreakfunction.

Updated: 2024-07-25 10:09:21

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2407.17915v1

ReCorD: Reasoning and Correcting Diffusion for HOI Generation

Diffusion models revolutionize image generation by leveraging natural language to guide the creation of multimedia content. Despite significant advancements in such generative models, challenges persist in depicting detailed human-object interactions, especially regarding pose and object placement accuracy. We introduce a training-free method named Reasoning and Correcting Diffusion (ReCorD) to address these challenges. Our model couples Latent Diffusion Models with Visual Language Models to refine the generation process, ensuring precise depictions of HOIs. We propose an interaction-aware reasoning module to improve the interpretation of the interaction, along with an interaction correcting module to refine the output image for more precise HOI generation delicately. Through a meticulous process of pose selection and object positioning, ReCorD achieves superior fidelity in generated images while efficiently reducing computational requirements. We conduct comprehensive experiments on three benchmarks to demonstrate the significant progress in solving text-to-image generation tasks, showcasing ReCorD's ability to render complex interactions accurately by outperforming existing methods in HOI classification score, as well as FID and Verb CLIP-Score. Project website is available at https://alberthkyhky.github.io/ReCorD/ .

Updated: 2024-07-25 10:06:26

Categories: cs.MM,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.17911v1

Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences

Off-policy evaluation (OPE) is widely applied in sectors such as pharmaceuticals and e-commerce to evaluate the efficacy of novel products or policies from offline datasets. This paper introduces a causal deepset framework that relaxes several key structural assumptions, primarily the mean-field assumption, prevalent in existing OPE methodologies that handle spatio-temporal interference. These traditional assumptions frequently prove inadequate in real-world settings, thereby restricting the capability of current OPE methods to effectively address complex interference effects. In response, we advocate for the implementation of the permutation invariance (PI) assumption. This innovative approach enables the data-driven, adaptive learning of the mean-field function, offering a more flexible estimation method beyond conventional averaging. Furthermore, we present novel algorithms that incorporate the PI assumption into OPE and thoroughly examine their theoretical foundations. Our numerical analyses demonstrate that this novel approach yields significantly more precise estimations than existing baseline algorithms, thereby substantially improving the practical applicability and effectiveness of OPE methodologies. A Python implementation of our proposed method is available at https://github.com/BIG-S2/Causal-Deepsets.
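
The permutation invariance (PI) assumption is concretely realized by DeepSets-style architectures. A minimal sketch of such an aggregator over neighbouring units' states (sizes and the mean-pooling choice are our illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class PIAggregator(nn.Module):
    """Permutation-invariant estimator of a mean-field-type summary."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, neighbours: torch.Tensor) -> torch.Tensor:
        # neighbours: (batch, n_neighbours, state_dim); mean pooling makes the
        # output invariant to any permutation of the neighbour axis.
        return self.rho(self.phi(neighbours).mean(dim=1))
```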

Updated: 2024-07-25 10:02:11

Categories: stat.ML,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.17910v1

Separating Novel Features for Logical Anomaly Detection: A Straightforward yet Effective Approach

Vision-based inspection algorithms have significantly contributed to quality control in industrial settings, particularly in addressing structural defects like dents and contamination, which are prevalent in mass production. Extensive research efforts have led to the development of related benchmarks such as MVTec AD (Bergmann et al., 2019). However, in industrial settings, there can be instances of logical defects, where acceptable items are found in unsuitable locations or product pairs do not match as expected. Recent methods tackling logical defects effectively employ knowledge distillation to generate difference maps. Knowledge distillation (KD) is used to learn the normal data distribution in an unsupervised manner. Despite their effectiveness, these methods often overlook potential false negatives: excessive similarity between the teacher network and the student network can hinder the generation of a suitable difference map for logical anomaly detection. This technical report provides insights on handling potential false negatives by utilizing a simple constraint in KD-based logical anomaly detection methods. We select EfficientAD as a state-of-the-art baseline and apply a margin-based constraint to its unsupervised learning scheme. Applying this constraint, we can improve the AUROC for MVTec LOCO AD by 1.3%.
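
Our reading of the margin idea, as a loss sketch (EfficientAD itself is more involved): keep the student close to the teacher on normal data, but only up to a margin m, so the two networks never become indistinguishable and the teacher-student difference map retains headroom for logical anomalies.

```python
import torch

def margin_distillation_loss(teacher_feat, student_feat, margin: float = 0.1):
    sq_diff = (teacher_feat - student_feat) ** 2
    # hinge: feature differences already below the margin contribute nothing
    return torch.clamp(sq_diff - margin, min=0.0).mean()
```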

Updated: 2024-07-25 10:00:21

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.17909v1

Amortized Posterior Sampling with Diffusion Prior Distillation

We propose a variational inference approach to sample from the posterior distribution for solving inverse problems. From a pre-trained diffusion model, our approach trains a conditional flow model to minimize the divergence between the proposal variational distribution and the posterior distribution implicitly defined through the diffusion model. Once trained, the flow model is capable of sampling from the posterior distribution with a single network function evaluation (NFE), amortized with respect to the measurement. The proposed method paves a new path for distilling a diffusion prior for efficient posterior sampling. We show that our method is applicable to standard signals in Euclidean space, as well as signals on a manifold.

Updated: 2024-07-25 09:53:12

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.17907v1

Enhanced Deep Learning Methodologies and MRI Selection Techniques for Dementia Diagnosis in the Elderly Population

Dementia, a debilitating neurological condition affecting millions worldwide, presents significant diagnostic challenges. In this work, we introduce a novel methodology for the classification of demented and non-demented elderly patients using 3D brain Magnetic Resonance Imaging (MRI) scans. Our approach features a unique technique for selectively processing MRI slices, focusing on the most relevant brain regions and excluding less informative sections. This methodology is complemented by a confidence-based classification committee composed of three custom deep learning models: Dem3D ResNet, Dem3D CNN, and Dem3D EfficientNet. These models work synergistically to enhance decision-making accuracy, leveraging their collective strengths. Tested on the Open Access Series of Imaging Studies(OASIS) dataset, our method achieved an impressive accuracy of 94.12%, surpassing existing methodologies. Furthermore, validation on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset confirmed the robustness and generalizability of our approach. The use of explainable AI (XAI) techniques and comprehensive ablation studies further substantiate the effectiveness of our techniques, providing insights into the decision-making process and the importance of our methodology. This research offers a significant advancement in dementia diagnosis, providing a highly accurate and efficient tool for clinical applications.

Updated: 2024-07-25 09:50:03

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.17324v2

The Power of Combining Data and Knowledge: GPT-4o is an Effective Interpreter of Machine Learning Models in Predicting Lymph Node Metastasis of Lung Cancer

Lymph node metastasis (LNM) is a crucial factor in determining the initial treatment for patients with lung cancer, yet accurate preoperative diagnosis of LNM remains challenging. Recently, large language models (LLMs) have garnered significant attention due to their remarkable text generation capabilities. Leveraging the extensive medical knowledge learned from vast corpora, LLMs can estimate probabilities for clinical problems, though their performance has historically been inferior to data-driven machine learning models. In this paper, we propose a novel ensemble method that combines the medical knowledge acquired by LLMs with the latent patterns identified by machine learning models to enhance LNM prediction performance. Initially, we developed machine learning models using patient data. We then designed a prompt template to integrate the patient data with the predicted probability from the machine learning model. Subsequently, we instructed GPT-4o, the most advanced LLM developed by OpenAI, to estimate the likelihood of LNM based on patient data and then adjust the estimate using the machine learning output. Finally, we collected three outputs from the GPT-4o using the same prompt and ensembled these results as the final prediction. Using the proposed method, our models achieved an AUC value of 0.765 and an AP value of 0.415 for LNM prediction, significantly improving predictive performance compared to baseline machine learning models. The experimental results indicate that GPT-4o can effectively leverage its medical knowledge and the probabilities predicted by machine learning models to achieve more accurate LNM predictions. These findings demonstrate that LLMs can perform well in clinical risk prediction tasks, offering a new paradigm for integrating medical knowledge and patient data in clinical predictions.

Updated: 2024-07-25 09:42:24

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.17900v1

3D Hole Filling using Deep Learning Inpainting

The current work presents a novel methodology for completing 3D surfaces produced from 3D digitization technologies in places where there is a scarcity of meaningful geometric data. Incomplete or missing data in these three-dimensional (3D) models can lead to erroneous or flawed renderings, limiting their usefulness in a variety of applications such as visualization, geometric computation, and 3D printing. Conventional surface estimation approaches often produce implausible results, especially when dealing with complex surfaces. To address this issue, we propose a technique that incorporates neural network-based 2D inpainting to effectively reconstruct 3D surfaces. Our customized neural networks were trained on a dataset containing over 1 million curvature images. These images show the curvature of vertices as planar representations in 2D. Furthermore, we used a coarse-to-fine surface deformation technique to improve the accuracy of the reconstructed pictures and assure surface adaptability. This strategy enables the system to learn and generalize patterns from input data, resulting in the development of precise and comprehensive three-dimensional surfaces. Our methodology excels in the shape completion process, effectively filling complex holes in three-dimensional surfaces with a remarkable level of realism and precision.

Updated: 2024-07-25 09:36:37

Categories: cs.GR,cs.AI

Download: http://arxiv.org/abs/2407.17896v1

The Platonic Representation Hypothesis

We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, we discuss the implications of these trends, their limitations, and counterexamples to our analysis.
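
One common way to quantify this kind of representational alignment is linear CKA (Kornblith et al., 2019), shown below as a generic sketch; it is not necessarily the kernel metric used in the paper.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    # X: (n, d1) and Y: (n, d2) are two networks' representations of the same n inputs
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))
```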

Updated: 2024-07-25 09:33:50

Categories: cs.LG,cs.AI,cs.CV,cs.NE

Download: http://arxiv.org/abs/2405.07987v5

An Iterative Approach to Topic Modelling

Topic modelling has become increasingly popular for summarizing text data, such as social media posts and articles. However, topic modelling is usually completed in one shot. Assessing the quality of the resulting topics is challenging. No effective methods or measures have been developed for assessing the results or for making further enhancements to the topics. In this research, we propose to use an iterative process to perform topic modelling that gives rise to a sense of completeness of the resulting topics when the process is complete. Using the BERTopic package, a popular method in topic modelling, we demonstrate how the modelling process can be applied iteratively to arrive at a set of topics that could not be further improved upon, using one of three selected measures for clustering comparison as the decision criterion. This demonstration is conducted using a subset of the COVIDSenti-A dataset. The early success leads us to believe that further research using this approach in conjunction with other topic modelling algorithms could be viable.
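
A sketch of the iterative loop as we understand it: re-run topic modelling and stop once successive topic assignments agree according to a clustering-comparison measure. The adjusted Rand index and the 0.95 tolerance here are our stand-ins, not necessarily one of the three measures the paper selects.

```python
from bertopic import BERTopic
from sklearn.metrics import adjusted_rand_score

def iterative_topics(docs, max_iters: int = 10, tol: float = 0.95):
    prev = None
    for _ in range(max_iters):
        topics, _ = BERTopic().fit_transform(docs)
        if prev is not None and adjusted_rand_score(prev, topics) >= tol:
            break                  # assignments stabilised: a sense of completeness
        prev = topics
    return topics
```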

Updated: 2024-07-25 09:26:07

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.17892v1

AutoRE: Document-Level Relation Extraction with Large Language Models

Large Language Models (LLMs) have demonstrated exceptional abilities in comprehending and generating text, motivating numerous researchers to utilize them for Information Extraction (IE) purposes, including Relation Extraction (RE). Nonetheless, most existing methods are predominantly designed for Sentence-level Relation Extraction (SentRE) tasks, which typically encompass a restricted set of relations and triplet facts within a single sentence. Furthermore, certain approaches resort to treating relations as candidate choices integrated into prompt templates, leading to inefficient processing and suboptimal performance when tackling Document-Level Relation Extraction (DocRE) tasks, which entail handling multiple relations and triplet facts distributed across a given document, posing distinct challenges. To overcome these limitations, we introduce AutoRE, an end-to-end DocRE model that adopts a novel RE extraction paradigm named RHF (Relation-Head-Facts). Unlike existing approaches, AutoRE does not rely on the assumption of known relation options, making it more reflective of real-world scenarios. Additionally, we have developed an easily extensible RE framework using a Parameter-Efficient Fine-Tuning (PEFT) algorithm (QLoRA). Our experiments on the RE-DocRED dataset showcase AutoRE's best performance, achieving state-of-the-art results, surpassing TAG by 10.03% and 9.03% respectively on the dev and test set. The code is available at https://github.com/THUDM/AutoRE and a demonstration video is provided at https://www.youtube.com/watch?v=IhKRsZUAxKk

Updated: 2024-07-25 09:19:06

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2403.14888v2

Neural Fractional Differential Equations

Fractional Differential Equations (FDEs) are essential tools for modelling complex systems in science and engineering. They extend the traditional concepts of differentiation and integration to non-integer orders, enabling a more precise representation of processes characterised by non-local and memory-dependent behaviours. This property is useful in systems where variables do not respond to changes instantaneously, but instead exhibit a strong memory of past interactions. Having this in mind, and drawing inspiration from Neural Ordinary Differential Equations (Neural ODEs), we propose the Neural FDE, a novel deep neural network architecture that adjusts a FDE to the dynamics of data. This work provides a comprehensive overview of the numerical method employed in Neural FDEs and the Neural FDE architecture. The numerical outcomes suggest that, despite being more computationally demanding, the Neural FDE may outperform the Neural ODE in modelling systems with memory or dependencies on past states, and it can effectively be applied to learn more intricate dynamical systems.

Updated: 2024-07-25 09:18:24

Categories: cs.LG,cs.CE,cs.NA,math.NA,G.1, G.1.10, G.4, I.5.1

Download: http://arxiv.org/abs/2403.02737v2

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Efficiently serving large language models (LLMs) requires batching of many requests to reduce the cost per request. Yet, with larger batch sizes and longer context lengths, the key-value (KV) cache, which stores attention keys and values to avoid re-computations, significantly increases memory demands and becomes the new bottleneck in speed and memory usage. Additionally, the loading of the KV cache causes the computational core to be idle, which limits the inference speed. A straightforward and effective solution to reduce KV cache size is quantization, which decreases the total bytes taken by KV cache. However, there is a lack of in-depth studies that explore the element distribution of KV cache to understand the hardness and limitation of KV cache quantization. To fill the gap, we conducted a comprehensive study on the element distribution in KV cache of popular LLMs. Our findings indicate that the key cache should be quantized per-channel, i.e., group elements along the channel dimension and quantize them together. In contrast, the value cache should be quantized per-token. From this analysis, we developed a tuning-free 2bit KV cache quantization algorithm named KIVI. With hardware-friendly implementation, KIVI can enable Llama, Falcon, and Mistral models to maintain almost the same quality while using $\mathbf{2.6\times}$ less peak memory (including model weight). This reduction in memory usage enables up to $\mathbf{4\times}$ larger batch size, bringing $\mathbf{2.35\times \sim 3.47\times}$ throughput on real LLM inference workload. The source code is available at https://github.com/jy-yuan/KIVI.
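
A toy illustration of the asymmetric grouping KIVI motivates: quantize the key cache per-channel (statistics shared along the token axis) and the value cache per-token (statistics shared along the channel axis). The 2-bit packing and the residual window of the real method are omitted; this only shows the choice of quantization axis.

```python
import torch

def asym_quantize(x: torch.Tensor, dim: int, bits: int = 2) -> torch.Tensor:
    qmax = 2 ** bits - 1
    xmin = x.amin(dim=dim, keepdim=True)
    xmax = x.amax(dim=dim, keepdim=True)
    scale = (xmax - xmin).clamp(min=1e-6) / qmax
    q = ((x - xmin) / scale).round().clamp(0, qmax)
    return q * scale + xmin                     # dequantized values

keys = torch.randn(128, 64)                     # (tokens, channels)
values = torch.randn(128, 64)
keys_hat = asym_quantize(keys, dim=0)           # per-channel: group along tokens
values_hat = asym_quantize(values, dim=1)       # per-token: group along channels
```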

Updated: 2024-07-25 09:16:05

Categories: cs.CL,cs.LG,cs.PF

Download: http://arxiv.org/abs/2402.02750v2

When AI Eats Itself: On the Caveats of Data Pollution in the Era of Generative AI

Generative artificial intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources, particularly large and high-quality datasets. To minimize training expenses, many algorithm developers use data created by the models themselves as a cost-effective training solution. However, not all synthetic data effectively improve model performance, necessitating a strategic balance in the use of real versus synthetic data to optimize outcomes. Currently, the previously well-controlled integration of real and synthetic data is becoming uncontrollable. The widespread and unregulated dissemination of synthetic data online leads to the contamination of datasets traditionally compiled through web scraping, now mixed with unlabeled synthetic data. This trend portends a future where generative AI systems may increasingly rely blindly on consuming self-generated data, raising concerns about model performance and ethical issues. What will happen if generative AI continuously consumes itself without discernment? What measures can we take to mitigate the potential adverse effects? There is a significant gap in the scientific literature regarding the impact of synthetic data use in generative AI, particularly in terms of the fusion of multimodal information. To address this research gap, this review investigates the consequences of integrating synthetic data blindly on training generative AI on both image and text modalities and explores strategies to mitigate these effects. The goal is to offer a comprehensive view of synthetic data's role, advocating for a balanced approach to its use and exploring practices that promote the sustainable development of generative AI technologies in the era of large models.

Updated: 2024-07-25 08:59:36

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.09597v2

Unraveling the Never-Ending Story of Lifecycles and Vitalizing Processes

Business process management (BPM) has been widely used to discover, model, analyze, and optimize organizational processes. BPM looks at these processes with analysis techniques that assume a clearly defined start and end. However, not all processes adhere to this logic, with the consequence that their behavior cannot be appropriately captured by BPM analysis techniques. This paper addresses this research problem at a conceptual level. More specifically, we introduce the notion of vitalizing business processes that target the lifecycle process of one or more entities. We show the existence of lifecycle processes in many industries and that their appropriate conceptualizations pave the way for the definition of suitable modeling and analysis techniques. This paper provides a set of requirements for their analysis, and a conceptualization of lifecycle and vitalizing processes.

Updated: 2024-07-25 08:52:23

Categories: cs.DB,cs.AI,cs.SE

Download: http://arxiv.org/abs/2407.17881v1

DAM: Towards A Foundation Model for Time Series Forecasting

It is challenging to scale time series forecasting models such that they forecast accurately for multiple distinct domains and datasets, all with potentially different underlying collection procedures (e.g., sample resolution), patterns (e.g., periodicity), and prediction requirements (e.g., reconstruction vs. forecasting). We call this general task universal forecasting. Existing methods usually assume that input data is regularly sampled, and they forecast to pre-determined horizons, resulting in failure to generalise outside of the scope of their training. We propose the DAM - a neural model that takes randomly sampled histories and outputs an adjustable basis composition as a continuous function of time for forecasting to non-fixed horizons. It involves three key components: (1) a flexible approach for using randomly sampled histories from a long-tail distribution, that enables an efficient global perspective of the underlying temporal dynamics while retaining focus on the recent history; (2) a transformer backbone that is trained on these actively sampled histories to produce, as representational output, (3) the basis coefficients of a continuous function of time. We show that a single univariate DAM, trained on 25 time series datasets, either outperformed or closely matched existing SoTA models at multivariate long-term forecasting across 18 datasets, including 8 held-out for zero-shot transfer, even though these models were trained to specialise for each dataset-horizon combination. This single DAM excels at zero-shot transfer and very-long-term forecasting, performs well at imputation, is interpretable via basis function composition and attention, can be tuned for different inference-cost requirements, and is robust by design to missing and irregularly sampled data.
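
A sketch of the output side of such a model: a forecast expressed as basis coefficients of a continuous function of time, evaluable at any non-fixed horizon. The sinusoidal basis and the daily/weekly/yearly periods are illustrative assumptions, not the paper's basis set.

```python
import numpy as np

periods = np.array([24.0, 168.0, 8760.0])        # assumed periods, in hours

def basis(t: np.ndarray) -> np.ndarray:
    # trend terms plus sin/cos pairs at each period -> (len(t), n_basis)
    cols = [np.ones_like(t), t]
    cols += [f(2 * np.pi * t / p) for p in periods for f in (np.sin, np.cos)]
    return np.stack(cols, axis=-1)

def forecast(coeffs: np.ndarray, t: np.ndarray) -> np.ndarray:
    # the model emits `coeffs`; evaluation works for any horizon grid t
    return basis(t) @ coeffs
```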

Updated: 2024-07-25 08:48:07

标题: DAM:面向时间序列预测的基础模型

摘要: 让时间序列预测模型在多个不同领域和数据集上都能准确预测是一项具有挑战性的任务,这些领域和数据集可能具有不同的底层采集流程(例如采样分辨率)、模式(例如周期性)和预测要求(例如重建与预测)。我们将这一通用任务称为通用预测。现有方法通常假设输入数据是规则采样的,并且只预测到预先确定的时间范围,导致无法泛化到其训练范围之外。我们提出了DAM:一种神经模型,它接受随机采样的历史数据,并输出一个可调的基组合,作为时间的连续函数,用于对非固定的时间范围进行预测。它包括三个关键组件:(1)一种灵活的方法,使用从长尾分布中随机采样的历史数据,在保留对近期历史关注的同时,高效地获得底层时间动态的全局视角;(2)一个在这些主动采样的历史数据上训练的transformer骨干网络,其表示性输出为(3)时间连续函数的基系数。我们展示了,一个在25个时间序列数据集上训练的单变量DAM,在18个数据集的多变量长期预测中要么优于、要么接近现有SoTA模型,其中包括8个留出用于零样本迁移的数据集,尽管这些模型是针对每个数据集-时间范围组合专门训练的。这个单一的DAM在零样本迁移和超长期预测方面表现出色,在插补方面表现良好,可通过基函数组合和注意力进行解释,可以针对不同的推理成本要求进行调整,并在设计上对缺失和不规则采样数据具有鲁棒性。

更新时间: 2024-07-25 08:48:07

领域: cs.LG

下载: http://arxiv.org/abs/2407.17880v1

HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline

Vision Transformer (ViT) acceleration with field programmable gate array (FPGA) is promising but challenging. Existing FPGA-based ViT accelerators mainly rely on temporal architectures, which process different operators by reusing the same hardware blocks and suffer from extensive memory access overhead. Pipelined architectures, either coarse-grained or fine-grained, unroll the ViT computation spatially for memory access efficiency. However, they usually suffer from significant hardware resource constraints and pipeline bubbles induced by the global computation dependency of ViT. In this paper, we introduce HG-PIPE, a pipelined FPGA accelerator for high-throughput and low-latency ViT processing. HG-PIPE features a hybrid-grained pipeline architecture to reduce on-chip buffer cost and couples the computation dataflow and parallelism design to eliminate the pipeline bubbles. HG-PIPE further introduces careful approximations to implement both linear and non-linear operators with abundant Lookup Tables (LUTs), thus alleviating resource constraints. On a ZCU102 FPGA, HG-PIPE achieves 2.78 times better throughput and 2.52 times better resource efficiency than the prior-art accelerators, e.g., AutoViTAcc. With a VCK190 FPGA, HG-PIPE realizes end-to-end ViT acceleration on a single device and achieves 7118 images/s, which is 2.81 times faster than a V100 GPU.

Updated: 2024-07-25 08:47:40

标题: HG-PIPE:基于混合粒度流水线的视觉Transformer加速

摘要: 利用现场可编程门阵列(FPGA)加速视觉Transformer(ViT)前景广阔但具有挑战性。现有基于FPGA的ViT加速器主要依赖时间架构,通过复用相同的硬件块来处理不同的算子,因而承受大量内存访问开销。流水线架构,无论粗粒度还是细粒度,都在空间上展开ViT计算以提高内存访问效率。然而,它们通常受到严重的硬件资源约束,以及由ViT的全局计算依赖引起的流水线气泡的影响。在本文中,我们介绍了HG-PIPE,一个用于高吞吐量、低延迟ViT处理的流水线FPGA加速器。HG-PIPE采用混合粒度的流水线架构以降低片上缓冲成本,并将计算数据流与并行度设计相耦合以消除流水线气泡。HG-PIPE进一步引入精心设计的近似,用丰富的查找表(LUT)实现线性和非线性算子,从而缓解资源约束。在ZCU102 FPGA上,HG-PIPE实现了比现有最先进加速器(如AutoViTAcc)高2.78倍的吞吐量和2.52倍的资源效率。在VCK190 FPGA上,HG-PIPE在单个器件上实现了端到端的ViT加速,达到7118张图像/秒,比V100 GPU快2.81倍。

更新时间: 2024-07-25 08:47:40

领域: cs.AR,cs.AI,68T07

下载: http://arxiv.org/abs/2407.17879v1

Machine Learning for Equitable Load Shedding: Real-time Solution via Learning Binding Constraints

Timely and effective load shedding in power systems is critical for maintaining supply-demand balance and preventing cascading blackouts. To eliminate load shedding bias against specific regions in the system, optimization-based methods are uniquely positioned to help balance between economical and equity considerations. However, the resulting optimization problem involves complex constraints, which can be time-consuming to solve and thus cannot meet the real-time requirements of load shedding. To tackle this challenge, in this paper we present an efficient machine learning algorithm to enable millisecond-level computation for the optimization-based load shedding problem. Numerical studies on both a 3-bus toy example and a realistic RTS-GMLC system have demonstrated the validity and efficiency of the proposed algorithm for delivering equitable and real-time load shedding decisions.
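
As a sketch of the "learn the binding constraints" pattern that such real-time methods commonly build on (not necessarily the paper's exact architecture), a multi-label classifier can be trained offline on solved instances and used online to shrink the optimization; every feature and label definition below is a placeholder.

```python
# Sketch: offline, label each constraint as binding/non-binding at the optimum of
# many solved instances; online, predict the binding set in milliseconds and solve
# the much smaller system implied by the predicted active constraints.
import numpy as np
from sklearn.neural_network import MLPClassifier

X_train = np.random.rand(1000, 20)        # operating conditions (loads, limits, ...)
y_train = np.random.rand(1000, 5) > 0.5   # placeholder binding indicators

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300)
clf.fit(X_train, y_train)                 # multi-label: one output per constraint

x_now = np.random.rand(1, 20)
binding = clf.predict(x_now).astype(bool)
print("predicted binding constraints:", np.where(binding[0])[0])
```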

Updated: 2024-07-25 08:47:11

标题: 面向公平甩负荷的机器学习:通过学习起作用约束实现实时求解

摘要: 电力系统中及时有效的甩负荷对于维持供需平衡和防止连锁停电至关重要。为了消除针对系统中特定区域的甩负荷偏差,基于优化的方法在平衡经济性与公平性方面具有独特优势。然而,由此产生的优化问题包含复杂的约束条件,求解耗时,因而无法满足甩负荷的实时性要求。为了应对这一挑战,本文提出了一种高效的机器学习算法,使基于优化的甩负荷问题能够实现毫秒级计算。对一个三母线简单算例和一个真实的RTS-GMLC系统进行的数值研究验证了所提算法在给出公平、实时的甩负荷决策方面的有效性和效率。

更新时间: 2024-07-25 08:47:11

领域: eess.SY,cs.AI,cs.SY

下载: http://arxiv.org/abs/2407.18989v1

A Large-Scale Sensitivity Analysis on Latent Embeddings and Dimensionality Reductions for Text Spatializations

The semantic similarity between documents of a text corpus can be visualized using map-like metaphors based on two-dimensional scatterplot layouts. These layouts result from a dimensionality reduction on the document-term matrix or a representation within a latent embedding, including topic models. Thereby, the resulting layout depends on the input data and hyperparameters of the dimensionality reduction and is therefore affected by changes in them. However, such changes to the layout require additional cognitive efforts from the user. In this work, we present a sensitivity study that analyzes the stability of these layouts concerning (1) changes in the text corpora, (2) changes in the hyperparameter, and (3) randomness in the initialization. Our approach has two stages: data measurement and data analysis. First, we derived layouts for the combination of three text corpora and six text embeddings and a grid-search-inspired hyperparameter selection of the dimensionality reductions. Afterward, we quantified the similarity of the layouts through ten metrics, concerning local and global structures and class separation. Second, we analyzed the resulting 42817 tabular data points in a descriptive statistical analysis. From this, we derived guidelines for informed decisions on the layout algorithm and highlight specific hyperparameter settings. We provide our implementation as a Git repository at https://github.com/hpicgs/Topic-Models-and-Dimensionality-Reduction-Sensitivity-Study and results as Zenodo archive at https://doi.org/10.5281/zenodo.12772898.

Updated: 2024-07-25 08:46:49

标题: 一个关于文本空间化的潜在嵌入和降维的大规模敏感性分析

摘要: 文本语料库中文档之间的语义相似性可以通过基于二维散点图布局的地图式隐喻来可视化。这些布局产生自对文档-词项矩阵的降维,或潜在嵌入(包括主题模型)中的表示。因此,所得布局取决于输入数据和降维的超参数,并会随其变化而改变。然而,布局的这类变化会给用户带来额外的认知负担。在这项工作中,我们进行了一项敏感性研究,分析这些布局在以下方面的稳定性:(1)文本语料库的变化,(2)超参数的变化,以及(3)初始化中的随机性。我们的方法分为两个阶段:数据测量和数据分析。首先,我们为三个文本语料库与六种文本嵌入的组合,以及受网格搜索启发的降维超参数选择生成了布局。之后,我们通过十种涉及局部与全局结构以及类别分离的度量,量化了这些布局的相似性。其次,我们对得到的42817个表格数据点进行了描述性统计分析。由此,我们得出了在布局算法上做出明智决策的准则,并突出了特定的超参数设置。我们的实现以Git存储库形式提供:https://github.com/hpicgs/Topic-Models-and-Dimensionality-Reduction-Sensitivity-Study,结果以Zenodo存档形式提供:https://doi.org/10.5281/zenodo.12772898。

更新时间: 2024-07-25 08:46:49

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.17876v1

Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions

End-to-end automatic speech recognition (E2E ASR) systems have significantly improved speech recognition through training on extensive datasets. Despite these advancements, they still struggle to accurately recognize domain specific words, such as proper nouns and technical terminologies. To address this problem, we propose a method to utilize the state-of-the-art Whisper without modifying its architecture, preserving its generalization performance while enabling it to leverage descriptions effectively. Moreover, we propose two additional training techniques to improve the domain specific ASR: decoder fine-tuning, and context perturbation. We also propose a method to use a Large Language Model (LLM) to generate descriptions with simple metadata, when descriptions are unavailable. Our experiments demonstrate that proposed methods notably enhance domain-specific ASR accuracy on real-life datasets, with LLM-generated descriptions outperforming human-crafted ones in effectiveness.

Updated: 2024-07-25 08:44:04

标题: 使用LLM生成的上下文描述改进特定领域的ASR

摘要: 端到端自动语音识别(E2E ASR)系统通过在大规模数据集上训练,显著提升了语音识别能力。尽管取得了这些进展,它们仍然难以准确识别领域特定词汇,如专有名词和技术术语。为了解决这个问题,我们提出了一种方法,在不修改其架构的前提下利用最先进的Whisper,在保持其泛化性能的同时使其能够有效地利用描述信息。此外,我们提出了两种额外的训练技术来改进领域特定ASR:解码器微调和上下文扰动。针对描述不可用的情况,我们还提出了一种使用大型语言模型(LLM)根据简单元数据生成描述的方法。我们的实验表明,所提方法显著提高了真实数据集上领域特定ASR的准确性,且LLM生成的描述在效果上优于人工编写的描述。

更新时间: 2024-07-25 08:44:04

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.17874v1

Is the Digital Forensics and Incident Response Pipeline Ready for Text-Based Threats in LLM Era?

In the era of generative AI, the widespread adoption of Neural Text Generators (NTGs) presents new cybersecurity challenges, particularly within the realms of Digital Forensics and Incident Response (DFIR). These challenges primarily involve the detection and attribution of sources behind advanced attacks like spearphishing and disinformation campaigns. As NTGs evolve, the task of distinguishing between human and NTG-authored texts becomes critically complex. This paper rigorously evaluates the DFIR pipeline tailored for text-based security systems, specifically focusing on the challenges of detecting and attributing authorship of NTG-authored texts. By introducing a novel human-NTG co-authorship text attack, termed CS-ACT, our study uncovers significant vulnerabilities in traditional DFIR methodologies, highlighting discrepancies between ideal scenarios and real-world conditions. Utilizing 14 diverse datasets and 43 unique NTGs, up to the latest GPT-4, our research identifies substantial vulnerabilities in the forensic profiling phase, particularly in attributing authorship to NTGs. Our comprehensive evaluation points to factors such as model sophistication and the lack of distinctive style within NTGs as significant contributors for these vulnerabilities. Our findings underscore the necessity for more sophisticated and adaptable strategies, such as incorporating adversarial learning, stylizing NTGs, and implementing hierarchical attribution through the mapping of NTG lineages to enhance source attribution. This sets the stage for future research and the development of more resilient text-based security systems.

Updated: 2024-07-25 08:42:53

标题: 数字取证和事件响应管道是否已经准备好迎接LLM时代的基于文本的威胁?

摘要: 在生成式人工智能时代,神经文本生成器(NTG)的广泛应用带来了新的网络安全挑战,特别是在数字取证和事件响应(DFIR)领域。这些挑战主要涉及对鱼叉式网络钓鱼和虚假信息宣传活动等高级攻击背后来源的检测与归因。随着NTG的演进,区分人类撰写文本与NTG生成文本的任务变得极其复杂。本文对专为基于文本的安全系统设计的DFIR管道进行了严格评估,特别关注检测NTG生成文本并对其作者进行归因的挑战。通过引入一种称为CS-ACT的新颖人类-NTG合著文本攻击,我们的研究揭示了传统DFIR方法中的重大漏洞,凸显了理想场景与现实条件之间的差距。利用14个不同的数据集和43个独特的NTG(包括最新的GPT-4在内),我们的研究发现取证画像阶段存在重大漏洞,尤其是在将作者身份归因于NTG方面。我们的综合评估指出,模型的复杂程度和NTG内部缺乏独特风格等因素是造成这些漏洞的重要原因。我们的发现强调了采用更复杂、适应性更强的策略的必要性,例如引入对抗学习、对NTG进行风格化,以及通过映射NTG谱系实现层次化归因以增强来源归因。这为未来研究和开发更具韧性的基于文本的安全系统奠定了基础。

更新时间: 2024-07-25 08:42:53

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2407.17870v1

EllipBench: A Large-scale Benchmark for Machine-learning based Ellipsometry Modeling

Ellipsometry is used to indirectly measure the optical properties and thickness of thin films. However, solving the inverse problem of ellipsometry is time-consuming since it involves human expertise to apply the data fitting techniques. Many studies use traditional machine learning-based methods to model the complex mathematical fitting process. In our work, we approach this problem from a deep learning perspective. First, we introduce a large-scale benchmark dataset to facilitate deep learning methods. The proposed dataset encompasses 98 types of thin film materials and 4 types of substrate materials, including metals, alloys, compounds, and polymers, among others. Additionally, we propose a deep learning framework that leverages residual connections and self-attention mechanisms to learn the massive data points. We also introduce a reconstruction loss to address the common challenge of multiple solutions in thin film thickness prediction. Compared to traditional machine learning methods, our framework achieves state-of-the-art (SOTA) performance on our proposed dataset. The dataset and code will be available upon acceptance.

Updated: 2024-07-25 08:42:23

标题: EllipBench:基于机器学习的椭偏建模大规模基准测试

摘要: 椭圆偏振法被用来间接测量薄膜的光学性质和厚度。然而,求解椭偏测量的反问题十分耗时,因为它需要人类专业知识来应用数据拟合技术。许多研究使用传统的基于机器学习的方法来建模复杂的数学拟合过程。在我们的工作中,我们从深度学习的角度来解决这个问题。首先,我们引入一个大规模的基准数据集来促进深度学习方法。所提出的数据集包含98种薄膜材料和4种基底材料,包括金属、合金、化合物和聚合物等。此外,我们提出了一个深度学习框架,利用残差连接和自注意力机制来学习海量数据点。我们还引入了重构损失来应对薄膜厚度预测中常见的多解挑战。与传统的机器学习方法相比,我们的框架在我们提出的数据集上实现了最先进(SOTA)的性能。数据集和代码将在论文被接收后提供。

更新时间: 2024-07-25 08:42:23

领域: cs.LG

下载: http://arxiv.org/abs/2407.17869v1

Financial Statement Analysis with Large Language Models

We investigate whether an LLM can successfully perform financial statement analysis in a way similar to a professional human analyst. We provide standardized and anonymous financial statements to GPT4 and instruct the model to analyze them to determine the direction of future earnings. Even without any narrative or industry-specific information, the LLM outperforms financial analysts in its ability to predict earnings changes. The LLM exhibits a relative advantage over human analysts in situations when the analysts tend to struggle. Furthermore, we find that the prediction accuracy of the LLM is on par with the performance of a narrowly trained state-of-the-art ML model. LLM prediction does not stem from its training memory. Instead, we find that the LLM generates useful narrative insights about a company's future performance. Lastly, our trading strategies based on GPT's predictions yield a higher Sharpe ratio and alphas than strategies based on other models. Taken together, our results suggest that LLMs may take a central role in decision-making.

Updated: 2024-07-25 08:36:58

标题: 大语言模型在财务报表分析中的应用

摘要: 我们研究了LLM能否以类似于专业人类分析师的方式成功地进行财务报表分析。我们向GPT4提供标准化且匿名的财务报表,并指示模型对其进行分析以判断未来盈利的方向。即使没有任何叙述性或行业特定信息,LLM在预测盈利变化的能力上也优于财务分析师。在分析师往往表现不佳的情形下,LLM相对人类分析师具有相对优势。此外,我们发现LLM的预测准确率与经过专门训练的最先进机器学习模型的表现相当。LLM的预测并非源自其训练记忆。相反,我们发现LLM能够生成关于公司未来表现的有用叙述性见解。最后,基于GPT预测的交易策略比基于其他模型的策略产生了更高的夏普比率和阿尔法。综上所述,我们的结果表明LLM可能在决策中发挥核心作用。

更新时间: 2024-07-25 08:36:58

领域: q-fin.ST,cs.AI,cs.CL,q-fin.GN,q-fin.PM

下载: http://arxiv.org/abs/2407.17866v1

Batchless Normalization: How to Normalize Activations Across Instances with Minimal Memory Requirements

In training neural networks, batch normalization has many benefits, not all of them entirely understood. But it also has some drawbacks. Foremost is arguably memory consumption, as computing the batch statistics requires all instances within the batch to be processed simultaneously, whereas without batch normalization it would be possible to process them one by one while accumulating the weight gradients. Another drawback is that the distribution parameters (mean and standard deviation) are unlike all other model parameters in that they are not trained using gradient descent but require special treatment, complicating implementation. In this paper, I show a simple and straightforward way to address these issues. The idea, in short, is to add terms to the loss that, for each activation, cause the minimization of the negative log likelihood of a Gaussian distribution that is used to normalize the activation. Among other benefits, this will hopefully contribute to the democratization of AI research by means of lowering the hardware requirements for training larger models.
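
A minimal PyTorch sketch of this idea, assuming a per-feature Gaussian with learnable mean and log-variance and an assumed weighting coefficient `lam`; the paper's exact parameterization may differ.

```python
# Sketch: normalize each activation with learned Gaussian parameters and add the
# Gaussian negative log-likelihood to the loss, so no batch statistics are needed.
import torch
import torch.nn as nn

class BatchlessNorm(nn.Module):
    def __init__(self, num_features, lam=0.01):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_features))
        self.log_var = nn.Parameter(torch.zeros(num_features))
        self.lam = lam  # assumed weighting of the auxiliary loss term

    def forward(self, x):
        std = torch.exp(0.5 * self.log_var)
        y = (x - self.mu) / std  # normalizes a single instance, no batch needed
        # Negative log-likelihood of x under N(mu, var); adding it to the loss
        # trains mu and log_var by ordinary gradient descent, like any parameter.
        nll = 0.5 * (self.log_var + (x - self.mu) ** 2 / torch.exp(self.log_var))
        self.aux_loss = self.lam * nll.mean()
        return y
```

The `aux_loss` terms of all such modules would be summed into the task loss, so instances can be processed one at a time while weight gradients accumulate.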

Updated: 2024-07-25 08:34:58

标题: 无批量规范化:如何以最小内存需求规范化跨实例激活

摘要: 在训练神经网络时,批量归一化有许多好处,其中并非所有都已被完全理解,但它也有一些缺点。首要的可以说是内存消耗,因为计算批量统计量需要同时处理批内的所有实例,而在没有批量归一化的情况下,可以逐个处理实例并累积权重梯度。另一个缺点是分布参数(均值和标准差)不同于所有其他模型参数:它们不是通过梯度下降训练的,而是需要特殊处理,使实现复杂化。在本文中,我展示了一种简单直接的方法来解决这些问题。简而言之,其思想是在损失中添加一些项,使得对于每个激活,用于归一化该激活的高斯分布的负对数似然被最小化。在其他好处之外,这有望通过降低训练更大模型的硬件要求,促进AI研究的民主化。

更新时间: 2024-07-25 08:34:58

领域: cs.LG,cs.NE,I.2.6

下载: http://arxiv.org/abs/2212.14729v2

Bayesian Modelling in Practice: Using Uncertainty to Improve Trustworthiness in Medical Applications

The Intensive Care Unit (ICU) is a hospital department where machine learning has the potential to provide valuable assistance in clinical decision making. Classical machine learning models usually only provide point-estimates and no uncertainty of predictions. In practice, uncertain predictions should be presented to doctors with extra care in order to prevent potentially catastrophic treatment decisions. In this work we show how Bayesian modelling and the predictive uncertainty that it provides can be used to mitigate risk of misguided prediction and to detect out-of-domain examples in a medical setting. We derive analytically a bound on the prediction loss with respect to predictive uncertainty. The bound shows that uncertainty can mitigate loss. Furthermore, we apply a Bayesian Neural Network to the MIMIC-III dataset, predicting risk of mortality of ICU patients. Our empirical results show that uncertainty can indeed prevent potential errors and reliably identifies out-of-domain patients. These results suggest that Bayesian predictive uncertainty can greatly improve trustworthiness of machine learning models in high-risk settings such as the ICU.
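
As an illustration of uncertainty-gated decisions, here is a small sketch using Monte Carlo dropout as a stand-in for the paper's Bayesian neural network; the architecture, threshold, and sizes are arbitrary assumptions.

```python
# Sketch: sample multiple stochastic forward passes and treat the spread of the
# predictions as uncertainty; highly uncertain cases are flagged for clinicians.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 1))

def predict_with_uncertainty(x, n_samples=50):
    net.train()  # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(net(x)) for _ in range(n_samples)])
    return probs.mean(0), probs.std(0)  # risk estimate and its uncertainty

x = torch.randn(8, 20)                   # stand-in for patient features
mean_risk, uncertainty = predict_with_uncertainty(x)
refer_to_clinician = uncertainty.squeeze(-1) > 0.1  # assumed referral threshold
```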

Updated: 2024-07-25 08:29:47

标题: 贝叶斯建模实践:利用不确定性提高医疗应用的可信度

摘要: 重症监护室(ICU)是医院中机器学习有潜力为临床决策提供宝贵帮助的科室。经典的机器学习模型通常只提供点估计,而不提供预测的不确定性。在实践中,不确定的预测应当以额外的谨慎呈现给医生,以防止潜在的灾难性治疗决策。在这项工作中,我们展示了如何利用贝叶斯建模及其提供的预测不确定性来降低误导性预测的风险,并在医疗环境中检测领域外样本。我们解析地推导了预测损失关于预测不确定性的一个上界。该上界表明不确定性可以减轻损失。此外,我们将贝叶斯神经网络应用于MIMIC-III数据集,预测ICU患者的死亡风险。我们的实证结果表明,不确定性确实可以防止潜在错误,并能可靠地识别领域外患者。这些结果表明,贝叶斯预测不确定性可以极大地提高高风险环境(如ICU)中机器学习模型的可信度。

更新时间: 2024-07-25 08:29:47

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/1906.08619v2

Knowledge boosting during low-latency inference

Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running on-device. However, this incurs a communication delay that breaks real-time requirements and does not guarantee that both models will operate on the same data at the same time. We propose knowledge boosting, a novel technique that allows a large model to operate on time-delayed input during inference, while still boosting small model performance. Using a streaming neural network that processes 8 ms chunks, we evaluate different speech separation and enhancement tasks with communication delays of up to six chunks or 48 ms. Our results show larger gains where the performance gap between the small and large models is wide, demonstrating a promising method for large-small model collaboration for low-latency applications. Code, dataset, and audio samples available at https://knowledgeboosting.cs.washington.edu/.
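
A minimal sketch of the delayed-hint fusion, where a GRU stands in for the small on-device model and a linear layer for the fusion rule; both choices, and the tensor layout, are assumptions for illustration.

```python
# Sketch: the small model processes chunk t while only seeing the large model's
# hint for chunk t - delay, mimicking the communication latency.
import torch
import torch.nn as nn

class Boosted(nn.Module):
    def __init__(self, dim=64, delay=6):  # delay counted in 8 ms chunks (48 ms here)
        super().__init__()
        self.small = nn.GRU(dim, dim, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)
        self.delay = delay

    def forward(self, chunks, hints):
        # chunks, hints: (batch, T, dim); shift hints right by `delay` chunks.
        delayed = torch.zeros_like(hints)
        delayed[:, self.delay:] = hints[:, : hints.shape[1] - self.delay]
        h, _ = self.small(chunks)
        return self.fuse(torch.cat([h, delayed], dim=-1))

out = Boosted()(torch.randn(2, 100, 64), torch.randn(2, 100, 64))  # (2, 100, 64)
```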

Updated: 2024-07-25 08:26:35

标题: 低延迟推理中的知识增强

摘要: 面向低延迟流式应用的模型可以受益于更大模型的知识容量,但受资源限制,边缘设备无法运行这些大模型。一种可能的解决方案是在推理期间,由远程运行的大模型向设备上运行的小模型传输提示信息。然而,这会引入通信延迟,打破实时性要求,并且无法保证两个模型同时处理相同的数据。我们提出了知识增强(knowledge boosting),这是一种新技术,允许大模型在推理期间处理带有时间延迟的输入,同时仍能提升小模型的性能。我们使用一个处理8毫秒数据块的流式神经网络,在通信延迟最高达六个数据块(48毫秒)的条件下评估了不同的语音分离和增强任务。我们的结果表明,在小模型与大模型性能差距较大的情况下收益更大,展示了一种有前途的、面向低延迟应用的大小模型协作方法。代码、数据集和音频样本可在https://knowledgeboosting.cs.washington.edu/获取。

更新时间: 2024-07-25 08:26:35

领域: cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2407.11055v3

Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network

Recent advancements in graph-based approaches for multiplexed immunofluorescence (mIF) images have significantly propelled the field forward, offering deeper insights into patient-level phenotyping. However, current graph-based methodologies encounter two primary challenges: (1) Cellular Heterogeneity, where existing approaches fail to adequately address the inductive biases inherent in graphs, particularly the homophily characteristic observed in cellular connectivity and; (2) Scalability, where handling cellular graphs from high-dimensional images faces difficulties in managing a high number of cells. To overcome these limitations, we introduce Mew, a novel framework designed to efficiently process mIF images through the lens of multiplex network. Mew innovatively constructs a multiplex network comprising two distinct layers: a Voronoi network for geometric information and a Cell-type network for capturing cell-wise homogeneity. This framework equips a scalable and efficient Graph Neural Network (GNN), capable of processing the entire graph during training. Furthermore, Mew integrates an interpretable attention module that autonomously identifies relevant layers for image classification. Extensive experiments on a real-world patient dataset from various institutions highlight Mew's remarkable efficacy and efficiency, marking a significant advancement in mIF image analysis. The source code of Mew can be found here: \url{https://github.com/UNITES-Lab/Mew}

Updated: 2024-07-25 08:22:30

标题: Mew:通过高效的多重网络进行多重免疫荧光图像分析

摘要: 最近,面向多重免疫荧光(mIF)图像的基于图的方法取得了显著进展,大大推动了该领域的发展,为患者级表型分析提供了更深入的洞察。然而,目前基于图的方法面临两个主要挑战:(1)细胞异质性,现有方法未能充分考虑图中固有的归纳偏置,特别是细胞连接中观察到的同配性(homophily)特征;(2)可扩展性,处理来自高维图像的细胞图时难以应对数量庞大的细胞。为了克服这些限制,我们引入了Mew,一个旨在通过多重网络(multiplex network)视角高效处理mIF图像的新框架。Mew创新性地构建了一个包含两个不同层的多重网络:用于几何信息的Voronoi网络和用于捕获细胞间同质性的细胞类型网络。该框架配备了一个可扩展且高效的图神经网络(GNN),能够在训练期间处理整个图。此外,Mew集成了一个可解释的注意力模块,可自主识别与图像分类相关的层。对来自多个机构的真实患者数据集进行的大量实验突显了Mew的显著功效和效率,标志着mIF图像分析的重大进展。Mew的源代码可以在此找到:\url{https://github.com/UNITES-Lab/Mew}

更新时间: 2024-07-25 08:22:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.17857v1

MDS-ED: Multimodal Decision Support in the Emergency Department -- a Benchmark Dataset for Diagnoses and Deterioration Prediction in Emergency Medicine

Background: Benchmarking medical decision support algorithms often struggles due to limited access to datasets, narrow prediction tasks, and restricted input modalities. These limitations affect their clinical relevance and performance in high-stakes areas like emergency care, complicating replication, validation, and improvement of benchmarks. Methods: We introduce a dataset based on MIMIC-IV, benchmarking protocol, and initial results for evaluating multimodal decision support in the emergency department (ED). We use diverse data modalities from the first 1.5 hours of patient arrival, including demographics, biometrics, vital signs, lab values, and electrocardiogram waveforms. We analyze 1443 clinical labels across two contexts: predicting diagnoses with ICD-10 codes and forecasting patient deterioration. Results: Our multimodal diagnostic model achieves an AUROC score over 0.8 in a statistically significant manner for 357 out of 1428 conditions, including cardiac issues like myocardial infarction and non-cardiac conditions such as renal disease and diabetes. The deterioration model scores above 0.8 in a statistically significant manner for 13 out of 15 targets, including critical events like cardiac arrest and mechanical ventilation, ICU admission as well as short- and long-term mortality. Incorporating raw waveform data significantly improves model performance, which represents one of the first robust demonstrations of this effect. Conclusions: This study highlights the uniqueness of our dataset, which encompasses a wide range of clinical tasks and utilizes a comprehensive set of features collected early during the emergency after arriving at the ED. The strong performance, as evidenced by high AUROC scores across diagnostic and deterioration targets, underscores the potential of our approach to revolutionize decision-making in acute and emergency medicine.

Updated: 2024-07-25 08:21:46

标题: MDS-ED:急诊科的多模式决策支持--用于急诊医学诊断和恶化预测的基准数据集

摘要: 背景:由于数据集获取受限、预测任务狭窄以及输入模态受限,医疗决策支持算法的基准测试往往困难重重。这些限制影响了它们在急诊等高风险领域的临床相关性和性能,使基准的复现、验证和改进变得复杂。 方法:我们引入了一个基于MIMIC-IV的数据集、基准测试协议以及用于评估急诊科(ED)多模态决策支持的初步结果。我们使用患者到达后前1.5小时内的多种数据模态,包括人口统计学、生物测量、生命体征、实验室指标和心电图波形。我们在两种情境下分析了1443个临床标签:预测ICD-10编码的诊断和预测患者病情恶化。 结果:我们的多模态诊断模型在1428种疾病中的357种上以统计显著的方式取得了超过0.8的AUROC分数,包括心肌梗死等心脏问题以及肾病和糖尿病等非心脏疾病。恶化模型在15个目标中的13个上以统计显著的方式取得了超过0.8的分数,包括心脏骤停和机械通气等关键事件、ICU入院以及短期和长期死亡率。纳入原始波形数据显著提高了模型性能,这是对这一效应最早的有力证明之一。 结论:本研究突出了我们数据集的独特性,它涵盖了广泛的临床任务,并利用了患者到达急诊科后早期收集的全面特征集。诊断和恶化目标上的高AUROC分数所体现的强劲性能,突显了我们的方法在急性与急诊医学中革新决策的潜力。

更新时间: 2024-07-25 08:21:46

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2407.17856v1

VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions

High-performance IO demands low-overhead communication between user- and kernel space. This demand can no longer be fulfilled by traditional system calls. Linux's extended Berkeley Packet Filter (BPF) avoids user-/kernel transitions by just-in-time compiling user-provided bytecode and executing it in kernel mode with near-native speed. To still isolate BPF programs from the kernel, they are statically analyzed for memory- and type-safety, which imposes some restrictions but allows for good expressiveness and high performance. However, to mitigate the Spectre vulnerabilities disclosed in 2018, defenses which reject potentially-dangerous programs had to be deployed. We find that this affects 31% to 54% of programs in a dataset with 844 real-world BPF programs from popular open-source projects. To solve this, users are forced to disable the defenses to continue using the programs, which puts the entire system at risk. To enable secure and expressive untrusted Linux kernel extensions, we propose VeriFence, an enhancement to the kernel's Spectre defenses that reduces the number of BPF application programs rejected from 54% to zero. We measure VeriFence's overhead for all mainstream performance-sensitive applications of BPF (i.e., event tracing, profiling, and packet processing) and find that it improves significantly upon the status-quo where affected BPF programs are either unusable or enable transient execution attacks on the kernel.

Updated: 2024-07-25 08:21:42

标题: VeriFence:轻量级且精确的用于不受信任的Linux内核扩展的Spectre防御

摘要: 高性能IO要求用户空间与内核空间之间的通信开销很低。传统的系统调用已无法满足这一需求。Linux的扩展Berkeley包过滤器(BPF)通过即时编译用户提供的字节码并以接近本机的速度在内核模式下执行,避免了用户态/内核态切换。为了仍然将BPF程序与内核隔离,它们会经过内存安全和类型安全的静态分析,这带来一些限制,但允许良好的表达能力和高性能。然而,为了缓解2018年披露的Spectre漏洞,必须部署拒绝潜在危险程序的防御措施。我们发现,在一个包含来自流行开源项目的844个真实BPF程序的数据集中,有31%至54%的程序受此影响。为了解决这个问题,用户被迫禁用防御措施才能继续使用这些程序,这使整个系统面临风险。 为了实现安全且富有表达力的不受信任的Linux内核扩展,我们提出了VeriFence,它是对内核Spectre防御的增强,将被拒绝的BPF应用程序比例从54%降至零。我们测量了VeriFence在BPF所有主流性能敏感应用(即事件跟踪、性能分析和数据包处理)上的开销,并发现相较于受影响的BPF程序要么无法使用、要么使内核暴露于瞬态执行攻击的现状,VeriFence带来了显著改进。

更新时间: 2024-07-25 08:21:42

领域: cs.CR,cs.OS,68M25,D.4.6

下载: http://arxiv.org/abs/2405.00078v2

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360$^{\circ}$ scene generation pipeline that facilitates the creation of comprehensive 360$^{\circ}$ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary "flat" (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. In order to address invisible issues inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360$^{\circ}$ perspective, providing an enhanced immersive experience over existing techniques. Project website at: http://dreamscene360.github.io/

Updated: 2024-07-25 08:19:53

标题: DreamScene360:无约束的文本到3D场景生成与全景高斯飞溅

摘要: 随着虚拟现实应用需求的增加,制作沉浸式3D资产的重要性日益凸显。我们提出了一种文本到3D 360$^{\circ}$场景生成管线,可以在几分钟内为野外环境创建完整的360$^{\circ}$场景。我们的方法利用2D扩散模型的生成能力和提示词自我优化(prompt self-refinement),创建高质量且全局一致的全景图像。该图像作为初步的"平面"(2D)场景表示。随后,它被提升为3D高斯,并采用飞溅(splatting)技术实现实时探索。为了产生一致的3D几何,我们的管线通过将2D单目深度对齐到全局优化的点云来构建空间连贯的结构。该点云作为3D高斯质心的初始状态。为了解决单视图输入固有的不可见区域问题,我们对合成视图和输入相机视图施加语义和几何约束作为正则化。这些约束指导高斯的优化,帮助重建未见区域。总之,我们的方法在360$^{\circ}$视角内提供了全局一致的3D场景,相比现有技术带来了更强的沉浸式体验。项目网站:http://dreamscene360.github.io/

更新时间: 2024-07-25 08:19:53

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.06903v2

Shapley Value-based Contrastive Alignment for Multimodal Information Extraction

The rise of social media and the exponential growth of multimodal communication necessitates advanced techniques for Multimodal Information Extraction (MIE). However, existing methodologies primarily rely on direct Image-Text interactions, a paradigm that often faces significant challenges due to semantic and modality gaps between images and text. In this paper, we introduce a new paradigm of Image-Context-Text interaction, where large multimodal models (LMMs) are utilized to generate descriptive textual context to bridge these gaps. In line with this paradigm, we propose a novel Shapley Value-based Contrastive Alignment (Shap-CA) method, which aligns both context-text and context-image pairs. Shap-CA initially applies the Shapley value concept from cooperative game theory to assess the individual contribution of each element in the set of contexts, texts and images towards total semantic and modality overlaps. Following this quantitative evaluation, a contrastive learning strategy is employed to enhance the interactive contribution within context-text/image pairs, while minimizing the influence across these pairs. Furthermore, we design an adaptive fusion module for selective cross-modal fusion. Extensive experiments across four MIE datasets demonstrate that our method significantly outperforms existing state-of-the-art methods.
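
The Shapley step can be illustrated with the exact formula over three "players" (context, text, image); the overlap function `v` below is a toy stand-in for the model-based semantic/modality overlap used in Shap-CA.

```python
# Sketch: exact Shapley values, phi_p = sum over subsets S of weighted marginal
# contributions v(S + {p}) - v(S); feasible here because there are only 3 players.
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[p] += w * (v(set(S) | {p}) - v(set(S)))
    return phi

contrib = {"context": 0.5, "text": 0.3, "image": 0.2}   # toy per-player signal
v = lambda S: sum(contrib[p] for p in S) ** 0.5         # illustrative overlap measure
print(shapley_values(["context", "text", "image"], v))
```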

Updated: 2024-07-25 08:15:43

标题: 基于Shapley值的多模态信息提取对比对齐

摘要: 社交媒体的崛起和多模态通信的指数增长需要先进的多模态信息提取(MIE)技术。然而,现有方法主要依赖于直接的图像-文本交互作用,这种范式往往面临由于图像和文本之间的语义和模态差距而带来的重大挑战。在本文中,我们引入了一种新的图像-上下文-文本交互范式,其中利用大型多模态模型(LMMs)生成描述性文本上下文以弥合这些差距。根据这一范式,我们提出了一种基于Shapley值的对比对齐(Shap-CA)方法,该方法对齐上下文-文本和上下文-图像对。Shap-CA首先将合作博弈理论中的Shapley值概念应用于评估上下文、文本和图像集合中每个元素对总语义和模态重叠的个体贡献。在进行这种定量评估后,采用对比学习策略来增强上下文-文本/图像对之间的交互贡献,同时最小化这些对之间的影响。此外,我们设计了一个自适应融合模块用于选择性跨模态融合。在四个MIE数据集上进行的大量实验表明,我们的方法明显优于现有最先进的方法。

更新时间: 2024-07-25 08:15:43

领域: cs.AI,cs.CL,cs.MM

下载: http://arxiv.org/abs/2407.17854v1

Unsupervised Outlier Detection using Random Subspace and Subsampling Ensembles of Dirichlet Process Mixtures

Probabilistic mixture models are recognized as effective tools for unsupervised outlier detection owing to their interpretability and global characteristics. Among these, Dirichlet process mixture models stand out as a strong alternative to conventional finite mixture models for both clustering and outlier detection tasks. Unlike finite mixture models, Dirichlet process mixtures are infinite mixture models that automatically determine the number of mixture components based on the data. Despite their advantages, the adoption of Dirichlet process mixture models for unsupervised outlier detection has been limited by challenges related to computational inefficiency and sensitivity to outliers in the construction of outlier detectors. Additionally, Dirichlet process Gaussian mixtures struggle to effectively model non-Gaussian data with discrete or binary features. To address these challenges, we propose a novel outlier detection method that utilizes ensembles of Dirichlet process Gaussian mixtures. This unsupervised algorithm employs random subspace and subsampling ensembles to ensure efficient computation and improve the robustness of the outlier detector. The ensemble approach further improves the suitability of the proposed method for detecting outliers in non-Gaussian data. Furthermore, our method uses variational inference for Dirichlet process mixtures, which ensures both efficient and rapid computation. Empirical analyses using benchmark datasets demonstrate that our method outperforms existing approaches in unsupervised outlier detection.
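
A minimal sketch of the ensemble construction using scikit-learn's variational Dirichlet process Gaussian mixture; the ensemble size, subspace dimension, subsample size, and component cap are illustrative choices, not the paper's settings.

```python
# Sketch: fit DP Gaussian mixtures on random subspaces and subsamples, and
# average negative log-likelihoods into an outlier score.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def dpgmm_ensemble_scores(X, n_members=10, subspace_dim=5, subsample=256, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores = np.zeros(n)
    for _ in range(n_members):
        feats = rng.choice(d, size=min(subspace_dim, d), replace=False)  # random subspace
        rows = rng.choice(n, size=min(subsample, n), replace=False)      # subsampling
        gmm = BayesianGaussianMixture(
            n_components=10,  # upper bound; the DP prior prunes unused components
            weight_concentration_prior_type="dirichlet_process",
            max_iter=200,
        ).fit(X[np.ix_(rows, feats)])
        scores += -gmm.score_samples(X[:, feats])  # low likelihood -> high score
    return scores / n_members

X = np.random.randn(500, 12)
print(dpgmm_ensemble_scores(X)[:5])
```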

Updated: 2024-07-25 08:13:27

标题: 使用随机子空间和Dirichlet过程混合的子采样集成进行无监督异常检测

摘要: 概率混合模型因其可解释性和全局特性,被认为是无监督异常检测的有效工具。其中,狄利克雷过程混合模型在聚类和异常检测任务中都是传统有限混合模型的有力替代。与有限混合模型不同,狄利克雷过程混合是无限混合模型,能够根据数据自动确定混合成分的数量。尽管具有这些优势,狄利克雷过程混合模型在无监督异常检测中的应用一直受限于计算效率低下以及构建异常检测器时对异常值敏感等挑战。此外,狄利克雷过程高斯混合难以有效建模具有离散或二元特征的非高斯数据。为了应对这些挑战,我们提出了一种利用狄利克雷过程高斯混合模型集成的新颖异常检测方法。这种无监督算法采用随机子空间和子采样集成,以确保高效计算并提高异常检测器的鲁棒性。集成方法进一步提高了所提方法检测非高斯数据中异常值的适用性。此外,我们的方法对狄利克雷过程混合使用变分推断,确保了高效且快速的计算。使用基准数据集的实证分析表明,我们的方法在无监督异常检测中优于现有方法。

更新时间: 2024-07-25 08:13:27

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2401.00773v3

COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images

Deep learning is dramatically transforming the field of medical imaging and radiology, enabling the identification of pathologies in medical images, including computed tomography (CT) and X-ray scans. However, the performance of deep learning models, particularly in segmentation tasks, is often limited by the need for extensive annotated datasets. To address this challenge, the capabilities of weakly supervised semantic segmentation are explored through the lens of Explainable AI and the generation of counterfactual explanations. The scope of this research is development of a novel counterfactual inpainting approach (COIN) that flips the predicted classification label from abnormal to normal by using a generative model. For instance, if the classifier deems an input medical image X as abnormal, indicating the presence of a pathology, the generative model aims to inpaint the abnormal region, thus reversing the classifier's original prediction label. The approach enables us to produce precise segmentations for pathologies without depending on pre-existing segmentation masks. Crucially, image-level labels are utilized, which are substantially easier to acquire than creating detailed segmentation masks. The effectiveness of the method is demonstrated by segmenting synthetic targets and actual kidney tumors from CT images acquired from Tartu University Hospital in Estonia. The findings indicate that COIN greatly surpasses established attribution methods, such as RISE, ScoreCAM, and LayerCAM, as well as an alternative counterfactual explanation method introduced by Singla et al. This evidence suggests that COIN is a promising approach for semantic segmentation of tumors in CT images, and presents a step forward in making deep learning applications more accessible and effective in healthcare, where annotated data is scarce.

Updated: 2024-07-25 08:09:12

标题: COIN:用于医学图像弱监督语义分割的反事实修复

摘要: 深度学习正在深刻改变医学影像和放射学领域,使得在医学图像(包括计算机断层扫描(CT)和X射线扫描)中识别病变成为可能。然而,深度学习模型的性能,特别是在分割任务中,往往受限于对大量标注数据集的需求。为了应对这一挑战,本文从可解释AI和反事实解释生成的视角探索弱监督语义分割的能力。本研究的范围是开发一种新颖的反事实修复方法(COIN),通过使用生成模型将预测的分类标签从异常翻转为正常。例如,如果分类器将输入医学图像X判定为异常,即存在病变,生成模型旨在修复异常区域,从而逆转分类器的原始预测标签。该方法使我们能够为病变生成精确的分割,而无需依赖已有的分割掩码。关键在于,该方法只使用图像级标签,而获取这类标签远比制作精细的分割掩码容易。该方法的有效性通过在爱沙尼亚塔尔图大学医院采集的CT图像上分割合成目标和真实肾脏肿瘤得到了验证。结果表明,COIN大幅超越了RISE、ScoreCAM和LayerCAM等已有归因方法,以及Singla等人提出的另一种反事实解释方法。这一证据表明,COIN是CT图像中肿瘤语义分割的一种有前途的方法,并朝着在标注数据稀缺的医疗领域使深度学习应用更加普及和有效迈出了一步。

更新时间: 2024-07-25 08:09:12

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.12832v2

ServerlessLLM: Low-Latency Serverless Inference for Large Language Models

This paper presents ServerlessLLM, a distributed system designed to support low-latency serverless inference for Large Language Models (LLMs). By harnessing the substantial near-GPU storage and memory capacities of inference servers, ServerlessLLM achieves effective local checkpoint storage, minimizing the need for remote checkpoint downloads and ensuring efficient checkpoint loading. The design of ServerlessLLM features three core contributions: (i) \emph{fast multi-tier checkpoint loading}, featuring a new loading-optimized checkpoint format and a multi-tier loading system, fully utilizing the bandwidth of complex storage hierarchies on GPU servers; (ii) \emph{efficient live migration of LLM inference}, which enables newly initiated inferences to capitalize on local checkpoint storage while ensuring minimal user interruption; and (iii) \emph{startup-time-optimized model scheduling}, which assesses the locality statuses of checkpoints on each server and schedules the model onto servers that minimize the time to start the inference. Comprehensive evaluations, including microbenchmarks and real-world scenarios, demonstrate that ServerlessLLM dramatically outperforms state-of-the-art serverless systems, reducing latency by 10 - 200X across various LLM inference workloads.
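
The startup-time-optimized scheduling component can be illustrated as follows; the tier names, bandwidth figures, and cost model are assumptions for the sketch, not ServerlessLLM's actual implementation.

```python
# Sketch: estimate time-to-first-inference per server from checkpoint locality,
# then place the model on the server with the smallest estimate.
def estimate_startup_s(server, model_size_gb):
    tier_bw_gbps = {"gpu_mem": 200.0, "host_mem": 50.0, "ssd": 5.0, "remote": 1.0}
    return model_size_gb / tier_bw_gbps[server["tier"]] + server["queue_delay_s"]

def schedule(servers, model_size_gb):
    return min(servers, key=lambda s: estimate_startup_s(s, model_size_gb))

servers = [
    {"name": "A", "tier": "ssd", "queue_delay_s": 0.1},
    {"name": "B", "tier": "host_mem", "queue_delay_s": 0.5},
    {"name": "C", "tier": "remote", "queue_delay_s": 0.0},
]
print(schedule(servers, model_size_gb=26)["name"])  # -> "B": local host memory wins
```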

Updated: 2024-07-25 08:08:11

标题: ServerlessLLM: 用于大型语言模型的低延迟无服务器推理

摘要: 本文介绍了ServerlessLLM,这是一个旨在为大型语言模型(LLMs)提供低延迟无服务器推理的分布式系统。通过利用推理服务器上靠近GPU的大量存储和内存容量,ServerlessLLM实现了有效的本地检查点存储,最大限度地减少了远程检查点下载的需求,并确保了高效的检查点加载。ServerlessLLM的设计包含三个核心贡献:(i)\emph{快速多级检查点加载},采用新的面向加载优化的检查点格式和多级加载系统,充分利用GPU服务器上复杂存储层次结构的带宽;(ii)\emph{高效的LLM推理实时迁移},使新发起的推理能够利用本地检查点存储,同时确保对用户的干扰最小;以及(iii)\emph{启动时间优化的模型调度},评估每台服务器上检查点的局部性状态,并将模型调度到能最小化推理启动时间的服务器上。包括微基准测试和真实场景在内的全面评估表明,在各种LLM推理工作负载下,ServerlessLLM将延迟降低了10-200倍,显著优于最先进的无服务器系统。

更新时间: 2024-07-25 08:08:11

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2401.14351v2

Identifying Semantic Induction Heads to Understand In-Context Learning

Although large language models (LLMs) have demonstrated remarkable performance, the lack of transparency in their inference logic raises concerns about their trustworthiness. To gain a better understanding of LLMs, we conduct a detailed analysis of the operations of attention heads and aim to better understand the in-context learning of LLMs. Specifically, we investigate whether attention heads encode two types of relationships between tokens present in natural languages: the syntactic dependency parsed from sentences and the relation within knowledge graphs. We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens. More crucially, the formulation of such semantic induction heads has a close correlation with the emergence of the in-context learning ability of language models. The study of semantic attention heads advances our understanding of the intricate operations of attention heads in transformers, and further provides new insights into the in-context learning of LLMs.
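
A toy sketch of how such heads might be scored, with random tensors standing in for a model's attention patterns and for ablation-based logit differences; it only illustrates the search criterion, not a full pipeline.

```python
# Sketch: score each head by (attention mass on the head token) x (gain in the
# related tail token's logit when the head is active vs. ablated).
import torch

n_layers, n_heads = 4, 8
attn = torch.rand(n_layers, n_heads)         # stand-in: attention to the head token
logit_gain = torch.randn(n_layers, n_heads)  # stand-in: patched-vs-ablated logit gain

score = attn * logit_gain
layer, head = divmod(int(score.argmax()), n_heads)
print(f"candidate semantic induction head: layer {layer}, head {head}")
```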

Updated: 2024-07-25 08:07:39

标题: 识别语义归纳头以理解上下文学习

摘要: 尽管大型语言模型(LLMs)表现出色,但其推理逻辑缺乏透明度,引发了对其可信度的担忧。为了更好地理解LLMs,我们对注意力头的运作进行了详细分析,旨在更好地理解LLMs的上下文学习。具体而言,我们研究注意力头是否编码了自然语言中词元之间的两类关系:从句子中解析出的句法依赖关系和知识图谱中的关系。我们发现,某些注意力头表现出这样一种模式:当关注头部词元(head token)时,它们会回忆起尾部词元(tail token),并提高这些尾部词元的输出logits。更重要的是,这类语义归纳头的形成与语言模型上下文学习能力的出现密切相关。对语义归纳头的研究推进了我们对Transformer中注意力头复杂运作的理解,并为LLMs的上下文学习提供了新的见解。

更新时间: 2024-07-25 08:07:39

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.13055v2

Physics-guided machine learning predicts the planet-scale performance of solar farms with sparse, heterogeneous, public data

The photovoltaics (PV) technology landscape is evolving rapidly. To predict the potential and scalability of emerging PV technologies, a global understanding of these systems' performance is essential. Traditionally, experimental and computational studies at large national research facilities have focused on PV performance in specific regional climates. However, synthesizing these regional studies to understand the worldwide performance potential has proven difficult. Given the expense of obtaining experimental data, the challenge of coordinating experiments at national labs across a politically-divided world, and the data-privacy concerns of large commercial operators, a fundamentally different, data-efficient approach is desired. Here, we present a physics-guided machine learning (PGML) scheme to demonstrate that: (a) The world can be divided into a few PV-specific climate zones, called PVZones, illustrating that the relevant meteorological conditions are shared across continents; (b) by exploiting the climatic similarities, high-quality monthly energy yield data from as few as five locations can accurately predict yearly energy yield potential with high spatial resolution and a root mean square error of less than 8 kWhm$^{-2}$, and (c) even with noisy, heterogeneous public PV performance data, the global energy yield can be predicted with less than 6% relative error compared to physics-based simulations provided that the dataset is representative. This PGML scheme is agnostic to PV technology and farm topology, making it adaptable to new PV technologies or farm configurations. The results encourage physics-guided, data-driven collaboration among national policymakers and research organizations to build efficient decision support systems for accelerated PV qualification and deployment across the world.

Updated: 2024-07-25 08:06:21

标题: 物理引导的机器学习预测具有稀疏、异质、公共数据的太阳能场的全球性能

摘要: 光伏(PV)技术格局正在迅速演变。要预测新兴光伏技术的潜力和可扩展性,就必须在全球尺度上理解这些系统的性能。传统上,大型国家研究机构的实验和计算研究主要关注特定区域气候下的光伏性能。然而,事实证明,很难将这些区域性研究综合起来以理解全球性能潜力。考虑到获取实验数据的高昂成本、在政治分化的世界中协调各国实验室实验的挑战,以及大型商业运营商的数据隐私顾虑,我们需要一种根本不同的、数据高效的方法。在此,我们提出一个物理引导的机器学习(PGML)方案,以证明:(a)世界可以划分为少数几个光伏特定的气候区,称为PVZones,说明相关气象条件在各大洲之间是共享的;(b)通过利用气候相似性,仅需五个地点的高质量月度发电量数据,就能以高空间分辨率准确预测年发电量潜力,均方根误差小于8 kWhm$^{-2}$;(c)即使使用有噪声、异质的公开光伏性能数据,只要数据集具有代表性,全球发电量预测相对于基于物理的模拟的相对误差也小于6%。该PGML方案与具体光伏技术和电站拓扑无关,因此可适应新的光伏技术或电站配置。这些结果鼓励各国政策制定者和研究机构开展物理引导、数据驱动的合作,构建高效的决策支持系统,以加速光伏技术在全球范围内的认证和部署。

更新时间: 2024-07-25 08:06:21

领域: cs.LG,physics.app-ph,physics.data-an

下载: http://arxiv.org/abs/2407.18284v1

Nyström Kernel Stein Discrepancy

Kernel methods underpin many of the most successful approaches in data science and statistics, and they allow representing probability measures as elements of a reproducing kernel Hilbert space without loss of information. Recently, the kernel Stein discrepancy (KSD), which combines Stein's method with kernel techniques, gained considerable attention. Through the Stein operator, KSD allows the construction of powerful goodness-of-fit tests where it is sufficient to know the target distribution up to a multiplicative constant. However, the typical U- and V-statistic-based KSD estimators suffer from a quadratic runtime complexity, which hinders their application in large-scale settings. In this work, we propose a Nystr\"om-based KSD acceleration -- with runtime $\mathcal O\!\left(mn+m^3\right)$ for $n$ samples and $m\ll n$ Nystr\"om points -- , show its $\sqrt{n}$-consistency under the null with a classical sub-Gaussian assumption, and demonstrate its applicability for goodness-of-fit testing on a suite of benchmarks.
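
A small numerical sketch of the Nyström idea for a 1-D standard Gaussian target with an RBF base kernel; the Stein kernel below and the uniform landmark choice are illustrative stand-ins, not the paper's estimator or analysis.

```python
# Sketch: approximate the n x n Stein-kernel matrix H from m landmark columns,
# H ~= C W^+ C^T, so the V-statistic mean(H) becomes a quadratic form computable
# in O(mn + m^3) instead of O(n^2).
import numpy as np

def stein_kernel_gauss(x, y, bw=1.0):
    # h(x, y) for target N(0, 1) with score s(x) = -x and RBF base kernel k.
    k = np.exp(-((x - y) ** 2) / (2 * bw**2))
    dkx = -(x - y) / bw**2 * k
    dky = (x - y) / bw**2 * k
    dkxy = (1 / bw**2 - (x - y) ** 2 / bw**4) * k
    return x * y * k + (-x) * dky + (-y) * dkx + dkxy

rng = np.random.default_rng(0)
x = rng.normal(size=500)                          # n samples from the (true) target
lm = rng.choice(x, size=25, replace=False)        # m << n Nystrom landmarks
C = stein_kernel_gauss(x[:, None], lm[None, :])   # n x m block, O(mn)
W = stein_kernel_gauss(lm[:, None], lm[None, :])  # m x m block
c = C.mean(axis=0)                                # C^T 1 / n
ksd_sq_approx = c @ np.linalg.pinv(W) @ c         # ~= mean(H), small under the null
print(ksd_sq_approx)
```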

Updated: 2024-07-25 08:01:32

标题: Nyström核Stein差异

摘要: 核方法是数据科学和统计学中许多最成功方法的基础,它们允许将概率测度表示为再生核希尔伯特空间中的元素而不丢失信息。最近,将Stein方法与核技术相结合的核Stein差异(KSD)受到了广泛关注。借助Stein算子,KSD可用于构建强大的拟合优度检验,其中只需知道目标分布至一个乘法常数即可。然而,典型的基于U统计量和V统计量的KSD估计器具有二次的运行时间复杂度,这阻碍了它们在大规模场景中的应用。在这项工作中,我们提出了一种基于Nystr\"om的KSD加速方法(对于$n$个样本和$m\ll n$个Nystr\"om点,运行时间为$\mathcal O\!\left(mn+m^3\right)$),在经典次高斯假设下证明了其在原假设下的$\sqrt{n}$-一致性,并在一系列基准上展示了其用于拟合优度检验的适用性。

更新时间: 2024-07-25 08:01:32

领域: stat.ML,cs.LG,math.ST,stat.TH,46E22 (Primary) 62G10 (Secondary),G.3; I.2.6

下载: http://arxiv.org/abs/2406.08401v2

Enhancing Counterfactual Explanation Search with Diffusion Distance and Directional Coherence

A pressing issue in the adoption of AI models is the increasing demand for more human-centric explanations of their predictions. To advance towards more human-centric explanations, understanding how humans produce and select explanations has been beneficial. In this work, inspired by insights of human cognition we propose and test the incorporation of two novel biases to enhance the search for effective counterfactual explanations. Central to our methodology is the application of diffusion distance, which emphasizes data connectivity and actionability in the search for feasible counterfactual explanations. In particular, diffusion distance effectively weights more those points that are more interconnected by numerous short-length paths. This approach brings closely connected points nearer to each other, identifying a feasible path between them. We also introduce a directional coherence term that allows the expression of a preference for the alignment between the joint and marginal directional changes in feature space to reach a counterfactual. This term enables the generation of counterfactual explanations that align with a set of marginal predictions based on expectations of how the outcome of the model varies by changing one feature at a time. We evaluate our method, named Coherent Directional Counterfactual Explainer (CoDiCE), and the impact of the two novel biases against existing methods such as DiCE, FACE, Prototypes, and Growing Spheres. Through a series of ablation experiments on both synthetic and real datasets with continuous and mixed-type features, we demonstrate the effectiveness of our method.
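
A minimal sketch of diffusion distance on a Gaussian-affinity graph; the bandwidth and diffusion time are assumed constants, and the unweighted formulation below is a simplification of the standard diffusion-map distance.

```python
# Sketch: points linked by many short paths get similar t-step diffusion
# profiles, hence small diffusion distance between them.
import numpy as np
from sklearn.metrics import pairwise_distances

def diffusion_distances(X, sigma=1.0, t=8):
    D = pairwise_distances(X)
    K = np.exp(-(D**2) / (2 * sigma**2))      # affinity; short paths reinforce it
    P = K / K.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    Pt = np.linalg.matrix_power(P, t)         # t-step random-walk distributions
    return pairwise_distances(Pt)             # distance between diffusion profiles

X = np.random.randn(100, 5)
dd = diffusion_distances(X)                   # used in place of Euclidean distance
```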

Updated: 2024-07-25 08:00:44

标题: 利用扩散距离和方向一致性增强反事实解释搜索

摘要: 采用AI模型过程中的一个紧迫问题,是对其预测给出更加以人为中心的解释的需求日益增长。为了朝着更以人为中心的解释迈进,理解人类如何产生和选择解释是有益的。在这项工作中,受人类认知研究的启发,我们提出并测试了两种新偏置的引入,以增强对有效反事实解释的搜索。我们方法的核心是扩散距离的应用,它在搜索可行的反事实解释时强调数据的连通性和可操作性。特别地,扩散距离会有效地赋予那些通过大量短路径相互连接的点更高的权重。这种方法使紧密相连的点彼此靠近,从而识别出它们之间的可行路径。我们还引入了一个方向一致性项,用于表达对特征空间中联合方向变化与边际方向变化对齐的偏好,以到达反事实。该项使得生成的反事实解释与一组边际预测相一致,这些边际预测基于对每次改变一个特征时模型结果如何变化的预期。我们评估了我们的方法,即一致方向反事实解释器(CoDiCE),以及这两种新偏置相对于DiCE、FACE、Prototypes和Growing Spheres等现有方法的影响。通过在具有连续和混合类型特征的合成与真实数据集上进行一系列消融实验,我们证明了方法的有效性。

更新时间: 2024-07-25 08:00:44

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.12810v2

Mechanistic Interpretability for AI Safety -- A Review

Understanding AI systems' inner workings is critical for ensuring value alignment and safety. This review explores mechanistic interpretability: reverse-engineering the computational mechanisms and representations learned by neural networks into human-understandable algorithms and concepts to provide a granular, causal understanding. We establish foundational concepts such as features encoding knowledge within neural activations and hypotheses about their representation and computation. We survey methodologies for causally dissecting model behaviors and assess the relevance of mechanistic interpretability to AI safety. We investigate challenges surrounding scalability, automation, and comprehensive interpretation. We advocate for clarifying concepts, setting standards, and scaling techniques to handle complex models and behaviors and expand to domains such as vision and reinforcement learning. Mechanistic interpretability could help prevent catastrophic outcomes as AI systems become more powerful and inscrutable.

Updated: 2024-07-25 07:59:30

标题: 面向AI安全的机制可解释性--综述

摘要: 理解AI系统的内部运作对于确保价值对齐和安全至关重要。本综述探讨机制可解释性:将神经网络学到的计算机制和表示逆向工程为人类可理解的算法和概念,以提供细粒度的因果理解。我们确立了一些基础概念,例如在神经激活中编码知识的特征,以及关于其表示和计算的假说。我们综述了对模型行为进行因果剖析的方法,并评估了机制可解释性与AI安全的相关性。我们考察了围绕可扩展性、自动化和全面解释的挑战。我们主张澄清概念、制定标准,并扩展相关技术以应对复杂的模型和行为,并拓展到视觉和强化学习等领域。随着AI系统变得愈发强大和难以捉摸,机制可解释性有助于防止灾难性后果。

更新时间: 2024-07-25 07:59:30

领域: cs.AI

下载: http://arxiv.org/abs/2404.14082v2

Innovative Speech-Based Deep Learning Approaches for Parkinson's Disease Classification: A Systematic Review

Parkinson's disease (PD), the second most prevalent neurodegenerative disorder worldwide, frequently presents with early-stage speech impairments. Recent advancements in Artificial Intelligence (AI), particularly deep learning (DL), have significantly enhanced PD diagnosis through the analysis of speech data. Nevertheless, the progress of research is restricted by the limited availability of publicly accessible speech-based PD datasets, primarily due to privacy and ethical concerns. This review covers the latest DL-based AI approaches for speech-based PD classification, focusing on performance, available resources and associated challenges of 33 scientific works published between 2020 and March 2024. These DL approaches are categorized into end-to-end (E2E) learning, transfer learning (TL) and deep acoustic features (DAF) extraction. Among E2E approaches, Convolutional Neural Networks (CNNs) are prevalent, though Transformers are increasingly popular. E2E approaches face challenges such as limited data and computational resources, especially with Transformers. TL addresses these issues by providing more robust PD diagnosis and better generalizability across languages. DAF extraction aims to improve the explainability and interpretability of results by examining the specific effects of deep features on both other DL approaches and more traditional machine learning (ML) methods. However, it often underperforms compared to E2E and TL approaches. This review also discusses unresolved issues related to bias, explainability and privacy, highlighting the need for future research.

Updated: 2024-07-25 07:58:19

标题: 创新的基于语音的深度学习方法用于帕金森病分类:一项系统综述

摘要: 帕金森病(PD)是全球第二常见的神经退行性疾病,常常表现为早期言语障碍。最近人工智能(AI)特别是深度学习(DL)的发展显著提高了通过言语数据分析进行PD诊断的能力。然而,研究的进展受到公开可访问的基于言语的PD数据集的限制,主要是由于隐私和伦理问题。本综述涵盖了2020年至2024年3月间发表的33篇科学作品中基于DL的言语PD分类的最新AI方法,着重于性能、可用资源和相关挑战。这些DL方法被分类为端到端(E2E)学习、迁移学习(TL)和深度声学特征(DAF)提取。在E2E方法中,卷积神经网络(CNNs)普遍存在,尽管Transformer越来越受欢迎。E2E方法面临诸如数据和计算资源有限等挑战,特别是对于Transformer。TL通过提供更稳健的PD诊断和更好的跨语言泛化性来解决这些问题。DAF提取旨在通过检查深度特征对其他DL方法和传统的机器学习(ML)方法的具体影响来提高结果的解释性和可解释性。然而,与E2E和TL方法相比,它通常表现不佳。本综述还讨论了与偏见、解释性和隐私有关的未解决问题,强调未来研究的必要性。

更新时间: 2024-07-25 07:58:19

领域: cs.SD,cs.AI,cs.CL,cs.LG,eess.AS

下载: http://arxiv.org/abs/2407.17844v1

DragText: Rethinking Text Embedding in Point-based Image Editing

Point-based image editing enables accurate and flexible control through content dragging. However, the role of text embedding in the editing process has not been thoroughly investigated. A significant aspect that remains unexplored is the interaction between text and image embeddings. In this study, we show that during the progressive editing of an input image in a diffusion model, the text embedding remains constant. As the image embedding increasingly diverges from its initial state, the discrepancy between the image and text embeddings presents a significant challenge. Moreover, we found that the text prompt significantly influences the dragging process, particularly in maintaining content integrity and achieving the desired manipulation. To utilize these insights, we propose DragText, which optimizes text embedding in conjunction with the dragging process to pair with the modified image embedding. Simultaneously, we regularize the text optimization process to preserve the integrity of the original text prompt. Our approach can be seamlessly integrated with existing diffusion-based drag methods with only a few lines of code.

Updated: 2024-07-25 07:57:55

标题: DragText:重新思考点基图像编辑中的文本嵌入

摘要: 基于点的图像编辑通过内容拖动实现准确和灵活的控制。然而,在编辑过程中文本嵌入的作用尚未得到彻底研究。一个尚未探索的重要方面是文本与图像嵌入之间的交互作用。在这项研究中,我们展示了在扩散模型中对输入图像进行渐进编辑的过程中,文本嵌入保持恒定。随着图像嵌入逐渐偏离其初始状态,图像和文本嵌入之间的差异呈现出显著挑战。此外,我们发现文本提示显著影响拖动过程,特别是在保持内容完整性和实现所需操作方面。为了利用这些见解,我们提出了DragText,该方法优化文本嵌入与拖动过程相结合,以配合修改后的图像嵌入。同时,我们规范文本优化过程以保持原始文本提示的完整性。我们的方法可以与现有基于扩散的拖动方法无缝集成,只需几行代码即可。

更新时间: 2024-07-25 07:57:55

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.17843v1

On the Opportunities of (Re)-Exploring Atmospheric Science by Foundation Models: A Case Study

Most state-of-the-art AI applications in atmospheric science are based on classic deep learning approaches. However, such approaches cannot automatically integrate multiple complicated procedures to construct an intelligent agent, since each functionality is enabled by a separate model learned from independent climate datasets. The emergence of foundation models, especially multimodal foundation models, with their ability to process heterogeneous input data and execute complex tasks, offers a substantial opportunity to overcome this challenge. In this report, we want to explore a central question - how the state-of-the-art foundation model, i.e., GPT-4o, performs various atmospheric scientific tasks. Toward this end, we conduct a case study by categorizing the tasks into four main classes, including climate data processing, physical diagnosis, forecast and prediction, and adaptation and mitigation. For each task, we comprehensively evaluate the GPT-4o's performance along with a concrete discussion. We hope that this report may shed new light on future AI applications and research in atmospheric science.

Updated: 2024-07-25 07:57:34

标题: 关于通过基础模型重新探索大气科学的机遇:案例研究

摘要: 大气科学领域大多数最先进的人工智能应用都基于经典的深度学习方法。然而,这类方法无法自动整合多个复杂流程来构建智能体,因为每项功能都由从独立气候数据集学习的单独模型实现。基础模型,特别是多模态基础模型的出现,凭借其处理异构输入数据和执行复杂任务的能力,为克服这一挑战提供了重要机遇。在本报告中,我们希望探讨一个核心问题:最先进的基础模型(即GPT-4o)如何执行各类大气科学任务。为此,我们开展了一项案例研究,将任务划分为四大类:气候数据处理、物理诊断、预报与预测、以及适应与减缓。对于每类任务,我们都全面评估了GPT-4o的表现并进行了具体讨论。我们希望这份报告能为未来大气科学中的人工智能应用和研究带来新的启示。

更新时间: 2024-07-25 07:57:34

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.17842v1

Long-term Fairness in Ride-Hailing Platform

Matching in two-sided markets such as ride-hailing has recently received significant attention. However, existing studies on ride-hailing mainly focus on optimising efficiency, and fairness issues in ride-hailing have been neglected. Fairness issues in ride-hailing, including significant earning differences between drivers and variance of passenger waiting times among different locations, have potential impacts on economic and ethical aspects. The recent studies that focus on fairness in ride-hailing exploit traditional optimisation methods and the Markov Decision Process to balance efficiency and fairness. However, there are several issues in these existing studies, such as myopic short-term decision-making from traditional optimisation and instability of fairness in a comparably longer horizon from both traditional optimisation and Markov Decision Process-based methods. To address these issues, we propose a dynamic Markov Decision Process model to alleviate fairness issues currently faced by ride-hailing, and seek a balance between efficiency and fairness, with two distinct characteristics: (i) a prediction module to predict the number of requests that will be raised in the future from different locations to allow the proposed method to consider long-term fairness based on the whole timeline instead of consider fairness only based on historical and current data patterns; (ii) a customised scalarisation function for multi-objective multi-agent Q Learning that aims to balance efficiency and fairness. Extensive experiments on a publicly available real-world dataset demonstrate that our proposed method outperforms existing state-of-the-art methods.
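
The scalarisation step can be sketched as follows; the per-objective normalisation and the linear trade-off weight `alpha` are assumptions for illustration, not the paper's customised function.

```python
# Sketch: mix efficiency and fairness Q-values into one score before the greedy
# action choice of multi-objective Q-learning.
import numpy as np

def scalarise(q_eff, q_fair, alpha=0.7):
    z = lambda q: (q - q.mean()) / (q.std() + 1e-8)  # avoid scale domination
    return alpha * z(q_eff) + (1 - alpha) * z(q_fair)

q_eff = np.array([1.2, 0.4, 0.9])    # expected efficiency return per action
q_fair = np.array([0.1, 0.8, 0.3])   # expected fairness improvement per action
action = int(np.argmax(scalarise(q_eff, q_fair)))
```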

Updated: 2024-07-25 07:54:07

标题: 长期公平对待乘车平台

摘要: 诸如网约车之类的双边市场中的匹配问题近来受到显著关注。然而,现有的网约车研究主要集中于优化效率,公平性问题一直被忽视。网约车中的公平性问题,包括司机之间显著的收入差异以及不同地点乘客等待时间的方差,对经济和伦理层面都有潜在影响。近期关注网约车公平性的研究利用传统优化方法和马尔可夫决策过程来平衡效率与公平。然而,这些现有研究存在若干问题,例如传统优化带来的短视的短期决策,以及传统优化方法和基于马尔可夫决策过程的方法在较长时间范围内公平性的不稳定。为了解决这些问题,我们提出了一个动态马尔可夫决策过程模型,以缓解网约车当前面临的公平性问题,并在效率与公平之间寻求平衡。该模型具有两个鲜明特点:(i)一个预测模块,用于预测未来不同地点将产生的请求数量,使所提方法能够基于整条时间线考虑长期公平性,而不是仅基于历史和当前数据模式考虑公平性;(ii)一个为多目标多智能体Q学习定制的标量化函数,旨在平衡效率与公平。在一个公开的真实世界数据集上的大量实验表明,我们提出的方法优于现有的最先进方法。

更新时间: 2024-07-25 07:54:07

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.17839v1

UMono: Physical Model Informed Hybrid CNN-Transformer Framework for Underwater Monocular Depth Estimation

Underwater monocular depth estimation serves as the foundation for tasks such as 3D reconstruction of underwater scenes. However, due to the influence of light and medium, the underwater environment undergoes a distinctive imaging process, which presents challenges in accurately estimating depth from a single image. The existing methods fail to consider the unique characteristics of underwater environments, leading to inadequate estimation results and limited generalization performance. Furthermore, underwater depth estimation requires extracting and fusing both local and global features, which is not fully explored in existing methods. In this paper, an end-to-end learning framework for underwater monocular depth estimation called UMono is presented, which incorporates underwater image formation model characteristics into network architecture, and effectively utilize both local and global features of underwater image. Experimental results demonstrate that the proposed method is effective for underwater monocular depth estimation and outperforms the existing methods in both quantitative and qualitative analyses.

Updated: 2024-07-25 07:52:11

标题: UMono:物理模型指导的水下单目深度估计的混合CNN-Transformer框架

摘要: 水下单目深度估计是水下场景三维重建等任务的基础。然而,受光线和介质的影响,水下环境经历了独特的成像过程,这给从单幅图像准确估计深度带来了挑战。现有方法未能考虑水下环境的独特特性,导致估计结果不足且泛化性能有限。此外,水下深度估计需要提取并融合局部和全局特征,而这在现有方法中尚未得到充分探索。本文提出了一种名为UMono的水下单目深度估计端到端学习框架,它将水下成像模型的特性融入网络架构,并有效利用水下图像的局部和全局特征。实验结果表明,所提方法对水下单目深度估计是有效的,且在定量和定性分析中均优于现有方法。

更新时间: 2024-07-25 07:52:11

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.17838v1

Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy

Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning concerning logical reasoning, physical principles, geometric analysis, etc. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.

Updated: 2024-07-25 07:50:58

标题: Imperative Learning:面向机器人自治的自监督神经符号学习框架

摘要: 数据驱动方法,如强化学习和模仿学习,在机器人自主性方面取得了显著成功。然而,它们以数据为中心的特性仍然阻碍了它们在不断变化的环境中良好泛化的能力。此外,为机器人任务收集大型数据集通常是不切实际且昂贵的。为了克服这些挑战,我们引入了一种新的自监督神经符号(NeSy)计算框架,即Imperative Learning(IL),用于机器人自主性,利用符号推理的泛化能力。IL框架包括三个主要组件:一个神经模块,一个推理引擎和一个记忆系统。我们将IL表述为一种特殊的双层优化(BLO),它实现了对三个模块的相互学习。这克服了数据驱动方法所面临的标签密集障碍,并利用了涉及逻辑推理、物理原理、几何分析等方面的符号推理。我们讨论了IL的几种优化技术,并验证了它们在包括路径规划、规则归纳、最优控制、视觉里程计和多机器人路径规划在内的五个不同机器人自主性任务中的有效性。通过各种实验,我们展示了IL可以显著增强机器人自主性能力,并预计它将催化跨领域的进一步研究。

更新时间: 2024-07-25 07:50:58

领域: cs.RO,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.16087v3

IsUMap: Manifold Learning and Data Visualization leveraging Vietoris-Rips filtrations

This work introduces IsUMap, a novel manifold learning technique that enhances data representation by integrating aspects of UMAP and Isomap with Vietoris-Rips filtrations. We present a systematic and detailed construction of a metric representation for locally distorted metric spaces that captures complex data structures more accurately than the previous schemes. Our approach addresses limitations in existing methods by accommodating non-uniform data distributions and intricate local geometries. We validate its performance through extensive experiments on examples of various geometric objects and benchmark real-world datasets, demonstrating significant improvements in representation quality.
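
One standard building block behind such constructions is the geodesic metric on a k-nearest-neighbour graph (the Isomap ingredient); IsUMap additionally applies UMAP-style local rescaling and Vietoris-Rips filtrations, which this sketch does not reproduce:

import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

X = np.random.rand(200, 3)                                  # toy point cloud
knn = kneighbors_graph(X, n_neighbors=10, mode="distance")  # local metric
geodesic = shortest_path(knn, method="D", directed=False)   # global metric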

Updated: 2024-07-25 07:46:30

标题: IsUMap:利用Vietoris-Rips过滤进行流形学习和数据可视化

摘要: 这项工作介绍了IsUMap,一种新颖的流形学习技术,通过将UMAP和Isomap的思想与Vietoris-Rips过滤相结合,增强了数据表示。我们系统而详细地构建了一种针对局部扭曲度量空间的度量表示,它比先前的方案更准确地捕捉复杂的数据结构。我们的方法通过兼容非均匀数据分布和复杂的局部几何结构,解决了现有方法的局限性。我们在各种几何对象示例和基准真实世界数据集上进行了大量实验来验证其性能,证明了表示质量的显著提升。

更新时间: 2024-07-25 07:46:30

领域: cs.LG,math.CT,math.DG,math.MG,51K05 (primary) 57-08, 53Z50, 55U10 (secondary),G.2.2; I.6.5

下载: http://arxiv.org/abs/2407.17835v1

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

The popularity of pre-trained large models has revolutionized downstream tasks across diverse fields, such as language, vision, and multi-modality. To minimize the adaption cost for downstream tasks, many Parameter-Efficient Fine-Tuning (PEFT) techniques are proposed for language and 2D image pre-trained models. However, the specialized PEFT method for 3D pre-trained models is still under-explored. To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks, which consist of a Point-prior Prompt and a Geometry-aware Adapter. The Point-prior Prompt adopts a set of learnable prompt tokens, for which we propose to construct a memory bank with domain-specific knowledge, and utilize a parameter-free attention to enhance the prompt tokens. The Geometry-aware Adapter aims to aggregate point cloud features within spatial neighborhoods to capture fine-grained geometric information through local interactions. Extensive experiments indicate that our Point-PEFT can achieve better performance than the full fine-tuning on various downstream tasks, while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of our approach. Code is released at https://github.com/Ivan-Tang-3D/Point-PEFT.
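
The parameter-efficiency recipe itself is simple to express: freeze the backbone and leave only the added modules trainable. A sketch in PyTorch, with the PEFT modules kept in a separate container:

import torch.nn as nn

def apply_peft(backbone: nn.Module, peft_modules: nn.Module):
    # Freeze every pre-trained weight; only prompt/adapter weights train.
    for p in backbone.parameters():
        p.requires_grad = False
    for p in peft_modules.parameters():
        p.requires_grad = True
    trainable = sum(p.numel() for p in peft_modules.parameters())
    total = trainable + sum(p.numel() for p in backbone.parameters())
    print(f"trainable fraction: {trainable / total:.1%}")  # ~5% in the paper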

Updated: 2024-07-25 07:42:15

标题: Point-PEFT: 3D预训练模型的参数高效微调

摘要: 预训练大型模型的流行性已经彻底改变了多个领域的下游任务,如语言、视觉和多模态。为了最大程度地减少下游任务的适应成本,许多参数高效微调(PEFT)技术被提出用于语言和2D图像预训练模型。然而,针对3D预训练模型的专门PEFT方法仍未得到充分探索。为此,我们引入了Point-PEFT,这是一个新颖的框架,用于通过最少的可学习参数来适应点云预训练模型。具体而言,对于一个预训练的3D模型,我们冻结大部分参数,只调整新添加的PEFT模块,这些模块包括一个Point-prior Prompt和一个Geometry-aware Adapter。Point-prior Prompt采用一组可学习提示标记,我们建议使用领域特定知识构建一个内存库,并利用无参数的注意力来增强提示标记。Geometry-aware Adapter旨在通过局部交互在空间邻域内聚合点云特征,从而捕捉细粒度的几何信息。大量实验证明,我们的Point-PEFT在各种下游任务上比全面微调能够取得更好的性能,同时只使用可训练参数的5%,显示了我们方法的高效性和有效性。代码已发布在https://github.com/Ivan-Tang-3D/Point-PEFT。

更新时间: 2024-07-25 07:42:15

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2310.03059v7

Behavioral Testing: Can Large Language Models Implicitly Resolve Ambiguous Entities?

One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. In this paper, we focus on entity type ambiguity and analyze current state-of-the-art LLMs for their proficiency and consistency in applying their factual knowledge when prompted for entities under ambiguity. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 entities. Our experiments reveal that LLMs perform poorly with ambiguous prompts, achieving only 80% accuracy. Our results further demonstrate systematic discrepancies in LLM behavior and their failure to consistently apply information, indicating that the models can exhibit knowledge without being able to utilize it, significant biases for preferred readings, as well as self-inconsistencies. Our study highlights the importance of handling entity ambiguity in the future for more trustworthy LLMs.

Updated: 2024-07-25 07:39:44

标题: 行为测试:大型语言模型是否可以隐式解决模糊实体?

摘要: 大型语言模型(LLMs)表现出色的一个主要因素是在预训练过程中积累的大量事实知识。然而,许多LLMs存在自相矛盾的问题,这引发了人们对它们的可信度和可靠性的疑虑。在本文中,我们专注于实体类型的模糊性,并分析当前最先进的LLMs在被要求解决模糊实体时应用事实知识的能力和一致性。为此,我们提出了一个评估协议,将知识的获取与应用进行区分,并对49个实体测试最先进的LLMs。我们的实验结果显示,LLMs在模糊提示下表现不佳,仅达到80%的准确率。我们的结果进一步展示了LLM行为中的系统性差异以及它们未能一致应用信息的失败,表明这些模型可能展示出知识而无法利用它,具有偏好阅读的显著偏见,以及自相矛盾。我们的研究强调了在未来处理实体模糊性的重要性,以建立更加可靠的LLMs。

更新时间: 2024-07-25 07:39:44

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.17125v2

Image Segmentation via Divisive Normalization: dealing with environmental diversity

Autonomous driving is a challenging scenario for image segmentation due to the presence of uncontrolled environmental conditions and the eventually catastrophic consequences of failures. Previous work suggested that a biologically motivated computation, the so-called Divisive Normalization, could be useful to deal with image variability, but its effects have not been systematically studied over different data sources and environmental factors. Here we put segmentation U-nets augmented with Divisive Normalization to work far from training conditions to find where this adaptation is more critical. We categorize the scenes according to their radiance level and dynamic range (day/night), and according to their achromatic/chromatic contrasts. We also consider video game (synthetic) images to broaden the range of environments. We check the performance in the extreme percentiles of such categorization. Then, we push the limits further by artificially modifying the images in perceptually/environmentally relevant dimensions: luminance, contrasts and spectral radiance. Results show that neural networks with Divisive Normalization get better results in all the scenarios and their performance remains more stable with regard to the considered environmental factors and nature of the source. Finally, we explain the improvements in segmentation performance in two ways: (1) by quantifying the invariance of the responses that incorporate Divisive Normalization, and (2) by illustrating the adaptive nonlinearity of the different layers that depends on the local activity.
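
The canonical divisive-normalization computation divides each unit's energy by pooled neighbouring activity; a sketch with an illustrative 3x3 pool and constants (the paper fits its own parameters):

import numpy as np
from scipy.ndimage import uniform_filter

def divisive_normalization(x, gamma=2.0, b=0.1):
    energy = np.abs(x) ** gamma              # pointwise nonlinearity
    pooled = uniform_filter(energy, size=3)  # local activity pool
    return energy / (b + pooled)             # response shrinks where pool is high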

Updated: 2024-07-25 07:38:27

标题: 通过除法归一化进行图像分割:处理环境多样性

摘要: 自动驾驶是图像分割的一个具有挑战性的场景,原因是存在不受控制的环境条件以及失败可能导致灾难性后果。先前的研究表明,一种生物学启发的计算,即所谓的除法归一化,可能有助于处理图像的变异性,但其效果尚未在不同数据源和环境因素上进行系统研究。在这里,我们将带有除法归一化的分割U-net应用到远离训练条件的情况中,以找出这种适应性最为关键的地方。我们根据场景的辐射水平和动态范围(白天/夜晚),以及它们的非彩色/彩色对比度进行分类。我们还考虑视频游戏(合成)图像以拓展环境范围。我们检查这些分类的极端百分位数中的性能。然后,通过在感知/环境相关维度上人为修改图像:亮度、对比度和光谱辐射,进一步推动极限。结果显示,带有除法归一化的神经网络在所有场景中获得更好的结果,并且其性能相对于考虑的环境因素和数据源的性质更加稳定。最后,我们以两种方式解释分割性能的改进:(1)通过量化融合除法归一化的响应的不变性,以及(2)通过说明依赖于局部活动的不同层的自适应非线性。

更新时间: 2024-07-25 07:38:27

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.17829v1

Unified Lexical Representation for Interpretable Visual-Language Alignment

Visual-Language Alignment (VLA) has gained a lot of attention since CLIP's groundbreaking work. Although CLIP performs well, the typical direct latent feature alignment lacks clarity in its representation and similarity scores. On the other hand, lexical representation, a vector whose element represents the similarity between the sample and a word from the vocabulary, is a natural sparse representation and interpretable, providing exact matches for individual words. However, lexical representations are difficult to learn due to the lack of ground-truth supervision and false-discovery issues, and thus require complex design to train effectively. In this paper, we introduce LexVLA, a more interpretable VLA framework by learning a unified lexical representation for both modalities without complex design. We use DINOv2 as our visual model for its local-inclined features and Llama 2, a generative language model, to leverage its in-context lexical prediction ability. To avoid the false discovery, we propose an overuse penalty to refrain the lexical representation from falsely frequently activating meaningless words. We demonstrate that these two pre-trained uni-modal models can be well-aligned by fine-tuning on a modest multi-modal dataset and avoid intricate training configurations. On cross-modal retrieval benchmarks, LexVLA, trained on the CC-12M multi-modal dataset, outperforms baselines fine-tuned on larger datasets (e.g., YFCC15M) and those trained from scratch on even bigger datasets (e.g., 1.1B data, including CC-12M). We conduct extensive experiments to analyze LexVLA.
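
One plausible reading of the overuse penalty is a regularizer on how often each vocabulary entry activates across a batch; the squared-mean form below is an illustrative assumption, not the paper's exact formula:

import torch

def overuse_penalty(lexical, weight=1e-3):
    # lexical: (batch, vocab) non-negative lexical representations.
    usage = lexical.mean(dim=0)         # per-word activation frequency
    return weight * (usage ** 2).sum()  # punish words that fire everywhere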

Updated: 2024-07-25 07:35:27

标题: 统一的词汇表达方式用于可解释的视觉-语言对齐

摘要: 视觉语言对齐(VLA)自CLIP的开创性工作以来引起了很多关注。尽管CLIP表现良好,但典型的直接潜在特征对齐在其表示和相似性分数方面缺乏清晰度。另一方面,词汇表示是一个向量,其元素表示样本与词汇表中某个单词之间的相似性,是一种自然的稀疏表示且易于解释,为单词提供精确匹配。然而,由于缺乏真值监督以及虚假发现问题,词汇表示很难学习,因此需要复杂的设计才能有效训练。在本文中,我们引入了LexVLA,一个更具解释性的VLA框架,通过为两种模态学习统一的词汇表示,而无需复杂的设计。我们使用DINOv2作为视觉模型,因为其特征偏向局部,同时使用生成式语言模型Llama 2来利用其上下文词汇预测能力。为了避免虚假发现,我们提出了一种过度使用惩罚,以阻止词汇表示错误地频繁激活无意义的单词。我们展示了这两个预训练的单模态模型可以通过在适度规模的多模态数据集上微调实现良好对齐,并避免复杂的训练配置。在跨模态检索基准上,训练于CC-12M多模态数据集上的LexVLA优于在更大数据集(例如YFCC15M)上微调的基线,以及在更大数据集(例如包括CC-12M在内的1.1B数据)上从头训练的模型。我们进行了大量实验来分析LexVLA。

更新时间: 2024-07-25 07:35:27

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.17827v1

Blockchain Takeovers in Web 3.0: An Empirical Study on the TRON-Steem Incident

A fundamental goal of Web 3.0 is to establish a decentralized network and application ecosystem, thereby enabling users to retain control over their data while promoting value exchange. However, the recent Tron-Steem takeover incident poses a significant threat to this vision. In this paper, we present a thorough empirical analysis of the Tron-Steem takeover incident. By conducting a fine-grained reconstruction of the stake and election snapshots within the Steem blockchain, one of the most prominent social-oriented blockchains, we quantify the marked shifts in decentralization pre and post the takeover incident, highlighting the severe threat that blockchain network takeovers pose to the decentralization principle of Web 3.0. Moreover, by employing heuristic methods to identify anomalous voters and conducting clustering analyses on voter behaviors, we unveil the underlying mechanics of takeover strategies employed in the Tron-Steem incident and suggest potential mitigation strategies, which contribute to the enhanced resistance of Web 3.0 networks against similar threats in the future. We believe the insights gleaned from this research help illuminate the challenges imposed by blockchain network takeovers in the Web 3.0 era, suggest ways to foster the development of decentralized technologies and governance, as well as to enhance the protection of Web 3.0 user rights.

Updated: 2024-07-25 07:31:15

标题: Web 3.0中的区块链接管:对TRON-Steem事件的实证研究

摘要: Web 3.0的一个基本目标是建立一个去中心化的网络和应用程序生态系统,从而使用户能够保留对其数据的控制,同时促进价值交换。然而,最近的Tron-Steem接管事件对这一愿景构成了重大威胁。在本文中,我们对Tron-Steem接管事件进行了彻底的实证分析。通过对Steem区块链中的股份和选举快照进行细粒度重建,这是最具代表性的社交导向区块链之一,我们量化了接管事件前后去中心化程度的显著变化,突显了区块链网络接管对Web 3.0去中心化原则构成的严重威胁。此外,通过采用启发式方法识别异常选民并对选民行为进行聚类分析,我们揭示了Tron-Steem事件中采用的接管策略的潜在机制,并提出潜在的缓解策略,这有助于增强Web 3.0网络对未来类似威胁的抵抗力。我们相信,从这项研究中获得的见解有助于阐明在Web 3.0时代区块链网络接管所带来的挑战,提出促进去中心化技术和治理发展以及增强保护Web 3.0用户权利的途径。

更新时间: 2024-07-25 07:31:15

领域: cs.SI,cs.CR

下载: http://arxiv.org/abs/2407.17825v1

Node-like as a Whole: Structure-aware Searching and Coarsening for Graph Classification

Graph Transformers (GTs) have made remarkable achievements in graph-level tasks. However, most existing works regard graph structures as a form of guidance or bias for enhancing node representations, which focuses on node-central perspectives and lacks explicit representations of edges and structures. One natural question is, can we treat graph structures node-like as a whole to learn high-level features? Through experimental analysis, we explore the feasibility of this assumption. Based on our findings, we propose a novel multi-view graph representation learning model via structure-aware searching and coarsening (GRLsc) on GT architecture for graph classification. Specifically, we build three unique views, original, coarsening, and conversion, to learn a thorough structural representation. We compress loops and cliques via hierarchical heuristic graph coarsening and restrict them with well-designed constraints, which builds the coarsening view to learn high-level interactions between structures. We also introduce line graphs for edge embeddings and switch to edge-central perspective to construct the conversion view. Experiments on eight real-world datasets demonstrate the improvements of GRLsc over 28 baselines from various architectures.
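
The conversion view's edge-central perspective corresponds to the classical line-graph construction, where each edge becomes a node and adjacency means sharing an endpoint; a sketch with networkx (the library choice is ours, not the paper's):

import networkx as nx

G = nx.cycle_graph(4)
L = nx.line_graph(G)
print(sorted(L.nodes()))    # [(0, 1), (0, 3), (1, 2), (2, 3)]
print(L.number_of_edges())  # the line graph of a 4-cycle is again a 4-cycle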

Updated: 2024-07-25 07:29:02

标题: 将结构整体视作节点:用于图分类的结构感知搜索与粗化

摘要: 图Transformer(GTs)在图级任务中取得了显著的成就。然而,大多数现有工作将图结构视为增强节点表示的一种指导或偏置,侧重于以节点为中心的视角,缺乏对边和结构的显式表示。一个自然的问题是:我们能否将图结构作为整体、像节点一样来学习高级特征?通过实验分析,我们探讨了这一假设的可行性。基于我们的发现,我们提出了一种新颖的多视图图表示学习模型,通过结构感知搜索和粗化(GRLsc)在GT架构上进行图分类。具体而言,我们构建了三个独特视图,即原始视图、粗化视图和转换视图,以学习全面的结构表示。我们通过分层启发式图粗化压缩环和团,并通过精心设计的约束对其进行限制,从而构建粗化视图以学习结构之间的高级交互。我们还引入线图用于边嵌入,并切换到以边为中心的视角来构建转换视图。对八个真实世界数据集的实验表明,GRLsc相对于来自各种架构的28个基线均有改进。

更新时间: 2024-07-25 07:29:02

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2404.11869v3

Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization

Bilevel optimization is widely applied in many machine learning tasks such as hyper-parameter learning, meta learning and reinforcement learning. Although many algorithms recently have been developed to solve the bilevel optimization problems, they generally rely on the (strongly) convex lower-level problems. More recently, some methods have been proposed to solve the nonconvex-PL bilevel optimization problems, where their upper-level problems are possibly nonconvex, and their lower-level problems are also possibly nonconvex while satisfying Polyak-{\L}ojasiewicz (PL) condition. However, these methods still have a high convergence complexity or a high computation complexity such as requiring compute expensive Hessian/Jacobian matrices and its inverses. In the paper, thus, we propose an efficient Hessian/Jacobian-free method (i.e., HJFBiO) with the optimal convergence complexity to solve the nonconvex-PL bilevel problems. Theoretically, under some mild conditions, we prove that our HJFBiO method obtains an optimal convergence rate of $O(\frac{1}{T})$, where $T$ denotes the number of iterations, and has an optimal gradient complexity of $O(\epsilon^{-1})$ in finding an $\epsilon$-stationary solution. We conduct some numerical experiments on the bilevel PL game and hyper-representation learning task to demonstrate efficiency of our proposed method.

Updated: 2024-07-25 07:25:06

标题: 最优的无Hessian/Jacobian非凸-PL双层优化

摘要: 双层优化在许多机器学习任务中被广泛应用,如超参数学习、元学习和强化学习。尽管最近已经开发了许多算法来解决双层优化问题,但它们通常依赖于(强)凸下层问题。最近,一些方法已被提出来解决非凸-PL双层优化问题,其中它们的上层问题可能是非凸的,下层问题也可能是非凸的,同时满足Polyak-{\L}ojasiewicz(PL)条件。然而,这些方法仍然具有较高的收敛复杂度或较高的计算复杂度,例如需要计算昂贵的Hessian/Jacobian矩阵及其逆矩阵。因此,在本文中,我们提出了一种高效的无Hessian/Jacobian方法(即HJFBiO),具有最佳的收敛复杂度,可解决非凸-PL双层问题。理论上,在一些温和的条件下,我们证明了我们的HJFBiO方法获得了$O(\frac{1}{T})$的最佳收敛速率,其中$T$表示迭代次数,并具有$O(\epsilon^{-1})$的最佳梯度复杂度,以找到一个$\epsilon$-稳定解。我们对双层PL博弈和超表示学习任务进行了一些数值实验,以证明我们提出的方法的效率。

更新时间: 2024-07-25 07:25:06

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2407.17823v1

Advanced deep-reinforcement-learning methods for flow control: group-invariant and positional-encoding networks improve learning speed and quality

Flow control is key to maximize energy efficiency in a wide range of applications. However, traditional flow-control methods face significant challenges in addressing non-linear systems and high-dimensional data, limiting their application in realistic energy systems. This study advances deep-reinforcement-learning (DRL) methods for flow control, particularly focusing on integrating group-invariant networks and positional encoding into DRL architectures. Our methods leverage multi-agent reinforcement learning (MARL) to exploit policy invariance in space, in combination with group-invariant networks to ensure local symmetry invariance. Additionally, a positional encoding inspired by the transformer architecture is incorporated to provide location information to the agents, mitigating action constraints from strict invariance. The proposed methods are verified using a case study of Rayleigh-B\'enard convection, where the goal is to minimize the Nusselt number Nu. The group-invariant neural networks (GI-NNs) show faster convergence compared to the base MARL, achieving better average policy performance. The GI-NNs not only cut DRL training time in half but also notably enhance learning reproducibility. Positional encoding further enhances these results, effectively reducing the minimum Nu and stabilizing convergence. Interestingly, group invariant networks specialize in improving learning speed and positional encoding specializes in improving learning quality. These results demonstrate that choosing a suitable feature-representation method according to the purpose as well as the characteristics of each control problem is essential. We believe that the results of this study will not only inspire novel DRL methods with invariant and unique representations, but also provide useful insights for industrial applications.
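
The positional encoding borrowed from the transformer is, in its generic sinusoidal form, straightforward to generate; the paper may use a variant tailored to jet/probe locations:

import numpy as np

def positional_encoding(n_positions, d_model):
    pos = np.arange(n_positions)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

pe = positional_encoding(n_positions=8, d_model=16)  # one row per agent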

Updated: 2024-07-25 07:24:41

标题: 用于流动控制的先进深度强化学习方法:群不变网络和位置编码网络提高学习速度和质量

摘要: 流动控制是在各种应用中最大化能源效率的关键。然而,传统的流动控制方法在处理非线性系统和高维数据方面面临重大挑战,限制了它们在现实能源系统中的应用。本研究推进了深度强化学习(DRL)方法在流动控制方面的应用,特别关注将群不变网络和位置编码集成到DRL架构中。我们的方法利用多智能体强化学习(MARL)来利用策略在空间上的不变性,并结合群不变网络以确保局部对称不变性。此外,受Transformer架构启发,引入位置编码以向智能体提供位置信息,减轻严格不变性带来的动作约束。所提出的方法通过瑞利-贝纳德对流的案例研究得到验证,目标是最小化努塞尔特数Nu。与基础MARL相比,群不变神经网络(GI-NNs)显示出更快的收敛速度,实现了更好的平均策略表现。GI-NNs不仅将DRL训练时间减半,而且显著增强了学习的可重现性。位置编码进一步增强了这些结果,有效降低了最小Nu并稳定了收敛性。有趣的是,群不变网络专注于提高学习速度,而位置编码专注于提高学习质量。这些结果表明,根据目的以及每个控制问题的特征选择合适的特征表示方法至关重要。我们相信本研究的结果不仅将激发具有不变性和独特表示的新颖DRL方法,还将为工业应用提供有用的见解。

更新时间: 2024-07-25 07:24:41

领域: cs.LG,physics.flu-dyn

下载: http://arxiv.org/abs/2407.17822v1

Spatial-Temporal Cross-View Contrastive Pre-training for Check-in Sequence Representation Learning

The rapid growth of location-based services (LBS) has yielded massive amounts of data on human mobility. Effectively extracting meaningful representations for user-generated check-in sequences is pivotal for facilitating various downstream services. However, the user-generated check-in data are simultaneously influenced by the surrounding objective circumstances and the user's subjective intention. Specifically, the temporal uncertainty and spatial diversity exhibited in check-in data make it difficult to capture the macroscopic spatial-temporal patterns of users and to understand the semantics of user mobility activities. Furthermore, the distinct characteristics of the temporal and spatial information in check-in sequences call for an effective fusion method to incorporate these two types of information. In this paper, we propose a novel Spatial-Temporal Cross-view Contrastive Representation (STCCR) framework for check-in sequence representation learning. Specifically, STCCR addresses the above challenges by employing self-supervision from "spatial topic" and "temporal intention" views, facilitating effective fusion of spatial and temporal information at the semantic level. Besides, STCCR leverages contrastive clustering to uncover users' shared spatial topics from diverse mobility activities, while employing angular momentum contrast to mitigate the impact of temporal uncertainty and noise. We extensively evaluate STCCR on three real-world datasets and demonstrate its superior performance across three downstream tasks.

Updated: 2024-07-25 07:18:05

标题: 空间-时间交叉视图对比预训练用于签到序列表示学习

摘要: 定位服务(LBS)的快速增长产生了大量关于人类移动性的数据。有效地提取用户生成的签到序列的有意义的表示对于促进各种下游服务至关重要。然而,用户生成的签到数据同时受周围客观环境和用户主观意图的影响。具体来说,签到数据中呈现的时间不确定性和空间多样性使得捕捉用户的宏观时空模式和理解用户移动活动的语义变得困难。此外,签到序列中时间和空间信息的不同特征需要一种有效的融合方法来整合这两种类型的信息。在本文中,我们提出了一种新颖的空间-时间交叉视图对比表示(STCCR)框架用于签到序列表示学习。具体来说,STCCR通过从“空间主题”和“时间意图”视图中采用自我监督,促进在语义级别上的空间和时间信息的有效融合来解决上述挑战。此外,STCCR利用对比聚类来揭示用户从多样的移动活动中共享的空间主题,同时利用角动量对比来减轻时间不确定性和噪音的影响。我们在三个真实数据集上对STCCR进行了广泛评估,并展示了其在三个下游任务上的卓越表现。

更新时间: 2024-07-25 07:18:05

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.15899v3

Demystifying Verbatim Memorization in Large Language Models

Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications. Much prior work has studied such verbatim memorization using observational data. To complement such work, we develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences. We find that (1) non-trivial amounts of repetition are necessary for verbatim memorization to happen; (2) later (and presumably better) checkpoints are more likely to verbatim memorize sequences, even for out-of-distribution sequences; (3) the generation of memorized sequences is triggered by distributed model states that encode high-level features and makes important use of general language modeling capabilities. Guided by these insights, we develop stress tests to evaluate unlearning methods and find they often fail to remove the verbatim memorized information, while also degrading the LM. Overall, these findings challenge the hypothesis that verbatim memorization stems from specific model weights or mechanisms. Rather, verbatim memorization is intertwined with the LM's general capabilities and thus will be very difficult to isolate and suppress without degrading model quality.
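
A minimal way to operationalize "verbatim memorization" in such a setup is a prefix-continuation check under greedy decoding; generate and the 32-token prefix below are illustrative assumptions, not the paper's exact protocol:

def verbatim_memorized(generate, sequence, prefix_len=32):
    # generate(prompt_tokens, n) -> n greedily decoded continuation tokens.
    prefix, target = sequence[:prefix_len], sequence[prefix_len:]
    return generate(prefix, len(target)) == target

# Toy check with a fake "model" that has memorized consecutive integers:
seq = list(range(100))
assert verbatim_memorized(lambda p, n: list(range(p[-1] + 1, p[-1] + 1 + n)), seq)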

Updated: 2024-07-25 07:10:31

标题: 揭秘大型语言模型中的逐字记忆

摘要: 大型语言模型(LLMs)经常逐字记住长序列,通常带来严重的法律和隐私影响。许多先前的研究使用观察数据研究了这种逐字记忆。为了补充这些研究,我们开发了一个框架,通过从注入了特定序列的Pythia检查点继续预训练,在受控环境中研究逐字记忆。我们发现:(1)相当数量的重复是发生逐字记忆的必要条件;(2)较晚(并且可能更好)的检查点更有可能逐字记忆序列,即使是分布外的序列;(3)记忆序列的生成是由编码高级特征的分布式模型状态触发的,并且充分利用了通用语言建模能力。根据这些见解,我们开发了压力测试来评估遗忘方法,发现它们通常无法删除被逐字记忆的信息,同时还会降低LM的质量。总的来说,这些发现挑战了逐字记忆源于特定模型权重或机制的假设。相反,逐字记忆与LM的一般能力交织在一起,因此很难在不降低模型质量的情况下将其隔离和抑制。

更新时间: 2024-07-25 07:10:31

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.17817v1

NC-NCD: Novel Class Discovery for Node Classification

Novel Class Discovery (NCD) involves identifying new categories within unlabeled data by utilizing knowledge acquired from previously established categories. However, existing NCD methods often struggle to maintain a balance between the performance of old and new categories. Discovering unlabeled new categories in a class-incremental way is more practical but also more challenging, as it is frequently hindered by either catastrophic forgetting of old categories or an inability to learn new ones. Furthermore, the implementation of NCD on continuously scalable graph-structured data remains an under-explored area. In response to these challenges, we introduce for the first time a more practical NCD scenario for node classification (i.e., NC-NCD), and propose a novel self-training framework with prototype replay and distillation called SWORD, adopted to our NC-NCD setting. Our approach enables the model to cluster unlabeled new category nodes after learning labeled nodes while preserving performance on old categories without reliance on old category nodes. SWORD achieves this by employing a self-training strategy to learn new categories and preventing the forgetting of old categories through the joint use of feature prototypes and knowledge distillation. Extensive experiments on four common benchmarks demonstrate the superiority of SWORD over other state-of-the-art methods.

Updated: 2024-07-25 07:10:08

标题: NC-NCD:用于节点分类的新颖类别发现

摘要: 新颖类别发现(NCD)旨在利用从已建立类别中获取的知识,识别未标记数据中的新类别。然而,现有的NCD方法往往难以在新旧类别的性能之间保持平衡。以类增量方式发现未标记的新类别更为实际,但也更具挑战性,因为这通常会受到旧类别灾难性遗忘或无法学习新类别的阻碍。此外,在持续可扩展的图结构数据上实现NCD仍是一个未被充分探索的领域。为了应对这些挑战,我们首次引入了一个更实际的节点分类NCD场景(即NC-NCD),并提出了一个带有原型重放和蒸馏的新型自训练框架SWORD,用于我们的NC-NCD设置。我们的方法使模型在学习了标记节点之后能够对未标记的新类别节点进行聚类,并在不依赖旧类别节点的情况下保持在旧类别上的性能。SWORD通过自训练策略学习新类别,并通过特征原型和知识蒸馏的联合使用防止遗忘旧类别。在四个常见基准上的大量实验表明,SWORD优于其他最先进的方法。

更新时间: 2024-07-25 07:10:08

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.17816v1

Nested replicator dynamics, nested logit choice, and similarity-based learning

We consider a model of learning and evolution in games whose action sets are endowed with a partition-based similarity structure intended to capture exogenous similarities between strategies. In this model, revising agents have a higher probability of comparing their current strategy with other strategies that they deem similar, and they switch to the observed strategy with probability proportional to its payoff excess. Because of this implicit bias toward similar strategies, the resulting dynamics - which we call the nested replicator dynamics - do not satisfy any of the standard monotonicity postulates for imitative game dynamics; nonetheless, we show that they retain the main long-run rationality properties of the replicator dynamics, albeit at quantitatively different rates. We also show that the induced dynamics can be viewed as a stimulus-response model in the spirit of Erev & Roth (1998), with choice probabilities given by the nested logit choice rule of Ben-Akiva (1973) and McFadden (1978). This result generalizes an existing relation between the replicator dynamics and the exponential weights algorithm in online learning, and provides an additional layer of interpretation to our analysis and results.
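
The nested logit choice rule itself is standard and compact to implement: a strategy is chosen by first picking a nest via its inclusive value, then picking within the nest; lam is the within-nest temperature:

import numpy as np

def nested_logit(payoffs, nests, lam=0.5):
    probs, inclusive = {}, []
    for nest in nests:
        u = np.array([payoffs[s] for s in nest]) / lam
        inclusive.append(lam * np.log(np.exp(u).sum()))  # nest's inclusive value
    top = np.exp(inclusive - np.max(inclusive))
    top /= top.sum()                                     # nest probabilities
    for p_nest, nest in zip(top, nests):
        u = np.array([payoffs[s] for s in nest]) / lam
        within = np.exp(u - u.max())
        within /= within.sum()                           # within-nest probabilities
        for s, p in zip(nest, within):
            probs[s] = p_nest * p
    return probs

print(nested_logit({"a": 1.0, "b": 1.1, "c": 0.2}, [["a", "b"], ["c"]]))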

Updated: 2024-07-25 07:09:53

标题: 嵌套复制动力学、嵌套logit选择和基于相似性的学习

摘要: 我们考虑了一个博弈中的学习与演化模型,其中博弈的行动集被赋予了基于划分的相似性结构,旨在刻画策略之间的外生相似性。在这个模型中,进行修订的智能体有更高的概率将其当前策略与它们认为相似的其他策略进行比较,并以与其超额收益成比例的概率切换到所观察到的策略。由于对相似策略的这种隐性偏向,由此产生的动态(我们称之为嵌套复制动态)不满足模仿博弈动态的任何标准单调性假设;然而,我们证明它们保留了复制动态的主要长期理性特性,尽管在数量上速率有所不同。我们还表明,所诱导的动态可以被视为Erev和Roth(1998)意义上的刺激-响应模型,其选择概率由Ben-Akiva(1973)和McFadden(1978)的嵌套logit选择规则给出。这一结果推广了复制动态与在线学习中指数权重算法之间的已有关系,并为我们的分析和结果提供了额外的解释层次。

更新时间: 2024-07-25 07:09:53

领域: cs.GT,cs.LG,Primary 91A22, 91A26, secondary 37N40, 68Q32

下载: http://arxiv.org/abs/2407.17815v1

CCVA-FL: Cross-Client Variations Adaptive Federated Learning for Medical Imaging

Federated Learning (FL) offers a privacy-preserving approach to train models on decentralized data. Its potential in healthcare is significant, but challenges arise due to cross-client variations in medical image data, exacerbated by limited annotations. This paper introduces Cross-Client Variations Adaptive Federated Learning (CCVA-FL) to address these issues. CCVA-FL aims to minimize cross-client variations by transforming images into a common feature space. It involves expert annotation of a subset of images from each client, followed by the selection of a client with the least data complexity as the target. Synthetic medical images are then generated using Scalable Diffusion Models with Transformers (DiT) based on the target client's annotated images. These synthetic images, capturing diversity and representing the original data, are shared with other clients. Each client then translates its local images into the target image space using image-to-image translation. The translated images are subsequently used in a federated learning setting to develop a server model. Our results demonstrate that CCVA-FL outperforms Vanilla Federated Averaging by effectively addressing data distribution differences across clients without compromising privacy.
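
The final aggregation step is standard federated averaging; CCVA-FL's contribution is the image-translation stage that runs before it, which this sketch does not show:

import torch

def federated_average(client_states, client_sizes):
    # Average client parameter dicts, weighted by local data size.
    total = float(sum(client_sizes))
    return {
        k: sum(state[k].float() * (n / total)
               for state, n in zip(client_states, client_sizes))
        for k in client_states[0]
    }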

Updated: 2024-07-25 07:04:32

标题: CCVA-FL:用于医学影像的跨客户变化自适应联邦学习

摘要: 联邦学习(FL)提供了一种在分散数据上训练模型的隐私保护方法。它在医疗保健领域潜力巨大,但医学图像数据的跨客户端差异,加上标注有限,带来了挑战。本文介绍了跨客户端差异自适应联邦学习(CCVA-FL)来解决这些问题。CCVA-FL旨在通过将图像变换到共同的特征空间来最小化跨客户端差异。其流程是:先对每个客户端的图像子集进行专家标注,然后选择数据复杂度最低的客户端作为目标;再基于目标客户端的标注图像,使用基于Transformer的可扩展扩散模型(DiT)生成合成医学图像。这些合成图像捕捉了多样性并代表原始数据,与其他客户端共享。随后,每个客户端使用图像到图像翻译将其本地图像转换到目标图像空间,并在联邦学习框架中使用转换后的图像来训练服务器模型。我们的结果表明,CCVA-FL在不损害隐私的前提下有效处理了客户端之间的数据分布差异,优于普通的联邦平均(Vanilla Federated Averaging)。

更新时间: 2024-07-25 07:04:32

领域: cs.CV,cs.AI,cs.LG,I.2.10; I.4.0; I.4.1; I.4.2; I.4.6; I.4.7; I.4.8; I.4.9; I.4.10; I.2.10; I.5.1; I.5.2; I.5.4; J.2; I.2.6; I.2.11; I.2.10

下载: http://arxiv.org/abs/2407.11652v2

Enhancing Model Performance: Another Approach to Vision-Language Instruction Tuning

The integration of large language models (LLMs) with vision-language (VL) tasks has been a transformative development in the realm of artificial intelligence, highlighting the potential of LLMs as a versatile general-purpose chatbot. However, the current trend in this evolution focuses on the integration of vision and language to create models that can operate in more diverse and real-world contexts. We present a novel approach, termed Bottleneck Adapter, specifically crafted for enhancing the multimodal functionalities of these complex models, enabling joint optimization of the entire multimodal LLM framework through a process known as Multimodal Model Tuning (MMT). Our approach utilizes lightweight adapters to connect the image encoder and LLM without the need for large, complex neural networks. Unlike the conventional modular training schemes, our approach adopts an end-to-end optimization regime, which, when combined with the adapters, facilitates the joint optimization using a significantly smaller parameter set. Our method exhibits robust performance with 90.12\% accuracy, outperforming both human-level performance (88.4\%) and LaVIN-7B (89.41\%).
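
A bottleneck adapter of the kind described is typically a down-project/up-project pair bridging the image encoder and the LLM; the dimensions below are illustrative, not the paper's exact widths:

import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, vision_dim=1024, llm_dim=4096, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(vision_dim, bottleneck)  # compress
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, llm_dim)       # project into LLM space

    def forward(self, vision_feats):
        return self.up(self.act(self.down(vision_feats)))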

Updated: 2024-07-25 06:59:15

标题: 提升模型性能:另一种视觉-语言指令微调方法

摘要: 大型语言模型(LLMs)与视觉-语言(VL)任务的整合是人工智能领域的一个转变性发展,突显了LLMs作为通用聊天机器人的潜力。然而,当前这一演变趋势侧重于整合视觉和语言,以创建能够在更多样化和真实世界环境中运行的模型。我们提出了一种新颖的方法,称为Bottleneck Adapter,专门设计用于增强这些复杂模型的多模态功能,通过一种称为多模态模型调整(MMT)的过程实现对整个多模态LLM框架的联合优化。我们的方法利用轻量级适配器将图像编码器和LLM连接起来,而无需大型、复杂的神经网络。与传统的模块化训练方案不同,我们的方法采用端到端优化制度,结合适配器,通过显著较小的参数集实现联合优化。我们的方法表现出稳健的性能,准确率达到90.12\%,优于人类水平表现(88.4%)和LaVIN-7B(89.41%)。

更新时间: 2024-07-25 06:59:15

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.17813v1

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning. It provides a single privacy guarantee to all datapoints in the dataset. We propose output-specific $(\varepsilon,\delta)$-DP to characterize privacy guarantees for individual examples when releasing models trained by DP-SGD. We also design an efficient algorithm to investigate individual privacy across a number of datasets. We find that most examples enjoy stronger privacy guarantees than the worst-case bound. We further discover that the training loss and the privacy parameter of an example are well-correlated. This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees. For example, on CIFAR-10, the average $\varepsilon$ of the class with the lowest test accuracy is 44.2\% higher than that of the class with the highest accuracy.
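
For context, the mechanics whose privacy cost is being accounted for are per-example clipping plus Gaussian noise; clip norm and noise multiplier below are illustrative:

import torch

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.0):
    # per_example_grads: (batch, dim) gradient per training example.
    norms = per_example_grads.norm(dim=1, keepdim=True)
    clipped = per_example_grads * (clip_norm / norms).clamp(max=1.0)
    noise = torch.randn(per_example_grads.shape[1]) * noise_mult * clip_norm
    return (clipped.sum(dim=0) + noise) / per_example_grads.shape[0]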

Updated: 2024-07-25 06:33:58

标题: 差分隐私随机梯度下降的个体隐私核算

摘要: 差分私有随机梯度下降(DP-SGD)是最近私有深度学习进展的主要算法。它为数据集中的所有数据点提供单一隐私保证。我们提出了输出特定的(ε,δ)-DP来描述释放由DP-SGD训练的模型时对个体示例的隐私保证。我们还设计了一种高效算法来研究跨多个数据集的个体隐私。我们发现大多数示例享有比最坏情况下界更强的隐私保证。我们进一步发现一个示例的训练损失和隐私参数之间存在良好的相关性。这意味着在模型效用方面服务不足的群体同时经历较弱的隐私保证。例如,在CIFAR-10上,具有最低测试准确度的类别的平均ε比具有最高准确度的类别高出44.2%。

更新时间: 2024-07-25 06:33:58

领域: cs.LG,cs.CR,cs.DS,stat.ML

下载: http://arxiv.org/abs/2206.02617v7

HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts

The Mixture of Experts (MoE) for language models has been proven effective in augmenting the capacity of models by dynamically routing each input token to a specific subset of experts for processing. Despite the success, most existing methods face a challenge for balance between sparsity and the availability of expert knowledge: enhancing performance through increased use of expert knowledge often results in diminishing sparsity during expert selection. To mitigate this contradiction, we propose HyperMoE, a novel MoE framework built upon Hypernetworks. This framework integrates the computational processes of MoE with the concept of knowledge transferring in multi-task learning. Specific modules generated based on the information of unselected experts serve as supplementary information, which allows the knowledge of experts not selected to be used while maintaining selection sparsity. Our comprehensive empirical evaluations across multiple datasets and backbones establish that HyperMoE significantly outperforms existing MoE methods under identical conditions concerning the number of experts.

Updated: 2024-07-25 06:28:01

标题: HyperMoE:通过专家间迁移实现更好的专家混合

摘要: 专家混合(MoE)用于语言模型已被证明通过动态路由每个输入标记到特定专家子集进行处理,有效增加模型的容量。尽管取得成功,大多数现有方法在稀疏性和专家知识可用性之间保持平衡面临挑战:通过增加专家知识的使用来提高性能往往会导致在专家选择过程中稀疏性减弱。为了缓解这种矛盾,我们提出了HyperMoE,这是一个基于超网络构建的新型MoE框架。该框架将MoE的计算过程与多任务学习中的知识传递概念相结合。基于未选择专家的信息生成的特定模块作为补充信息,允许未选择专家的知识被使用,同时保持选择稀疏性。我们在多个数据集和骨干网络上进行了全面的实证评估,结果表明,在专家数量相同的情况下,HyperMoE在相同条件下明显优于现有的MoE方法。

更新时间: 2024-07-25 06:28:01

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.12656v4

Automatic Data Labeling for Software Vulnerability Prediction Models: How Far Are We?

Background: Software Vulnerability (SV) prediction needs large-sized and high-quality data to perform well. Current SV datasets mostly require expensive labeling efforts by experts (human-labeled) and thus are limited in size. Meanwhile, there are growing efforts in automatic SV labeling at scale. However, the fitness of auto-labeled data for SV prediction is still largely unknown. Aims: We quantitatively and qualitatively study the quality and use of the state-of-the-art auto-labeled SV data, D2A, for SV prediction. Method: Using multiple sources and manual validation, we curate clean SV data from human-labeled SV-fixing commits in two well-known projects for investigating the auto-labeled counterparts. Results: We discover that 50+% of the auto-labeled SVs are noisy (incorrectly labeled), and they hardly overlap with the publicly reported ones. Yet, SV prediction models utilizing the noisy auto-labeled SVs can perform up to 22% and 90% better in Matthews Correlation Coefficient and Recall, respectively, than the original models. We also reveal the promises and difficulties of applying noise-reduction methods for automatically addressing the noise in auto-labeled SV data to maximize the data utilization for SV prediction. Conclusions: Our study informs the benefits and challenges of using auto-labeled SVs, paving the way for large-scale SV prediction.

Updated: 2024-07-25 06:22:25

标题: 软件漏洞预测模型的自动数据标注:我们已经走了多远?

摘要: 背景:软件漏洞(SV)预测需要大规模和高质量的数据才能表现良好。当前的SV数据集大多需要专家(人工标注)进行昂贵的标注工作,因此在规模上受到限制。与此同时,自动进行大规模SV标注的工作正在增长。然而,自动标注数据在SV预测中的适用性仍然大部分未知。目的:我们定量和定性地研究了最先进的自动标注SV数据D2A的质量和用途,以用于SV预测。方法:通过多个来源和手动验证,我们整理了来自两个知名项目中人工标注的SV修复提交的干净SV数据,以研究自动标注的对应数据。结果:我们发现50%以上的自动标注的SV是有噪声的(标注错误的),并且它们几乎不与公开报告的SV重叠。然而,利用这些有噪声的自动标注SV的SV预测模型在马修斯相关系数和召回率方面可以比原始模型分别提高22%和90%。我们还揭示了应用噪声降低方法来自动处理自动标注SV数据中的噪声、以最大程度地利用数据进行SV预测的前景和困难。结论:我们的研究阐明了使用自动标注SV的益处和挑战,为大规模SV预测铺平了道路。

更新时间: 2024-07-25 06:22:25

领域: cs.SE,cs.CR,cs.LG

下载: http://arxiv.org/abs/2407.17803v1

Statistical Batch-Based Bearing Fault Detection

In the domain of rotating machinery, bearings are vulnerable to different mechanical faults, including ball, inner, and outer race faults. Various techniques can be used in condition-based monitoring, from classical signal analysis to deep learning methods. Based on the complex working conditions of rotary machines, multivariate statistical process control charts such as Hotelling's $T^2$ and Squared Prediction Error are useful for providing early warnings. However, these methods are rarely applied to condition monitoring of rotating machinery due to the univariate nature of the datasets. In the present paper, we propose a multivariate statistical process control-based fault detection method that utilizes multivariate data composed of Fourier transform features extracted for fixed-time batches. Our approach makes use of the multidimensional nature of Fourier transform characteristics, which record more detailed information about the machine's status, in an effort to enhance early defect detection and diagnosis. Experiments with varying vibration measurement locations (Fan End, Drive End), fault types (ball, inner, and outer race faults), and motor loads (0-3 horsepower) are used to validate the suggested approach. The outcomes illustrate our method's effectiveness in fault detection and point to possible broader uses in industrial maintenance.
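
The monitoring statistic itself is compact: fit the mean and covariance of batch FFT features on healthy data, then flag batches whose Hotelling T^2 exceeds a control limit. A sketch with stand-in features:

import numpy as np

def hotelling_t2(batch_features, mean, cov_inv):
    d = batch_features - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)  # one T^2 per batch vector

healthy = np.random.randn(500, 8)           # stand-in for FFT feature batches
mu, cov_inv = healthy.mean(0), np.linalg.inv(np.cov(healthy, rowvar=False))
limit = np.percentile(hotelling_t2(healthy, mu, cov_inv), 99)  # control limit
alarms = hotelling_t2(np.random.randn(10, 8), mu, cov_inv) > limit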

Updated: 2024-07-25 06:21:01

标题: 基于统计批次的轴承故障检测

摘要: 在旋转机械领域,轴承容易发生不同的机械故障,包括滚珠、内圈和外圈故障。从传统信号分析到深度学习方法,各种技术都可用于基于状态的监测。考虑到旋转机械复杂的工作条件,诸如Hotelling的$T^2$和平方预测误差等多变量统计过程控制图对提供早期预警很有用。然而,由于数据集的单变量性质,这些方法很少应用于旋转机械的状态监测。在本文中,我们提出了一种基于多变量统计过程控制的故障检测方法,利用由固定时间批次提取的傅里叶变换特征组成的多变量数据。我们的方法利用傅里叶变换特征的多维性质,记录关于机器状态的更详细信息,以增强早期缺陷检测和诊断。实验中使用了不同的振动测量位置(风扇端、驱动端)、故障类型(滚珠、内圈和外圈故障)和电机负载(0-3马力)来验证所建议的方法。结果表明我们的方法在故障检测方面有效,并指出其在工业维护中可能有更广泛的用途。

更新时间: 2024-07-25 06:21:01

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2407.17236v2

EEG-SSM: Leveraging State-Space Model for Dementia Detection

State-space models (SSMs) have garnered attention for effectively processing long data sequences, reducing the need to segment time series into shorter intervals for model training and inference. Traditionally, SSMs capture only the temporal dynamics of time series data, omitting the equally critical spectral features. This study introduces EEG-SSM, a novel state-space model-based approach for dementia classification using EEG data. Our model features two primary innovations: EEG-SSM temporal and EEG-SSM spectral components. The temporal component is designed to efficiently process EEG sequences of varying lengths, while the spectral component enhances the model by integrating frequency-domain information from EEG signals. The synergy of these components allows EEG-SSM to adeptly manage the complexities of multivariate EEG data, significantly improving accuracy and stability across different temporal resolutions. Demonstrating a remarkable 91.0 percent accuracy in classifying Healthy Control (HC), Frontotemporal Dementia (FTD), and Alzheimer's Disease (AD) groups, EEG-SSM outperforms existing models on the same dataset. The development of EEG-SSM represents an improvement in the use of state-space models for screening dementia, offering more precise and cost-effective tools for clinical neuroscience.

Updated: 2024-07-25 06:20:03

标题: EEG-SSM:利用状态空间模型进行痴呆检测

摘要: 状态空间模型(SSMs)因有效处理长数据序列而受到关注,减少了将时间序列分割为较短间隔进行模型训练和推断的需要。传统上,SSMs仅捕捉时间序列数据的时间动态,而忽略了同样关键的频谱特征。本研究引入了EEG-SSM,一种基于状态空间模型、利用EEG数据进行痴呆分类的新方法。我们的模型具有两个主要创新:EEG-SSM时间组件和EEG-SSM频谱组件。时间组件旨在高效处理长度不同的EEG序列,而频谱组件通过整合EEG信号的频域信息来增强模型。这些组件的协同作用使EEG-SSM能够熟练处理多变量EEG数据的复杂性,在不同的时间分辨率下显著提高准确性和稳定性。在对健康对照组(HC)、额颞叶痴呆(FTD)和阿尔茨海默病(AD)进行分类时,EEG-SSM达到了91.0%的准确率,优于同一数据集上的现有模型。EEG-SSM的发展代表了状态空间模型在痴呆筛查应用上的进步,为临床神经科学提供了更精确且更具成本效益的工具。

更新时间: 2024-07-25 06:20:03

领域: cs.LG,cs.AI,cs.HC

下载: http://arxiv.org/abs/2407.17801v1

A Unified Understanding of Adversarial Vulnerability Regarding Unimodal Models and Vision-Language Pre-training Models

With Vision-Language Pre-training (VLP) models demonstrating powerful multimodal interaction capabilities, the application scenarios of neural networks are no longer confined to unimodal domains but have expanded to more complex multimodal V+L downstream tasks. The security vulnerabilities of unimodal models have been extensively examined, whereas those of VLP models remain challenging. We note that in CV models, the understanding of images comes from annotated information, while VLP models are designed to learn image representations directly from raw text. Motivated by this discrepancy, we developed the Feature Guidance Attack (FGA), a novel method that uses text representations to direct the perturbation of clean images, resulting in the generation of adversarial images. FGA is orthogonal to many advanced attack strategies in the unimodal domain, facilitating the direct application of rich research findings from the unimodal to the multimodal scenario. By appropriately introducing text attack into FGA, we construct Feature Guidance with Text Attack (FGA-T). Through the interaction of attacking two modalities, FGA-T achieves superior attack effects against VLP models. Moreover, incorporating data augmentation and momentum mechanisms significantly improves the black-box transferability of FGA-T. Our method demonstrates stable and effective attack capabilities across various datasets, downstream tasks, and both black-box and white-box settings, offering a unified baseline for exploring the robustness of VLP models.
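
A PGD-style sketch of the attack pattern described, pushing image features away from the paired text representation; the sign-step, budget, and exact objective are illustrative, not the paper's precise recipe:

import torch

def feature_guidance_attack(image, text_feat, encoder, eps=8 / 255, steps=10):
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        sim = torch.nn.functional.cosine_similarity(
            encoder(image + delta), text_feat, dim=-1).mean()
        sim.backward()
        with torch.no_grad():
            delta -= (eps / steps) * delta.grad.sign()  # decrease similarity
            delta.clamp_(-eps, eps)                     # stay within budget
        delta.grad.zero_()
    return (image + delta).detach()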

Updated: 2024-07-25 06:10:33

标题: 对单模态模型与视觉-语言预训练模型对抗脆弱性的统一理解

摘要: 随着视觉语言预训练(VLP)模型展示出强大的多模态交互能力,神经网络的应用场景不再局限于单模态领域,而是扩展到更复杂的多模态V+L下游任务。单模态模型的安全漏洞已经得到广泛检查,而VLP模型的安全漏洞仍然具有挑战性。我们注意到,在计算机视觉模型中,对图像的理解来自于注释信息,而VLP模型被设计为直接从原始文本中学习图像表示。受到这一差异的启发,我们开发了一种名为特征引导攻击(FGA)的新方法,该方法利用文本表示来引导对清晰图像的扰动,从而生成对抗性图像。FGA与单模态领域的许多高级攻击策略是正交的,有助于将单模态领域的丰富研究成果直接应用到多模态场景中。通过适当引入文本攻击到FGA中,我们构建了带有文本攻击的特征引导(FGA-T)。通过攻击两种模态的相互作用,FGA-T实现了对VLP模型的优越攻击效果。此外,结合数据增强和动量机制显著提高了FGA-T的黑盒可迁移性。我们的方法在各种数据集、下游任务以及黑盒和白盒设置中展示了稳定且有效的攻击能力,为探索VLP模型的稳健性提供了统一的基准线。

更新时间: 2024-07-25 06:10:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.17797v1

Enhancing Diversity in Multi-objective Feature Selection

Feature selection plays a pivotal role in the data preprocessing and model-building pipeline, significantly enhancing model performance, interpretability, and resource efficiency across diverse domains. In population-based optimization methods, the generation of diverse individuals holds utmost importance for adequately exploring the problem landscape, particularly in highly multi-modal multi-objective optimization problems. Our study reveals that, in line with findings from several prior research papers, commonly employed crossover and mutation operations lack the capability to generate high-quality diverse individuals and tend to become confined to limited areas around various local optima. This paper introduces an augmentation to the diversity of the population in the well-established multi-objective scheme of the genetic algorithm, NSGA-II. This enhancement is achieved through two key components: the genuine initialization method and the substitution of the worst individuals with new randomly generated individuals as a re-initialization approach in each generation. The proposed multi-objective feature selection method undergoes testing on twelve real-world classification problems, with the number of features ranging from 2,400 to nearly 50,000. The results demonstrate that replacing the last front of the population with an equivalent number of new random individuals generated using the genuine initialization method and featuring a limited number of features substantially improves the population's quality and, consequently, enhances the performance of the multi-objective algorithm.
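
The re-initialization step is easy to state in code: the worst front is replaced by fresh random masks that activate only a few features; max_active below is an illustrative cap, not the paper's exact setting:

import numpy as np

def reinitialize_last_front(population, last_front_idx, n_features,
                            max_active=50, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    for i in last_front_idx:
        mask = np.zeros(n_features, dtype=bool)
        k = int(rng.integers(1, max_active + 1))       # few active features
        mask[rng.choice(n_features, size=k, replace=False)] = True
        population[i] = mask                           # overwrite worst member
    return population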

Updated: 2024-07-25 06:09:44

标题: 增强多目标特征选择中的多样性

摘要: 特征选择在数据预处理和模型构建流程中起着关键作用,在不同领域显著提升了模型性能、可解释性和资源效率。在基于种群的优化方法中,生成多样化的个体对于充分探索问题空间尤为重要,尤其是在高度多峰的多目标优化问题中。我们的研究表明,与先前几篇研究论文的发现一致,常用的交叉和变异操作缺乏生成高质量多样化个体的能力,并且往往局限于各个局部最优解周围的有限区域。本文在遗传算法的经典多目标方案NSGA-II中引入了一种增强种群多样性的方法。该增强通过两个关键组件实现:真实初始化方法,以及在每一代中用新的随机生成个体替换最差个体的重新初始化方法。所提出的多目标特征选择方法在十二个真实分类问题上进行了测试,特征数量从2,400到近50,000不等。结果表明,用采用真实初始化方法生成、且仅激活少量特征的等量新随机个体替换种群的最后一个前沿,显著提高了种群的质量,进而提升了多目标算法的性能。

更新时间: 2024-07-25 06:09:44

领域: cs.LG,68T05 (Primary), 68T20, 68W20, 68W40, 90C29, 90C27, 62H30, 62H25 (Secondary),I.2.6; I.2.8; I.5; H.2.8

下载: http://arxiv.org/abs/2407.17795v1

High Significant Fault Detection in Azure Core Workload Insights

Azure Core workload insights have time-series data with different metric units. Faults or anomalies are observed in these time-series data with respect to the metric name, resource region, dimensions, and the dimension values associated with the data. For Azure Core, an important task is to highlight faults or anomalies to the user on a dashboard that they can perceive easily. The anomalies reported should be highly significant and limited in number, e.g., 5-20 anomalies reported per hour. The reported anomalies will have significant user perception and high reconstruction error in any time-series forecasting model. Hence, our task is to automatically identify 'high significant anomalies' and their associated information for user perception.

Updated: 2024-07-25 06:05:54

标题: Azure核心工作负载洞察中的高显著故障检测

摘要: Azure核心工作负载洞察拥有具有不同度量单位的时间序列数据。这些时间序列数据中会出现与度量名称、资源区域、维度及其关联的维度值有关的故障或异常。对于Azure核心而言,一项重要任务是在用户可以轻松感知的仪表板上突出显示故障或异常。报告的异常应当高度显著且数量有限,例如每小时报告5-20个异常。所报告的异常将具有显著的用户感知度,并在任何时间序列预测模型中具有较高的重建误差。因此,我们的任务是自动识别“高显著异常”及其相关信息以供用户感知。

更新时间: 2024-07-25 06:05:54

领域: cs.LG,cs.AI,cs.DC

下载: http://arxiv.org/abs/2404.09302v2

Investigating learning-independent abstract reasoning in artificial neural networks

Humans are capable of solving complex abstract reasoning tests. Whether this ability reflects a learning-independent inference mechanism applicable to any novel unlearned problem or whether it is a manifestation of extensive training throughout life is an open question. Addressing this question in humans is challenging because it is impossible to control their prior training. However, assuming a similarity between the cognitive processing of Artificial Neural Networks (ANNs) and humans, the extent to which training is required for ANNs' abstract reasoning is informative about this question in humans. Previous studies demonstrated that ANNs can solve abstract reasoning tests. However, this success required extensive training. In this study, we examined the learning-independent abstract reasoning of ANNs. Specifically, we evaluated their performance without any pretraining, with the ANNs' weights being randomly-initialized, and only change in the process of problem solving. We found that naive ANN models can solve non-trivial visual reasoning tests, similar to those used to evaluate human learning-independent reasoning. We further studied the mechanisms that support this ability. Our results suggest the possibility of learning-independent abstract reasoning that does not require extensive training.

Updated: 2024-07-25 05:58:58

标题: 研究人工神经网络中与学习无关的抽象推理

摘要: 人类有能力解决复杂的抽象推理测试。这种能力是反映了一种适用于任何新的、未学习过的问题的、与学习无关的推理机制,还是终身大量训练的体现,目前仍是一个悬而未决的问题。在人类身上探讨这个问题具有挑战性,因为无法控制他们先前所受的训练。然而,如果假设人工神经网络(ANNs)的认知处理与人类相似,那么ANNs的抽象推理需要多少训练,就能为人类中的这一问题提供启示。先前的研究表明,ANNs能够解决抽象推理测试,但这种成功需要大量训练。在这项研究中,我们考察了ANNs的与学习无关的抽象推理能力。具体而言,我们在不进行任何预训练、ANNs权重随机初始化、且权重仅在解题过程中发生变化的情况下评估了它们的表现。我们发现,这样的朴素ANN模型可以解决非平凡的视觉推理测试,类似于用于评估人类与学习无关推理的测试。我们进一步研究了支持这种能力的机制。我们的结果表明,可能存在不需要大量训练的、与学习无关的抽象推理。

更新时间: 2024-07-25 05:58:58

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.17791v1

Variational Inference with Coverage Guarantees in Simulation-Based Inference

Amortized variational inference is an often employed framework in simulation-based inference that produces a posterior approximation that can be rapidly computed given any new observation. Unfortunately, there are few guarantees about the quality of these approximate posteriors. We propose Conformalized Amortized Neural Variational Inference (CANVI), a procedure that is scalable, easily implemented, and provides guaranteed marginal coverage. Given a collection of candidate amortized posterior approximators, CANVI constructs conformalized predictors based on each candidate, compares the predictors using a metric known as predictive efficiency, and returns the most efficient predictor. CANVI ensures that the resulting predictor constructs regions that contain the truth with a user-specified level of probability. CANVI is agnostic to design decisions in formulating the candidate approximators and only requires access to samples from the forward model, permitting its use in likelihood-free settings. We prove lower bounds on the predictive efficiency of the regions produced by CANVI and explore how the quality of a posterior approximation relates to the predictive efficiency of prediction regions based on that approximation. Finally, we demonstrate the accurate calibration and high predictive efficiency of CANVI on a suite of simulation-based inference benchmark tasks and an important scientific task: analyzing galaxy emission spectra.
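
The coverage guarantee rests on the usual split-conformal quantile; a sketch (the CANVI-specific region construction around the amortized posterior is omitted):

import numpy as np

def conformal_quantile(calibration_scores, alpha=0.1):
    # The ceil((n+1)(1-alpha))/n empirical quantile gives marginal
    # coverage of at least 1 - alpha.
    n = len(calibration_scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(calibration_scores, q, method="higher")

# Any region {theta : score(theta, x) <= threshold} then contains the truth
# with probability at least 1 - alpha, regardless of posterior quality.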

Updated: 2024-07-25 05:53:46

标题: 在基于模拟推断中具有覆盖保证的变分推断

摘要: 摊销变分推断是基于模拟的推断中常用的框架,它产生的后验近似在给定任何新观测时都能快速计算。不幸的是,关于这些近似后验的质量几乎没有保证。我们提出了一种名为Conformalized Amortized Neural Variational Inference(CANVI)的方法,它可扩展、易于实现,并提供有保证的边际覆盖。给定一组候选的摊销后验近似器,CANVI基于每个候选构建保形化预测器,使用称为预测效率的度量来比较这些预测器,并返回最有效的预测器。CANVI确保所得到的预测器构建的区域以用户指定的概率水平包含真值。CANVI对候选近似器构建中的设计决策不作限制,只需要能从前向模型中采样,因此可以在无似然设置中使用。我们证明了CANVI所产生区域的预测效率下界,并探讨了后验近似的质量与基于该近似的预测区域的预测效率之间的关系。最后,我们在一系列基于模拟的推断基准任务和一个重要的科学任务(分析星系发射光谱)上展示了CANVI的准确校准和高预测效率。

更新时间: 2024-07-25 05:53:46

领域: stat.ME,cs.LG

下载: http://arxiv.org/abs/2305.14275v3

Exploring the Limitations of Kolmogorov-Arnold Networks in Classification: Insights to Software Training and Hardware Implementation

Kolmogorov-Arnold Networks (KANs), a novel type of neural network, have recently gained popularity and attention due to the ability to substitute multi-layer perceptrons (MLPs) in artificial intelligence (AI) with higher accuracy and interoperability. However, KAN assessment is still limited and cannot provide an in-depth analysis of a specific domain. Furthermore, no study has been conducted on the implementation of KANs in hardware design, which would directly demonstrate whether KANs are truly superior to MLPs in practical applications. As a result, in this paper, we focus on verifying KANs for classification issues, which are a common but significant topic in AI, using four different types of datasets. Furthermore, the corresponding hardware implementation is considered using the Vitis high-level synthesis (HLS) tool. To the best of our knowledge, this is the first article to implement hardware for KAN. The results indicate that KANs cannot achieve more accuracy than MLPs in highly complex datasets while utilizing substantially higher hardware resources. Therefore, MLP remains an effective approach for achieving accuracy and efficiency in software and hardware implementation.

Updated: 2024-07-25 05:52:48

标题: 探讨Kolmogorov-Arnold网络在分类中的局限性:对软件培训和硬件实现的见解

摘要: 科尔莫戈洛夫-阿诺德网络(KANs)是一种新型的神经网络,最近因能够在人工智能(AI)中以更高的准确性和互操作性替代多层感知器(MLPs)而受到关注和流行。然而,KAN的评估仍然有限,无法对特定领域进行深入分析。此外,还没有关于在硬件设计中实现KAN的研究,这将直接证明KAN在实际应用中是否真正优于MLP。因此,在本文中,我们专注于验证KAN在分类问题上的表现,这是AI中常见但重要的主题,使用四种不同类型的数据集。此外,考虑了使用Vitis高层次综合(HLS)工具进行相应的硬件实现。据我们所知,这是第一篇实现KAN硬件的文章。结果表明,在高复杂数据集中,KAN无法比MLP实现更高的准确性,同时利用了更多的硬件资源。因此,在软件和硬件实现中,MLP仍然是实现准确性和效率的有效方法。

更新时间: 2024-07-25 05:52:48

领域: cs.LG,cs.AR

下载: http://arxiv.org/abs/2407.17790v1

Very Large-Scale Multi-Agent Simulation in AgentScope

Recent advances in large language models (LLMs) have opened new avenues for applying multi-agent systems in very large-scale simulations. However, there remain several challenges when conducting multi-agent simulations with existing platforms, such as limited scalability and low efficiency, unsatisfied agent diversity, and effort-intensive management processes. To address these challenges, we develop several new features and components for AgentScope, a user-friendly multi-agent platform, enhancing its convenience and flexibility for supporting very large-scale multi-agent simulations. Specifically, we propose an actor-based distributed mechanism as the underlying technological infrastructure towards great scalability and high efficiency, and provide flexible environment support for simulating various real-world scenarios, which enables parallel execution of multiple agents, centralized workflow orchestration, and both inter-agent and agent-environment interactions among agents. Moreover, we integrate an easy-to-use configurable tool and an automatic background generation pipeline in AgentScope, simplifying the process of creating agents with diverse yet detailed background settings. Last but not least, we provide a web-based interface for conveniently monitoring and managing a large number of agents that might deploy across multiple devices. We conduct a comprehensive simulation to demonstrate the effectiveness of the proposed enhancements in AgentScope, and provide detailed observations and discussions to highlight the great potential of applying multi-agent systems in large-scale simulations. The source code is released on GitHub at https://github.com/modelscope/agentscope to inspire further research and development in large-scale multi-agent simulations.

Updated: 2024-07-25 05:50:46

标题: AgentScope中的非常大规模多智能体模拟

摘要: 最近大型语言模型(LLMs)的进展为在非常大规模模拟中应用多智能体系统开辟了新的道路。然而,在使用现有平台进行多智能体模拟时仍存在几个挑战,如可扩展性和效率有限、智能体多样性不足以及管理过程繁琐。为解决这些挑战,我们为AgentScope开发了几项新功能和组件,这是一个用户友好的多智能体平台,增强了其便利性和灵活性,以支持非常大规模的多智能体模拟。具体来说,我们提出了一种基于actor的分布式机制作为底层技术基础设施,实现了极高的可扩展性和效率,并提供了灵活的环境支持,用于模拟各种真实场景,从而实现多个智能体的并行执行、集中式工作流编排以及智能体之间和智能体与环境之间的交互。此外,我们在AgentScope中集成了一个易于使用的可配置工具和一个自动背景生成流水线,简化了创建具有多样而详细背景设定的智能体的过程。最后,我们提供了一个基于web的界面,方便监控和管理可能部署在多个设备上的大量智能体。我们进行了全面的模拟,以展示AgentScope中这些增强措施的有效性,并提供详细的观察和讨论,突显在大规模模拟中应用多智能体系统的巨大潜力。源代码已发布在GitHub上(https://github.com/modelscope/agentscope),以激发对大规模多智能体模拟的进一步研究和发展。

更新时间: 2024-07-25 05:50:46

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2407.17789v1

PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation

Recent advances in Large Language Models (LLMs) have shown significant potential in enhancing cybersecurity defenses against sophisticated threats. LLM-based penetration testing is an essential step in automating system security evaluations by identifying vulnerabilities. Remediation, the subsequent crucial step, addresses these discovered vulnerabilities. Since details about vulnerabilities, exploitation methods, and software versions offer crucial insights into system weaknesses, integrating penetration testing with vulnerability remediation into a cohesive system has become both intuitive and necessary. This paper introduces PenHeal, a two-stage LLM-based framework designed to autonomously identify and mitigate security vulnerabilities. The framework integrates two LLM-enabled components: the Pentest Module, which detects multiple vulnerabilities within a system, and the Remediation Module, which recommends optimal remediation strategies. The integration is facilitated through Counterfactual Prompting and an Instructor module that guides the LLMs using external knowledge to explore multiple potential attack paths effectively. Our experimental results demonstrate that PenHeal not only automates the identification and remediation of vulnerabilities but also significantly improves vulnerability coverage by 31%, increases the effectiveness of remediation strategies by 32%, and reduces the associated costs by 46% compared to baseline models. These outcomes highlight the transformative potential of LLMs in reshaping cybersecurity practices, offering an innovative solution to defend against cyber threats.

Updated: 2024-07-25 05:42:14

标题: PenHeal:自动化渗透测试和最佳补救的两阶段LLM框架

摘要: 最近大型语言模型(LLMs)的进展显示出在增强网络安全防御、对抗复杂威胁方面的显著潜力。基于LLM的渗透测试是自动化系统安全评估中的一个关键步骤,用于识别漏洞。随后的关键步骤是修复,以解决这些被发现的漏洞。由于有关漏洞、利用方法和软件版本的细节提供了对系统弱点的重要洞察,将渗透测试与漏洞修复整合为一个统一系统已变得既直观又必要。 本文介绍了PenHeal,一个两阶段的基于LLM的框架,旨在自主识别和缓解安全漏洞。该框架整合了两个由LLM驱动的组件:Pentest模块,用于检测系统中的多个漏洞;以及Remediation模块,用于推荐最优的修复策略。二者的整合通过反事实提示(Counterfactual Prompting)和一个Instructor模块实现,后者利用外部知识引导LLM有效探索多条潜在攻击路径。我们的实验结果表明,与基线模型相比,PenHeal不仅自动化了漏洞的识别和修复,还将漏洞覆盖率提高了31%,将修复策略的有效性提高了32%,并将相关成本降低了46%。这些结果突显了LLMs在重塑网络安全实践方面的变革潜力,为防御网络威胁提供了创新解决方案。

更新时间: 2024-07-25 05:42:14

领域: cs.CR

下载: http://arxiv.org/abs/2407.17788v1

HC-GST: Heterophily-aware Distribution Consistency based Graph Self-training

Graph self-training (GST), which selects and assigns pseudo-labels to unlabeled nodes, is popular for tackling label sparsity in graphs. However, recent study on homophily graphs show that GST methods could introduce and amplify distribution shift between training and test nodes as they tend to assign pseudo-labels to nodes they are good at. As GNNs typically perform better on homophilic nodes, there could be potential shifts towards homophilic pseudo-nodes, which is underexplored. Our preliminary experiments on heterophilic graphs verify that these methods can cause shifts in homophily ratio distributions, leading to \textit{training bias} that improves performance on homophilic nodes while degrading it on heterophilic ones. Therefore, we study a novel problem of reducing homophily ratio distribution shifts during self-training on heterophilic graphs. A key challenge is the accurate calculation of homophily ratios and their distributions without extensive labeled data. To tackle them, we propose a novel Heterophily-aware Distribution Consistency-based Graph Self-Training (HC-GST) framework, which estimates homophily ratios using soft labels and optimizes a selection vector to align pseudo-nodes with the global homophily ratio distribution. Extensive experiments on both homophilic and heterophilic graphs show that HC-GST effectively reduces training bias and enhances self-training performance.
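
The quantity being controlled is the node-level homophily ratio; with hard labels it is simply the fraction of a node's neighbours sharing its label, as in this sketch (HC-GST estimates it with soft pseudo-labels instead):

import numpy as np

def node_homophily(edges, labels):
    n = len(labels)
    same, deg = np.zeros(n), np.zeros(n)
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            deg[a] += 1
            same[a] += float(labels[a] == labels[b])
    return np.divide(same, deg, out=np.zeros(n), where=deg > 0)

print(node_homophily([(0, 1), (1, 2), (2, 0)], [0, 0, 1]))  # [0.5 0.5 0. ]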

Updated: 2024-07-25 05:38:06

标题: HC-GST:基于异质性感知的分布一致性图自训练

摘要: 图形自训练(GST)通过选择和分配伪标签给未标记节点,以解决图中标签稀疏性问题而备受青睐。然而,最近对同质性图的研究表明,由于GST方法倾向于将伪标签分配给它们擅长的节点,可能会引入和放大训练和测试节点之间的分布偏移。由于GNN通常在同质节点上表现更好,可能存在向同质伪节点转移的潜在风险,但这方面尚未被充分探讨。我们对异质性图进行的初步实验验证了这些方法可能导致同质比率分布发生变化,产生了训练偏差,从而提高同质节点的性能,但降低了异质节点的性能。因此,我们研究了在异质性图上进行自训练时减少同质比率分布偏移的一个新问题。一个关键挑战是在没有大量标记数据的情况下准确计算同质比率及其分布。为了解决这些问题,我们提出了一种新颖的基于异质性感知的分布一致性图自训练(HC-GST)框架,该框架使用软标签估算同质比率,并优化选择向量以使伪节点与全局同质比率分布对齐。在同质性图和异质性图上进行的大量实验表明,HC-GST有效地减少了训练偏差并增强了自训练性能。

更新时间: 2024-07-25 05:38:06

领域: cs.SI,cs.AI

下载: http://arxiv.org/abs/2407.17787v1
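
As a concrete illustration of the estimation step above, the following minimal sketch computes node-level homophily ratios from soft labels by taking the expected label agreement with each neighbour. The adjacency matrix, shapes, and function name are illustrative assumptions, not code from the HC-GST paper.

    import numpy as np

    def soft_homophily_ratios(adj: np.ndarray, soft_labels: np.ndarray) -> np.ndarray:
        """Estimate node-level homophily ratios from soft labels.

        adj:         (n, n) binary adjacency matrix of an undirected graph.
        soft_labels: (n, c) rows are predicted class distributions per node.

        For node i, the ratio is the expected fraction of neighbours sharing
        its class: E[1{y_i == y_j}] ~= <p_i, p_j>, averaged over neighbours.
        """
        agreement = soft_labels @ soft_labels.T      # (n, n) pairwise <p_i, p_j>
        deg = adj.sum(axis=1).clip(min=1)            # avoid division by zero
        return (adj * agreement).sum(axis=1) / deg

    # Toy example: a triangle plus a pendant node with a different label.
    adj = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 0],
                    [1, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    p = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.7, 0.3],
                  [0.1, 0.9]])
    print(soft_homophily_ratios(adj, p))   # one ratio per node, in [0, 1]

Aligning the selection of pseudo-nodes to the global distribution of these ratios is the part HC-GST actually optimizes; this sketch covers only the estimation.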

Enhancing Environmental Monitoring through Multispectral Imaging: The WasteMS Dataset for Semantic Segmentation of Lakeside Waste

Environmental monitoring of lakeside green areas is crucial for environmental protection. Compared to manual inspections, computer vision technologies offer a more efficient solution when deployed on-site. Multispectral imaging provides diverse information about objects under different spectrums, aiding in the differentiation between waste and lakeside lawn environments. This study introduces WasteMS, the first multispectral dataset established for the semantic segmentation of lakeside waste. WasteMS includes a diverse range of waste types in lawn environments, captured under various lighting conditions. We implemented a rigorous annotation process to label waste in images. Representative semantic segmentation frameworks were used to evaluate segmentation accuracy using WasteMS. Challenges encountered when using WasteMS for segmenting waste on lakeside lawns were discussed. The WasteMS dataset is available at https://github.com/zhuqinfeng1999/WasteMS.

Updated: 2024-07-25 05:23:24

标题: 通过多光谱成像加强环境监测:用于湖边废物语义分割的WasteMS数据集

摘要: 湖滨绿地的环境监测对于环境保护至关重要。与手动检查相比,计算机视觉技术在现场部署时提供了更高效的解决方案。多光谱成像提供了有关不同光谱下物体的多样信息,有助于区分废物和湖滨草坪环境。本研究介绍了WasteMS,这是为湖滨废物语义分割建立的第一个多光谱数据集。WasteMS包括在不同光照条件下捕获的草坪环境中各种废物类型。我们实施了严格的标注过程来标记图像中的废物。采用代表性的语义分割框架来评估使用WasteMS进行分割的准确性。讨论了在湖滨草坪上使用WasteMS进行废物分割时遇到的挑战。WasteMS数据集可在https://github.com/zhuqinfeng1999/WasteMS获取。

更新时间: 2024-07-25 05:23:24

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2407.17028v2

How Lightweight Can A Vision Transformer Be

In this paper, we explore a strategy that uses Mixture-of-Experts (MoE) to streamline, rather than augment, vision transformers. Each expert in an MoE layer is a SwiGLU feedforward network, where V and W2 are shared across the layer. No complex attention or convolutional mechanisms are employed. Depth-wise scaling is applied to progressively reduce the size of the hidden layer and the number of experts is increased in stages. Grouped query attention is used. We studied the proposed approach with and without pre-training on small datasets and investigated whether transfer learning works at this scale. We found that the architecture is competitive even at a size of 0.67M parameters.

Updated: 2024-07-25 05:23:20

标题: 一个视觉Transformer可以有多轻量化?

摘要: 在本文中,我们探讨了一种使用专家混合(MoE)策略来简化而非增强视觉变换器的方法。MoE层中的每个专家是一个SwiGLU前馈网络,其中V和W2在层间共享。没有使用复杂的注意力机制或卷积机制。逐层缩放被应用于逐渐减小隐藏层的大小,并且在不同阶段增加专家的数量。使用了分组查询注意力。我们研究了在小数据集上进行预训练和不进行预训练的提议方法,并调查了在这个规模上是否可以进行迁移学习。我们发现,即使在0.67M参数的规模下,该架构也具有竞争力。

更新时间: 2024-07-25 05:23:20

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.17783v1
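
The abstract's central design, SwiGLU experts that share V and W2 across an MoE layer, can be sketched in a few lines of PyTorch. The top-1 softmax router and all sizes below are assumptions made for illustration; the abstract does not specify them.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SharedSwiGLUMoE(nn.Module):
        """MoE layer of SwiGLU experts in which V and W2 are shared across
        the layer, so only the gating projection W1 is expert-specific."""
        def __init__(self, dim: int, hidden: int, n_experts: int):
            super().__init__()
            self.router = nn.Linear(dim, n_experts)
            self.w1 = nn.ModuleList([nn.Linear(dim, hidden, bias=False)
                                     for _ in range(n_experts)])  # per-expert
            self.v = nn.Linear(dim, hidden, bias=False)           # shared
            self.w2 = nn.Linear(hidden, dim, bias=False)          # shared

        def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (tokens, dim)
            expert = self.router(x).argmax(dim=-1)                # assumed top-1 routing
            out = torch.zeros_like(x)
            for e, w1 in enumerate(self.w1):
                sel = expert == e
                if sel.any():
                    h = F.silu(w1(x[sel])) * self.v(x[sel])       # SwiGLU gate
                    out[sel] = self.w2(h)
            return out

    x = torch.randn(16, 64)
    print(SharedSwiGLUMoE(dim=64, hidden=128, n_experts=4)(x).shape)

Sharing V and W2 means adding experts only grows the cheap gating projections, which is how the layer streamlines rather than augments the transformer.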

Integrating Ensemble Kalman Filter with AI-based Weather Prediction Model ClimaX

Artificial intelligence (AI)-based weather prediction research is growing rapidly, and such models have been shown to be competitive with advanced dynamical numerical weather prediction models. However, research combining AI-based weather prediction models with data assimilation remains limited, partly because long-term sequential data assimilation cycles are required to evaluate data assimilation systems. This study explores integrating the local ensemble transform Kalman filter (LETKF) with the AI-based weather prediction model ClimaX. Our experiments demonstrated that the ensemble data assimilation cycled stably for the AI-based weather prediction model when using covariance inflation and localization techniques inside the LETKF. While ClimaX showed some limitations in capturing flow-dependent error covariance compared to dynamical models, the AI-based ensemble forecasts provided reasonable and beneficial error covariance in sparsely observed regions. These findings highlight the potential of AI models in weather forecasting and the importance of physical consistency and accurate error-growth representation in improving ensemble data assimilation.

Updated: 2024-07-25 05:22:08

标题: 将集合卡尔曼滤波器与基于人工智能的天气预测模型ClimaX集成

摘要: 基于人工智能(AI)的天气预测研究正在迅速增长,并已显示出与先进的动态数值天气预测模型相竞争的能力。然而,将基于AI的天气预测模型与数据同化相结合的研究仍然有限,部分原因是需要长期的顺序数据同化循环来评估数据同化系统。本研究探讨了将局部集合变换卡尔曼滤波器(LETKF)与基于AI的天气预测模型ClimaX集成的方法。我们的实验表明,利用LETKF内的协方差膨胀和本地化技术,集合数据同化循环稳定运行于基于AI的天气预测模型上。虽然与动力学模型相比,ClimaX在捕获流动相关误差协方差方面存在一些限制,但基于AI的集合预测在稀疏观测区域提供了合理且有益的误差协方差。这些发现突显了AI模型在天气预测中的潜力,以及在改进集合数据同化过程中物理一致性和准确的误差增长表示的重要性。

更新时间: 2024-07-25 05:22:08

领域: cs.LG,stat.AP

下载: http://arxiv.org/abs/2407.17781v1
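
For readers unfamiliar with the assimilation step being cycled, here is a minimal stochastic ensemble Kalman filter analysis with multiplicative covariance inflation, written in NumPy. It is a simplified stand-in for the LETKF (localization is omitted), and all shapes and values are illustrative.

    import numpy as np

    def enkf_analysis(ens, obs, H, obs_err_std, inflation=1.1, rng=None):
        """One stochastic EnKF analysis step with multiplicative inflation.

        ens: (n_members, n_state) forecast ensemble
        obs: (n_obs,) observation vector
        H:   (n_obs, n_state) linear observation operator
        """
        rng = rng or np.random.default_rng(0)
        mean = ens.mean(axis=0)
        ens = mean + inflation * (ens - mean)           # multiplicative inflation
        X = (ens - ens.mean(axis=0)) / np.sqrt(len(ens) - 1)
        Pf = X.T @ X                                    # sample forecast covariance
        R = (obs_err_std ** 2) * np.eye(len(obs))
        K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # Kalman gain
        perturbed = obs + rng.normal(0, obs_err_std, size=(len(ens), len(obs)))
        return ens + (perturbed - ens @ H.T) @ K.T      # updated ensemble

    ens = np.random.default_rng(1).normal(size=(20, 5))
    H = np.eye(3, 5)                                    # observe first 3 state variables
    print(enkf_analysis(ens, np.zeros(3), H, obs_err_std=0.5).shape)

In the cycling described above, the forecast ensemble would come from running ClimaX forward from the previous analysis rather than from random numbers.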

Advancing Multi-Modal Sensing Through Expandable Modality Alignment

Sensing technology is widely used for comprehending the physical world, with numerous modalities explored in past decades. While there has been considerable work on multi-modality learning, it all requires data from all modalities to be paired. How to leverage multi-modality data with only partial pairings remains an open problem. To tackle this challenge, we introduce the Babel framework, encompassing the neural network architecture, data preparation and processing, and the training strategies. Babel serves as a scalable pre-trained multi-modal sensing neural network, currently aligning six sensing modalities, namely Wi-Fi, mmWave, IMU, LiDAR, video, and depth. To overcome the scarcity of complete paired data, the key idea of Babel is to transform the N-modality alignment into a series of two-modality alignments by devising an expandable network architecture. This concept is realized via a series of novel techniques, including a pre-trained modality tower that capitalizes on available single-modal networks, and an adaptive training strategy that balances the contribution of the newly incorporated modality against the previously established modality alignment. Evaluation demonstrates Babel's outstanding performance on eight human activity recognition datasets compared to various baselines, e.g., the top multi-modal sensing framework, single-modal sensing networks, and multi-modal large language models. Babel not only effectively fuses multiple available modalities (up to a 22% accuracy increase), but also enhances the performance of individual modalities (a 12% average accuracy improvement). Case studies also highlight exciting application scenarios empowered by Babel, including cross-modality retrieval (i.e., sensing imaging) and bridging LLMs for sensing comprehension.

Updated: 2024-07-25 05:10:48

标题: 通过可扩展的模态对齐推进多模态感知

摘要: 传感技术被广泛应用于理解物理世界,过去几十年探索了许多模态。虽然已经进行了大量关于多模态学习的工作,但它们都需要所有模态的数据成对。如何利用部分配对的多模态数据仍然是一个未解决的问题。为了解决这一挑战,我们引入了Babel框架,包括神经网络架构、数据准备和处理,以及训练策略。Babel作为一个可扩展的预训练多模态感知神经网络,目前对齐六种传感模态,即Wi-Fi、毫米波、IMU、LiDAR、视频和深度。为了克服完全配对数据的稀缺性,Babel的关键思想是通过设计可扩展的网络架构,将N模态对齐转化为一系列两模态对齐。这一概念还通过一系列新技术实现,包括利用可用的单模态网络的预训练模态塔,以及平衡新整合的模态与先前建立的模态对齐贡献的自适应训练策略。评估表明,与各种基线(如顶级多模态感知框架、单模态感知网络和多模态大语言模型)相比,Babel在八个人类活动识别数据集上表现出色。Babel不仅有效地融合了多种可用模态(精度增加了22%),还提高了单个模态的性能(平均精度提高了12%)。案例研究还突显了Babel赋予的令人兴奋的应用场景,包括跨模态检索(即感知成像)和桥接LLM以实现感知理解。

更新时间: 2024-07-25 05:10:48

领域: eess.SP,cs.AI

下载: http://arxiv.org/abs/2407.17777v1

SoK: Bridging Trust into the Blockchain. A Systematic Review on On-Chain Identity

The ongoing regulation of blockchain-based services and applications requires the identification of users who are issuing transactions on the blockchain. This systematic review explores the current status, identifies research gaps, and outlines future research directions for establishing trusted and privacy-compliant identities on the blockchain (on-chain identity). A systematic search term was applied across various scientific databases, collecting 2232 potentially relevant research papers. These papers were narrowed down in two methodologically executed steps to 98 and finally to 13 relevant sources. The relevant articles were then systematically analyzed based on a set of screening questions. The results of the selected studies have provided insightful findings on the mechanisms of on-chain identities. On-chain identities are established using zero-knowledge proofs, public key infrastructure/certificates, and web of trust approaches. The technologies and architectures used by the authors are also highlighted. Trust has emerged as a key research gap, manifesting in two ways: firstly, a gap in how to trust the digital identity representation of a physical human; secondly, a gap in how to trust identity providers that issue identity confirmations on-chain. Potential future research avenues are suggested to help fill the current gaps in establishing trust and on-chain identities.

Updated: 2024-07-25 05:06:44

标题: SoK: 将信任引入区块链。关于链上身份的系统性审查

摘要: 区块链服务和应用的持续监管需要确定在区块链上发布交易的用户身份。本系统综述探讨了目前的情况,确定了研究中的空白,并概述了未来建立信任和符合隐私的区块链身份(链上身份)的研究方向。在各种科学数据库中应用了系统搜索术语,收集了2232篇潜在相关的研究论文。这些论文经过两个方法论执行步骤缩减至98篇,最终筛选出13篇相关来源。然后根据一组筛选问题对相关文章进行了系统分析。所选研究的结果为链上身份机制提供了深刻的发现。链上身份是通过零知识证明、公钥基础设施/证书和信任网络方法建立的。作者使用的技术和架构也得到了突出。信任已成为一个关键的研究空白,表现为两种方式:首先,如何信任数字身份代表物理人的空白;其次,如何信任在链上发布身份确认的身份提供者的空白。建议未来的研究途径,以帮助填补当前建立信任和链上身份的空白。

更新时间: 2024-07-25 05:06:44

领域: cs.CR,cs.CY,cs.DC

下载: http://arxiv.org/abs/2407.17276v2

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

This paper investigates visual analogical reasoning in large multimodal models (LMMs) compared to human adults and children. A "visual analogy" is an abstract rule inferred from one image and applied to another. While benchmarks exist for testing visual reasoning in LMMs, they require advanced skills and omit basic visual analogies that even young children can make. Inspired by developmental psychology, we propose a new benchmark of 1,400 visual transformations of everyday objects to test LMMs on visual analogical reasoning and compare them to children and adults. We structure the evaluation into three stages: identifying what changed (e.g., color, number, etc.), how it changed (e.g., added one object), and applying the rule to new scenarios. Our findings show that while models like GPT-4V, LLaVA-1.5, and MANTIS identify the "what" effectively, they struggle with quantifying the "how" and extrapolating this rule to new objects. In contrast, children and adults exhibit much stronger analogical reasoning at all three stages. Additionally, the strongest tested model, GPT-4V, performs better in tasks involving simple visual attributes like color and size, correlating with quicker human adult response times. Conversely, more complex tasks such as number, rotation, and reflection, which necessitate extensive cognitive processing and understanding of the 3D physical world, present more significant challenges. Altogether, these findings highlight the limitations of training models on data that primarily consists of 2D images and text.

Updated: 2024-07-25 05:02:39

标题: KiVA: 基于儿童灵感的视觉类比用于测试大型多模态模型

摘要: 本文研究了大型多模态模型(LMMs)中的视觉类比推理,与人类成人和儿童进行了比较。"视觉类比"是从一幅图像中推断出的抽象规则,并应用于另一幅图像。虽然存在用于测试LMM中视觉推理的基准,但它们需要高级技能,并且忽略了即使年幼儿童也能做出的基本视觉类比。受发展心理学的启发,我们提出了一个新的基准,包括对日常物体进行的1,400种视觉转换,以测试LMM在视觉类比推理上的表现,并将它们与儿童和成人进行比较。我们将评估分为三个阶段:识别发生了什么变化(例如颜色、数量等),它是如何变化的(例如添加了一个对象),以及将规则应用于新情景。我们的研究结果显示,虽然像GPT-4V、LLaVA-1.5和MANTIS这样的模型有效地识别了"什么"效应,但它们在量化"如何"以及将这一规则推广到新对象方面遇到了困难。相比之下,儿童和成人在所有三个阶段表现出更强的类比推理能力。此外,经过测试的最强模型GPT-4V在涉及颜色和大小等简单视觉属性的任务中表现更好,与人类成人更快的反应时间相关。相反,更复杂的任务,如数量、旋转和反射,需要进行广泛的认知处理和对3D物理世界的理解,这带来了更大的挑战。总的来说,这些发现突显了将模型训练在主要由2D图像和文本组成的数据上的局限性。

更新时间: 2024-07-25 05:02:39

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.17773v1

Comment on "An Efficient Privacy-Preserving Ranked Multi-Keyword Retrieval for Multiple Data Owners in Outsourced Cloud"

Protecting the privacy of keywords in the field of search over outsourced cloud data is a challenging task. In IEEE Transactions on Services Computing (Vol. 17, No. 2, March/April 2024), Li et al. proposed PRMKR, an efficient privacy-preserving ranked multi-keyword retrieval scheme, which was claimed to resist keyword guessing attacks. However, we show that the scheme fails to resist keyword guessing attacks and fails to preserve index privacy and trapdoor privacy. Further, we propose a solution to address these issues by correcting the errors in the scheme's key equations.

Updated: 2024-07-25 05:01:07

标题: 评论:“一种高效的隐私保护排名多关键字检索方法,适用于外包云中的多个数据所有者”

摘要: 在外包云数据搜索领域保护关键字的隐私是一项具有挑战性的任务。在IEEE服务计算交易(第17卷第2期,2024年3月/4月),Li等人提出了PRMKR:高效的隐私保护排名多关键字检索方案,据称能抵抗关键字猜测攻击。然而,我们发现该方案无法抵抗关键字猜测攻击,也无法保护索引隐私和陷门隐私。此外,我们提出了一个解决方案,通过纠正该方案中重要方程式的错误来解决上述问题。

更新时间: 2024-07-25 05:01:07

领域: cs.CR

下载: http://arxiv.org/abs/2408.05218v1

Online Learning for Autonomous Management of Intent-based 6G Networks

The growing complexity of networks and the variety of future scenarios with diverse and often stringent performance requirements call for a higher level of automation. Intent-based management emerges as a solution to attain a high level of automation, enabling human operators to communicate with the network solely through high-level intents. The intents consist of targets in the form of expectations from a service (e.g., a latency expectation), and the required network configurations should be made accordingly based on these expectations. It is almost inevitable that a network action taken to fulfill one intent can negatively impact the performance of another intent, resulting in a conflict. In this paper, we aim to address the conflict issue and the autonomous management of intent-based networking, and we propose an online learning method based on a hierarchical multi-armed bandit approach for effective management. Thanks to this hierarchical structure, it performs efficient exploration and exploitation of network configurations with respect to dynamic network conditions. We show that our algorithm is an effective approach regarding resource allocation and the satisfaction of intent expectations.

Updated: 2024-07-25 04:48:56

标题: 用于基于意图的6G网络自主管理的在线学习

摘要: 网络的不断复杂化和未来各种场景的多样性以及通常严格的性能要求需要更高级别的自动化。基于意图的管理被提出作为实现高度自动化的解决方案,使人类操作者仅通过高级意图与网络进行交流。这些意图包括来自服务的期望(即延迟期望)的目标,根据这些期望,应相应地进行网络配置。几乎不可避免的是,当采取网络操作以实现一个意图时,可能会对另一个意图的性能造成负面影响,从而导致冲突。本文旨在解决冲突问题和基于意图的网络自主管理,并提出了一种基于分层多臂老虎机方法的在线学习方法,以实现有效管理。由于这种分层结构,它能够根据动态网络条件高效地探索和利用网络配置。我们展示了我们的算法在资源分配和满足意图期望方面是一种有效的方法。

更新时间: 2024-07-25 04:48:56

领域: cs.NI,cs.LG

下载: http://arxiv.org/abs/2407.17767v1
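
A minimal sketch of the hierarchical multi-armed bandit idea: a top-level UCB1 bandit chooses among groups of network configurations, an inner UCB1 bandit chooses a configuration within the selected group, and both levels are updated from the same observed reward. The reward function and group sizes are hypothetical stand-ins, not the paper's setup.

    import math, random

    class UCB1:
        def __init__(self, n_arms):
            self.n = [0] * n_arms
            self.mean = [0.0] * n_arms
            self.t = 0

        def select(self):
            self.t += 1
            for a, cnt in enumerate(self.n):      # play each arm once first
                if cnt == 0:
                    return a
            return max(range(len(self.n)),
                       key=lambda a: self.mean[a] +
                       math.sqrt(2 * math.log(self.t) / self.n[a]))

        def update(self, arm, reward):
            self.n[arm] += 1
            self.mean[arm] += (reward - self.mean[arm]) / self.n[arm]

    # Hierarchical use: a top-level bandit over configuration groups, and one
    # inner bandit per group over the concrete configurations it contains.
    groups = [UCB1(3), UCB1(3)]
    top = UCB1(len(groups))

    def intent_satisfaction(g, c):                # hypothetical environment
        return random.gauss(0.5 + 0.1 * g - 0.05 * c, 0.1)

    for _ in range(500):
        g = top.select()
        c = groups[g].select()
        r = intent_satisfaction(g, c)
        groups[g].update(c, r)
        top.update(g, r)
    print("best group:", max(range(len(groups)), key=lambda g: top.mean[g]))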

Goodness-of-Fit and Clustering of Spherical Data: the QuadratiK package in R and Python

We introduce the QuadratiK package, which incorporates innovative data analysis methodologies. The presented software, implemented in both R and Python, offers a comprehensive set of goodness-of-fit tests and clustering techniques using kernel-based quadratic distances, thereby bridging the gap between the statistical and machine learning literatures. Our software implements one-, two-, and k-sample tests for goodness of fit, providing an efficient and mathematically sound way to assess the fit of probability distributions. Expanded capabilities of our software include support for tests of uniformity on the d-dimensional sphere based on Poisson kernel densities. Particularly noteworthy is the incorporation of a unique clustering algorithm specifically tailored for spherical data that leverages a mixture of Poisson kernel-based densities on the sphere. Alongside this, our software includes additional graphical functions, aiding users in validating, visualizing, and representing clustering results. This enhances the interpretability and usability of the analysis. In summary, our R and Python packages serve as a powerful suite of tools, offering researchers and practitioners the means to delve deeper into their data and draw robust, potentially impactful inferences across a wide array of disciplines.

Updated: 2024-07-25 04:43:32

标题: 拟合优度和球形数据的聚类:R和Python中的QuadratiK软件包

摘要: 我们介绍了一个包含创新数据分析方法的QuadratiK软件包。该软件包在R和Python中实现,提供了一套全面的拟合度测试和聚类技术,使用基于核的二次距离,从而弥合了统计学和机器学习文献之间的差距。我们的软件实现了一样本、两样本和k样本的拟合度测试,提供了一种高效且数学上合理的方法来评估概率分布的拟合度。我们的软件扩展了更多功能,包括基于Poisson核密度在d维球上支持均匀性测试。特别值得注意的是,我们特别定制了一个针对球形数据的独特聚类算法,利用了球面上基于Poisson核的混合密度。此外,我们的软件还包括额外的图形函数,帮助用户验证、可视化和表示聚类结果。这提高了分析的可解释性和可用性。总之,我们的R和Python软件包作为一个强大的工具套件,为研究人员和从业者提供了深入挖掘数据、进行坚固推断以及在各种学科领域进行潜在有影响力的分析和推断的手段。

更新时间: 2024-07-25 04:43:32

领域: stat.CO,cs.LG,cs.MS,stat.AP,stat.ML

下载: http://arxiv.org/abs/2402.02290v2
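
To make the Poisson-kernel machinery concrete, here is a from-scratch sketch of a kernel-based uniformity statistic on the sphere. This is not the QuadratiK API; the kernel formula is the standard Poisson kernel on S^{d-1}, and the threshold calibration the package provides is omitted.

    import numpy as np

    def poisson_kernel(X, rho):
        """Poisson kernel matrix on the unit sphere S^{d-1}:
        K(u, v) = (1 - rho^2) / (1 - 2*rho*<u,v> + rho^2)^(d/2)."""
        d = X.shape[1]
        G = X @ X.T
        return (1 - rho**2) / (1 - 2 * rho * G + rho**2) ** (d / 2)

    def uniformity_stat(X, rho=0.5):
        """V-statistic with the centred kernel K - 1; the Poisson kernel
        integrates to 1 under the uniform law, so large values indicate
        departure from uniformity."""
        n = len(X)
        return (poisson_kernel(X, rho) - 1.0).sum() / n

    def sample_sphere(n, d, rng):
        Z = rng.normal(size=(n, d))
        return Z / np.linalg.norm(Z, axis=1, keepdims=True)

    rng = np.random.default_rng(0)
    X_unif = sample_sphere(200, 3, rng)
    X_clustered = sample_sphere(200, 3, rng) * 0.2 + np.array([0, 0, 1.0])
    X_clustered /= np.linalg.norm(X_clustered, axis=1, keepdims=True)
    print(uniformity_stat(X_unif), uniformity_stat(X_clustered))  # clustered scores far higher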

Utilizing Blockchain and Smart Contracts for Enhanced Fraud Prevention and Minimization in Health Insurance through Multi-Signature Claim Processing

Healthcare insurance provides financial support for patients to access medical services while ensuring timely and guaranteed payment for providers. Insurance fraud poses a significant challenge to insurance companies and policyholders, leading to increased costs and compromised healthcare treatment and service delivery. Most frauds, such as phantom billing, upcoding, and unbundling, happen due to the lack of required entity participation. Also, claim activities are not transparent and accountable. Fraud can be prevented and minimized by involving every entity and making actions transparent and accountable. This paper proposes a blockchain-powered, smart contract-based insurance claim processing mechanism to prevent and minimize fraud in response to this prevailing issue. All entities (patients, providers, and insurance companies) actively participate in the claim submission, approval, and acknowledgment process through a multi-signature technique. Also, every activity is captured and recorded in the blockchain using smart contracts to make every action transparent and accountable, so that no entity can deny its actions and responsibilities. Blockchains' immutable storage property and strong integrity guarantee that recorded activities are not modified. As healthcare systems and insurance companies continue to deal with fraud challenges, this proposed approach holds the potential to significantly reduce fraudulent activities, ultimately benefiting both insurers and policyholders.

Updated: 2024-07-25 04:42:31

标题: 利用区块链和智能合约通过多重签名理赔处理增强医疗保险欺诈的预防与减少

摘要: 医疗保险为患者提供财务支持,以便获得医疗服务,同时确保及时和保证付款给提供者。保险欺诈对保险公司和保单持有人构成重大挑战,导致成本增加和医疗治疗和服务交付受损。大多数欺诈行为,如虚假账单、提高编码和拆分,是由于缺乏所需实体参与而发生的。此外,索赔活动缺乏透明度和问责制。通过涉及每个实体并使行动透明和问责,可以预防和减少欺诈行为。本文提出了一种基于区块链智能合约的保险理赔处理机制,以应对这一普遍问题中的欺诈行为。所有实体患者、提供者和保险公司均通过多签名技术积极参与索赔提交、批准和确认过程。此外,通过智能合约将每个活动记录在区块链中,使每个行动透明和负责,以确保没有实体可以否认其行动和责任。区块链的不可变存储属性和强大完整性保证记录的活动不会被修改。随着医疗系统和保险公司继续应对欺诈挑战,这种提出的方法有望显著减少欺诈活动,最终使保险公司和保单持有人受益。

更新时间: 2024-07-25 04:42:31

领域: cs.CR

下载: http://arxiv.org/abs/2407.17765v1

Mpox Detection Advanced: Rapid Epidemic Response Through Synthetic Data

Rapid development of disease detection models using computer vision is crucial in responding to medical emergencies, such as epidemics or bioterrorism events. Traditional data collection methods are often too slow in these scenarios, requiring innovative approaches for quick, reliable model generation from minimal data. Our study introduces a novel approach by constructing a comprehensive computer vision model to detect Mpox lesions using only synthetic data. Initially, diffusion models generated a diverse set of synthetic images representing Mpox lesions on various body parts (face, back, chest, leg, neck, arm) across different skin tones as defined by the Fitzpatrick scale (fair, brown, dark skin). Subsequently, we trained and tested a vision model with this synthetic dataset to evaluate the diffusion models' efficacy in producing high-quality training data and its impact on the vision model's medical image recognition performance. The results were promising: the vision model achieved a 97% accuracy rate, with 96% precision and recall for Mpox cases, and similarly high metrics for normal and other skin disorder cases, demonstrating its ability to correctly identify true positives and minimize false positives. The model achieved an F1-score of 96% for Mpox cases and 98% for normal and other skin disorders, reflecting a balanced precision-recall relationship, thus ensuring reliability and robustness in its predictions. Our proposed SynthVision methodology indicates the potential to develop accurate computer vision models with minimal data input for future medical emergencies.

Updated: 2024-07-25 04:33:19

标题: Mpox检测先进技术:通过合成数据快速应对流行病

摘要: 计算机视觉快速开发疾病检测模型对于应对医疗紧急情况,如流行病或生物恐怖主义事件,至关重要。在这些情况下,传统的数据收集方法往往速度太慢,需要创新方法从最少的数据中快速、可靠地生成模型。我们的研究通过构建一个全面的计算机视觉模型,仅使用合成数据来检测Mpox病变,引入了一种新颖的方法。最初,这些模型生成了一组多样化的合成图像,表示Mpox病变出现在不同皮肤部位(脸部、背部、胸部、腿部、颈部、手臂)上,跨越不同的皮肤色调,如Fitzpatrick等级(白皮肤、棕皮肤、深色皮肤)。随后,我们用这个合成数据集训练和测试了一个视觉模型,评估了扩散模型在生成高质量训练数据方面的有效性,以及对视觉模型的医学图像识别性能的影响。结果是令人鼓舞的;视觉模型实现了97%的准确率,对于Mpox病例的精确度和召回率均为96%,对于正常和其他皮肤疾病病例的度量同样较高,表明其能够正确识别真正阳性并最小化假阳性。该模型在Mpox病例和正常及其他皮肤疾病方面的F1-Score分别为96%和98%,反映了平衡的精确度-召回关系,从而确保其预测的可靠性和稳健性。我们提出的SynthVision方法表明了在未来医疗紧急情况下,利用最少的数据输入开发准确的计算机视觉模型的潜力。

更新时间: 2024-07-25 04:33:19

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.17762v1

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers. The questions were carefully curated to discriminate between models with different levels of general language comprehension. The English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs). We present extensive results and find that despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages. We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages. Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.

Updated: 2024-07-25 04:30:15

标题: 贝勒贝勒基准测试:122种语言变体的并行阅读理解数据集

摘要: 我们介绍了Belebele,一个涵盖122种语言变体的多项选择机器阅读理解(MRC)数据集。显著扩展了自然语言理解(NLU)基准的语言覆盖范围,该数据集使得可以评估高、中、低资源语言中的文本模型。每个问题基于Flores-200数据集中的一个短篇章,有四个多项选择答案。问题经过精心筛选,可以区分具有不同一般语言理解水平的模型。仅英语数据集已经足够困难,可以挑战最先进的语言模型。由于是完全并行的,该数据集可以在所有语言之间直接比较模型性能。我们使用该数据集评估多语言蒙面语言模型(MLMs)和大型语言模型(LLMs)的能力。我们提出了详尽的结果,并发现尽管英语为中心的LLM存在显著的跨语言转移,但在平衡的多语言数据上预训练的规模较小的MLM仍然理解更多语言。我们还观察到更大的词汇量和有意识的词汇构建与在低资源语言上表现更好相关。总的来说,Belebele为评估和分析自然语言处理系统的多语言能力开辟了新途径。

更新时间: 2024-07-25 04:30:15

领域: cs.CL,cs.AI,cs.LG,I.2.7

下载: http://arxiv.org/abs/2308.16884v2

Towards the Blockchain Massive Adoption with Permissionless Storage

Blockchain technology emerged with the advent of Bitcoin and developed rapidly over the past few decades, becoming widely accepted and known by the public. Nevertheless, massive adoption of blockchain technology has yet to come. Rather than scalability, it is the expensive usage cost that challenges blockchain applications, and this high cost is deeply connected with the blockchain consensus and security mechanism. A permissionless blockchain must maintain its high cost to stay secure against the 51% attack, and chain users indirectly cover that cost, as coins are allocated for blockchain usage fees. This conflict prevents the massive adoption of blockchain. Thus, blockchain must be improved to solve these problems: 1. the cost of blockchain usage should be low enough; 2. the blockchain should remain decentralized; 3. the scalability of blockchain must meet demand. In this thesis, new approaches are applied to solve the issues above. The key contribution is the discovery of useful PoW, which extends Nakamoto PoW by giving the same Nakamoto Consensus computation a second use, encoding file data to prove honest data preservation. Based on this theory, a permissionless storage network is proposed as the new security engine for the blockchain. It shifts the high blockchain security cost to storage users with real demand who are willing to pay for the storage resource, while chain users benefit from low transaction fees. Meanwhile, we also provide a scalability solution that shards the blockchain, enabling high TPS while preserving decentralization. Together, the solutions in this thesis address all the prerequisites of massive adoption.

Updated: 2024-07-25 04:28:52

标题: 朝着无许可存储的区块链大规模应用迈进

摘要: 区块链技术随着比特币的出现而出现,并在过去几十年迅速发展,被广泛接受和为公众所知。然而,在过去的几十年里,区块链技术的大规模应用尚未出现。与可扩展性问题不同,区块链应用受到昂贵的使用成本挑战。然而,区块链使用的高成本与区块链共识和安全机制息息相关。无许可的区块链必须保持其高成本以防止51%攻击。链用户间接承担成本,因为硬币被指定用于区块链使用费。这种冲突阻碍了区块链的大规模应用。因此,必须改进区块链以解决这些问题:1. 区块链使用成本应该足够低。2. 区块链应该保持去中心化。3. 区块链的可扩展性必须满足需求。 在我的论文中,新方法被应用来解决上述问题。关键贡献是发现了有用的PoW。它通过在同一Nakamoto共识计算期间使用文件数据编码来扩展Nakamoto PoW,以证明数据的诚实保存。基于这一理论,提出了一个无许可存储网络作为区块链的新安全引擎。它将高昂的区块链安全成本与真正需要并愿意为存储资源付费的存储用户联系起来。另一方面,链用户可以从低交易费中受益。同时,我们还提供了一种分片区块链的扩展性解决方案。它实现了高TPS并保持去中心化。本论文中的解决方案为区块链的大规模应用提供了答案。

更新时间: 2024-07-25 04:28:52

领域: cs.CR

下载: http://arxiv.org/abs/2407.17761v1
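
A toy sketch of the "useful PoW" coupling described above: the nonce search over a hash pre-image also mixes in an encoding of a stored file chunk, so a miner must actually hold the data in order to mine. The encoding, difficulty, and chunk size are illustrative assumptions; the thesis's actual construction is more involved.

    import hashlib, os

    DIFFICULTY = 2**240  # toy target; real networks tune this dynamically

    def encode_chunk(chunk: bytes, nonce: int) -> bytes:
        """Stand-in 'useful' encoding: mixing the stored file chunk into the
        hash pre-image ties each nonce trial to possession of the data."""
        key = hashlib.sha256(nonce.to_bytes(8, "big")).digest()
        return bytes(b ^ key[i % 32] for i, b in enumerate(chunk))

    def mine(prev_hash: bytes, chunk: bytes, max_tries=1_000_000):
        for nonce in range(max_tries):
            pre = prev_hash + nonce.to_bytes(8, "big") + encode_chunk(chunk, nonce)
            digest = hashlib.sha256(pre).digest()
            if int.from_bytes(digest, "big") < DIFFICULTY:
                return nonce, digest
        raise RuntimeError("no nonce found")

    chunk = os.urandom(256)                  # the file fragment being preserved
    nonce, digest = mine(hashlib.sha256(b"genesis").digest(), chunk)
    print(nonce, digest.hex()[:16])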

SES: Bridging the Gap Between Explainability and Prediction of Graph Neural Networks

Despite Graph Neural Networks' (GNNs) proficiency in analyzing graph data, achieving high-accuracy and interpretable predictions remains challenging. Existing GNN interpreters typically provide post-hoc explanations disjointed from GNNs' predictions, resulting in misrepresentations. Self-explainable GNNs offer built-in explanations during the training process. However, they cannot exploit the explanatory outcomes to augment prediction performance, fail to provide high-quality explanations of node features, and require additional, costly processes to generate explainable subgraphs. To address the aforementioned limitations, we propose a self-explained and self-supervised graph neural network (SES) to bridge the gap between explainability and prediction. SES comprises two processes: explainable training and enhanced predictive learning. During explainable training, SES employs a global mask generator co-trained with a graph encoder and directly produces crucial structure and feature masks, reducing time consumption and providing node-feature and subgraph explanations. In the enhanced predictive learning phase, mask-based positive-negative pairs are constructed from the explanations to compute a triplet loss and enhance the node representations via contrastive learning.

Updated: 2024-07-25 04:20:12

标题: SES: 联结图神经网络的可解释性与预测之间的鸿沟

摘要: 尽管图神经网络(GNN)在分析图数据方面表现出高效能,但实现高准确度和可解释性预测仍然具有挑战性。现有的GNN解释器通常提供与GNN预测脱节的事后解释,导致误导。自解释的GNN在训练过程中提供内置解释。然而,它们无法利用解释结果来增强预测性能,并且无法提供高质量的节点特征解释,需要额外的过程来生成可解释的子图,这是昂贵的。为了解决上述限制,我们提出了自解释和自监督的图神经网络(SES)来弥合可解释性和预测之间的差距。SES包括两个过程:可解释训练和增强预测学习。在可解释训练过程中,SES利用与图编码器共同训练的全局掩码生成器直接产生关键结构和特征掩码,减少时间消耗并提供节点特征和子图解释。在增强预测学习阶段,利用解释构建基于掩码的正负对,计算三元损失并通过对比学习增强节点表示。

更新时间: 2024-07-25 04:20:12

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.11358v2
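
The contrastive stage can be illustrated with a minimal triplet loss over mask-based pairs: for each node, the embedding of its explanation-masked view acts as the positive and a shuffled node's masked view as the negative. Exactly how SES constructs its pairs is richer than this; the sketch shows the triplet mechanics only.

    import torch
    import torch.nn.functional as F

    def masked_triplet_loss(z, z_masked, margin=1.0):
        """z, z_masked: (n, d) node embeddings from the original graph and
        from the explanation-masked graph. A random permutation supplies
        negatives; this pairing rule is an assumption for illustration."""
        perm = torch.randperm(z.size(0))
        pos = F.pairwise_distance(z, z_masked)         # anchor vs. own masked view
        neg = F.pairwise_distance(z, z_masked[perm])   # anchor vs. other's masked view
        return F.relu(pos - neg + margin).mean()

    z = torch.randn(32, 16)
    z_masked = z + 0.05 * torch.randn_like(z)          # masked view stays close
    print(masked_triplet_loss(z, z_masked))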

Vision language models are blind

While large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, are powering various image-text applications and scoring high on many vision-understanding benchmarks, we find that they are surprisingly still struggling with low-level vision tasks that are easy to humans. Specifically, on BlindTest, our suite of 7 very simple tasks such as identifying (a) whether two circles overlap; (b) whether two lines intersect; (c) which letter is being circled in a word; and (d) counting circles in an Olympic-like logo, four state-of-the-art VLMs are only 58.57% accurate on average. Claude 3.5 Sonnet performs the best at 74.01% accuracy, but this is still far from the human expected accuracy of 100%. Across different image resolutions and line widths, VLMs consistently struggle with tasks that require precise spatial information and recognizing geometric primitives that overlap or are close together. Code and data are available at: https://vlmsareblind.github.io

Updated: 2024-07-25 04:19:58

标题: 视觉语言模型是盲目的

摘要: 尽管具有视觉能力的大型语言模型(VLMs),例如GPT-4o和Gemini 1.5 Pro,正在推动各种图像文本应用程序并在许多视觉理解基准测试中得分很高,但我们发现它们仍然令人惊讶地在对人类来说容易的低级视觉任务上遇到困难。具体而言,在BlindTest,即我们由7个非常简单的任务组成的测试套件上,例如识别(a)两个圆是否重叠;(b)两条线是否相交;(c)单词中哪个字母被圈出;以及(d)在奥林匹克式标志中计数圆圈,四种最先进的VLMs平均准确率仅为58.57%。Claude 3.5 Sonnet的表现最佳,准确率为74.01%,但仍远低于人类期望的100%准确率。在不同的图像分辨率和线宽下,VLMs在需要精确空间信息和识别重叠或靠近的几何基元的任务上始终遇到困难。代码和数据可在以下网址获取:https://vlmsareblind.github.io

更新时间: 2024-07-25 04:19:58

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.06581v4
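
Tasks of this kind are easy to generate programmatically with exact ground truth, which is what makes the benchmark clean. The sketch below builds one "do two circles overlap?" item with Pillow; the rendering choices (colours, line widths, sizes) are arbitrary, not the benchmark's.

    import math, random
    from PIL import Image, ImageDraw   # pip install pillow

    def make_two_circle_item(size=256, rng=random.Random(0)):
        """Generate one BlindTest-style item with an exact geometric label."""
        r1, r2 = rng.randint(20, 50), rng.randint(20, 50)
        c1 = (rng.randint(60, size - 60), rng.randint(60, size - 60))
        c2 = (rng.randint(60, size - 60), rng.randint(60, size - 60))
        overlap = math.dist(c1, c2) < r1 + r2          # ground truth
        img = Image.new("RGB", (size, size), "white")
        draw = ImageDraw.Draw(img)
        for (cx, cy), r in ((c1, r1), (c2, r2)):
            draw.ellipse([cx - r, cy - r, cx + r, cy + r],
                         outline="black", width=3)
        return img, overlap

    img, label = make_two_circle_item()
    img.save("two_circles.png")
    print("ground truth overlap:", label)

Because the label is computed from the geometry rather than annotated, accuracy on such items measures exactly the low-level spatial ability the paper probes.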

TwIPS: A Large Language Model Powered Texting Application to Simplify Conversational Nuances for Autistic Users

Autistic individuals often experience difficulties in conveying and interpreting emotional tone and non-literal nuances. Many also mask their communication style to avoid being misconstrued by others, spending considerable time and mental effort in the process. To address these challenges in text-based communication, we present TwIPS, a prototype texting application powered by a large language model (LLM), which can assist users with: a) deciphering the tone and meaning of incoming messages, b) ensuring the emotional tone of their message is in line with their intent, and c) coming up with alternate phrasing for messages that could be misconstrued and received negatively by others. We leverage an AI-based simulation and a conversational script to evaluate TwIPS with 8 autistic participants in an in-lab setting. Our findings show TwIPS enables a convenient way for participants to seek clarifications, provides a better alternative to tone indicators, and facilitates constructive reflection on writing technique and style. We also examine how autistic users utilize language for self-expression and interpretation in instant messaging, and gather feedback for enhancing our prototype. We conclude with a discussion around balancing user autonomy with AI mediation, establishing appropriate trust levels in AI systems, and the customization needs of autistic users in the context of AI-assisted communication.

Updated: 2024-07-25 04:15:54

标题: TwIPS:一种大型语言模型驱动的文本应用,为自闭症用户简化交流细微差别

摘要: 自闭症患者通常在传达和解释情绪色调和非字面意义细微差别方面遇到困难。许多人还掩饰他们的沟通风格,以避免被他人误解,花费大量时间和精力。为了解决这些挑战,在基于文本的沟通中,我们提出了TwIPS,这是一个由大型语言模型(LLM)支持的原型短信应用程序,可以帮助用户:a)解读传入消息的语气和含义,b)确保他们的消息的情绪色调符合他们的意图,c)为可能被他人误解并受到负面反应的消息提供替代措辞。我们利用基于AI的模拟和对话脚本,在实验室环境中与8名自闭症参与者一起评估TwIPS。我们的研究结果显示TwIPS为参与者提供了一种便捷的方式来寻求澄清,提供了比语气指示器更好的选择,并促进了对写作技巧和风格的建设性反思。我们还研究了自闭症用户如何利用语言进行自我表达和即时消息的解释,并收集反馈以改进我们的原型。最后,我们讨论了在用户自主权和AI调解之间取得平衡,建立AI系统的适当信任级别,以及自闭症用户在AI辅助沟通环境中的定制需求。

更新时间: 2024-07-25 04:15:54

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2407.17760v1

Efficient Combinatorial Optimization via Heat Diffusion

Combinatorial optimization problems are widespread but inherently challenging due to their discrete nature. The primary limitation of existing methods is that they can only access a small fraction of the solution space at each iteration, resulting in limited efficiency in searching for the global optimum. To overcome this challenge, diverging from conventional efforts to expand the solver's search scope, we focus on enabling information to actively propagate to the solver through heat diffusion. By transforming the target function while preserving its optima, heat diffusion facilitates information flow from distant regions to the solver, providing more efficient navigation. Utilizing heat diffusion, we propose a framework for solving general combinatorial optimization problems. The proposed methodology demonstrates superior performance across a range of the most challenging and widely encountered combinatorial optimizations. Echoing recent advancements in harnessing thermodynamics for generative artificial intelligence, our study further reveals heat diffusion's significant potential in advancing combinatorial optimization.

Updated: 2024-07-25 04:12:17

标题: 通过热扩散实现高效的组合优化

摘要: 组合优化问题是普遍存在的,但由于其离散特性而具有挑战性。现有方法的主要局限性是它们在每次迭代中只能访问解空间的一小部分,导致搜索全局最优的效率有限。为了克服这一挑战,我们摒弃了扩展求解器搜索范围的常规努力,而是专注于通过热扩散使信息能够积极传播到求解器。通过转换目标函数同时保留其最优解,热扩散促进了信息从远处地区流向求解器,提供了更有效的导航。利用热扩散,我们提出了一个框架来解决一般的组合优化问题。所提出的方法在一系列最具挑战性和广泛遇到的组合优化问题中表现出优越性能。与最近在利用热力学促进生成人工智能方面取得的进展相呼应,我们的研究进一步揭示了在推进组合优化方面的重要潜力。

更新时间: 2024-07-25 04:12:17

领域: stat.ML,cs.LG,math.CO,physics.app-ph

下载: http://arxiv.org/abs/2403.08757v3
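
The core idea, letting distant information reach the solver by smoothing the objective the way a heat equation does, can be conveyed with Gaussian smoothing and its Monte Carlo gradient identity: for f_t(x) = E[f(x + sqrt(t) z)] with z ~ N(0, I), we have grad f_t(x) = E[f(x + sqrt(t) z) z] / sqrt(t). The sketch below anneals the diffusion time on a random QUBO-style objective; it illustrates the principle only and is not the paper's optima-preserving transformation.

    import numpy as np

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(12, 12)); Q = (Q + Q.T) / 2   # random QUBO-style instance

    def f(x):
        s = np.sign(x)              # round the continuous iterate to {-1, +1}
        return s @ Q @ s

    def smoothed_grad(x, t, n=256):
        """Monte Carlo gradient of the heat-smoothed objective f_t."""
        z = rng.normal(size=(n, len(x)))
        vals = np.array([f(x + np.sqrt(t) * zi) for zi in z])
        return (vals[:, None] * z).mean(axis=0) / np.sqrt(t)

    x = rng.normal(size=12)
    for t in np.geomspace(4.0, 0.05, 60):   # cool the diffusion time
        x -= 0.05 * smoothed_grad(x, t)
    print("objective at rounded solution:", f(x))

Early on, the large diffusion time lets far-away regions of the landscape influence each step; as t shrinks, the smoothed objective approaches the original one.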

DualFed: Enjoying both Generalization and Personalization in Federated Learning via Hierachical Representations

In personalized federated learning (PFL), it is widely recognized that achieving both high model generalization and effective personalization poses a significant challenge due to their conflicting nature. As a result, existing PFL methods can only manage a trade-off between these two objectives. This raises an interesting question: is it feasible to develop a model capable of achieving both objectives simultaneously? Our paper presents an affirmative answer, and the key lies in the observation that deep models inherently exhibit hierarchical architectures, which produce representations with various levels of generalization and personalization at different stages. A straightforward approach stemming from this observation is to select multiple representations from these layers and combine them to concurrently achieve generalization and personalization. However, the number of candidate representations is commonly huge, which makes this method infeasible due to high computational costs. To address this problem, we propose DualFed, a new method that can directly yield dual representations corresponding to generalization and personalization respectively, thereby simplifying the optimization task. Specifically, DualFed inserts a personalized projection network between the encoder and classifier. The pre-projection representations are able to capture generalized information shareable across clients, and the post-projection representations are effective at capturing task-specific information on local clients. This design minimizes the mutual interference between generalization and personalization, thereby achieving a win-win situation. Extensive experiments show that DualFed can outperform other FL methods. Code is available at https://github.com/GuogangZhu/DualFed.

Updated: 2024-07-25 04:09:12

标题: DualFed:通过分层表示在联邦学习中同时实现泛化和个性化

摘要: 在个性化联邦学习(PFL)中,人们普遍认为实现高模型泛化和有效个性化之间存在冲突的挑战。因此,现有的PFL方法只能在这两个目标之间进行权衡。这引发了一个有趣的问题:是否有可能开发一个能够同时实现这两个目标的模型?我们的论文给出了一个肯定答复,关键在于观察到深度模型固有地具有分层结构,这些结构在不同阶段产生具有不同级别泛化和个性化的表示。根据这一观察所产生的一种直接方法是从这些层中选择多个表示,并将它们组合以同时实现泛化和个性化。然而,候选表示的数量通常很大,这使得这种方法由于高计算成本而不可行。为了解决这个问题,我们提出了DualFed,一种新方法,可以直接产生相应于泛化和个性化的双重表示,从而简化了优化任务。具体来说,DualFed在编码器和分类器之间插入了一个个性化投影网络。预投影表示能够捕捉可在客户端之间共享的泛化信息,而后投影表示则能够有效地捕捉本地客户端上的任务特定信息。这种设计最小化了泛化和个性化之间的相互干扰,从而实现了双赢局面。大量实验证明,DualFed可以胜过其他FL方法。代码可在https://github.com/GuogangZhu/DualFed找到。

更新时间: 2024-07-25 04:09:12

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2407.17754v1
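
A minimal PyTorch sketch of the dual-representation design: pre-projection features carry the generalized representation and post-projection features the personalized one. Layer sizes are illustrative assumptions, not taken from the paper.

    import torch
    import torch.nn as nn

    class DualFedNet(nn.Module):
        def __init__(self, in_dim=32, rep_dim=64, n_classes=10):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                         nn.Linear(128, rep_dim))        # shared
            self.projection = nn.Sequential(nn.Linear(rep_dim, rep_dim),
                                            nn.ReLU(),
                                            nn.Linear(rep_dim, rep_dim))  # personal
            self.classifier = nn.Linear(rep_dim, n_classes)

        def forward(self, x):
            g = self.encoder(x)     # generalized (pre-projection) representation
            p = self.projection(g)  # personalized (post-projection) representation
            return self.classifier(p), g, p

    logits, g, p = DualFedNet()(torch.randn(8, 32))
    print(logits.shape, g.shape, p.shape)
    # In federated training, a server would aggregate encoder weights across
    # clients while each client keeps its projection network local.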

A Survey on Hypergraph Neural Networks: An In-Depth and Step-By-Step Guide

Higher-order interactions (HOIs) are ubiquitous in real-world complex systems and applications. Investigation of deep learning for HOIs, thus, has become a valuable agenda for the data mining and machine learning communities. As networks of HOIs are expressed mathematically as hypergraphs, hypergraph neural networks (HNNs) have emerged as a powerful tool for representation learning on hypergraphs. Given the emerging trend, we present the first survey dedicated to HNNs, with an in-depth and step-by-step guide. Broadly, the present survey overviews HNN architectures, training strategies, and applications. First, we break existing HNNs down into four design components: (i) input features, (ii) input structures, (iii) message-passing schemes, and (iv) training strategies. Second, we examine how HNNs address and learn HOIs with each of their components. Third, we overview the recent applications of HNNs in recommendation, bioinformatics and medical science, time series analysis, and computer vision. Lastly, we conclude with a discussion on limitations and future directions.

Updated: 2024-07-25 03:35:48

标题: 《超图神经网络调查:深入和逐步指南》

摘要: 高阶交互作用(HOIs)在现实世界复杂系统和应用中是普遍存在的。因此,对于数据挖掘和机器学习社区来说,深度学习用于HOIs的研究已经成为一个有价值的议程。由于HOIs的网络在数学上被表达为超图,超图神经网络(HNNs)已经成为在超图上表示学习的强大工具。鉴于新兴趋势,我们提出了第一份专门致力于HNNs的调查报告,提供了深入和逐步的指南。总体而言,本调查概述了HNN体系结构、训练策略和应用。首先,我们将现有的HNNs分解为四个设计组件:(i)输入特征,(ii)输入结构,(iii)消息传递方案,以及(iv)训练策略。其次,我们检查HNNs如何通过每个组件来解决和学习HOIs。第三,我们回顾了HNNs在推荐系统、生物信息学和医学科学、时间序列分析以及计算机视觉中的最新应用。最后,我们以对限制和未来方向的讨论结束。

更新时间: 2024-07-25 03:35:48

领域: cs.LG

下载: http://arxiv.org/abs/2404.01039v3
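
Most message-passing schemes the survey categorizes share a two-stage skeleton, aggregating node features into hyperedges and scattering them back, which the following NumPy sketch captures with an incidence matrix and degree normalization. It is a generic skeleton, not any specific published architecture.

    import numpy as np

    def hnn_layer(X, H, W):
        """One generic two-stage hypergraph message-passing layer:
        nodes -> hyperedges -> nodes, with degree normalization.
        X: (n, d) node features, H: (n, m) incidence matrix, W: (d, d') weights.
        """
        De = H.sum(axis=0).clip(min=1)        # hyperedge degrees
        Dv = H.sum(axis=1).clip(min=1)        # node degrees
        E = (H / De).T @ X                    # stage 1: aggregate nodes into edges
        X_new = (H / Dv[:, None]) @ E         # stage 2: scatter edges back to nodes
        return np.maximum(X_new @ W, 0)       # linear map + ReLU

    n, m, d = 6, 3, 4
    H = (np.random.default_rng(0).random((n, m)) > 0.5).astype(float)
    X = np.random.default_rng(1).normal(size=(n, d))
    print(hnn_layer(X, H, np.eye(d)).shape)

The four design components from the survey map onto this skeleton: X is the input features, H the input structure, the two matrix products the message-passing scheme, and the loss attached on top the training strategy.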

CCoE: A Compact LLM with Collaboration of Experts

In the domain of Large Language Models (LLMs), LLMs demonstrate significant capabilities in natural language understanding and generation. With the growing need to apply LLMs to various domains, how to efficiently train and build a model that has expertise in different domains at a low training cost is an open research question. We propose the CCoE architecture, a framework that easily couples multiple strong domain experts into one big LLM and provides a collective way of utilizing different domain-expert LLMs. Moreover, training a large collaboration of multiple expert LLMs places high demands on training resources. CCoE bypasses this problem by isolating the experts and training each expert separately. The design of CCoE assembles multiple expert LLMs through the CoE (Collaboration of Experts) layer. Each CoE layer can have one or more expert LLMs. The expert LLMs have different numbers of layers and have been well trained on different domain tasks. Each expert is fine-tuned to achieve results comparable with SOTA domain LLMs. We start from 5 experts in the domains of Code, Math, Law, Text-to-SQL, and Medical. The results indicate that our CCoE framework can easily and efficiently boost performance by nearly 10%-20% over the original base model in different domains, while using fewer resources for training as well as inference.

Updated: 2024-07-25 03:34:56

标题: CCoE:专家协作的紧凑LLM

摘要: 在大型语言模型(LLM)领域,LLMs在自然语言理解和生成方面展现出显著的能力。随着在各个领域应用LLMs的需求不断增长,一个研究问题是如何高效地训练和构建一个在不同领域具有专业知识但训练成本较低的模型。我们提出了CCoE架构,这是一个框架,可以轻松将多个强大的领域专家耦合在一起,融合成一个大型LLM,为利用不同领域专家LLMs提供了一种集体方式。此外,训练多个专家LLMs的大型协作需要对训练源有很高的要求。CCoE通过隔离其他专家并分别训练每个专家来规避这个问题。CCoE的设计通过CoE(专家协作)层组装多个专家LLMs。每个CoE层可以拥有一个或多个专家LLMs。专家LLMs具有不同数量的层,并且已经为不同领域任务进行了充分的训练。每个专家都经过微调,以实现与SOTA领域LLMs相当的结果。我们从代码、数学、法律、文本到SQL和医学领域开始,结果表明我们的CCoE框架可以在不同领域中轻松高效地将原始基础模型的性能提升近10%-20%,同时使用更少的资源进行训练和推理。

更新时间: 2024-07-25 03:34:56

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.11686v3

Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models

Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individual trials to be synthesized. Ideally, language technologies would permit fully automatic meta-analysis, on demand. This requires accurately extracting numerical results from individual trials, which has been beyond the capabilities of natural language processing (NLP) models to date. In this work, we evaluate whether modern large language models (LLMs) can reliably perform this task. We annotate (and release) a modest but granular evaluation dataset of clinical trial reports with numerical findings attached to interventions, comparators, and outcomes. Using this dataset, we evaluate the performance of seven LLMs applied zero-shot for the task of conditionally extracting numerical findings from trial reports. We find that massive LLMs that can accommodate lengthy inputs are tantalizingly close to realizing fully automatic meta-analysis, especially for dichotomous (binary) outcomes (e.g., mortality). However, LLMs -- including ones trained on biomedical texts -- perform poorly when the outcome measures are complex and tallying the results requires inference. This work charts a path toward fully automatic meta-analysis of RCTs via LLMs, while also highlighting the limitations of existing models for this aim.

Updated: 2024-07-25 03:29:09

标题: 使用大型语言模型自动从随机对照试验中提取数值结果

摘要: Meta分析统计聚合了不同随机对照试验(RCTs)的发现,以评估治疗效果。由于这产生了治疗效果的可靠估计,Meta分析的结果被认为是最强有力的证据形式。然而,严格的证据综合是耗时且劳动密集的,需要从个别试验中手动提取数据进行综合。理想情况下,语言技术应该允许根据需要完全自动进行Meta分析。这需要准确提取个别试验中的数字结果,而这迄今为止超出了自然语言处理(NLP)模型的能力范围。在这项工作中,我们评估现代大型语言模型(LLMs)是否能可靠地执行这项任务。我们注释(并发布)一个数量不多但细致的临床试验报告评估数据集,其中附有干预、比较者和结果的数字发现。利用这个数据集,我们评估了七个LLM在零样本设置下从试验报告中有条件地提取数字发现的性能。我们发现,能够容纳冗长输入的大型LLM几乎可以实现完全自动的Meta分析,特别是对于二分(二进制)结果(例如死亡率)。然而,LLM(包括在生物医学文本上训练的LLM)在结果措施复杂且总结结果需要推断时表现不佳。这项工作为通过LLM实现完全自动进行RCTs的Meta分析铺平了道路,同时也突显了现有模型对于这一目标的局限性。

更新时间: 2024-07-25 03:29:09

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.01686v2

Towards the Law of Capacity Gap in Distilling Language Models

Language model (LM) distillation is a trending area that aims to distil the knowledge residing in a large teacher LM into a small student one. While various methods have been proposed to maximize the effectiveness of the distillation, significant challenges persist, particularly when there is a substantial capacity gap between the teacher and student LMs. This issue, often referred to as the curse of capacity gap, suggests that a larger teacher does not necessarily produce a superior student compared to one distilled from a smaller teacher. In other words, there is likely an optimal teacher yielding the best student along the scaling course of the teacher. However, the curse of capacity gap cannot be tackled without notable compute overhead, as indicated in previous studies. In the context of large LMs (LLMs), previously viable approaches become much less meaningful, as distilling an expected student from an optimal teacher with small compute overhead forms an impossible triangle. Fortunately, the impossible triangle can be made possible given an induced law of capacity gap. In this paper, we take the spirit of scaling laws and reveal that the optimal teacher scale almost consistently follows a linear scaling with the student scale across different model architectures and data scales. The law then guides us to distil a 3B student LM (termed MiniMA) from LLaMA2-7B. MiniMA is demonstrated to outperform a wide range of 3B competitors and can even compete with several 7B models.

Updated: 2024-07-25 03:20:15

标题: 朝向在提炼语言模型中的能力差距法则

摘要: 语言模型(LM)蒸馏是一个热门领域,旨在将大型教师LM中的知识蒸馏到小型学生LM中。虽然已提出各种方法来最大化蒸馏的有效性,但仍然存在重大挑战,特别是当教师和学生LM之间存在实质性的容量差距时。这个问题通常被称为容量差距的\textit{诅咒},表明较大的教师并不一定会导致比从较小教师蒸馏出的学生更优秀。换句话说,在教师的扩展过程中可能存在一个最佳教师,产生最好的学生。然而,容量差距的诅咒无法解决,而不需要显著的计算开销,正如以前的研究所示。在大型LM(LLMs)背景下,以前可行的方法变得意义不大,因为从一个具有小计算开销的最佳教师学生中蒸馏出一个预期的学生是一个不可能的三角形。幸运的是,这个不可能的三角形可以通过引入容量差距的\textit{法则}而变为可能。在本文中,我们借鉴了缩放定律的精神,并揭示最佳教师规模几乎始终与学生规模呈线性缩放,在不同的模型架构和数据规模下都如此。这个法则后来指导我们从LLaMA2-7B中蒸馏出一个3B学生LM(称为\textsc{MiniMA})。\textsc{MiniMA}被证明胜过广泛范围的3B竞争对手,甚至可以与几个7B模型竞争。

更新时间: 2024-07-25 03:20:15

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2311.07052v3

Cost-effective Instruction Learning for Pathology Vision and Language Analysis

The advent of vision-language models fosters interactive conversations between AI-enabled models and humans. Yet applying these models in clinics must deal with daunting challenges around large-scale training data and financial and computational resources. Here we propose a cost-effective instruction learning framework for conversational pathology, named CLOVER. CLOVER only trains a lightweight module and uses instruction tuning while freezing the parameters of the large language model. Instead of using costly GPT-4, we propose well-designed prompts on GPT-3.5 for building generation-based instructions, emphasizing the utility of pathological knowledge derived from Internet sources. To augment the use of instructions, we construct a high-quality set of template-based instructions in the context of digital pathology. From two benchmark datasets, our findings reveal the strength of hybrid-form instructions in visual question answering in pathology. Extensive results show the cost-effectiveness of CLOVER in answering both open-ended and closed-ended questions, where CLOVER outperforms strong baselines that possess 37 times more training parameters and use instruction data generated from GPT-4. Through instruction tuning, CLOVER exhibits robust few-shot learning on an external clinical dataset. These findings demonstrate that cost-effective modeling of CLOVER could accelerate the adoption of rapid conversational applications in the landscape of digital pathology.

Updated: 2024-07-25 03:12:57

标题: 成本效益的病理视觉和语言分析指导学习

摘要: 视觉语言模型的出现促进了人工智能模型和人类之间的互动对话。然而,将这些模型应用于临床必须应对困难挑战,如大规模训练数据、财务和计算资源。在这里,我们提出了一种名为CLOVER的成本效益的会话病理学指导学习框架。CLOVER仅训练轻量级模块,并在冻结大型语言模型的参数的同时使用指导调整。我们提出在GPT-3.5上设计良好的提示,用于构建基于生成的指导,强调从互联网来源获取的病理知识的实用性。为了增强指导的使用,我们在数字病理学的背景下构建了一套高质量的基于模板的指导。从两个基准数据集中,我们的研究结果揭示了混合形式指导在病理学中视觉问答中的优势。广泛的结果显示了CLOVER在回答开放式和封闭式问题方面的成本效益,CLOVER胜过拥有37倍更多训练参数并使用由GPT-4生成的指导数据的强基线。通过指导调整,CLOVER在外部临床数据集中展现了少样本学习的鲁棒性。这些研究结果表明,CLOVER的成本效益建模可以加速数字病理学领域中快速会话应用的采用。

更新时间: 2024-07-25 03:12:57

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2407.17734v1

A Priori Uncertainty Quantification of Reacting Turbulence Closure Models using Bayesian Neural Networks

While many physics-based closure model forms have been posited for the sub-filter scale (SFS) in large eddy simulation (LES), vast amounts of data available from direct numerical simulation (DNS) create opportunities to leverage data-driven modeling techniques. Albeit flexible, data-driven models still depend on the dataset and the functional form of the model chosen. Increased adoption of such models requires reliable uncertainty estimates both in the data-informed and out-of-distribution regimes. In this work, we employ Bayesian neural networks (BNNs) to capture both epistemic and aleatoric uncertainties in a reacting flow model. In particular, we model the filtered progress variable scalar dissipation rate which plays a key role in the dynamics of turbulent premixed flames. We demonstrate that BNN models can provide unique insights about the structure of uncertainty of the data-driven closure models. We also propose a method for the incorporation of out-of-distribution information in a BNN. The efficacy of the model is demonstrated by a priori evaluation on a dataset consisting of a variety of flame conditions and fuels.

Updated: 2024-07-25 03:06:54

标题: 使用贝叶斯神经网络对反应湍流封闭模型进行先验不确定性量化

摘要: 尽管许多基于物理的封闭模型形式已被提出用于大涡模拟(LES)中的亚滤波尺度(SFS),但直接数值模拟(DNS)提供的大量数据为利用数据驱动建模技术创造了机会。尽管灵活,数据驱动模型仍取决于所选择的数据集和模型的函数形式。增加这种模型的采用需要可靠的不确定性估计,无论是在数据驱动的还是在分布之外的情况下。在本研究中,我们使用贝叶斯神经网络(BNNs)来捕捉反应流模型中的认知和随机不确定性。具体来说,我们对滤波进展变量标量耗散率进行建模,该标量在湍流预混合火焰的动力学中起着关键作用。我们展示了BNN模型可以提供关于数据驱动封闭模型不确定性结构的独特见解。我们还提出了一种将分布之外信息纳入BNN的方法。该模型的有效性通过对包含各种火焰条件和燃料的数据集进行先验评估来证明。

更新时间: 2024-07-25 03:06:54

领域: physics.flu-dyn,cs.LG,physics.data-an

下载: http://arxiv.org/abs/2402.18729v2
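
As a cheap stand-in for the Bayesian networks used in the paper, the sketch below separates the two uncertainty types with MC dropout: the spread of the predictive mean across dropout masks approximates epistemic uncertainty, while a learned variance head models aleatoric noise. The architecture and sample counts are illustrative assumptions, not the paper's setup.

    import torch
    import torch.nn as nn

    class MeanVarianceNet(nn.Module):
        def __init__(self, in_dim=8):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                      nn.Dropout(0.1),
                                      nn.Linear(64, 64), nn.ReLU(),
                                      nn.Dropout(0.1))
            self.mean = nn.Linear(64, 1)
            self.log_var = nn.Linear(64, 1)     # heteroscedastic (aleatoric) noise

        def forward(self, x):
            h = self.body(x)
            return self.mean(h), self.log_var(h)

    @torch.no_grad()
    def predict_with_uncertainty(model, x, samples=50):
        model.train()                            # keep dropout active at test time
        means, ale = [], []
        for _ in range(samples):
            mu, log_var = model(x)
            means.append(mu)
            ale.append(log_var.exp())
        means = torch.stack(means)
        epistemic = means.var(dim=0)             # spread across dropout masks
        aleatoric = torch.stack(ale).mean(dim=0) # average predicted noise
        return means.mean(dim=0), epistemic, aleatoric

    model = MeanVarianceNet()
    mu, epi, ale = predict_with_uncertainty(model, torch.randn(4, 8))
    print(mu.shape, epi.shape, ale.shape)

Training such a head would minimize the Gaussian negative log-likelihood, 0.5 * (log_var + (y - mu)^2 / exp(log_var)), so that the variance output is calibrated against the data noise.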

Optimal Trade and Industrial Policies in the Global Economy: A Deep Learning Framework

We propose a deep learning framework, DL-opt, designed to efficiently solve for optimal policies in quantifiable general equilibrium trade models. DL-opt integrates (i) a nested fixed point (NFXP) formulation of the optimization problem, (ii) automatic implicit differentiation to enhance gradient descent for solving unilateral optimal policies, and (iii) a best-response dynamics approach for finding Nash equilibria. Utilizing DL-opt, we solve for non-cooperative tariffs and industrial subsidies across 7 economies and 44 sectors, incorporating sectoral external economies of scale. Our quantitative analysis reveals significant sectoral heterogeneity in Nash policies: Nash industrial subsidies increase with scale elasticities, whereas Nash tariffs decrease with trade elasticities. Moreover, we show that global dual competition, involving both tariffs and industrial subsidies, results in lower tariffs and higher welfare outcomes compared to a global tariff war. These findings highlight the importance of considering sectoral heterogeneity and policy combinations in understanding global economic competition.

Updated: 2024-07-25 03:03:20

标题: 全球经济中的最佳贸易和工业政策:一个深度学习框架

摘要: 我们提出了一个深度学习框架DL-opt,旨在高效地解决可量化的一般均衡贸易模型中的最优政策。DL-opt集成了(i) 优化问题的嵌套固定点(NFXP)公式,(ii) 自动隐式微分以增强梯度下降来解决单边最优政策,以及(iii) 一种寻找纳什均衡的最佳响应动态方法。利用DL-opt,我们解决了7个经济体和44个部门之间的非合作关税和工业补贴,包括部门外部规模经济。我们的定量分析揭示了纳什政策中的显著部门异质性:纳什工业补贴随规模弹性增加而增加,而纳什关税随贸易弹性减少。此外,我们展示了全球双重竞争,涉及关税和工业补贴,导致较低的关税和更高的福利结果,与全球贸易战相比。这些发现强调了考虑部门异质性和政策组合在理解全球经济竞争中的重要性。

更新时间: 2024-07-25 03:03:20

领域: econ.GN,cs.GT,cs.LG,q-fin.EC

下载: http://arxiv.org/abs/2407.17731v1

Multi-modal Data Binding for Survival Analysis Modeling with Incomplete Data and Annotations

Survival analysis stands as a pivotal process in cancer treatment research, crucial for predicting patient survival rates accurately. Recent advancements in data collection techniques have paved the way for enhancing survival predictions by integrating information from multiple modalities. However, real-world scenarios often present challenges with incomplete data, particularly when dealing with censored survival labels. Prior works have addressed missing modalities but have overlooked incomplete labels, which can introduce bias and limit model efficacy. To bridge this gap, we introduce a novel framework that simultaneously handles incomplete data across modalities and censored survival labels. Our approach employs advanced foundation models to encode individual modalities and align them into a universal representation space for seamless fusion. By generating pseudo labels and incorporating uncertainty, we significantly enhance predictive accuracy. The proposed method demonstrates outstanding prediction accuracy in two survival analysis tasks on the employed datasets. This innovative approach overcomes limitations associated with disparate modalities and improves the feasibility of comprehensive survival analysis using multiple large foundation models.

Updated: 2024-07-25 02:55:39

标题: 多模态数据绑定用于具有不完整数据和注释的生存分析建模

摘要: 生存分析在癌症治疗研究中扮演着关键的角色,对准确预测患者存活率至关重要。最近数据收集技术的进步为通过整合多种模式的信息来增强生存预测打开了道路。然而,在现实世界中,面临着数据不完整的挑战,特别是在处理被截尾的生存标签时。先前的研究已经解决了缺失模式的问题,但却忽略了不完整的标签,这可能会引入偏差并限制模型的效力。为了弥补这一差距,我们提出了一个新颖的框架,同时处理了跨模式的不完整数据和被截尾的生存标签。我们的方法采用先进的基础模型来编码各个模式并将它们对齐到一个通用的表示空间中以实现无缝融合。通过生成伪标签和结合不确定性,我们显著增强了预测准确性。所提出的方法在两个生存分析任务中展示了出色的预测准确性,这两个任务都使用了相应的数据集。这种创新的方法克服了与不同模式相关的限制,并提高了使用多个大型基础模型进行全面生存分析的可行性。

更新时间: 2024-07-25 02:55:39

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2407.17726v1

Your Graph Recommender is Provably a Single-view Graph Contrastive Learning

A graph recommender (GR) is a type of graph neural network (GNN) encoder customized for extracting information from the user-item interaction graph. Due to its strong performance on the recommendation task, GR has gained significant attention recently. Graph contrastive learning (GCL) is also a popular research direction that aims to learn, often unsupervised, GNNs with certain contrastive objectives. As a general graph representation learning method, GCLs have been widely adopted alongside the supervised recommendation loss for joint training of GRs. Despite the intersection of GR and GCL research, theoretical understanding of the relationship between the two fields is surprisingly sparse. This vacancy inevitably leads to inefficient scientific research. In this paper, we aim to bridge the gap between the fields of GR and GCL from the perspective of encoders and loss functions. Under mild assumptions, we theoretically show an astonishing fact: a graph recommender is equivalent to a commonly used single-view graph contrastive model. Specifically, we find that (1) the classic encoder in GR is essentially a linear graph convolutional network with one-hot inputs, and (2) the loss function in GR is well bounded by a single-view GCL loss with certain hyperparameters. The first observation enables us to explain crucial designs of GR models, e.g., the removal of self-loops and nonlinearity. The second finding can readily prompt many cross-field research directions. We empirically show a remarkable result: the recommendation loss and the GCL loss can be used interchangeably. The fact that we can train GR models solely with the GCL loss is particularly insightful, since before this work GCLs were typically viewed as unsupervised methods that need fine-tuning. We also discuss some potential future works inspired by our theory.

Updated: 2024-07-25 02:53:11

标题: 你的图推荐器可以证明是单视图图对比学习

摘要: 图推荐器(GR)是一种用于从用户-物品交互图中提取信息的图神经网络(GNNs)编码器。由于其在推荐任务中的出色表现,GR近来受到了很大关注。图对比学习(GCL)也是一个流行的研究方向,旨在学习具有特定对比目标的GNNs,通常是无监督的。作为一种通用的图表示学习方法,GCL已被广泛采用与监督推荐损失一起训练GR。尽管GR和GCL研究存在交集,但两个领域之间的关系的理论理解令人惊讶地稀缺。这种空缺不可避免地导致科学研究效率低下。 在本文中,我们旨在从编码器和损失函数的角度弥合GR和GCL领域之间的差距。在温和的假设下,我们在理论上展示了一个惊人的事实,即图推荐器等效于一种常用的单视图图对比模型。具体地,我们发现(1)GR中的经典编码器本质上是一个具有单热输入的线性图卷积网络,(2)GR中的损失函数可以由带有特定超参数的单视图GCL损失很好地界定。第一个观察结果使我们能够解释GR模型的关键设计,例如去除自环和非线性。第二个发现可以很容易地促使许多跨领域研究方向。我们在实验上展示了一个显著的结果,即推荐损失和GCL损失可以互换使用。我们可以仅使用GCL损失训练GR模型的事实特别具有启发性,因为在此之前,GCL通常被视为需要微调的无监督方法。我们还讨论了一些受我们理论启发的潜在未来工作。

更新时间: 2024-07-25 02:53:11

领域: cs.LG

下载: http://arxiv.org/abs/2407.17723v1
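
Observation (1) is easy to verify numerically: with one-hot inputs X = I, a linear graph convolution A_hat @ X @ W collapses to A_hat @ W, i.e., propagating a free embedding table, which is what LightGCN-style recommenders do. A short check, with a random symmetric matrix standing in for the normalized adjacency:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 5, 3
    A_hat = rng.random((n, n)); A_hat = (A_hat + A_hat.T) / 2  # adjacency stand-in
    W = rng.normal(size=(n, d))                                 # first-layer weights

    X = np.eye(n)                        # one-hot node inputs
    gcn_out = A_hat @ X @ W              # linear graph convolution
    embedding_propagation = A_hat @ W    # propagating the embedding table E = W
    print(np.allclose(gcn_out, embedding_propagation))          # True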

Text-Driven Neural Collaborative Filtering Model for Paper Source Tracing

Identifying significant references within a citation knowledge graph, whose complex interrelations encompass connections through citations, authorship, keywords, and other relational attributes, is challenging. The Paper Source Tracing (PST) task seeks to automate the identification of pivotal references for given scholarly articles using advanced data mining techniques. In the KDD CUP 2024, we design a recommendation-based framework tailored for the PST task. This framework employs the Neural Collaborative Filtering (NCF) model to generate final predictions. To process the textual attributes of the papers and extract input features for the model, we utilize SciBERT, a pre-trained language model. According to the experimental results, our method achieved a score of 0.37814 on the Mean Average Precision (MAP) metric, outperforming baseline models and ranking 11th among all participating teams. The source code is publicly available at https://github.com/MyLove-XAB/KDDCupFinal.

Updated: 2024-07-25 02:48:56

标题: 文本驱动的神经协同过滤模型用于论文来源跟踪

摘要: 在引用知识图谱的复杂相互关系中识别重要参考文献是具有挑战性的,这涵盖了通过引用、作者、关键词和其他关系属性进行连接。Paper Source Tracing(PST)任务旨在利用先进的数据挖掘技术自动识别给定学术文章的关键参考文献。在KDD CUP 2024中,我们设计了一个针对PST任务定制的基于推荐的框架。该框架采用神经协同过滤(NCF)模型生成最终预测。为了处理论文的文本属性并提取模型的输入特征,我们利用了预训练语言模型SciBERT。根据实验结果,我们的方法在平均精度(MAP)指标上取得了0.37814的分数,优于基准模型,并在所有参与团队中排名第11。源代码可在https://github.com/MyLove-XAB/KDDCupFinal上公开获取。

更新时间: 2024-07-25 02:48:56

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2407.17722v1
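
For reference, a standard NeuMF-style NCF scorer looks as follows. In the paper's setting, the two embedding tables would index papers and candidate references, and SciBERT text features would be additional inputs; this sketch omits those, and the layer sizes are illustrative.

    import torch
    import torch.nn as nn

    class NCF(nn.Module):
        """NeuMF-style NCF: a GMF branch (element-wise product of embeddings)
        and an MLP branch, fused into a relevance score."""
        def __init__(self, n_users, n_items, dim=32):
            super().__init__()
            self.u_gmf = nn.Embedding(n_users, dim)
            self.i_gmf = nn.Embedding(n_items, dim)
            self.u_mlp = nn.Embedding(n_users, dim)
            self.i_mlp = nn.Embedding(n_items, dim)
            self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim), nn.ReLU())
            self.out = nn.Linear(2 * dim, 1)

        def forward(self, u, i):
            gmf = self.u_gmf(u) * self.i_gmf(i)
            mlp = self.mlp(torch.cat([self.u_mlp(u), self.i_mlp(i)], dim=-1))
            return torch.sigmoid(self.out(torch.cat([gmf, mlp], dim=-1))).squeeze(-1)

    model = NCF(n_users=100, n_items=500)
    print(model(torch.tensor([0, 1]), torch.tensor([10, 20])))  # two relevance scores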

A Two-Stage Imaging Framework Combining CNN and Physics-Informed Neural Networks for Full-Inverse Tomography: A Case Study in Electrical Impedance Tomography (EIT)

Physics-Informed Neural Networks (PINNs) are a machine learning technique for solving partial differential equations (PDEs) by incorporating PDEs as loss terms in neural networks and minimizing the loss function during training. Tomographic imaging, which reconstructs internal properties from external measurement data, is a highly complex and ill-posed inverse problem. Recently, PINNs have shown significant potential in computational fluid dynamics (CFD) and have advantages in solving inverse problems. However, existing research has primarily focused on semi-inverse Electrical Impedance Tomography (EIT), where internal electric potentials are accessible. The practical full-inverse EIT problem, where only boundary voltage measurements are available, remains challenging. To address this, we propose a two-stage hybrid learning framework combining Convolutional Neural Networks (CNNs) and PINNs to solve the full-inverse EIT problem. This framework integrates data-driven and model-driven approaches, combines supervised and unsupervised learning, and decouples the forward and inverse problems within the PINN framework in EIT. Stage I: a U-Net constructs an end-to-end mapping from boundary voltage measurements to the internal potential distribution using supervised learning. Stage II: a Multilayer Perceptron (MLP)-based PINN takes the predicted internal potentials as input to solve for the conductivity distribution through unsupervised learning.

Updated: 2024-07-25 02:48:22

标题: 一个结合CNN和物理信息神经网络的两阶段成像框架用于全逆向层析成像:以电阻抗层析成像(EIT)为案例研究

摘要: 物理信息神经网络(PINNs)是一种通过在神经网络中将偏微分方程(PDEs)作为损失项,并在训练过程中最小化损失函数来解决偏微分方程的机器学习技术。层析成像是一种从外部测量数据重建内部性质的方法,其高度复杂且不适定,使其成为一个逆问题。最近,PINNs在计算流体动力学(CFD)中展现出显著潜力,并在解决逆问题方面具有优势。然而,现有研究主要集中在半逆电阻抗成像(EIT)上,其中内部电势是可访问的。实际的完全逆EIT问题,只有边界电压测量可用,仍然具有挑战性。为了解决这个问题,我们提出了一个两阶段混合学习框架,结合了卷积神经网络(CNNs)和PINNs来解决完全逆EIT问题。该框架整合了数据驱动和模型驱动方法,结合了监督和无监督学习,并在EIT中的PINN框架内解耦了正向和逆向问题。第一阶段:一个U-Net通过监督学习构建了从边界电压测量到内部电势分布的端到端映射。第二阶段:基于多层感知器(MLP)的PINN将预测的内部电位作为输入,通过无监督学习解决了电导率分布。

更新时间: 2024-07-25 02:48:22

领域: cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2407.17721v1

Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment

Speech emotion recognition (SER) systems often struggle in real-world environments, where ambient noise severely degrades their performance. This paper explores a novel approach that exploits prior knowledge of testing environments to maximize SER performance under noisy conditions. To address this task, we propose a text-guided, environment-aware training where an SER model is trained with contaminated speech samples and their paired noise description. We use a pre-trained text encoder to extract the text-based environment embedding and then fuse it to a transformer-based SER model during training and inference. We demonstrate the effectiveness of our approach through our experiment with the MSP-Podcast corpus and real-world additive noise samples collected from the Freesound repository. Our experiment indicates that the text-based environment descriptions processed by a large language model (LLM) produce representations that improve the noise-robustness of the SER system. In addition, our proposed approach with an LLM yields better performance than our environment-agnostic baselines, especially in low signal-to-noise ratio (SNR) conditions. When testing at -5dB SNR level, our proposed method shows better performance than our best baseline model by 31.8 % (arousal), 23.5% (dominance), and 9.5% (valence).
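
A minimal sketch of the fusion step described above: the noise-description embedding is projected to the model width and prepended as an extra token to the acoustic sequence of a transformer SER model. The dimensions, fusion-as-token choice, and three-way regression head are illustrative assumptions, not the authors' architecture.

```python
# Sketch: fusing a text-based environment embedding into a transformer SER model.
import torch
import torch.nn as nn

class EnvAwareSER(nn.Module):
    def __init__(self, dim=256, text_dim=768, n_layers=4, n_heads=4):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, dim)   # map text emb to model dim
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(dim, 3)               # arousal/dominance/valence

    def forward(self, speech_feats, env_text_emb):
        # speech_feats: (B, T, dim); env_text_emb: (B, text_dim)
        env_tok = self.text_proj(env_text_emb).unsqueeze(1)   # (B, 1, dim)
        x = torch.cat([env_tok, speech_feats], dim=1)         # fuse as a token
        h = self.backbone(x)
        return self.head(h.mean(dim=1))                       # pooled prediction
```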

Updated: 2024-07-25 02:30:40

标题: 描述你所在的位置:通过环境文本描述提高语音情绪识别的抗噪性

摘要: 语音情感识别(SER)系统在现实世界环境中经常面临困难,环境噪音严重影响其性能。本文探讨了一种利用先前对测试环境的了解来最大化噪音条件下SER性能的新方法。为了解决这个任务,我们提出了一种文本引导的、环境感知的训练方法,其中一个SER模型使用受污染的语音样本及其配对的噪音描述进行训练。我们使用一个预训练的文本编码器来提取基于文本的环境嵌入,然后在训练和推断过程中将其融合到基于transformer的SER模型中。通过我们在MSP-Podcast语料库和从Freesound存储库收集的真实际际添加噪音样本进行的实验,我们展示了我们方法的有效性。我们的实验表明,由大型语言模型(LLM)处理的基于文本的环境描述产生的表示改善了SER系统的抗噪性能。此外,我们提出的基于LLM的方法在低信噪比(SNR)条件下比我们的环境无关基线模型表现更好。在-5dB SNR水平进行测试时,我们的提出方法在唤起度方面比我们最佳基线模型表现提高了31.8%,在支配度方面提高了23.5%,在愉悦度方面提高了9.5%。

更新时间: 2024-07-25 02:30:40

领域: cs.SD,cs.CL,cs.LG,eess.AS

下载: http://arxiv.org/abs/2407.17716v1

Robust experimental data assimilation for the Spalart-Allmaras turbulence model

This study presents a methodology focusing on the use of computational model and experimental data fusion to improve the Spalart-Allmaras (SA) closure model for Reynolds-averaged Navier-Stokes solutions. In particular, our goal is to develop a technique that not only assimilates sparse experimental data to improve turbulence model performance, but also preserves generalization for unseen cases by recovering classical SA behavior. We achieve our goals using data assimilation, namely the Ensemble Kalman filtering approach (EnKF), to calibrate the coefficients of the SA model for separated flows. A holistic calibration strategy is implemented via the parameterization of the production, diffusion, and destruction terms. This calibration relies on the assimilation of experimental data collected in the form of velocity profiles, skin friction, and pressure coefficients. Despite using observational data from a single flow condition around a backward-facing step (BFS), the recalibrated SA model demonstrates generalization to other separated flows, including cases such as the 2D NASA wall mounted hump (2D-WMH) and modified BFS. Significant improvement is observed in the quantities of interest, i.e., skin friction coefficient ($C_f$) and pressure coefficient ($C_p$) for each flow tested. Finally, it is also demonstrated that the newly proposed model recovers SA proficiency for flows, such as a NACA-0012 airfoil and axisymmetric jet (ASJ), and that the individually calibrated terms in the SA model target specific flow-physics wherein the calibrated production term improves the re-circulation zone while destruction improves the recovery zone.
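
For readers unfamiliar with the EnKF machinery used here, the sketch below shows a textbook stochastic EnKF analysis step in which each ensemble member is a candidate SA coefficient vector and H maps coefficients to predicted observables (velocity profiles, skin friction, pressure coefficients from a RANS solve). It is a generic update, not the authors' implementation.

```python
# Generic stochastic EnKF analysis step for coefficient calibration.
import numpy as np

def enkf_update(ensemble, y, H, r_var, rng=np.random.default_rng(0)):
    """ensemble: (n_members, n_params); y: (n_obs,) experimental data;
    H(theta) -> (n_obs,) predicted observables; r_var: obs noise variance."""
    X = ensemble
    Y = np.stack([H(x) for x in X])                   # predicted observations
    Xm, Ym = X.mean(0), Y.mean(0)
    A, B = X - Xm, Y - Ym
    n = X.shape[0]
    Cxy = A.T @ B / (n - 1)                           # param-obs covariance
    Cyy = B.T @ B / (n - 1) + r_var * np.eye(len(y))  # obs covariance + noise
    K = Cxy @ np.linalg.inv(Cyy)                      # Kalman gain
    y_pert = y + rng.normal(0, np.sqrt(r_var), size=(n, len(y)))
    return X + (y_pert - Y) @ K.T                     # analysis ensemble
```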

Updated: 2024-07-25 02:30:32

标题: 稳健的实验数据同化方法用于Spalart-Allmaras湍流模型

摘要: 这项研究提出了一种方法,重点是利用计算模型和实验数据融合来改进雷诺平均纳维-斯托克斯解的Spalart-Allmaras(SA)闭合模型。特别是,我们的目标是开发一种技术,不仅 assimilates 稀疏实验数据以改进湍流模型性能,而且通过恢复经典SA行为来保持对未知情况的泛化。我们使用数据同化,即集合卡尔曼滤波方法(EnKF),来校准SA模型的系数,以用于分离流。通过参数化生产、扩散和破坏项实施了一种全面的校准策略。这种校准依赖于以速度剖面、表面摩擦和压力系数形式收集的实验数据的同化。尽管仅使用来自一个反向阶跃(BFS)周围的单一流动条件的观测数据,重新校准的SA模型表现出对其他分离流的泛化,包括2D NASA 壁挂隆起(2D-WMH)和修改的BFS等情况。对于每个测试流动,观察到了感兴趣的量,即表面摩擦系数($C_f$)和压力系数($C_p$)的显著改善。最后,还表明新提出的模型恢复了SA对于流动的熟练度,例如NACA-0012翼型和轴对称喷流(ASJ),SA模型中单独校准的项针对特定的流动物理现象,其中校准的生产项改善了旋涡区域,而破坏项改善了恢复区域。

更新时间: 2024-07-25 02:30:32

领域: physics.flu-dyn,cs.LG,physics.comp-ph,physics.data-an

下载: http://arxiv.org/abs/2309.06679v3

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various technological domains. While these models enhance capabilities in natural language processing and visual interactive tasks, their growing adoption raises critical concerns regarding security and ethical alignment. This survey provides an extensive review of the emerging field of jailbreaking--deliberately circumventing the ethical and operational boundaries of LLMs and VLMs--and the consequent development of defense mechanisms. Our study categorizes jailbreaks into seven distinct types and elaborates on defense strategies that address these vulnerabilities. Through this comprehensive examination, we identify research gaps and propose directions for future studies to enhance the security frameworks of LLMs and VLMs. Our findings underscore the necessity for a unified perspective that integrates both jailbreak strategies and defensive solutions to foster a robust, secure, and reliable environment for the next generation of language models. More details can be found on our website: \url{https://chonghan-chen.com/llm-jailbreak-zoo-survey/}.

Updated: 2024-07-25 02:25:11

标题: 越狱动物园:对大型语言和视觉-语言模型进行越狱的调查、景观和展望

摘要: 人工智能(AI)通过大规模语言模型(LLMs)和视觉语言模型(VLMs)的发展迅速演进,为各种技术领域带来了重大进展。虽然这些模型增强了自然语言处理和视觉交互任务的能力,但它们日益普及也引发了关于安全和道德对齐的重要关注。本调查提供了对越狱领域的广泛审查,即故意绕过LLMs和VLMs的道德和操作界限,并随之发展出的防御机制。我们的研究将越狱分为七种不同类型,并详细阐述解决这些漏洞的防御策略。通过这一全面的研究,我们确定了研究领域的空白,并提出了未来研究的方向,以增强LLMs和VLMs的安全框架。我们的发现强调了统一视角的必要性,该视角整合了越狱策略和防御解决方案,以促进下一代语言模型的健壮、安全和可靠的环境。更多详细信息请访问我们的网站:\url{https://chonghan-chen.com/llm-jailbreak-zoo-survey/}。

更新时间: 2024-07-25 02:25:11

领域: cs.CL,cs.CR,cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.01599v2

Improving Online Algorithms via ML Predictions

In this work we study the problem of using machine-learned predictions to improve the performance of online algorithms. We consider two classical problems, ski rental and non-clairvoyant job scheduling, and obtain new online algorithms that use predictions to make their decisions. These algorithms are oblivious to the performance of the predictor, improve with better predictions, but do not degrade much if the predictions are poor.
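
The ski-rental setting makes the robustness-consistency trade-off easy to state in code. The sketch below follows a well-known deterministic scheme from this line of work: a parameter lambda in (0, 1] interpolates between trusting the prediction and the classical breakeven rule; the exact thresholds are one standard instantiation, not necessarily the paper's.

```python
# Ski rental with a machine-learned prediction: buying costs b, renting
# costs 1 per day, y_hat predicts the number of ski days. Small lambda
# trusts the prediction more; lambda = 1 recovers the classical rule.
import math

def ski_rental_with_prediction(b, y_hat, lam, actual_days):
    """Total cost paid when skiing actually lasts `actual_days` days."""
    # Choose the buy day from the prediction:
    buy_day = math.ceil(lam * b) if y_hat >= b else math.ceil(b / lam)
    if actual_days < buy_day:
        return actual_days            # rented every day, never bought
    return (buy_day - 1) + b          # rented, then bought on buy_day

# Good prediction -> near-optimal cost; bad prediction -> bounded loss.
print(ski_rental_with_prediction(b=10, y_hat=100, lam=0.5, actual_days=100))  # 14 vs OPT 10
```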

Updated: 2024-07-25 02:17:53

标题: 通过机器学习预测改进在线算法

摘要: 在这项工作中,我们研究了利用机器学习预测来提高在线算法性能的问题。我们考虑了两个经典问题,滑雪设备租赁和非千里眼作业调度,并获得了利用预测进行决策的新的在线算法。这些算法对预测器的性能是不可知的,随着更好的预测而改进,但如果预测不佳,它们的性能也不会下降太多。

更新时间: 2024-07-25 02:17:53

领域: cs.DS,cs.LG

下载: http://arxiv.org/abs/2407.17712v1

A Learning-Based Attack Framework to Break SOTA Poisoning Defenses in Federated Learning

Federated Learning (FL) is a novel client-server distributed learning framework that can protect data privacy. However, recent works show that FL is vulnerable to poisoning attacks. Many defenses with robust aggregators (AGRs) have been proposed to mitigate the issue, but they are all broken by advanced attacks. Very recently, some renewed robust AGRs have been designed, typically with novel clipping and/or filtering strategies, and they show promising defense performance against advanced poisoning attacks. In this paper, we show that these novel robust AGRs are also vulnerable to carefully designed poisoning attacks. Specifically, we observe that breaking these robust AGRs reduces to bypassing the clipping and/or filtering of malicious clients, and we propose an optimization-based attack framework to leverage this observation. Under this framework, we then design a customized attack against each robust AGR. Extensive experiments on multiple datasets and threat models verify that our proposed optimization-based attack can break the SOTA AGRs. We hence call for novel defenses against poisoning attacks on FL. Code is available at: https://github.com/Yuxin104/BreakSTOAPoisoningDefenses.
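
The key observation, that breaking these AGRs reduces to bypassing their clipping and/or filtering, can be illustrated with a toy norm-clipping example: a malicious update rescaled to sit just inside the threshold passes through untouched. This is a conceptual sketch, not the paper's optimization-based attack.

```python
# Toy illustration: evading a norm-clipping robust aggregator.
import numpy as np

def clip_to_norm(update, tau):
    """Clip a client update to norm threshold tau."""
    n = np.linalg.norm(update)
    return update * min(1.0, tau / n)

def evade_clipping(malicious_direction, tau, eps=1e-3):
    """Scale the malicious direction so clipping leaves it untouched."""
    d = malicious_direction / np.linalg.norm(malicious_direction)
    return d * (tau - eps)

tau = 1.0
honest = [np.random.default_rng(i).normal(0, 0.1, 10) for i in range(8)]
bad = evade_clipping(-np.mean(honest, axis=0), tau)   # push against the mean
agg = np.mean([clip_to_norm(u, tau) for u in honest + [bad]], axis=0)
```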

Updated: 2024-07-25 02:14:01

标题: 一个基于学习的攻击框架,用于破解联邦学习中最先进的毒化防御

摘要: 联邦学习(FL)是一种新颖的客户端-服务器分布式学习框架,可以保护数据隐私。然而,最近的研究显示FL容易受到毒化攻击的影响。许多具有强大聚合器(AGRs)的防御方法被提出来缓解这个问题,但它们都被先进攻击所破坏。最近,一些更新的强大聚合器被设计出来,通常采用新颖的裁剪或/和过滤策略,它们显示出对先进毒化攻击有很好的防御性能。在本文中,我们展示了这些新颖的强大聚合器也容易受到精心设计的毒化攻击的影响。具体而言,我们发现打破这些强大聚合器实际上就是绕过恶意客户端的裁剪或/和过滤,我们提出了一个基于优化的攻击框架来利用这一观察结果。在这个框架下,我们设计了针对每个强大聚合器的定制攻击。在多个数据集和威胁模型上进行的大量实验验证了我们提出的基于优化的攻击可以打破SOTA AGRs。因此,我们呼吁对FL进行毒化攻击的新防御措施。代码可在以下链接找到:https://github.com/Yuxin104/BreakSTOAPoisoningDefenses。

更新时间: 2024-07-25 02:14:01

领域: cs.CR

下载: http://arxiv.org/abs/2407.15267v2

$A^*$ for Graphs of Convex Sets

We present a novel algorithm that fuses the existing convex-programming based approach with heuristic information to find optimality guarantees and near-optimal paths for the Shortest Path Problem in the Graph of Convex Sets (SPP-GCS). Our method, inspired by $A^*$, initiates a best-first-like procedure from a designated subset of vertices and iteratively expands it until further growth is neither possible nor beneficial. Traditionally, obtaining solutions with bounds for an optimization problem involves solving a relaxation, modifying the relaxed solution to a feasible one, and then comparing the two solutions to establish bounds. However, for SPP-GCS, we demonstrate that reversing this process can be more advantageous, especially with Euclidean travel costs. In other words, we initially employ $A^*$ to find a feasible solution for SPP-GCS, then solve a convex relaxation restricted to the vertices explored by $A^*$ to obtain a relaxed solution, and finally, compare the solutions to derive bounds. We present numerical results to highlight the advantages of our algorithm over the existing approach in terms of the sizes of the convex programs solved and computation time.

Updated: 2024-07-25 02:10:08

标题: $A^*$算法在凸集图中的应用

摘要: 我们提出了一种新颖的算法,将现有的基于凸规划的方法与启发式信息融合,以找到凸集图中最短路径问题(SPP-GCS)的最优性保证和接近最优路径。我们的方法受$A^*$启发,从一个指定的顶点子集开始一个类似最佳优先的过程,并逐步扩展,直到进一步的增长既不可能也不有益。传统上,获得带界限的优化问题解涉及求解一个松弛问题,将松弛解修改为可行解,然后比较两个解以建立界限。然而,对于SPP-GCS,我们展示了颠倒这个过程可能更有利,特别是在欧几里得旅行成本的情况下。换句话说,我们首先使用$A^*$找到SPP-GCS的一个可行解,然后求解一个受限于$A^*$所探索顶点的凸松弛问题,以获得一个松弛解,最后比较两个解以得出界限。我们提供了数值结果,以突出我们的算法相对于现有方法在所求解凸规划的规模和计算时间方面的优势。

更新时间: 2024-07-25 02:10:08

领域: math.OC,cs.AI,cs.RO

下载: http://arxiv.org/abs/2407.17413v2

ReMamber: Referring Image Segmentation with Mamba Twister

Referring Image Segmentation (RIS) leveraging transformers has achieved great success on the interpretation of complex visual-language tasks. However, the quadratic computation cost makes it resource-consuming in capturing long-range visual-language dependencies. Fortunately, Mamba addresses this with efficient linear complexity in processing. However, directly applying Mamba to multi-modal interactions presents challenges, primarily due to inadequate channel interactions for the effective fusion of multi-modal data. In this paper, we propose ReMamber, a novel RIS architecture that integrates the power of Mamba with a multi-modal Mamba Twister block. The Mamba Twister explicitly models image-text interaction, and fuses textual and visual features through its unique channel and spatial twisting mechanism. We achieve competitive results on three challenging benchmarks with a simple and efficient architecture. Moreover, we conduct thorough analyses of ReMamber and discuss other fusion designs using Mamba. These provide valuable perspectives for future research. The code has been released at: https://github.com/yyh-rain-song/ReMamber.

Updated: 2024-07-25 02:08:30

标题: ReMamber:使用Mamba Twister进行参考图像分割

摘要: 参考图像分割(RIS)利用Transformer在解释复杂的视觉语言任务方面取得了巨大成功。然而,二次计算成本使其在捕捉长距离视觉语言依赖关系方面资源消耗大。幸运的是,Mamba以高效的线性复杂度解决了这个问题。然而,直接将Mamba应用于多模态交互存在挑战,主要是由于通道间相互作用不足,无法有效融合多模态数据。在本文中,我们提出了ReMamber,一种集成Mamba强大功能的新型RIS架构,其中包括一个多模态Mamba Twister块。Mamba Twister明确建模图像文本交互,并通过其独特的通道和空间扭曲机制融合文本和视觉特征。我们通过简单高效的架构在三个具有挑战性的基准测试中取得了竞争性结果。此外,我们对ReMamber进行了彻底分析,并讨论了使用Mamba的其他融合设计。这为未来的研究提供了宝贵的视角。代码已发布在:https://github.com/yyh-rain-song/ReMamber。

更新时间: 2024-07-25 02:08:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2403.17839v2

Revisiting Machine Unlearning with Dimensional Alignment

Machine unlearning, an emerging research topic focusing on compliance with data privacy regulations, enables trained models to remove the information learned from specific data. While many existing methods indirectly address this issue by intentionally injecting incorrect supervisions, they can drastically and unpredictably alter the decision boundaries and feature spaces, leading to training instability and undesired side effects. To fundamentally approach this task, we first analyze the changes in latent feature spaces between original and retrained models, and observe that the feature representations of samples not involved in training are closely aligned with the feature manifolds of previously seen samples in training. Based on these findings, we introduce a novel evaluation metric for machine unlearning, coined dimensional alignment, which measures the alignment between the eigenspaces of the forget and retain set samples. We employ this metric as a regularizer loss to build a robust and stable unlearning framework, which is further enhanced by integrating a self-distillation loss and an alternating training scheme. Our framework effectively eliminates information from the forget set and preserves knowledge from the retain set. Lastly, we identify critical flaws in established evaluation metrics for machine unlearning, and introduce new evaluation tools that more accurately reflect the fundamental goals of machine unlearning.
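
One hypothetical reading of the dimensional-alignment metric, which the abstract defines as alignment between the eigenspaces of forget-set and retain-set samples, is sketched below via the Frobenius overlap of the top-k principal subspaces. The paper's exact definition and normalization may differ; this only illustrates the kind of quantity being measured.

```python
# Hypothetical sketch of a subspace-alignment score between two feature sets.
import numpy as np

def top_k_eigenspace(features, k):
    """Top-k eigenvectors of the feature covariance (via SVD)."""
    X = features - features.mean(0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T                               # (dim, k)

def dimensional_alignment(forget_feats, retain_feats, k=10):
    Uf = top_k_eigenspace(forget_feats, k)
    Ur = top_k_eigenspace(retain_feats, k)
    # 1.0 when the subspaces coincide, 0.0 when they are orthogonal.
    return np.linalg.norm(Uf.T @ Ur, "fro") ** 2 / k
```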

Updated: 2024-07-25 02:05:15

标题: 重新审视具有维度对齐的机器遗忘

摘要: 机器遗忘是一个新兴的研究领域,专注于遵守数据隐私法规,使经过训练的模型能够删除从特定数据中学到的信息。虽然许多现有方法通过有意注入不正确的监督来间接解决这个问题,但它们可能会严重和不可预测地改变决策边界和特征空间,导致训练不稳定和不良的副作用。为了从根本上解决这个问题,我们首先分析原始模型和重新训练模型之间潜在特征空间的变化,并观察到未参与训练的样本的特征表示与先前训练中看到的样本的特征流形密切对齐。基于这些发现,我们引入了一种新颖的机器遗忘评估指标,称为维度对齐,它衡量了遗忘和保留样本的特征空间之间的对齐程度。我们将这个指标作为一个正则化损失来构建一个稳健和稳定的遗忘框架,进一步通过整合自我蒸馏损失和交替训练方案来增强。我们的框架有效地消除了遗忘集中的信息,并保留了保留集中的知识。最后,我们确定了机器遗忘的已建立评估指标中的关键缺陷,并引入了新的评估工具,更准确地反映了机器遗忘的基本目标。

更新时间: 2024-07-25 02:05:15

领域: cs.LG

下载: http://arxiv.org/abs/2407.17710v1

Hierarchical Classification of Research Fields in the "Web of Science" Using Deep Learning

This paper presents a hierarchical classification system that automatically categorizes a scholarly publication using its abstract into a three-tier hierarchical label set (discipline, field, subfield) in a multi-class setting. This system enables a holistic categorization of research activities in the mentioned hierarchy in terms of knowledge production through articles and impact through citations, permitting those activities to fall into multiple categories. The classification system distinguishes 44 disciplines, 718 fields and 1,485 subfields among 160 million abstract snippets in Microsoft Academic Graph (version 2018-05-17). We used batch training in a modularized and distributed fashion to address and allow for interdisciplinary and interfield classifications in single-label and multi-label settings. In total, we have conducted 3,140 experiments in all considered models (Convolutional Neural Networks, Recurrent Neural Networks, Transformers). The classification accuracy is > 90% in 77.13% and 78.19% of the single-label and multi-label classifications, respectively. We examine the advantages of our classification by its ability to better align research texts and output with disciplines, to adequately classify them in an automated way, and to capture the degree of interdisciplinarity. The proposed system (a set of pre-trained models) can serve as a backbone to an interactive system for indexing scientific publications in the future.

Updated: 2024-07-25 02:02:58

标题: 使用深度学习对“Web of Science”中的研究领域进行层次分类

摘要: 本文介绍了一个分层分类系统,可以自动将学术出版物根据其摘要分为三级层次的标签集(学科、领域、子领域)进行多类别分类。该系统能够全面分类研究活动,并通过文章和引用来衡量知识生产和影响,使这些活动可以分为多个类别。分类系统在Microsoft Academic Graph(2018-05-17版本)中的1.6亿个摘要片段中区分了44个学科、718个领域和1,485个子领域。我们采用了模块化和分布式的批处理训练方法,以允许跨学科和跨领域的单标签和多标签分类。总共进行了3,140次实验,涵盖了所有考虑的模型(卷积神经网络、循环神经网络、Transformer)。在77.13%的单标签分类和78.19%的多标签分类中,分类准确率超过90%。我们通过能够更好地将研究文本和输出与学科对齐,以自动方式充分分类它们,并捕捉跨学科程度来检验我们的分类系统的优势。所提出的系统(一组预训练模型)可以作为未来科学出版物索引的交互系统的基础。

更新时间: 2024-07-25 02:02:58

领域: cs.DL,cs.AI,cs.LG,68T50,I.2

下载: http://arxiv.org/abs/2302.00390v3

Investigating and Mitigating Barren Plateaus in Variational Quantum Circuits: A Survey

In recent years, variational quantum circuits (VQCs) have been widely explored to advance quantum circuits against classic models on various domains, such as quantum chemistry and quantum machine learning. Similar to classic machine-learning models, VQCs can be optimized through gradient-based approaches. However, the gradient variance of VQCs may dramatically vanish as the number of qubits or layers increases. This issue, a.k.a. Barren Plateaus (BPs), seriously hinders the scaling of VQCs on large datasets. To mitigate the exponential gradient vanishing, extensive efforts have been devoted to tackling this issue through diverse strategies. In this survey, we conduct a systematic literature review of recent works from both investigation and mitigation perspectives. Besides, we propose a new taxonomy to categorize most existing mitigation strategies. At last, we provide insightful discussion for future directions of BPs.

Updated: 2024-07-25 01:58:46

标题: 调查和减轻变分量子电路中的贫瘠高原:一项调查

摘要: 近年来,变分量子电路(VQCs)已被广泛探索,以推进量子电路在各个领域(如量子化学和量子机器学习)上对抗经典模型。与经典机器学习模型类似,VQCs可以通过基于梯度的方法进行优化。然而,随着量子比特或层数的增加,VQCs的梯度方差可能会急剧减小。这个问题,即“贫瘠高原”(BPs),严重阻碍了VQCs在大数据集上的扩展。为了减轻指数级的梯度消失,人们已经付出了大量努力来通过多种策略解决这个问题。在这项调查中,我们从调查和缓解的角度对最近的研究进行了系统文献综述。此外,我们提出了一个新的分类法来对大多数现有的缓解策略进行分类。最后,我们为BPs的未来方向提供了深刻的讨论。

更新时间: 2024-07-25 01:58:46

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2407.17706v1

Context-aware knowledge graph framework for traffic speed forecasting using graph neural network

Human mobility is intricately influenced by urban contexts spatially and temporally, constituting essential domain knowledge in understanding traffic systems. While existing traffic forecasting models primarily rely on raw traffic data and advanced deep learning techniques, incorporating contextual information remains underexplored due to the lack of effective integration frameworks and the complexity of urban contexts. This study proposes a novel context-aware knowledge graph (CKG) framework to enhance traffic speed forecasting by effectively modeling spatial and temporal contexts. Employing a relation-dependent integration strategy, the framework generates context-aware representations from the spatial and temporal units of CKG to capture spatio-temporal dependencies of urban contexts. A CKG-GNN model, combining the CKG, dual-view multi-head self-attention (MHSA), and graph neural network (GNN), is then designed to predict traffic speed using these context-aware representations. Our experiments demonstrate that CKG's configuration significantly influences embedding performance, with ComplEx and KG2E emerging as optimal for embedding spatial and temporal units, respectively. The CKG-GNN model surpasses benchmark models, achieving an average MAE of $3.46\pm0.01$ and a MAPE of $14.76\pm0.09\%$ for traffic speed predictions from 10 to 120 minutes. The dual-view MHSA analysis reveals the crucial role of relation-dependent features from the context-based view and the model's ability to prioritize recent time slots in prediction from the sequence-based view. The CKG framework's model-agnostic nature suggests its potential applicability in various applications of intelligent transportation systems. Overall, this study underscores the importance of incorporating domain-specific contexts into traffic forecasting and merging context-aware knowledge graphs with neural networks to enhance accuracy.

Updated: 2024-07-25 01:52:12

标题: 上下文感知知识图框架用于使用图神经网络进行交通速度预测

摘要: 人类移动性在空间和时间上深受城市环境的影响,构成了理解交通系统的基本领域知识。虽然现有的交通预测模型主要依赖原始交通数据和先进的深度学习技术,但由于缺乏有效的整合框架和城市环境的复杂性,融入上下文信息仍未得到充分探讨。本研究提出了一种新颖的上下文感知知识图(CKG)框架,通过有效建模空间和时间上下文来增强交通速度预测。采用依赖关系的整合策略,该框架从CKG的空间和时间单元生成上下文感知表示,以捕捉城市环境的时空依赖关系。然后设计了一个CKG-GNN模型,结合了CKG、双视图多头自注意力(MHSA)和图神经网络(GNN),用这些上下文感知表示来预测交通速度。我们的实验表明,CKG的配置显著影响嵌入性能,ComplEx和KG2E分别成为嵌入空间和时间单元的最佳选择。CKG-GNN模型超越了基准模型,实现了10至120分钟交通速度预测的平均MAE为$3.46\pm0.01$和MAPE为$14.76\pm0.09\%$。双视图MHSA分析揭示了基于上下文视图的关系依赖特征的关键作用以及模型在预测中优先考虑最近的时间段的能力。CKG框架的模型无关性表明其潜在适用于智能交通系统各种应用。总的来说,本研究强调了将领域特定上下文融入交通预测以及将上下文感知知识图与神经网络相结合以提高准确性的重要性。

更新时间: 2024-07-25 01:52:12

领域: cs.LG,physics.soc-ph

下载: http://arxiv.org/abs/2407.17703v1

SOK: Blockchain for Provenance

Provenance, which traces data from its creation to manipulation, is crucial for ensuring data integrity, reliability, and trustworthiness. It is valuable for single-user applications, collaboration within organizations, and across organizations. Blockchain technology has become a popular choice for implementing provenance due to its distributed, transparent, and immutable nature. Numerous studies on blockchain designs are specifically dedicated to provenance, and specialize in this area. Our goal is to provide a new perspective in blockchain based provenance field by identifying the challenges faced and suggesting future research directions. In this paper, we categorize the problem statement into three main research questions to investigate key issues comprehensively and propose a new outlook on the use of blockchains. The first focuses on challenges in non-collaborative, single-source environments, the second examines implications in collaborative environments and different domains such as supply chain, scientific collaboration and digital forensic, and the last one analyzes communication and data exchange challenges between organizations using different blockchains. The interconnected nature of these research questions ensures a thorough exploration of provenance requirements, leading to more effective and secure systems. After analyzing the requirements of provenance in different environments, we provide future design considerations for provenance-based blockchains, including blockchain type, query mechanisms, provenance capture methods, and domain-specific considerations. We also discuss future work and possible extensions in this field.

Updated: 2024-07-25 01:46:49

标题: SOK:区块链的溯源功能

摘要: 溯源是将数据从其创建到处理的过程追溯起来,对于确保数据的完整性、可靠性和可信度至关重要。它对于单用户应用程序、组织内部的协作以及跨组织之间的合作都具有价值。由于其分布式、透明和不可篡改的特性,区块链技术已成为实施溯源的热门选择。许多关于区块链设计的研究专门致力于溯源,并在这一领域专门研究。我们的目标是通过识别面临的挑战并提出未来的研究方向,为基于区块链的溯源领域提供新的视角。在本文中,我们将问题陈述分类为三个主要研究问题,以全面调查关键问题并提出对区块链使用的新展望。第一个重点在于非协作、单一来源环境中的挑战,第二个研究了在协作环境中以及不同领域(如供应链、科学合作和数字取证)中的影响,最后一个分析了使用不同区块链的组织之间的沟通和数据交换挑战。这些研究问题的相互关联性确保了对溯源需求的全面探索,从而实现更有效和安全的系统。在分析不同环境中溯源的需求之后,我们提出了基于溯源的区块链的未来设计考虑因素,包括区块链类型、查询机制、溯源捕获方法和领域特定考虑因素。我们还讨论了未来的工作和可能的拓展方向。

更新时间: 2024-07-25 01:46:49

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2407.17699v1

Superior Scoring Rules for Probabilistic Evaluation of Single-Label Multi-Class Classification Tasks

This study introduces novel superior scoring rules called Penalized Brier Score (PBS) and Penalized Logarithmic Loss (PLL) to improve model evaluation for probabilistic classification. Traditional scoring rules like Brier Score and Logarithmic Loss sometimes assign better scores to misclassifications in comparison with correct classifications. This discrepancy from the actual preference for rewarding correct classifications can lead to suboptimal model selection. By integrating penalties for misclassifications, PBS and PLL modify traditional proper scoring rules to consistently assign better scores to correct predictions. Formal proofs demonstrate that PBS and PLL satisfy strictly proper scoring rule properties while also preferentially rewarding accurate classifications. Experiments showcase the benefits of using PBS and PLL for model selection, model checkpointing, and early stopping. PBS exhibits a higher negative correlation with the F1 score compared to the Brier Score during training. Thus, PBS more effectively identifies optimal checkpoints and early stopping points, leading to improved F1 scores. Comparative analysis verifies models selected by PBS and PLL achieve superior F1 scores. Therefore, PBS and PLL address the gap between uncertainty quantification and accuracy maximization by encapsulating both proper scoring principles and explicit preference for true classifications. The proposed metrics can enhance model evaluation and selection for reliable probabilistic classification.
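
The modification the abstract describes, a proper score plus a penalty whenever the argmax prediction is wrong, is simple to state in code. In the sketch below the penalty constant is an assumption; the paper derives the precise form that preserves strict propriety.

```python
# Sketch of penalized proper scores: lower is better, and a
# misclassification always scores worse than a correct prediction.
import numpy as np

def penalized_brier(p, y, penalty=2.0):
    """p: (n_classes,) predicted probabilities; y: true class index."""
    onehot = np.zeros_like(p); onehot[y] = 1.0
    brier = np.sum((p - onehot) ** 2)
    return brier + (penalty if np.argmax(p) != y else 0.0)

def penalized_log_loss(p, y, penalty=2.0, eps=1e-12):
    ll = -np.log(max(p[y], eps))
    return ll + (penalty if np.argmax(p) != y else 0.0)

# A misclassification is now always worse than a correct prediction:
print(penalized_brier(np.array([0.6, 0.4]), y=1))  # wrong  -> 0.72 + penalty
print(penalized_brier(np.array([0.4, 0.6]), y=1))  # right -> plain Brier 0.32
```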

Updated: 2024-07-25 01:46:05

标题: 优秀的评分规则用于单标签多类别分类任务的概率评估

摘要: 本研究引入了新颖的优越评分规则,称为惩罚Brier得分(PBS)和惩罚对数损失(PLL),以改进用于概率分类的模型评估。传统的评分规则(如Brier得分和对数损失)有时会给错误分类比正确分类更好的分数。这种与奖励正确分类的实际偏好不一致的差异可能导致次优的模型选择。通过对错误分类引入惩罚,PBS和PLL修改传统的适当评分规则,以始终给出更好的分数给正确的预测。正式证明表明,PBS和PLL满足严格的适当评分规则属性,同时也更倾向于奖励准确的分类。实验展示了使用PBS和PLL进行模型选择、模型检查点和早期停止的好处。在训练过程中,PBS与F1分数的负相关性比Brier得分更高。因此,PBS更有效地识别最佳检查点和早期停止点,从而提高F1分数。比较分析验证了由PBS和PLL选择的模型实现了更优越的F1分数。因此,PBS和PLL通过包括适当的评分原则和对真实分类的显式偏好来弥合不确定性量化和准确性最大化之间的差距。提出的指标可以增强可靠的概率分类模型评估和选择。

更新时间: 2024-07-25 01:46:05

领域: cs.LG,stat.ML,68Txx, 68T05, 68T37, 68Q32, I.2; I.2.6; G.3

下载: http://arxiv.org/abs/2407.17697v1

Cheems: Wonderful Matrices More Efficient and More Effective Architecture

Recent studies have shown that relative position encoding performs well in selective state space model scanning algorithms, that an architecture balancing SSM and Attention enhances the efficiency and effectiveness of the algorithm, and that the sparse activation of a mixture of experts reduces the training cost. I studied the effectiveness of using different position encodings in structured state space dual algorithms, along with the more effective SSD-Attn internal and external function mixing method, and designed a more efficient cross-domain mixture of experts. I found that the same matrix performs wonderfully across different algorithms, which allows us to establish a new hybrid sparse architecture: Cheems. Compared with other hybrid architectures, it is more efficient and more effective in language modeling tasks.

Updated: 2024-07-25 01:34:13

标题: Cheems:更高效更有效的Wonderful矩阵架构

摘要: 最近的研究表明,相对位置编码在选择性状态空间模型扫描算法中表现良好,平衡SSM和注意力的架构增强了算法的效率和有效性,而专家混合稀疏激活降低了训练成本。我研究了在结构化状态空间双算法中使用不同位置编码的有效性,以及更有效的SSD-Attn内部和外部功能混合方法,并设计了更高效的跨领域专家混合。我发现相同的矩阵在不同算法中非常出色,这使我们能够建立一个新的混合稀疏架构:Cheems。与其他混合架构相比,在语言建模任务中更高效更有效。

更新时间: 2024-07-25 01:34:13

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.16958v2

Enhancing Agent Learning through World Dynamics Modeling

While large language models (LLMs) have been increasingly deployed across tasks in language understanding and interactive decision-making, their impressive performance is largely due to the comprehensive and in-depth domain knowledge embedded within them. However, the extent of this knowledge can vary across different domains. Existing methods often assume that LLMs already possess such comprehensive and in-depth knowledge of their environment, overlooking potential gaps in their understanding of actual world dynamics. To address this gap, we introduce Discover, Verify, and Evolve (DiVE), a framework that discovers world dynamics from a small number of demonstrations, verifies the correctness of these dynamics, and evolves new, advanced dynamics tailored to the current situation. Through extensive evaluations, we analyze the impact of each component on performance and compare the automatically generated dynamics from DiVE with human-annotated world dynamics. Our results demonstrate that LLMs guided by DiVE can make better decisions, achieving rewards comparable to human players in the Crafter environment.

Updated: 2024-07-25 01:32:41

标题: 通过世界动态建模增强代理学习

摘要: 尽管大型语言模型(LLMs)在语言理解和交互式决策中的任务中被越来越广泛地部署,它们令人印象深刻的表现主要归因于嵌入其中的全面和深入的领域知识。然而,这种知识的程度在不同领域之间可能有所不同。现有方法通常假设LLMs已经具有对其环境的全面和深入的知识,却忽视了它们对实际世界动态的理解可能存在的空白。为了解决这一缺陷,我们引入了Discover、Verify和Evolve(DiVE)框架,该框架从少量演示中发现世界动态,验证这些动态的正确性,并演变出新的、针对当前情况的高级动态。通过广泛的评估,我们分析了每个组件对性能的影响,并将DiVE自动生成的动态与人类注释的世界动态进行了比较。我们的结果表明,由DiVE指导的LLMs可以做出更好的决策,在Crafter环境中实现与人类玩家相当的奖励。

更新时间: 2024-07-25 01:32:41

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.17695v1

Predicting the structure of dynamic graphs

Many aspects of graphs have been studied in depth. However, forecasting the structure of a graph at future time steps incorporating unseen, new nodes and edges has not gained much attention. In this paper, we present such an approach. Using a time series of graphs, we forecast graphs at future time steps. We use time series forecasting methods to predict the node degree at future time points and combine these forecasts with flux balance analysis -- a linear programming method used in biochemistry -- to obtain the structure of future graphs. We evaluate this approach using synthetic and real-world datasets and demonstrate its utility and applicability.
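
The two ingredients above, degree forecasting and a flux-balance-style linear program, can be sketched compactly: a linear trend forecasts each node's degree, and an LP selects fractional edge scores whose incident sums respect the forecasts. The trend forecaster and the uniform objective are simplified stand-ins for the paper's time-series models and FBA formulation.

```python
# Sketch: forecast node degrees, then pick edges with a linear program.
import numpy as np
from scipy.optimize import linprog

def forecast_degree(history):
    """history: (T,) past degrees of one node -> next-step forecast."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, 1)
    return max(0.0, slope * len(history) + intercept)

def predict_edges(deg_forecast, candidate_edges):
    """candidate_edges: list of (u, v); returns fractional edge scores
    x_e in [0, 1] with incident sums bounded by the forecast degrees."""
    n_e = len(candidate_edges)
    A = np.zeros((len(deg_forecast), n_e))
    for j, (u, v) in enumerate(candidate_edges):
        A[u, j] = 1.0
        A[v, j] = 1.0
    res = linprog(c=-np.ones(n_e),            # maximize total edge mass
                  A_ub=A, b_ub=deg_forecast,
                  bounds=[(0, 1)] * n_e, method="highs")
    return res.x
```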

Updated: 2024-07-25 01:31:45

标题: 预测动态图的结构

摘要: 图的许多方面已经深入研究。然而,预测在未来时间步骤中图的结构,包括未见过的新节点和边,尚未受到太多关注。在本文中,我们提出了这样一种方法。使用图的时间序列,我们预测未来时间步骤的图。我们使用时间序列预测方法来预测未来时间点的节点度,并将这些预测与通量平衡分析结合起来 -- 一种在生物化学中使用的线性规划方法 -- 以获得未来图的结构。我们使用合成和真实世界数据集评估了这种方法,并展示了其实用性和适用性。

更新时间: 2024-07-25 01:31:45

领域: cs.LG,cs.SI,stat.ML

下载: http://arxiv.org/abs/2401.04280v2

Examining the Influence of Political Bias on Large Language Model Performance in Stance Classification

Large Language Models (LLMs) have demonstrated remarkable capabilities in executing tasks based on natural language queries. However, these models, trained on curated datasets, inherently embody biases ranging from racial to national and gender biases. It remains uncertain whether these biases impact the performance of LLMs for certain tasks. In this study, we investigate the political biases of LLMs within the stance classification task, specifically examining whether these models exhibit a tendency to more accurately classify politically-charged stances. Utilizing three datasets, seven LLMs, and four distinct prompting schemes, we analyze the performance of LLMs on politically oriented statements and targets. Our findings reveal a statistically significant difference in the performance of LLMs across various politically oriented stance classification tasks. Furthermore, we observe that this difference primarily manifests at the dataset level, with models and prompting schemes showing statistically similar performances across different stance classification datasets. Lastly, we observe that when there is greater ambiguity in the target the statement is directed towards, LLMs have poorer stance classification accuracy.

Updated: 2024-07-25 01:11:38

标题: 研究政治偏见对大型语言模型在立场分类中性能的影响

摘要: 大型语言模型(LLMs)在执行基于自然语言查询的任务方面表现出了卓越的能力。然而,这些模型在经过筛选的数据集上训练,天然地带有从种族、国家到性别的各种偏见。目前尚不清楚这些偏见是否会影响LLMs在某些任务中的表现。在本研究中,我们调查了LLMs在立场分类任务中的政治偏见,具体地检查这些模型是否表现出更准确分类具有政治倾向的立场的倾向。利用三个数据集、七个LLMs和四种不同的提示方案,我们分析了LLMs在政治导向的陈述和目标上的表现。我们的发现显示,在不同的政治导向立场分类任务中,LLMs的表现存在统计上显著的差异。此外,我们观察到这种差异主要在数据集层面显现,不同的立场分类数据集上模型和提示方案表现出统计上相似的性能。最后,我们观察到,当陈述所指向的目标存在更大的歧义时,LLMs的立场分类准确性较低。

更新时间: 2024-07-25 01:11:38

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.17688v1

Overcome the Difficulties of NSGA-II via Truthful Crowding Distance with Theoretical Guarantees

The NSGA-II is proven to encounter difficulties for more than two objectives, and the deduced reason is the crowding distance computed by regarding the different objectives independently. The recent theoretical efficiency of the NSGA-III and the SMS-EMOA also supports the deduced reason as both algorithms consider the dependencies of objectives in the second criterion after the non-dominated sorting but with complicated structure or difficult computation. However, there is still a question of whether a simple modification of the original crowding distance can help. This paper proposes such a variant, called truthful crowding distance. This variant inherits the simple structure of summing the component for each objective. For each objective, it first sorts the set of solutions in order of descending objective values, and uses the smallest normalized L1 distance between the current solution and solutions in the earlier positions of the sorted list as the component. Summing up all components gives the value of truthful crowding distance. We call this NSGA-II variant NSGA-II-T; it replaces the original crowding distance with the truthful one and sequentially updates the crowding distance value after each removal. We prove that the NSGA-II-T can efficiently cover the full Pareto front for many-objective mOneMinMax and mOJZJ, in contrast to the exponential runtime of the original NSGA-II. Moreover, we prove that it theoretically achieves a slightly better approximation of the Pareto front for OneMinMax than the original NSGA-II with sequential survival selection. Finally, it is the first NSGA-II variant with a simple structure that performs well for many objectives with theoretical guarantees.
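
Since the abstract spells out the definition, the truthful crowding distance can be transcribed almost directly, as below; how the first element of each sorted list (which has no earlier solution) is handled is an assumption of this sketch.

```python
# Truthful crowding distance, transcribed from the definition above:
# per objective, sort solutions by descending objective value; a
# solution's component is the smallest normalized L1 distance (in
# objective space) to any solution earlier in that sorted order; the
# final value sums the components over all objectives.
import numpy as np

def truthful_crowding_distance(F):
    """F: (n, m) objective values of n solutions. Returns (n,) distances."""
    n, m = F.shape
    span = F.max(0) - F.min(0)
    span[span == 0] = 1.0                       # avoid division by zero
    Fn = (F - F.min(0)) / span                  # normalized objectives
    d = np.zeros(n)
    for obj in range(m):
        order = np.argsort(-F[:, obj])          # descending in this objective
        for pos, i in enumerate(order):
            if pos == 0:
                d[i] += np.inf                  # no predecessor: keep solution
                continue
            earlier = Fn[order[:pos]]
            d[i] += np.abs(earlier - Fn[i]).sum(axis=1).min()
    return d
```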

Updated: 2024-07-25 01:09:58

标题: 通过具有理论保证的真实拥挤距离克服NSGA-II的困难

摘要: NSGA-II 已被证明在超过两个目标时会遇到困难,其原因是通过独立考虑不同目标计算拥挤距离。NSGA-III 和 SMS-EMOA 的最近理论效率也支持了这一推断,因为这两种算法在非支配排序之后考虑了目标之间的依赖关系,但结构复杂或计算困难。然而,仍然有一个问题,即原始拥挤距离的简单修改是否有助于改善情况。 本文提出了这样一个变体,称为真实拥挤距离。这个变体继承了对每个目标的分量求和的简单结构。对于每个目标,它首先按降序目标值对解集进行排序,并使用当前解与排序列表中较早位置解之间的最小归一化 L1 距离作为分量。将所有分量相加得到真实拥挤距离的值。我们称这个 NSGA-II 变体为 NSGA-II-T,它用真实拥挤距离替代原始拥挤距离,并在每次移除后顺序更新拥挤距离值。 我们证明了 NSGA-II-T 能够有效地覆盖许多目标 mOneMinMax 和 mOJZJ 的完整帕累托前沿,与原始 NSGA-II 的指数运行时间形成对比。此外,我们还证明,与使用顺序生存选择的原始 NSGA-II 相比,它在 OneMinMax 上在理论上实现了略微更好的帕累托前沿逼近。此外,它是第一个具有简单结构且在理论上保证在许多目标上表现良好的 NSGA-II 变体。

更新时间: 2024-07-25 01:09:58

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2407.17687v1

Exploring Semantic Perturbations on Grover

With news and information being as easy to access as they currently are, it is more important than ever to ensure that people are not misled by what they read. Recently, the rise of neural fake news (AI-generated fake news) and its demonstrated effectiveness at fooling humans has prompted the development of models to detect it. One such model is the Grover model, which can both detect neural fake news to prevent it, and generate it to demonstrate how a model could be misused to fool human readers. In this work we explore the Grover model's fake news detection capabilities by performing targeted attacks through perturbations on input news articles. Through this we test Grover's resilience to these adversarial attacks and expose some potential vulnerabilities which should be addressed in further iterations to ensure it can detect all types of fake news accurately.

Updated: 2024-07-25 01:09:57

标题: 探索Grover模型上的语义扰动

摘要: 随着新闻和信息变得如此容易获取,如今比以往任何时候都更重要确保人们不会被所阅读的内容误导。最近,神经网络虚假新闻(由人工智能生成的虚假新闻)的兴起及其在愚弄人类方面的有效性已促使开发模型来检测它。其中一种模型是Grover模型,它既可以检测神经网络虚假新闻以防止其传播,又可以生成虚假新闻以展示如何可能误用模型来愚弄人类读者。在这项工作中,我们通过对输入的新闻文章进行扰动来探索Grover模型的虚假新闻检测能力。通过这种方法,我们测试Grover对这些对抗性攻击的抵抗力,并揭露一些潜在的漏洞,这些漏洞应在后续迭代中加以解决,以确保其能够准确检测各种类型的虚假新闻。

更新时间: 2024-07-25 01:09:57

领域: cs.LG,cs.CL,cs.CV

下载: http://arxiv.org/abs/2302.00509v2

Transformers on Markov Data: Constant Depth Suffices

Attention-based transformers have been remarkably successful at modeling generative processes across various domains and modalities. In this paper, we study the behavior of transformers on data drawn from $k$-th order Markov processes, where the conditional distribution of the next symbol in a sequence depends on the previous $k$ symbols observed. We observe a surprising phenomenon empirically which contradicts previous findings: when trained for sufficiently long, a transformer with a fixed depth and $1$ head per layer is able to achieve low test loss on sequences drawn from $k$-th order Markov sources, even as $k$ grows. Furthermore, this low test loss is achieved by the transformer's ability to represent and learn the in-context conditional empirical distribution. On the theoretical side, our main result is that a transformer with a single head and three layers can represent the in-context conditional empirical distribution for $k$-th order Markov sources, concurring with our empirical observations. Along the way, we prove that attention-only transformers with $O(\log_2(k))$ layers can represent the in-context conditional empirical distribution by composing induction heads to track the previous $k$ symbols in the sequence. These results provide more insight into our current understanding of the mechanisms by which transformers learn to capture context, by understanding their behavior on Markov sources.
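
The object at the center of these results, the in-context conditional empirical distribution of a $k$-th order Markov sequence, takes only a few lines to compute explicitly:

```python
# In-context conditional empirical distribution: the frequency of each
# next symbol among past positions whose preceding k symbols match the
# current context.
from collections import Counter

def in_context_conditional(seq, k):
    """Empirical distribution of the next symbol given the last k symbols."""
    context = tuple(seq[-k:])
    counts = Counter(seq[i + k] for i in range(len(seq) - k)
                     if tuple(seq[i:i + k]) == context)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}

print(in_context_conditional([0, 1, 0, 1, 1, 0, 1], k=2))  # context (0, 1)
```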

Updated: 2024-07-25 01:07:09

标题: 马尔科夫数据上的变压器:恒定深度足够

摘要: 基于注意力机制的Transformer在各个领域和模态中建模生成过程方面取得了显著成功。本文研究了Transformer在取自$k$阶马尔可夫过程的数据上的行为,其中序列中下一个符号的条件分布取决于先前观察到的$k$个符号。我们在实证中观察到了一个令人惊讶的现象,与先前的研究结果相矛盾:经过足够长时间的训练,一个具有固定深度和每层1个头的Transformer能够在取自$k$阶马尔可夫源的序列上实现低测试损失,即使$k$增长。此外,这种低测试损失是通过Transformer表示和学习上下文条件经验分布的能力实现的。在理论方面,我们的主要结果是,一个具有单个头和三层的Transformer可以表示$k$阶马尔可夫源的上下文条件经验分布,与我们的实证观察一致。在此过程中,我们证明了具有$O(\log_2(k))$层的仅注意力Transformer可以通过组合归纳头(induction heads)来跟踪序列中的前$k$个符号,从而表示上下文条件经验分布。这些结果通过了解Transformer在马尔可夫源上的行为,更深入地揭示了我们目前对Transformer学习捕获上下文机制的理解。

更新时间: 2024-07-25 01:07:09

领域: cs.LG,cs.CL,cs.IT,math.IT,stat.ML

下载: http://arxiv.org/abs/2407.17686v1

Machine Unlearning: A Comprehensive Survey

As the right to be forgotten has been legislated worldwide, many studies attempt to design unlearning mechanisms to protect users' privacy when they want to leave machine learning service platforms. Specifically, machine unlearning is to make a trained model remove the contribution of an erased subset of the training dataset. This survey aims to systematically classify a wide range of machine unlearning methods and discuss their differences, connections and open problems. We categorize current unlearning methods into four scenarios: centralized unlearning, distributed and irregular data unlearning, unlearning verification, and privacy and security issues in unlearning. Since centralized unlearning is the primary domain, we introduce it in two parts: first, we classify centralized unlearning into exact unlearning and approximate unlearning; second, we offer a detailed introduction to the techniques of these methods. Besides the centralized unlearning, we notice some studies about distributed and irregular data unlearning and introduce federated unlearning and graph unlearning as the two representative directions. After introducing unlearning methods, we review studies about unlearning verification. Moreover, we consider the privacy and security issues essential in machine unlearning and organize the latest related literature. Finally, we discuss the challenges of various unlearning scenarios and address the potential research directions.

Updated: 2024-07-25 01:03:11

标题: 机器遗忘:一项全面调查

摘要: 随着全球范围内立法规定被遗忘的权利,许多研究试图设计遗忘机制,以保护用户的隐私,当他们想要离开机器学习服务平台时。具体来说,机器遗忘是指训练模型以删除训练数据集中被擦除子集的贡献。本调查旨在系统地分类各种机器遗忘,并讨论它们的差异、联系和开放问题。我们将当前的遗忘方法分类为四种场景:中央遗忘、分布式和不规则数据遗忘、遗忘验证,以及遗忘中的隐私和安全问题。由于中央遗忘是主要领域,我们分为两部分介绍:首先,我们将中央遗忘分类为精确遗忘和近似遗忘;其次,我们详细介绍这些方法的技术。除了中央遗忘,我们注意到一些关于分布式和不规则数据遗忘的研究,并介绍联邦遗忘和图遗忘作为两个代表性方向。在介绍遗忘方法后,我们回顾了关于遗忘验证的研究。此外,我们认为隐私和安全问题在机器遗忘中至关重要,并整理了最新相关文献。最后,我们讨论各种遗忘场景的挑战,并指出潜在的研究方向。

更新时间: 2024-07-25 01:03:11

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2405.07406v2

Semi-Compressed CRYSTALS-Kyber

In this paper, we investigate the communication overhead of the Kyber, which has recently been standardized by the National Institute of Standards and Technology (NIST). Given the same decryption failure rate (DFR) and security argument, we show it is feasible to reduce the communication overhead of the Kyber by 54%. The improvement is based on two technologies: ciphertext quantization and plaintext encoding. First, we prove that the Lloyd-Max quantization is optimal to minimize the decryption decoding noise. The original Kyber compression function is not optimal. Second, we propose an encoding scheme, which combines Pulse-Amplitude Modulation (PAM), Gray mapping, and a binary error correcting code. An explicit expression for the DFR is derived. The minimum possible communication overhead is also derived. Finally, we demonstrate that with the Lloyd-Max quantization, 8-PAM, Gray mapping, and a shortened binary BCH(768,638,13) code, the proposed scheme encapsulates 638 bits (e.g., 2.5 AES keys) in a single ciphertext.
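
As a small illustration of the encoding chain, the sketch below Gray-maps 3-bit groups onto 8-PAM levels so that adjacent amplitude levels differ in exactly one bit, which is what keeps small decryption noise to single-bit errors for the outer BCH code to correct. The bit ordering is an assumed convention; the BCH code and Lloyd-Max quantizer are omitted.

```python
# Gray-mapped 8-PAM: adjacent levels differ in a single bit.
def gray(n):
    return n ^ (n >> 1)

# Gray codeword -> natural level index, for 8 levels.
GRAY_TO_LEVEL = {gray(i): i for i in range(8)}

def bits_to_pam8(bits):
    """bits: list of 0/1 with length divisible by 3 -> PAM levels
    in {-7, -5, ..., +7}."""
    symbols = []
    for i in range(0, len(bits), 3):
        idx = bits[i] << 2 | bits[i + 1] << 1 | bits[i + 2]  # Gray-coded bits
        level = GRAY_TO_LEVEL[idx]
        symbols.append(2 * level - 7)
    return symbols

print(bits_to_pam8([0, 0, 0, 0, 0, 1, 0, 1, 1]))  # [-7, -5, -3]: adjacent codes, adjacent levels
```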

Updated: 2024-07-25 00:54:22

标题: 半压缩CRYSTALS-Kyber

摘要: 在这篇论文中,我们研究了最近被美国国家标准与技术研究所(NIST)标准化的Kyber的通信开销。在给定相同的解密失败率(DFR)和安全性论证的情况下,我们表明将Kyber的通信开销减少54%是可行的。这一改进基于两项技术:密文量化和明文编码。首先,我们证明Lloyd-Max量化在最小化解密解码噪声方面是最优的,而原始的Kyber压缩函数并不是最优的。其次,我们提出了一种编码方案,结合了脉冲幅度调制(PAM)、格雷映射和二进制纠错码。我们推导了DFR的显式表达式,并推导了可能的最小通信开销。最后,我们证明了通过Lloyd-Max量化、8-PAM、格雷映射和缩短的二进制BCH(768,638,13)码,所提议的方案可以将638位(例如2.5个AES密钥)封装在单一密文中。

更新时间: 2024-07-25 00:54:22

领域: cs.CR

下载: http://arxiv.org/abs/2407.17684v1

SLADE: Detecting Dynamic Anomalies in Edge Streams without Labels via Self-Supervised Learning

To detect anomalies in real-world graphs, such as social, email, and financial networks, various approaches have been developed. While they typically assume static input graphs, most real-world graphs grow over time, naturally represented as edge streams. In this context, we aim to achieve three goals: (a) instantly detecting anomalies as they occur, (b) adapting to dynamically changing states, and (c) handling the scarcity of dynamic anomaly labels. In this paper, we propose SLADE (Self-supervised Learning for Anomaly Detection in Edge Streams) for rapid detection of dynamic anomalies in edge streams, without relying on labels. SLADE detects the shifts of nodes into abnormal states by observing deviations in their interaction patterns over time. To this end, it trains a deep neural network to perform two self-supervised tasks: (a) minimizing drift in node representations and (b) generating long-term interaction patterns from short-term ones. Failure in these tasks for a node signals its deviation from the norm. Notably, the neural network and tasks are carefully designed so that all required operations can be performed in constant time (w.r.t. the graph size) in response to each new edge in the input stream. In dynamic anomaly detection across four real-world datasets, SLADE outperforms nine competing methods, even those leveraging label supervision.

Updated: 2024-07-25 00:46:33

标题: SLADE:通过自监督学习在无标签的边流中检测动态异常

摘要: 为了检测社交、电子邮件和金融网络等真实世界图中的异常,已经开发了各种方法。虽然它们通常假定输入图是静态的,但大多数真实世界的图会随着时间的推移而增长,自然地表示为边流。在这种情况下,我们旨在实现三个目标:(a)在异常发生时立即检测异常,(b)适应动态变化的状态,(c)处理动态异常标签的稀缺性。本文提出了SLADE(Self-supervised Learning for Anomaly Detection in Edge Streams),用于在边流中快速检测动态异常,而无需依赖标签。SLADE通过观察节点交互模式随时间的偏离来检测节点进入异常状态的转变。为此,它训练一个深度神经网络执行两个自监督任务:(a)最小化节点表示中的漂移,(b)从短期交互模式生成长期交互模式。节点在这些任务上的失败表明其偏离了常态。值得注意的是,神经网络和任务都经过精心设计,使得对输入流中每条新边作出响应所需的全部操作都能在常数时间内(相对于图的大小)完成。在四个真实世界数据集上的动态异常检测中,SLADE的性能优于九种竞争方法,甚至优于那些利用标签监督的方法。

更新时间: 2024-07-25 00:46:33

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2402.11933v3

CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference

Vision Transformers (ViTs) represent a groundbreaking shift in machine learning approaches to computer vision. Unlike traditional approaches, ViTs employ the self-attention mechanism, which has been widely used in natural language processing, to analyze image patches. Despite their advantages in modeling visual tasks, deploying ViTs on hardware platforms, notably Field-Programmable Gate Arrays (FPGAs), introduces considerable challenges. These challenges stem primarily from the non-linear calculations and high computational and memory demands of ViTs. This paper introduces CHOSEN, a software-hardware co-design framework to address these challenges and offer an automated framework for ViT deployment on FPGAs in order to maximize performance. Our framework is built upon three fundamental contributions: a multi-kernel design that maximizes bandwidth, mainly targeting the benefits of multiple DDR memory banks; approximate non-linear functions that exhibit minimal accuracy degradation together with efficient use of the available logic blocks on the FPGA; and an efficient compiler that maximizes the performance and memory-efficiency of the computing kernels through a novel design-space-exploration algorithm for finding the hardware configuration that achieves optimal throughput and latency. Compared to the state-of-the-art ViT accelerators, CHOSEN achieves a 1.5x and 1.42x improvement in the throughput on the DeiT-S and DeiT-B models.

Updated: 2024-07-25 00:00:18

标题: CHOSEN:编译到硬件优化堆栈,以实现高效的视觉Transformer推断

摘要: Vision Transformers (ViTs)代表了机器学习方法在计算机视觉领域的突破性转变。与传统方法不同,ViTs采用了在自然语言处理中被广泛使用的自注意力机制来分析图像补丁。尽管在建模视觉任务方面具有优势,但在硬件平台、特别是现场可编程门阵列(FPGA)上部署ViTs会带来相当大的挑战。这些挑战主要源于ViTs的非线性计算和高计算与内存需求。本文介绍了CHOSEN,一个软硬件协同设计框架,以解决这些挑战,并为ViT在FPGA上的部署提供一个自动化框架,以最大化性能。我们的框架建立在三个基本贡献之上:多内核设计以最大化带宽,主要利用多个DDR存储体的优势;表现出最小精度损失的近似非线性函数以及对FPGA上可用逻辑块的有效利用;以及高效的编译器,通过提出一种新颖的设计空间探索算法来最大化计算内核的性能和内存效率,以找到实现最佳吞吐量和延迟的硬件配置。与最先进的ViT加速器相比,CHOSEN在DeiT-S和DeiT-B模型上的吞吐量分别实现了1.5倍和1.42倍的提升。

更新时间: 2024-07-25 00:00:18

领域: cs.CV,cs.AI,cs.AR

下载: http://arxiv.org/abs/2407.12736v3

By Xinhai (Sean) Zou.